A short primer on the Manifesto Project and its methodology

Manifesto Project Team

02 September, 2025

This introduction gives a brief overview of the Manifesto Project methodology and illustrates the structure of the Manifesto Project Main Dataset and the Manifesto Corpus.1

The Manifesto Project collects and analyzes parties’ electoral programs (manifestos). Its data collection is publicly available and forms the basis for many publications in political science and other disciplines. From 2009 to 2024, the Manifesto Project was funded by the German Research Foundation under the name Manifesto Research on Political Representation (MARPOR). Traditionally located at the WZB Berlin Social Science Center, MARPOR found a second home in 2021 at the Department for Democracy Studies at the University of Göttingen. MARPOR continues the work and data collection of the Comparative Manifestos Project (CMP) and the Manifesto Research Group (MRG), which date back to 1979.

Methodology

Collection and sampling

  • Countries: Democratic countries, mostly member countries of the OECD as well as many Central and Eastern European countries.2
  • Elections: Parliamentary (lower house) elections since the first democratic election in a country (at the earliest since the end of the Second World War).3
  • Parties: Programs of parties that gained at least one seat in parliament at the focal election.4
  • Documents: An authoritative document enacted and published by a party before an election that outlines a party’s policy plan for the time after the election and covers a broad range of policy issues.5

Training and Rules

The coding (also called annotation) is conducted by country experts and follows strict rules that are described in detail in the coding instructions. Despite the long history of the project, the general coding methodology has changed only slightly, which keeps the data comparable over time. The current version of the coding instructions can be found on the website.6

The country expert coders are mostly political scientists or political science students and native speakers. They are trained to parse and code the documents according to the rules specified in the coding instructions. The training is done in English on two training documents. Only if their coding of these documents surpasses a certain level of accuracy are coders asked to code the documents from their own country.

Coding Unit

The coding usually encompasses the entire text of a party’s electoral program. Only a few parts are excluded: preambles, text in tables and pictures, and headlines. The first step of the coding process is the unitization of the document: all text is split into so-called quasi-sentences, the general coding unit of the Manifesto Project. A quasi-sentence is a single statement. A grammatical sentence can contain more than one quasi-sentence, but a quasi-sentence can never span more than one grammatical sentence. The following example illustrates this process in more detail. The extract below is taken from the 2012 manifesto of the Democratic Party in the US.

[…] President Obama has already signed into law $2 trillion in spending reductions as part of a balanced plan to reduce our deficits by over $4 trillion over the next decade while taking immediate steps to strengthen the economy now. This approach includes tough spending cuts that will bring annual domestic spending to its lowest level as a share of the economy in 50 years, while still allowing us to make investments that benefit the middle class now and reduce our deficit over a decade. […]

— Democratic Party (US), Extract from 2012 Electoral Platform

The extract above shows the text before unitization. The next extract shows the same text after unitization: the coder inserted two slashes (//) between quasi-sentences to mark the end of one and the start of the next.

[…] President Obama has already signed into law $2 trillion in spending reductions as part of a balanced plan to reduce our deficits by over $4 trillion over the next decade // while taking immediate steps to strengthen the economy now. // This approach includes tough spending cuts that will bring annual domestic spending to its lowest level as a share of the economy in 50 years, // while still allowing us to make investments that benefit the middle class now // and reduce our deficit over a decade. […]

— Democratic Party (US), Extract from 2012 Electoral Platform

This illustrates that almost the entire text is split into quasi-sentences.
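The unitization step can be mimicked in a few lines of Python. This is a minimal sketch, assuming that the coder's // markers are the only boundary signal and that the […] ellipses have been removed. The function name split_quasi_sentences is hypothetical and is not part of any Manifesto Project tool.

```python
# Unitized extract from the 2012 Democratic Party platform ("[…]" removed).
UNITIZED = (
    "President Obama has already signed into law $2 trillion in spending reductions "
    "as part of a balanced plan to reduce our deficits by over $4 trillion over the "
    "next decade // while taking immediate steps to strengthen the economy now. // "
    "This approach includes tough spending cuts that will bring annual domestic "
    "spending to its lowest level as a share of the economy in 50 years, // while "
    "still allowing us to make investments that benefit the middle class now // and "
    "reduce our deficit over a decade."
)


def split_quasi_sentences(text: str, marker: str = "//") -> list[str]:
    """Split coder-unitized text into quasi-sentences at each boundary marker."""
    parts = (part.strip() for part in text.split(marker))
    # Drop empty fragments, e.g. from a marker at the very start or end.
    return [part for part in parts if part]


# Prints the five quasi-sentences, numbered from 1.
for number, quasi_sentence in enumerate(split_quasi_sentences(UNITIZED), start=1):
    print(number, quasi_sentence)
```

Each resulting string corresponds to one row of the coding table described in the next steps.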

Three important remarks about the coding unit:

  • The coding unit is the quasi-sentence. One quasi-sentence equals one statement.
  • A grammatical sentence can contain several quasi-sentences, but a quasi-sentence should never span more than one grammatical sentence.
  • Almost all text is parsed into quasi-sentences (exceptions are the preamble and headlines).
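The second remark can be checked automatically, at least roughly. The sketch below is a heuristic of our own, not a project rule. It treats sentence-final punctuation followed by whitespace and more text as a sentence boundary, so abbreviations such as "U.S. economy" will produce false positives. The name spans_multiple_sentences is hypothetical.

```python
import re

# Heuristic (an assumption): ".", "!" or "?" followed by whitespace and further
# text marks a grammatical sentence boundary inside the quasi-sentence.
INTERNAL_SENTENCE_BREAK = re.compile(r"[.!?]\s+\S")


def spans_multiple_sentences(quasi_sentence: str) -> bool:
    """Flag a quasi-sentence that appears to cross a sentence boundary."""
    return INTERNAL_SENTENCE_BREAK.search(quasi_sentence.strip()) is not None


print(spans_multiple_sentences("while taking immediate steps to strengthen the economy now."))  # False
print(spans_multiple_sentences("to strengthen the economy now. This approach includes tough spending cuts"))  # True
```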

Code Allocation

Next, the text is transformed into a table in which each row contains one quasi-sentence. The quasi-sentences are then allocated to codes. These codes belong to a category scheme that covers a broad range of policy issues. The following table lists the major codes of the category scheme:

The three most important coding rules are:

  • One (and only one) code should be assigned to each quasi-sentence.
  • The coding of policy goals takes precedence over the coding of political means if both are mentioned in one quasi-sentence.
  • Coders should use only as much context and personal knowledge as necessary to decide on the code of a quasi-sentence.
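Of these rules, only the first can be enforced mechanically; the other two are coder judgments. A minimal sketch of a coded row, assuming codes are stored as strings such as "414" (the class name CodedQuasiSentence is hypothetical), makes the "one and only one code" rule part of the data structure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedQuasiSentence:
    """One row of a coded document: one quasi-sentence with exactly one code."""

    text: str
    code: str  # a single category code such as "414"; "000" if no other code fits

    def __post_init__(self) -> None:
        # A single string field (not a list) encodes "one and only one code";
        # reject empty or non-string values passed in at runtime.
        if not isinstance(self.code, str) or not self.code.strip():
            raise ValueError(f"exactly one code required, got {self.code!r}")
        if not self.text.strip():
            raise ValueError("a quasi-sentence cannot be empty")


row = CodedQuasiSentence("while taking immediate steps to strengthen the economy now.", "408")
print(row.code)  # 408
```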

After coding, the extract from the Democratic Party’s electoral program shown above looks as follows:

quasi_sentence | category | description
President Obama has already signed into law $2 trillion in spending reductions as part of a balanced plan to reduce our deficits by over $4 trillion over the next decade | 414 | Economic Orthodoxy
while taking immediate steps to strengthen the economy now. | 408 | Economic Goals
This approach includes tough spending cuts that will bring annual domestic spending to its lowest level as a share of the economy in 50 years, | 414 | Economic Orthodoxy
while still allowing us to make investments that benefit the middle class now | 704 | Middle Class and Professional Groups
and reduce our deficit over a decade. | 414 | Economic Orthodoxy

Each quasi-sentence is allocated one code that reflects the policy goal or issue mentioned in the statement. In essence, the coding methodology has changed only slightly since the beginning of the Manifesto Project in 1979. A major change is that since 2009 quasi-sentences have been coded on the computer instead of on printed copies of the documents.

Manifesto Project Dataset (Main Dataset)

The Manifesto Project Main Dataset was first published in 2001 with the book Mapping Policy Preferences I (Budge et al. 2001). It has been available online since 2009.

Structure of the Main Dataset

  • Each row in the dataset represents one electoral program.
  • The perXXX variables indicate the share (percentage) of quasi-sentences related to the focal category.
  • The variables party and date together uniquely identify every row in the dataset.
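Because party and date together form the key of the Main Dataset, a quick sanity check after loading or merging data is to look for repeated key pairs. The sketch below uses hypothetical placeholder rows, not real dataset values, and the helper duplicate_keys is our own.

```python
from collections import Counter

# Hypothetical rows; identifiers and values are placeholders, not dataset values.
rows = [
    {"party": 10001, "date": 200001, "total": 120, "per414": 2.5},
    {"party": 10001, "date": 200401, "total": 95, "per414": 3.2},
    {"party": 10002, "date": 200001, "total": 210, "per414": 1.0},
]


def duplicate_keys(rows: list[dict]) -> list[tuple[int, int]]:
    """Return every (party, date) pair that occurs in more than one row."""
    counts = Counter((row["party"], row["date"]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


print(duplicate_keys(rows))  # [] -> every row is a distinct manifesto
```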

See below for a simplified version of the dataset with the most important variables. The variables country and countryname as well as edate and date identify the country and election in and for which the manifesto was published. The variable party is a party identifier, and partyname is the party’s name in English. The total variable indicates the number of quasi-sentences in the manifesto. The per-variables indicate the share of quasi-sentences related to each code. A value of 0.586 for the variable per101 for the manifesto of the Democratic Party means that 0.586% (roughly 0.59%) of its quasi-sentences were coded as 101 (positive mentions of special relationships with particular foreign countries). The variable peruncod indicates the share of quasi-sentences coded 000, the code applied to quasi-sentences where no other code fits.

Please be aware that the table above is a very simplified version of the dataset. The real dataset includes many more variables; the ones shown above are the most central variables in the dataset.

Note also that the dataset files for Stata and SPSS contain labels for variables and values wherever this is reasonable and therefore might look slightly different from what is shown here. A subsequent tutorial deals with how the Manifesto Project Main Dataset can be used to measure parties’ political preferences.
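To make the link between coding and dataset concrete, the sketch below turns a list of codes into total and per-variables as percentages. Code 000 maps to peruncod, following the description above. It is applied only to the five coded quasi-sentences of the Democratic Party extract shown earlier, not to the whole manifesto, so the numbers are not the values in the Main Dataset. The function per_variables is hypothetical, and sub-category codes are not handled.

```python
from collections import Counter


def per_variables(codes: list[str]) -> dict[str, float]:
    """Percentage of quasi-sentences per code, named like Main Dataset columns.

    "000" (no other code fits) becomes "peruncod"; any other code becomes
    "per" + code, e.g. "414" -> "per414".
    """
    if not codes:
        return {}
    total = len(codes)
    return {
        ("peruncod" if code == "000" else f"per{code}"): 100 * n / total
        for code, n in Counter(codes).items()
    }


# Codes of the five coded quasi-sentences in the Democratic Party extract.
extract_codes = ["414", "408", "414", "704", "414"]
print(len(extract_codes), per_variables(extract_codes))
# 5 {'per414': 60.0, 'per408': 20.0, 'per704': 20.0}
```

Because every quasi-sentence receives exactly one code, the per-variables of a manifesto (including peruncod) sum to 100.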

Coverage of the Main Dataset

The Manifesto Project Main Dataset covers 5285 manifestos issued at 877 elections in 67 countries.

Access to the Main Dataset

The Manifesto Project Main Dataset can be accessed in different ways:

  • You can download it from the Manifesto Project website. Different file formats are available: .xlsx for Excel, .dta for Stata, .sav for SPSS, and .csv for comma-separated values. To download the dataset, you need to register and log in on the website. Registration is free, simple and quick.

  • You can browse it online. The online dashboard is convenient for simple analyses but does not offer the same analytical possibilities as statistical software packages such as R, Stata or SPSS.

  • You can access the dataset directly in R or Stata using the Manifesto Project add-ons manifestoR and manifestata. This bypasses the download from the website and instead loads the dataset directly into the software in a less error-prone manner.
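The add-ons cover R and Stata. For Python users, one option is to read the .csv download with the standard library. This is a minimal sketch: the helper names are hypothetical, and the in-memory sample uses placeholder rows and column values. The file path and UTF-8 encoding mentioned in the comment are assumptions.

```python
import csv
import io
from typing import Iterable, TextIO


def load_main_dataset(handle: TextIO) -> list[dict[str, str]]:
    """Read a Main Dataset CSV export: one dict (row) per manifesto."""
    return list(csv.DictReader(handle))


def manifestos_in(rows: Iterable[dict[str, str]], countryname: str) -> list[dict[str, str]]:
    """Keep only the manifestos from one country (values are compared as text)."""
    return [row for row in rows if row["countryname"] == countryname]


# Tiny in-memory stand-in; for a real download, pass
# open(path, newline="", encoding="utf-8") instead (path and encoding assumed).
sample = io.StringIO(
    "countryname,party,date,total\n"
    "Country A,1,200001,10\n"
    "Country B,2,200001,12\n"
)
print(manifestos_in(load_main_dataset(sample), "Country A"))
```

csv.DictReader returns every value as a string, so numeric columns such as total or the per-variables need converting (e.g. float(row["total"])) before analysis.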

The Manifesto Corpus

  • The Manifesto Corpus is a digital text collection of electoral programs based on the collection and coding conducted for the Manifesto Project Main Dataset.
  • The Manifesto Corpus contains three types of information: machine-readable texts, meta-information for each document (such as language and title), and (for some documents) annotations/codes at the quasi-sentence level.
  • The Manifesto Corpus uses the same identifier variables as the Main Dataset, so that data from the Corpus and the Main Dataset can easily be linked. However, machine-readable texts and annotations are not available for all manifestos covered by the Main Dataset.
  • Since version 2024-1, English translations are available for all documents for which the corpus contains digitally annotated quasi-sentences (but not for those for which only the full text is available).
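Because both sources share the party and date identifiers, linking them amounts to a lookup on that pair. This is a minimal sketch with hypothetical placeholder records and a hypothetical helper name. It illustrates the join logic only and is not the manifestoR or Corpus API.

```python
# Hypothetical records sharing the Main Dataset's identifiers (party, date).
dataset_rows = [
    {"party": 10001, "date": 200001, "total": 120},
    {"party": 10002, "date": 200001, "total": 210},
]
corpus_metadata = [
    {"party": 10001, "date": 200001, "language": "english", "annotations": True},
]


def link_corpus_to_dataset(dataset_rows, corpus_metadata):
    """Pair each corpus document with its Main Dataset row via (party, date).

    Main Dataset rows without a corpus document are not returned; a corpus
    document without a matching row is paired with None.
    """
    index = {(row["party"], row["date"]): row for row in dataset_rows}
    return [(doc, index.get((doc["party"], doc["date"]))) for doc in corpus_metadata]


for doc, row in link_corpus_to_dataset(dataset_rows, corpus_metadata):
    print(doc["language"], row["total"] if row else None)  # english 120
```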

Structure of the Manifesto Corpus

The coverage of the Manifesto Corpus and the Manifesto Project Main Dataset is not exactly congruent. Because coding was done on printed copies in the past, not all manifestos are available as digital texts, and the codes in particular are not always available digitally in the Manifesto Corpus. The Manifesto Corpus contains different types of documents:

  • machine-readable electoral programs, or
  • annotated documents (machine-readable electoral programs parsed into quasi-sentences and accompanied by codes)

The texts in the Corpus are thus a subset of the texts in the Main Dataset and the texts with digital annotations on quasi-sentence level are a subset of all texts in the Corpus.
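This nesting can be expressed with sets of (party, date) keys. The sketch below checks the subset relations and computes what share of Main Dataset manifestos are machine-readable or annotated. The keys are hypothetical placeholders and the function name is our own.

```python
def coverage_shares(dataset_keys: set, corpus_keys: set, annotated_keys: set) -> dict[str, float]:
    """Share of Main Dataset manifestos that are machine-readable / annotated.

    Checks the nesting: annotated <= corpus <= dataset (set inclusion).
    """
    if not dataset_keys:
        raise ValueError("the Main Dataset key set is empty")
    if not (annotated_keys <= corpus_keys <= dataset_keys):
        raise ValueError("expected annotated <= corpus <= dataset (set inclusion)")
    n = len(dataset_keys)
    return {
        "machine_readable": len(corpus_keys) / n,
        "annotated": len(annotated_keys) / n,
    }


# Hypothetical (party, date) keys.
dataset = {(10001, 200001), (10001, 200401), (10002, 200001), (10002, 200401)}
corpus = {(10001, 200401), (10002, 200401)}
annotated = {(10002, 200401)}
print(coverage_shares(dataset, corpus, annotated))
# {'machine_readable': 0.5, 'annotated': 0.25}
```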

Moreover, the meta-data of each document contain links to PDFs of the scanned or downloaded copies of the original programs on our server. The following shows a simplified version of the meta-data table for all manifestos of the Republican Party in the US since 1980. The party variable is a party identifier (the same that is used in the Main Dataset). The language column indicates the language of the document, which can be useful for filtering documents in one or several specific languages. The annotations column indicates whether a document is parsed into quasi-sentences and contains annotations. The “source” column refers to the project by which the document was collected.7

The value MARPOR refers to the Manifesto Project’s funding phase under that name. You can find more details on all other meta-information on the Manifesto Corpus website.
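Filtering on the language and annotations columns can be sketched as follows. The records are hypothetical placeholders whose column names follow the description above. The helper select_documents is our own, not part of manifestoR or the Corpus website.

```python
# Hypothetical metadata records; values are placeholders.
metadata = [
    {"party": 10001, "date": 200001, "language": "english", "annotations": True},
    {"party": 10001, "date": 199601, "language": "english", "annotations": False},
    {"party": 10002, "date": 200001, "language": "german", "annotations": True},
]


def select_documents(metadata, languages, annotated_only=False):
    """Keep documents in the given languages, optionally only annotated ones."""
    return [
        doc
        for doc in metadata
        if doc["language"] in languages and (doc["annotations"] or not annotated_only)
    ]


annotated_english = select_documents(metadata, {"english"}, annotated_only=True)
print(len(annotated_english))  # 1
```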

The following table shows, as an example, how the information for each document is stored. This document has annotations==TRUE, so it is parsed into quasi-sentences and comes with a code next to each quasi-sentence.

One can see that the first two quasi-sentences do not have codes, because they are the title of the document and a headline. The number of rows in this document therefore differs slightly from the value in the total column of the Main Dataset table above: the total variable in the Main Dataset counts only quasi-sentences with codes (including code 000).
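The difference between the row count and total can be reproduced on a toy document. This sketch assumes that uncoded rows (title, headline) are represented as None. That is an assumption about the encoding, and the corpus itself may mark such rows differently. The rows and the function name are hypothetical.

```python
# Hypothetical annotated document as (text, code) rows.
document = [
    ("Our Program for the Future", None),   # title: no code
    ("Economy", None),                      # headline: no code
    ("We will balance the budget.", "414"),
    ("Thank you for reading.", "000"),      # coded 000: no other code fits
]


def total_quasi_sentences(rows):
    """Count coded rows only, including those coded 000, like the total variable."""
    return sum(1 for _text, code in rows if code is not None)


print(len(document), total_quasi_sentences(document))  # 4 2
```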

Coverage of the Manifesto Corpus

Due to the history of the Manifesto Project, not all manifestos are available in a machine-readable format with digital annotations. The following graphs illustrate the coverage of the Manifesto Corpus relative to the coverage of the Main Dataset (see figure above) with regard to whether documents are available in machine-readable format and whether they are digitally annotated.