Manifesto Corpus

The Manifesto Corpus is a free, digital, multilingual, and annotated collection of electoral programmes. It is based on the collection of the Manifesto Project, which is currently the largest collection of annotated electoral programmes.

Since the project Manifesto Research on Political Representation (MARPOR) took over the task of maintaining and updating the Manifesto Project Dataset from the Comparative Manifestos Project, the collection and the coding process have been fully digitalised. The main advantage of digitalising the project's infrastructure is that the text data can now be distributed: the machine-readable electoral programmes and the codings of every single quasi-sentence.

The Manifesto Corpus contains three types of information:

  • the machine-readable electoral programmes,
  • the unitising into quasi-sentences and the codes according to the Manifesto coding scheme,
  • document meta-data such as the party and the election date.
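Because every quasi-sentence carries its own code, document-level summaries can be computed directly from the corpus. The sketch below turns a list of per-quasi-sentence codes into the percentage of quasi-sentences per code, which is the general idea behind category shares. It is an illustration only: the codes in the example are made up, and how the Main Dataset treats special cases (for example headings or uncoded text) is not covered here.

```python
from collections import Counter


def code_shares(codes):
    """Return the percentage of quasi-sentences assigned to each code.

    `codes` holds one code per quasi-sentence, in document order.
    """
    if not codes:
        return {}
    counts = Counter(codes)
    total = len(codes)
    # Share of all quasi-sentences, in percent
    return {code: 100 * n / total for code, n in counts.items()}


# Hypothetical codes for the five quasi-sentences of one programme
print(code_shares(["504", "504", "201", "000", "504"]))
# {'504': 60.0, '201': 20.0, '000': 20.0}
```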

The party codes and election dates can be used to link the corpus documents to the Manifesto Project Main Dataset.
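Linking amounts to a join on the pair (party, date). The following is a minimal sketch, assuming both the corpus meta data and the Main Dataset rows are available as dictionaries with "party" and "date" fields; that the Main Dataset uses exactly these field names is an assumption to check. Documents with is_primary_doc set to FALSE (see the meta information list) are skipped so that each party and election maps to a single document.

```python
def link_to_dataset(corpus_docs, dataset_rows):
    """Pair each corpus document with its Main Dataset row via (party, date).

    The field names "party", "date" and "is_primary_doc" follow the corpus
    meta data; the dataset field names are an assumption of this sketch.
    """
    # Index the dataset rows by the shared key
    index = {(row["party"], row["date"]): row for row in dataset_rows}
    linked = []
    for doc in corpus_docs:
        # Skip secondary documents so each party/election maps to one document
        if not doc.get("is_primary_doc", True):
            continue
        # None signals that no dataset row matches this document
        linked.append((doc, index.get((doc["party"], doc["date"]))))
    return linked
```

For example, a document with party 42320 and date 199809 is paired with the dataset row carrying the same two values, or with None if there is no such row.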

Coverage

The corpus currently covers electoral programmes from more than 50 countries in more than 35 languages. It contains more than 2,300 machine-readable programmes. For more than 1,150 of these, unitising and codings are available as well, amounting to more than 1,000,000 coded quasi-sentences.

Access

The corpus is stored in an online database. It can be accessed in five different ways:

  • Explore online: Browse the corpus online in your browser by document or by keyword.
  • Download CSV documents: Download individual electoral programmes in .csv format. These are encoded in UTF-8, so make sure to select UTF-8 as the encoding when importing them. You need to log in (or register) to be able to download documents.
  • Access using manifestoR: We offer an R package that facilitates downloading and processing the Manifesto Corpus. It allows bulk downloading of several documents at once and transforms the downloaded data into a corpus format. You need an API key to be able to download documents with manifestoR. Log in and create the key on your profile page.
  • Access using manifestata: We offer a Stata add-on that facilitates downloading and processing the Manifesto Corpus. It allows bulk downloading of several documents at once. You need an API key to be able to download documents with manifestata. Log in and create the key on your profile page.
  • Access via API: Are you a programmer who would like direct access to our database? Our API returns all data in our database in a standardised JSON format. You need an API key to be able to use the API. Log in and create the key on your profile page.
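Reading a downloaded CSV file with an explicit encoding avoids the most common import problem. The sketch below uses Python's standard csv module; the column names mentioned in the usage note ("text", "cmp_code") are assumptions about the file layout, not confirmed here. The "utf-8-sig" codec decodes plain UTF-8 and additionally drops a byte-order mark if one happens to be present.

```python
import csv


def read_manifesto_csv(path):
    """Read a downloaded programme into a list of row dictionaries.

    The file is decoded explicitly as UTF-8; "utf-8-sig" also strips a
    byte-order mark should one be present, so the first header stays clean.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `read_manifesto_csv("programme.csv")` (a hypothetical file name) returns one dictionary per quasi-sentence row, keyed by the header line, so a row might be accessed as `row["text"]` or `row["cmp_code"]` if those are the column names.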

Versions and Replication

We regularly update, correct and extend the Manifesto Corpus. To ensure that analyses with the corpus can be reproduced later, we save and distribute older versions of the Manifesto Corpus. When using manifestoR, you can choose to download specific corpus versions. If you want to make sure that your work can be replicated later, note the version number you are working with.
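When working with the API directly rather than through manifestoR, the version can be pinned in every request. The sketch below only builds a request URL and sends nothing. The base URL, the endpoint name api_texts_and_annotations.json and the parameter names api_key, keys[] and version are assumptions to verify against the API documentation, and "2015-3" merely stands in for whatever version identifier the API expects.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint; check both against the API documentation.
API_BASE = "https://manifesto-project.wzb.eu/tools/"
TEXTS_ENDPOINT = "api_texts_and_annotations.json"


def texts_request_url(api_key, manifesto_ids, version):
    """Build a request URL for texts and annotations of a pinned version."""
    params = [("api_key", api_key), ("version", version)]
    # Repeat the (assumed) keys[] parameter once per requested document
    params += [("keys[]", mid) for mid in manifesto_ids]
    return API_BASE + TEXTS_ENDPOINT + "?" + urlencode(params)
```

Keeping the version string as a constant in your scripts, next to the results it produced, makes it easy to report the exact version later.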

Document meta information

The Manifesto Corpus contains document meta information for the following aspects. Note that this information cannot be accessed via the website, but only via manifestoR or the API.

  • party: the party code according to the general Manifesto Project party codes (see "List of Political Parties" on the dataset website)
  • date: the election date in the format YYYYMM (201705 indicates an election date in May 2017)
  • language: the language of the document (e.g. english, french, german, ...)
  • source: the collection or project which originally collected the document (e.g. MARPOR or CEMP) (since corpus version 2015-4)
  • has_eu_code: whether a document contains "eu codes" (see the section "The eu_code column in the Manifesto Corpus" in the Subcategories tutorial)
  • is_primary_doc: is FALSE only in cases where multiple manifestos are available for a single party and election date and this document is not the one used for coding by the Manifesto Project.
  • may_contradict_core_dataset: is TRUE for documents where the CMP codings in the corpus might be inconsistent with the coding aggregates in the Manifesto Project’s Main Dataset. This applies to manifestos that were recoded after they entered the dataset, and to cases where the dataset entries are derived from hand-written coding sheets used before the digitalisation of the Manifesto Project’s data workflow, but the documents were digitalised and added to the Manifesto Corpus afterwards.
  • manifesto_id: a document id, usually of the form partycode_electiondate (e.g. 42320_199809)
  • md5sum_text: an MD5 checksum of the document content
  • url_original: a URL to the PDF document on the server
  • md5sum_original: an MD5 checksum of the PDF on the server
  • annotations: TRUE if the document is digitally coded (otherwise FALSE)
  • handbook: an integer that indicates the version of the coding instructions that was used for the coding (e.g. 4 or 5) (since 2016-6)
  • is_copy_of: indicates whether a manifesto is a copy of another manifesto (e.g. in cases where two parties ran on the same document) (since 2017-1)
  • title: the title of the manifesto (in original language) (since 2017-1)
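Several of these fields are easy to process programmatically. The sketch below splits a manifesto_id of the usual partycode_electiondate form, converts a YYYYMM date into a calendar date (using the first day of the month, an arbitrary choice since only year and month are given), and checks a local copy of the original PDF against md5sum_original. Only the PDF check is sketched because exactly which bytes md5sum_text is computed over is not specified here. All function names are hypothetical.

```python
import hashlib
from datetime import date


def parse_manifesto_id(manifesto_id):
    """Split an id of the usual form partycode_electiondate.

    Raises ValueError for ids that do not follow this form.
    """
    party, election = manifesto_id.split("_")
    return int(party), int(election)


def election_month(yyyymm):
    """Turn a YYYYMM integer such as 201705 into a date (first of the month)."""
    return date(yyyymm // 100, yyyymm % 100, 1)


def md5_matches(path, expected_md5):
    """Compare the MD5 checksum of a local file with md5sum_original."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs need not fit into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_md5.lower()
```

For example, `parse_manifesto_id("42320_199809")` yields the party code 42320 and the election date 199809, and `election_month(201705)` corresponds to May 2017.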

Citation

When publishing work using the Manifesto Corpus, please cite it according to the version you used (replacing the Xs accordingly):

  • Versions 2015-1 up to (but excluding) 2015-3: Lehmann, Pola / Matthieß, Theres / Merz, Nicolas / Regel, Sven / Werner, Annika (201X): Manifesto Corpus. Version: XXXX-X. Berlin: WZB Berlin Social Science Center.
  • Versions 2015-3 up to (but excluding) 2017-1: Lehmann, Pola / Matthieß, Theres / Merz, Nicolas / Regel, Sven / Werner, Annika (201X): Manifesto Corpus. Version: XXXX-X. Berlin: WZB Berlin Social Science Center.
  • Versions 2017-1 up to the most recent version: Lehmann, Pola / Lewandowski, Jirka / Matthieß, Theres / Merz, Nicolas / Regel, Sven / Werner, Annika (201X): Manifesto Corpus. Version: XXXX-X. Berlin: WZB Berlin Social Science Center.
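Choosing the right reference is a comparison of version numbers. The sketch below assumes version strings of the YEAR-RELEASE form used on this page (such as 2017-1); other naming schemes would need a different parser. Versions are compared as integer pairs rather than as strings, so that a hypothetical "2015-10" would correctly sort after "2015-3". The publication year is passed in by the caller, and the function names are hypothetical.

```python
FIRST_VERSION = (2015, 1)
AUTHOR_CHANGE = (2017, 1)  # Lewandowski, Jirka is listed from this version on

AUTHORS_BEFORE = ("Lehmann, Pola / Matthieß, Theres / Merz, Nicolas / "
                  "Regel, Sven / Werner, Annika")
AUTHORS_FROM_2017_1 = ("Lehmann, Pola / Lewandowski, Jirka / Matthieß, Theres / "
                       "Merz, Nicolas / Regel, Sven / Werner, Annika")


def parse_version(version):
    """Turn "2017-1" into (2017, 1) so versions compare numerically."""
    year, release = version.split("-")
    return int(year), int(release)


def corpus_citation(version, year):
    """Return the reference text for a corpus version and publication year."""
    v = parse_version(version)
    if v < FIRST_VERSION:
        raise ValueError(f"no citation listed for version {version}")
    authors = AUTHORS_BEFORE if v < AUTHOR_CHANGE else AUTHORS_FROM_2017_1
    return (f"{authors} ({year}): Manifesto Corpus. Version: {version}. "
            "Berlin: WZB Berlin Social Science Center.")
```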

Make sure to provide the exact version you used for your analyses to ensure the replicability of your work.