The main dataset contains three sets of content-analytical variables:

  • The three-digit main categories (per101 – per706)
  • The four-digit Central and Eastern European (CEE) subcategories (per1011 – per7062)
  • The 3+1-digit subcategories introduced with version 5 of the coding instructions (per103_1 – per703_2)
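The three groups can be told apart by their variable names alone. The following Python sketch relies only on the naming patterns listed above; the helper `classify_variable` and the group labels are hypothetical and are not part of manifestoR or the dataset itself.

```python
import re

# Hypothetical group labels mapped to the naming pattern of each variable set.
PATTERNS = {
    "main": re.compile(r"^per\d{3}$"),       # per101 ... per706
    "cee": re.compile(r"^per\d{4}$"),        # per1011 ... per7062
    "v5_sub": re.compile(r"^per\d{3}_\d$"),  # per103_1 ... per703_2
}


def classify_variable(name):
    """Return the group a per* variable belongs to, or None for other columns."""
    for group, pattern in PATTERNS.items():
        if pattern.match(name):
            return group
    return None


print(classify_variable("per7062"))  # cee
```

Columns that follow none of the three patterns (for example per_uncod) return None.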

All of these variables indicate the share of quasi-sentences coded into the respective category, expressed as a percentage of the total number of codes allocated in the document. For example, a value of 5 in the column per501 means that 5% of the allocated codes (that is, of the coded quasi-sentences) were coded with 501 (positive mentions of the protection of the environment). This tutorial explains the existence, generation and usage of the different types of subcategories and their relation to the main categories in the main dataset, the South America dataset and the Manifesto Corpus.
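As a worked illustration of how such a share is computed, the sketch below assumes a hypothetical input format: a list holding one code per coded quasi-sentence, written as the suffix of the variable name (e.g. "501" or "103_1"). The function `category_shares` is illustrative only and is not the Manifesto Project's own processing code.

```python
from collections import Counter


def category_shares(codes):
    """Percentage of coded quasi-sentences per code, in the style of the per* variables.

    `codes` is a hypothetical list with one code per coded quasi-sentence.
    Each share is relative to the total number of codes allocated in the document.
    """
    if not codes:
        raise ValueError("document has no coded quasi-sentences")
    counts = Counter(codes)
    total = len(codes)
    return {"per" + code: 100.0 * n / total for code, n in counts.items()}


# 20 coded quasi-sentences, one of them coded 501
print(category_shares(["501"] + ["104"] * 19))  # {'per501': 5.0, 'per104': 95.0}
```

With one of 20 quasi-sentences coded 501, per501 comes out as 5.0, matching the reading of the value 5 described above.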

The main categories: per101 – per706

The three-digit variables (per101 – per706) are the main categories of the coding scheme. For most analyses these are the most relevant categories, and they can be used without any special precautions and without any knowledge of the subcategories. Data on the main categories is available for all countries and all elections covered by the dataset.

The CEE subcategories: per1011 – per7062

The four-digit variables (per1011 – per7062) are subcategories that mostly address issues in transitional democracies, largely in Central and Eastern European countries. They were introduced in version 1 of the coding instructions and were gradually abandoned in most countries, as the issues they cover can also be coded into the three-digit main categories. Currently, the shares of these subcategories are not included in the main categories. Analysts who compare observations from CEE countries coded with the CEE subcategories to manifestos coded without them should therefore aggregate the CEE subcategories into their main categories; for example, per7062 should be added to per706. Our R package manifestoR does this with the function aggregate_pers_cee applied to the main dataset.
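For readers working outside R, the aggregation can be sketched in Python. This is a minimal illustration, not the manifestoR implementation: it assumes that each four-digit CEE variable belongs to the main category named by its first three digits (as in per7062 to per706), treats missing values (None) as 0, and represents one document as a plain dictionary mapping variable names to shares. The function `aggregate_cee` and its `keep_subcategories` parameter are hypothetical.

```python
import re

# Four-digit CEE variables; group 1 captures the three-digit main category.
CEE_PATTERN = re.compile(r"^per(\d{3})\d$")


def aggregate_cee(row, keep_subcategories=False):
    """Return a copy of `row` with CEE subcategory shares added to their main categories.

    Assumption: per<abcd> belongs to per<abc>. None is treated as 0.
    CEE columns are dropped unless keep_subcategories is True.
    """
    result = {}
    extra = {}  # summed CEE shares per main category
    for name, value in row.items():
        match = CEE_PATTERN.match(name)
        if match:
            main = "per" + match.group(1)
            extra[main] = extra.get(main, 0.0) + (value or 0.0)
            if keep_subcategories:
                result[name] = value
        else:
            result[name] = value
    for main, share in extra.items():
        result[main] = (result.get(main) or 0.0) + share
    return result


doc = {"per706": 2.0, "per7061": 1.5, "per7062": 0.5, "per101": 1.0, "per1011": None}
print(aggregate_cee(doc))  # {'per706': 4.0, 'per101': 1.0}
```

Dropping the CEE columns by default avoids counting the same quasi-sentences twice if all per* variables are later summed.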

The following graph shows the elections where CEE categories were used.