Recoding and Refining the MARPOR code scheme

Lisa Zehnter & Paul Muscat

06 February 2020

In this tutorial, we show how to recode and refine the MARPOR coding scheme yourself for quasi-sentences accessed via the Manifesto Corpus. This is useful for research questions that need a more differentiated category scheme or that cover longer time periods in which manifestos were coded with different versions of the coding scheme.

We assume that you have already read First steps with manifestoR as well as Using the Manifesto Corpus with quanteda and that you are familiar with the pipe %>% operator.

Motivation for the tutorial

The newest version of the Manifesto Coding scheme contains 56 substantial categories. Over the years, the categories have changed and new sub-categories have been introduced. An overview of important changes between the different versions as well as the explicit Coding Instructions can be found here and here.

However, some research questions call for a more fine-grained category scheme or draw on MARPOR data from a wider time frame that spans several versions of the coding scheme. For cases like these, this tutorial shows how you can modify the coding according to your research interest.

We will perform these steps:
1. Create a corpus
2. Create a shiny app to be used as a user interface for recoding
3. Keep track of the recoding by saving progress to a file
4. Once finished, read the codes in the file and update the corpus with them

manifestoR and shiny

We first use the usual “header” of a manifestoR script: loading packages, setting the API key and fixing the corpus version (to ensure reproducibility).

library(manifestoR)
library(dplyr)
library(tidyr)
library(stringr)
library(shiny)

mp_setapikey(key.file = "manifesto_apikey.txt")
mp_use_corpus_version("2019-1")

This R code presumes that you have downloaded the API key and stored it in a file named manifesto_apikey.txt in your current R working directory. Note that it is a security risk to store the API key file or a script containing the key in public repositories.
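One way to keep the key out of scripts and repositories is to read it from an environment variable. The following is a minimal sketch, not part of the original workflow: the variable name MANIFESTO_API_KEY and the helper read_api_key are hypothetical, and the sketch assumes that mp_setapikey accepts the key directly through its key argument.

```r
# Hypothetical helper: fetch the API key from an environment variable
# instead of a file that could end up in a public repository.
read_api_key <- function(var = "MANIFESTO_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set")
  }
  key
}

# Only call manifestoR if the package is installed and the variable is set
if (requireNamespace("manifestoR", quietly = TRUE) &&
    nzchar(Sys.getenv("MANIFESTO_API_KEY"))) {
  manifestoR::mp_setapikey(key = read_api_key())
}
```

The variable can be set outside the script, for example in a user-level .Renviron file, so that the key never appears in code that is shared.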

Example 1

Recode from one main category to one specific sub-category

From the tutorial Subcategories in the Manifesto Coding Scheme you know that since version 5 of the coding instructions there exist subcategories for 12 main categories.

In this first example we will recode quasi-sentences from the manifesto of the Social Democratic Party of Germany (SPD) for the 2013 election in Germany that have been coded with category 202 (Democracy). In the newest version of the category scheme (Version 5), category 202 has four subcategories, which allow for a more fine-grained analysis. For example, a sentence that concerns direct democracy is coded with 202.4 under the newest coding scheme, which has been applied to the 2017 manifesto. The goal of this first example is to make two manifestos comparable that were originally coded with different coding schemes.

In a first step we construct a corpus for the SPD manifestos from 2013 and 2017.

corpus_spd <- manifestoR::mp_corpus(countryname == "Germany" &
  year >= "2013" &
  party == "41320")
## Connecting to Manifesto Project DB API... 
## Connecting to Manifesto Project DB API... corpus version: 2019-1 
## Connecting to Manifesto Project DB API... 
## Connecting to Manifesto Project DB API... corpus version: 2019-1 
## Connecting to Manifesto Project DB API... corpus version: 2019-1 
## Connecting to Manifesto Project DB API... corpus version: 2019-1
print(corpus_spd)
## <<ManifestoCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 2
table(codes(corpus_spd))
## 
##   000   101   103   104   105   106   107   108   109   110   201 201.1 201.2 
##    37     4     4    24    46    68   216   194     6    12    75    22    37 
##   202 202.1 202.3 202.4   203   204   301   302   303   304   305 305.1 305.3 
##   129    83     1     6    13     6    95    37    51    21    78    31     2 
##   401   402   403   404   405   406   408   409   410   411   412   413   414 
##    28   159   323     4    56     2    25    20    90   408   173    22    40 
##   416 416.2   501   502   503   504   505   506   601 601.1 601.2   602 602.2 
##    30    77   204   137   517   483     7   286    29    24    16     3    24 
##   603   604   605 605.1 605.2   606 606.1   607 607.1 607.2 607.3   608 608.1 
##    26    61    67   151     4    94    65    34     7    29     1     7     2 
## 608.2   701   703 703.1   704   705   706     H 
##     2   392    15    18     2    12    69   107

We see that in the two manifestos 129 quasi-sentences have been coded with 202 and 6 with 202.4. The latter stem from the 2017 manifesto, which has already been coded according to the newest coding scheme.

In the next step we create a data frame with the quasi-sentences that have been allocated to the code(s) of interest. In this case we want to select quasi-sentences coded with 202.

to_be_recoded <- corpus_spd[["41320_201309"]]$content %>%
  mutate(pos = row_number()) %>%
  rename(code = cmp_code) %>%
  select(text, code, pos) %>%
  filter(code %in% c("202"))
text code pos
Demokratie 202 11
Die SPD ist und bleibt die große politische Kraft für Demokratie und Emanzipation in Deutschland. 202 19
Die Ablehnung des Ermächtigungsgesetzes der Nazis vor 80 Jahren durch die SPD ist bis heute ein beispielloser Ausweis für unsere demokratische Grundhaltung und Überzeugung. 202 20
Zu dieser großen sozialdemokratischen Geschichte gehört auch die Gründung der SDP oder Ost-SPD im Oktober 1989, mit der Sozialdemokratinnen und Sozialdemokraten ihren Beitrag zur friedlichen Revolution in Deutschland geleistet haben. 202 23
Wir leben Demokratie und werden dies weiter tun. 202 24
Die Politik muss dem Gemeinwohl verpflichtet sein und nicht wirtschaftlichen Einzelinteressen. 202 54
Die stärkste Lobby in Deutschland müssen endlich wieder die Bürgerinnen und Bürger sein. 202 55
Wir werden die Probleme und Sorgen der Bürgerinnen und Bürger wieder in den Mittelpunkt der Politik stellen – und nicht die Interessen anonymer Finanzmärkte. 202 103
Deshalb haben wir als erste Partei in Deutschland in einem breit angelegten Bürgerdialog die Menschen in Deutschland gefragt, was in unserem Land besser werden muss. 202 104
Die Antworten und Projekte aus diesem Bürgerdialog sind in dieses Regierungsprogramm eingeflossen. 202 105
Wir wollen das Gemeinwohl in den Mittelpunkt unserer Politik stellen. 202 111
Und von einer Politik des Gemeinwohls, nicht einer des Egoismus und der Lobby- und der Sonderinteressen. 202 119
Wir leben heute in einer radikal veränderten Welt. 202 209
Deshalb wollen wir die Demokratie stärken und das Vertrauen daraus zurückgewinnen, dass demokratisches Engagement und demokratische Politik unser Zusammenleben besser und gerechter machen können. 202 210
Deshalb sind vor allem wir Sozialdemokratinnen und Sozialdemokraten gefordert, auf neuen Wegen, die sozial und ökologisch ausgerichtet sind, unser historisches Projekt der Emanzipation neu zu begründen und zu verwirklichen. 202 211
Mehr Demokratie, 202 219
Wir werden deshalb nachweisen, wie hoch die zusätzlichen Einnahmen durch die genannten Steuererhöhungen sind 202 249
brauchen wir eine stärkere Demokratisierung Europas: 202 275
Europa gehört den Bürgerinnen und Bürgern. 202 276
Das gilt auch für die Eurozone. 202 277

Here we can see the first 20 of 129 quasi-sentences that have been coded with 202. Now we want to change the code to 202.4 wherever a quasi-sentence concerns direct democracy. For this we make use of a shiny app.

A shiny app is a web app written in R code. It consists of a UI function for the visible content and a server function containing the R logic, and these are the two basic parts we need to create. For more information see the website.

First, we create a function called createUIFunction, which returns the UI (user interface) function for the app. The key shiny code is contained within the basicPage function call.

Note that this includes some custom javascript, which is not within the scope of this tutorial to explain. You can take it as given.

# create the UI
createUIFunction <- function() {
  function() {
    customJS <- '
        //called from within shiny R code
        Shiny.addCustomMessageHandler("addListenersCallbackHandler", function(message) {
              //when selector is changed
              $("#df").on("change", "select", function(event) {
                //get row number from first column
                var row = parseInt($(event.target).closest("tr").children("td").first().text());
                var code = $(this).find(":selected").val();
               //pass code and row to shiny via input$selectedCode
               Shiny.onInputChange("selectedCode", [row, code]);
              });
        });
        '
    basicPage(
      # insert the javascript that attaches a listener to the html table,
      # to listen for when a code is selected
      tags$script(HTML(customJS)),
      div(style = "margin: 50px", actionButton("save", label = "Save Recoded")),
      div(DT::dataTableOutput("df")),
      uiOutput("test")
    )
  }
}

Next we create a function called createServerFunction, which returns the server function for the app. The server function contains the “logic” for the app, written in R, which in our case includes modifying the codes as we select them and saving the changes to a working file. We can also export the progress to a CSV file called “recoded.csv”.

The function takes two arguments: codeOptions, which is a character vector of the possible codes we want to assign, and codedDF, which is the coded data frame we want to edit.

createServerFunction <- function(codeOptions, codedDF) {
  function(input, output, session) {
    workingFilePath <- "working_save.rds"

    proxy <- DT::dataTableProxy("df") # used to update the datatable on the client side (in the html)

    if (is.null(codedDF)) {
      # no working file found
      if (!file.exists(workingFilePath)) {
        stop("Please provide a coded dataframe!")
      } # read from the working file
      else {
        df <- readRDS(workingFilePath)
      }
    } else {
      if (!("code" %in% names(codedDF))) {
        stop("Please provide a dataframe with a column named 'code'")
      }
      df <- data.frame(row = 1:nrow(codedDF), codedDF) # add a column called "row"
    }

    # generate the html for the input selectors
    selectorHTML <- function(i, selected = "") {
      as.character(selectInput(paste0("selectCode_", i),
        label = NULL,
        choices = c("", codeOptions), selected = selected
      ))
    }
    df$selector <- sapply(1:nrow(df), selectorHTML) # add the selectors as a column

    # create a reactive values object so that we can keep a copy of the data frame in R
    vals <- reactiveValues(df = df)

    # observe when an input is changed (passed in from javascript declared in ui function)
    observeEvent(input$selectedCode, {
      row <- as.integer(input$selectedCode[1])
      code <- input$selectedCode[2]
      df <- vals$df
      df$code[row] <- code # update code
      df$selector[row] <- selectorHTML(row, code) # update selector
      # push the updated dataframe to the page
      DT::replaceData(proxy, df, resetPaging = FALSE, rownames = FALSE)
      vals$df <- df # save the updated dataframe in R
    })

    output$df <- DT::renderDataTable({
      # isolated so that it doesn't refresh when we change the table
      isolate({
        DT::datatable(df,
          escape = names(df) != "selector",
          selection = "none", rownames = FALSE
        )
      })
    })
    # every time the data frame is changed, save a working copy as an rds file
    observeEvent(vals$df, {
      saveRDS(vals$df, workingFilePath)
    })
    # if the save button is clicked,
    # save a csv of the data without the input selector HTML or "row" column
    observeEvent(input$save, {
      write.csv(vals$df[-which(names(vals$df) %in% c("selector", "row"))],
        "recoded.csv",
        row.names = FALSE, fileEncoding = "UTF-8"
      )
    })
    # send a message to the client side to attach the table listeners in javascript,
    # as the HTML table now exists
    observe({
      session$sendCustomMessage(type = "addListenersCallbackHandler", "")
    })
  }
}

Now we create a function called launchApp, which combines the ui and server functions into a shiny app and runs it. When called, it will launch the app in a web browser.

launchApp <- function(codeOptions, codedDF = NULL) {
  # to resume an earlier session instead of starting over,
  # leave codedDF as NULL: the data is then read from the working file

  library(shiny)
  library(DT)
  shinyApp(ui = createUIFunction(), server = createServerFunction(codeOptions, codedDF))
}

Now we launch the app with our coded data frame and possible codes.

launchApp(c("202", "202.4"), to_be_recoded)
[Figure: App]

As we code, the app automatically saves progress to a working file. If we want to close the app and resume this later, we call launchApp without specifying the second argument codedDF, and it will load what we have done from the working file.

launchApp(c("202", "202.4"))
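The save-and-resume behaviour can be tried out without starting the app. The sketch below mirrors the start-up logic of the server function on toy data; load_coding_state is a hypothetical helper, and a temporary file stands in for working_save.rds so that nothing in the working directory is touched.

```r
# Hypothetical helper mirroring the start-up logic of the server function:
# use the supplied data frame if there is one, otherwise fall back to the
# working file written during an earlier session.
load_coding_state <- function(codedDF = NULL, path = "working_save.rds") {
  if (!is.null(codedDF)) {
    return(data.frame(row = seq_len(nrow(codedDF)), codedDF))
  }
  if (!file.exists(path)) {
    stop("No data frame supplied and no working file found at ", path)
  }
  readRDS(path)
}

# Toy data standing in for to_be_recoded
toy <- data.frame(text = c("a", "b"), code = c("202", "202"),
                  pos = c(11L, 19L), stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".rds")

state <- load_coding_state(toy, tmp)  # first session: start from the data frame
state$code[2] <- "202.4"              # recode one quasi-sentence
saveRDS(state, tmp)                   # what the app does after every change

resumed <- load_coding_state(path = tmp)  # later session: codedDF left as NULL
```

Because the working file always holds the latest state, passing a data frame again in a later session would start over from the original codes rather than resume.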

When you are done recoding, click the Save Recoded button to export the progress to a CSV file in your working directory. This is the file we will use to update the corpus. Once it has been saved, we can read it in:

recoded_spd <- read.csv("recoded.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8")
text code pos
Demokratie 202.0 11
Die SPD ist und bleibt die große politische Kraft für Demokratie und Emanzipation in Deutschland. 202.0 19
Die Ablehnung des Ermächtigungsgesetzes der Nazis vor 80 Jahren durch die SPD ist bis heute ein beispielloser Ausweis für unsere demokratische Grundhaltung und Überzeugung. 202.0 20
Zu dieser großen sozialdemokratischen Geschichte gehört auch die Gründung der SDP oder Ost-SPD im Oktober 1989, mit der Sozialdemokratinnen und Sozialdemokraten ihren Beitrag zur friedlichen Revolution in Deutschland geleistet haben. 202.0 23
Wir leben Demokratie und werden dies weiter tun. 202.0 24
Die Politik muss dem Gemeinwohl verpflichtet sein und nicht wirtschaftlichen Einzelinteressen. 202.0 54
Die stärkste Lobby in Deutschland müssen endlich wieder die Bürgerinnen und Bürger sein. 202.0 55
Wir werden die Probleme und Sorgen der Bürgerinnen und Bürger wieder in den Mittelpunkt der Politik stellen – und nicht die Interessen anonymer Finanzmärkte. 202.0 103
Deshalb haben wir als erste Partei in Deutschland in einem breit angelegten Bürgerdialog die Menschen in Deutschland gefragt, was in unserem Land besser werden muss. 202.4 104
Die Antworten und Projekte aus diesem Bürgerdialog sind in dieses Regierungsprogramm eingeflossen. 202.4 105
Wir wollen das Gemeinwohl in den Mittelpunkt unserer Politik stellen. 202.0 111
Und von einer Politik des Gemeinwohls, nicht einer des Egoismus und der Lobby- und der Sonderinteressen. 202.0 119
Wir leben heute in einer radikal veränderten Welt. 202.0 209
Deshalb wollen wir die Demokratie stärken und das Vertrauen daraus zurückgewinnen, dass demokratisches Engagement und demokratische Politik unser Zusammenleben besser und gerechter machen können. 202.0 210
Deshalb sind vor allem wir Sozialdemokratinnen und Sozialdemokraten gefordert, auf neuen Wegen, die sozial und ökologisch ausgerichtet sind, unser historisches Projekt der Emanzipation neu zu begründen und zu verwirklichen. 202.0 211
Mehr Demokratie, 202.0 219
Wir werden deshalb nachweisen, wie hoch die zusätzlichen Einnahmen durch die genannten Steuererhöhungen sind 202.0 249
brauchen wir eine stärkere Demokratisierung Europas: 202.0 275
Europa gehört den Bürgerinnen und Bürgern. 202.0 276
Das gilt auch für die Eurozone. 202.0 277
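The codes in the table above are displayed as 202.0, which suggests that read.csv has guessed the code column to be numeric. That is harmless for 202 and 202.4, but a code such as 000 would lose its leading zeros. A minimal sketch on toy data (the data frame and the temporary file are made up for illustration) shows how the colClasses argument keeps the codes as text:

```r
# Toy export standing in for recoded.csv, written to a temporary file
toy <- data.frame(text = c("x", "y", "z"),
                  code = c("202", "202.4", "000"),
                  pos = c(11L, 104L, 300L),
                  stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE, fileEncoding = "UTF-8")

# Default: the code column is converted to numeric, so "000" becomes 0
as_numeric <- read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")

# Forcing character keeps every code exactly as it was written
as_text <- read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8",
                    colClasses = c(code = "character"))
```

Reading the column as character is the safer choice whenever the set of codes goes beyond plain numbers, for example with the H code or with custom labels.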

Then we replace the codes that have been changed in the original corpus.

corpus_spd[["41320_201309"]]$content$cmp_code[recoded_spd$pos] <- recoded_spd$code
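This assignment works because pos stores the row position of each quasi-sentence in the full document, recorded before filtering. The sketch below replays the idea in base R on a toy data frame (all values are made up for illustration):

```r
# Toy stand-in for the content of one manifesto document
content <- data.frame(
  text = c("s1", "s2", "s3", "s4", "s5"),
  cmp_code = c("101", "202", "503", "202", "202"),
  stringsAsFactors = FALSE
)

# Record the original row position before filtering
# (base R version of the mutate/filter step used earlier)
content$pos <- seq_len(nrow(content))
subset_202 <- content[content$cmp_code == "202", c("text", "cmp_code", "pos")]

# Pretend the fourth quasi-sentence was recoded to direct democracy in the app
subset_202$cmp_code[subset_202$pos == 4] <- "202.4"

# Write the codes back: pos indexes rows of the full document
content$cmp_code[subset_202$pos] <- subset_202$cmp_code
```

Rows that were not part of the subset keep their original codes, since only the positions listed in pos are overwritten.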

If we have another look at the codes used, we see that there are now only 109 quasi-sentences coded with 202, whereas the number of quasi-sentences coded with 202.4 has risen from 6 to 26.

table(codes(corpus_spd))
## 
##   000   101   103   104   105   106   107   108   109   110   201 201.1 201.2 
##    37     4     4    24    46    68   216   194     6    12    75    22    37 
##   202 202.1 202.3 202.4   203   204   301   302   303   304   305 305.1 305.3 
##   109    83     1    26    13     6    95    37    51    21    78    31     2 
##   401   402   403   404   405   406   408   409   410   411   412   413   414 
##    28   159   323     4    56     2    25    20    90   408   173    22    40 
##   416 416.2   501   502   503   504   505   506   601 601.1 601.2   602 602.2 
##    30    77   204   137   517   483     7   286    29    24    16     3    24 
##   603   604   605 605.1 605.2   606 606.1   607 607.1 607.2 607.3   608 608.1 
##    26    61    67   151     4    94    65    34     7    29     1     7     2 
## 608.2   701   703 703.1   704   705   706     H 
##     2   392    15    18     2    12    69   107
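Instead of comparing the two frequency tables by eye, the change per code can be computed directly. The following is a sketch with a hypothetical helper, code_changes, applied to toy vectors; with the corpus, the inputs would be the result of codes(corpus_spd) stored before and after the update.

```r
# Hypothetical helper: change in frequency per code between two code vectors,
# keeping only the codes whose count differs
code_changes <- function(before, after) {
  lv <- sort(union(before, after))
  counts <- function(x) as.vector(table(factor(x, levels = lv)))
  delta <- setNames(counts(after) - counts(before), lv)
  delta[delta != 0]
}

# Toy vectors: two quasi-sentences move from 202 to 202.4
before <- c("202", "202", "202", "202.4", "503")
after  <- c("202", "202.4", "202.4", "202.4", "503")
changes <- code_changes(before, after)
```

A recoding that only moves quasi-sentences between categories should produce changes that sum to zero, which makes this a quick sanity check.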

Example 2

Recode from one main category to all sub-categories

In the first example we were only looking for quasi-sentences that concern direct democracy. However, the main category 202 actually has four sub-categories:
- 202.1 General: Positive
- 202.2 General: Negative
- 202.3 Representative Democracy: Positive
- 202.4 Direct Democracy: Positive

In order to select between all four sub-categories, the code for the app only has to be adapted by changing the first argument in the launchApp-function:

launchApp(c("202.1", "202.2", "202.3", "202.4"))
[Figure: Dropdown menu]

Example 3

Make your own categories

It might also be the case that you do not want to use MARPOR categories but your own codes instead. For this, you just construct a data frame with the quasi-sentences of interest and then give the app your own codes. Please be aware that you should not reuse numeric codes that already exist in the coding scheme, but create new ones.

launchApp(c("a", "b", "tiger", "kitchen"))
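Before launching the app with custom codes, it can be worth checking that none of them collides with a code already in use. The sketch below uses a hypothetical helper, check_custom_codes, and a toy vector of existing codes; in practice the existing codes could be taken from unique(codes(corpus_spd)).

```r
# Hypothetical helper: stop if any custom code is already in use
check_custom_codes <- function(custom, existing) {
  clash <- intersect(custom, existing)
  if (length(clash) > 0) {
    stop("Codes already in use: ", paste(clash, collapse = ", "))
  }
  invisible(custom)
}

existing <- c("202", "202.4", "503")  # toy stand-in for the codes in the corpus

# The custom codes from the example above pass silently
check_custom_codes(c("a", "b", "tiger", "kitchen"), existing)

# A clashing code raises an error; here its message is captured for inspection
clash_message <- tryCatch(
  check_custom_codes(c("a", "202"), existing),
  error = function(e) conditionMessage(e)
)
```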

Session Info

Tested with:

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting value                       
##  version R version 4.0.3 (2020-10-10)
##  date    2021-06-15                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib  source        
##  assertthat    0.2.0   2017-04-11 [NA] CRAN (R 4.0.3)
##  base64enc     0.1-3   2015-07-28 [NA] CRAN (R 4.0.2)
##  bookdown      0.22    2021-04-22 [NA] CRAN (R 4.0.2)
##  cli           1.1.0   2019-03-19 [NA] CRAN (R 4.0.3)
##  crayon        1.3.4   2017-09-16 [NA] CRAN (R 4.0.2)
##  curl          3.2     2018-03-28 [NA] CRAN (R 4.0.3)
##  digest        0.6.21  2019-09-20 [NA] CRAN (R 4.0.3)
##  dplyr       * 1.0.6   2021-05-05 [NA] CRAN (R 4.0.2)
##  DT            0.7     2019-06-11 [NA] CRAN (R 4.0.3)
##  ellipsis      0.3.2   2021-04-29 [NA] CRAN (R 4.0.3)
##  evaluate      0.14    2019-05-28 [NA] CRAN (R 4.0.1)
##  fansi         0.4.0   2018-10-05 [NA] CRAN (R 4.0.3)
##  fastmap       1.0.0   2019-07-28 [NA] CRAN (R 4.0.3)
##  foreign       0.8-70  2018-04-23 [NA] CRAN (R 4.0.3)
##  functional    0.6     2014-07-16 [NA] CRAN (R 4.0.2)
##  generics      0.0.2   2018-11-29 [NA] CRAN (R 4.0.2)
##  glue          1.4.2   2020-08-27 [NA] CRAN (R 4.0.2)
##  highr         0.6     2016-05-09 [NA] CRAN (R 4.0.3)
##  hms           0.4.2   2018-03-10 [NA] CRAN (R 4.0.3)
##  htmltools     0.4.0   2019-10-04 [NA] CRAN (R 4.0.3)
##  htmlwidgets   1.5.3   2020-12-10 [NA] CRAN (R 4.0.2)
##  httpuv        1.5.2   2019-09-11 [NA] CRAN (R 4.0.3)
##  httr          1.3.1   2017-08-20 [NA] CRAN (R 4.0.3)
##  jsonlite      1.6     2018-12-07 [NA] CRAN (R 4.0.3)
##  knitr         1.33    2021-04-24 [NA] CRAN (R 4.0.2)
##  later         1.0.0   2019-10-04 [NA] CRAN (R 4.0.3)
##  lattice       0.20-35 2017-03-25 [NA] CRAN (R 4.0.3)
##  lifecycle     1.0.0   2021-02-15 [NA] CRAN (R 4.0.2)
##  magrittr      2.0.1   2020-11-17 [NA] CRAN (R 4.0.2)
##  manifestoR  * 1.5.0   2020-11-29 [NA] CRAN (R 4.0.2)
##  mime          0.5     2016-07-07 [NA] CRAN (R 4.0.3)
##  mnormt        1.5-5   2016-10-15 [NA] CRAN (R 4.0.3)
##  nlme          3.1-131 2017-02-06 [NA] CRAN (R 4.0.3)
##  NLP         * 0.1-9   2016-02-18 [NA] CRAN (R 4.0.3)
##  pillar        1.6.1   2021-05-16 [NA] CRAN (R 4.0.2)
##  pkgconfig     2.0.2   2018-08-16 [NA] CRAN (R 4.0.3)
##  promises      1.1.0   2019-10-04 [NA] CRAN (R 4.0.3)
##  psych         1.8.3.3 2018-03-30 [NA] CRAN (R 4.0.3)
##  purrr         0.3.2   2019-03-15 [NA] CRAN (R 4.0.3)
##  R6            2.2.2   2017-06-17 [NA] CRAN (R 4.0.3)
##  Rcpp          1.0.0   2018-11-07 [NA] CRAN (R 4.0.3)
##  readr         1.3.1   2018-12-21 [NA] CRAN (R 4.0.3)
##  rlang         0.4.10  2020-12-30 [NA] CRAN (R 4.0.2)
##  rmarkdown     2.8     2021-05-07 [NA] CRAN (R 4.0.2)
##  rmdformats    1.0.2   2021-04-19 [NA] CRAN (R 4.0.2)
##  sessioninfo   1.1.1   2018-11-05 [NA] CRAN (R 4.0.2)
##  shiny       * 1.4.0   2019-10-10 [NA] CRAN (R 4.0.3)
##  slam          0.1-40  2016-12-01 [NA] CRAN (R 4.0.3)
##  stringi       1.1.7   2018-03-12 [NA] CRAN (R 4.0.3)
##  stringr     * 1.3.0   2018-02-19 [NA] CRAN (R 4.0.3)
##  tibble        3.1.2   2021-05-16 [NA] CRAN (R 4.0.2)
##  tidyr       * 0.8.0   2018-01-29 [NA] CRAN (R 4.0.3)
##  tidyselect    1.1.1   2021-04-30 [NA] CRAN (R 4.0.3)
##  tm          * 0.7-5   2018-07-29 [NA] CRAN (R 4.0.3)
##  utf8          1.1.3   2018-01-03 [NA] CRAN (R 4.0.3)
##  vctrs         0.3.8   2021-04-29 [NA] CRAN (R 4.0.3)
##  withr         2.1.2   2018-03-15 [NA] CRAN (R 4.0.3)
##  xfun          0.23    2021-05-15 [NA] CRAN (R 4.0.2)
##  xml2          1.2.0   2018-01-24 [NA] CRAN (R 4.0.3)
##  xtable        1.8-2   2016-02-05 [NA] CRAN (R 4.0.3)
##  yaml          2.2.0   2018-07-25 [NA] CRAN (R 4.0.3)
##  zoo           1.7-13  2016-05-03 [NA] CRAN (R 4.0.3)