The Congressional classification challenge: Domain specificity and partisan intensity

Hao Yan, Sanmay Das, Allen Lavoie, Sirui Li, Betsy Sinclair

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we are interested in how well measures of partisanship transfer across domains as well as the potential to rely upon measures of partisan intensity as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is in general poor, even though the algorithms perform very well in within-dataset cross-validation tests. While party affiliation of legislators is not predictable based on models learned from other sources, we do find some ability to predict the leanings of the media and crowdsourced websites based on models learned from the Congressional Record. This predictivity is different across topics, and itself a priori predictable based on within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, helping to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators are communicating different messages through different channels while clearly signaling party identity systematically across all channels. Choice of language is a clearly strategic act, among both legislators and the media, and we must therefore proceed with extreme caution in extrapolating from language to partisanship or ideology across domains.
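
The evaluation protocol the abstract contrasts (within-dataset cross-validation versus training on one domain and testing on another) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the classifier is a plain multinomial Naive Bayes rather than the authors' models or their domain adaptation technique, and DOMAIN_A and DOMAIN_B are tiny invented corpora with hypothetical "left"/"right" labels, not the paper's datasets. The numbers it produces demonstrate the protocol only and say nothing about the paper's results.

```python
import math
from collections import Counter

# Hypothetical toy corpora (invented for illustration, not the paper's data).
DOMAIN_A = [
    ("tax relief growth", "right"),
    ("tax relief jobs", "right"),
    ("health care access", "left"),
    ("health care workers", "left"),
]
DOMAIN_B = [
    ("markets rally deregulation", "right"),
    ("border security debate", "right"),
    ("climate action rally", "left"),
    ("union workers strike", "left"),
]

def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}          # label -> Counter of tokens
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Return the most probable label; ties go to the alphabetically first."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(label_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words unseen in training carry no evidence
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    """Fraction of (text, label) pairs the model labels correctly."""
    return sum(predict_nb(model, t) == y for t, y in docs) / len(docs)

def leave_one_out(docs):
    """Within-domain estimate: hold out each document in turn."""
    hits = 0
    for i, (text, label) in enumerate(docs):
        model = train_nb(docs[:i] + docs[i + 1:])
        hits += predict_nb(model, text) == label
    return hits / len(docs)

within_a = leave_one_out(DOMAIN_A)                 # train and test inside A
cross_a_to_b = accuracy(train_nb(DOMAIN_A), DOMAIN_B)  # train on A, test on B
```

On these toy corpora the two domains share almost no vocabulary, so the model trained on DOMAIN_A has little evidence to use on DOMAIN_B. That is the kind of gap the cross-domain benchmark is designed to expose; a real replication would need the actual corpora and a proper held-out design.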

Original language: English
Title of host publication: ACM EC 2019 - Proceedings of the 2019 ACM Conference on Economics and Computation
Publisher: Association for Computing Machinery, Inc
Pages: 71-89
Number of pages: 19
ISBN (Electronic): 9781450367929
State: Published - Jun 17 2019
Event: 20th ACM Conference on Economics and Computation, EC 2019 - Phoenix, United States
Duration: Jun 24 2019 → Jun 28 2019

Publication series

Name: ACM EC 2019 - Proceedings of the 2019 ACM Conference on Economics and Computation

Conference

Conference: 20th ACM Conference on Economics and Computation, EC 2019
Country/Territory: United States
City: Phoenix
Period: 06/24/19 → 06/28/19

Keywords

  • Domain adaptation
  • Partisanship
  • Political ideology
  • Political science
  • Text classification
