TY - GEN
T1 - The Congressional classification challenge
T2 - 20th ACM Conference on Economics and Computation, EC 2019
AU - Yan, Hao
AU - Das, Sanmay
AU - Lavoie, Allen
AU - Li, Sirui
AU - Sinclair, Betsy
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/6/17
Y1 - 2019/6/17
N2 - In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we are interested in how well measures of partisanship transfer across domains as well as the potential to rely upon measures of partisan intensity as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is in general poor, even though the algorithms perform very well in within-dataset cross-validation tests. While party affiliation of legislators is not predictable based on models learned from other sources, we do find some ability to predict the leanings of the media and crowdsourced websites based on models learned from the Congressional Record. This predictivity is different across topics, and itself a priori predictable based on within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, helping to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators are communicating different messages through different channels while clearly signaling party identity systematically across all channels. Choice of language is a clearly strategic act, among both legislators and the media, and we must therefore proceed with extreme caution in extrapolating from language to partisanship or ideology across domains.
KW - Domain adaptation
KW - Partisanship
KW - Political ideology
KW - Political science
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85069053361&partnerID=8YFLogxK
U2 - 10.1145/3328526.3329582
DO - 10.1145/3328526.3329582
M3 - Conference contribution
AN - SCOPUS:85069053361
T3 - ACM EC 2019 - Proceedings of the 2019 ACM Conference on Economics and Computation
SP - 71
EP - 89
BT - ACM EC 2019 - Proceedings of the 2019 ACM Conference on Economics and Computation
PB - Association for Computing Machinery, Inc
Y2 - 24 June 2019 through 28 June 2019
ER -