Abstract
America's racial framework can be summarized using two distinct dimensions: superiority/inferiority and Americanness/foreignness. We investigated this framework in a corpus of spoken and written language using word embeddings. Word embeddings place words in a low-dimensional space where words with similar meanings lie close together, allowing researchers to test whether the positions of group and attribute words in that semantic space reflect stereotypes. We trained a word embedding model on the Corpus of Contemporary American English, a corpus of 1 billion words spanning 30 years and 8 text categories, and compared the positions of racial/ethnic groups with respect to superiority and Americanness. We found that America's racial framework is embedded in American English, and we captured an additional nuance: Asian people were stereotyped as more American than Hispanic people. These results provide empirical, corpus-based evidence for that framework.
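The general idea of comparing group-word positions against an attribute dimension can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name `relative_association`, the placeholder words (`group_x`, `attr_a`, and so on), and the two-dimensional toy vectors are all hypothetical, and the scoring rule (mean cosine similarity to one pole minus the other) is one common choice assumed here for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def relative_association(embeddings, target, pole_a, pole_b):
    """Mean cosine similarity of `target` to pole_a words minus pole_b words.

    A positive score means the target word sits closer to pole_a
    than to pole_b in the embedding space.
    """
    t = embeddings[target]
    sim_a = np.mean([cosine(t, embeddings[w]) for w in pole_a])
    sim_b = np.mean([cosine(t, embeddings[w]) for w in pole_b])
    return float(sim_a - sim_b)


# Hypothetical toy vectors; a real study would load trained embeddings.
toy = {
    "group_x": np.array([1.0, 0.0]),
    "group_y": np.array([0.0, 1.0]),
    "attr_a": np.array([1.0, 0.0]),
    "attr_b": np.array([0.0, 1.0]),
}

score_x = relative_association(toy, "group_x", ["attr_a"], ["attr_b"])
score_y = relative_association(toy, "group_y", ["attr_a"], ["attr_b"])
print(score_x, score_y)
```

In this toy setup `group_x` scores higher than `group_y` on the `attr_a`/`attr_b` dimension only because the vectors were constructed that way; with trained embeddings, such scores would be compared across groups and tested statistically.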
| Original language | English |
|---|---|
| Article number | pgad485 |
| Journal | PNAS Nexus |
| Volume | 3 |
| Issue number | 1 |
| State | Published - Jan 1 2024 |
Keywords
- ethnicity
- natural language processing
- race
- stereotypes
- word embeddings