In my last year at UNC-Chapel Hill, I wrote an honors thesis that gave me the opportunity to connect my long-held interests in bilingualism, emotion, the structure of language, and good old machine learning. Working with Dr. Lucia Binotti, I set out to explore what happens when someone switches between Spanish and English mid-sentence.
At its heart, the project asks: What drives a code-switch when emotions are running high? Is it random? Syntactically motivated? Strategically deployed for effect? I was especially interested in whether speakers tend to favor one language over the other for expressing emotionally charged content, and whether that choice follows any consistent structural patterns.
To get at these questions, I pulled together a few code-switched datasets (including the Bangor Miami corpus and LinCE). From there, I ran sentiment analysis experiments using multilingual language models and time-series analysis to see whether common metrics of code-switching could help predict emotional tone, or vice versa.
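For a sense of what "metrics of code-switching" can look like in practice, here is a minimal sketch of two widely used ones: the M-index (how balanced the languages are in an utterance) and the I-index (how often the language changes between adjacent tokens). Which metrics the thesis actually used isn't stated here, so treat the choice of metrics, the function names, and the toy language tags as assumptions for illustration only.

```python
from collections import Counter

def m_index(tags):
    """M-index: 0 for monolingual input, 1 for a perfectly balanced language mix.

    `tags` is a list of per-token language labels, e.g. ["en", "es", ...].
    (Hypothetical input format; real corpora use their own tag sets.)
    """
    counts = Counter(tags)
    k = len(counts)                      # number of languages observed
    if k < 2:
        return 0.0                       # no mixing at all
    total = sum(counts.values())
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)

def i_index(tags):
    """I-index: fraction of adjacent token pairs where the language changes."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)

# Toy utterance: three English tokens, two Spanish tokens, two switch points.
tags = ["en", "en", "es", "es", "en"]
print(round(m_index(tags), 3), i_index(tags))
```

Scores like these can then sit alongside a sentiment label for each utterance, which is roughly the kind of pairing the experiments above depend on.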
The results were nuanced: there’s a statistically significant tendency for emotional content to lead a code-switching event (rather than follow it), and some patterns in the data suggest that bilingual speakers do make meaningful choices about when and how to switch languages. But the models themselves didn’t show significant performance gains when fed structural features, reinforcing the idea that code-switching is deeply embedded in social context, and not something we can reduce to syntax alone.
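One simple way to frame "leads rather than follows" is to look at each switch point and ask whether emotional tokens show up only in a small window before it or only in a small window after it, then run a sign test on the two counts. The sketch below does that. It is not the test from the thesis (which isn't described here), and the window size, the binary `emotional` flags, and the function names are all assumptions.

```python
from math import comb

def lead_follow_counts(tags, emotional, window=3):
    """Count switch points with emotional tokens only before vs. only after.

    `tags` holds per-token language labels; `emotional` holds 0/1 flags of
    the same length (both are hypothetical input formats).
    """
    before = after = 0
    for i in range(1, len(tags)):
        if tags[i] == tags[i - 1]:
            continue                              # not a switch point
        pre = any(emotional[max(0, i - window):i])
        post = any(emotional[i:i + window])       # includes the switched token
        if pre and not post:
            before += 1
        elif post and not pre:
            after += 1
    return before, after

def sign_test_p(before, after):
    """Exact two-sided sign test; H0: 'before' and 'after' are equally likely."""
    n = before + after
    if n == 0:
        return 1.0
    k = max(before, after)
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy example: one switch (en -> es) with an emotional token just before it.
tags = ["en", "en", "en", "es", "es", "es"]
emotional = [0, 1, 0, 0, 0, 0]
b, a = lead_follow_counts(tags, emotional)
print(b, a, sign_test_p(b, a))
```

A single utterance says nothing on its own, of course; a test like this only becomes meaningful when the counts are pooled over a whole corpus.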
What I came away with is this: if we want to understand bilingual speech and model it accurately, we need to account for the speaker’s lived experience, their emotional stakes, and their cultural intuition. As my analysis showed, emotional cues often precede switches. Code-switching isn’t just a grammatical edge case or a technical challenge for NLP; it’s a real and significant factor in the way we as humans communicate with one another.
I also came away from the experience feeling hopeful about the future of humanistically inspired language models. As I found during my research, there are many top-notch scholars out there working tirelessly to ensure that the evolving field of natural language processing gives the humanities and social sciences their due.
After all, if you’re modeling human behavior, why not ask a human?
The thesis is currently being prepared for digital archiving through the Carolina Digital Repository, and you can also read it here if you’re curious to know what two semesters’ worth of sweat, tears, and Celsius gets you (seriously, shout-out to the geniuses behind fizz-free Peach Mango Green Tea; y’all deserved some love in the acknowledgements).