Welcome to our second employee spotlight! We sat down with Katherine Thai, our founding AI research scientist, to discuss her unique path into NLP, her research on literary analysis, and what she's building at Pangram Labs. (Note: This interview was transcribed and lightly edited for readability by AI.)
How did you get interested in NLP and decide to pursue a PhD?
I didn't start out interested in NLP at all. I studied math, computer science, and English in undergrad and did a lot of undergraduate research programs because I loved the idea of research and experimenting, but I didn't know exactly what I wanted to study.
As my senior year approached, a classmate suggested that my English degree would lend itself to studying NLP, since it's the application of computers to text. I had never heard much about it—my undergrad institution didn't have NLP researchers or courses.
I eventually found my current advisor, Mohit Iyyer, who was doing work on narrative understanding of long stories and books. This really intrigued me because I love books and had written an undergrad thesis called "Narrative Mechanisms of Frustration." When I applied, my advisor thought those were technical computer science mechanisms, but they weren't—it was just how I described what was happening in literature! He found my background compelling and thought my math background would help me pick up the fundamentals. I literally took my first NLP course during the first semester of my PhD.
Tell us about your PhD research.
My thesis is titled "Modes of Human-AI Collaboration in Text: Benchmarks, Metrics, and Interpretive Tasks." I'm interested in understanding how language models might interpret text and draw the deeper conclusions a humanities scholar would, rather than just surface-level attributes.
Early NLP work on literature focused on extracting named entities from books, mapping character interactions, and creating rough plot timelines. I'm much more interested in overarching themes across entire texts, how characters' motivations influence their decisions, and how texts are situated in the larger context of when and where the author wrote them.
I mainly work on this as an evaluation problem—seeing if language models are capable of extracting these higher-level ideas from literary texts.
What was it like studying literary analysis with AI as ChatGPT emerged during your PhD?
I have a crazy story about this. My first PhD work proposed a task called "literary evidence retrieval." Scholars always cite quotes from primary texts in their analysis, so we took paragraphs where humanities scholars analyzed The Great Gatsby, hid the quotes from the novel, and asked language models to retrieve those quotes.
My first work used a small dense RoBERTa-based retriever because we couldn't fit entire novels into language models. I literally wrote in the motivation section that we needed this approach because we couldn't put full novels into the context.
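For context on that original setup, here is a minimal sketch of what a dense retriever for this task can look like: encode the scholar's analysis (with the quote hidden) and each candidate passage from the novel, then rank candidates by embedding similarity. The base model, mean pooling, and placeholder texts below are illustrative assumptions, not the exact configuration from the paper.

```python
# Sketch of dense retrieval for literary evidence retrieval.
# Assumptions: roberta-base encoder, mean pooling, toy candidate set.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pool token states into one L2-normalized vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float() # ignore padding
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=-1)

# The scholar's analysis with the primary-source quote masked out...
analysis = "Fitzgerald undercuts Gatsby's optimism in this passage: <quote>"
# ...and candidate passages drawn from the novel.
candidates = [
    "So we beat on, boats against the current, borne back ceaselessly into the past.",
    "In my younger and more vulnerable years my father gave me some advice.",
]

scores = embed([analysis]) @ embed(candidates).T  # cosine similarities
print(candidates[int(scores.argmax())])           # top-ranked quote
```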
Five years later, my most recent work revisited this task with large language models that can fit entire novels. In February, I tried the task myself for the first time—it took me eight hours with physical copies of books. None of the models did as well as I did on 40 examples. But by the time the paper was accepted three months later, Gemini 2.5 Pro had come out and outperformed me. It was such a small sample, but crazy to see how fast things moved.
At the beginning of my PhD, I didn't write any prompts; prompting was unheard of. Now my mom uses LLMs in her job—she never used to know what I worked on, and now she has enterprise LLM access.
Katherine defending her PhD thesis
How do you think LLMs read differently than humans?
The most obvious difference is speed: Gemini responds in 30 seconds, while I averaged 12 minutes per example. When I reviewed my mistakes, often I simply didn't remember specific sentences from 200-400 page novels, whereas the model had perfect recall.
I think LLMs process text token by token in a way that's similar to close reading in literary analysis, where you pick apart text at the word level. But when humans read 400 pages, not every word registers as a distinct unit in our brains the way it might for models.
Why is designing good evaluations so challenging, and why is there such a gap between current evals and what people actually experience with these models?
It's the tension between wanting to scale evaluations quickly with automatic evaluation and needing fine-grained evaluation from human experts. A lot of my work has focused on hiring actual experts. For machine translation of literature, we hired literary translators with comparative literature PhDs. Their insights were definitely different from what you'd get from Mechanical Turk workers, even for simple A/B tests.
The other side is the cost of creating evaluations. I helped build a benchmark for agents this past year where we manually created questions and evaluated all the agents by hand. I spent probably all of March watching OpenAI's Operator click around and look for things. It took a really long time to get through even 100-150 examples, but we learned so much from having human eyes on what the agents were doing.
There's this constant tension between wanting to scale up evaluations and needing slower, fine-grained human evaluation.
What are you working on at Pangram?
I'm working on a model that can detect how pervasive AI has been in a piece of text. We know people don't just generate text with AI—they often come with text they've written and ask AI to edit it. These edits range from minor grammar fixes to major restructuring or complete paraphrasing.
We want to measure that effect: you can view text on a spectrum from fully human-written to fully AI-written, with AI-edited text somewhere in between. We're training a model to identify where on that spectrum a given text falls.
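To make the idea of "placing text on a spectrum" concrete, here is one generic way the problem could be framed in code: a transformer encoder fine-tuned with a single regression output that maps text to a score between 0 (fully human-written) and 1 (fully AI-written). The base model, labels, and training setup below are illustrative assumptions, not Pangram's actual architecture or recipe.

```python
# Illustrative sketch only: regression head over a transformer encoder
# predicting a score in [0, 1], 0 = fully human, 1 = fully AI-written.
# Not Pangram's actual model or training setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=1,               # a single scalar output
    problem_type="regression",  # trains with MSE against float labels
)

texts = [
    "An essay typed from scratch.",
    "The same essay after heavy AI paraphrasing.",
]
# Hypothetical targets for how much AI touched each text.
labels = torch.tensor([0.05, 0.8])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # loss for one training step
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)  # predicted spectrum positions
```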
This is really important for our educational customers, but we've had interest from many others since LLMs are integrated into text editors like Google Docs now. People want to know how invasive AI has been in a piece of text—which edits might be "forgivable" versus those that take significant cognitive load off the user.
Katherine and the team working late on a research paper
Why did you decide to join Pangram as a founding researcher?
I love the team here. Bradley and Max really knocked it out of the park with the founding team. I spend 90% of my time with Pangram people, but honestly wouldn't have it any other way—as evidenced by me working out with everyone in the last 10 days!
It's really nice having office space to go into. I was a remote PhD student for a while, and it's fun having a space where everyone works toward a similar goal. I started my PhD directly after undergrad during the first year of COVID, so it was fully remote with nowhere to go. I've never experienced working in an office or having a "normal job."
Bradley is one of the smartest people I've ever worked under—not even an exaggeration. I feel like I've learned so much and I'm getting hands-on experience with things I didn't get to do in my PhD. When LLMs came out, everyone wanted to do research on them and we forgot about modeling. There was no point trying to train your own model to keep up with the big labs, so I hadn't done much modeling besides fine-tuning.
It's been really cool picking up practical skills. I'm a researcher, not a good software engineer, so that's been fun. Elyas was helping me fix GitHub issues for half an hour today! And being able to work with smart people, do research, and be in Brooklyn—it's a great location and I love the East Coast.
You're more of an AI skeptic than optimist and don't integrate AI much into your daily life. What underpins this skepticism?
Two things. On a micro scale, I'm the only one of my close college friends who went into computer science research. The others are actuaries and didn't know about language modeling when it came out. They started hearing about ChatGPT when Instagram added AI to search bars and chat features. For a long time, I was the only one who knew about these technologies, but my friends seemed fine living without them. I realized how much AI stuff was living in my head rent-free while they were blissfully unaware but doing just fine.
I was in this echo chamber of people either being AI doomers or really hyping up LLMs, but that's not what 95% of people talk about.
On a philosophical scale, through my writing journey—learning I don't want to write but love analyzing—I realized I only value text that comes out of humans. I don't care what LLMs write or if they can do literary analysis tasks, because I think the ability to do these things is valuable for humans. It's a skill humans can have, but I don't think it means anything if an LLM has this skill.
Writing is a very human task, and I really value that a human was behind it. It's made me a bad AI text detector because I just don't read AI text!
What do you like to do for fun outside of work?
I love walking my dogs around Brooklyn—I have two dogs and one of them really likes long walks. I like to work out, read fiction, and I'm pretty into knitting and crocheting.
You've made it a summer goal to work out with everyone on the Pangram team. What's been your favorite workout so far?
I think climbing with Lu, which is good because we're about to do it again in 45 minutes! Climbing is very social because you take breaks between attempts, so you chat and hang out.
I've done kickboxing which was high-intensity the entire time with individual bags, so not as team-oriented. And I did another workout with our founders that was chaos for the entire hour—no opportunity to talk, we were just trying to survive! The morale was high at times, though maybe low for Max at points. It was a great team bonding experience, but climbing wins for being the most social.
What advice would you give to someone looking to get into ML research?
Two main things. First, don't try to do projects by yourself. Some early PhD students fall into this trap, but you need to collaborate with people more senior than you. If it's your first project, it's honestly fine if they're doing things that shock and impress you—you'll learn so much from working with very smart people.
Second, you need to try these things yourself and get out of your comfort zone. I learned Python only by deciding to use it as my only language one summer for a research project. Be very hands-on with everything, including the math—write out derivatives by hand!
I actually got addicted to Math Academy six months ago, which was insane but amazing for getting back into mathematical fundamentals.
Katherine at Pangram
Katherine recently completed her PhD in Computer Science at UMass Amherst and will be joining Pangram Labs full-time as our first founding research scientist. When she's not training AI detection models or analyzing literature with language models, you can find her walking her dogs around Brooklyn or planning the next team workout.
