Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Award.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.

Recent updates:

December 2025 🏅📃: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 💎🚀: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 🏅⭐: I'm super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 🔍🧑‍🎓: Due to my lab being quite full already, I'm not looking for any new students in this upcoming PhD application cycle 😟.

October 2025 🇨🇦🎉: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 🧑‍🎓: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!



My research group:

Dan Chechelnitsky

CMU Portugal LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Malia Morgan

Pre-doctoral Young Investigator at Ai2

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn Rosé

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Ethics and Human-Centered AI

My research group explores how to design, evaluate, and govern AI systems so they better align with human values and everyday needs. Recent work shows that explicit value framing can reduce over-reliance in writing support, while careful privacy redaction explanations can change how people understand AI-mediated interactions: [Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks](https://arxiv.org/abs/2605.20512) and [Examining the Effect of Explanations of AI Privacy Redaction in AI-mediated Interactions](https://arxiv.org/abs/2603.24735). We also study how safeguards shape user trust and preferences, as in [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195). Another thread asks how to measure human reliance and preference in a more interaction-centered way, reflected in [Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance](https://aclanthology.org/2025.naacl-long.556/). Together, these papers point to a broader agenda of making AI systems more transparent, safer, and more responsive to human judgment.

Narrative and Story Understanding

My research group explores how AI systems interpret, model, and generate stories, especially when narrative context shapes meaning and reception. A central focus is how narrative intent and audience reaction can be represented computationally, highlighted by [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925). We also examine how personal backstory affects violence detection and interpretation in language, as in [Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication](https://arxiv.org/abs/2505.21451). Earlier work on [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633) shows how models can trace empathy and stylistic patterns in personal storytelling. More broadly, this line of research suggests that narrative understanding requires attention to context, perspective, and emotional nuance rather than surface text alone.

Social Intelligence and AI Agents

My research group explores how AI agents reason about people, coordinate in multi-agent settings, and simulate social interaction. A major recent direction is theory-of-mind style evaluation, exemplified by [SOTOPIA-ToM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind](https://arxiv.org/abs/2605.02307), which probes how agents manage beliefs and information across interactions. We also study whether human behavior can be simulated and improved in agentic settings, as in [Reinforcing Human Behavior Simulation via Verbal Feedback](https://arxiv.org/abs/2605.20506). Another important thread evaluates the gap between synthetic interaction and real social behavior, emphasized by [Mind the Sim2Real Gap in User Simulation for Agentic Tasks](https://arxiv.org/abs/2603.11245). These papers collectively show that building socially capable agents requires both stronger social reasoning and careful validation against real user behavior.

Language Variation and Multilingual Robustness

My research group explores how language models behave across dialects, registers, and languages, with an emphasis on robustness and fairness. Recent work shows that systems can fail under linguistic variation, as in [Out of Style: RAG's Fragility to Linguistic Variation](https://arxiv.org/abs/2504.08231). We also study multilingual moderation and safety, reflected in [PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages](https://arxiv.org/abs/2504.04377). Bias in reward models and ranking systems is another concern, highlighted by [Rejected Dialects: Biases Against African American Language in Reward Models](https://arxiv.org/abs/2502.12858). Together, these papers point toward a research program that treats linguistic diversity as a core requirement for reliable, equitable AI rather than an edge case.