Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combating social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Grant.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

December 2025 πŸ…πŸ“ƒ: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 πŸ’ŽπŸš€: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 πŸ…β­: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 πŸ”πŸ§‘β€πŸŽ“: Due to my lab being quite full already, I'm not taking looking for any new students in this upcoming PhD application cycle 😟.

October 2025 πŸ‡¨πŸ‡¦πŸŽ‰: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

[older news]


My research group:

Dan Chechelnitsky

CMU Portugal LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Malia Morgan

Pre-doctoral Young Investigator at Ai2

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn RosΓ©

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Agentic collaboration and safety

My research group explores how AI agents can work more effectively with people while remaining safe, truthful, and robust in messy real-world settings. [TOM-SWE: User Mental Modeling For Software Engineering Agents](https://arxiv.org/abs/2510.21903) shows how agent performance can improve when systems infer user intent and mental state during software tasks. [Mind the Sim2Real Gap in User Simulation for Agentic Tasks](https://arxiv.org/abs/2603.11245) highlights a key challenge in evaluating these systems, namely that simulated users can diverge sharply from real ones. [OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety](https://arxiv.org/abs/2507.06134) provides a broader safety lens for testing agent behavior beyond narrow benchmarks.

Responsible AI and human values

My research group explores ethics, responsible AI, and human-centered design, with a focus on how systems affect trust, harm, and everyday decision-making. [PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm](https://arxiv.org/abs/2601.08951) expands evaluation beyond simple harmful/harmless labels to capture diverse human interpretations of impact. [Why (not) use AI? Analyzing People's Reasoning and Conditions for AI Acceptability](https://arxiv.org/abs/2502.07287) studies when people see AI as appropriate, useful, or unacceptable in practice. [Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits](https://arxiv.org/abs/2403.14791) adds a participatory approach for surfacing community priorities and concerns before deployment.

Stories, narratives, and empathy

My research group explores how language technologies understand stories, narrative intent, and the emotional structure of personal text. [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925) pushes beyond surface text to ask how audiences interpret what a story is trying to do. [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633) examines how models can trace empathy and stylistic variation across personal narratives. [Modeling Empathic Similarity in Personal Narratives](https://arxiv.org/abs/2305.14246) further investigates how to measure relatedness between stories in ways that reflect human emotional judgment.

Social intelligence and world models

My research group explores social intelligence in language agents, including theory of mind, social reasoning, and the gap between simulated and human interaction. [Social World Models](https://arxiv.org/abs/2509.00559) reflects a move toward richer internal representations of social dynamics and interaction context. [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046) examines whether models can track multiple perspectives in situated settings. [Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies](https://arxiv.org/abs/2604.15607) and [Is This the Real Life? Is This Just Fantasy? The Misleading Success of Simulating Social Interactions With LLMs](https://arxiv.org/abs/2403.05020) both underscore that synthetic interaction success can overstate real social competence.

Language variation, moderation, and bias

My research group explores how language models handle variation, moderation, and bias across dialects, styles, and multilingual settings. [PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages](https://arxiv.org/abs/2504.04377) reflects the push toward practical safety systems that work across languages rather than only in English. [Rejected Dialects: Biases Against African American Language in Reward Models](https://arxiv.org/abs/2502.12858) reveals how preference and reward systems can penalize dialectal variation. [NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models](https://aclanthology.org/2025.naacl-long.120/) adds a complementary perspective by evaluating whether models adapt appropriately across cultural contexts.