Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combating social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Award.

I received my PhD from the University of Washington, where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

December 2025 πŸ…πŸ“ƒ: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 πŸ’ŽπŸš€: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 πŸ…β­: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 πŸ”πŸ§‘β€πŸŽ“: Due to my lab being quite full already, I'm not taking looking for any new students in this upcoming PhD application cycle 😟.

October 2025 πŸ‡¨πŸ‡¦πŸŽ‰: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

[older news]


My research group:

Dan Chechelnitsky, CMU Portugal LTI PhD student (co-advised with Chrysoula Zerva)

Joel Mire, LTI PhD student

Karina Halevy, LTI PhD student (co-advised with Mona Diab)

Malia Morgan, Pre-doctoral Young Investigator at Ai2

Jimin Mun, LTI PhD student

Jocelyn Shen, MIT PhD student (co-advised with Cynthia Breazeal)

Kynnedy Smith, HCII PhD student (co-advised with Motahhare Eslami)

Vasudha Varadarajan, LTI Postdoc

Akhila Yerukola, LTI PhD student

Mingqian Zheng, LTI PhD student (co-advised with Carolyn RosΓ©)

Xuhui Zhou, LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Ethics and Human-Centered AI

My research group explores how to build AI systems that are safer, more respectful, and better aligned with human needs in real interactions. A core thread is understanding when guardrails, privacy mechanisms, and clarification strategies actually help users rather than merely making systems appear safer, as shown in [Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations](https://arxiv.org/abs/2604.27093), [Examining the Effect of Explanations of AI Privacy Redaction in AI-mediated Interactions](https://arxiv.org/abs/2603.24735), and [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195). We also study how AI systems can reflect or amplify bias and harm across language varieties and social identities, including [Black LLMirror: User (Self) Perceptions in Black American English Interactions with LLMs](https://dl.acm.org/doi/abs/10.1145/3772318.3791111) and [Rejected Dialects: Biases Against African American Language in Reward Models](https://arxiv.org/abs/2502.12858). Related work on [Common Sense or Ableism? Rethinking Commonsense Reasoning Through the Lens of Disability](https://aclanthology.org/2026.eacl-short.40/) and [NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models](https://aclanthology.org/2025.naacl-long.120/) pushes toward more inclusive evaluation standards. Across these papers, the emphasis is on participatory, context-aware design that treats human experience as central rather than incidental.

Narratives and Story Understanding

My research group explores how AI can analyze, model, and respond to stories, personal narratives, and the social meaning carried by text. We are especially interested in how narrative intent and reception vary across readers and contexts, highlighted by [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925) and [The Empirical Variability of Narrative Perceptions of Social Media Texts](https://aclanthology.org/2024.emnlp-main.1113/). We also examine how LLMs can capture empathy and stylistic signals in lived experience through [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633) and how to compare people’s responses to stories using [Modeling Empathic Similarity in Personal Narratives](https://arxiv.org/abs/2305.14246). Together, these papers suggest that narrative understanding is not just about extracting content, but about modeling perspective, emotion, and interpretation.

Social Intelligence and Theory of Mind

My research group explores how AI agents understand people, anticipate beliefs, and succeed or fail in social interaction. A major theme is evaluating social intelligence more realistically, with work such as [SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents](https://arxiv.org/abs/2310.11667), [FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions](https://arxiv.org/abs/2310.15421), and [Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models](https://arxiv.org/abs/2305.14763). We also study richer models of social inference in [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046) and [Social World Models](https://arxiv.org/abs/2509.00559). A recurring finding is that apparent social competence can be brittle or misleading, so our work focuses on separating genuine reasoning from surface-level imitation.

Agent Safety and Coordination

My research group explores how to make AI agents robust, trustworthy, and useful when they must act autonomously or alongside people. We focus on safety and coordination in realistic settings, especially through [OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety](https://arxiv.org/abs/2507.06134), [HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions](http://arxiv.org/abs/2409.16427), and [Mind the Sim2Real Gap in User Simulation for Agentic Tasks](https://arxiv.org/abs/2603.11245). We also examine how agent behavior changes under cooperation, deception, and human oversight in [Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies](http://arxiv.org/abs/2604.15607) and [AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents](https://aclanthology.org/2025.naacl-long.595/). This line of work aims to ensure that agents remain safe and legible while still being capable partners in complex tasks.

Language Variation and Safety Moderation

My research group explores how language technologies handle multilinguality, stylistic variation, and toxic or offensive content without losing nuance. A central concern is that models and safety tools can break down when language departs from standard forms, as seen in [Out of Style: RAG's Fragility to Linguistic Variation](https://arxiv.org/abs/2504.08231), [PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages](https://arxiv.org/abs/2504.04377), and [PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models](https://arxiv.org/abs/2405.09373). We also investigate social and cultural context in harmful language through [Counterspeakers’ Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate](https://arxiv.org/abs/2403.00179) and [Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures](https://arxiv.org/abs/2502.17710). Overall, this research asks how moderation systems can be both more effective and more culturally aware.