Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combating social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Award.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

December 2025 πŸ…πŸ“ƒ: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 πŸ’ŽπŸš€: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 πŸ…β­: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 πŸ”πŸ§‘β€πŸŽ“: Due to my lab being quite full already, I'm not taking looking for any new students in this upcoming PhD application cycle 😟.

October 2025 πŸ‡¨πŸ‡¦πŸŽ‰: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

[older news]


My research group:

Dan Chechelnitsky

CMU Portugal LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Malia Morgan

Pre-doctoral Young Investigator at Ai2

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn RosΓ©

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Human-centered AI ethics

My research group explores how to build AI systems that are safer, fairer, and more aligned with human values and lived experience. A central thread is understanding harms in real interactions, as in [OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety](https://arxiv.org/abs/2507.06134), which broadens safety evaluation beyond static benchmarks. We also study how AI behavior shapes user perception and trust, including [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195), which shows that safety interventions have social and emotional consequences. Complementing this, [PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm](https://arxiv.org/abs/2601.08951) helps characterize harm from multiple perspectives rather than a single narrow metric. Together, these papers point toward ethics research that is interaction-centered, context-aware, and grounded in human outcomes.

Narrative understanding of stories

My research group explores how language models interpret personal stories, narrative intent, and the social meaning carried by text. [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925) highlights how the same story can be received differently depending on context and audience. [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633) investigates how empathy and style can be detected and modeled in first-person narratives. We are also interested in [The Empirical Variability of Narrative Perceptions of Social Media Texts](https://aclanthology.org/2024.emnlp-main.1113/), which shows that story interpretation can vary substantially across readers. Taken together, these works suggest a broader agenda on computational storytelling that emphasizes interpretation, empathy, and variability in how narratives are understood.

Social intelligence in agents

My research group explores how AI agents can reason about people, social context, and interactive behavior in realistic multi-agent settings. [Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies](http://arxiv.org/abs/2604.15607) examines how social outcomes depend on both human and model characteristics, especially when simulations and live studies diverge. [Social World Models](https://arxiv.org/abs/2509.00559) pushes toward richer representations of social dynamics that agents can use to anticipate behavior and consequences. We also build on [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046), which probes whether systems can track multiple viewpoints in embodied settings. Across these efforts, the field is moving from isolated task success toward socially intelligent behavior that is robust, interactive, and realistically grounded.

Agent simulation and safety

My research group explores the design, evaluation, and simulation of AI agents that must act reliably in open-ended environments. [TOM-SWE: User Mental Modeling For Software Engineering Agents](https://arxiv.org/abs/2510.21903) studies whether software engineering agents can model user intentions well enough to operate effectively in complex workflows. [Mind the Sim2Real Gap in User Simulation for Agentic Tasks](https://arxiv.org/abs/2603.11245) underscores that user simulators can look strong in synthetic settings while failing to match real-world behavior. We also investigate [Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering](https://arxiv.org/abs/2502.13069), which tackles ambiguity through interactive clarification rather than brittle guessing. Overall, this theme focuses on building agents that are proactive, adaptable, and safe when the environment is underspecified or unpredictable.

Language variation and bias

My research group explores how large language models respond to linguistic diversity, dialect, and culturally specific expression. [Black LLMirror: User (Self) Perceptions in Black American English Interactions with LLMs](https://dl.acm.org/doi/abs/10.1145/3772318.3791111) examines how interactions with Black American English can shape users' self-perception and model behavior. [Rejected Dialects: Biases Against African American Language in Reward Models](https://arxiv.org/abs/2502.12858) exposes systematic bias in model preference signals against dialectal language. We also study [NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models](https://aclanthology.org/2025.naacl-long.120/), which provides a way to assess whether models can flex across cultural contexts without losing quality or respectfulness. These papers collectively show that language technologies must be evaluated not only for accuracy, but also for how they treat nonstandard, multilingual, and culturally grounded expression.