Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Award.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.

Recent updates:

December 2025 πŸ…πŸ“ƒ: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 💎🚀: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 πŸ…β­: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 πŸ”πŸ§‘β€πŸŽ“: Due to my lab being quite full already, I'm not taking looking for any new students in this upcoming PhD application cycle 😟.

October 2025 🇨🇦🎉: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!



My research group:

Dan Chechelnitsky

CMU Portugal LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Malia Morgan

Pre-doctoral Young Investigator at Ai2

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn Rosé

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Human-AI ethics and trust

My research group explores how people evaluate AI systems as trustworthy, acceptable, and safe in real interactions. Recent work shows that framing, guardrails, and uncertainty cues can strongly shape user reliance and perceptions, as seen in [Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks](https://arxiv.org/abs/2605.20512), [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195), and [Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty](https://arxiv.org/abs/2401.06730). These papers suggest that model behavior is not only a technical issue but also a social one, where explanations and constraints can change how much people depend on AI. Across this line of research, a central goal is to reduce harmful overreliance while preserving useful support. The work also reflects growing attention to aligning model behavior with human expectations, values, and safety needs.

Narratives and social stories

My research group explores how narrative structure, personal stories, and contextual framing influence interpretation, empathy, and harm detection. A key direction is understanding how story context changes the meaning of communication, highlighted by [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925), [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633), and [Modeling Empathic Similarity in Personal Narratives](https://arxiv.org/abs/2305.14246). These studies examine how readers and models perceive intent, emotional tone, and similarity in personal narratives. They also show that narrative-aware systems may need to distinguish between literal content and the social or emotional effects of a story. Overall, this theme points to models that can better reason about stories as lived human experiences rather than just text sequences.

Social simulation and theory of mind

My research group explores how AI agents simulate people, reason about minds, and behave in social settings. Recent progress is visible in [OdysSim: Building Foundation Models for Human Behavior Simulation](https://arxiv.org/abs/2606.14199), [SOTOPIA-ToM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind](https://arxiv.org/abs/2605.02307), and [Social World Models](https://arxiv.org/abs/2509.00559). These papers focus on whether models can represent beliefs, goals, and information flow across interacting agents. They also suggest that strong performance in social tasks requires more than surface-level dialogue imitation; it needs grounded modeling of human behavior and multi-agent dynamics. This research is pushing toward AI systems that can participate in, predict, and potentially help organize complex social interactions.

Agentic interaction and personalization

My research group explores how AI agents become more proactive, personalized, and effective in multi-turn tasks. Work such as [Training Proactive and Personalized LLM Agents](https://arxiv.org/abs/2511.02208), [OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety](https://arxiv.org/abs/2507.06134), and [TOM-SWE: User Mental Modeling For Software Engineering Agents](https://arxiv.org/abs/2510.21903) shows a move toward agents that adapt to users and operate safely in real environments. These papers emphasize that helpful agents must balance initiative with user intent, domain constraints, and safety risks. They also highlight the need for better evaluation of agent behavior outside controlled benchmarks, especially in software and open-ended workflows. Together, this theme reflects the shift from static language models to interactive systems that can reason, act, and adapt over time.