Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a 2025 Packard Fellow and a recipient of the 2025 Okawa Research Award.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.

Recent updates:

December 2025 🏅📃: Very excited to have our paper Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) selected for a Best Paper Award at NeurIPS 2025 (Datasets and Benchmarks Track)!! Huge congrats to the first author Liwei Jiang!!!

November 2025 💎🚀: Honored to be a Spring 2025 recipient of the Amazon Research Award for our project on measuring AI agentic safety!

October 2025 🏅⭐: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 🔍🧑‍🎓: Because my lab is already quite full, I'm not looking for any new students in the upcoming PhD application cycle 😟.

October 2025 🇨🇦🎉: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 🧑‍🎓: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!



My research group:

Dan Chechelnitsky

CMU Portugal LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Malia Morgan

Pre-doctoral Young Investigator at Ai2

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn Rosé

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Responsible Human-Centered AI

My research group explores how to design AI systems that are more transparent, value-aware, and aligned with human judgment. Recent work shows that framing an AI with explicit values can reduce overreliance in writing tasks, while explanations of privacy redaction change how people experience AI-mediated interactions. We are also examining when guardrails help or frustrate users, including how intent clarification can recover utility in multi-turn conversations and how people perceive AI acceptability in the first place. Together, these papers point toward human-centered AI that respects user agency, privacy, and trust: [Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks](https://arxiv.org/abs/2605.20512), [Examining the Effect of Explanations of AI Privacy Redaction in AI-mediated Interactions](https://arxiv.org/abs/2603.24735), and [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195).

Social Agents and Theory of Mind

My research group explores how AI agents reason about people, relationships, and interaction dynamics in multi-party settings. A major focus is theory-of-mind evaluation, from information management in social exchanges to embodied multi-perspective interaction and stress tests of machine social reasoning. We also examine whether simulated social interaction is truly faithful to real human behavior, especially when agents must model preferences, beliefs, and cooperation. Key studies include [SOTOPIA-ToM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind](https://arxiv.org/abs/2605.02307), [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046), and [Is This the Real Life? Is This Just Fantasy? The Misleading Success of Simulating Social Interactions With LLMs](http://arxiv.org/abs/2403.05020).

Narrative and Story Understanding

My research group explores how models understand stories, personal narratives, and the social meaning embedded in text. Recent work investigates narrative intent and reception, showing that context and backstory can radically shape how violent or harmful communication is interpreted. We also study empathy in personal stories, including how narrative style influences perceived emotional content and how those perceptions vary empirically across readers. This theme is represented by [Social Story Frames: Contextual Reasoning about Narrative Intent and Reception](https://arxiv.org/abs/2512.15925), [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633), and [The Empirical Variability of Narrative Perceptions of Social Media Texts](https://aclanthology.org/2024.emnlp-main.1113/).

Agent Safety and Reliability

My research group explores how to make LLM agents safer, more reliable, and less prone to harmful or deceptive behavior in real-world deployment. One line of work focuses on sandboxing and evaluating safety risks systematically, while another studies the trade-off between utility and truthfulness in agentic responses. We are also interested in how agent behavior shifts under uncertainty and whether structured interaction can improve reliability without sacrificing usefulness. Important examples include [OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety](https://arxiv.org/abs/2507.06134), [AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents](https://aclanthology.org/2025.naacl-long.595/), and [Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty](https://arxiv.org/abs/2401.06730).

Language Variation and Fairness

My research group explores how language technologies behave across dialects, multilingual settings, and culturally diverse communication styles. Recent papers show that retrieval and reward models can be fragile or biased when faced with linguistic variation, especially for African American Language and other nonstandard forms. We are also studying cultural adaptability and multilingual moderation so that LLMs can better generalize beyond dominant language varieties while avoiding harmful bias. This theme includes [Out of Style: RAG's Fragility to Linguistic Variation](https://arxiv.org/abs/2504.08231), [Rejected Dialects: Biases Against African American Language in Reward Models](https://arxiv.org/abs/2502.12858), and [NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models](https://aclanthology.org/2025.naacl-long.120/).