Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. I was named a Packard Fellow in 2025.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

October 2025 πŸ…β­: I’m super excited and grateful to announce that I'm part of the 2025 class of Packard Fellows. The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈

October 2025 πŸ”πŸ§‘β€πŸŽ“: Due to my lab being quite full already, I'm not taking looking for any new students in this upcoming PhD application cycle 😟.

October 2025 🇨🇦🎉: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

August 2025 πŸ‘¨πŸΌβ€πŸ«: Excited to give a (virtual) talk about Responsible AI for Diverse Users and Cultures at the Gender Bias in NLP workshop at ACL 2025!

July 2025 🧠🛡️: Five papers were accepted to COLM 2025! Highlights include HAICOSYSTEM, a framework for sandboxing safety risks in human-AI interaction; ALFA, which aligns LLMs to ask better clinical questions; and PolyGuard, a multilingual moderation tool for unsafe content. Two other papers to be released soon :)

[older news]


My research group:

Dan Chechelnitsky

LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn RosΓ©

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Ethical AI and Human-Centered Design

My research group explores the ethical implications of artificial intelligence through frameworks that prioritize human values and social responsibility. We delve into safeguards for AI interactions, as highlighted in [HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions](http://arxiv.org/abs/2409.16427), which provides a comprehensive approach to mitigating safety risks. Additionally, we investigate user perceptions of AI, as in [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195), which examines how users respond to AI guardrail interventions. Our work also examines moral judgment and AI trustworthiness, as shown in [Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications](https://arxiv.org/abs/2411.07042).

Narrative Understanding and Analysis

My research group explores how narratives shape human understanding and response, particularly within digital contexts. Recent studies, such as [Quantifying the narrative flow of imagined versus autobiographical stories](https://www.pnas.org/doi/10.1073/pnas.2211715119), characterize how narrative flow differs between imagined and autobiographical stories. We also investigate emotional and social dimensions of narrative engagement, as highlighted in [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633), revealing how empathy interacts with narrative style. Furthermore, our work on [Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication](https://arxiv.org/abs/2505.21451) highlights how personal backstories shape the detection and impact of violent communication in digital dialogues.

AI Agents and Social Intelligence

My research group explores the dynamics of social intelligence in AI agents, leveraging multi-agent systems to enhance interactions. An essential contribution to this field is [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046), which assesses how AI understands social contexts. We also examine the effectiveness of AI in simulating social scenarios, as seen in [SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents](https://arxiv.org/abs/2310.11667), which provides tools to evaluate AI interactions in varied social settings. Lastly, our work on [Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations](https://arxiv.org/abs/2507.20409) offers insights into how agents can reason about and adapt to complex human-like social interactions.

Evaluating and Enhancing Language Model Reliability

My research group explores the reliability of language models in real-world applications. One study, [Stay True to the Evidence: Measuring Belief Entrenchment in Reasoning LLMs via the Martingale Property](https://arxiv.org/abs/2506.00195), examines whether reasoning models update their beliefs consistently as evidence accumulates rather than becoming entrenched in earlier conclusions. We also evaluate how guardrail design affects user engagement and satisfaction, as reported in [Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences](https://arxiv.org/abs/2506.00195). Additionally, our work on [Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance](https://aclanthology.org/2025.naacl-long.556/) presents methods for assessing and calibrating the trust users place in AI-generated responses.