Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes.

I received my PhD from the University of Washington, where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

October 2025 🇨🇦🎉: Excited to be attending COLM 2025 in Montreal this October! I'll be giving a talk at the Social Sim Workshop on Unlocking Social Intelligence in AI agents. I'm also thrilled that five papers I co-authored will be presented by my amazing collaborators at COLM: HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (led by Xuhui Zhou et al.), ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning (co-led by Jimin Mun et al.), PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages, Fluid Language Model Benchmarking, and The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains.

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 🧑‍🎓: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

August 2025 👨🏼‍🏫: Excited to give a (virtual) talk about Responsible AI for Diverse Users and Cultures at the Gender Bias in NLP workshop at ACL 2025!

July 2025 🧠🛡️: Five papers were accepted to COLM 2025! Highlights include HAICOSYSTEM, a framework for sandboxing safety risks in human-AI interaction; ALFA, which aligns LLMs to ask better clinical questions; and PolyGuard, a multilingual moderation tool for unsafe content. Two other papers to be released soon :)

May 2025 🧑‍💻🏆: Super super excited to announce that our paper Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance received the Best Paper Runner-Up award at NAACL 2025. Huge congratulations to Kaitlyn!

April 2025 🏜️🚂: Though I will not be attending NAACL 2025, my students and collaborators will be presenting some exciting papers: Joel Mire on Rejected Dialects: Biases Against African American Language in Reward Models; Akhila Yerukola on NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models; Kaitlyn Zhou on Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance; and Xuhui Zhou on AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents.

[older news]


My research group:

Dan Chechelnitsky

LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI PhD student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Kynnedy Smith

HCII PhD student
co-advised with Motahhare Eslami

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn RosΓ©

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Ethics in Human-AI Interaction

My research group explores the ethical considerations that arise in human-AI interactions. We examine how persona effects shape moral decisions and persuasion dynamics, as highlighted in the paper [Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics](https://arxiv.org/abs/2506.12657). We also investigate safety risks in human-AI interactions through the framework presented in [HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions](http://arxiv.org/abs/2409.16427), which offers new methods for evaluating these risks. Furthermore, we study how users perceive and respond to biased or discriminatory statements from AI companions, as explored in [User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions](https://arxiv.org/abs/2409.00862).

Understanding Narrative Dynamics

My research group explores the complexities of narratives and their emotional impacts on individuals. We investigate how empathy and narrative styles can be tracked through personal stories, as demonstrated in the paper [HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs](https://arxiv.org/abs/2405.17633). Our research also extends to quantifying narrative flow differences, particularly between imagined and autobiographical stories, highlighted in the paper [Quantifying the narrative flow of imagined versus autobiographical stories](https://www.pnas.org/doi/10.1073/pnas.2211715119). We also address violent communication through personalized modeling, as illustrated in [Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication](https://arxiv.org/abs/2505.21451).

Advances in Social Intelligence

My research group explores the simulation of social intelligence and interactions through advanced AI models. We are focused on refining the evaluation of social understanding in AI, a topic prominently addressed in the paper [Is This the Real Life? Is This Just Fantasy? The Misleading Success of Simulating Social Interactions With LLMs](http://arxiv.org/abs/2403.05020). Our efforts also include assessing multi-perspective theory of mind in embodied social interactions, as explored in [SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions](https://arxiv.org/abs/2506.23046). Additionally, we build interactive evaluation frameworks for socially intelligent language agents, such as [SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents](https://arxiv.org/abs/2310.11667).

Enhancing AI Understanding Through Personality

My research group explores the role of personality traits in shaping AI interactions and performance. Key studies include [BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data](http://arxiv.org/abs/2410.16491), which shapes LLM personalities by training on human-grounded personality data. Another notable paper, [Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models](https://arxiv.org/abs/2305.14763), stress-tests whether large language models' apparent social reasoning reflects genuine understanding or shallow heuristics. We also examine the trade-off between utility and truthfulness in LLM agents, as reflected in [AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents](https://aclanthology.org/2025.naacl-long.595/).