Maarten Sap (he/him)
Positions
| | |
| ------------------------------------------------------------ | -------------: |
| **Carnegie Mellon University**: Language Technologies Institute | |
| Assistant Professor | 2022 – present |
| **Allen Institute for AI** | |
| Visiting Research Scientist | 2022 – present |
| Postdoctoral Researcher / Young Investigator | 2021 – 2022 |
| Research Intern | 2018 – 2019 |
| **Microsoft Research** | |
| Research Intern | 2019 |
Education
| | |
| ------------------------------------------------------------ | ----------: |
| **University of Washington**: Paul G. Allen School of Computer Science & Engineering | 2015 – 2022 |
| PhD in Computer Science & Engineering, research focus on Natural Language Processing | |
| advised by Yejin Choi & Noah Smith | |
| Thesis: [Positive AI with Social Commonsense Models](pdfs/sap2021positiveAIwithSocialCommonsenseModels.pdf) | |
| **École Polytechnique Fédérale de Lausanne**: School of Computer and Communication Sciences | 2010 – 2014 |
| BS in Communications and Information Systems | |
Advising
### PhD students
| | | |
| ------------------------------------------------------------ | ------- | --------------------: |
| Ji Min Mun she/her | LTI PhD | 09/2022–present |
| [Akhila Yerukola](https://akhila-yerukola.github.io/) she/her | LTI PhD | 09/2022–present |
| [Xuhui Zhou](https://xuhuizhou.github.io/) he/him | LTI PhD | 09/2022–present |
### Research Interns & Research Masters
| | | |
| ------------------------------------------------------------ | ------------------- | --------------------: |
| [Athiya Deviyani](https://www.athiyadeviyani.com/) she/her | LTI MSAII | 09/2022–present |
| Jocelyn Chen she/her (*primarily advised by Cynthia Breazeal*) | MIT Media Lab | 09/2023–present |
| Yiming Zhang he/him (*co-advised with Sherry Tongshuang Wu*) | UChicago MS | 09/2022–present |
| [Julia Mendelsohn](https://juliamendelsohn.github.io/) she/her | AI2 Research Intern | 06/2022–01/2023 |
| [Sebastin Santy](http://sebastinsanty.com/) he/him | AI2 Research Intern | 06/2022–01/2023 |
### Undergraduates & Professional Masters
| | | |
| ------------------------------------------------------------ | ------------------- | --------------------: |
| Sravani Nanduri she/her (*co-advised with Liwei Jiang*) | UW CSE BS | 09/2021–10/2022 |
| [Skyler Hallinan](https://skylerhallinan.com/) he/him | UW CSE BS | 01/2021–08/2022 |
| Zhilin Wang he/him | UW CLMS | 01/2021–09/2021 |
| Michelle Ma she/her (*co-advised with Hannah Rashkin*) | UW CSE BS | 09/2019–12/2020 |
| Sam Gehman he/him | UW CSE MS | 09/2019–07/2020 |
| Aishwarya Nirmal she/her | UW CSE MS | 01/2018–06/2019 |
| Kenta Takatsu he/him | Cornell BS | 07/2018–03/2019 |
| [Zachary Horvitz](https://zacharyhorvitz.github.io/) he/him (*co-advised with Antoine Bosselut*) | AI2 Research Intern | 07/2018–03/2019 |
| Sarah Yu she/her | UW CSE BS | 03/2018–06/2018 |
| Lanhao Wu he/him (*co-advised with Saadia Gabriel*) | UW CSE BS | 03/2018–06/2018 |
| Boyan Li he/him (*co-advised with Saadia Gabriel*) | UW CSE BS | 01/2018–06/2018 |
| Amy Shah she/her (*co-advised with Elizabeth Clark*) | UW CSE BS | 09/2017–06/2018 |
| [Emily Allaway](https://emilyallaway.github.io/) she/her (*co-advised with Hannah Rashkin*) | UW CSE BS | 07/2017–06/2018 |
| Marcella Cindy Prasetio she/her (*co-advised with Hannah Rashkin*) | UW CSE BS | 01/2016–06/2017 |
Publications
Journal
- Maarten Sap, Anna Jafarpour, Yejin Choi, Noah A. Smith, James W. Pennebaker & Eric Horvitz (2022) Quantifying the narrative flow of imagined versus autobiographical stories. PNAS.
- Gregory Park, H Andrew Schwartz, Maarten Sap, Margaret L Kern, Evan Weingarten, Johannes C Eichstaedt, Jonah Berger, David J Stillwell, Michal Kosinski, Lyle H Ungar & Martin E P Seligman (2017) Living in the Past, Present, and Future: Measuring Temporal Orientation with Language. Journal of Personality.
- Margaret L Kern, Gregory Park, Johannes C Eichstaedt, H Andrew Schwartz, Maarten Sap, Laura K Smith & Lyle H Ungar (2016) Gaining Insights From Social Media Language: Methodologies and Challenges. Psychological Methods.
- Johannes C Eichstaedt, H Andrew Schwartz, Margaret L Kern, Gregory Park, Darwin R Labarthe, Raina M Merchant, Sneha Jha, Megha Agrawal, Lukasz A Dziurzynski, Maarten Sap, Christopher Weeg, Emily Larson, Lyle H Ungar & Martin E P Seligman (2015) Psychological Language on Twitter Predicts County-level Heart Disease Mortality. Psychological Science 26(2). SAGE Publications. 159–169.
- Charlene A Wong, Maarten Sap, Hansen Andrew Schwartz, Robert Town, Tom Baker, Lyle Ungar & Raina M Merchant (2015) Twitter Sentiment Predicts Affordable Care Act Marketplace Enrollment. Journal of Medical Internet Research 17(2). JMIR Publications Inc.
- Raina M. Merchant, Yoonhee P. Ha, Charlene A. Wong, H. Andrew Schwartz, Maarten Sap, Lyle H. Ungar & David A. Asch (2014) The 2013 US Government Shutdown (#Shutdown) and Health: An Emerging Role for Social Media. American Journal of Public Health 2014. e1–e3.
Conference
- Hyunwoo Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi & Maarten Sap (2022) ProsocialDialog: A Prosocial Backbone for Conversational Agents. EMNLP.
- Maarten Sap, Ronan LeBras, Daniel Fried & Yejin Choi (2022) Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs. EMNLP.
- Zhijing Jin, Sydney Levine, Fernando Gonzalez Adauto, Ojasv Kamal, Maarten Sap, Mrinmaya Sachan, Rada Mihalcea, Joshua B. Tenenbaum & Bernhard Schölkopf (2022) When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment. NeurIPS.
- Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi & Noah A. Smith (2022) Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection. NAACL.
- Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hanna Hajishirzi, Yejin Choi & Noah A. Smith (2022) Aligning to Social Norms and Values in Interactive Narratives. NAACL.
- Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray & Ece Kamar (2022) ToxiGen: Controlling Language Models to Generate Implied and Adversarial Toxicity. ACL.
- Saadia Gabriel, Skyler Hallinan, Maarten Sap, Pemi Nguyen, Franziska Roesner, Eunsol Choi & Yejin Choi (2022) Misinfo Reaction Frames: Reasoning about Readers' Reactions to News Headlines. ACL.
- Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell & Matt Gardner (2021) Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. EMNLP.
- Ashutosh Baheti, Maarten Sap, Alan Ritter & Mark Riedl (2021) Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts. EMNLP.
- Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith & Yejin Choi (2021) DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts. ACL.
- Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap & Dan Klein (2021) Detoxifying Language Models Risks Marginalizing Minority Voices. NAACL.
- Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Yejin Choi & Noah A. Smith (2021) Challenges in Automated Debiasing for Toxic Language Detection. EACL.
- Xinyao Ma*, Maarten Sap*, Hannah Rashkin & Yejin Choi (2020) PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction. EMNLP.
- Sam Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi & Noah A Smith (2020) RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. Findings of EMNLP.
- Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap & Yejin Choi (2020) Social Chemistry 101: Learning to Reason about Social and Moral Norms. EMNLP.
- Maarten Sap, Eric Horvitz, Yejin Choi, Noah A Smith & James W. Pennebaker (2020) Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models. ACL.
- Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A Smith & Yejin Choi (2020) Social Bias Frames: Reasoning about Social and Power Implications of Language. ACL.
- Maarten Sap*, Hannah Rashkin*, Derek Chen, Ronan LeBras & Yejin Choi (2019) Social IQa: Commonsense Reasoning about Social Interactions. EMNLP.
- Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi & Noah A Smith (2019) The Risk of Racial Bias in Hate Speech Detection. ACL.
- Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz & Yejin Choi (2019) COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. ACL.
- Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A Smith & Yejin Choi (2019) ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. AAAI.
- Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight & Yejin Choi (2018) Modeling Naive Psychology of Characters in Simple Commonsense Stories. ACL.
- Hannah Rashkin*, Maarten Sap*, Emily Allaway, Noah A. Smith & Yejin Choi (2018) Event2Mind: Commonsense Inference on Events, Intents, and Reactions. ACL.
- Maarten Sap, Marcella Cindy Prasetio, Ari Holtzman, Hannah Rashkin & Yejin Choi (2017) Connotation Frames of Power and Agency in Modern Films. EMNLP.
- Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi & Noah A Smith (2017) The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task. CoNLL.
- H. Andrew Schwartz, Gregory Park, Maarten Sap, Evan Weingarten, Johannes Eichstaedt, Margaret Kern, David Stillwell, Michal Kosinski, Jonah Berger, Martin Seligman & Lyle Ungar (2015) Extracting Human Temporal Orientation from Facebook Language. NAACL.
- Maarten Sap, Gregory Park, Johannes C. Eichstaedt, Margaret L. Kern, David J. Stillwell, Michal Kosinski, Lyle H. Ungar & Hansen Andrew Schwartz (2014) Developing Age and Gender Predictive Lexica over Social Media. EMNLP.
Workshop
- Zhilin Wang, Anna Jafarpour & Maarten Sap (2022) Uncovering Surprising Event Boundaries in Narratives. Workshop on Narrative Understanding.
- Tal August, Maarten Sap, Elizabeth Clark, Katharina Reinecke & Noah A. Smith (2020) Exploring the Effect of Author and Reader Identity in Online Story Writing: the StoriesInTheWild Corpus. Workshop on Narrative Understanding, Storylines, and Events (NUSE) @ ACL.
- Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi & Noah A Smith (2017) Story Cloze task: UW NLP System. EACL Workshop LSD Sem. 52–55.
- Daniel Preotiuc-Pietro, Maarten Sap, H Andrew Schwartz & Lyle Ungar (2015) Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task. NAACL Workshop on CLPsych.
- Daniel Preotiuc-Pietro, Johannes Eichstaedt, Gregory Park, Maarten Sap, Laura Smith, Victoria Tobolsky, H Andrew Schwartz & Lyle Ungar (2015) The Role of Personality, Age and Gender in Tweeting about Mental Illnesses. NAACL Workshop on CLPsych.
- H Andrew Schwartz, Johannes Eichstaedt, Margaret L Kern, Gregory Park, Maarten Sap, David Stillwell, Michal Kosinski & Lyle Ungar (2014) Towards Assessing Changes in Degree of Depression through Facebook. ACL Workshop on CLPsych. 118–125.
Demo
- Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ariel Holtzman, Yejin Choi, Noah A Smith & Mari Ostendorf (2018) Sounding Board: A User-Centric and Content-Driven Social Chatbot. NAACL System Demonstrations.
- H Andrew Schwartz, Salvatore Giorgi, Maarten Sap, Patrick Crutchley, Lyle Ungar & Johannes Eichstaedt (2017) DLATK: Differential Language Analysis ToolKit. EMNLP System Demonstrations. 55–60.
Other
- Maarten Sap (2021) Positive AI with Social Commonsense Models.
- Hao Fang, Hao Cheng, Elizabeth Clark, Ariel Holtzman, Maarten Sap, Mari Ostendorf, Yejin Choi & Noah A Smith (2017) Sounding Board - University of Washington’s Alexa Prize Submission. Alexa Prize Proceedings.
- H Andrew Schwartz, Maarten Sap, Margaret L Kern, Johannes C Eichstaedt, Adam Kapelner, Megha Agrawal, Eduardo Blanco, Lukasz Dziurzynski, Gregory Park, David Stillwell, Michal Kosinski, Martin E P Seligman & Lyle H Ungar (2016) Predicting individual well-being through the language of social media. Biocomputing 2016: Proceedings of the Pacific Symposium. 516–527.
Preprint
- Hyunwoo Kim, Jack Hessel, Liwei Jiang, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap & Yejin Choi (2022) SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. arXiv.
- Skyler Hallinan, Alisa Liu, Yejin Choi & Maarten Sap (2022) Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts. arXiv.
- Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Maxwell Forbes, Jon Borchardt, Jenny Liang, Oren Etzioni, Maarten Sap & Yejin Choi (2021) Delphi: Towards Machine Ethics and Norms. arXiv.
Thesis Committees
| | | | |
| ---- | --------------- | --------------------- | ---: |
| PhD | Prakhar Gupta | Jeff Bingham, CMU | 2023 |
| MS | Jocelyn Chen | Cynthia Breazeal, MIT | 2023 |
| PhD | Chan Young Park | Yulia Tsvetkov, UW | 2023 |
| PhD | Paul Röttger | Scott Hale, University of Oxford | 2023 |
Teaching
### Courses
||||
|-|-|-:|
| [11-830 Computational Ethics](http://maartensap.com/11-830-Spring2023/) | | Spring 2023 |
### Guest lectures & Tutorials
| | | |
| ------------------------------------------------------ | -------------------- | ----------: |
| Bias in Natural Language Processing | 05-899 Guest Lecture | Spring 2023 |
| Bias in Natural Language Processing | 15-884 Guest Lecture | Fall 2022 |
| "Crowdsourcing Beyond Annotation" | Tutorial | EMNLP 2021 |
| "Commonsense Reasoning in Natural Language Processing" | Tutorial | ACL 2020 |
Service
### Workshops
| | | |
| ------------------------------------------------------------ | ------------------ | ---------: |
| [Multimodal Content Moderation Workshop](https://multimodal-content-moderation.github.io/) | co-organizer | CVPR 2023 |
| [NLP for Positive Impact Workshop](https://sites.google.com/view/nlp4positiveimpact) | steering committee | EMNLP 2022 |
| [NLP for Positive Impact Workshop](https://sites.google.com/view/nlp4positiveimpact/previous-workshops/acl-2021-workshop) | co-organizer | ACL 2021 |
### Committees
| | | |
| ------------------------------------------------ | ------- | --------------: |
| Diversity, Equity, and Inclusion committee | CMU LTI | 2022–present |
| PhD & MLT admissions committee | CMU LTI | 2022–present |
| Socio-cultural diversity and inclusion committee | ACL | 2020 |
| Diversity committee | UW CSE | 2016–2020 |
| Graduate student advisory council (G5PAC) | UW CSE | 01/2018–12/2020 |
#### Senior program committees
| | |
| ------------------ | -----------: |
| ACL rolling review | 2020–present |
| AAAI | 2021 |
#### Reviewing
| | |
| ------------------------------------------------------------ | -----------: |
| *Journals & conferences* | |
| ACL rolling review | 2020–present |
| ACL | 2019–present |
| Transactions of ACL | 2020, 2022 |
| EMNLP | 2018–2022 |
| AAAI | 2020 |
| ICWSM | 2021 |
| Dementia and Geriatric Cognitive Disorders Journal | 2020 |
| Computational Linguistics | 2019, 2020 |
| Humanities and Social Sciences Communications | 2019 |
| Journal of Artificial Intelligence Research | 2019 |
| IEEE Transactions on Cognitive and Developmental Systems | 2019 |
| Social Psychological and Personality Science | 2018 |
| *Workshops* | |
| Workshop on NLP for Positive Impact | 2022 |
| Workshop on NLP for Causal Inference | 2021 |
| NAACL Student Research Workshop | 2019 |
| CLPsych workshop | 2016–2018 |
| Stylistic Variation workshop | 2018 |
### Other service & outreach
| | |
| ------------------------------------------------------------ | --------------------: |
| Presentation to U.S. congressional appropriations committee about risks and implications of AI and LLMs | 03/2023 |
| [Red-teaming GPT-4](https://cdn.openai.com/papers/gpt-4-system-card.pdf) for OpenAI | 09/2022–12/2022 |
Talks
| | |
| ------------------------------------------------------------ | ------: |
| Toward Prosocial NLP: Reasoning About And Responding to Toxicity in Language | |
| MIT Media Lab Breazeal Group Meeting | 11/2022 |
| CMU S3D Computational Social Science Seminar | 11/2022 |
| Amazon Alexa Trust & Privacy | 11/2022 |
| University of Minnesota NLP seminar | 10/2022 |
| Detecting and Rewriting Social Biases in Language | |
| Pinterest NLP seminar | 09/2022 |
| UIUC Responsible Data Science Seminar Series | 02/2022 |
| MilaNLP seminar at Università Bocconi | 10/2021 |
| PAN workshop at CLEF | 09/2021 |
| Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection | |
| NAACL main conference | 07/2022 |
| Text As Data (TADA) | 10/2021 |
| Positive AI with Social Commonsense Models | |
| AKBC Workshop on Commonsense Reasoning | 10/2021 |
| University of Toronto Computer Science | 04/2021 |
| MIT EECS | 03/2021 |
| CMU LTI/MLD | 03/2021 |
| UChicago CS | 03/2021 |
| TTIC | 02/2021 |
| Emory CS | 02/2021 |
| Vanderbilt CS | 02/2021 |
| EPFL I&C | 01/2021 |
| Yale Data Science & Statistics seminar | 01/2021 |
| PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction | |
| EMNLP conference | 11/2020 |
| Social Bias Frames: Reasoning About Social and Power Dynamics | |
| WeCNLP Summit | 10/2020 |
| ACL Conference | 07/2020 |
| Reasoning about Social Dynamics and Social Bias in Language | |
| SRI seminar | 01/2021 |
| Georgia Tech NLP seminar | 10/2020 |
| Berkeley NLP seminar | 02/2020 |
| Stanford NLP seminar | 02/2020 |
| Social and Ethical Considerations in English Toxic Language Detection | |
| NLP with Friends | 08/2020 |
| Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models | |
| ACL Conference | 07/2020 |
| COMET: Commonsense Transformers for Automatic Knowledge Graph Construction | |
| DARPA Communicating with Computers grant meeting | 11/2019 |
| Social IQa: Commonsense Reasoning about Social Interactions | |
| EMNLP conference | 11/2019 |
| The Risk of Racial Bias in Hate Speech Detection | |
| ACL Conference | 07/2019 |
| ICML Queer in AI workshop | 06/2019 |
| ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning | |
| AAAI conference | 01/2019 |
| AI2 seminar | 01/2019 |
| Event2Mind: Commonsense Inference on Events, Intents, and Reactions | |
| DARPA Communicating with Computers grant meeting | 07/2018 |
| Detecting Implicit Bias in Text through Connotative Language | |
| UW Social Psychology seminar | 04/2018 |