Justin Cho


PhD Candidate in Computer Science

hd [dot] justincho [at] gmail [dot] com / google scholar / resume / blog

👋 Hi there!

I am a PhD candidate at the University of Southern California's (USC) Information Sciences Institute (ISI), advised by Jonathan May.

I'm interested in (i) improving and evaluating language models for human interactions in terms of dialogue, commonsense reasoning, and creativity, (ii) developing language agents for accomplishing or facilitating complex human tasks, and (iii) doing so safely through scalable oversight and weak-to-strong generalization methods. I've always been mesmerized by superintelligent systems in science fiction, such as TARS in Interstellar, and I'm excited to make them a reality.
Previously, I did my undergraduate studies at Hong Kong University of Science and Technology with a major in Computer Science, where I first worked on NLP with Pascale Fung. I have also interned at Amazon Alexa, working with Nicolaas Jedema, Alessandro Moschitti, and Pedro Szekely (2022), at Meta AI on Alborz Geramifard's Cognitive AI team with Chinnadhurai Sankar (2022) and with Ahmad Beirami (2021), at Stitch Fix (2020), and at ISI's Natural Language Group with Jonathan May (2019).

📆 Office hours

I host virtual office hours! I've had the pleasure of making many new connections through these office hours. I'm open to discussing anything I can help with, from research to career advice. Please schedule through Calendly. Email me if the time slots don't work for you.

🔍 Research

Speechworthy Instruction-tuned Language Models
Hyundong J. Cho, Nicolaas Jedema, Leonardo F. R. Ribeiro, Karishma Sharma, Pedro Szekely, Alessandro Moschitti, Ruben Janssen, Jonathan May. Under review.
Current LLMs are fine-tuned on data collected exclusively through a text interface, which does not capture human preferences for speech, so they generate text that is not suitable for text-to-speech systems. We collect 20K human preference samples in which annotators listen to the paired responses instead of reading them through a text interface. We use this data for reinforcement learning from human feedback to adapt an instruction-tuned language model to generate speech-suitable text.
Hyundong J. Cho, Thamme Gowda, Yuyang Huang, Zixun Lu, Tianli Tong, Jonathan May. Under review. [project page]
We develop BotEval, an easily customizable, open-source evaluation toolkit that focuses on enabling human-bot interactions as part of the evaluation process, as opposed to having human evaluators judge a static input.
Hyundong J. Cho, Shuai Liu, Taiwei Shi, Darpan Jain, Basem Rizk, Yuyang Huang, Zixun Lu, Nuan Wen, Jonathan Gratch, Emilio Ferrara, Jonathan May. NAACL 2024
Deleting comments and banning users are iron-fisted moderation tactics that can lead to a chilling effect on free speech. Instead, conversational moderation aims to guide users to more constructive behavior. We investigate whether language models can be effective conversational moderators and thus be effective tools for improving the health of online discourse.
Hyundong J. Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Raghavi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar. EMNLP 2023 [paper]
Estimating a user's goal in a dialogue can be done by asking natural language questions, and answering questions is a transferable skill that can be easily learned from examples. With this insight, we restructure dialogue state tracking (DST) to eliminate service-specific structured text and unify data from all services by decomposing each DST sample into a bundle of fine-grained example-guided question answering tasks. With a retriever trained to find examples that introduce similar updates to dialogue states, we find that our method can significantly boost continual learning performance, even for a model with just 60M parameters.
Jihyung Moon, Dong-Ho Lee, Hyundong J. Cho, Woojeong Jin, Chan Young Park, Minwoo Kim, Jonathan May, Jay Pujara, Sungjoon Park. EMNLP 2023 [paper]
Toxic behavior in live-stream chat is a growing concern as live-streaming platforms such as Twitch and YouTube Live become increasingly popular. Previous detection methods are not effective for live-stream chat because each comment is only visible for a limited time and lacks a thread structure. To bridge this gap, we define norm violation categories in live-stream chats, annotate 4,583 moderated comments from Twitch, and train live-stream chat-specific detection models.
Shuai Liu, Hyundong J. Cho, Marjorie Freedman, Xuezhe Ma, Jonathan May. ACL 2023 [paper]
Endowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge. In this work, we propose a new retrieval-enhanced approach for personalized response generation.
Hyundong J. Cho, Chinnadhurai Sankar, Christopher Lin, Kaushik Ram Sadagopan, Shahin Shayandeh, Asli Celikyilmaz, Jonathan May, Ahmad Beirami. EMNLP 2022 Findings [paper] [project page]
Humans robustly understand dialogue states in the presence of noise and ambiguity, but dialogue state tracking (DST) models do not. Analysis of DST robustness has been sparse and uncoordinated in previous work. Our standardized and comprehensive DST diagnostic toolkit, CheckDST, is a collection of robustness tests and failure mode analytics. With CheckDST, we discover that different classes of DST models have clear strengths and weaknesses: generation models are more promising for handling language variety, while classification models are more robust to unseen entities.
Pei Zhou, Hyundong J. Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, Jay Pujara, Xiang Ren. EMNLP 2022 [paper] [project page]
We introduce Reflect, a dataset that annotates dialogues with explicit common ground (CG) and solicits 9k diverse human-generated responses, each following one common ground. Using Reflect, we showcase the limitations of current dialogue data and response generation (RG) models: less than half of the responses in current data are rated as high quality, and models trained on this data produce even lower quality, while most Reflect responses are judged high quality. We also analyze whether CG can help models produce better-quality responses by using Reflect CG to guide RG models.
Pei Zhou, Pegah Jandaghi, Hyundong J. Cho, Bill Yuchen Lin, Jay Pujara, Xiang Ren. EMNLP 2021 Findings [paper]
We collect 6k annotated explanations justifying responses from four dialogue datasets, ask humans to verify them, and propose two probing settings to evaluate response generation models' commonsense reasoning capabilities. Probing results show that response generation models fail to capture the logical relations between commonsense explanations and responses, and that fine-tuning on in-domain data and increasing model sizes do not lead to an understanding of commonsense reasoning.
Open-domain dialogue systems overlook an important phenomenon that makes conversations engaging: the initiation of the next relevant contribution, which is the most proactive method of "grounding". We collect "Yes, and" type dialogue pairs that naturally embed such initiations from an improv podcast and existing dialogue corpora to create the Spontaneanation Pairs Of Learnable ImprovisatioN (SPOLIN) dataset. Human evaluation shows that models fine-tuned with SPOLIN generate more engaging results.

🔍 Others

Hyundong J. Cho, Basel Shbita, Kartik Shenoy, Shuai Liu, Nikhil Patel, Hitesh Pindikanti, Jennifer Lee, Jonathan May. Alexa Prize Socialbot Grand Challenge 4 Proceedings, 2021
We present Viola, an open-domain dialogue system based on a simple generate-and-rank approach. Viola fetches a batch of response candidates from various neural dialogue models and template-based generators and chooses the final response with a poly-encoder ranker fine-tuned with annotated Alexa conversation data.

📢 News

➕ Miscellaneous

  • My pronouns are he, him, his.
  • I have been very fortunate to have lived in many different places around the world: Moscow, Russia; Oslo, Norway; Vienna, Austria; Abu Dhabi, UAE; Hong Kong; Tehran, Iran; Seoul, South Korea; Los Angeles, California. I look forward to living in new places and experiencing different cultures.
  • I'm working on a language learning app called AutoLang. I was frustrated that I couldn't say what I wanted to say to my Chinese peers even after mastering all the Chinese lessons on Duolingo. I realized I needed a more personalized, bottom-up approach and decided to make one myself with a couple of friends. With AutoLang, you start learning things relevant to you from day 1. You build confidence to converse by talking, discover things you didn't know how to say or understand, and automatically build your own curriculum with flashcards you create in the process!
  • I'm a bit of a productivity geek. Ask me about knowledge management and productivity tips! I use Logseq for knowledge management and use tips from How to Take Smart Notes, which recommends the Zettelkasten method.
  • I love playing football ⚽ and I am a huge fan of FC Barcelona. Visca el Barça!
  • I am a citizen of South Korea and the US. I am fluent in both English and Korean. I am teaching myself Mandarin with Duolingo and HSK level tests. 我希望能在不远的将来用流利的中文和你交流。 (I hope to be able to talk with you in fluent Chinese in the near future.)