Justin Cho

조현동

PhD Student in Computer Science

jcho [at] isi [dot] edu / google scholar / resume / blog

I am a PhD student at the University of Southern California's (USC) Information Sciences Institute (ISI), advised by Prof. Jonathan May.

My research interest is in natural language processing (NLP), specifically in natural language generation and dialogue systems. I am interested in making both open-domain and task-oriented dialogue systems more robust and versatile by 1) reducing the distribution shift from training time to deployment, 2) leveraging non-dialogue data to learn better representations about the world, and 3) learning continually and adapting to changes in the world.

Ultimately, I want to establish chatbots as the main interface that we use to interact with machines. Conversation is the most convenient means of communication for most people, so I believe achieving this goal will make even complex intelligent systems accessible to a wider group of people, regardless of their technical expertise. Think of Samantha in Her and TARS in Interstellar!

Prior to my PhD program, I worked at ISI as a programmer analyst and graduated from HKUST with a Bachelor's in Computer Science. I have interned at Meta AI, Stitch Fix, ISI's Natural Language Group, and Imago.ai.

Office hours

I am hosting virtual office hours for those who want my advice/thoughts on their topic of interest. My primary intent for these office hours is knowledge transfer, helping you with productivity tips and getting started with research or with CS. Please schedule through Calendly. I will do my best to accommodate different time zones.

Research

Hyundong J. Cho, Chinnadhurai Sankar, Christopher Lin, Kaushik Ram Sadagopan, Shahin Shayandeh, Asli Celikyilmaz, Jonathan May, Ahmad Beirami. EMNLP2022 Findings [paper]
We present our findings from standardized and comprehensive diagnoses of dialogue state tracking (DST) models, which have previously been sparse and uncoordinated, using our toolkit, CheckDST, a collection of robustness tests and failure mode analytics. We find that different classes of DST models have clear strengths and weaknesses: generation models are more promising for handling language variety, while classification models are more robust to unseen entities.
Pei Zhou, Hyundong J. Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, Jay Pujara, Xiang Ren. EMNLP2022 [paper] [project page]
We introduce Reflect, a dataset that annotates dialogues with explicit common ground (CG) and solicits 9k diverse human-generated responses, each following one common ground. Using Reflect, we showcase the limitations of current dialogue data and response generation (RG) models: less than half of the responses in current data are rated as high quality, and models trained on this data produce responses of even lower quality, while most Reflect responses are judged high quality. We also analyze whether CG can help models produce better-quality responses by using Reflect CG to guide RG models.
Pei Zhou, Pegah Jandaghi, Hyundong J. Cho, Bill Yuchen Lin, Jay Pujara, Xiang Ren. EMNLP2021 Findings [paper]
We collect 6k annotated explanations justifying responses from four dialogue datasets, ask humans to verify them, and propose two probing settings to evaluate response generation models' commonsense reasoning capabilities. Probing results show that response generation models fail to capture the logical relations between commonsense explanations and responses, and that neither fine-tuning on in-domain data nor increasing model size leads to an understanding of commonsense reasoning.
Hyundong J. Cho, Genevieve Bartlett, Marjorie Freedman. ACL2021 DialDoc Workshop. [paper]
We propose Puppeteer as a promising framework to combat social-engineering attacks by automatically responding to emails. It is a hybrid system that uses customizable probabilistic finite state transducers to orchestrate pushing agendas, coupled with neural dialogue systems that generate responses to unexpected prompts. We emphasize the need for this system by highlighting each component's strengths and weaknesses and show how they complement each other.
Open-domain dialogue systems overlook an important phenomenon that makes conversations engaging: grounding. We collect "Yes, and" type dialogue pairs from an improv podcast and existing dialogue corpora to create the Spontaneanation Pairs Of Learnable ImprovisatioN (SPOLIN) dataset. Human evaluation shows that models fine-tuned with SPOLIN generate more engaging results.

Preprints / Others

Hyundong J. Cho, Basel Shbita, Kartik Shenoy, Shuai Liu, Nikhil Patel, Hitesh Pindikanti, Jennifer Lee, Jonathan May. Alexa Prize Socialbot Grand Challenge 4 Proceedings, 2021
We present Viola, an open-domain dialogue system based on a simple generate-and-rank approach. Viola fetches a batch of response candidates from various neural dialogue models and template-based generators and chooses the final response with a poly-encoder ranker fine-tuned with annotated Alexa conversation data.

News

Misc.

  • My pronouns are he, him, his.
  • I have been very fortunate to have lived in many different places: Moscow, Russia; Oslo, Norway; Vienna, Austria; Abu Dhabi, UAE; Hong Kong; Tehran, Iran; Seoul, South Korea; Los Angeles, California. I look forward to living in new places and experiencing different cultures.
  • I love playing football⚽ and I am a huge fan of FC Barcelona. Visca el Barça!
  • I am a citizen of South Korea and the US. I am fluent in both English and Korean. I am teaching myself Mandarin with Duolingo and HSK level tests. 我希望能在不远的将来用流利的中文和你交流。 (I hope to be able to talk with you in fluent Chinese in the near future.)