Papperoni

2015

A Neural Conversational Model

Oriol Vinyals, Quoc V. Le

Cite Score: 59

AI summary

This paper introduces a neural conversational model using the sequence-to-sequence framework, demonstrating its ability to generate simple conversations, extract knowledge from domain-specific and open-domain datasets, and perform basic common sense reasoning, though it suffers from consistency issues.

Main Contributions

Proposes a simple neural conversational model based on the sequence-to-sequence framework.
Demonstrates the model's ability to generate simple conversations from large conversational datasets.
Shows the model can extract knowledge from both domain-specific and open-domain datasets.
The model can perform simple forms of common sense reasoning.
Highlights the lack of consistency as a common failure mode of the model.

Abstract

Conversational modeling is an important task in natural language understanding and machine intelligence. Although previous approaches exist, they are often restricted to specific domains (e.g., booking an airline ticket) and require hand-crafted rules. In this paper, we present a simple approach for this task which uses the recently proposed sequence to sequence framework. Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation. The strength of our model is that it can be trained end-to-end and thus requires far fewer hand-crafted rules. We find that this straightforward model can generate simple conversations given a large conversational training dataset. Our preliminary results suggest that, despite optimizing the wrong objective function, the model is able to converse well. It is able to extract knowledge from both a domain-specific dataset and a large, noisy, general-domain dataset of movie subtitles. On a domain-specific IT helpdesk dataset, the model can find a solution to a technical problem via conversations. On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning. As expected, we also find that the lack of consistency is a common failure mode of our model.
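The abstract's core idea, predicting the next sentence from the previous sentence or sentences, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trained sequence-to-sequence network is replaced by a hypothetical lookup-table stand-in (`make_toy_model`), and the names `make_training_pairs`, `greedy_decode`, `max_context_turns`, and the `<eos>` marker are assumptions introduced here for illustration. The sketch only shows how a conversation might be turned into (context, reply) pairs and how a reply can be generated one token at a time by feeding each prediction back in.

```python
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker

def make_training_pairs(turns: List[str], max_context_turns: int = 1) -> List[Tuple[str, str]]:
    """Pair every turn with the turn(s) that precede it (context -> reply)."""
    pairs = []
    for i in range(1, len(turns)):
        start = max(0, i - max_context_turns)
        context = " ".join(turns[start:i])  # previous sentence or sentences
        pairs.append((context, turns[i]))
    return pairs

def greedy_decode(next_token: Callable[[str, List[str]], str],
                  context: str, max_len: int = 20) -> List[str]:
    """Build a reply token by token until EOS or max_len is reached."""
    reply: List[str] = []
    for _ in range(max_len):
        token = next_token(context, reply)
        if token == EOS:
            break
        reply.append(token)
    return reply

def make_toy_model(table: Dict[str, List[str]]) -> Callable[[str, List[str]], str]:
    """Stand-in for a trained network: replays memorised replies (assumption)."""
    def next_token(context: str, reply_so_far: List[str]) -> str:
        target = table.get(context, [])
        if len(reply_so_far) < len(target):
            return target[len(reply_so_far)]
        return EOS
    return next_token

turns = ["hello", "hi there", "how are you", "fine thanks"]
pairs = make_training_pairs(turns)
model = make_toy_model({context: reply.split() for context, reply in pairs})
print(greedy_decode(model, "hello"))  # ['hi', 'there']
```

Raising `max_context_turns` makes the context span several earlier turns, which corresponds to the "previous sentence or sentences" wording in the abstract. A real model would replace `next_token` with a learned conditional distribution over the vocabulary.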

References [18]

[1]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[2]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[3]Computing Machinery and Intelligence

A. M. Turing - 1950

8 papers in library cite

A must-read, but it gets a bit boring halfway through (as he is describing every counter argument).

[4]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because they set a new good baseline for NNs in MT. Their main contribution was reversing the source sentence TBH.

[5]A Neural Probabilistic Language Model

Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001

62 papers in library cite

What started it all. Very simple and elegant.

[6]Show and Tell: A Neural Image Caption Generator

Dumitru Erhan - 2015

11 papers in library cite

It's nice and they beat a ton of SotA. However, I read the one that uses attention first so this is a bit less surprising.

[7]Recurrent Neural Network Based Language Model

Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010

36 papers in library cite

The comeback of RNNs for language modeling. Not too exciting but impactful and a short read.

[8]Recurrent Continuous Translation Models

N. Kalchbrenner, Phil Blunsom - 2013

27 papers in library cite

Good paper, probably the first that used an encoder-decoder. But they used a conv. NN instead of a traditional decoder, which I don't really like.

[9]On Using Very Large Target Vocabulary for Neural Machine Translation

Yoshua Bengio - 2014

12 papers in library cite

It's nice, but it starts getting a bit into the realm of "yeah, that seems like a minor improvement". It's nice that they use the importance sampling stuff from the previous paper though - I thought it had completely vanished :)

[10]Grammar as a Foreign Language

Geoffrey Hinton - 2015

9 papers in library cite

It's a nice paper showing that attention can be used for parsing. However, parsing is boring and is very derivative. Good paper nonetheless.

[11]Addressing the Rare Word Problem in Neural Machine Translation

T. Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba - 2014

14 papers in library cite

The method was very poorly explained. It was also worse than a paper released earlier, and more complicated. Overall not that good.

[12]Neural Responding Machine for Short-Text Conversation

L. Shang, Z. L. Lu, H. Li - 2015

2 papers in library cite

Conversation bots

[13]A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

A. Sordoni, M. Galley, Michael Auli, Chris Brockett, Yangfeng Ji, M. Mitchell, J. Y. Nie, Jianfeng Gao, B. Dolan - 2015

4 papers in library cite

Generating conversational responses

[14]Statistical Language Models Based on Neural Networks

Tomas Mikolov - 2012

17 papers in library cite

Mikolov's Thesis

[15]Conversational Agents

J. Lester, K. Branting, B. Mott - 2004

1 paper in library cites

[16]Creating a Dynamic Speech Dialogue

T. Will - 2007

1 paper in library cites

[17]News From OPUS - A Collection of Multi-Lingual Parallel corpora With Tools and Interfaces

J. Tiedemann - 2009

1 paper in library cites

[18]Speech and Language Processing

Dan Jurafsky, J. Martin - 2009

1 paper in library cites

Cited by 7 papers in your library

Cites 13 papers in your library

Read on October 30, 2025

No new methodology, no measurements, no results. It would be about 4 pages long if they hadn't padded it with the conversation logs.
