2019

A BERT Baseline for the Natural Questions

C. Alberti, K. Lee, Michael Collins

AI summary

This paper introduces a BERT-based baseline for the Natural Questions dataset. Relative to the F1 scores reported in the original dataset paper, it closes the gap to the human upper bound by 30% for the long answer task and by 50% for the short answer task.

Main Contributions

Introduces a BERT-based model for the Natural Questions dataset.
Jointly predicts short and long answers in a single model rather than using a pipeline approach.
Splits each document into multiple training instances by using overlapping windows of tokens.
Aggressively downsamples null instances at training time to create a balanced training set.
Uses the "[CLS]" token at training time to predict null instances, and ranks spans at inference time by the difference between the span score and the "[CLS]" score.
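
The windowing, null downsampling, and span ranking steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `window_size`, `stride`, `keep_prob`, and `max_span_len` parameters, and the `has_answer` field are all hypothetical, and defining the span score as start logit plus end logit is an assumption borrowed from common BERT question-answering practice (the summary only says spans are ranked by span score minus "[CLS]" score).

```python
import random
from typing import Dict, List, Optional, Tuple


def split_into_windows(
    tokens: List[str], window_size: int, stride: int
) -> List[Tuple[int, List[str]]]:
    """Cover a document with overlapping token windows.

    Returns (start_offset, window_tokens) pairs. Windows overlap
    whenever stride < window_size (both are hypothetical parameters).
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # the last window already reaches the end of the document
        start += stride
    return windows


def downsample_null(
    instances: List[Dict], keep_prob: float, rng: random.Random
) -> List[Dict]:
    """Keep every instance that contains an answer; keep each null
    instance only with probability keep_prob."""
    return [
        inst for inst in instances
        if inst["has_answer"] or rng.random() < keep_prob
    ]


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_span_len: int,
    cls_index: int = 0,
) -> Optional[Tuple[float, int, int]]:
    """Return (score, start, end) for the highest-ranked span.

    Assumption: span score = start logit + end logit, and the null
    score is the same sum taken at the [CLS] position. Spans are
    ranked by span score minus null score.
    """
    null_score = start_logits[cls_index] + end_logits[cls_index]
    best = None
    for i in range(len(start_logits)):
        if i == cls_index:
            continue
        for j in range(i, min(i + max_span_len, len(end_logits))):
            if j == cls_index:
                continue
            score = start_logits[i] + end_logits[j] - null_score
            if best is None or score > best[0]:
                best = (score, i, j)
    return best
```

Under these assumptions, a positive score means the model prefers the span over the null prediction, so a threshold on this difference can decide whether to return an answer at all.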

Abstract

This technical note describes a new baseline for the Natural Questions (Kwiatkowski et al., 2019). Our model is based on BERT (Devlin et al., 2018) and reduces the gap between the model F1 scores reported in the original dataset paper and the human upper bound by 30% and 50% relative for the long and short answer tasks respectively. This baseline has been submitted to the official NQ leaderboard. Code, preprocessed data and pretrained model are available.
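
Read literally, the "relative" gap reduction quoted in the abstract can be written as below. The symbols are introduced here for illustration and are not taken from the paper: F1_orig is the score reported in the original dataset paper, F1_BERT is the score of this baseline, and F1_human is the human upper bound.

```latex
\[
\text{relative gap reduction}
  = \frac{(F1_{\text{human}} - F1_{\text{orig}}) - (F1_{\text{human}} - F1_{\text{BERT}})}
         {F1_{\text{human}} - F1_{\text{orig}}}
  = \frac{F1_{\text{BERT}} - F1_{\text{orig}}}{F1_{\text{human}} - F1_{\text{orig}}}
\]
```

On this reading, a value of 0.30 for the long answer task and 0.50 for the short answer task corresponds to the 30% and 50% figures in the abstract.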

References [9]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I am amazed that they made something so simple, even though it requires a lot of background mathematical knowledge.

[2]BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Simply amazing. It's very impressive how they make a leap vs. existing stuff (you can see from the references, pretty much no one is doing what they are doing, other than GPT)

[3]SQuAD: 100,000+ Questions for Machine Comprehension of Text

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

Nice paper that introduced an important dataset. Not much else though.

[4]Natural Questions: A Benchmark for Question Answering Research

T. Kwiatkowski, J. Palomaki, O. Rhinehart, Michael Collins, A. P. Parikh, C. Alberti, D. Epstein, Illia Polosukhin, M. Kelcey, Jacob Devlin, K. Lee, K. N. Toutanova, Llion Jones, M. W. Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov - 2019

9 papers in library cite

The dataset and methodology are very nice; it's amazing to see how Google does the summaries in search. However, the paper is too complex with the math stuff, which seems unnecessary.

[5]Know What You Don't Know: Unanswerable Questions for SQuAD

P. Rajpurkar, R. Jia, Percy Liang - 2018

14 papers in library cite

It's alright... It's an extension to the other paper/dataset. I feel that it didn't need to be a full paper (maybe a 6-pager).

[6]A Decomposable Attention Model for Natural Language Inference

A. P. Parikh, O. Täckström, Dipanjan Das, Jakob Uszkoreit - 2016

11 papers in library cite

Very nice alternative to the common LSTM encoder-decoder architecture! Seems similar to the Transformer architecture in the sense that they don't use RNNs. Nice that they analyze computational complexity as well.

[7]CoQA: A Conversational Question Answering Challenge

Siva Reddy, Danqi Chen, Christopher D. Manning - 2018

6 papers in library cite

It's a fine paper and a solid addition to QA data + NLU.

[8]Simple and Effective Multi-Paragraph Reading Comprehension

C. Clark, Matt Gardner - 2017

7 papers in library cite

Very nice paper! I think it's a stretch to call it "simple", but the paper is very well written and easy to follow.

[9]Reading Wikipedia to Answer Open-Domain Questions

Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes - 2017

10 papers in library cite

Open-domain QA with Wikipedia.

Read on November 13, 2025

It's very simple and short, but it's nice that it set a baseline.