2019

A BERT Baseline for the Natural Questions

C. Alberti, K. Lee, Michael Collins

Cite Score

8

AI summary

This paper introduces a BERT-based model for the Natural Questions dataset. Measured against the model F1 scores reported in the original dataset paper, it closes the gap to the human upper bound by 30% relative on the long answer task and 50% relative on the short answer task.

Main Contributions

  • Introduces a BERT-based model for the Natural Questions dataset.
  • Jointly predicts short and long answers in a single model rather than using a pipeline approach.
  • Splits each document into multiple training instances by using overlapping windows of tokens.
  • Aggressively downsamples null instances at training time to create a balanced training set.
  • Uses the "[CLS]" token at training time to predict null instances, and ranks spans at inference time by the difference between the span score and the "[CLS]" score.
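
The windowing, downsampling and span-ranking steps listed above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' released code: the function names, the `has_answer` field, and the parameters `window_size`, `stride`, `keep_prob` and `max_span_len` are all assumptions made for the example. The span score is assumed to be the sum of a start logit and an end logit, with the "[CLS]" token sitting at index 0 of each window.

```python
import random


def split_into_windows(tokens, window_size, stride):
    """Split a token list into overlapping windows.

    Returns (start_offset, window_tokens) pairs. Consecutive windows
    overlap by window_size - stride tokens.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows


def downsample_null_instances(instances, keep_prob, rng):
    """Keep every instance that contains an answer.

    A null instance (hypothetical field has_answer == False) is kept
    only with probability keep_prob.
    """
    return [
        inst for inst in instances
        if inst["has_answer"] or rng.random() < keep_prob
    ]


def best_span(start_logits, end_logits, max_span_len):
    """Return (score, start, end) for the highest-ranked span.

    Assumption: score = start logit + end logit, minus the same sum
    taken at the "[CLS]" position (index 0).
    """
    cls_score = start_logits[0] + end_logits[0]
    best = None
    for s in range(1, len(start_logits)):
        last = min(s + max_span_len, len(end_logits))
        for e in range(s, last):
            score = start_logits[s] + end_logits[e] - cls_score
            if best is None or score > best[0]:
                best = (score, s, e)
    return best
```

With a stride smaller than the window size, adjacent windows share tokens, so an answer cut at one window boundary can still appear whole in a neighboring window. Under the scoring assumption above, a negative best score means the "[CLS]" position outscored every candidate span in that window.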

Abstract

This technical note describes a new baseline for the Natural Questions (Kwiatkowski et al., 2019). Our model is based on BERT (Devlin et al., 2018) and reduces the gap between the model F1 scores reported in the original dataset paper and the human upper bound by 30% and 50% relative for the long and short answer tasks respectively. This baseline has been submitted to the official NQ leaderboard. Code, preprocessed data and pretrained model are available.
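
To make the "relative" gap reduction concrete, the arithmetic can be written as a small helper. This is a sketch of one natural reading of the claim; the function name and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def relative_gap_reduction(baseline_f1, new_f1, human_f1):
    """Fraction of the baseline-to-human F1 gap closed by the new model."""
    gap_before = human_f1 - baseline_f1
    gap_after = human_f1 - new_f1
    if gap_before <= 0:
        raise ValueError("baseline must score below the human upper bound")
    return (gap_before - gap_after) / gap_before
```

For made-up scores of 50 F1 for the earlier model, 59 F1 for the new model and 80 F1 for the human upper bound, the gap shrinks from 30 points to 21 points, which is a 30% relative reduction.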

References [9]

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

T. Kwiatkowski, J. Palomaki, O. Redfield, Michael Collins, A. P. Parikh, C. Alberti, D. Epstein, Illia Polosukhin, M. Kelcey, Jacob Devlin, K. Lee, K. N. Toutanova, Llion Jones, M. W. Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov - 2019

9 papers in library cite

P. Rajpurkar, R. Jia, Percy Liang - 2018

14 papers in library cite

A. P. Parikh, O. Tackstrom, Dipanjan Das, Jakob Uszkoreit - 2016

11 papers in library cite

Siva Reddy, Danqi Chen, Christopher D. Manning - 2018

6 papers in library cite

C. Clark, Matt Gardner - 2017

7 papers in library cite

Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes - 2017

10 papers in library cite
