2011

Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR

M. Shaik, A. Mousa, R. Schluter, Hermann Ney

AI summary

This paper introduces a hybrid language model for German LVCSR using mixed sub-lexical units, including morphemes, syllables, and graphones, achieving a 5.0% relative reduction in Word Error Rate and recognizing 40% of Out-of-Vocabulary words, demonstrating improved lexical coverage and recognition performance.

Main Contributions

  • Introduces a novel hybrid language model for German LVCSR using a mixture of full-words, morphemes, and graphones.
  • Demonstrates improved lexical coverage by combining different types of sub-lexical units.
  • Achieves a 5.0% relative reduction in Word Error Rate compared to a traditional full-words system.
  • Recognizes around 40% of Out-of-Vocabulary (OOV) words.
  • Shows that morpheme-based units outperform syllable-based units for German LVCSR.
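The two headline figures above are ratios, so it may help to spell out how such numbers are typically computed. The following is a minimal sketch, not the authors' evaluation code: the function names and the example values (a 30.0% baseline WER, 200 OOV tokens) are hypothetical and chosen only to reproduce the "5.0% relative" and "40%" magnitudes.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def oov_recognition_rate(total_oovs: int, recognized_oovs: int) -> float:
    """Percentage of OOV word tokens in the reference that were recognized."""
    if total_oovs <= 0:
        raise ValueError("total_oovs must be positive")
    return 100.0 * recognized_oovs / total_oovs


# Hypothetical values, not taken from the paper.
baseline_wer = 30.0  # WER (%) of an assumed full-words baseline
hybrid_wer = 28.5    # WER (%) of an assumed hybrid system
print(relative_reduction(baseline_wer, hybrid_wer))
print(oov_recognition_rate(200, 80))
```

Note that a relative reduction is measured against the baseline WER, so with these assumed numbers a 5.0% relative reduction corresponds to an absolute drop of only 1.5 percentage points.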

Abstract

German is a highly inflected language with a large number of words derived from the same root. It also makes heavy use of word compounding, which leads to high Out-of-Vocabulary (OOV) rates and high Language Model (LM) perplexities. For such languages, the use of sub-lexical units for Large Vocabulary Continuous Speech Recognition (LVCSR) becomes a natural choice. In this paper, we investigate the use of mixed types of sub-lexical units in the same recognition lexicon: morphemic or syllabic units combined with their pronunciations, called graphones; normal graphemic morphemes or syllables; and full-words. This mixture of units is used for building hybrid LMs suitable for open vocabulary LVCSR, where the system operates over an open, constantly changing vocabulary, as in broadcast news, political debates, etc. A relative reduction of around 5.0% in Word Error Rate (WER) is obtained compared to a traditional full-words system. Moreover, around 40% of the OOVs are recognized.
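The idea of mixing full-words and sub-lexical units in one token stream can be illustrated with a small sketch. Everything here is an assumption for illustration: the `+` continuation marker, the toy vocabulary, and the hand-written segmentation table are hypothetical and are not the paper's segmenter or lexicon. Pronunciations (the graphone part) are left out entirely.

```python
from typing import Callable, Iterable, List, Set

MARKER = "+"  # assumed marker appended to every non-final sub-word unit


def to_hybrid_units(words: Iterable[str], vocabulary: Set[str],
                    decompose: Callable[[str], List[str]]) -> List[str]:
    """Keep in-vocabulary words whole; split all other words into marked units."""
    units: List[str] = []
    for word in words:
        if word in vocabulary:
            units.append(word)
        else:
            parts = decompose(word)  # must return a non-empty list
            units.extend(part + MARKER for part in parts[:-1])
            units.append(parts[-1])
    return units


def to_words(units: Iterable[str]) -> List[str]:
    """Glue marked units back together into full words."""
    words: List[str] = []
    pending = ""
    for unit in units:
        if unit.endswith(MARKER):
            pending += unit[:-len(MARKER)]
        else:
            words.append(pending + unit)
            pending = ""
    if pending:  # dangling fragment at the end of the sequence
        words.append(pending)
    return words


# Toy data: a hypothetical full-word vocabulary and segmentation table.
VOCAB = {"die", "ist", "neu"}
SEGMENTS = {"Hausaufgabe": ["Haus", "auf", "gabe"]}


def toy_decompose(word: str) -> List[str]:
    """Look up a hand-written segmentation; fall back to the unsplit word."""
    return SEGMENTS.get(word, [word])


sentence = ["die", "Hausaufgabe", "ist", "neu"]
hybrid = to_hybrid_units(sentence, VOCAB, toy_decompose)
print(hybrid)
print(to_words(hybrid))
```

In this sketch, a language model trained on the hybrid token stream could in principle produce an unseen compound by emitting its fragments in sequence, and the recombination step would then restore the full word.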
