Profile

M.Sc. Paul Voigtlaender
Room 127
Phone: +49 241 80 20 767
Fax: +49 241 80 22 731
Email: voigtlaender@vision.rwth-aachen.de

My CV



Publications


RETURNN: The RWTH Extensible Training Framework for Universal Recurrent Neural Networks
Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilia Kulikov, Ralf Schlüter, Hermann Ney
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, March 2017

In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source code of the software package is public and freely available for academic research purposes, and it can be used as a framework or as a standalone tool that supports flexible configuration. The software allows training state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one-dimensional data such as speech and two-dimensional data such as handwritten text, and it was used to develop successful submission systems in several evaluation campaigns.
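For a flavor of the configuration style, here is a minimal, hypothetical sketch of a RETURNN-like config declaring a one-layer bidirectional LSTM with a cross-entropy-trained softmax output. The concrete layer names, dictionary keys, and dimensions are assumptions that differ between framework versions, so treat this as an illustration rather than a working config:

# hypothetical RETURNN-style config sketch (keys are illustrative, not authoritative)
# a config is a plain Python file; the network is a dictionary of named layers
num_inputs = 40       # e.g. acoustic feature dimension (assumed value)
num_outputs = 4501    # e.g. number of output labels (assumed value)
network = {
    # one bidirectional LSTM layer = forward + backward recurrent sublayers
    "lstm1_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "direction": 1},
    "lstm1_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "direction": -1},
    # softmax output with cross-entropy loss, fed by both directions
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm1_fw", "lstm1_bw"]},
}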

BibTeX:
@inproceedings{doetsch2017returnn,
  title     = {RETURNN: the RWTH extensible training framework for universal recurrent neural networks},
  author    = {Doetsch, Patrick and Zeyer, Albert and Voigtlaender, Paul and Kulikov, Ilia and Schl{\"u}ter, Ralf and Ney, Hermann},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing},
  address   = {New Orleans, USA},
  month     = mar,
  year      = {2017},
  pages     = {5345--5349}
}





A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition
Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, March 2017

Recent experiments show that deep bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models outperform feedforward neural networks for automatic speech recognition (ASR). However, their training requires a lot of tuning and experience. In this work, we provide a comprehensive overview of various BLSTM training aspects and their interplay within ASR, which has so far been missing in the literature. We investigate different variants of optimization methods, batching, truncated backpropagation, and regularization techniques such as dropout, and we study the effect of size and depth, training models of up to 10 layers. This includes a comparison of computation time versus recognition performance. Furthermore, we introduce a pretraining scheme for LSTMs with layer-wise construction of the network, showing good improvements especially for deep networks. The experimental analysis was mainly performed on the Quaero task, with additional results on Switchboard. The best BLSTM model gave a relative improvement in word error rate of over 15% compared to our best feedforward baseline on the Quaero 50h task. All experiments were done using RETURNN and RASR, RWTH's extensible training framework for universal recurrent neural networks and ASR toolkit. The training configuration files are publicly available.
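The layer-wise pretraining scheme is easy to picture: train a shallow network first, then repeatedly add a layer and continue training, with the lower layers initialized from the previous stage. The following PyTorch sketch illustrates the general idea under our own assumptions; the model sizes, the DeepBLSTM/grow names, and the decision to also reuse the output layer are ours, not taken from the paper:

import torch.nn as nn

class DeepBLSTM(nn.Module):
    """Stack of bidirectional LSTM layers followed by a linear output layer."""
    def __init__(self, num_layers, in_dim=40, hidden=500, out_dim=4501):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(in_dim if i == 0 else 2 * hidden, hidden,
                    batch_first=True, bidirectional=True)
            for i in range(num_layers))
        self.out = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):          # x: (batch, time, in_dim)
        for lstm in self.layers:
            x, _ = lstm(x)
        return self.out(x)         # frame-wise logits

def grow(model, extra=1):
    """Layer-wise construction: build a deeper copy and initialize its lower
    layers (and output layer) from the already-trained shallower model;
    the newly added layers start from random initialization."""
    deeper = DeepBLSTM(len(model.layers) + extra)
    for src, dst in zip(model.layers, deeper.layers):
        dst.load_state_dict(src.state_dict())
    deeper.out.load_state_dict(model.out.state_dict())
    return deeper

model = DeepBLSTM(num_layers=1)
# ... train the shallow model for a few epochs ...
model = grow(model)   # now 2 layers; repeat until the target depth is reached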

BibTeX:
@inproceedings{zeyer2017lstm,
  title     = {A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition},
  author    = {Zeyer, Albert and Doetsch, Patrick and Voigtlaender, Paul and Schl{\"u}ter, Ralf and Ney, Hermann},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing},
  address   = {New Orleans, USA},
  month     = mar,
  year      = {2017},
  pages     = {2462--2466}
}





Handwriting Recognition with Large Multidimensional Long Short-Term Memory Recurrent Neural Networks
Paul Voigtlaender, Patrick Doetsch, Hermann Ney
International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, October 2016, IAPR Best Student Paper Award

Multidimensional long short-term memory (MDLSTM) recurrent neural networks achieve impressive results for handwriting recognition. However, with current CPU-based implementations their training is very expensive, and their capacity has therefore been limited so far. We release an efficient GPU-based implementation which greatly reduces training times by processing the input in a diagonal-wise fashion. We use this implementation to explore deeper and wider architectures than previously used for handwriting recognition and show that depth in particular plays an important role. We outperform the state of the art on two databases with a deep multidimensional network.
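The diagonal-wise processing can be made concrete: an MDLSTM cell at grid position (i, j) depends only on its top neighbor (i-1, j) and left neighbor (i, j-1), so all cells on an anti-diagonal i + j = d are mutually independent and can be computed together in one batched GPU step. Below is a small NumPy sketch of the traversal order; it is our own illustration, with the actual gated MDLSTM cell abstracted into a toy update:

import numpy as np

def diagonal_sweep(x, cell_update):
    """Sweep a 2D grid anti-diagonal by anti-diagonal. `cell_update` stands in
    for the real MDLSTM cell; for simplicity the hidden size equals the
    channel size here."""
    H, W, C = x.shape
    h = np.zeros((H + 1, W + 1, C))            # zero-padded hidden states
    for d in range(H + W - 1):                 # anti-diagonals d = i + j
        for i in range(max(0, d - W + 1), min(H, d + 1)):
            j = d - i
            # cells within one diagonal are independent: shown as a loop for
            # clarity, but on a GPU the whole diagonal is one batched operation
            h[i + 1, j + 1] = cell_update(x[i, j], h[i, j + 1], h[i + 1, j])
    return h[1:, 1:]

# toy stand-in for the gated MDLSTM cell, just to make the sketch runnable
toy = lambda inp, top, left: np.tanh(inp + 0.5 * (top + left))
out = diagonal_sweep(np.random.randn(64, 64, 16), toy)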

BibTeX:
@inproceedings{voigtlaender16:mdlstm,
  title     = {Handwriting Recognition with Large Multidimensional Long Short-Term Memory Recurrent Neural Networks},
  author    = {Voigtlaender, Paul and Doetsch, Patrick and Ney, Hermann},
  booktitle = {International Conference on Frontiers in Handwriting Recognition},
  address   = {Shenzhen, China},
  month     = oct,
  year      = {2016},
  pages     = {228--233},
  note      = {IAPR Best Student Paper Award}
}





Sequence-Discriminative Training of Recurrent Neural Networks
Paul Voigtlaender, Patrick Doetsch, Simon Wiesler, Ralf Schlüter, Hermann Ney
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Australia, April 2015

We investigate sequence-discriminative training of long short-term memory (LSTM) recurrent neural networks using the maximum mutual information (MMI) criterion. We show that although recurrent neural networks already make use of the whole observation sequence and are able to incorporate more contextual information than feedforward networks, their performance can be improved with sequence-discriminative training. Experiments are performed on two publicly available handwriting recognition tasks containing English and French handwriting. On the English corpus, we obtain a relative improvement in WER of over 11% with MMI training compared to cross-entropy training. On the French corpus, we observe that it is necessary to interpolate the MMI objective function with cross-entropy.
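Written out, the MMI criterion maximizes the log-posterior of the reference transcription, and the interpolation used on the French corpus combines it with the cross-entropy objective. In a standard formulation (our notation, not quoted from the paper):

\[
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{n=1}^{N} \log
    \frac{p_\theta(X_n \mid W_n)\, P(W_n)}
         {\sum_{W} p_\theta(X_n \mid W)\, P(W)},
\qquad
\mathcal{F}(\theta)
  = \mathcal{F}_{\mathrm{MMI}}(\theta) + \lambda\, \mathcal{F}_{\mathrm{CE}}(\theta),
\]

where $X_n$ is the $n$-th observation sequence, $W_n$ its reference transcription, the denominator sum runs over competing hypotheses (in practice restricted to a lattice), and $\lambda$ is the interpolation weight.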

BibTeX:
@inproceedings{voigtlaender2015:seq,
  title     = {Sequence-Discriminative Training of Recurrent Neural Networks},
  author    = {Voigtlaender, Paul and Doetsch, Patrick and Wiesler, Simon and Schl{\"u}ter, Ralf and Ney, Hermann},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing},
  address   = {Brisbane, Australia},
  month     = apr,
  year      = {2015},
  pages     = {2100--2104}
}



