
Teacher Forcing — Summary

A lesser-known but very important training technique in NLP

Nishu Jain
2 min read · Dec 20, 2022

Teacher forcing is a training technique used in machine learning and natural language processing to improve the performance of recurrent neural networks (RNNs). At each time step during training, the ground-truth output token is fed to the RNN as the next input, rather than the token the RNN itself predicted at the previous step.
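To make the mechanism concrete, here is a minimal toy sketch in plain Python. The `step` function is a hypothetical stand-in for one RNN cell (it is not a real neural network); the only point is the difference in what gets fed back as the next input.

```python
# Toy illustration of teacher forcing vs. free running.
# `step` is a hypothetical stand-in for one RNN cell, NOT a real network:
# it deterministically maps (state, input token) -> (new state, prediction).

def step(state, token):
    """Hypothetical RNN cell: returns (new_state, predicted_token)."""
    new_state = state + sum(map(ord, token)) % 7   # stand-in for the hidden update
    prediction = f"pred_{new_state % 3}"           # stand-in for the output token
    return new_state, prediction

def run(target, teacher_forcing=True):
    state, inp, outputs = 0, "<sos>", []
    for t in range(len(target)):
        state, pred = step(state, inp)
        outputs.append(pred)
        # Teacher forcing: the next input is the ground-truth token target[t].
        # Free running: the next input is the model's own previous prediction.
        inp = target[t] if teacher_forcing else pred
    return outputs

target = ["le", "chat", "dort"]
forced = run(target, teacher_forcing=True)
free = run(target, teacher_forcing=False)
```

Both runs start identically (the first input is the start token either way); they diverge only from the second step on, which is exactly where teacher forcing changes the training signal.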

Here’s an example of how teacher forcing works in the context of machine translation:

  1. The input to the RNN is a source language sentence, and the desired output is the corresponding translation in the target language.
  2. During training, the RNN processes the source language sentence one word at a time and predicts the next word in the target language translation at each time step.
  3. With teacher forcing, the correct target language translation is used as the input to the RNN at each time step, rather than the predicted output of the RNN. This allows the RNN to learn from the correct output sequence, rather than its own predictions.
  4. When the RNN has processed the entire input sentence, the predicted output sequence is compared to the correct output sequence, and the error is used to update the RNN’s weights and biases.
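The four steps above can be sketched end to end with a deliberately tiny "model": a lookup table that counts which target word follows which previous word. This is an assumption-laden toy, not a real translation model, but it shows the key asymmetry: training indexes the table by the ground-truth previous word (teacher forcing), while inference must feed back the model's own prediction.

```python
from collections import defaultdict, Counter

# Hypothetical toy "model": a table mapping the previous target-language word
# to counts of the word that follows it. Not a real RNN; counting stands in
# for the weight update in step 4.

def train(pairs, epochs=1):
    table = defaultdict(Counter)
    for _ in range(epochs):
        for src, tgt in pairs:              # step 1: (source, target) sentence pairs
            prev = "<sos>"
            for word in tgt:                # step 2: one word at a time
                table[prev][word] += 1      # steps 3-4: learn from the correct sequence
                prev = word                 # teacher forcing: feed the ground truth
    return table

def translate(table, length):
    prev, out = "<sos>", []
    for _ in range(length):
        if not table[prev]:
            break
        prev = table[prev].most_common(1)[0][0]   # inference: feed own prediction
        out.append(prev)
    return out

pairs = [("the cat sleeps".split(), "le chat dort".split())]
model = train(pairs)
print(translate(model, 3))   # -> ['le', 'chat', 'dort']
```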

Teacher forcing can improve the performance of RNNs by providing a strong training signal and reducing the error accumulation that can occur when the RNN’s own predictions are used as the input.

However, it can also lead to overfitting and reduced generalization: the RNN never sees its own (possibly wrong) predictions during training, so the input distribution at training time differs from the one at inference time, a mismatch often called exposure bias. It is therefore important to carefully consider whether, and how much, to use teacher forcing in a given application.
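In practice this trade-off is often handled with a teacher-forcing ratio: at each step, feed the ground-truth token with some probability and the model's own prediction otherwise, typically decaying the ratio over training (an approach known as scheduled sampling; this mitigation is context I am adding, not something the article prescribes). A minimal sketch of the per-step choice:

```python
import random

def choose_next_input(ground_truth_token, predicted_token, ratio, rng=random):
    """Pick the decoder's next input under a teacher-forcing ratio.

    ratio=1.0 -> always teacher forcing (ground truth);
    ratio=0.0 -> always free running (model's own prediction).
    """
    return ground_truth_token if rng.random() < ratio else predicted_token
```

A typical schedule starts near `ratio=1.0` early in training, when the model's own predictions are mostly noise, and anneals toward `0.0` so the model gradually learns to recover from its own mistakes.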

Written by Nishu Jain

Obsessed with Tech & Biz | SaaS startup guy | Engineer + Wordsmith | My Medium Portfolio: https://mymedium.info/@nishu-jain