Long short-term memory

The Long Short-Term Memory (LSTM) cell can process data sequentially and keep its hidden state through time.

Long short-term memory (LSTM)[1] is a type of recurrent neural network (RNN) designed to deal with the vanishing gradient problem[2] present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps, hence the name "long short-term memory".[1] It is applicable to classification, and to the processing and prediction of time series data, as in handwriting recognition,[3] speech recognition,[4][5] machine translation,[6][7] speech activity detection,[8] robot control,[9][10] video games,[11][12] and healthcare.[13]
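As a concrete illustration of sequential processing, the following minimal sketch (not part of the original article) runs PyTorch's torch.nn.LSTM layer over a long sequence; the layer sizes, batch size, and random data are arbitrary assumptions.

    import torch
    import torch.nn as nn

    # A single-layer LSTM: 8 input features per timestep, 16 hidden units.
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

    # A batch of 4 sequences, each 1000 timesteps long. The hidden and cell
    # states are carried forward across all 1000 steps of each sequence.
    x = torch.randn(4, 1000, 8)
    output, (h_n, c_n) = lstm(x)

    print(output.shape)  # torch.Size([4, 1000, 16]): hidden state at every step
    print(h_n.shape)     # torch.Size([1, 4, 16]): final hidden state
    print(c_n.shape)     # torch.Size([1, 4, 16]): final cell state

Because the cell state is carried across the whole sequence, information from early timesteps can in principle influence outputs a thousand steps later, which is the long-range behaviour the gating mechanism described below is designed to preserve.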

A common LSTM unit is composed of a cell, an input gate, an output gate[14] and a forget gate.[15] The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. The forget gate decides which information from the previous cell state to discard: based on the previous hidden state and the current input, it assigns each component of the state a value between 0 and 1, where a (rounded) value of 1 means keep the information and a value of 0 means discard it. The input gate decides which pieces of new information to store in the current cell state, using the same mechanism as the forget gate. The output gate controls which pieces of information in the current cell state to output, again by assigning values between 0 and 1 based on the previous hidden state and the current input. Selectively outputting relevant information from the current state allows the LSTM network to maintain useful long-term dependencies for making predictions, both at the current time-step and at future time-steps.
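In the standard formulation, each gate applies a logistic sigmoid to a learned linear function of the current input and the previous hidden state, and the cell state is updated additively. The NumPy sketch of a single timestep below is illustrative only: the parameter layout (one W, U, b triple per gate) follows common convention, and the sizes and random initialization are assumptions rather than anything specified in this article.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        """One LSTM timestep. W, U, b hold per-gate parameters."""
        # Forget gate: 0 = discard, 1 = keep each entry of the old cell state.
        f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
        # Input gate: which entries of the candidate state to write.
        i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
        # Output gate: which entries of the cell state to expose as output.
        o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
        # Candidate values for the new cell state.
        c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
        # New cell state: keep part of the old state, add part of the candidate.
        c_t = f_t * c_prev + i_t * c_tilde
        # New hidden state: a gated view of the cell state.
        h_t = o_t * np.tanh(c_t)
        return h_t, c_t

    # Example with 8 inputs and 16 hidden units (sizes are arbitrary).
    rng = np.random.default_rng(0)
    n_in, n_h = 8, 16
    W = {k: rng.standard_normal((n_h, n_in)) * 0.1 for k in "fioc"}
    U = {k: rng.standard_normal((n_h, n_h)) * 0.1 for k in "fioc"}
    b = {k: np.zeros(n_h) for k in "fioc"}
    h, c = np.zeros(n_h), np.zeros(n_h)
    for t in range(5):  # run a few timesteps
        h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)

Note that the cell state c_t is updated by elementwise multiplication and addition rather than by repeated matrix multiplication, which is what lets gradients flow across many timesteps without vanishing.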

  1. ^ Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735.
  2. ^ Hochreiter, Sepp (1991). Untersuchungen zu dynamischen neuronalen Netzen (Diploma thesis). Technical University of Munich, Institute of Computer Science.
  3. ^ Graves, Alex; Liwicki, Marcus; Fernández, Santiago; Bertolami, Roman; Bunke, Horst; Schmidhuber, Jürgen (2009). "A Novel Connectionist System for Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855–868.
  4. ^ Sak, Haşim; Senior, Andrew; Beaufays, Françoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF). Archived from the original (PDF) on 2018-04-24.
  5. ^ Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
  6. ^ Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad; et al. (2016). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL].
  7. ^ Ong, Thuy (2017-08-04). "Facebook's translations are now powered completely by AI". The Verge.
  8. ^ Sahidullah, Md; Patino, José; Cornell, Samuele; Yin, Ruiqing; Sivasankaran, Sunit; Bredin, Hervé; Korshunov, Pavel; Brutti, Alessio; Serizel, Romain; Vincent, Emmanuel; Evans, Nicholas; Marcel, Sébastien; Squartini, Stefano; Barras, Claude (2019-11-06). "The Speed Submission to DIHARD II: Contributions & Lessons Learned". arXiv:1911.02388 [eess.AS].
  9. ^ Mayer, Hermann; Gomez, Faustino; Wierstra, Daan; Nagy, Istvan; Knoll, Alois; Schmidhuber, Jürgen (2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 543–548.
  10. ^ OpenAI; Andrychowicz, Marcin; Baker, Bowen; Chociej, Maciek; et al. (2019). "Learning Dexterous In-Hand Manipulation". arXiv:1808.00177 [cs.LG].
  11. ^ Berner, Christopher; Brockman, Greg; Chan, Brooke; et al. (OpenAI) (2019). "Dota 2 with Large Scale Deep Reinforcement Learning". arXiv:1912.06680 [cs.LG].
  12. ^ Vinyals, Oriol; Babuschkin, Igor; Czarnecki, Wojciech M.; et al. (2019). "Grandmaster level in StarCraft II using multi-agent reinforcement learning". Nature. 575 (7782): 350–354.
  13. ^ Cite error: The named reference decade2022 was invoked but never defined (see the help page).
  14. ^ Hochreiter, Sepp; Schmidhuber, Jürgen (1996). "LSTM can solve hard long time lag problems". Advances in Neural Information Processing Systems 9.
  15. ^ Gers, Felix A.; Schmidhuber, Jürgen; Cummins, Fred (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471.