BOW builds a dictionary of the d unique words in a corpus (the collection of all tokens in the data). For example, the corpus in the image above consists of all the words of sentences S1 and S2.
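The dictionary-building step can be sketched in a few lines. This is a minimal illustration (the sentences S1 and S2 here are invented stand-ins for the ones in the image, and `bag_of_words` is a hypothetical helper, not a library function):

```python
from collections import Counter

def bag_of_words(sentences):
    # Tokenize by whitespace and lowercase, then collect the
    # vocabulary of unique words across the whole corpus
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Each sentence becomes a vector of word counts over the vocabulary
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

s1 = "the cat sat on the mat"
s2 = "the dog sat on the log"
vocab, vecs = bag_of_words([s1, s2])
print(vocab)    # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vecs[0])  # [1, 0, 0, 1, 1, 1, 2]
```

Note that the vectors record only word counts; all word-order information is discarded, which is the defining limitation of BOW.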



Long short-term memory (LSTM) networks combat the vanishing gradient problem by introducing gates and an explicit memory cell. Each unit is a memory cell with three gates: input, output, and forget. These gates act as bodyguards for information, allowing or blocking its flow.


LSTMs have been shown to learn complex sequences, for example writing in the style of Shakespeare or composing primitive music. Note that each gate is connected by its own weights to the previous unit's cell, which makes LSTMs more resource-intensive to run. LSTMs are widespread and used in machine translation. They are also the standard model for most sequence labeling tasks involving large amounts of data; doctranslator is a good example.
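The three gates described above can be made concrete with a single forward step of an LSTM cell. This is a scalar sketch for illustration only: `W` holds invented (untrained) per-gate weights of the form (input weight, recurrent weight, bias), not a real model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    # Forget gate: how much of the old cell state to keep
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
    # Input gate: how much new information to write
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
    # Candidate values to add to the cell state
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
    # Output gate: how much of the cell state to expose
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
    c = f * c_prev + i * g   # new memory cell
    h = o * math.tanh(c)     # new hidden state
    return h, c
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow across many time steps without vanishing: when the forget gate stays near 1, the cell state is carried forward almost unchanged.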


Gated recurrent units (hereinafter GRUs) differ from LSTMs, although they too extend recurrent neural networks. A GRU has one gate fewer, and the gating is organized differently: instead of input, output, and forget gates, there is an update gate. It determines how much information to keep from the previous state and how much to let through from the previous layers.

The reset gate functions similarly to the LSTM's forget gate, but it sits in a different place in the cell. GRUs always pass on their full state and have no output gate. These gates often behave much like an LSTM's, but the big difference is that a GRU is faster and cheaper to run (though also less interpretable). In practice the two tend to cancel out, since restoring the lost expressiveness requires a larger network, which negates the gains. But in cases where extra expressiveness is not required, GRUs can outperform LSTMs.
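The update and reset gates can be sketched as one forward step of a GRU cell, mirroring the LSTM description above. Again a scalar illustration with invented (untrained) weights in `W`; note also that the two interpolation conventions `(1-z)*h_prev + z*h_tilde` and its mirror both appear in the literature:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, W):
    # Update gate z: how much of the state to replace with new content
    z = sigmoid(W["z"][0] * x + W["z"][1] * h_prev + W["z"][2])
    # Reset gate r: how much of the previous state feeds the candidate
    r = sigmoid(W["r"][0] * x + W["r"][1] * h_prev + W["r"][2])
    # Candidate state, computed from the reset-scaled previous state
    h_tilde = math.tanh(W["h"][0] * x + W["h"][1] * (r * h_prev) + W["h"][2])
    # Interpolate between old state and candidate; no separate output gate,
    # and no separate memory cell -- the hidden state is the full state
    return (1 - z) * h_prev + z * h_tilde
```

Compared with the LSTM cell, there is one state variable instead of two and one gate fewer, which is exactly where the speed advantage comes from.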