
Demystifying Attention Mechanism: Revolutionising Encoder-Decoder Architectures | by Navdeep Sharma | Jan, 2025


Greetings!

In the previous blog, Demystifying Encoder-Decoder Architecture: The Backbone of Sequence-to-Sequence Learning, we studied the two main problems with the traditional Encoder-Decoder architecture. First, if the input sequence is very large, we put too much load on the context vector, which has to store a summary of the entire input. Second, we feed a static input (the context vector) to the decoder to generate the output (a translation, in our case). Let us relate this to how humans translate text from one language to another. If we are given a large paragraph (10–15 lines) and asked to translate it after reading it just once, we will not be able to do it well. To translate a large text, we instead pay attention to a specific part of the text, translate it and move forward. Next, we find which part of the input sentence is useful for translating further, pay attention to it, translate it and move on. This is how we would ideally translate a large paragraph. Let us take this mechanism, implement it in our traditional Encoder-Decoder architecture and improve its efficiency for larger inputs.

Figure: Addition of the attention input in the decoder part of the architecture

In the diagram above, unlike in the traditional encoder-decoder architecture, the LSTM cell in the decoder receives one extra input in the form of c_i, a context vector that is recomputed for every decoder time step rather than staying fixed for the whole output sequence.
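As a rough illustration of how such a c_i could be computed, here is a minimal NumPy sketch of one common formulation (Bahdanau-style additive attention). The names used here (attention_context, encoder_states, decoder_state, W_enc, W_dec, v) are placeholders chosen for this example, not code from the blog; the point is only that c_i is an attention-weighted sum of the encoder hidden states, with weights that depend on the current decoder state.

import numpy as np

def softmax(x):
    # Numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(encoder_states, decoder_state, W_enc, W_dec, v):
    # encoder_states: (T, H) hidden states h_1..h_T from the encoder
    # decoder_state:  (H,)   previous decoder hidden state s_{i-1}
    # Returns the context vector c_i and the attention weights alpha_i.
    # Alignment scores e_ij = v^T tanh(W_enc h_j + W_dec s_{i-1})
    scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v   # (T,)
    # Normalise the scores into attention weights that sum to 1
    alphas = softmax(scores)                                               # (T,)
    # Context vector: attention-weighted sum of the encoder hidden states
    c_i = alphas @ encoder_states                                          # (H,)
    return c_i, alphas

# Tiny usage example with random numbers
T, H = 5, 8                            # 5 input time steps, hidden size 8
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(T, H))
decoder_state = rng.normal(size=(H,))
W_enc = rng.normal(size=(H, H))
W_dec = rng.normal(size=(H, H))
v = rng.normal(size=(H,))
c_i, alphas = attention_context(encoder_states, decoder_state, W_enc, W_dec, v)
print(alphas.round(3), alphas.sum())   # weights over the 5 input steps, summing to 1
print(c_i.shape)                       # (8,) same size as an encoder hidden state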

