Introduction
In today's world of information overload, the significance of text summarization cannot be overstated. The overwhelming influx of information necessitates efficient techniques to distill large volumes of text while retaining the essence of the content. In my previous article, we implemented extractive text summarization. Today, we will be looking at implementing abstractive summarization in Python.
Before delving into abstractive summarization, it is imperative to grasp the concept of extractive summarization and acknowledge its inherent limitations.
Extractive Summarization and Its Limitations
If you haven’t read the first part of this blog, here is a short overview of my previous article:
Extractive summarization is an approach to summarization that involves extracting important sentences from a larger body of text, generally based on the frequency of relevant words in each sentence. In an extractive summary, new sentences are not generated; rather, the most important sentences or phrases are pulled out to create the summary, shortening the original text.
When generating an extractive summary, one of our main assumptions is that the most relevant words will appear relatively more often in the text. But this may not always be the case.
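As a quick refresher, here is a minimal frequency-based extractive summarizer sketch. It is not the exact code from the previous article; the sentence splitting and scoring are deliberately simplified for illustration.

from collections import Counter

def extractive_summary(text, num_sentences=2):
    # Naive sentence split on full stops (simplified on purpose)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score each sentence by how frequent its words are in the whole text
    freq = Counter(text.lower().split())
    scores = {s: sum(freq[w] for w in s.lower().split()) for s in sentences}
    top = sorted(scores, key=scores.get, reverse=True)[:num_sentences]
    # Keep the selected sentences in their original order
    return ". ".join(s for s in sentences if s in top) + "."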
Abstractive Summarization
In extractive summarization, the summary is generated by selecting the sentences with the highest relevance scores, determined by word frequency, without creating any new sentences, as discussed in the previous article. Conversely, abstractive summarization involves generating summary sentences that may not be directly present in the original text.
Abstractive summarizers understand the semantics of the text much like we, as humans, do when summarizing content. Instead of scoring sentences by the frequency of important words, the model uses NLP methods to grasp what the text is trying to convey and draws on its vocabulary to produce words and phrases that may not appear in the original content. While extractive summarization merely copies important fragments from the original text, abstractive summaries are linguistically fluent and still convey the principal information of the content.
When it comes to abstractive summarization, the ML model itself does all of the heavy lifting. So, before getting to the implementation, we will first look at how the model actually generates the summary.
Transformers
Before diving into summarization, we first have to understand how transformers work. Transformers are at the heart of everything exciting in AI right now. The transformer is a neural network architecture introduced in the paper Attention Is All You Need by Vaswani et al. in 2017 to address the limitations of traditional sequence-to-sequence models. It is an attention-based encoder-decoder architecture designed to handle sequential data while keeping track of the context of previous outputs, which lets it generate meaningful outputs. Language models like ChatGPT, BERT, etc., are all based on transformers.
Self-Attention Mechanism
Self-attention is the heart of the transformer model; it allows the model to weigh the importance of different words in a sentence. The mechanism takes the surrounding words/tokens into account when deriving the importance of a word, so each word's relevance is computed from its context rather than merely from its frequency. To further enhance the model's ability to capture diverse patterns, self-attention is usually implemented as multi-head attention, which runs several self-attention computations in parallel.
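To make this concrete, here is a tiny scaled dot-product self-attention computation in NumPy. The matrix sizes and random values are purely illustrative; in a real model, the weight matrices are learned during training.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

np.random.seed(0)
X = np.random.rand(3, 4)                 # 3 tokens, each a 4-dimensional embedding
W_q, W_k, W_v = (np.random.rand(4, 4) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v      # queries, keys and values
scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to every other token
weights = softmax(scores)                # attention weights; each row sums to 1
output = weights @ V                     # context-aware representation of each token
print(weights.round(2))

Multi-head attention simply repeats this computation with several independently learned weight matrices and concatenates the results.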
Encoder-Decoder Architecture
Each encoder layer employs self-attention to weigh the different words in the input sequence, enabling the model to consider the context of the entire input. The encoder takes the input sentences and produces a contextualized vector representation for each token. The decoder uses self-attention over the tokens it has generated so far, plus cross-attention, where the attention mechanism is applied to the output of the encoder and the input of the decoder, to produce the output sequence one token at a time.
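PyTorch ships a reference implementation of this encoder-decoder architecture, so we can see the data flow with a toy example. The dimensions below are made up for illustration and are far smaller than a real summarization model.

import torch
import torch.nn as nn

model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
src = torch.rand(1, 10, 64)  # encoder input: one sequence of 10 token embeddings
tgt = torch.rand(1, 4, 64)   # decoder input: the 4 tokens generated so far
out = model(src, tgt)        # the decoder attends to the encoder output via cross-attention
print(out.shape)             # torch.Size([1, 4, 64])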
Summarization Model
We will use the BART model, introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. The checkpoint we will load was pre-trained on English text and fine-tuned on the CNN/Daily Mail dataset.
Code
First, we will create a virtual environment, which gives us a separate, stable, reproducible and portable environment.
pip install virtualenv
virtualenv env
source env/bin/activate
Now, let's install the transformers library and load the model.
pip install transformers
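A minimal way to load the model is through the summarization pipeline. I'm assuming the facebook/bart-large-cnn checkpoint here, since it is the BART variant fine-tuned on CNN/Daily Mail.

from transformers import pipeline

# BART fine-tuned on CNN/Daily Mail, wrapped in a summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")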
Now, we will chunk our main content into smaller sections to ensure our input is manageable for the summarization model. We will also add <eos> markers at the end of each sentence so that we can split the text at sentence boundaries when building the chunks.
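Here is a sketch of that chunking step. I'm assuming the text to summarize is already stored in a variable called article and that chunks of roughly 500 words fit comfortably within the model's input limit.

article = "..."   # placeholder: the text you want to summarize
max_chunk = 500   # assumed maximum number of words per chunk

# Mark sentence boundaries with <eos> so we can split on them
marked = article.replace(".", ".<eos>").replace("?", "?<eos>").replace("!", "!<eos>")
sentences = marked.split("<eos>")

# Greedily group sentences into chunks of at most max_chunk words
chunks = [[]]
for sentence in sentences:
    words = sentence.split()
    if len(chunks[-1]) + len(words) <= max_chunk:
        chunks[-1].extend(words)
    else:
        chunks.append(words)
chunks = [" ".join(words) for words in chunks]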
Now, let's summarize each chunk and join the individual summaries.
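A sketch of that final step, reusing the summarizer and chunks from above; the length limits are assumptions you can tune for your own content.

# Summarize every chunk, then stitch the pieces together
results = summarizer(chunks, max_length=120, min_length=30, do_sample=False)
summary = " ".join(r["summary_text"] for r in results)
print(summary)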
Conclusion
To conclude, we've seen how abstractive text summarization works under the hood and implemented it in Python, ending the second part of our Text Summarization series.
Thank you for reading this article. Please subscribe for more such articles.