In natural language processing, summarization plays a vital role in how we understand and interpret the information given in a document. The basic crux of summarization is to produce a condensed representation of an input text that preserves the core meaning of the original document.
Many summarization systems use “extractive approaches” to get at the crux of a document: they crop out and stitch together portions of the input text to form a condensed version of it. In contrast, the paper “A Neural Attention Model For Abstractive Sentence Summarization” focuses on the task of abstractive sentence-level summarization. The underlying technique is a neural language model combined with a contextual input encoder, and the resulting approach is called “Attention-Based Summarization” (ABS). Below is a heatmap that illustrates the soft alignment the ABS model learns between the input sentence (on the right) and the generated summary (on the top).
- Let’s first understand what we need for summarization. For an input sentence, our goal is to produce its condensed summary, and for that we need to define our inputs, an indicator vector for every word, a constraint on the vocabulary, and a scoring function. Let the input sentence S consist of M words x1, . . . , xM drawn from a fixed vocabulary of size |V|. Each word is represented as an indicator vector xi ∈ {0, 1}^|V| for i ∈ {1, . . . , M}, and X denotes the set of possible inputs. We need this representation because the scoring function defined over it rates how good a candidate summary is for the input, i.e. which words matter for the summary (a small sketch of this representation is given below).
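To make the notation concrete, here is a minimal sketch, assuming a toy vocabulary and plain NumPy (the names are illustrative, not taken from the paper’s code), of how words map to indicator vectors xi ∈ {0, 1}^|V|:

```python
import numpy as np

# Toy fixed vocabulary of size |V| (illustrative only).
vocab = ["the", "cat", "sat", "on", "mat", "<unk>"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def indicator(word):
    """Return the one-hot indicator vector x_i in {0,1}^|V| for a word."""
    x = np.zeros(len(vocab), dtype=np.int8)
    x[word_to_id.get(word, word_to_id["<unk>"])] = 1
    return x

# An input sentence of M words becomes a sequence of M indicator vectors.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
X = np.stack([indicator(w) for w in sentence])   # shape (M, |V|)
print(X.shape)  # (6, 6)
```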
- But what are the problems while generating summaries? The summarizer takes an input sentence S of M words and produces a summarized sentence S’ of N words, where clearly N < M. Since many N-word output sequences are possible for a given S, abstractive sentence summarization searches for the optimal output sequence under a scoring function; nothing guarantees, however, that the chosen sequence is grammatically fluent. Abstractive sentence summarization is closely related to the sentence compression problem, which concentrates on deleting words from the input and therefore operates under hard constraints (a rough sketch of the search objective follows below).
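As a rough sketch of that objective, assuming a hypothetical `score` function and a brute-force search over candidates (which the paper actually replaces with beam search):

```python
import itertools

def summarize(sentence_words, vocab, N, score):
    """Pick the N-word output sequence y that maximizes a scoring function
    score(x, y). Brute force over the vocabulary is only feasible for toy
    sizes; in practice this argmax is approximated with beam search."""
    best, best_score = None, float("-inf")
    for candidate in itertools.product(vocab, repeat=N):
        s = score(sentence_words, candidate)   # score is assumed to be given
        if s > best_score:
            best, best_score = candidate, s
    return best
```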
- Now, to delve deeper, let’s see how to build the neural model. We need to estimate a contextual probability for every next word of the summary, i.e. how likely that word is given the input and the words generated so far; irrelevant words, stopwords, and mere embellishments should get low probability and be weeded out. For this the paper builds on two deep learning components: a Convolutional Encoder and an Attention-Based Encoder. What? Where? Why? These questions must be on your mind now that you have seen CE (Convolutional Encoder) and ABE (Attention-Based Encoder). Hold on, dear readers, buckle up your seat belts and get ready to see the dark magic of these models! CE - Convolutional Encoder: an architecture that goes beyond a plain bag-of-words by modelling local interactions between words, so that a semantically compressed “meaning” of the sentence is brought out. ABE - Attention-Based Encoder: inspired by neural machine translation, it assigns attention weights to the input words based on the summary generated so far, yielding a smoothed, context-dependent representation of the input used while producing the summary text. These encoders plug into a neural language model decoder to generate the abstractive summary (a simplified sketch of the attention-based encoder follows below). The pictorial representation below gives a bird’s-eye view of the encoder-decoder architecture for the abstractive summarization method:
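The exact architecture is spelled out in the paper; the following is only a rough NumPy sketch in the spirit of the attention-based encoder, where the dimensions, smoothing window, and variable names are assumptions for illustration. It weights locally smoothed input embeddings by how well each input position matches the current summary context:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_encoder(X_emb, ctx_emb, W, Q=2):
    """Sketch of an attention-based encoder.
    X_emb:   (M, d) embeddings of the M input words
    ctx_emb: (d,)   embedding of the current summary context
    W:       (d, d) learned attention parameters
    Q:       smoothing window over neighbouring input positions
    Returns an attention-weighted average of the smoothed input embeddings."""
    M, d = X_emb.shape
    # Attention distribution over input positions, conditioned on the context.
    p = softmax(X_emb @ W @ ctx_emb)                      # shape (M,)
    # Local smoothing: average each embedding with its neighbours.
    X_bar = np.stack([X_emb[max(0, i - Q):min(M, i + Q + 1)].mean(axis=0)
                      for i in range(M)])                 # shape (M, d)
    return p @ X_bar                                      # shape (d,)
```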
- Let’s see the power of this model! It is trained on input-summary pairs (headlines paired with the first sentence of their articles) by minimizing the negative log-likelihood of the reference summaries, and at decoding time it picks the candidate summary that scores best. Pairs matching any of the heuristic filters below were discarded when constructing the dataset (a small code sketch of these filters follows the list).
- Are there no non-stop words in common?
- Does the title contain a byline or other extraneous editing marks?
- Does the title have a question mark or colon?
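As an illustrative sketch of such filters (the stopword list, tokenization, and the byline check are placeholder assumptions, not the paper’s exact code):

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and"}   # placeholder list

def keep_pair(title_tokens, text_tokens):
    """Return True if an (input text, headline) pair passes the heuristic filters."""
    title_content = {w for w in title_tokens if w not in STOPWORDS}
    text_content = {w for w in text_tokens if w not in STOPWORDS}
    if not (title_content & text_content):           # no non-stop words in common
        return False
    if any(t in {"--", "(", ")"} for t in title_tokens):  # crude byline / editing-mark check (assumption)
        return False
    if any(t in {"?", ":"} for t in title_tokens):    # question mark or colon in title
        return False
    return True
```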
Further, the following preprocessing steps were taken on the dataset (a small code sketch follows the list):
- Headline and input text were both lowercased and tokenized.
- All digits were replaced with the ‘#’ symbol.
- Infrequent words, i.e. words appearing fewer than 5 times, were pruned.
- Stopwords were removed.
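A rough sketch of these steps, where the tokenization, frequency threshold handling, and stopword list are illustrative assumptions:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and"}  # placeholder list
MIN_FREQ = 5

def preprocess(lines):
    """Lowercase, tokenize, mask digits, prune rare words, and drop stopwords."""
    tokenized = []
    for line in lines:
        tokens = re.findall(r"\S+", line.lower())            # lowercase + whitespace tokenize
        tokens = [re.sub(r"\d", "#", t) for t in tokens]      # digits -> '#'
        tokenized.append(tokens)
    counts = Counter(t for toks in tokenized for t in toks)
    return [[t for t in toks
             if counts[t] >= MIN_FREQ and t not in STOPWORDS]
            for toks in tokenized]
```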
Finally, the dataset comprises 119 million word tokens with 110K unique word types. In a nutshell, this neural attention-based model is very effective for abstractive summarization and extremely useful for headline generation, but it has its pros and cons. The example below gives a better glimpse of the output. - In short, let’s dissect it and see where its pitfalls are: Firstly, this model needs a lot of improvement in the grammaticality of its summaries, particularly for paragraph-level input. Secondly, it needs to handle multi-paragraph input to produce longer summaries; as of now it works only on short inputs (roughly 500 words in the training examples), so it needs to scale up to address this issue.