
A Neural Attention Model for Abstractive Sentence Summarization

In natural language processing, summarization plays a vital role in how we understand and interpret the information given in a document. The basic crux of summarization is to produce a condensed representation of an input text that revolves around the core meaning of the original document.
Many existing summarization systems use “extractive” approaches to get the crux of a document: they crop out and stitch together portions of the input text to form a condensed version of it. In contrast, the paper A Neural Attention Model for Abstractive Sentence Summarization focuses beautifully on fully abstractive, sentence-level summarization. The underlying technique it uses is a neural language model combined with a contextual input encoder, and the resulting approach is called “Attention-Based Summarization” (ABS). The paper includes a heatmap that illustrates a soft alignment between the input sentence (on the right) and the summary generated by the ABS model (on the top).

  • Let’s understand what we need for summarization. For an input sentence, our goal is to produce its condensed summary, and for that we need to define our inputs, an indicator representation for every word, constraints on the vocabulary, and a scoring function. Let the input sentence S consist of M words x1, . . . , xM drawn from a fixed vocabulary of size |V|, and represent each word as an indicator vector xi ∈ {0, 1}^|V| for i ∈ {1, . . . , M}, with X the set of possible inputs. The scoring function then rates how good a candidate summary is for a given input; in simple terms, it reflects which words matter for the summary and which do not.
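    Concretely, the paper frames decoding as a search problem: choose the summary y that maximizes the score s(x, y), which the model factors into per-word conditional log-probabilities given the input x and a short window y_c of previously generated words. In the paper’s notation:

    ```latex
    y^{*} = \arg\max_{y \in \mathcal{Y}} s(x, y),
    \qquad
    s(x, y) \approx \sum_{i=0}^{N-1} \log p(y_{i+1} \mid x, \mathbf{y}_c; \theta)
    ```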
  • But what are the problems while generating summaries? The summarizer takes an input sentence S of M words and produces a summarized sentence S’ of N words, where clearly N < M. Since many candidate summaries exist for one given sentence S, the abstractive system must search for the optimal output sequence under the scoring function, and the chosen sequence is not guaranteed to be grammatical. Abstractive sentence summarization is closely related to the sentence compression problem, which concentrates on deleting words from the input and must therefore obey hard constraints; the abstractive setting drops those hard constraints.
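    Since scoring every possible length-N sequence is intractable, the paper decodes with beam search. Below is a minimal, hypothetical Python sketch; next_word_logprobs here is a stand-in for the trained ABS scorer, not the paper’s actual interface.

    ```python
    def beam_search(x, next_word_logprobs, N=10, beam_width=5):
        """Search for a high-scoring length-N summary of input x.
        next_word_logprobs(x, prefix) -> {word: logprob} is a stand-in
        for the trained ABS scorer."""
        beams = [([], 0.0)]  # (partial summary, cumulative log-probability)
        for _ in range(N):
            candidates = []
            for prefix, score in beams:
                for word, logp in next_word_logprobs(x, prefix).items():
                    candidates.append((prefix + [word], score + logp))
            # Keep only the beam_width best partial summaries.
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams = candidates[:beam_width]
        return beams[0]  # best summary and its score
    ```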
  • Now, to delve deeper, let’s see how to build the neural model. We need to estimate a contextual probability for every next word, i.e. how likely each candidate word is given the input and the summary so far; words that add nothing, such as stopwords or mere embellishments, get weeded out by this scoring. For this the paper uses two deep learning components: a Convolutional Encoder (CE) and an Attention-Based Encoder (ABE). What? Where? Why? You must be wondering what is going on. Hold on, dear readers, buckle up and get ready to see the magic of these models! CE, the Convolutional Encoder: an architecture that improves on the plain bag-of-words by modelling local interactions between neighbouring words, so that the encoder captures a compressed “meaning” of the sentence. ABE, the Attention-Based Encoder: an encoder inspired by neural machine translation that assigns attention weights to the input words based on the current summary context, producing a smoothed, context-sensitive representation of the input. Each encoder plugs into the same neural language-model decoder to generate the abstractive summary. The pictorial representation in the paper gives a bird’s-eye view of the encoder-decoder architecture for the abstractive summarization method.
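    To make the attention step concrete, here is a minimal numpy sketch of an attention-based encoder in the spirit of the paper; the alignment matrix P, the window size Q, and the mean-pooled context are illustrative assumptions rather than the paper’s exact parameterization.

    ```python
    import numpy as np

    def attention_encoder(X_embed, C_embed, P, Q=2):
        """Sketch of an attention-based encoder.
        X_embed: (M, h) embedded input words; C_embed: (C, h) embedded
        summary context; P: (h, h) learned alignment matrix; Q: half-width
        of the local smoothing window."""
        context = C_embed.mean(axis=0)          # pool the context embeddings
        scores = X_embed @ P @ context          # alignment score per input word
        p = np.exp(scores - scores.max())
        p /= p.sum()                            # soft attention distribution
        # Smooth the input embeddings over a local window of size 2Q+1.
        M = X_embed.shape[0]
        X_bar = np.stack([X_embed[max(0, i - Q):min(M, i + Q + 1)].mean(axis=0)
                          for i in range(M)])
        return p @ X_bar                        # attention-weighted encoding
    ```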

  • Let’s see the power of this model!
    The model is trained on article-headline pairs, minimizing the negative log-likelihood of a set of reference summaries, and at decoding time it picks the candidate summary that scores best. Certain heuristic filters are used to discard bad input-summary pairs, asking of each pair (a rough filter sketch follows the list):
    • Are there no non-stop words in common?
    • Does the title contain a byline or other extraneous editing marks?
    • Does the title have a question mark or colon?
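    A minimal, hypothetical Python version of these checks; the byline pattern and stopword handling are illustrative assumptions, not the paper’s exact rules.

    ```python
    import re

    def passes_filters(title, first_sentence, stopwords):
        """Discard a (title, sentence) pair that fails the heuristics above."""
        t = {w for w in title.lower().split() if w not in stopwords}
        s = {w for w in first_sentence.lower().split() if w not in stopwords}
        if not t & s:                               # no non-stop words in common?
            return False
        if re.search(r"\bby\b|--", title.lower()):  # byline / editing marks (rough guess)
            return False
        if "?" in title or ":" in title:            # question mark or colon?
            return False
        return True
    ```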
    Further, the following steps were taken to preprocess the dataset (a sketch of these steps appears after the list):
    • Headline and input text were both lowercased and tokenized.
    • All digits were replaced with the ‘#’ symbol.
    • Infrequent words, i.e. words whose frequency is less than 5, were pruned.
    • Stopwords were removed.
    Finally, the dataset comprises 119 million word tokens with 110K unique word types.
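    As a rough illustration, here is a minimal Python sketch of these preprocessing steps; the whitespace tokenizer and the tiny stopword set are simplifying assumptions.

    ```python
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and"}  # illustrative subset

    def preprocess(pairs, min_freq=5):
        """Sketch of the corpus preprocessing described above.
        pairs: list of (headline, article) strings."""
        def clean(text):
            tokens = text.lower().split()                      # lowercase and tokenize
            tokens = [re.sub(r"\d", "#", t) for t in tokens]   # digits -> '#'
            return [t for t in tokens if t not in STOPWORDS]   # drop stopwords
        cleaned = [(clean(h), clean(a)) for h, a in pairs]
        # Prune words that appear fewer than min_freq times.
        counts = Counter(t for h, a in cleaned for t in h + a)
        keep = {t for t, c in counts.items() if c >= min_freq}
        return [([t for t in h if t in keep], [t for t in a if t in keep])
                for h, a in cleaned]
    ```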

    In a nutshell, this neural attention-based model is very effective for abstractive summarization and extremely useful for headline generation, but it has its pros and cons!
    The example outputs in the paper give a better glimpse of what the model produces.
  • In short, let’s dissect the model and see where its pitfalls are. Firstly, it needs a lot of improvement in the grammaticality of its summaries, particularly for paragraph-level inputs. Secondly, it needs to handle multi-paragraph input to produce longer summaries; as of now it works only on small inputs (approximately 500 words in the training dataset), so it needs to scale up to address this issue.