Natural Language Processing is gaining importance in almost every walk of life, from industries such as finance and health care to day-to-day activities like searching on search engines and chatting with automated chat bots. Another major application of NLP is text summarization, which is still a hot topic and is being worked on vibrantly.
Source: http://bit.ly/2y0HFMt
Many techniques have been developed for summarization, but retrieving a summary shorter than a sentence (a heading, for example) has received far less attention and is now gaining popularity.
There can be many applications for finding summaries of texts shorter than a sentence. Some of them are:
· Generating headlines automatically for newspaper stories.
· Generating a table of contents for a document.
Some of the techniques used to achieve this are as follows:
1. Context-Free Grammars (CFGs):
Generating headlines through a CFG involves two steps:
· Extracting sentences from the content - this involves several sub-steps, such as normalizing the text, followed by feature extraction and sentence ranking.
· Headline generation - after the sentences have been extracted based on their scores, content words that represent the entire text are extracted from them. Finally, the headings are generated using the rules of the CFG.
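The extraction step above (before the CFG rules are applied) can be sketched roughly as follows. This is a minimal illustration, not the exact method: the frequency-based scoring, the stop-word list, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}

def rank_sentences(text, top_n=3):
    """Score each sentence by the average document frequency of its
    content words, then keep the top-ranked sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(w for w in words if w not in STOP)

    def score(sent):
        tokens = re.findall(r'[a-z]+', sent.lower())
        return sum(freq[t] for t in tokens if t not in STOP) / max(len(tokens), 1)

    return sorted(sentences, key=score, reverse=True)[:top_n]

def content_words(sentences, k=5):
    """Extract the k most frequent non-stop words from the top sentences;
    these are the candidate words the CFG rules would assemble into a heading."""
    words = re.findall(r'[a-z]+', " ".join(sentences).lower())
    return [w for w, _ in Counter(w for w in words if w not in STOP).most_common(k)]
```

A real pipeline would use richer features (position, cue phrases, TF-IDF) for ranking, but the shape of the two steps is the same.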
2. Recurrent Neural Networks (RNNs): Neural nets have been used in this field before as well, but RNNs have increased the efficiency manifold. The RNN encoder-decoder technique is used with LSTM (long short-term memory) elements.
Source: http://bit.ly/2yycPeE
A huge dataset is required for RNNs so that they can produce a correct output even for an input that wasn't present in the training set.
Attention is a technique used to determine which words should be given more weight during headline formation.
Two types of attention techniques may be used: simple attention and complex attention, although simple attention tends to give better results here.
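The core of the attention mechanism can be illustrated with a small sketch. This is a generic dot-product attention computation on plain Python lists, shown only to make the idea concrete; the actual models use learned scoring functions inside the encoder-decoder network.

```python
import math

def simple_attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, softmax the scores into weights, and return the weighted
    sum (context vector) of the encoder states."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

The weights show which input words the decoder "attends to" when emitting each headline word.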
3. Statistical models: Statistical methods involve two steps:
· Content selection - the model selects the content to be present in the summary. One of the basic models is the "zero-level" model, in which the probability of a sentence appearing in a summary is calculated by multiplying the probabilities of its individual terms (a bag-of-words assumption).
· Surface realization - the probability of a particular sentence ordering is decided using models such as bigram or trigram language models.
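The two steps above can be sketched with tiny estimators. This is a simplified illustration (no smoothing beyond a probability floor, whitespace tokenization, made-up function names), not the exact model from the literature.

```python
from collections import Counter

def train_unigram(headlines):
    """Unigram probabilities estimated from a corpus of headlines."""
    counts = Counter(w for h in headlines for w in h.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def zero_level_probability(candidate, unigram, floor=1e-6):
    """Zero-level content selection: multiply the unigram probabilities
    of the candidate's terms (bag-of-words assumption)."""
    p = 1.0
    for w in candidate.split():
        p *= unigram.get(w, floor)
    return p

def train_bigram(headlines):
    """Conditional bigram probabilities P(b | a) from the same corpus."""
    pairs, firsts = Counter(), Counter()
    for h in headlines:
        words = h.split()
        firsts.update(words[:-1])
        pairs.update(zip(words, words[1:]))

    def prob(a, b):
        return pairs[(a, b)] / firsts[a] if firsts[a] else 0.0
    return prob

def ordering_score(words, bigram_prob, floor=1e-6):
    """Surface realization: score one candidate word ordering by the
    product of its bigram probabilities."""
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= max(bigram_prob(a, b), floor)
    return p
```

Content selection picks the words; surface realization then prefers the ordering the bigram model considers most fluent.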
4. Generative Model: Using HMMs for Story Generation from Headlines:
An HMM (Hidden Markov Model) is used in this technique. Other techniques used here are tagging, normalisation, segmentation, stemming, stop-word filtering, and merging similar content words.
Source: http://www.cs.northwestern.edu/~akm175/docs/btp.pdf
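The decoding step an HMM relies on is the standard Viterbi algorithm, sketched below. The states, transition probabilities, and emissions here are made-up toy values for illustration; they are not taken from the cited work.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Viterbi decoding: find the most likely hidden-state sequence
    for a sequence of observed words, given an HMM's parameters."""
    # Initialize with start probabilities times the first emission.
    trellis = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), [s])
                for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            # Best previous state whose path extends into s.
            layer[s] = max(
                (trellis[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs, 0.0),
                 trellis[-1][prev][1] + [s])
                for prev in states
            )
        trellis.append(layer)
    best_prob, best_path = max(trellis[-1].values())
    return best_path
```

In the headline setting, the hidden states would correspond to content roles and the observations to the processed (tagged, stemmed, filtered) words.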
Hence, the various techniques mentioned above have been used to generate headlines from a given text document. It has been observed that these techniques work fairly well for technical or historical documents, but not as well for poetic or artistic works, since in such works the meaning of the context often differs greatly from the literal words.
Among the more effective techniques mentioned above is the HMM model, which produces a good heading around 20% of the time; the rest of the time its output matches the context a little less closely. RNNs also perform well, thanks to the simple attention technique used to focus on candidate heading words.
Although all these techniques have been used, there is still much scope to make headline generation better.
References:
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.253&rep=rep1&type=pdf
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=1477FAF311E0C110B3AA3A5B5920AF0B?doi=10.1.1.117.1236&rep=rep1&type=pdf
- http://www.cs.northwestern.edu/~akm175/docs/btp.pdf
- https://nlp.stanford.edu/courses/cs224n/2015/reports/1.pdf
- http://www.ipcsit.com/vol59/013-ICIE2014-2-005.pdf