Natural Language Processing is gaining importance in almost every walk of life, from industries such as finance and health care to day-to-day activities like searching on search engines and chatting with automated chat bots. Another major application of NLP is text summarization, which is still a hot topic and is being worked on vibrantly.
Source: http://bit.ly/2y0HFMt
Many techniques have been developed for summarization, but retrieving a summary shorter than a sentence (a heading, for example) has received far less attention and is now gaining popularity.
There can be many applications for finding summaries of texts shorter than a sentence. Some of them are:
· Generating headlines automatically for newspaper stories.
· Generating a table of contents for a document.
Some of the techniques used to achieve this are as follows:
1. Context-Free Grammars (CFGs):
Generating headlines through a CFG involves two steps:
· Extracting sentences from the content - this involves several sub-steps, such as normalizing the text, followed by feature extraction and sentence ranking.
· Headline generation - after the sentences have been extracted based on their scores, content words that represent the entire text are extracted from them. Finally, the headings are generated using the rules of the CFG.
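The extraction step above (before the CFG rules are applied) can be sketched roughly as follows. This is a minimal illustration, not the exact method: the frequency-based scoring, the stop-word list, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}

def rank_sentences(text, top_n=3):
    """Score each sentence by the average document frequency of its
    content words, then keep the top-ranked sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(w for w in words if w not in STOP)

    def score(sent):
        tokens = re.findall(r'[a-z]+', sent.lower())
        return sum(freq[t] for t in tokens if t not in STOP) / max(len(tokens), 1)

    return sorted(sentences, key=score, reverse=True)[:top_n]

def content_words(sentences, k=5):
    """Extract the k most frequent non-stop words from the top sentences;
    these are the candidate words the CFG rules would assemble into a heading."""
    words = re.findall(r'[a-z]+', " ".join(sentences).lower())
    return [w for w, _ in Counter(w for w in words if w not in STOP).most_common(k)]
```

A real pipeline would use richer features (position, cue phrases, TF-IDF) for ranking, but the shape of the two steps is the same.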
2. Recurrent Neural Networks (RNNs): Neural nets have been used in this field before as well, but RNNs have increased the efficiency manifold. The RNN encoder-decoder technique is used with LSTM (long short-term memory) elements.
Source: http://bit.ly/2yycPeE
A huge dataset is required for RNNs so that they can produce a correct output even for an input that wasn't present in the training set.
Attention is a technique used to determine which words should be given more weight during headline formation.
Two types of attention techniques may be used: simple attention and complex attention, although simple attention tends to give better results here.
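The core of the attention mechanism can be illustrated with a small sketch. This is a generic dot-product attention computation on plain Python lists, shown only to make the idea concrete; the actual models use learned scoring functions inside the encoder-decoder network.

```python
import math

def simple_attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, softmax the scores into weights, and return the weighted
    sum (context vector) of the encoder states."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

The weights show which input words the decoder "attends to" when emitting each headline word.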
3. Statistical models: Statistical methods involve two steps:
· Content selection - the model selects the content to be present in the summary. One of the basic models is the "zero-level" model, in which the probability of a sentence appearing in a summary is calculated by multiplying the probabilities of its individual terms (a bag-of-words assumption).
· Surface realization - the probability of a particular sentence ordering is decided using models such as bigram or trigram language models.
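The two steps above can be sketched with tiny estimators. This is a simplified illustration (no smoothing beyond a probability floor, whitespace tokenization, made-up function names), not the exact model from the literature.

```python
from collections import Counter

def train_unigram(headlines):
    """Unigram probabilities estimated from a corpus of headlines."""
    counts = Counter(w for h in headlines for w in h.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def zero_level_probability(candidate, unigram, floor=1e-6):
    """Zero-level content selection: multiply the unigram probabilities
    of the candidate's terms (bag-of-words assumption)."""
    p = 1.0
    for w in candidate.split():
        p *= unigram.get(w, floor)
    return p

def train_bigram(headlines):
    """Conditional bigram probabilities P(b | a) from the same corpus."""
    pairs, firsts = Counter(), Counter()
    for h in headlines:
        words = h.split()
        firsts.update(words[:-1])
        pairs.update(zip(words, words[1:]))

    def prob(a, b):
        return pairs[(a, b)] / firsts[a] if firsts[a] else 0.0
    return prob

def ordering_score(words, bigram_prob, floor=1e-6):
    """Surface realization: score one candidate word ordering by the
    product of its bigram probabilities."""
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= max(bigram_prob(a, b), floor)
    return p
```

Content selection picks the words; surface realization then prefers the ordering the bigram model considers most fluent.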
4. Generative Model: Using HMMs for Story Generation from Headlines:
An HMM (Hidden Markov Model) is used in this technique. Other techniques used here are tagging, normalisation, segmentation, stemming, stop-word filtering, and merging similar content words.
Source: http://www.cs.northwestern.edu/~akm175/docs/btp.pdf
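The decoding step an HMM relies on is the standard Viterbi algorithm, sketched below. The states, transition probabilities, and emissions here are made-up toy values for illustration; they are not taken from the cited work.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Viterbi decoding: find the most likely hidden-state sequence
    for a sequence of observed words, given an HMM's parameters."""
    # Initialize with start probabilities times the first emission.
    trellis = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), [s])
                for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            # Best previous state whose path extends into s.
            layer[s] = max(
                (trellis[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs, 0.0),
                 trellis[-1][prev][1] + [s])
                for prev in states
            )
        trellis.append(layer)
    best_prob, best_path = max(trellis[-1].values())
    return best_path
```

In the headline setting, the hidden states would correspond to content roles and the observations to the processed (tagged, stemmed, filtered) words.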
Hence, the various techniques mentioned above have been used to generate headlines from a given text document. It has been observed that these techniques work fairly well for technical or historical documents, but not as well for poetic or artistic works, since in such works the meaning of the context often differs greatly from the literal words.
Among the more effective techniques mentioned above is the HMM model, which produces a good heading around 20% of the time; the rest of the time its output matches the context a little less closely. RNNs also perform well, thanks to the simple attention technique used to focus on candidate heading words.
Although all these techniques have been used, there is still much scope to make headline generation better.
References:
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.253&rep=rep1&type=pdf
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=1477FAF311E0C110B3AA3A5B5920AF0B?doi=10.1.1.117.1236&rep=rep1&type=pdf
- http://www.cs.northwestern.edu/~akm175/docs/btp.pdf
- https://nlp.stanford.edu/courses/cs224n/2015/reports/1.pdf
- http://www.ipcsit.com/vol59/013-ICIE2014-2-005.pdf