
Coreference Resolution and Applications in NLP

In computational linguistics and natural language processing (NLP), coreference resolution (CR) is an avidly studied problem in discourse. It has been only partially solved by the state of the art, and consequently remains one of the most exciting open problems in the field.

Introduction and Definition

Coreference resolution is the process of linking together the mentions in a speech or text excerpt that refer to the same real-world entity. It identifies the dependence between a phrase and the rest of the sentence, or other sentences in the text. Coreference is an integral part of natural language: it lets us avoid repetition, indicate possession or relation, and so on.
A basic example to illustrate the above definition is given below:

Maria said she would lend Tom the novel once she had finished it. (she <- Maria, it <- the novel)

Another example, which uses elements from popular fiction literature:

Harry wouldn’t bother to read “Hogwarts: A History” as long as Hermione is around. He knows she knows the book by heart.
The different types of coreference here include:
Noun phrases: "Hogwarts: A History" <- the book
Pronouns: Harry <- He, Hermione <- she
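
To make this concrete, here is a minimal sketch of how an off-the-shelf resolver surfaces such clusters. It assumes spaCy 2.x with the neuralcoref extension installed, and the exact clusters found will depend on the model used:

  import spacy
  import neuralcoref

  # Load an English model and attach the neuralcoref resolver to its pipeline.
  nlp = spacy.load("en_core_web_sm")
  neuralcoref.add_to_pipe(nlp)

  doc = nlp("Harry wouldn't bother to read \"Hogwarts: A History\" as long as "
            "Hermione is around. He knows she knows the book by heart.")

  # Each cluster pairs a main mention with every expression that corefers
  # with it, e.g. Harry <- He, Hermione <- she.
  for cluster in doc._.coref_clusters:
      print(cluster.main.text, "<-", [m.text for m in cluster.mentions])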

This might seem trivial and natural to humans, but it is a much more difficult problem for a machine. Approaches to this problem in NLP broadly fall into two groups:

  1. Data-driven: these methods follow a supervised training paradigm in which large amounts of annotated data are fed into a network that learns to resolve coreference. They work well when training data is plentiful.
  2. Syntactic: these methods rely on heuristics derived from the sentence structure surrounding the point of coreference, and work in cases where available data is scarce. (A toy sketch of such a heuristic follows this list.)
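
As an illustration of the syntactic flavour, here is a deliberately naive heuristic resolver: each pronoun is linked to the nearest preceding non-pronoun mention whose gender and number agree. The feature inventory and rules below are simplifying assumptions for illustration; real syntactic algorithms (e.g. Hobbs') walk the parse tree instead:

  # Toy syntactic-style resolver: link each pronoun to the nearest preceding
  # non-pronoun mention whose gender and number agree. Illustrative only.
  PRONOUN_FEATURES = {"he": ("masc", "sg"), "she": ("fem", "sg"),
                      "they": ("any", "pl")}

  def resolve(mentions):
      """mentions: list of (text, gender, number) tuples in document order."""
      links = {}
      for i, (text, _, _) in enumerate(mentions):
          feats = PRONOUN_FEATURES.get(text.lower())
          if feats is None:  # not a pronoun we handle
              continue
          gender, number = feats
          # Scan backwards for the nearest agreeing candidate antecedent.
          for prev_text, prev_gender, prev_number in reversed(mentions[:i]):
              if prev_text.lower() in PRONOUN_FEATURES:
                  continue
              if prev_number == number and gender in (prev_gender, "any"):
                  links[text] = prev_text
                  break
      return links

  mentions = [("Harry", "masc", "sg"), ("Hermione", "fem", "sg"),
              ("he", None, None), ("she", None, None)]
  print(resolve(mentions))  # {'he': 'Harry', 'she': 'Hermione'}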
We look at these in more detail later. For now, we examine the importance and necessity of studying this topic in Section 2, followed by the state-of-the-art solutions available for this problem in Section 3, its applications in NLP in Section 4, and an evaluation of its performance in Section 5. (Notice that the bolded words in this sentence corefer with the topic of this blog entry.)

Why is CR an important topic?

The following are the primary reasons why this topic requires extensive study:

  1. Coreference resolution forms the basis of the Winograd Schema Challenge, a test of machine intelligence built to defeat the AIs that have beaten the Turing Test: the machine must identify the antecedent of an ambiguous pronoun in a statement.
  2. This is still largely an unsolved problem, and there is plenty of scope to improve on the results we get at present. Far fewer tools are available for this purpose than for other NLP tasks. This is due to inherent ambiguities in resolution which make the problem difficult.
    1. An example to highlight this ambiguity is the pronoun it, which has many uses. It can refer much like he and she, except that it generally refers to inanimate objects. It can also refer to abstractions rather than beings: "He was paid minimum wage, but didn't seem to mind it." Finally, it has pleonastic uses, which do not refer to anything specific, as in: a. It's raining. b. It's really a shame. (A toy filter for such pleonastic uses is sketched after this list.)
  3. Coreference resolution is important because it in turn improves the performance of many NLP tasks such as text summarization, question-answering systems, chatbots, etc.
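
To give a flavour of how pleonastic uses can be screened out before resolution, here is a crude, illustrative filter; the patterns are assumptions covering only the examples above, not a complete rule set:

  import re

  # Crude patterns for pleonastic "it" -- occurrences a resolver should skip
  # because they do not refer to anything. Illustrative only.
  PLEONASTIC_PATTERNS = [
      re.compile(r"\bit(?:'s|\s+is)\s+(?:raining|snowing)\b", re.I),
      re.compile(r"\bit(?:'s|\s+is)\s+(?:really\s+)?a\s+shame\b", re.I),
  ]

  def is_pleonastic(sentence):
      return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)

  print(is_pleonastic("It's raining."))                # True
  print(is_pleonastic("It's really a shame."))         # True
  print(is_pleonastic("He didn't seem to mind it."))   # False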

State of the Art for CR 

This is not a new problem, but it has seen renewed interest in the past five years as people have started applying techniques like deep representation learning and reinforcement learning to it. There have also been publications suggesting improved performance in supervised neural models like RNNs and LSTMs if a better solution to the coreference problem is found.

A typical algorithm for CR is given below:

  1. Extract a list of all the mentions in the text; mentions are words or phrases that may refer to some other preceding or following expression in the text.
  2. Compute a set of features on each pair of mentions obtained in the prior step.
  3. Find the most appropriate antecedent for each mention, based on a likelihood assigned to each candidate antecedent.
Step 2 in detail: features can be extracted by traditional handcrafted methods, or with ready-built networks that learn increasingly abstract features as the data passes through each layer.
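
A minimal sketch of the handcrafted route, computing a few classic mention-pair features; the feature selection and the mention representation here are assumptions for illustration, not any specific system's design:

  # Illustrative handcrafted features for a pair of mentions (i precedes j).
  def pair_features(mention_i, mention_j):
      """Each mention is a dict like {'text': str, 'head': str, 'sent': int}."""
      ti, tj = mention_i["text"].lower(), mention_j["text"].lower()
      return {
          "exact_match": ti == tj,                       # full string match
          "head_match": mention_i["head"].lower() == mention_j["head"].lower(),
          "sent_distance": mention_j["sent"] - mention_i["sent"],
          "j_is_pronoun": tj in {"he", "she", "it", "they"},
      }

  m_i = {"text": "Hermione", "head": "Hermione", "sent": 0}
  m_j = {"text": "she", "head": "she", "sent": 1}
  print(pair_features(m_i, m_j))
  # {'exact_match': False, 'head_match': False,
  #  'sent_distance': 1, 'j_is_pronoun': True}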

Step 3 in detail: we take the features and feed them to neural networks. The first network gives a conditional probability score: given a mention, the probability of each possible antecedent. The second network gives the probability that a mention has no antecedent at all. We can then compare all these scores and take the highest to determine whether a mention has an antecedent, and which one it should be.
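
A minimal sketch of this scoring-and-argmax step, assuming the two networks are already trained and exposed as scoring functions; the function names and toy scores below are hypothetical stand-ins:

  # Hypothetical trained scorers, stubbed with toy values for the demo.
  def score_pair(mention, antecedent):
      toy = {("he", "Harry"): 2.1, ("he", "Hermione"): -0.4}
      return toy.get((mention, antecedent), -1.0)

  def score_no_antecedent(mention):
      return 0.3  # learned score for the "no antecedent" option

  def resolve(mention, candidates):
      # Score every candidate antecedent plus the no-antecedent option,
      # then take the argmax, exactly as described in step 3.
      options = [(score_no_antecedent(mention), None)]
      options += [(score_pair(mention, c), c) for c in candidates]
      return max(options, key=lambda t: t[0])[1]

  print(resolve("he", ["Harry", "Hermione"]))  # -> Harry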


The above algorithm is integrated into efficient pipelines to make the best possible estimate of a good coreference resolution.

Note: machine-learning and rule-based approaches work best when augmented with external knowledge sources and coreference clues extracted from document structure. Systems also perform better when provided with ground-truth mentions. Overall, systems struggle to resolve coreference in cases that require domain knowledge.

Applications in NLP 


  1. Required extensively in the document analysis and information retrieval aspects of NLP; for example, clinical health records in the US often need unambiguous mentions (of persons, drugs, doctors) in patients' prescriptions and records.
  2. Coreference resolution drastically improves the readability of summaries.
  3. Automatic summarization, textual entailment, and text classification are among its core applications in NLP.

Performance Evaluation in Applications

Experiments have applied CR, at its presently achievable quality, to the NLP tasks above. So far, automatic coreference resolution has brought no statistically significant benefit to these applications. In one specific study, coreference computed with the conventional BART toolkit generally produced slight but insignificant changes in performance for these tasks, and in a few cases even degraded it. Thus we realize there is a long way to go before we can perfect CR, which appears insufficient as computed by present methods.
