Introduction and Definition
In computational linguistics and natural language processing (NLP), coreference resolution (CR) is an avidly studied problem in discourse that has been only partially solved by the state of the art, and consequently remains one of the most exciting open problems in the field.

The process of linking together mentions in a speech or text excerpt that refer to the same real-world entity is termed coreference resolution. This process identifies the dependence of a phrase on the rest of its sentence or on other sentences in the text. Such links are an integral part of natural language, used to avoid repetition, demonstrate possession or relation, etc. A basic example to illustrate the above definition is given below:

John told Mary that he would pick her up. (John <- he, Mary <- her)
Another example, which uses elements from popular fiction:
Harry wouldn’t bother to read “Hogwarts: A History” as long as Hermione is around. He knows she knows the book by heart.
The different types of coreference include:

- Noun phrases: "Hogwarts: A History" <- the book
- Pronouns: Harry <- He, Hermione <- she

This might seem very trivial and natural to humans, but it is a much more difficult problem for an AI. Approaches to this problem in NLP broadly fall into two groups:
- Data-driven: these focus on a supervised training paradigm in which large amounts of labelled training data are fed into a network that learns to resolve coreference. They work well when huge training sets are available.
- Syntactic: these methods rely on heuristics derived from the sentence structure surrounding the point of coreference, and they work in cases where available data is scarce (a toy heuristic of this kind is sketched just after this list).
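To make the syntactic flavour concrete, here is a toy resolver of the kind described above. It is a minimal sketch, assuming spaCy and its en_core_web_sm model are installed, and the recency-only rule is deliberately naive:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model has been downloaded

def naive_pronoun_resolver(text):
    """Toy syntactic heuristic: link each pronoun to the nearest preceding noun."""
    doc = nlp(text)
    links = {}
    for tok in doc:
        if tok.pos_ == "PRON":
            candidates = [t for t in doc[: tok.i] if t.pos_ in ("NOUN", "PROPN")]
            if candidates:
                links[tok.text] = candidates[-1].text  # most recent noun wins
    return links

print(naive_pronoun_resolver("Harry read the book because he liked it."))
# {'he': 'book', 'it': 'book'} -- recency alone resolves 'he' wrongly, which is
# why real heuristics also check gender, number, and syntactic constraints.
```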
We look at these in more detail later. For now, we look at the importance and necessity of studying this topic in Section 2, followed by the state-of-the-art solutions available for this problem in Section 3, which is in turn followed by its applications in NLP and an evaluation of its performance in Sections 4 and 5 respectively. (Notice that the bolded words in this sentence represent a coreference for the topic of this blog entry.)
Why is CR an important topic?
The following are the primary reasons why this topic requires extensive study:
- Coreference resolution forms the basis of the Winograd Schema Challenge, a test of machine intelligence built to defeat the AIs that have beaten the Turing Test: the machine must identify the antecedent of an ambiguous pronoun in a statement (e.g., in "The trophy doesn't fit in the brown suitcase because it is too big", does it refer to the trophy or the suitcase?).
- This is still largely an unsolved problem, and there is a lot of scope to improve upon the results we get at present. Far fewer tools are available to people for this purpose. This is due to inherent ambiguities in resolution which make the problem difficult.
- An example to highlight this ambiguity is the pronoun it, which has many uses. It can refer to things much like he and she do, except that it generally refers to inanimate objects. It can also refer to abstractions rather than beings: "He was paid minimum wage, but didn't seem to mind it." Finally, it also has pleonastic uses, which do not refer to anything specific, as in: (a) It's raining. (b) It's really a shame.
- Coreference resolution is important because it consequently improves the performance of many tasks in NLP, like text summarization, question-answering systems, chatbots, etc.
State of the Art for CR
This is not a new problem, but it has seen revived interest in the past five years as people have started applying techniques like deep representation learning and reinforcement learning to it. There have also been publications reporting improved performance in supervised neural networks like RNNs and LSTMs when a better solution to the coreference problem is available. A typical algorithm corresponding to CR is given below:
- Extract a list of all the mentions in the text; mentions are words or phrases which may refer to some other preceding or following word in the text (a sketch of this step follows the list).
- Compute a set of features on each pair of mentions obtained in the prior step.
- Attempt to find the most appropriate antecedent for each mention, based on a likelihood assigned to each occurring noun.
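As an illustration of step 1, here is a minimal mention-extraction sketch using spaCy. Treating noun chunks plus pronouns as the candidate mentions is a simplifying assumption, not the only possible mention detector:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_mentions(text):
    """Step 1: collect candidate mentions (noun chunks plus pronouns)."""
    doc = nlp(text)
    mentions = list(doc.noun_chunks)
    # Some pronouns are not covered by noun chunks, so add them separately.
    mentions += [doc[t.i : t.i + 1] for t in doc if t.pos_ == "PRON"]
    # Deduplicate and restore document order.
    return sorted(set(mentions), key=lambda span: span.start)

print(extract_mentions("Harry wouldn't bother to read the book. He knows it by heart."))
```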
Step 2 in detail: features can be extracted by traditional handcrafted methods, or ready-built networks can be used which learn the features at increasing levels of abstraction as the data passes through each layer.
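A minimal sketch of the handcrafted route, computing a few illustrative features on a pair of spaCy mention spans (such as those returned by the extract_mentions sketch above); this particular feature set is an assumption, not a fixed standard:

```python
def pair_features(mention, antecedent):
    """Step 2: a few handcrafted features for a (mention, antecedent) pair."""
    return {
        "token_distance": mention.start - antecedent.end,  # gap between the pair
        "same_text": mention.text.lower() == antecedent.text.lower(),
        "mention_is_pronoun": mention.root.pos_ == "PRON",
        "antecedent_is_pronoun": antecedent.root.pos_ == "PRON",
        "same_sentence": mention.sent == antecedent.sent,
    }
```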
Step 3 in detail: we take the features and use them in a neural network. The first network gives a conditional probability score: given a mention, what is the probability of each possible antecedent? The second network gives us the probability that the mention has no antecedent at all. We can then simply compare all of these scores and take the highest one to determine whether a mention has an antecedent and, if so, which one it should be.
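Below is a minimal PyTorch sketch of this two-network scoring; the layer sizes, the feature dimension, and the untrained weights are placeholder assumptions:

```python
import torch
import torch.nn as nn

FEATURE_DIM = 5  # matches the five pair features sketched above (assumption)

# Network 1: scores each (mention, candidate antecedent) feature vector.
pair_scorer = nn.Sequential(nn.Linear(FEATURE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
# Network 2: scores the "no antecedent" option from mention-only features.
single_scorer = nn.Sequential(nn.Linear(FEATURE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

def resolve(mention_feats, candidate_pair_feats):
    """Step 3: return the index of the best antecedent, or None if none wins."""
    pair_scores = pair_scorer(candidate_pair_feats).squeeze(-1)  # one per candidate
    no_antecedent = single_scorer(mention_feats).view(1)         # "no antecedent" score
    scores = torch.cat([pair_scores, no_antecedent])
    best = torch.argmax(scores).item()
    return None if best == len(pair_scores) else best

# Example: one mention with three candidate antecedents, using random features.
print(resolve(torch.rand(FEATURE_DIM), torch.rand(3, FEATURE_DIM)))
```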
The above algorithm is integrated into efficient pipelines to produce the best possible estimate of the coreference links; one such pipeline is sketched below.
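For instance, one publicly available pipeline of this kind is Hugging Face's neuralcoref extension for spaCy. A minimal usage sketch, assuming compatible versions of both packages are installed (the printed clusters are illustrative):

```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)  # adds a coreference component to the spaCy pipeline

doc = nlp("Harry wouldn't bother to read the book. He knows it by heart.")
print(doc._.has_coref)       # whether any coreference cluster was found
print(doc._.coref_clusters)  # clusters such as [Harry: [Harry, He], the book: [the book, it]]
print(doc._.coref_resolved)  # the text with mentions replaced by their cluster heads
```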
Note: machine-learning and rule-based approaches worked best when augmented with external knowledge sources and coreference clues extracted from document structure. The systems performed better in coreference resolution when provided with ground-truth mentions. Overall, the systems struggle to resolve coreference in cases that require domain knowledge.
Applications in NLP
- Required extensively in the document analysis and information retrieval aspects of NLP, e.g. clinical health records in the US often require clear disambiguation of mentions (of a person, drug, or doctor) on patients' prescriptions and records.
- Coreference resolution drastically improves the readability of summaries.
- Automatic summarization, textual entailment, and text classification are among its core applications in NLP.