
Paragraph Vectors and Word Vectors

Word-prediction models based on bag-of-words representations or n-gram models can be limiting. A bag-of-words representation of a text disregards the linguistic context of each word; the semantics of a given word are not taken into consideration. N-gram models can capture dependencies and relations over short distances but fail to capture them over long distances. In a bag-of-words representation, words such as “small”, “little” and “white” are all treated the same, that is, they are equidistant from each other; but according to linguistic context, “small” and “little” should be considered closer than “small” and “white”.
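To make the equidistance point concrete, here is a minimal sketch in plain NumPy, using a made-up three-word vocabulary: one-hot bag-of-words vectors give every pair of distinct words exactly the same similarity.

```python
import numpy as np

# Hypothetical three-word vocabulary, for illustration only
vocab = {"small": 0, "little": 1, "white": 2}

def one_hot(word):
    """Bag-of-words (one-hot) vector for a single word."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Distinct one-hot vectors are orthogonal, so all pairwise similarities
# are identical: "small" is no closer to "little" than to "white".
print(cosine(one_hot("small"), one_hot("little")))  # 0.0
print(cosine(one_hot("small"), one_hot("white")))   # 0.0
```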


Word Vectors


Word vectors are distributed representations of the words in a given text. Each word is represented by a unique vector, and each vector is a set of features. Word vectors keep the linguistic dependencies and semantic structure of words intact: the vectors for “small” and “little” lie much closer together in the vector space than the vectors for “small” and “white”.

Concept of distributed vector representation of words: 
  • Each word in the vocabulary is mapped to a unique vector, stored as a column in a word matrix; the column is indexed by the position of the word in the vocabulary.
  • A neural network is used to learn this representation of words as vectors. Stochastic gradient descent with back-propagation sets the feature values of the word vectors. When training converges, similar words are mapped close to each other. These networks pick up the underlying dependencies between words during training. Word2Vec is a family of algorithms used to generate such word vectors.
  • An aggregation function combines the different context word vectors, and a softmax multi-class classifier then assigns output probabilities to the possible next words. See Figure-a (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov) and the sketch after the figure.

Figure-a: a framework for learning word vectors
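As a concrete illustration, here is a minimal sketch using the gensim library (assuming gensim 4.x; the toy corpus and all hyperparameter values are made up for illustration) that trains CBOW-style word vectors and checks the resulting similarities:

```python
from gensim.models import Word2Vec

# Tiny made-up corpus; a real run would use a large tokenized corpus.
sentences = [
    ["the", "small", "dog", "ran"],
    ["the", "little", "dog", "ran"],
    ["the", "small", "cat", "slept"],
    ["the", "little", "cat", "slept"],
    ["the", "white", "wall", "stood"],
]

# sg=0 selects the CBOW-style model: context words predict the center word.
model = Word2Vec(sentences, vector_size=20, window=2,
                 min_count=1, sg=0, epochs=200, seed=1)

# After training, "small" and "little" (which share contexts) should come
# out closer than "small" and "white".
print(model.wv.similarity("small", "little"))
print(model.wv.similarity("small", "white"))
```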




Paragraph Vectors 

Paragraph Vectors are an unsupervised framework for representing pieces of text as vectors. They improve on the word-vector model’s representation of text. A paragraph vector is a continuous distributed vector representation of a given piece of text. A unique vector represents each unique text, and these vectors are a set of features. The term “Paragraph” emphasizes that the length of the text can vary: sentences, paragraphs, documents, etc. Paragraph vectors are trained with stochastic gradient descent and back-propagation.

Paragraph vectors can be thought of as vectors that capture the semantic properties and dependencies that are lost in a plain word-vector representation.

Paragraph vector framework:
  • Each paragraph is mapped to a unique vector.
  • Each word is also mapped to a unique word vector.
  • The paragraph vector and the word vectors are concatenated.
  • The next word is predicted by a classifier that assigns probabilities to the possible next words.
  • Paragraph vectors and word vectors are trained using back-propagation with stochastic gradient descent.
  • Refer to Figure-b (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov); a sketch follows the figure.
Figure-b: a framework for learning paragraph vectors
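Here is a minimal sketch of this framework using gensim’s Doc2Vec (assuming gensim 4.x; the toy documents and hyperparameters are made up for illustration). dm=1 selects the distributed-memory style of training described above, and dm_concat=1 concatenates the paragraph vector with the context word vectors:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tiny made-up corpus; each paragraph gets a unique tag (its index).
texts = [
    "the small dog ran across the yard",
    "the little dog ran across the yard",
    "the white wall stood at the back",
]
docs = [TaggedDocument(t.split(), [i]) for i, t in enumerate(texts)]

# dm=1: the paragraph vector and context word vectors jointly predict
# the next word; dm_concat=1: concatenate rather than average them.
model = Doc2Vec(docs, vector_size=20, window=2, min_count=1,
                dm=1, dm_concat=1, epochs=100, seed=1)

# The learned vector for paragraph 0, and a vector inferred for an
# unseen paragraph (word vectors stay fixed during inference).
print(model.dv[0])
print(model.infer_vector("the small cat slept".split()))
```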

All contexts generated from the same paragraph share that paragraph’s vector. Paragraph vectors are not shared across different paragraphs, since two paragraph vectors represent two different paragraphs; the word vectors, however, are shared amongst paragraphs, because multiple paragraphs can contain the same words.

After the paragraph and word vectors are trained, they can be fed as features to standard learning models such as SVMs, logistic regression, K-means, etc.
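For example, here is a minimal sketch (with made-up texts and hypothetical sentiment labels, one per training paragraph) that feeds learned paragraph vectors to scikit-learn’s logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

texts = ["the small dog ran", "the little dog ran", "the white wall stood"]
labels = [1, 1, 0]  # hypothetical sentiment labels, one per paragraph

docs = [TaggedDocument(t.split(), [i]) for i, t in enumerate(texts)]
model = Doc2Vec(docs, vector_size=20, window=2, min_count=1, dm=1, epochs=100)

# The learned paragraph vectors become fixed feature vectors.
X = np.array([model.dv[i] for i in range(len(labels))])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Classify an unseen paragraph: infer its vector, then predict.
new_vec = model.infer_vector("the little cat ran".split())
print(clf.predict(new_vec.reshape(1, -1)))
```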

Alternate framework:

  • The context words are ignored on the input; instead, the model is forced to predict words randomly sampled from the paragraph, using the paragraph vector alone.
  • At each gradient-descent iteration, a window of text is sampled from a paragraph, then a random word is sampled from that window, and the model is trained to predict it given the paragraph vector.
  • Refer to Figure-c (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov); a sketch follows the figure.
Figure-c: the distributed bag-of-words version of paragraph vectors
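A minimal sketch of this alternate framework with gensim’s Doc2Vec (again assuming gensim 4.x and the same made-up documents): setting dm=0 selects this distributed bag-of-words style of training, where the paragraph vector alone predicts sampled words.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

texts = ["the small dog ran", "the little dog ran", "the white wall stood"]
docs = [TaggedDocument(t.split(), [i]) for i, t in enumerate(texts)]

# dm=0: the paragraph vector alone is trained to predict words randomly
# sampled from the paragraph (no context word vectors on the input).
dbow = Doc2Vec(docs, vector_size=20, window=2, min_count=1,
               dm=0, epochs=100, seed=1)

print(dbow.dv[0])                                        # learned paragraph vector
print(dbow.infer_vector("the small cat slept".split()))  # unseen paragraph
```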


