
Paragraph Vectors and Word Vectors

Word-prediction models built on bag-of-words representations or n-gram models can be limiting. A bag-of-words representation of a given text disregards the linguistic context of each word, so the semantics of a word are not taken into consideration. N-gram models can capture dependencies and relations up to a short distance but fail to capture them over long distances. In a bag-of-words representation, words such as “small”, “little” and “white” are all treated the same, that is, they are considered equidistant from each other; but according to linguistic context, “small” and “little” should be considered closer than “small” and “white”.
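To make this concrete, here is a minimal sketch (the toy vocabulary and distance function are illustrative, not taken from the cited paper) showing that one-hot bag-of-words vectors place every pair of distinct words at exactly the same distance:

    import math

    # Toy vocabulary (illustrative): each word becomes a one-hot vector.
    vocab = ["small", "little", "white", "dog"]

    def one_hot(word):
        """Return the one-hot bag-of-words vector for a single word."""
        return [1.0 if w == word else 0.0 for w in vocab]

    def euclidean(u, v):
        """Euclidean distance between two equal-length vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    # Any two distinct words are sqrt(2) apart -- semantics play no role.
    print(euclidean(one_hot("small"), one_hot("little")))  # 1.414...
    print(euclidean(one_hot("small"), one_hot("white")))   # 1.414... (identical)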


Word Vectors


Word vectors are distributed representations of the words in a given text. Each word is represented by a unique vector, and each vector is a set of features. Word vectors keep the linguistic dependencies and semantic structure of words intact: the vectors for “small” and “little” lie much closer together in the vector space than the vectors for “small” and “white”.

The concept of distributed vector representation of words:
  • Each word in the context is mapped to a unique vector, and each vector forms a column of a word-vector matrix; the column is indexed by the word’s position in the vocabulary.
  • A neural network is used to generate this representation of words as vectors. Stochastic gradient descent with back-propagation is used to set the feature values of the word vectors; when training converges, similar words are mapped close to each other. These networks take the underlying dependencies between words into account during training. Word2Vec is a class of algorithms used to generate such word vectors.
  • An aggregation/combination function is used to combine the word vectors of a context, and a softmax multi-class classifier then assigns output probabilities to candidate next words. See Figure-a (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov); a minimal training sketch follows the figure.

Figure-a
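As a minimal sketch of the idea, the snippet below trains word vectors with gensim’s Word2Vec on a hypothetical toy corpus. The corpus is far too small to learn meaningful semantics, so treat the printed similarities as a demonstration of the API, not of the result:

    from gensim.models import Word2Vec

    # Hypothetical toy corpus: a real run needs a large amount of text.
    sentences = [
        ["the", "small", "dog", "ran", "home"],
        ["a", "little", "dog", "ran", "home"],
        ["the", "white", "cat", "sat", "still"],
    ]

    # sg=1 selects the skip-gram variant; vector_size is the dimensionality
    # of each word vector, learned by SGD with back-propagation.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                     sg=1, epochs=100)

    # With enough training data, similarity("small", "little") would exceed
    # similarity("small", "white"); on this toy corpus the values are noise.
    print(model.wv.similarity("small", "little"))
    print(model.wv.similarity("small", "white"))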




Paragraph Vectors 

The Paragraph Vector is an unsupervised framework for representing paragraphs as vectors. Paragraph vectors improve on the word-vector model’s representation of text. A paragraph vector is a continuous distributed vector representation of a given piece of text: a unique vector represents each unique text, and these vectors are sets of features. The term “paragraph” emphasizes that the length of the text can vary: sentences, paragraphs, documents, etc. Paragraph vectors are trained by stochastic gradient descent with back-propagation.

Paragraph vectors can be thought of as vectors that capture the semantic properties and dependencies that are lost in the word-vector representation.

Paragraph vector framework:
  • Each paragraph is mapped to a unique vector.
  • Each word is also mapped to a unique word vector.
  • The paragraph vector and the word vectors are concatenated.
  • The next word is predicted by a classifier that assigns probabilities to candidate next words.
  • The paragraph vectors and word vectors are trained using back-propagation with stochastic gradient descent.
  • See Figure-b (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov); a minimal training sketch follows the figure.
Figure-b
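A minimal sketch of this framework using gensim’s Doc2Vec, where dm=1 selects the distributed-memory model the list above describes (note that gensim averages the context vectors by default, whereas the cited paper also uses concatenation; dm_concat=1 would select that variant). The two-paragraph corpus is hypothetical and only demonstrates the mechanics:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Hypothetical toy paragraphs; each tag names one paragraph vector.
    docs = [
        TaggedDocument(words=["the", "small", "dog", "ran", "home"], tags=["para_0"]),
        TaggedDocument(words=["a", "little", "puppy", "ran", "home"], tags=["para_1"]),
    ]

    # dm=1: the paragraph vector is combined with context word vectors
    # to predict the next word (trained by back-propagation + SGD).
    model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, dm=1, epochs=100)

    trained = model.dv["para_0"]                          # learned paragraph vector
    inferred = model.infer_vector(["a", "small", "dog"])  # vector for unseen text
    print(trained.shape, inferred.shape)                  # (50,) (50,)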

All contexts generated from the same paragraph share that paragraph’s vector. Paragraph vectors are not shared across paragraphs, since two paragraph vectors represent two different paragraphs; the word vectors, however, are shared across paragraphs, because multiple paragraphs can contain the same words.

After the paragraph vectors and word vectors are trained, learning models such as SVMs, logistic regression, K-means, etc. can be applied on top of them.
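For instance, a minimal sketch of that downstream step, with random arrays standing in for trained paragraph vectors and for hypothetical labels:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-ins: in practice each row of X would be one trained paragraph
    # vector (e.g. model.dv[tag]), and y would hold real labels such as
    # sentiment classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))    # 100 paragraph vectors, 50 features each
    y = rng.integers(0, 2, size=100)  # placeholder binary labels

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.score(X, y))  # training accuracy on the placeholder data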

Alternate framework:

  • The paragraph vector alone is trained to predict words randomly sampled from the paragraph.
  • At each gradient descent iteration, a text window is sampled from the paragraph and a random word is chosen from that window; the paragraph vector is then used to predict this word.
  • See Figure-c (Source: [1] Distributed Representations of Sentences and Documents; Quoc Le, Tomas Mikolov); a minimal sketch follows the figure.
Figure-c
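In gensim’s Doc2Vec this alternate framework corresponds to dm=0 (the distributed bag-of-words model, PV-DBOW, in the cited paper). A minimal sketch on the same hypothetical toy paragraphs:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [
        TaggedDocument(words=["the", "small", "dog", "ran", "home"], tags=["para_0"]),
        TaggedDocument(words=["a", "little", "puppy", "ran", "home"], tags=["para_1"]),
    ]

    # dm=0: no word-vector input; the paragraph vector alone is trained to
    # predict words sampled from windows of its paragraph.
    model = Doc2Vec(docs, vector_size=50, min_count=1, dm=0, epochs=100)
    print(model.dv["para_1"][:5])  # first few features of a paragraph vector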


