Is Sanskrit the most suitable language for natural language processing?

“The structure of Paninian Grammar is nothing but a computer program” is a remark commonly attributed to Charles Babbage. Paninian grammar is said to capture the universal principles underlying all languages. Computational linguistics requires formal rules for the analysis and generation of language, and Chomsky and others have gradually turned towards Panini. Over the past 60–70 years, much time, effort and money has been spent on designing computers that can cope with the phrases, expressions, idioms, oratory, rhetoric, ambiguity, vagueness, idiosyncrasy and peculiarity of human language.
Among the accomplishments of the grammarians can be reckoned a method for paraphrasing Sanskrit in a manner that is identical not only in essence but in form with current work in Artificial Intelligence. To examine whether this offers solutions to these problems, the research article listed under References argues that a natural language can serve as an artificial language, drawing on NLP, semantic nets, vibhakti (case endings), the dual number, inflection-based syntax, Shastric Sanskrit and equivalence.
NLP is concerned with the interaction between human beings and computers. Its main challenge is natural language understanding (NLU), spoken as well as written: enabling computers to extract the exact meaning from natural language input, which in turn supports natural language generation. From the NLU point of view, any language used for human communication has three basic components:
1. Syntax
Syntax describes the basic form of a natural language, typically as specified by its grammar. Only a language with a sound grammatical base is well suited to computer programming, NLP or robot programming.
2. Semantics
Semantics is the meaning of the words and sentences of the language. Although general semantic theories exist, when we build a natural language understanding system for a particular application, we try to use the simplest representation we can.
3. Pragmatics
The pragmatic component explains how utterances relate to the world. To understand language, an agent should consider more than the sentence; it has to take into account the context of the sentence, the state of the world, the goals of the speaker and the listener, special conventions, and the like.

If binary machine code is the foundation of all computer activity, then a controlled, structured and unambiguous method such as the one followed in Sanskrit is essential to meet the objectives described above. Many languages other than Sanskrit have a well-defined grammar, yet the same word or sentence can carry different meanings in different contexts. For example, the English phrase “I like apple” can refer to a brand of computer or to the fruit.
As another example, the sentence “Do you see the man with the glasses?” admits several readings under linguistic analysis. “Glasses” may mean eyeglasses (“a device to compensate for defective vision or to protect the eyes from light and dust”), things made of glass (“a hard, brittle substance, typically transparent or translucent, made by fusing sand with soda and lime and cooling rapidly which is used to make windows, drinking containers and many other things”), or drinking vessels made of glass, as in “a glass of milk”.
Both sentences above are grammatically correct, but their intended meaning depends on their context or background. NLP programmers must handle this kind of vagueness and ambiguity properly.
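To make the role of context concrete, here is a minimal Python sketch of context-based word sense disambiguation, loosely in the spirit of overlap-based methods. The sense names and cue words in `SENSES` are hypothetical and hand-picked for illustration; they do not come from the article or from any real lexicon.

```python
# Hypothetical sense inventory: each sense of "apple" is described by a few
# hand-picked cue words. A real system would use a lexicon or a trained model.
SENSES = {
    "apple": {
        "fruit": {"eat", "juice", "sweet", "tree", "pie"},
        "company": {"laptop", "phone", "buy", "computer", "software"},
    }
}


def disambiguate(word, sentence):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when the sentence contains no cue word at all, that is,
    when the context does not resolve the ambiguity.
    """
    context = set(sentence.lower().replace(".", "").split())
    scores = {
        sense: len(cues & context) for sense, cues in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for text in ("I like apple", "I like apple pie", "I like my apple laptop"):
        print(text, "->", disambiguate("apple", text))
```

With no cue word in the sentence, as in “I like apple”, the function returns `None`: the sentence alone does not say which sense is meant.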

Word structure in Sanskrit follows the format <prefix> <dhatu> <suffix>, where the dhatu is the verbal root. In English, changing the meaning of a sentence usually requires introducing new words. In Sanskrit, most sentences only require adding a prefix or suffix to a word. For example, “Gachaami” means “I go”, while “aagachaami” means “I come”. Sanskrit has no articles such as “a”, “an” and “the”, and the copula “is” is usually left unexpressed; inflection ensures that the words still give an accurate meaning. For example, the sentence “This is an elephant” is “Eshaha Gajaha” in Sanskrit. A sentence of four words in English is expressed with only two words in Sanskrit, which reduces storage space. All these features help make Sanskrit unambiguous.
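The <prefix> <dhatu> <suffix> pattern can be sketched as plain string composition. The following Python example is a toy under stated assumptions: it uses the article's own transliterations, treats “gach” as the stem as it is spelled here, lists only one prefix and one suffix, and ignores sandhi (the sound changes that occur where parts of a word meet). The names `PREFIXES`, `SUFFIXES`, `build` and `split` are hypothetical.

```python
# Toy inventories, using the transliteration found in the article.
# These tuples are illustrative assumptions, not a real morphological lexicon.
PREFIXES = ("aa",)
SUFFIXES = ("aami",)


def build(dhatu, suffix, prefix=""):
    """Compose a word form as <prefix><dhatu><suffix> (no sandhi applied)."""
    return prefix + dhatu + suffix


def split(word):
    """Split a word into (prefix, dhatu, suffix) using the toy inventories.

    Caveat: a stem that itself begins with a listed prefix would be split
    wrongly; real analysers need a lexicon of roots to avoid this.
    """
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    dhatu = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, dhatu, suffix


if __name__ == "__main__":
    print(build("gach", "aami"))        # gachaami
    print(build("gach", "aami", "aa"))  # aagachaami
    print(split("aagachaami"))          # ('aa', 'gach', 'aami')
```

The two forms differ only in the prefix, which is the point the paragraph makes: the change in meaning is carried inside the word rather than by an additional word.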

Sanskrit does not rely on concepts like mapping or diagrammatic representation; it follows only the grammar rules laid down by Panini. These rules make it free of ambiguity, which is why it is regarded as a treasure for the field of Artificial Intelligence. Sanskrit is so highly inflected that word order hardly matters. In prose, the preferred word order was Subject-Object-Verb (SOV). In poetry and similar forms, other word orders were often used for effect.
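The claim that inflection makes word order nearly irrelevant can be illustrated with a small Python sketch that assigns grammatical roles from word endings alone. The example sentence “baalakah phalam khaadati” (“the boy eats the fruit”) and the three-entry ending table are assumptions introduced here for illustration; they are not from the article, and the table covers only this one sentence.

```python
from itertools import permutations

# Hypothetical, heavily simplified ending table: nominative singular "-ah",
# accusative singular "-am", third person singular verb "-ti".
# Caveat: real Sanskrit endings overlap (neuter nominative is also "-am"),
# so this table is only valid for the toy sentence below.
ENDINGS = (("ah", "subject"), ("am", "object"), ("ti", "verb"))


def roles(sentence):
    """Assign a role to each word by its ending, ignoring word position."""
    result = {}
    for word in sentence.split():
        for ending, role in ENDINGS:
            if word.endswith(ending):
                result[role] = word
                break
    return result


if __name__ == "__main__":
    words = ["baalakah", "phalam", "khaadati"]
    for order in permutations(words):
        print(" ".join(order), "->", roles(" ".join(order)))
```

All six orderings of the three words yield the same role assignment, because the roles are read from the endings and not from the positions.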

This Sanskrit shloka says: “A stupid person is like a two-legged animal in front of the eyes, so he must be avoided”. The shloka is like “Gagar main Sagar” (a Hindi idiom, literally “an ocean in a pitcher”): very economical in its choice and use of words. Only a few languages in the world can match this economy. Sanskrit also has the distinctive property of maintaining a one-to-one relation between the words or objects represented and their associated properties.

Compared with other languages, Sanskrit is highly inflected, with three grammatical genders (masculine, feminine and neuter) and three numbers (singular, dual and plural). Marking the dual separately from the singular and the plural is one of its distinctive characteristics. The comparison below sets Sanskrit beside English:
Sanskrit:
Singular: baalkey (in the boy)
Dual: baalkayo: (between the boys)
Plural: baalkeshu (among the boys)
English:
Singular: in the boy
Dual: between the boys
Plural: among the boys
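The comparison can also be checked mechanically: in Sanskrit each noun form identifies one grammatical number, whereas the English noun phrase is the same for dual and plural and relies on the preposition to tell them apart. The Python sketch below uses the transliterated forms from the comparison; the dictionary and function names are hypothetical.

```python
# Noun forms per grammatical number, taken from the comparison above.
# For English only the noun phrase is listed, without the preposition.
SANSKRIT = {"singular": "baalkey", "dual": "baalkayo:", "plural": "baalkeshu"}
ENGLISH = {"singular": "the boy", "dual": "the boys", "plural": "the boys"}


def forms_to_numbers(paradigm):
    """Map each surface form to the set of numbers it can express."""
    by_form = {}
    for number, form in paradigm.items():
        by_form.setdefault(form, set()).add(number)
    return by_form


def ambiguous_forms(paradigm):
    """Return the forms that stand for more than one grammatical number."""
    return sorted(
        form
        for form, numbers in forms_to_numbers(paradigm).items()
        if len(numbers) > 1
    )


if __name__ == "__main__":
    print("Sanskrit:", ambiguous_forms(SANSKRIT))
    print("English:", ambiguous_forms(ENGLISH))
```

In this toy paradigm no Sanskrit form is ambiguous, while the English form “the boys” covers both the dual and the plural.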

Sanskrit differs from other languages in that it has a one-to-one correspondence between words and the entities they stand for. In English, the word “tree” simply denotes a tree, without reflecting its features, attributes or properties. In Sanskrit, a word for tree also conveys features of the tree rather than merely denoting the tree itself. Likewise, many other Sanskrit words that describe attributes of a tree also signify a tree.
In 1785, Sir William Jones, a judge, started studying Sanskrit in order to systematize the native law of India and, surprisingly, became a linguistic scholar. He found Sanskrit a marvelous language, since he could guess the meaning of some Sanskrit words from his knowledge of Latin and Greek. After four months of study he wrote and delivered a paper in which he said (Jones, 1786):

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three without believing them to have sprung from some common source, which, perhaps, no longer exists.”

Jones became a Sanskrit aficionado and, through his writings, communicated that passion to the scholarly world of Europe of his time.

In conclusion, we can say that Sanskrit, being a well-structured natural language, is the most suitable language for soft computing areas in Artificial Intelligence and for natural language processors. It can be used as a high-level language to write programs and to give instructions to advanced robots, which are likely to understand Sanskrit better. Testing and implementing Sanskrit language systems also requires powerful and robust computer architectures. In-depth knowledge and learning of Sanskrit should be encouraged among AI and NLP programmers. All of this requires large amounts of money, research and manpower.

References:
R. Jha, A. Jha, D. Jha and S. Jha, "Is Sanskrit the most suitable language for natural language processing?," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, 2016, pp. 211-216.
