
Is Sanskrit the most suitable language for natural language processing?

“The structure of Paninian Grammar is nothing but a computer program” is a remark attributed to Charles Babbage. Panini's grammar is said to capture the universal principles underlying all languages, and computational linguistics requires exactly such formal rules for the analysis and generation of language. Chomsky and others have gradually turned towards Panini, and over the past 60–70 years much time, effort and money has been spent on designing computers that can cope with the phrases, expressions, idioms, oratory, rhetoric, ambiguity, vagueness, idiosyncrasy and peculiarity of human language.
Among the accomplishments of the grammarians can be reckoned a method for paraphrasing Sanskrit in a manner that is identical not only in essence but in form with current work in Artificial Intelligence. To examine whether these problems can be solved, this research article argues that a natural language can serve as an artificial language, drawing on NLP, semantic nets, vibhakti (case endings), the dual number, inflection-based syntax, Shastric Sanskrit and equivalence.
NLP concerns the interaction between human beings and computers. Its main challenge is natural language understanding (NLU), spoken as well as written: enabling computers to extract the exact meaning from natural language input, which in turn supports natural language generation. From the NLU point of view, any language used for human communication has three basic components:
1. Syntax
Syntax describes the form of a natural language, typically as specified by its grammar. Only a language with a sound grammatical base is well suited to computer programming, NLP or robot programming.
2. Semantics
Semantics denotes the meaning of the words and sentences of the language. Although general semantic theories exist, when we build a natural language understanding system for a particular application, we try to use the simplest representation we can.
3. Pragmatics
The pragmatic component explains how expressions relate to the world. To understand language, an agent should consider more than the sentence; it has to take into account the context of the sentence, the state of the world, the goals of the speaker and the listener, special conventions, and the like.

If binary machine code is the foundation of all computer activity, then a well-controlled, structured and unambiguous method such as the one followed in Sanskrit is essential to meet the objectives described above. Many languages other than Sanskrit have a well-defined grammar, yet the same word or sentence can still carry different meanings in different contexts. For example, the English phrase “I like apple” can refer to a brand of computer or to a kind of fruit.
In another example, “Do you see the man with the glasses?” has several readings under linguistic analysis. “Glasses” may mean eyeglasses (“a device to compensate for defective vision or to protect the eyes from light and dust”), things made of glass (“a hard, brittle substance, typically transparent or translucent, made by fusing sand with soda and lime and cooling rapidly which is used to make windows, drinking containers and many other things”), or drinking containers made of glass, as in a glass of milk.
Both sentences above are grammatically correct; their correct meaning depends on the context in which they occur. NLP programmers must handle this kind of vagueness and ambiguity properly.
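The ambiguity described above can be made concrete with a minimal sketch. The sense inventory `SENSES` and the function `readings` are hypothetical and written only for this illustration; a real system would draw on a proper lexicon and on context to choose among the readings.

```python
from itertools import product

# Hypothetical, hand-written sense inventory (an assumption for illustration).
SENSES = {
    "apple": ["fruit", "computer brand"],
    "glasses": ["eyeglasses", "objects made of glass", "drinking glasses"],
}

def readings(sentence):
    """Return every combination of word senses for the sentence."""
    words = sentence.lower().strip("?.!").split()
    # Words missing from the inventory are treated as having a single sense.
    options = [SENSES.get(word, [word]) for word in words]
    return [list(combo) for combo in product(*options)]

for sentence in ("I like apple", "Do you see the man with the glasses?"):
    print(sentence, "->", len(readings(sentence)), "possible readings")
```

Both sentences are grammatical, yet each yields more than one reading, so grammar alone cannot pick the intended one.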

Word structure in Sanskrit follows the format <prefix> <dhatu> <suffix>, where the dhatu is the verbal root. In English, changing the meaning of a sentence usually requires introducing new words. In Sanskrit, most sentences only require adding a prefix or suffix to a word. For example, “gachaami” means “I go” and “aagachaami” means “I come”. Words like “is”, “an” and “the” do not need a separate word in Sanskrit; their role is carried by a prefix or suffix added to a word, and inflection still lets the words convey the exact meaning. For example, the sentence “This is an elephant” is “Eshaha Gajaha” in Sanskrit. A sentence of four words in English is expressed with only two words in Sanskrit, which reduces storage space. All these features make Sanskrit unambiguous.
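As a rough sketch of the <prefix> <dhatu> <suffix> pattern, the following joins and splits the three slots using the article's own romanization. The segmentation into “aa”, “gach” and “aami”, the one-entry prefix list, and the function names are simplifying assumptions; real Sanskrit morphology involves stem formation and sandhi that this toy ignores.

```python
# Toy prefix list (assumption); the real inventory of prefixes is larger.
PREFIXES = ("aa",)

def compose(prefix, stem, suffix):
    """Join the three slots of the <prefix> <dhatu> <suffix> pattern."""
    return prefix + stem + suffix

def split_prefix(word, stem):
    """Split off a known prefix when the remainder starts with the given stem."""
    for prefix in PREFIXES:
        if word.startswith(prefix + stem):
            return prefix, word[len(prefix):]
    return "", word

print(compose("", "gach", "aami"))         # "I go"
print(compose("aa", "gach", "aami"))       # "I come"
print(split_prefix("aagachaami", "gach"))  # prefix and remaining word
```

Adding the prefix changes the meaning of the verb without introducing a new word, which is the point the paragraph above makes.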

This language does not rely on concepts like mapping or diagrammatic representation; it follows only the grammar rules laid down by Panini. These rules keep it free of ambiguity and make it a treasure for the field of Artificial Intelligence. Sanskrit is so highly inflected that word order barely matters. Prose Sanskrit preferred Subject-Object-Verb (SOV) order, while poetry and similar forms frequently used other orders for effect.
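The claim that inflection makes word order nearly irrelevant can be illustrated with a small sketch. The ending-to-role rules in `ENDING_ROLES` are a toy assumption that covers only the single example sentence “raamah graamam gachati” (“Rama goes to the village”); they are not a general account of Sanskrit case endings.

```python
from itertools import permutations

# Toy ending-to-role rules (assumption): valid only for this example sentence.
ENDING_ROLES = [("ati", "verb"), ("ah", "subject"), ("am", "object")]

def assign_roles(words):
    """Map each grammatical role to the word whose ending signals it."""
    roles = {}
    for word in words:
        for ending, role in ENDING_ROLES:
            if word.endswith(ending):
                roles[role] = word
                break
    return roles

sentence = ["raamah", "graamam", "gachati"]  # "Rama goes to the village"
results = [assign_roles(order) for order in permutations(sentence)]
print(results[0])
print(all(result == results[0] for result in results))
```

Because the roles are read from the endings rather than from position, all six orderings of the three words produce the same role assignment.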

This Sanskrit shloka says: “A stupid person is like a two-legged animal in front of the eyes, so he must be avoided”. The sentence is like “Gagar mein Sagar” (an ocean in a pitcher): very economical in its choice and use of words. Only a few languages of the world can match this economy. Sanskrit has the distinct property of maintaining a one-to-one relation between the words or objects represented and their associated properties.

Compared with other languages, Sanskrit is highly inflected, with three grammatical genders (masculine, feminine and neuter) and three numbers (singular, dual and plural). The dual, used alongside the singular and plural, is a distinctive feature. The comparison below sets the Sanskrit forms against their English equivalents:
Singular: Sanskrit “baalake”, English “in the boy”
Dual: Sanskrit “baalakayoh”, English “between the boys”
Plural: Sanskrit “baalakeshu”, English “among the boys”
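The three-way number distinction can be sketched as a lookup of endings. This assumes a simplified ASCII romanization (bālake, bālakayoḥ, bālakeṣu written as “baalake”, “baalakayoh”, “baalakeshu”) and covers only the locative of masculine a-stem nouns; the names `LOCATIVE_ENDINGS` and `locative` are hypothetical.

```python
# Locative endings for masculine a-stem nouns in ASCII romanization (toy table).
LOCATIVE_ENDINGS = {"singular": "e", "dual": "ayoh", "plural": "eshu"}

def locative(stem, number):
    """Attach the locative ending for the given grammatical number to the stem."""
    return stem + LOCATIVE_ENDINGS[number]

for number in ("singular", "dual", "plural"):
    print(number, locative("baalak", number))
```

A single suffix carries both the case and the number, where English needs a separate preposition and a plural marker.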

Sanskrit differs from other languages in that it has a one-to-one correspondence between words and the entities they stand for. In English, the word “tree” simply denotes a tree without reflecting its features, attributes or properties. In Sanskrit, the word for tree also reflects the tree's features rather than denoting only the tree itself. Likewise, many other Sanskrit words that describe the attributes of a tree also signify the tree.
In 1785, Sir William Jones, a judge, started studying Sanskrit in order to systematize the native law of India and, surprisingly, became a linguistic scholar. He found Sanskrit to be a marvelous language, as he could guess the meaning of some Sanskrit words from his knowledge of Latin and Greek. After four months of study he wrote and delivered a paper in which he said (Jones, 1786):

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three without believing them to have sprung from some common source, which, perhaps, no longer exists.”

Jones became a Sanskrit aficionado and communicated that passion to the scholarly world of Europe of his time through his writings.

In conclusion, Sanskrit, being a well-programmed natural language, is the most suitable language for soft computing areas in Artificial Intelligence and for natural language processors. It can be used as a high-level language to write programs and to give instructions to advanced robots, which are more likely to understand Sanskrit better. Testing and implementing Sanskrit also requires powerful and robust computer architectures. In-depth knowledge and learning of Sanskrit must be encouraged among AI and NLP programmers. This effort also requires large amounts of money, research and manpower.

References:
R. Jha, A. Jha, D. Jha and S. Jha, "Is Sanskrit the most suitable language for natural language processing?," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, 2016, pp. 211-216.
