We all use online platforms like Facebook, Twitter or YouTube to share our views on a post, a video or a person, whether by commenting on a post or by sharing information on someone's wall or page. Many of you may have noticed that some people use abusive language in comments, which can push the target towards depression. There have been instances in the past that led to severe consequences. In one case, Robin Williams's daughter, Zelda, was bullied on Twitter and Instagram after posting a memoriam for her late father, which caused her to delete all her online accounts. Another significant incident came in 2013, when Facebook was found to be hosting pages that were hateful towards women.
A great deal of research has been done in this field using different technologies such as NLP, machine learning, artificial intelligence and web science.
What makes it a difficult task?
Several factors make abusive language difficult to analyse:
- Simple keyword matching cannot reliably spot abusive language. Obfuscations such as "ni9 9er" make detection difficult.
- An insult that is offensive to one group may be acceptable to another.
- An offensive sentence may be grammatically correct and fluent, which makes it easy to process automatically, while noisy, ungrammatical sentences are much harder to handle.
- Consider the text: "Chuck Hagel will shield Americans from the desert animals bickering. Let them kill each other, good riddance!" Here the second sentence carries most of the bitterness, so deciding whether the text is abusive may require looking at more than one sentence at a time.
- Apart from all of this, the most problematic task is separating sarcasm from abusive language, since a sarcastic comment can use the very same tone as an offensive one.
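The obfuscation problem from the first point above can be partly addressed by normalising common character substitutions before keyword lookup. The substitution table and the example words below are illustrative assumptions, not a complete solution; a real system would need a much larger table plus fuzzy matching against a lexicon:

```python
import re

# Hypothetical leetspeak map (illustrative, far from exhaustive).
SUBSTITUTIONS = {"0": "o", "1": "i", "3": "e", "4": "a",
                 "5": "s", "9": "g", "@": "a", "$": "s"}

def normalize(token: str) -> str:
    """Undo common character substitutions and collapse separators
    so obfuscated words can match an abusive-keyword list."""
    token = "".join(SUBSTITUTIONS.get(ch, ch) for ch in token.lower())
    # Drop spaces and punctuation inserted to break up a word, e.g. "i d i o t".
    return re.sub(r"[\s\.\*\-_]+", "", token)

print(normalize("i d 1 0 t"))  # -> "idiot"
print(normalize("5tup1d"))     # -> "stupid"
```

Even this simple normalisation catches many deliberate misspellings, though determined users will always find variants the table misses.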
Different approaches to detect abusive language:
Some of the techniques that can be used:
- A fundamental approach is to match sentences against a list of offensive or abusive keywords, or to use bag-of-words features. The problem with these is that the number of false positives will be high.
- The accuracy of the previous technique can be improved by training a model on n-gram features instead of isolated words.
- Different features, such as lexical, syntactic or semantic features, can be extracted from the sentences, and word-embedding models can be used to classify posts.
- Classifiers such as Naive Bayes or Support Vector Machines (SVMs) can be trained on pre-labelled posts or tweets to detect abusive sentences.
- Multi-dimensional analysis (MDA) can also be used to analyse posts and sentences. The drawback is that MDA performs better on longer texts, so multiple tweets or posts can be concatenated to form a larger document.
- Another approach is to use a convolutional neural network (CNN), either a character-level network, a word-level network, or a combination of both.
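As a minimal sketch of the bag-of-words plus classifier idea from the list above, here is a hand-rolled multinomial Naive Bayes trained on a toy, made-up dataset. The training examples and labels are purely illustrative; real systems train on large pre-labelled corpora of tweets or comments:

```python
import math
from collections import Counter, defaultdict

# Toy labelled data (invented for illustration only).
train = [
    ("you are a stupid idiot", "abusive"),
    ("shut up you worthless fool", "abusive"),
    ("what a lovely photo", "clean"),
    ("great post thanks for sharing", "clean"),
]

def tokenize(text):
    return text.lower().split()

# Bag-of-words counts per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(tokenize(text))

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("you stupid fool"))             # -> "abusive"
print(predict("thanks for the lovely post"))  # -> "clean"
```

The same skeleton extends naturally to the n-gram variant: replace `tokenize` with a function that emits word bigrams or character n-grams, which reduces the false positives that plain keyword matching produces.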
To perform abusive language detection with better performance and accuracy, one can combine the techniques discussed above, since each method has its own strengths and weaknesses for classifying tweets or posts.
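One simple way to combine techniques is a majority-vote ensemble. The three component detectors below are deliberately crude stand-ins (a tiny keyword list, an all-caps heuristic, and a regex insult pattern, all invented for illustration); in practice each vote would come from one of the stronger methods described above, such as an n-gram model or an SVM:

```python
import re

def keyword_vote(text):
    # Component 1: abusive-keyword lookup (tiny illustrative list).
    return any(w in text.lower().split() for w in {"idiot", "stupid", "fool"})

def shouting_vote(text):
    # Component 2: weak stylistic signal -- all-caps "shouting".
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def insult_pattern_vote(text):
    # Component 3: crude second-person insult pattern, e.g. "you ... moron".
    return re.search(r"\byou\b.*\b(loser|moron|trash)\b", text.lower()) is not None

def ensemble(text):
    """Flag text as abusive when a majority of detectors agree."""
    votes = [keyword_vote(text), shouting_vote(text), insult_pattern_vote(text)]
    return sum(votes) >= 2

print(ensemble("YOU ARE A MORON"))  # -> True
print(ensemble("great post"))       # -> False
```

Requiring agreement between detectors trades some recall for precision, which helps with the false-positive problem that any single keyword-based method suffers from.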
References:
- I. Clarke, J. Grieve. 2017. Dimensions of Abusive Language on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pages 1–10, Vancouver, Canada, July 30 - August 4, 2017.
- G. Xiang, B. Fan, L. Wang, J. Hong, and C. Rose. 2012. Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, ACM, New York, NY, USA, pages 1980–1984.
- Y. Chen, Y. Zhou, S. Zhu, and H. Xu. 2012. Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the 2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust, SOCIALCOM-PASSAT '12, IEEE Computer Society, Washington, DC, USA, pages 71–80.
- C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. 2016. Abusive Language Detection in Online User Content. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 145–153.
- J. Park, P. Fung. 2017. One-step and Two-step Classification for Abusive Language Detection on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pages 41–45, Vancouver, Canada, July 30 - August 4, 2017.