This R code extracts the text from a WhatsApp chat and represents it as a word cloud. The idea is to mine all the words used in the group and find the most frequent ones.
The process involves the following steps:
- Extract the data in the required form.
- Clean the data by removing emoticons, punctuation, numbers, and stopwords.
- Stem the data.
- Create a vector corpus of the data.
- Build a document-term matrix from the corpus.
- Create a word cloud of the extracted text based on how frequently each word occurs.
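
As a rough illustration of the extraction and cleaning steps, here is a minimal base R sketch. It assumes one common WhatsApp export layout (`<date>, <time> - <sender>: <message>`), which varies by phone and locale. The sample lines, the `prefix_pattern` regex, and the `clean_message` helper are all hypothetical and not taken from the original code. Stopword removal is left out because base R ships no stopword list.

```r
# Hypothetical sample lines; a real export would be read with readLines().
# The emoji is written as an escape so the script does not depend on file encoding.
chat_lines <- c(
  "12/31/20, 10:15 PM - Alice: Happy new year everyone!!! \U0001F389",
  "12/31/20, 10:16 PM - Bob: Happy new year, Alice. See you at 9?",
  "12/31/20, 10:17 PM - Alice: <Media omitted>"
)

# Assumed prefix layout: "date, time - sender: ". Adjust for your export.
prefix_pattern <- "^[0-9/]+, [0-9:]+ ?(AM|PM)? - [^:]+: "

# Keep only lines that start a message, then drop the prefix.
is_message <- grepl(prefix_pattern, chat_lines)
messages <- sub(prefix_pattern, "", chat_lines[is_message])

# Drop placeholders that WhatsApp writes for attachments.
messages <- messages[messages != "<Media omitted>"]

# Hypothetical helper: strip emoji, punctuation, digits, and extra spaces.
clean_message <- function(x) {
  # Dropping every non-ASCII character removes emoji, but also accented letters.
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  x <- gsub("[[:digit:]]+", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

cleaned <- clean_message(messages)
print(cleaned)
```

Messages that span several lines in the export would not match the prefix pattern and are dropped by this sketch; a fuller version would append them to the preceding message.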
The R libraries used are dplyr, tm, wordcloud, and RColorBrewer.
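
For orientation, the corpus, document-term matrix, and word cloud steps might look roughly like the sketch below. It is not the original code: the `messages` vector is made-up sample data, the colour palette and `min.freq` value are arbitrary choices, and `stemDocument` additionally needs the SnowballC package installed, which is an assumption beyond the libraries listed above.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Hypothetical messages standing in for the text extracted from the chat.
messages <- c(
  "Happy new year everyone",
  "Happy new year Alice, see you soon",
  "See you at the party"
)

# Build a corpus from the character vector, one document per message.
corpus <- VCorpus(VectorSource(messages))

# Cleaning steps provided by tm.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Stemming reduces words to a common root; requires SnowballC (assumed installed).
corpus <- tm_map(corpus, stemDocument)

# Rows are documents, columns are terms, cells are counts.
dtm <- DocumentTermMatrix(corpus)

# Total count of each term across all messages, most frequent first.
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

# Fix the seed so the layout is reproducible between runs.
set.seed(42)
wordcloud(
  words = names(freq),
  freq = freq,
  min.freq = 1,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)
```

Converting the matrix with `as.matrix` is fine for a single group chat, but for very large exports a sparse-aware summary would use less memory.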