Using the WhatsApp Chat Cleaner R Package
In this article, we will use the WhatsApp Chat Cleaner package explained previously. You will learn how to apply the package's functions to real chat data.
Before we move on, you need to download all the emoji data sets from GitHub.
Let's start. First, we need to install devtools so that we can install the WhatsApp package from GitHub.
install.packages("devtools")
library("devtools")
#downloading WhatsApp Package
install_github("MJFND/R/WhatsappChatCleanerPackage/WhatsAppChatCleaner")
library("WhatsAppChatCleaner")
The most basic function in the library installs and loads all the packages required by the WhatsApp package.
install_load_packages()
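For readers curious about what an "install and load" helper typically does, here is a minimal sketch of the common pattern in base R. It is not the package's implementation; ensure_packages is a hypothetical name, and the two packages passed in are placeholders that ship with R.

```r
# Hypothetical helper, not part of the WhatsApp package.
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # Attach every requested package to the search path
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  # Report which packages had to be installed
  invisible(to_install)
}

# Placeholder package names; both ship with R, so nothing gets installed
newly_installed <- ensure_packages(c("stats", "utils"))
```

The helper only calls install.packages for packages that are missing, so running it repeatedly is cheap.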
Now, let's clean the WhatsApp chat data.
Note: To use this function, the date and time in your chat data must be in m/d/y, H:M format. Check your chat file first, because your format could be different based on your timezone.
clean_data <- clean_whatsapp_chat("whatsapp_friends.txt")
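Before cleaning a large file, it can help to check whether its timestamps really follow the m/d/y, H:M format. The sketch below uses only base R. The sample lines and the assumption that the timestamp is everything before " - " are illustrative, so adjust them to match your own export.

```r
# Hypothetical sample lines; a real export may be laid out differently.
sample_lines <- c(
  "12/31/17, 23:59 - Alice: Happy new year",
  "31/12/17, 23:59 - Bob: Same to you"
)

# Assume the timestamp is the text before the first " - "
stamps <- sub(" - .*$", "", sample_lines)

# Parsing fails (NA) when a stamp does not fit m/d/y, H:M
parsed <- as.POSIXct(stamps, format = "%m/%d/%y, %H:%M", tz = "UTC")
bad_format <- is.na(parsed)
print(bad_format)
```

Note that an ambiguous date such as 01/02/17 parses under both m/d/y and d/m/y, so it is worth checking a line where the day is greater than 12.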
We can now remove the media records. This is optional, but it will save processing time later.
clean_media_nomedia <- media_remove(clean_data)
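To give an idea of what removing media records means, here is a small base R sketch. It is not how media_remove is implemented. The column names sender and message are hypothetical, and it assumes the export marks media with the placeholder text <Media omitted>, which may differ depending on the app language.

```r
# Hypothetical cleaned chat; the column names are assumptions.
chat <- data.frame(
  sender  = c("Alice", "Bob", "Alice"),
  message = c("Hi", "<Media omitted>", "Nice photo"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose message is not the media placeholder
chat_no_media <- chat[chat$message != "<Media omitted>", , drop = FALSE]
print(nrow(chat_no_media))
```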
At this point, you can start analyzing the chat, but in order to work with emojis you need the remaining functions.
To load the emojis from their data sets, we will use the emoji_loader function.
whatsapp_emoji_all <- emoji_loader("whatsapp_emoji_all.csv")
whatsapp_emoji_custom <- emoji_loader("whatsapp_emoji_custom.csv")
whatsapp_emoji_human <- emoji_loader("whatsapp_emoji_human.csv")
WhatsApp's human (body/hand/face) emojis come in six different color tones. If we want to treat all tones of an emoji as the same emoji, we can run the following function:
whatsapp_emoji_human <- emoji_human_color_ignore(whatsapp_emoji_human)
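The idea of ignoring color tones can be illustrated on raw text. In Unicode, a toned emoji is the base emoji followed by one of five skin tone modifiers (U+1F3FB to U+1F3FF), so dropping those code points leaves the default version. The sketch below is our own illustration in base R, with strip_tone as a hypothetical helper; it is not how emoji_human_color_ignore works internally.

```r
# Hypothetical helper: drop skin tone modifiers from each string.
strip_tone <- function(x) {
  vapply(x, function(s) {
    code_points <- utf8ToInt(s)
    # U+1F3FB to U+1F3FF are the five skin tone modifiers
    keep <- !(code_points %in% 0x1F3FB:0x1F3FF)
    intToUtf8(code_points[keep])
  }, character(1), USE.NAMES = FALSE)
}

# Thumbs up with a medium tone, and a plain thumbs up
toned <- c("\U0001F44D\U0001F3FD", "\U0001F44D")
untoned <- strip_tone(toned)
print(untoned[1] == untoned[2])
```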
Next, we need to replace all emojis with their names. R converts emojis into symbols that are hard to interpret, so it is better to replace each emoji with its name. We have to run the replacement once for each data set, passing in the updated data frame each time.
clean_data_without_media_emoji_replaced <-
emoji_replacer(whatsapp_emoji_all, clean_media_nomedia)
#Now passing the updated clean data so we can do the work on the remaining emojis
clean_data_without_media_emoji_replaced <-
emoji_replacer(whatsapp_emoji_custom, clean_data_without_media_emoji_replaced)
clean_data_without_media_emoji_replaced <-
emoji_replacer(whatsapp_emoji_human, clean_data_without_media_emoji_replaced)
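To make the replacement step concrete, here is a minimal base R sketch of swapping emojis for names with a lookup table. The lookup table, its column names and the emoji names are made up for the example, and the package's emoji_replacer may work differently.

```r
# Hypothetical lookup table; the real data sets may use other columns.
lookup <- data.frame(
  emoji = c("\U0001F602", "\u2764"),
  name  = c("face_with_tears_of_joy", "red_heart"),
  stringsAsFactors = FALSE
)

messages <- c("so funny \U0001F602\U0001F602", "love it \u2764")

# Replace each emoji with its name, padded so names do not run together
for (i in seq_len(nrow(lookup))) {
  messages <- gsub(lookup$emoji[i], paste0(" ", lookup$name[i], " "),
                   messages, fixed = TRUE)
}

# Collapse the extra spaces introduced by the padding
messages <- trimws(gsub(" +", " ", messages))
print(messages)
```

Because each pass works on the output of the previous one, the result of one replacement has to be fed into the next, just as the updated data frame is passed along in the calls above.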
Now comes an important part: we need to count the occurrences of each emoji in each message and store the counts in their respective columns.
clean_data_without_media_emoji_replaced <- make_emoji_count_column(whatsapp_emoji_all, clean_data_without_media_emoji_replaced)
clean_data_without_media_emoji_replaced <- make_emoji_count_column(whatsapp_emoji_human, clean_data_without_media_emoji_replaced)
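As a rough illustration of per-message counting, the sketch below adds one count column per emoji name using base R. The data, the column names and the helper count_name are hypothetical, and this is not the implementation of make_emoji_count_column.

```r
# Hypothetical chat in which emojis were already replaced by names.
chat <- data.frame(
  message = c("lol face_with_tears_of_joy face_with_tears_of_joy", "ok", "red_heart"),
  stringsAsFactors = FALSE
)
emoji_names <- c("face_with_tears_of_joy", "red_heart")

# Count literal occurrences of one name in every message
count_name <- function(name, text) {
  lengths(regmatches(text, gregexpr(name, text, fixed = TRUE)))
}

# One new column per emoji name
for (name in emoji_names) {
  chat[[name]] <- count_name(name, chat$message)
}
print(chat[, emoji_names])
```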
The last step is to remove all emojis that were not used. This reduces memory usage, which ultimately increases speed.
clean_data_without_media_emoji_replaced <- remove_unused_emoji(clean_data_without_media_emoji_replaced)
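Dropping unused emojis can be thought of as removing every count column that contains only zeros. Here is a small base R sketch of that idea with made-up data and column names; remove_unused_emoji itself may be implemented differently.

```r
# Hypothetical chat with one count column per emoji name.
chat <- data.frame(
  message   = c("lol face_with_tears_of_joy", "ok"),
  face_with_tears_of_joy = c(1L, 0L),
  red_heart = c(0L, 0L),
  stringsAsFactors = FALSE
)

# A column is unused when it is numeric and all of its counts are zero
is_unused <- vapply(chat, function(col) is.numeric(col) && sum(col) == 0,
                    logical(1))

chat <- chat[, !is_unused, drop = FALSE]
print(names(chat))
```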
If you need help with any function, just type ?function_name in the console; we have tried to document the package properly. If you want to learn more about the package, read the detailed documentation.
Download the full code from here.
In the next article, we will see how to analyze emojis. Later, we will try some sentiment analysis; at the moment I am unable to perform it due to a memory issue.