Data Science on Social Media: How to Make More Sense of It Using NLP

Utkarsh Singh
3 min read · Apr 28, 2021

Learn about Data Science through the Data Science community. First up: Twitter.

But why read about Data Science on Twitter? Because:

  1. Expands horizons: Twitter is a platform where engagement is lucid yet brief (280 characters).
  2. Quicker updates and a more upbeat tone: discussion forums usually have limited participation. Twitter is a more collaborative, open and relevant medium for staying updated on DS.

So, what to do? Search endlessly for Twitter profiles and tweets? Heck no! Use a systematic approach: scraping and NLP (natural language processing).

This article serves two purposes:

  1. It’s a beginner’s DS project in R to scrape Twitter using the Twitter API and the rtweet package (assuming you have developer access).
  2. Make sense of the data and find an engaging DS Twitter feed, without the manual hard work, using NLP.

A simple, basic search for ‘#DataScience’ was run and around 15k tweets were scraped. Due to restrictions on developer access, only tweets from the past 7 days could be extracted. Let’s jump to the results:

How often do people tweet about Data Science using the hashtag?

So the engagement is decent: frequent but not overwhelming.
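
The counting behind a tweets-per-day chart can be sketched in a few lines. The article's own code is in R and is not shown here, so this is only an illustrative Python sketch; it assumes each scraped tweet carries an ISO-style timestamp string, and the sample values are made up.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count tweets per calendar day from ISO 8601 timestamp strings."""
    days = (datetime.fromisoformat(ts).date().isoformat() for ts in timestamps)
    # Sort by day so the result can be plotted directly as a time series.
    return dict(sorted(Counter(days).items()))


# Made-up timestamps, standing in for the creation time of scraped tweets.
sample_timestamps = [
    "2021-04-21 09:15:00",
    "2021-04-21 18:40:00",
    "2021-04-22 07:05:00",
]
print(tweets_per_day(sample_timestamps))
```

Grouping by hour instead of by day would only need a different truncation of the timestamp.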

What is the DS community tweeting about?

Machine learning, 100 days of code, big data, IoT, deep learning, JavaScript and artificial intelligence are the most commonly used terms.
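
A minimal sketch of how such a "most common terms" list might be produced: clean each tweet, split it into tokens, drop filler words and count. This is not the article's R code; the stop-word list is a tiny hand-picked assumption and the sample tweets are invented.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to", "for", "with", "my"}


def top_terms(tweets, k=5):
    """Return the k most frequent tokens across tweets, ignoring noise."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        text = re.sub(r"@\w+", " ", text)                   # drop mentions
        tokens = re.findall(r"[a-z0-9]+", text)             # '#' is dropped, the tag word stays
        counts.update(
            t for t in tokens if t not in STOP_WORDS and not t.isdigit()
        )
    return counts.most_common(k)


# Invented tweets for illustration only.
sample_tweets = [
    "Learning #MachineLearning with Python https://t.co/abc",
    "Day 12 of #100DaysOfCode: machinelearning basics",
    "@someone #BigData and #MachineLearning for IoT",
]
print(top_terms(sample_tweets, k=3))
```

Because hashtags such as #MachineLearning are written without spaces, they count as single tokens here, which is one reason tag phrases tend to dominate this kind of list.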

The NRC lexicon gives the sentiment spread across 8 emotions, as shown in the figure:

OK, so there is a sentiment of fear around artificial intelligence, robots, NLP (natural language processing) and Python :P. Thankfully the source code used here is in R, hence no need to fear it. Let’s look at the frequency of each of these emotions.

A disclaimer here: not every word-to-emotion mapping in the NRC lexicon makes sense in this context, but it is interesting food for thought as to why the lexicon depicts what it does.
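
To see why a lexicon flags a given emotion, it helps to keep track of which words contributed to it. The sketch below shows the lookup idea in Python, not the article's R code. The `TOY_LEXICON` entries are hypothetical stand-ins written for this example and are not taken from the real NRC lexicon.

```python
from collections import Counter, defaultdict

# Hypothetical word -> emotions mapping, for illustration only.
# These are NOT actual NRC lexicon entries.
TOY_LEXICON = {
    "robot": ("fear",),
    "python": ("fear", "disgust"),
    "progress": ("anticipation", "joy"),
}


def emotion_words(tokens, lexicon):
    """Group matched tokens by emotion, counting each contributing word."""
    by_emotion = defaultdict(Counter)
    for token in tokens:
        # Words missing from the lexicon contribute to no emotion.
        for emotion in lexicon.get(token, ()):
            by_emotion[emotion][token] += 1
    return by_emotion


tokens = ["robot", "learning", "robot", "python", "progress"]
result = emotion_words(tokens, TOY_LEXICON)
for emotion, words in sorted(result.items()):
    print(emotion, sum(words.values()), dict(words))
```

A general-purpose lexicon knows nothing about the domain, so a word like "python" is scored as the animal rather than the language. Listing the contributing words per emotion makes such mismatches easy to spot.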

Which are the most visible, most liked and most frequently tweeting handles on Data Science? Here is a word cloud of their Twitter handles. You can follow them to get the most interactive DS feed on Twitter.
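
A word cloud of handles is essentially a ranking by frequency. As a rough Python sketch (again, not the article's R code), handles can be ranked by tweet count, with total likes as a tie-breaker. The field names `screen_name` and `favorite_count` and the handles themselves are assumptions made for this example.

```python
from collections import defaultdict


def rank_handles(tweets, k=3):
    """Rank handles by number of tweets, then by total likes."""
    stats = defaultdict(lambda: {"tweets": 0, "likes": 0})
    for tweet in tweets:
        entry = stats[tweet["screen_name"]]
        entry["tweets"] += 1
        entry["likes"] += tweet["favorite_count"]
    ranked = sorted(
        stats.items(),
        key=lambda item: (item[1]["tweets"], item[1]["likes"]),
        reverse=True,
    )
    return ranked[:k]


# Invented records; the handles are hypothetical.
sample_tweets = [
    {"screen_name": "ds_daily", "favorite_count": 10},
    {"screen_name": "ml_tips", "favorite_count": 50},
    {"screen_name": "ds_daily", "favorite_count": 5},
    {"screen_name": "r_stats_fan", "favorite_count": 2},
    {"screen_name": "ml_tips", "favorite_count": 1},
    {"screen_name": "ds_daily", "favorite_count": 0},
]
for handle, entry in rank_handles(sample_tweets):
    print(handle, entry)
```

Swapping the order of the sort key would favour well-liked accounts over prolific ones, which can surface a rather different set of handles.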

How about a short overall context for all the tweets on Data Science, in the form of important words used together:

Bonus visualization: trigrams, i.e. the most frequently used combinations of three words.
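
Both "words used together" and trigrams come down to counting n-grams, i.e. runs of n consecutive tokens. The following is a minimal Python sketch of that idea rather than the article's R code. It assumes the tweets have already been cleaned and tokenized, and the sample tokens are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tokenized_tweets, n, k=3):
    """Count n-grams tweet by tweet so they never span two tweets."""
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Invented, already-tokenized tweets.
sample = [
    ["machine", "learning", "and", "deep", "learning"],
    ["deep", "learning", "with", "python"],
    ["machine", "learning", "with", "python"],
]
print(top_ngrams(sample, n=2))  # word pairs
print(top_ngrams(sample, n=3))  # trigrams
```

Counting within each tweet separately avoids spurious n-grams formed from the last word of one tweet and the first word of the next.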

Let me know in the comments section if the source code interests you.

Here is a Drive link to the cleaned data file scraped using the code:

https://drive.google.com/file/d/1ZFJMTwGQIfVUI1t5JuncTyezKrPyDO5a/view?usp=sharing
