This is a simple program for extracting data from Twitter using R. Once the data is extracted, many kinds of analysis can be performed. Here I’ll quickly demonstrate getting the data in place with a few powerful R libraries, and then do some basic exploration.
Register & Create a Twitter Application
To get access to Twitter’s database, we first need to create an application on https://apps.twitter.com/
Once done registering and creating the application, note the values of the API key, API secret, access token and access token secret.
Insert these values in the R environment as follows:
api_key <- " "
api_secret <- " "
access_token <- " "
access_token_secret <- " "
Initialize the environment
We will be using two libraries – twitteR & httr.
twitteR will help us extract and manipulate the tweets from Twitter, while httr handles the OAuth configuration. OAuth is explained briefly below.
Also, knitr is used for creating some decent quality tables suited to HTML.
library(twitteR)
library(httr)
library(knitr)
Setup Twitter OAuth
We now connect to OAuth with the setup_twitter_oauth function from the twitteR package. The httr library is used to authenticate our connection with the Twitter API. OAuth is an authentication protocol used by Twitter so that external applications can access its database. More info on OAuth can be found here – https://oauth.net/
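A minimal sketch of the call, using the credential variables defined above:

# Authenticate against the Twitter API with the keys and tokens from earlier
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)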
##  "Using direct authentication"
We are now ready to fetch some tweets. We will collect the last 200 tweets with the hashtag ‘#DataScience’.
searchTwitter is a function from the twitteR library, and it makes it extremely simple to fetch data. There are many other parameters which can be added, including dates, location, geocode, language and so on; a sketch follows the basic call below.
tweets = searchTwitter('#DataScience', n=200, lang="en")
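For illustration, here is a sketch of a more constrained query; the date and geocode values below are hypothetical, purely to show the parameters:

# Hypothetical sketch: tweets since a date, within 50 miles of New York City
tweets_nyc = searchTwitter('#DataScience', n=200, lang="en",
                           since="2017-01-01",
                           geocode="40.75,-74.00,50mi")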
The fetched tweets come back as a list. We now apply a transformation to convert them into a data frame for analysis.
df <- do.call("rbind", lapply(tweets, as.data.frame))
do.call is a base R function that constructs and executes a function call from a function name and a list of arguments – in our case, our list of tweets.
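As an aside, twitteR also ships a helper, twListToDF, which performs the same conversion in one step (we use it again later for the user data):

# Equivalent conversion using twitteR's built-in helper
df <- twListToDF(tweets)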
# Print table in an HTML-suitable format
# kable(head(df, 2), format = "markdown")
The data frame headings are a bit off when I try to output the HTML, but they should be clean in RStudio.
The dimensions of the data frame are:
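Checking with base R’s dim function:

# Rows and columns of the tweet data frame
dim(df)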
##  200 16
The most favourited tweet is:
| | text | favoriteCount | screenName |
|---|---|---|---|
| 45 | Here’s how a decision tree splits the data https://t.co/fDQfPPfAi1 #MachineLearning #ArtificialIntelligence #DataScience | 8 | fasih_khatib |
We can see, although it’s not very pretty, that this tweet has a favourite count of 8.
The tweet says – Here’s how a decision tree splits the data https://t.co/fDQfPPfAi1 #MachineLearning #ArtificialIntelligence #DataScience
Let’s fetch some info about this particular user – fasih_khatib
# Row with the highest favourite count; keep text (1), favoriteCount (3), screenName (11)
dset2 <- df[which.max(df[,3]), c(1,3,11)]
# Look up the user by screen name
userInfo <- lookupUsers(dset2[,3])
## Convert the user info to a data frame
userDF <- t(twListToDF(userInfo))
kable(userDF, format = "markdown")
| | |
|---|---|
| description | Java. Groovy. JVM. |
So all available information about the user is fetched quite easily. The package and the API make this quite an awesome platform for Twitter analysis. There are a few limitations on the data which can be fetched, but it’s still not bad.
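One such limitation is the API’s rate limiting, which twitteR itself can report; a quick check (not part of the original analysis) looks like this:

# Inspect the current rate limits for the search endpoints
limits <- getCurRateLimitInfo("search")
head(limits)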
We can create a bar plot of each user based on their number of tweets with the ‘DataScience’ hashtag.
counts = table(df$screenName)
barplot(counts)
Let’s limit the data set to show only those who tweeted more than twice in the sample, and also make the visualization a little prettier.
# Keep only the users with more than 2 tweets in the sample
cc <- subset(counts, counts > 2)
barplot(cc, las = 2, cex.names = 0.8, beside = FALSE, width = c(0.1))
Now, we get an idea of the number of users tweeting most often with the hashtag – ‘#DataScience’.
I will now quickly try to draw a world map with pointers to the approximate location of the tweets.
We will need the following libraries – ggplot2, maps, dismo
library(ggplot2)
library(maps)
library(dismo)
Fetch the user data again using the lookupUsers function:
# Batch lookup of user info (column 11 is screenName)
userInfo <- lookupUsers(df[,11])
# Convert to a data frame
userFrame <- twListToDF(userInfo)
Now we build the world map!
# Keep users with a non-missing location
locatedUsers <- !is.na(userFrame$location)
# Use the API to guess latitude/longitude
locations <- geocode(userFrame$location[locatedUsers])
long <- locations$longitude
lat <- locations$latitude
# Add world map
worldMap <- map_data("world")
zp1 <- ggplot(worldMap)
# Draw map outlines
zp2 <- zp1 + geom_path(aes(x = long, y = lat, group = group),
                       colour = gray(2/3), lwd = 1/3)
# Add points indicating users
zp3 <- zp2 + geom_point(data = locations,
                        aes(x = longitude, y = latitude),
                        colour = "RED", alpha = 1/2, size = 1)
zp4 <- zp3 + coord_equal() + theme_minimal()
print(zp4)
We have demonstrated fetching Twitter data along with some basic exploration. A detailed analysis will follow in a future post.