Scraping and Visualizing the Twitter Streaming API
Web scraping is an important way to extract data from the web. An HTML page can be scraped, and so can a JSON or XML web service response. A lot has been written about how to scrape the Twitter API, but I have not seen anything that shows how to visualize the live streaming API. This article shows how to use the JavaScript library HighCharts to plot a live graph.
Before accessing the Twitter API, you need a Twitter Application account so that you can use an authorization mechanism such as OAuth.
I took some PHP code from here, which is an easy implementation of the Twitter Stream API. By modifying it as per my needs, I got the response I needed.
Once you have downloaded the Twitter stream file, add the following to another PHP file; it authorizes using the stream library. Replace all the keys and tokens with your own, which you can find on the Twitter Application page.
<?php
require 'twitter_stream_server.php';
$t = new ctwitter_stream();
// Fill the details from your Twitter App Account
$t->login('Enter Consumer Key',
'Enter Consumer Secret Key',
'Enter Access Token',
'Enter Access Token Secret');
// Fetching from URL
$query = $_GET['query'];
$t->start(array($query));
?>
You need a local server; I used XAMPP. Once the server is running, you can see the response in your browser at the following link:
localhost:port/path_to/twitter_stream.php?query=google
You might notice that you didn’t get the whole JSON from Twitter. That is because of a modification I made in twitter_stream_server.php: I picked out just three values.
$data = json_decode($json, true);
// extract only the data relevant for this purpose; change as needed
$myObj = new stdClass(); // fresh container for the fields we keep
$myObj->name = $data['user']['name'];
$myObj->text = $data['text'];
$myObj->created_at = $data['created_at'];
// encode again to print
$json = json_encode($myObj);
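The created_at value kept above is a string, while the chart later needs a millisecond timestamp. As a minimal sketch, assuming created_at uses the classic Twitter format such as "Wed Oct 10 20:19:24 +0000 2018", a small helper can convert it explicitly instead of relying on how each browser's Date parser treats that string. The name parseTwitterDate is hypothetical and not part of the original code.

```javascript
// Hypothetical helper: converts a Twitter-style created_at string,
// e.g. "Wed Oct 10 20:19:24 +0000 2018", to milliseconds since the epoch.
// Returns NaN when the string does not match the assumed format.
var MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

function parseTwitterDate(createdAt) {
    var pattern = /^\w{3} (\w{3}) (\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2}) (\d{4})$/;
    var m = pattern.exec(String(createdAt));
    if (!m) {
        return NaN;
    }
    var month = MONTHS.indexOf(m[1]);
    if (month < 0) {
        return NaN;
    }
    // Interpret the clock fields as if they were UTC ...
    var asUtc = Date.UTC(Number(m[9]), month, Number(m[2]),
                         Number(m[3]), Number(m[4]), Number(m[5]));
    // ... then subtract the stated offset to get the real UTC instant.
    var sign = m[6] === '-' ? -1 : 1;
    var offsetMs = sign * (Number(m[7]) * 60 + Number(m[8])) * 60000;
    return asUtc - offsetMs;
}
```

The result can be passed straight to the chart as the x value of a point; a NaN result signals that the tweet should be skipped or stamped with Date.now() instead.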
For visualization, I used JavaScript to fetch the data from the stream, continuously write it out, and plot it on the screen.
Since we are working with a live stream, we need to pull data as soon as there is an update, so the EventSource (server-sent events) interface in JavaScript is very useful; it works in all major modern browsers.
This code simply gets the data from the page.
var source = new EventSource('twitter_stream.php?query=' + query);
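To make the step after opening the EventSource concrete, here is a minimal sketch of a message handler. It assumes the PHP page emits each reduced tweet as one server-sent event whose data is the JSON object with name, text, and created_at; that framing is an assumption, not something shown above. The names createTweetState, applyMessage, buildStreamUrl, and startStream are hypothetical.

```javascript
// Hypothetical shared state: a running tweet count plus the latest tweet seen.
function createTweetState() {
    return { count: 0, lastCreatedAt: null, lastText: '' };
}

// Updates the state from one raw event payload; ignores anything that is
// not a JSON object with a text field (for example a partial chunk).
function applyMessage(state, rawData) {
    var tweet;
    try {
        tweet = JSON.parse(rawData);
    } catch (err) {
        return state;
    }
    if (!tweet || typeof tweet.text !== 'string') {
        return state;
    }
    state.count += 1;
    state.lastCreatedAt = tweet.created_at || null;
    state.lastText = tweet.text;
    return state;
}

// Encodes the search term so spaces, '&' and '#' survive in the URL.
function buildStreamUrl(query) {
    return 'twitter_stream.php?query=' + encodeURIComponent(query);
}

// Browser-only wiring (not called here): opens the stream and feeds
// every message into the state.
function startStream(query, state) {
    var source = new EventSource(buildStreamUrl(query));
    source.onmessage = function (event) {
        applyMessage(state, event.data);
    };
    return source;
}
```

In the page, calling startStream('google', createTweetState()) would open the connection; the chart code can then read the count and latest date from the same state object.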
The rest of the code is simple and fairly logical (the comments are enough explanation).
For graphs, HighCharts is a great JS library, and its Stock Charts are easy to use. They update automatically when an event occurs; we check for updates every second.
// Create the chart
// Note: d and count are defined elsewhere in the page script (not shown here)
Highcharts.stockChart('container', {
    chart: {
        events: {
            load: function () {
                // set up the updating of the chart each second
                var series = this.series[0];
                setInterval(function () {
                    var x = (new Date(d)).getTime(), // current time
                        y = count;
                    series.addPoint([x, y], true, true);
                }, 1000);
            }
        }
    },
    series: [{
        name: 'No. of Tweets per given time',
        data: (function () {
            // pre-fill the series with 1000 points, one per second,
            // ending at the current time
            var data = [],
                time = (new Date(d)).getTime(),
                i;
            for (i = -999; i <= 0; i += 1) {
                data.push([
                    time + i * 1000,
                    count
                ]);
            }
            return data;
        }())
    }]
});
The series above first fills the plot with initial data (date vs. count). Once the chart has loaded, the chart > events > load function runs and starts adding a new point to the series every second.
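The chart snippet reads `d` and `count` from code that is not shown, so it is not clear whether count is cumulative or reset every second. As a minimal sketch, assuming a per-second count is wanted, a small counter can hand out one [time, count] point per tick and reset itself. The name createCounter and its methods are hypothetical.

```javascript
// Hypothetical per-interval counter: increment() is called once per tweet,
// takePoint() is called once per chart tick.
function createCounter() {
    var count = 0;
    return {
        increment: function () {
            count += 1;
        },
        // Returns [timestampMs, tweetsSinceLastCall] and starts a new bucket.
        takePoint: function (nowMs) {
            var point = [nowMs, count];
            count = 0;
            return point;
        }
    };
}

// Sketch of how the load handler could use it (browser only, not run here):
//   var counter = createCounter();
//   setInterval(function () {
//       series.addPoint(counter.takePoint(Date.now()), true, true);
//   }, 1000);
```

Resetting on every tick means a quiet second plots as zero rather than repeating the previous value.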
There are several problems with plotting the Twitter Stream API. For example, the number of tweets sometimes becomes too large to handle, which results in some loss of data, especially when the query is very common.
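One way to keep a busy query from overwhelming the page is to cap how many tweets are held at once and to record how many were discarded, so the loss is at least visible. The following is only a sketch of that idea, not what the original code does; createBoundedBuffer and its methods are hypothetical names.

```javascript
// Hypothetical bounded buffer: keeps at most `capacity` recent items and
// counts how many older items were discarded to stay within that limit.
function createBoundedBuffer(capacity) {
    var items = [];
    var dropped = 0;
    return {
        push: function (item) {
            items.push(item);
            if (items.length > capacity) {
                items.shift(); // discard the oldest item
                dropped += 1;
            }
        },
        // Removes and returns everything currently held, oldest first.
        drain: function () {
            return items.splice(0, items.length);
        },
        size: function () {
            return items.length;
        },
        droppedCount: function () {
            return dropped;
        }
    };
}
```

The message handler would push each tweet, and the once-per-second chart tick would drain the buffer; droppedCount() then shows how much data a very common query is costing.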
Below is a sample video of how the result actually looks.
This currently does not store the data. I recommend modifying the code so that it stores the data in a CSV file or a database, so that we can look at the tweets’ past behavior.
Download all the files from GitHub. If you want to modify the current code, you can fork it.
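As a starting point for the CSV option, here is a minimal sketch that turns one reduced tweet (name, text, created_at) into a CSV line. Tweet text often contains commas, quotes, and line breaks, so those fields are quoted. The function names are hypothetical, and where the lines get written (a file on the server, a download in the browser) is left open.

```javascript
// Quotes a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function csvEscape(value) {
    var s = String(value == null ? '' : value);
    if (/[",\r\n]/.test(s)) {
        return '"' + s.replace(/"/g, '""') + '"';
    }
    return s;
}

// Builds one CSV line in the column order created_at, name, text.
function tweetToCsvLine(tweet) {
    return [tweet.created_at, tweet.name, tweet.text].map(csvEscape).join(',');
}
```

Appending one such line per tweet, under a header line of created_at,name,text, gives a file that a spreadsheet or a later charting script can read back.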