The Easiest Way to Webscrape Tweets with Python

Before quarantine, I started a Tableau Public dashboard that contained an analysis of tweets from twitter personality @caucasianjames. Did I finish this dashboard? Nope, but I did figure out the laziest way to gather all of the tweets I needed based on a set of criteria! To get all of @caucasianjames’ tweets, I used a package called GetOldTweets3, which I consider to be the easiest way to webscrape tweets for a Python beginner.

Python and I were good friends back in college: I took CS classes with Python and occasionally did some tutoring for those classes. However, I haven’t had a chance to use it since college, so this blog post will be from the perspective of a might-as-well-be-a-beginner. In other words, Python beginners welcome here.

Before we move forward, I want to give a shout-out to the post by Martin Beck that directed me toward GetOldTweets3. That post does a great job of outlining the differences between using Tweepy and GetOldTweets3, as well as how to use each. I used it as the base of my learning, and the below only expands on what he has already written. In case you were wondering, I specifically chose GetOldTweets3 over Tweepy for my lazy script because this way, I don’t need to deal with OAuth and the Twitter API.

So let’s get into the how-to: how to scrape tweets from a user using GetOldTweets3.

Installing GetOldTweets3

To install the package using pip, open your command line and enter the following:

pip install GetOldTweets3

This is done in the command line, not in the Python shell. If you attempt to run it in the Python shell, the command will not be recognized.

Note: pip is the preferred package installer for Python. It comes bundled with Python starting in version 3.4. For more information about installing Python modules and pip, see the official Python documentation on installing modules.

When the install is finished, you will get a notification that it was successful.

Creating the Python File and Importing the Package

To create a new Python file, open up the Python shell (IDLE) and navigate to File > New File.

To use GetOldTweets3, we’ll have to import it at the top of our file. Later on, we’ll also use the csv package to write the tweets we scrape into a csv file, so I will import that package as well.

Importing the packages is easy: simply write the keyword import followed by the package name. You can assign an alias to the package by using the keyword as, followed by the alias you would like to refer to the package by. In this case, I am going to create an alias for GetOldTweets3 because it is a little long to type out every time I want to use something from the package.
import GetOldTweets3 as got
import csv

Establishing our Variables

Next, we are going to establish the variables that define the criteria for the tweets we will be scraping. In this case, I will be setting up the following:

  • twitterHandle: Twitter handle of the account
  • tweetCount: Number of tweets to grab

Declaring variables in Python is very simple. We’ll write the following:

twitterHandle = 'caucasianjames'
tweetCount = 200

This means that from now on when I type “twitterHandle” it will refer to “caucasianjames”, and when I type “tweetCount”, it will refer to 200. I’m wrapping the value for twitterHandle in quotation marks, above, because it is a string. I’m starting with only 200 tweets, as it is a relatively small sample size to use while we test.

Creating the Tweet Criteria

We are going to use the variables we just created to set the criteria with the setter methods setUsername() and setMaxTweets().

GetOldTweets3 can also get me the following as far as criteria goes: (from pypi.org)

  • setSince (str. "yyyy-mm-dd"): A lower bound date (UTC) to restrict search.
  • setUntil (str. "yyyy-mm-dd"): An upper bound date (not included) to restrict search.
  • setQuerySearch (str): A query text to be matched.
  • setTopTweets (bool): If True only the Top Tweets will be retrieved.
  • setNear(str): A reference location area from where tweets were generated.
  • setWithin (str): A distance radius from "near" location (e.g. 15mi).

Creating a List of Tweets

Now that we have our criteria, we can use them to get a list of all of the tweets meeting those criteria using got.manager.TweetManager.getTweets(TweetCriteria). This gets me everything in the Tweet object class. I’m only interested in certain fields for my data source, so I’m creating another list that keeps only the fields I want from each tweet.

Here are all of my options as far as fields go: 

  • id (str)
  • permalink (str)
  • username (str)
  • to (str)
  • text (str)
  • date (datetime) in UTC
  • retweets (int)
  • favorites (int)
  • mentions (str)
  • hashtags (str)
  • geo (str)

Writing Tweets to a CSV for Use in Tableau!

Finally, we get to output! We’re going to use the csv package we imported at the beginning. We’ll do this by opening the file (mine is currently blank), giving it a name, and creating a writer for it. Then, I can use a for loop to iterate through my list, writing a new row in the csv file for each item in our user_tweets list.
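Here’s a sketch of that loop, with a couple of placeholder rows standing in for the real scraped tweets (the filename user_tweets.csv and the header names are my choices — use whatever fits your data source):

```python
import csv

# Placeholder rows standing in for the scraped tweets; in the real script,
# user_tweets comes from the GetOldTweets3 step above.
user_tweets = [
    ['2020-01-01 12:00:00', 'first tweet text', 10, 50],
    ['2020-01-02 12:00:00', 'second tweet text', 3, 12],
]

# Open (and create, if needed) the output file and write one row per tweet.
with open('user_tweets.csv', 'w', newline='', encoding='utf-8') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['date', 'text', 'retweets', 'favorites'])  # header row
    for tweet_row in user_tweets:
        writer.writerow(tweet_row)
```

The newline='' argument keeps the csv module from inserting blank lines between rows on Windows, and the header row gives Tableau clean column names to work with.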

It’s as easy as that! I hope this helps anyone looking to make a dashboard about tweets. I hope yours actually gets finished, unlike mine! Tweet me the link or any of your questions here at @VisualAidan or email me at aidan.bramel@tessellationconsulting.com.
