Python and Game of Thrones (Part 3 of 3)

With the final season of Game of Thrones happening soon, I wanted to do something fun. And by fun, I mean writing a 3 part series on some cool things you can do with Python! Because why not? 

Check out Part 1 and Part 2 here.

Data and Python

Gathering data is easy. Interpreting them is difficult. Which is why there’s a huge surge of demand for data scientists who can make sense of this data. And data scientists use languages like R and Python to interpret it.

In this tutorial, we’ll be using the csv module, which will be enough to generate a report. If we were working with a huge dataset, one that’s like 50,000 rows or bigger, we’ll have to tap into the Pandas library.

What we will be doing is downloading a CSV, having Python interpret the data, send a query based on what kind of question we want answered, and then have the answer print out to us.

Python VS basic spreadsheet functions

You might be wondering:

“Why should I use Python when I can easily just use spreadsheet functions like =SUM or =COUNT, or filter out the rows I don’t need manually?”

Like for all the other automation tricks in Part 1 and 2, you can definitely do this manually.

But imagine if you had to generate a new report every day.

For example: I build online courses. And we want a daily report of every student’s progress. How many students started today? How many students are active this week? How many students made it to Module 2? How many students submitted their Module 3 homework? How many students clicked on the completion button on mobile devices?

I can either spend 15 minutes sorting through the data to generate a report for my team. OR write Python code that does it daily.

Other use cases for using code instead of default spreadsheet functions:

  • You might be working with a huge set of data
  • You require multiple slices of filters and segmentation to get your answers.
  • You need to run the same query on a dataset that changes repeatedly

Generating Reports with Game of Thrones

Every year, Winteriscoming.net, a Game of Thrones news site, has their annual March Madness. Visitors would vote for their favorite characters, and winners move up the bracket and compete against another person. After 5 rounds of votes, a winner is declared.

GOT March Madness 2018 via http://winteriscoming.net

Since 2019’s votes are still happening, I grabbed all 6 rounds of 2018’s data and compiled them into a CSV file. To see how the poll looked like on winteriscoming.net, click here.

You can see the whole CSV as a Google Sheets file

I’ve also added some additional background data, so make the reporting a bit more interesting.

Asking Questions

In order to generate a report, we have to ask some questions.

By definition: A report’s primary duty is to ANSWER questions.

So let’s make them up right now.

Based on this dataset… here’s some questions.

  1. Who won popularity vote?
  2. Who won based on averages?
  3. Who is the most popular non-Westeros person? (characters not born in Westeros)

Before answering questions – let’s set up our Python code

To make it easier, I wrote the all the code in my new favorite online IDE, Repl.it. Go ahead and take a look.

import csv

# Import the data
f_csv = open('winter-is-coming-2018.csv')
headers = next(f_csv) 
f_reader = csv.reader(f_csv)
file_data = list(f_reader)

# Make all blank cells into zeroes
# https://stackoverflow.com/questions/2862709/replacing-empty-csv-column-values-with-a-zero
for row in file_data:
  for i, x in enumerate(row):
    if len(x)< 1:
      x = row[i] = 0

Here’s my process with the code.

  1. I imported the csv module.
  2. I imported the csv file, and turned it into a list type called file_data.
    • The way Python reads your file is by first passing the data to a object.
    • I removed the header, since it’ll fudge the data.
    • I then pass the object to a reader, and finally a list.
    • Note: I just realized I did it via the Python 2 way. There’s a cleaner way to do it in Python 3. Oh well. Still works.
  3. In order to sum up any totals, I made all blank cells become 0.
    • This was one of those moments where found a Stack Overflow solution that was better than my original version.

With this set up, we can now loop through the list of data, and answer questions!

Question #1 – Who won the popularity vote?

The Spreadsheet method:

The easiest way would be to add up each cell, using a formula.
Using row 2 as an example, in a blank column, you can write the formula:

=sum(E2:J2)

You can then drag that formula for the other rows.

Then, sort it by total. And you have a winner!


The Python Method:

## Include the code from above

# Push the data to a dictionary
total_score = {}

# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) + 
          int(row[5]) + 
          int(row[6]) + 
          int(row[7]) + 
          int(row[8]) + 
          int(row[9]) )

  total_score[row[0]] = total

# Dictionaries aren't sortable by default, we'll have to borrow from these two classes.
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
from operator import itemgetter
from collections import OrderedDict

sorted_score = OrderedDict(sorted(total_score.items(), key=itemgetter(1) ,reverse=True))

# We get the name of the winner and their score
winner = list(sorted_score)[0] #jon snow
winner_score = sorted_score[winner] #score

print(winner + " with " + str(winner_score))

## RESULT => Jon Snow with 12959

The steps I took are:

  1. The dataset is just one big list. By using a for loop, you can then access each row.
  2. Within that for loop, I added each cell. (emulating the whole “=sum(E:J)” formula)
  3. Since dictionaries aren’t exactly sortable, I had to import two classes to help me sort the dictionary by their values, from high to low.
  4. Finally, I passed the winner, and the winner’s value as text.

To help understand that loop, I drew a diagram.

File data is the WHOLE dataset.
A row is a single row of data (i.e. [Jamie Lannister, RIGHT, 5, Westeros…])
To add their total, I needed to add up row[4] + row[5] + row[6]…

Overall, this process is a bit longer compared to the spreadsheet Method. But wait, it gets easier!

Question 2 – Who won based on averages?

You might have noticed that whoever proceeded farther in the rankings would obviously get more votes.

For example: If Jon Snow got 500 points in Round One and 1000 points in Round Two, he already beats The Mountain who only had 1000 points and never made it past his bracket.

So the next best thing is to sum the total, and then divide it based on how many rounds they participated in.

The Spreadsheet Method:

This is easy. In Column B is how many rounds they participated in. You would divide the rounds by the sum, and presto!

The Python Method:

## OLD CODE FROM QUESTION 1
# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) + 
          int(row[5]) + 
          int(row[6]) + 
          int(row[7]) + 
          int(row[8]) + 
          int(row[9]) )

  total_score[row[0]] = total

## NEW CODE
# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) + 
          int(row[5]) + 
          int(row[6]) + 
          int(row[7]) + 
          int(row[8]) + 
          int(row[9]) )

  # NEW LINE - divide by how many rounds
  new_total = total / int(row[2])

  total_score[row[0]] = new_total

# RESULT => Davos Seaworth with 2247.6666666666665

Noticed the change? I just added one additional line.

That’s all it took to answer this question! NEXT!

Question 3 – Who is the most popular non-Westeros person?

With first two examples, it’s pretty easy to calculate the total with the default spreadsheet functions. For this question, things are a bit more complicated.

The Spreadsheet Method:

  1. Assuming you already have the sum
  2. You now have to filter it based on if they are Westeros/Other
  3. Then sort by the sum

The Python Method:

## OLD CODE FROM QUESTION 1
# Pass each character and their final score into total_score dictionary
for row in file_data:
  total = (int(row[4]) + 
          int(row[5]) + 
          int(row[6]) + 
          int(row[7]) + 
          int(row[8]) + 
          int(row[9]) )

  # NEW LINE - divide by how many rounds
  new_total = total / int(row[2])

  total_score[row[0]] = new_total

## NEW CODE
# Pass each character and their final score into total_score dictionary
for row in file_data:

  # Add IF-THEN statement
  if (row[3] == 'other'):
    total = (int(row[4]) + 
            int(row[5]) + 
            int(row[6]) + 
            int(row[7]) + 
            int(row[8]) + 
            int(row[9]) )
  else:
    total = 0

  total_score[row[0]] = total

# RESULT => Missandei with 4811

In Question 2, I added one line of code to answer that new question.

In Question 3, I added a IF-ELSE statement. If they are non-Westeros, then count their score. Else, give them a score of 0.

Reviewing this:

While the spreadsheet Method doesn’t seem like a lot of steps, it sure is a lot more clicks. The Python method took a lot longer to set up, but just changing a few lines of code.

Now imagine if the stakeholder asked a dozen more questions, such as how many points did characters who start with L get? Or how many points did everyone in round 3 get who lived in Westeros? Or if it was 640 GoT characters instead of just 64?

As the data set scales, it’ll take longer and longer to process. And that’s where the power of Python comes in.

Conclusion

In Part 1, I covered web automation with the Selenium library. In Part 2, I covered web scraping with the BeautifulSoup library. And in Part 3 (this one), I covered generating reports with the csv module.

Those are 3 really powerful tricks that can do a lot, all with Python.

3 Comments

  1. […] Python and Game of Thrones (Part 3 of 3) […]

  2. […] Python and Game of Thrones (Part 3 of 3) […]

  3. anch on April 15, 2019 at 4:30 am

    Hey, I think your blog might be having browser compatibility issues.
    When I look at your blog in Firefox, it looks fine but when opening in Internet Explorer, it has some overlapping.
    I just wanted to give you a quick heads up! Other
    then that, wonderful blog!

Leave a Comment