mlehotay

developer blog and assorted data science projects

Analysis of World of Warcraft PvP Leaderboards


This is a draft version of the post but I am publishing it now because the bootcamp is ending. I will keep working on it until it is done.


Note to self: don't forget to check final version against project 1 rubric and report template

If you combine your problem statement, executive summary, data dictionary, and conclusions/recommendations, you have an amazing README.md file that quickly aligns your audience to the contents of your project. Don't forget to cite your data sources!

Some recommendations on plotting: * Plots have titles * Plots have axis labels * Plots have appropriate tick labels * All text is legible in a plot * Plots demonstrate meaningful and valid relationships * Plots are interpreted to aid understanding


Contents

Introduction

**problem statement**
**executive summary**

data acquisition

connect to db, list tables
SQL query: battlegrounds & players
**data dictionary**
**data sources**

measures of best player

rating
rank
win ratio

rating vs rank
win ratio vs rank
num matches vs win ratio
wins vs num matches
ratings vs rank grouped by leaderboard

histograms, populations

rank histogram
ilvl histogram
achievement histogram

player class bar chart
player class bar chart stacked factions

ratings histogram for all leaderboards

exploring correlations

pair plot
heat map

ilvl vs rank
ilvl vs rating
achievement vs rank
rating histogram
win ratio bar chart by faction

Groupings

ilvl vs rank grouped by leaderboard
ilvl vs rank grouped by player class
win ratio vs rank grouped by player class
win ratio vs rank grouped by faction

mean rating bar chart by class

mean rank bar chart by class for all leaderboards

Descriptive and Inferential Statistics

measures of central tendency, spread, and shape/skewness
For each variable in your data, summarize the underlying distributions
Be sure to back up these summaries with statistics.

plot and interpret boxplots

summary, conclusions, etc

best class to play?
best PvP type to play? (2v2 3v3 battlegrounds)
recommendations for climbing leaderboards
does the hoarde receive preferential treatment from blizzard or not?

I recently scraped the PvP leaderboards from the World of Warcraft website and wrote them out to a SQLite database. Let's plot some charts of the data and see if anything interesting turns up.

There are three leaderboards: the 2v2 Arena, the 3v3 Arena, and the 10x10 Battleground. Each leaderboard lists the top 1000 players by rating. (I think these are Elo ratings.) Anyway, I saved the leaderboards to a SQL database as three separate tables. I also scraped a minimal amount of data from the profile pages of each of the characters on the leaderboards and saved the profile data to a fourth table.

import sqlite3
import altair as alt
import pandas as pd
query = '''
SELECT *
FROM sqlite_master
WHERE type='table'
'''
con = sqlite3.connect('data/wow.db')
pd.read_sql(query, con)
type name tbl_name rootpage sql
0 table arena_2v2 arena_2v2 2 CREATE TABLE "arena_2v2" (\n"rank" INTEGER,\n ...
1 table arena_3v3 arena_3v3 32 CREATE TABLE "arena_3v3" (\n"rank" INTEGER,\n ...
2 table battlegrounds battlegrounds 62 CREATE TABLE "battlegrounds" (\n"rank" INTEGER...
3 table players players 93 CREATE TABLE "players" (\n"name" TEXT,\n "tit...
query = '''
    SELECT name, rank, rating, wins, losses, achievement, ilvl
    FROM battlegrounds
    LEFT JOIN players
    ON battlegrounds.url = players.URL
'''
con = sqlite3.connect('data/wow.db')
df = pd.read_sql(query, con)
con.close()
df
name rank rating wins losses achievement ilvl
0 Crdefender 1 2163 81 4 18325.0 461.0
1 Lifeswaplol 1 2163 80 3 9070.0 464.0
2 Wolf 3 2153 77 4 18205.0 466.0
3 Intricate 4 2135 72 4 10565.0 462.0
4 Jøkes 5 2125 69 2 15570.0 462.0
... ... ... ... ... ... ... ...
995 None 995 1443 9 14 NaN NaN
996 None 995 1443 10 12 NaN NaN
997 None 995 1443 8 3 NaN NaN
998 None 995 1443 12 14 NaN NaN
999 None 1000 1442 15 12 NaN NaN

1000 rows × 7 columns

df['num_matches'] =  df['wins'] + df['losses']
df['win_ratio'] = df['wins'] / df['num_matches']
alt.Chart(df).mark_point().encode(x='rank', y='rating')
alt.Chart(df).mark_point().encode(x='rank', y='ilvl')
alt.Chart(df).mark_point().encode(alt.X('rating:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('ilvl:Q', scale=alt.Scale(zero=False)))
alt.Chart(df).mark_point().encode(x='rank', y='achievement')
alt.Chart(df).mark_point().encode(x='rank', y='win_ratio')
alt.Chart(df).mark_point().encode(alt.X('win_ratio:Q', scale=alt.Scale(zero=False)), y='num_matches')
alt.Chart(df).mark_point().encode(alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), x='num_matches')
alt.Chart(df).mark_point().encode(x='num_matches', y='wins')
alt.Chart(df).mark_bar().encode(alt.X("rating:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("rank:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("ilvl:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("achievement:Q", bin=True), y='count()')
alt.Chart(df).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative', scale=alt.Scale(zero=False)),
    alt.Y(alt.repeat("row"), type='quantitative', scale=alt.Scale(zero=False))
).properties(
    width=100,
    height=100
).repeat(
    row=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement'],
    column=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement']
)
query = '''
SELECT board, rank, rating, name, title, realm, class, details,
    faction, wins, losses, achievement, ilvl, players.url
FROM (
    SELECT *, '2v2 arenas' as board
    FROM arena_2v2
    UNION
    SELECT *, '3v3 arenas' as board
    FROM arena_3v3
    UNION
    SELECT *, 'battlegrounds' as board
    FROM battlegrounds
) leaderboards
JOIN players
WHERE players.url = leaderboards.url
'''
con = sqlite3.connect('data/wow.db')
df_boards = pd.read_sql(query, con)
con.close()
df_boards
board rank rating name title realm class details faction wins losses achievement ilvl url
0 2v2 arenas 4 2514 Niarb Notorious Gladiator Laughing Skull Paladin 120 Human Holy Paladin ALLIANCE 98 42 13215.0 469.0 https://worldofwarcraft.com/en-us/character/us...
1 2v2 arenas 2 2591 Thugonomiczz Wrathful Gladiator Stormrage Warlock 120 Dwarf Destruction Warlock ALLIANCE 95 9 15410.0 473.0 https://worldofwarcraft.com/en-us/character/us...
2 2v2 arenas 15 2411 Nessper Notorious Gladiator Bleeding Hollow Rogue 120 Blood Elf Assassination Rogue HORDE 145 61 19680.0 469.0 https://worldofwarcraft.com/en-us/character/us...
3 2v2 arenas 33 2372 Tickix None Bleeding Hollow Priest 120 Undead Discipline Priest HORDE 119 60 11430.0 470.0 https://worldofwarcraft.com/en-us/character/us...
4 3v3 arenas 229 2058 Tickix None Bleeding Hollow Priest 120 Undead Discipline Priest HORDE 67 39 11430.0 470.0 https://worldofwarcraft.com/en-us/character/us...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1969 battlegrounds 626 1538 Landsharkzx Grand Marshal Bleeding Hollow Druid 120 Night Elf Guardian Druid ALLIANCE 10 5 12000.0 448.0 https://worldofwarcraft.com/en-us/character/us...
1970 battlegrounds 503 1568 Kelzaru Arena Master Emerald Dream Druid 120 Zandalari Troll Balance Druid HORDE 15 11 12565.0 455.0 https://worldofwarcraft.com/en-us/character/us...
1971 battlegrounds 634 1536 Miamigodxo Firelord Bleeding Hollow Rogue 120 Night Elf Assassination Rogue ALLIANCE 22 24 10805.0 448.0 https://worldofwarcraft.com/en-us/character/us...
1972 battlegrounds 422 1598 Abrastal Highlord Gorgonnash Paladin 120 Human Holy Paladin ALLIANCE 9 6 20055.0 463.0 https://worldofwarcraft.com/en-us/character/us...
1973 battlegrounds 422 1598 Intdruid the Fabulous Kel'Thuzad Druid 120 Night Elf Guardian Druid ALLIANCE 9 6 19620.0 445.0 https://worldofwarcraft.com/en-us/character/us...

1974 rows × 14 columns

alt.Chart(df_boards).mark_point().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('rating:Q', scale=alt.Scale(zero=False)),
                                 color='board')
alt.Chart(df_boards).mark_point(opacity=0.75).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('ilvl:Q', scale=alt.Scale(zero=False)),
                                 color='board')
alt.Chart(df_boards).mark_point(opacity=0.75, clip=True).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('ilvl:Q', scale=alt.Scale(domain=(420, 480))),
                                 color='class')
df_boards['num_matches'] =  df_boards['wins'] + df_boards['losses']
df_boards['win_ratio'] = df_boards['wins'] / df_boards['num_matches']
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), color='class')
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
                                  alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)),
                                 color='faction')
alt.Chart(df_boards).mark_bar().encode(x='class', y='count()')
alt.Chart(df_boards).mark_bar().encode(
    x='class',
    y='count()',
    color='faction'
)
# how to get these unstacked? column='faction' makes two plots
alt.Chart(df_boards).mark_bar().encode(
    x='faction',
    y='win_ratio',
)
bar = alt.Chart(df_boards).mark_bar().encode(
    alt.Y('mean(rating)', scale=alt.Scale(zero=False)),
    x='class'
)
rule = alt.Chart(df_boards).mark_rule(color='red').encode(
    y='mean(rating)'
)
(bar + rule).properties(width=500)
df_boards['rank2'] = 1000 - df_boards['rank']
alt.Chart(df_boards).mark_bar().encode(
    alt.X('class'),
    alt.Y('mean(rank2)', scale=alt.Scale(zero=False)),
    alt.Color('board'),
    alt.Column('board')
)
alt.Chart(df_boards).transform_fold(
    ['2v2 arenas', '3v3 arenas', 'battlegrounds'],
    as_=['Leaderboard', '# Characters']
).mark_area(
    opacity=0.5,
    interpolate='step'
).encode(
    alt.X('rating:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('board:N')
)