- Fri 21 February 2020
- Data Science
- Michael Lehotay
- #data-analysis, #visualization, #SQL
This is a draft version of the post but I am publishing it now because the bootcamp is ending. I will keep working on it until it is done.
Note to self: don't forget to check final version against project 1 rubric and report template
If you combine your problem statement, executive summary, data dictionary, and conclusions/recommendations, you have an amazing README.md file that quickly aligns your audience to the contents of your project. Don't forget to cite your data sources!
Some recommendations on plotting: * Plots have titles * Plots have axis labels * Plots have appropriate tick labels * All text is legible in a plot * Plots demonstrate meaningful and valid relationships * Plots are interpreted to aid understanding
Contents
Introduction
**problem statement**
**executive summary**
data acquisition
connect to db, list tables
SQL query: battlegrounds & players
**data dictionary**
**data sources**
measures of best player
rating
rank
win ratio
rating vs rank
win ratio vs rank
num matches vs win ratio
wins vs num matches
ratings vs rank grouped by leaderboard
histograms, populations
rank histogram
ilvl histogram
achievement histogram
player class bar chart
player class bar chart stacked factions
ratings histogram for all leaderboards
exploring correlations
pair plot
heat map
ilvl vs rank
ilvl vs rating
achievement vs rank
rating histogram
win ratio bar chart by faction
Groupings
ilvl vs rank grouped by leaderboard
ilvl vs rank grouped by player class
win ratio vs rank grouped by player class
win ratio vs rank grouped by faction
mean rating bar chart by class
mean rank bar chart by class for all leaderboards
Descriptive and Inferential Statistics
measures of central tendency, spread, and shape/skewness
For each variable in your data, summarize the underlying distributions
Be sure to back up these summaries with statistics.
plot and interpret boxplots
summary, conclusions, etc
best class to play?
best PvP type to play? (2v2 3v3 battlegrounds)
recommendations for climbing leaderboards
does the hoarde receive preferential treatment from blizzard or not?
I recently scraped the PvP leaderboards from the World of Warcraft website and wrote them out to a SQLite database. Let's plot some charts of the data and see if anything interesting turns up.
There are three leaderboards: the 2v2 Arena, the 3v3 Arena, and the 10x10 Battleground. Each leaderboard lists the top 1000 players by rating. (I think these are Elo ratings.) Anyway, I saved the leaderboards to a SQL database as three separate tables. I also scraped a minimal amount of data from the profile pages of each of the characters on the leaderboards and saved the profile data to a fourth table.
import sqlite3
import altair as alt
import pandas as pd
query = '''
SELECT *
FROM sqlite_master
WHERE type='table'
'''
con = sqlite3.connect('data/wow.db')
pd.read_sql(query, con)
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | arena_2v2 | arena_2v2 | 2 | CREATE TABLE "arena_2v2" (\n"rank" INTEGER,\n ... |
1 | table | arena_3v3 | arena_3v3 | 32 | CREATE TABLE "arena_3v3" (\n"rank" INTEGER,\n ... |
2 | table | battlegrounds | battlegrounds | 62 | CREATE TABLE "battlegrounds" (\n"rank" INTEGER... |
3 | table | players | players | 93 | CREATE TABLE "players" (\n"name" TEXT,\n "tit... |
query = '''
SELECT name, rank, rating, wins, losses, achievement, ilvl
FROM battlegrounds
LEFT JOIN players
ON battlegrounds.url = players.URL
'''
con = sqlite3.connect('data/wow.db')
df = pd.read_sql(query, con)
con.close()
df
name | rank | rating | wins | losses | achievement | ilvl | |
---|---|---|---|---|---|---|---|
0 | Crdefender | 1 | 2163 | 81 | 4 | 18325.0 | 461.0 |
1 | Lifeswaplol | 1 | 2163 | 80 | 3 | 9070.0 | 464.0 |
2 | Wolf | 3 | 2153 | 77 | 4 | 18205.0 | 466.0 |
3 | Intricate | 4 | 2135 | 72 | 4 | 10565.0 | 462.0 |
4 | Jøkes | 5 | 2125 | 69 | 2 | 15570.0 | 462.0 |
... | ... | ... | ... | ... | ... | ... | ... |
995 | None | 995 | 1443 | 9 | 14 | NaN | NaN |
996 | None | 995 | 1443 | 10 | 12 | NaN | NaN |
997 | None | 995 | 1443 | 8 | 3 | NaN | NaN |
998 | None | 995 | 1443 | 12 | 14 | NaN | NaN |
999 | None | 1000 | 1442 | 15 | 12 | NaN | NaN |
1000 rows × 7 columns
df['num_matches'] = df['wins'] + df['losses']
df['win_ratio'] = df['wins'] / df['num_matches']
alt.Chart(df).mark_point().encode(x='rank', y='rating')
alt.Chart(df).mark_point().encode(x='rank', y='ilvl')
alt.Chart(df).mark_point().encode(alt.X('rating:Q', scale=alt.Scale(zero=False)),
alt.Y('ilvl:Q', scale=alt.Scale(zero=False)))
alt.Chart(df).mark_point().encode(x='rank', y='achievement')
alt.Chart(df).mark_point().encode(x='rank', y='win_ratio')
alt.Chart(df).mark_point().encode(alt.X('win_ratio:Q', scale=alt.Scale(zero=False)), y='num_matches')
alt.Chart(df).mark_point().encode(alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), x='num_matches')
alt.Chart(df).mark_point().encode(x='num_matches', y='wins')
alt.Chart(df).mark_bar().encode(alt.X("rating:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("rank:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("ilvl:Q", bin=True), y='count()')
alt.Chart(df).mark_bar().encode(alt.X("achievement:Q", bin=True), y='count()')
alt.Chart(df).mark_circle().encode(
alt.X(alt.repeat("column"), type='quantitative', scale=alt.Scale(zero=False)),
alt.Y(alt.repeat("row"), type='quantitative', scale=alt.Scale(zero=False))
).properties(
width=100,
height=100
).repeat(
row=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement'],
column=['rank', 'rating', 'win_ratio', 'ilvl', 'achievement']
)
query = '''
SELECT board, rank, rating, name, title, realm, class, details,
faction, wins, losses, achievement, ilvl, players.url
FROM (
SELECT *, '2v2 arenas' as board
FROM arena_2v2
UNION
SELECT *, '3v3 arenas' as board
FROM arena_3v3
UNION
SELECT *, 'battlegrounds' as board
FROM battlegrounds
) leaderboards
JOIN players
WHERE players.url = leaderboards.url
'''
con = sqlite3.connect('data/wow.db')
df_boards = pd.read_sql(query, con)
con.close()
df_boards
board | rank | rating | name | title | realm | class | details | faction | wins | losses | achievement | ilvl | url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2v2 arenas | 4 | 2514 | Niarb | Notorious Gladiator | Laughing Skull | Paladin | 120 Human Holy Paladin | ALLIANCE | 98 | 42 | 13215.0 | 469.0 | https://worldofwarcraft.com/en-us/character/us... |
1 | 2v2 arenas | 2 | 2591 | Thugonomiczz | Wrathful Gladiator | Stormrage | Warlock | 120 Dwarf Destruction Warlock | ALLIANCE | 95 | 9 | 15410.0 | 473.0 | https://worldofwarcraft.com/en-us/character/us... |
2 | 2v2 arenas | 15 | 2411 | Nessper | Notorious Gladiator | Bleeding Hollow | Rogue | 120 Blood Elf Assassination Rogue | HORDE | 145 | 61 | 19680.0 | 469.0 | https://worldofwarcraft.com/en-us/character/us... |
3 | 2v2 arenas | 33 | 2372 | Tickix | None | Bleeding Hollow | Priest | 120 Undead Discipline Priest | HORDE | 119 | 60 | 11430.0 | 470.0 | https://worldofwarcraft.com/en-us/character/us... |
4 | 3v3 arenas | 229 | 2058 | Tickix | None | Bleeding Hollow | Priest | 120 Undead Discipline Priest | HORDE | 67 | 39 | 11430.0 | 470.0 | https://worldofwarcraft.com/en-us/character/us... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1969 | battlegrounds | 626 | 1538 | Landsharkzx | Grand Marshal | Bleeding Hollow | Druid | 120 Night Elf Guardian Druid | ALLIANCE | 10 | 5 | 12000.0 | 448.0 | https://worldofwarcraft.com/en-us/character/us... |
1970 | battlegrounds | 503 | 1568 | Kelzaru | Arena Master | Emerald Dream | Druid | 120 Zandalari Troll Balance Druid | HORDE | 15 | 11 | 12565.0 | 455.0 | https://worldofwarcraft.com/en-us/character/us... |
1971 | battlegrounds | 634 | 1536 | Miamigodxo | Firelord | Bleeding Hollow | Rogue | 120 Night Elf Assassination Rogue | ALLIANCE | 22 | 24 | 10805.0 | 448.0 | https://worldofwarcraft.com/en-us/character/us... |
1972 | battlegrounds | 422 | 1598 | Abrastal | Highlord | Gorgonnash | Paladin | 120 Human Holy Paladin | ALLIANCE | 9 | 6 | 20055.0 | 463.0 | https://worldofwarcraft.com/en-us/character/us... |
1973 | battlegrounds | 422 | 1598 | Intdruid | the Fabulous | Kel'Thuzad | Druid | 120 Night Elf Guardian Druid | ALLIANCE | 9 | 6 | 19620.0 | 445.0 | https://worldofwarcraft.com/en-us/character/us... |
1974 rows × 14 columns
alt.Chart(df_boards).mark_point().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
alt.Y('rating:Q', scale=alt.Scale(zero=False)),
color='board')
alt.Chart(df_boards).mark_point(opacity=0.75).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
alt.Y('ilvl:Q', scale=alt.Scale(zero=False)),
color='board')
alt.Chart(df_boards).mark_point(opacity=0.75, clip=True).encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
alt.Y('ilvl:Q', scale=alt.Scale(domain=(420, 480))),
color='class')
df_boards['num_matches'] = df_boards['wins'] + df_boards['losses']
df_boards['win_ratio'] = df_boards['wins'] / df_boards['num_matches']
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)), color='class')
alt.Chart(df_boards).mark_circle().encode(alt.X('rank:Q', scale=alt.Scale(zero=False)),
alt.Y('win_ratio:Q', scale=alt.Scale(zero=False)),
color='faction')
alt.Chart(df_boards).mark_bar().encode(x='class', y='count()')
alt.Chart(df_boards).mark_bar().encode(
x='class',
y='count()',
color='faction'
)
# how to get these unstacked? column='faction' makes two plots
alt.Chart(df_boards).mark_bar().encode(
x='faction',
y='win_ratio',
)
bar = alt.Chart(df_boards).mark_bar().encode(
alt.Y('mean(rating)', scale=alt.Scale(zero=False)),
x='class'
)
rule = alt.Chart(df_boards).mark_rule(color='red').encode(
y='mean(rating)'
)
(bar + rule).properties(width=500)
df_boards['rank2'] = 1000 - df_boards['rank']
alt.Chart(df_boards).mark_bar().encode(
alt.X('class'),
alt.Y('mean(rank2)', scale=alt.Scale(zero=False)),
alt.Color('board'),
alt.Column('board')
)
alt.Chart(df_boards).transform_fold(
['2v2 arenas', '3v3 arenas', 'battlegrounds'],
as_=['Leaderboard', '# Characters']
).mark_area(
opacity=0.5,
interpolate='step'
).encode(
alt.X('rating:Q', bin=alt.Bin(maxbins=100)),
alt.Y('count()', stack=None),
alt.Color('board:N')
)