ICode9

精准搜索请尝试: 精确搜索
首页 > 数据库> 文章详细

基于SQL和pandas的欧洲足球数据分析【附图详解】

2021-07-24 11:59:51  阅读:251  来源: 互联网

标签:loc name season SQL id 附图 team pandas goal


我又又又搬运了以前的存货来分享啦!

项目来源:

https://www.kaggle.com/hugomathien/soccer

项目简介:

利用SQL和pandas对11支球队在7个赛季中的25979场比赛数据,分析各球队在每个赛季的主客场得分情况以及联赛积分情况。

PS: 项目本身提供的是sqlite文件,为了更贴近现实中的工作场景,我将其中的表都输出成csv文件,再利用Navicat for MySQL 导入到MYSQL当中。(文末附资源链接)
本次项目同样在jupyter上运行。

导入模块:

import pymysql
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] # 设置汉字字体,优先使用黑体
plt.rcParams['font.size'] = 12  # 设置字体大小
plt.rcParams['axes.unicode_minus'] = False   # 设置正常显示负号

数据库中的表格:

conn = pymysql.connect(
    host = 'localhost',
    user = 'root',
    password = '',
    db = 'data',
    port = 3306
)
df = pd.read_sql("show tables",conn)
df

在这里插入图片描述
国家名单:

countries = pd.read_sql("""SELECT *
                        FROM Country;""", conn)
countries

在这里插入图片描述
将country表和league表连接起来:

leagues = pd.read_sql("""SELECT *
                        FROM League
                        JOIN Country ON Country.id = League.country_id;""", conn)
leagues

在这里插入图片描述
按队名排序中的前十名:

teams = pd.read_sql("""SELECT *
                        FROM Team
                        ORDER BY team_long_name
                        LIMIT 10;""", conn)
teams

在这里插入图片描述
输出spain主客队比赛的信息:

detailed_matches = pd.read_sql("""SELECT Match.id, 
                                        Country.name AS country_name, 
                                        League.name AS league_name, 
                                        season, 
                                        stage, 
                                        date,
                                        HT.team_long_name AS  home_team,
                                        AT.team_long_name AS away_team,
                                        home_team_goal, 
                                        away_team_goal                                        
                                FROM `match`
                                JOIN Country on Country.id = Match.country_id
                                JOIN League on League.id = Match.league_id
                                LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
                                LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
                                WHERE country.name = 'Spain'
                                ORDER by date
                                LIMIT 10;""", conn)
detailed_matches

在这里插入图片描述
统计各个国家的各个联赛的各个赛季中stage大于10的球队主客队平均得分,主客队平均分之和与差,以及总和:

leages_by_season = pd.read_sql("""SELECT Country.name AS country_name, 
                                        League.name AS league_name, 
                                        season,
                                        count(distinct stage) AS number_of_stages,
                                        count(distinct HT.team_long_name) AS number_of_teams,
                                        avg(home_team_goal) AS avg_home_team_scors, 
                                        avg(away_team_goal) AS avg_away_team_goals, 
                                        avg(home_team_goal-away_team_goal) AS avg_goal_dif, 
                                        avg(home_team_goal+away_team_goal) AS avg_goals, 
                                        sum(home_team_goal+away_team_goal) AS total_goals                                       
                                FROM `match`
                                JOIN Country on Country.id = Match.country_id
                                JOIN League on League.id = Match.league_id
                                LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
                                LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
                                WHERE country.name in ('Spain', 'Germany', 'France', 'Italy', 'England')
                                GROUP BY Country.name, League.name, season
                                HAVING count(distinct stage) > 10
                                ORDER BY Country.name, League.name, season DESC
                                ;""", conn)
leages_by_season.head(10)

在这里插入图片描述
列出几支球队的各赛季平均得分趋势图:

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goals'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goals'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goals'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goals'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goals'])

df.plot(figsize=(12,5),title='各赛季平均得分趋势图')

在这里插入图片描述
列出几支球队的各赛季平均主客队分差趋势图:

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goal_dif'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goal_dif'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goal_dif'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goal_dif'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goal_dif'])

df.plot(figsize=(12,5),title='各赛季平均主客队分差趋势图')

在这里插入图片描述
求各赛季各球队的积分(赢球得3分,平局得1分,输球不得分):

team_season_score = pd.read_sql("""SELECT 
		m.season,
    t.team_long_name,
		t.team_short_name,	
        t.team_api_id,
    COUNT(*) AS 'matches_played', 
    SUM(
        CASE 
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal>m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal<m.away_team_goal) THEN 3
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal=m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal=m.away_team_goal) THEN 1
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal<m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal>m.away_team_goal) THEN 0
        END
    ) AS 'points', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal)) AS 'goal_for', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_against', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal))-SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_diff'
FROM Team t JOIN `match` m ON t.team_api_id=m.home_team_api_id OR t.team_api_id=m.away_team_api_id
WHERE t.team_api_id in (10260, 8634, 8650, 9904, 8636)
GROUP BY team_api_id, season
ORDER BY season, points DESC, goal_diff DESC, team_long_name;""", conn)
team_season_score.head(10)

在这里插入图片描述
列出几支球队的各赛季积分趋势图:

df = pd.DataFrame(index=np.sort(team_season_score['season'].unique()), columns=team_season_score['team_short_name'].unique())

df.loc[:,'MUN'] = list(team_season_score.loc[team_season_score['team_short_name']=='MUN','points'])
df.loc[:,'BAR'] = list(team_season_score.loc[team_season_score['team_short_name']=='BAR','points'])
df.loc[:,'LIV'] = list(team_season_score.loc[team_season_score['team_short_name']=='LIV','points'])
df.loc[:,'HAN'] = list(team_season_score.loc[team_season_score['team_short_name']=='HAN','points'])
df.loc[:,'INT'] = list(team_season_score.loc[team_season_score['team_short_name']=='INT','points'])

df.plot(figsize=(12,5),title='各赛季积分趋势图')

在这里插入图片描述
结束语

此次项目的数据提供了不少数据,就连球员的EA SPORT游戏中的数据都有,不过这个项目我主要也是拿来练手,并没有刻意去将全部的数据都拿来分析,大概感兴趣的话也可以自己尝试一下。
我把最初kaggle提供的sqlite文件,后面我自己输出的csv文件和sql文件以及本次完整代码都放到了网盘上,需要的朋友自行下载:
链接:https://pan.baidu.com/s/1CbrdX0q18W9v-PgNyZfRPg
提取码:1024

标签:loc,name,season,SQL,id,附图,team,pandas,goal
来源: https://blog.csdn.net/qq_44186838/article/details/119054928

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有