标签:8.0 4.0 categorical Python Science cars carb Merc gear
Chapter 5 - Basic Math and Statistics
Segment 4 - Summarizing categorical data using pandas
import numpy as np
import pandas as pd
The basics
address = "~/Data/mtcars.csv"
cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb']
cars.index = cars.car_names
cars.head(15)
car_names | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
car_names | ||||||||||||
Mazda RX4 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
Merc 280C | Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
Merc 450SE | Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
Merc 450SL | Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
Merc 450SLC | Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
Cadillac Fleetwood | Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
carb = cars.carb
carb.value_counts()
4 10
2 10
1 7
3 3
8 1
6 1
Name: carb, dtype: int64
cars_cat = cars[['cyl','vs','am','gear','carb']]
cars_cat.head()
cyl | vs | am | gear | carb | |
---|---|---|---|---|---|
car_names | |||||
Mazda RX4 | 6 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 6 | 0 | 1 | 4 | 4 |
Datsun 710 | 4 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 6 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 8 | 0 | 0 | 3 | 2 |
gears_group = cars_cat.groupby('gear')
gears_group.describe()
cyl | vs | ... | am | carb | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | mean | std | min | 25% | 50% | 75% | max | count | mean | ... | 75% | max | count | mean | std | min | 25% | 50% | 75% | max | |
gear | |||||||||||||||||||||
3 | 15.0 | 7.466667 | 1.187234 | 4.0 | 8.0 | 8.0 | 8.0 | 8.0 | 15.0 | 0.200000 | ... | 0.0 | 0.0 | 15.0 | 2.666667 | 1.175139 | 1.0 | 2.0 | 3.0 | 4.0 | 4.0 |
4 | 12.0 | 4.666667 | 0.984732 | 4.0 | 4.0 | 4.0 | 6.0 | 6.0 | 12.0 | 0.833333 | ... | 1.0 | 1.0 | 12.0 | 2.333333 | 1.302678 | 1.0 | 1.0 | 2.0 | 4.0 | 4.0 |
5 | 5.0 | 6.000000 | 2.000000 | 4.0 | 4.0 | 6.0 | 8.0 | 8.0 | 5.0 | 0.200000 | ... | 1.0 | 1.0 | 5.0 | 4.400000 | 2.607681 | 2.0 | 2.0 | 4.0 | 6.0 | 8.0 |
3 rows × 32 columns
Transforming variables to categorical data type
cars['group'] = pd.Series(cars.gear,dtype="category")
cars['group'].dtypes
CategoricalDtype(categories=[3, 4, 5], ordered=False)
cars['group'].value_counts()
3 15
4 12
5 5
Name: group, dtype: int64
Describing categorical data with crosstabs
pd.crosstab(cars['am'],cars['gear'])
gear | 3 | 4 | 5 |
---|---|---|---|
am | |||
0 | 15 | 4 | 0 |
1 | 0 | 8 | 5 |
标签:8.0,4.0,categorical,Python,Science,cars,carb,Merc,gear 来源: https://www.cnblogs.com/keepmoving1113/p/14284969.html
本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享; 2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关; 3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关; 4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除; 5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。