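The dataset URLs given in the exercises below may no longer work. The datasets at the following address were tested and do work; if they become unavailable, search for them online. https://github.com/daacheng/PythonBasic/tree/master/dataset
Code: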
import pandas as pd
The dataset URL from the original exercise may not be reachable; a VPN or proxy may be needed.
Code:
chipo = pd.read_csv('chipotle.csv', sep=',')
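Code:
# Task: count the products whose unit price exceeds $10
# Clean item_price (drop the leading '$' and the trailing character) and convert it to float
prices = [float(value[1:-1]) for value in chipo.item_price]
# Assign the cleaned prices back to the column
chipo.item_price = prices
# Drop rows that duplicate an earlier (item_name, quantity) pair
'''
drop_duplicates(self, subset=None, keep="first", inplace=False)
subset: column label or sequence of labels used to identify duplicate rows. By default all columns are used.
keep: one of {'first', 'last', False}, default 'first'. With 'first', every duplicate except the first occurrence is dropped.
With 'last', every duplicate except the last occurrence is dropped. With False, all duplicated rows are dropped.
inplace: if True, modify the source DataFrame and return None. By default the source DataFrame is left unchanged and a new DataFrame is returned.
'''
chipo_filtered = chipo.drop_duplicates(['item_name', 'quantity'])
# Keep only the rows where quantity equals 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
# item_name.nunique() returns the number of distinct values in the item_name column
chipo_one_prod[chipo_one_prod['item_price']>10].item_name.nunique()
Output: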
12
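To make the `keep` parameter described above concrete, here is a minimal sketch. The `orders` frame is hypothetical toy data, not rows taken from chipotle.csv.

```python
import pandas as pd

# Toy data (hypothetical, not taken from chipotle.csv)
orders = pd.DataFrame({
    "item_name": ["Bowl", "Bowl", "Bowl", "Chips"],
    "quantity": [1, 1, 2, 1],
    "item_price": [8.49, 10.98, 16.98, 2.15],
})

cols = ["item_name", "quantity"]

# Rows 0 and 1 share the same (item_name, quantity) pair
first = orders.drop_duplicates(subset=cols, keep="first")        # keeps rows 0, 2, 3
last = orders.drop_duplicates(subset=cols, keep="last")          # keeps rows 1, 2, 3
unique_only = orders.drop_duplicates(subset=cols, keep=False)    # keeps rows 2, 3

print(first.index.tolist())
print(last.index.tolist())
print(unique_only.index.tolist())
```

Note that with `keep="first"` the price that survives for ("Bowl", 1) depends on row order, which is worth keeping in mind when reading the result above.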
Code:
# Print the unit price of each item, showing only item_name and item_price
# delete the duplicates in item_name and quantity
chipo_filtered = chipo.drop_duplicates(['item_name','quantity'])
# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]
# select only the products with quantity equals to 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
# select only the item_name and item_price columns
price_per_item = chipo_one_prod[['item_name', 'item_price']]
print(price_per_item)
# sort the values from the most to less expensive
# price_per_item.sort_values(by = "item_price", ascending = False).head(20)
Output:
item_name item_price
0 Chips and Fresh Tomato Salsa 2.39
1 Izze 3.39
2 Nantucket Nectar 3.39
3 Chips and Tomatillo-Green Chili Salsa 2.39
5 Chicken Bowl 10.98
6 Side of Chips 1.69
7 Steak Burrito 11.75
8 Steak Soft Tacos 9.25
10 Chips and Guacamole 4.45
11 Chicken Crispy Tacos 8.75
12 Chicken Soft Tacos 8.75
16 Chicken Burrito 8.49
21 Barbacoa Burrito 8.99
27 Carnitas Burrito 8.99
28 Canned Soda 1.09
33 Carnitas Bowl 8.99
34 Bottled Water 1.09
38 Chips and Tomatillo Green Chili Salsa 2.95
39 Barbacoa Bowl 11.75
40 Chips 2.15
44 Chicken Salad Bowl 8.75
54 Steak Bowl 8.99
56 Barbacoa Soft Tacos 9.25
57 Veggie Burrito 11.25
62 Veggie Bowl 11.25
92 Steak Crispy Tacos 9.25
111 Chips and Tomatillo Red Chili Salsa 2.95
168 Barbacoa Crispy Tacos 11.75
186 Veggie Salad Bowl 11.25
191 Chips and Roasted Chili-Corn Salsa 2.39
233 Chips and Roasted Chili Corn Salsa 2.95
237 Carnitas Soft Tacos 9.25
250 Chicken Salad 10.98
263 Canned Soft Drink 1.25
298 6 Pack Soft Drink 6.49
300 Chips and Tomatillo-Red Chili Salsa 2.39
510 Burrito 7.40
520 Crispy Tacos 7.40
554 Carnitas Crispy Tacos 9.25
606 Steak Salad Bowl 11.89
664 Steak Salad 8.99
673 Bowl 7.40
674 Chips and Mild Fresh Tomato Salsa 3.00
738 Veggie Soft Tacos 11.25
1132 Carnitas Salad Bowl 11.89
1229 Barbacoa Salad Bowl 11.89
1414 Salad 7.40
1653 Veggie Crispy Tacos 8.49
1694 Veggie Salad 8.49
3750 Carnitas Salad 8.99
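The approach above keeps only the first quantity-1 row of each item, so the price it reports depends on row order (the sorted output later in this post shows Veggie Salad Bowl at both 8.75 and 11.25). One possible alternative, sketched here on hypothetical toy rows, is to divide `item_price` by `quantity` and aggregate per item. The column name `unit_price` is an assumption introduced for illustration.

```python
import pandas as pd

# Toy rows (hypothetical) shaped like chipotle.csv before cleaning
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Chicken Bowl", "Canned Soda"],
    "quantity": [1, 2, 1, 2],
    "item_price": ["$8.49 ", "$21.96 ", "$10.98 ", "$2.18 "],
})

# Strip surrounding whitespace and the dollar sign, then convert to float
orders["item_price"] = (
    orders["item_price"].str.strip().str.lstrip("$").astype(float)
)

# Unit price for every row, whatever the quantity
orders["unit_price"] = orders["item_price"] / orders["quantity"]

# Highest unit price observed for each item
max_unit_price = orders.groupby("item_name")["unit_price"].max()

# Items whose highest unit price exceeds 10
expensive = max_unit_price[max_unit_price > 10]
print(expensive.index.tolist())
```

Whether the maximum, minimum, or most common unit price is the right summary depends on how the exercise defines "price per item"; the maximum is only one choice.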
Code:
chipo.sort_values(by='item_name')
# chipo.item_name.sort_values()
Output:
| | Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|---|
3389 | 3389 | 1360 | 2 | 6 Pack Soft Drink | [Diet Coke] | 12.98 |
341 | 341 | 148 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1849 | 1849 | 749 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
1860 | 1860 | 754 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
2713 | 2713 | 1076 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3422 | 3422 | 1373 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
553 | 553 | 230 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1916 | 1916 | 774 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1922 | 1922 | 776 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
1937 | 1937 | 784 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3836 | 3836 | 1537 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
298 | 298 | 129 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
1976 | 1976 | 798 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1167 | 1167 | 481 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3875 | 3875 | 1554 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1124 | 1124 | 465 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3886 | 3886 | 1558 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
2108 | 2108 | 849 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3010 | 3010 | 1196 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4535 | 4535 | 1803 | 1 | 6 Pack Soft Drink | [Lemonade] | 6.49 |
4169 | 4169 | 1664 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4174 | 4174 | 1666 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
4527 | 4527 | 1800 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4522 | 4522 | 1798 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3806 | 3806 | 1525 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
2389 | 2389 | 949 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3132 | 3132 | 1248 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3141 | 3141 | 1253 | 1 | 6 Pack Soft Drink | [Lemonade] | 6.49 |
639 | 639 | 264 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1026 | 1026 | 422 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
... | ... | ... | ... | ... | ... | ... |
2996 | 2996 | 1192 | 1 | Veggie Salad | [Roasted Chili Corn Salsa (Medium), [Black Bea... | 8.49 |
3163 | 3163 | 1263 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
4084 | 4084 | 1635 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
1694 | 1694 | 686 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
2756 | 2756 | 1094 | 1 | Veggie Salad | [[Tomatillo-Green Chili Salsa (Medium), Roaste... | 8.49 |
4201 | 4201 | 1677 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Black... | 11.25 |
1884 | 1884 | 760 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
455 | 455 | 195 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
3223 | 3223 | 1289 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
2223 | 2223 | 896 | 1 | Veggie Salad Bowl | [Roasted Chili Corn Salsa, Fajita Vegetables] | 8.75 |
2269 | 2269 | 913 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 8.75 |
4541 | 4541 | 1805 | 1 | Veggie Salad Bowl | [Tomatillo Green Chili Salsa, [Fajita Vegetabl... | 8.75 |
3293 | 3293 | 1321 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Rice, Black Beans, Chees... | 8.75 |
186 | 186 | 83 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
960 | 960 | 394 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 8.75 |
1316 | 1316 | 536 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 8.75 |
2156 | 2156 | 869 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
4261 | 4261 | 1700 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
295 | 295 | 128 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 11.25 |
4573 | 4573 | 1818 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | 8.75 |
2683 | 2683 | 1066 | 1 | Veggie Salad Bowl | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | 8.75 |
496 | 496 | 207 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Rice, Lettuce, Guacamole... | 11.25 |
4109 | 4109 | 1646 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
738 | 738 | 304 | 1 | Veggie Soft Tacos | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
3889 | 3889 | 1559 | 2 | Veggie Soft Tacos | [Fresh Tomato Salsa (Mild), [Black Beans, Rice... | 16.98 |
2384 | 2384 | 948 | 1 | Veggie Soft Tacos | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | 8.75 |
781 | 781 | 322 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa, [Black Beans, Cheese, Sou... | 8.75 |
2851 | 2851 | 1132 | 1 | Veggie Soft Tacos | [Roasted Chili Corn Salsa (Medium), [Black Bea... | 8.49 |
1699 | 1699 | 688 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
1395 | 1395 | 567 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa (Mild), [Pinto Beans, Rice... | 8.49 |
4622 rows × 6 columns
Code:
chipo.sort_values(by='item_price', ascending=False).head(1)
Output:
| | Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|---|
3598 | 3598 | 1443 | 15 | Chips and Fresh Tomato Salsa | NaN | 44.25 |
Code:
chipo_salad = chipo[chipo.item_name == 'Veggie Salad Bowl']
len(chipo_salad)
# or chipo_salad.shape[0]
Output:
18
Code:
# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]
chipo_soda = chipo[(chipo.item_name == 'Canned Soda') & (chipo.quantity>1)]
len(chipo_soda)
# or print(chipo_soda.shape[0])
Output:
20
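The filter above combines two conditions with `&`. A minimal sketch of the same pattern on hypothetical toy data, together with the equivalent `DataFrame.query` call:

```python
import pandas as pd

# Toy data (hypothetical)
orders = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Chips", "Canned Soda"],
    "quantity": [1, 2, 3, 4],
})

# Each comparison needs its own parentheses because & binds tighter than == and >
mask = (orders["item_name"] == "Canned Soda") & (orders["quantity"] > 1)

# Summing a boolean mask counts the True values
print(mask.sum())

# DataFrame.query expresses the same filter as a string
same = orders.query("item_name == 'Canned Soda' and quantity > 1")
print(len(same))
```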
This time we are going to pull data directly from the internet.
Code:
import pandas as pd
Code:
euro12 = pd.read_csv('Euro_2012_stats_TEAM.csv', sep=',')
euro12
Output:
| | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 | 0 | 0 | 0 | ... | 13 | 81.3% | 41 | 62 | 2 | 9 | 0 | 9 | 9 | 16 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 | 0 | 0 | 0 | ... | 9 | 60.1% | 53 | 73 | 8 | 7 | 0 | 11 | 11 | 19 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 | 1 | 0 | 0 | ... | 10 | 66.7% | 25 | 38 | 8 | 4 | 0 | 7 | 7 | 15 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 | 0 | 0 | 0 | ... | 22 | 88.1% | 43 | 45 | 6 | 5 | 0 | 11 | 11 | 16 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 | 1 | 0 | 0 | ... | 6 | 54.6% | 36 | 51 | 5 | 6 | 0 | 11 | 11 | 19 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 | 12 | 12 | 20 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 | 2 | 0 | 0 | ... | 20 | 74.1% | 101 | 89 | 16 | 16 | 0 | 18 | 18 | 19 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 | 2 | 0 | 0 | ... | 12 | 70.6% | 35 | 30 | 3 | 5 | 0 | 7 | 7 | 15 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 | 0 | 0 | 0 | ... | 6 | 66.7% | 48 | 56 | 3 | 7 | 1 | 7 | 7 | 17 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 | 6 | 0 | 0 | ... | 10 | 71.5% | 73 | 90 | 10 | 12 | 0 | 14 | 14 | 16 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 | 0 | 0 | 0 | ... | 17 | 65.4% | 43 | 51 | 11 | 6 | 1 | 10 | 10 | 17 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 | 2 | 0 | 0 | ... | 10 | 77.0% | 34 | 43 | 4 | 6 | 0 | 7 | 7 | 16 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 | 17 | 17 | 18 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 | 3 | 0 | 0 | ... | 8 | 61.6% | 35 | 51 | 7 | 7 | 0 | 9 | 9 | 18 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 | 0 | 0 | 0 | ... | 13 | 76.5% | 48 | 31 | 4 | 5 | 0 | 9 | 9 | 18 |
16 rows × 35 columns
Code:
euro12.Goals
# or euro12['Goals']
Output:
0 4
1 4
2 4
3 5
4 3
5 10
6 5
7 6
8 2
9 2
10 6
11 1
12 5
13 12
14 5
15 2
Name: Goals, dtype: int64
Code:
euro12.shape[0]
# or len(euro12.Team)
Output:
16
Code:
# euro12.columns.shape[0]
euro12.info()
Output:
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team 16 non-null object
Goals 16 non-null int64
Shots on target 16 non-null int64
Shots off target 16 non-null int64
Shooting Accuracy 16 non-null object
% Goals-to-shots 16 non-null object
Total shots (inc. Blocked) 16 non-null int64
Hit Woodwork 16 non-null int64
Penalty goals 16 non-null int64
Penalties not scored 16 non-null int64
Headed goals 16 non-null int64
Passes 16 non-null int64
Passes completed 16 non-null int64
Passing Accuracy 16 non-null object
Touches 16 non-null int64
Crosses 16 non-null int64
Dribbles 16 non-null int64
Corners Taken 16 non-null int64
Tackles 16 non-null int64
Clearances 16 non-null int64
Interceptions 16 non-null int64
Clearances off line 15 non-null float64
Clean Sheets 16 non-null int64
Blocks 16 non-null int64
Goals conceded 16 non-null int64
Saves made 16 non-null int64
Saves-to-shots ratio 16 non-null object
Fouls Won 16 non-null int64
Fouls Conceded 16 non-null int64
Offsides 16 non-null int64
Yellow Cards 16 non-null int64
Red Cards 16 non-null int64
Subs on 16 non-null int64
Subs off 16 non-null int64
Players Used 16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
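`info()` prints a summary rather than returning a number. When only the counts are needed, `shape` is a direct alternative. A small sketch on a hypothetical three-column frame:

```python
import pandas as pd

# Toy frame (hypothetical), standing in for euro12
stats = pd.DataFrame({
    "Team": ["Spain", "Italy"],
    "Goals": [12, 6],
    "Red Cards": [0, 0],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = stats.shape
print(n_rows, n_cols)

# len(stats.columns) gives the column count as well
print(len(stats.columns))
```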
Code:
discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline
Output:
| | Team | Yellow Cards | Red Cards |
---|---|---|---|
0 | Croatia | 9 | 0 |
1 | Czech Republic | 7 | 0 |
2 | Denmark | 4 | 0 |
3 | England | 5 | 0 |
4 | France | 6 | 0 |
5 | Germany | 4 | 0 |
6 | Greece | 9 | 1 |
7 | Italy | 16 | 0 |
8 | Netherlands | 5 | 0 |
9 | Poland | 7 | 1 |
10 | Portugal | 12 | 0 |
11 | Republic of Ireland | 6 | 1 |
12 | Russia | 6 | 0 |
13 | Spain | 11 | 0 |
14 | Sweden | 7 | 0 |
15 | Ukraine | 5 | 0 |
Code:
# Sort the teams by red cards, then by yellow cards (both descending)
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)
Output:
| | Team | Yellow Cards | Red Cards |
---|---|---|---|
6 | Greece | 9 | 1 |
9 | Poland | 7 | 1 |
11 | Republic of Ireland | 6 | 1 |
7 | Italy | 16 | 0 |
10 | Portugal | 12 | 0 |
13 | Spain | 11 | 0 |
0 | Croatia | 9 | 0 |
1 | Czech Republic | 7 | 0 |
14 | Sweden | 7 | 0 |
4 | France | 6 | 0 |
12 | Russia | 6 | 0 |
3 | England | 5 | 0 |
8 | Netherlands | 5 | 0 |
15 | Ukraine | 5 | 0 |
2 | Denmark | 4 | 0 |
5 | Germany | 4 | 0 |
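`ascending` also accepts one boolean per sort key, which allows a different direction for each column. A minimal sketch using four rows copied from the discipline table; the variable names are assumptions for illustration.

```python
import pandas as pd

# Four rows copied from the discipline table
cards = pd.DataFrame({
    "Team": ["Greece", "Poland", "Italy", "Germany"],
    "Yellow Cards": [9, 7, 16, 4],
    "Red Cards": [1, 1, 0, 0],
})

# Most red cards first; ties broken by the fewest yellow cards
ranked = cards.sort_values(
    ["Red Cards", "Yellow Cards"], ascending=[False, True]
)
print(ranked["Team"].tolist())
```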
Code:
# Compute the mean number of yellow cards per team, rounded to the nearest integer
round(discipline['Yellow Cards'].mean())
Output:
7
Code:
# Select the teams that scored more than 6 goals
euro12[euro12['Goals']>6]
# euro12[euro12.Goals>6]
Output:
| | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 | 17 | 17 | 18 |
2 rows × 35 columns
Code:
# Select the teams whose names start with G
euro12[euro12.Team.str.startswith('G')]
Output:
| | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 | 12 | 12 | 20 |
2 rows × 35 columns
Code:
# Select the first seven columns
euro12.iloc[:, 0:7]
Output:
| | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) |
---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 |
Code:
# Select all columns except the last three
euro12.iloc[:, 0:-3]
Output:
| | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Clean Sheets | Blocks | Goals conceded | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 | 0 | 0 | 0 | ... | 0 | 10 | 3 | 13 | 81.3% | 41 | 62 | 2 | 9 | 0 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 | 0 | 0 | 0 | ... | 1 | 10 | 6 | 9 | 60.1% | 53 | 73 | 8 | 7 | 0 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 | 1 | 0 | 0 | ... | 1 | 10 | 5 | 10 | 66.7% | 25 | 38 | 8 | 4 | 0 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 | 0 | 0 | 0 | ... | 2 | 29 | 3 | 22 | 88.1% | 43 | 45 | 6 | 5 | 0 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 | 1 | 0 | 0 | ... | 1 | 7 | 5 | 6 | 54.6% | 36 | 51 | 5 | 6 | 0 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 1 | 11 | 6 | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 1 | 23 | 7 | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 | 2 | 0 | 0 | ... | 2 | 18 | 7 | 20 | 74.1% | 101 | 89 | 16 | 16 | 0 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 | 2 | 0 | 0 | ... | 0 | 9 | 5 | 12 | 70.6% | 35 | 30 | 3 | 5 | 0 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 | 0 | 0 | 0 | ... | 0 | 8 | 3 | 6 | 66.7% | 48 | 56 | 3 | 7 | 1 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 | 6 | 0 | 0 | ... | 2 | 11 | 4 | 10 | 71.5% | 73 | 90 | 10 | 12 | 0 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 | 0 | 0 | 0 | ... | 0 | 23 | 9 | 17 | 65.4% | 43 | 51 | 11 | 6 | 1 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 | 2 | 0 | 0 | ... | 0 | 8 | 3 | 10 | 77.0% | 34 | 43 | 4 | 6 | 0 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 5 | 8 | 1 | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 | 3 | 0 | 0 | ... | 1 | 12 | 5 | 8 | 61.6% | 35 | 51 | 7 | 7 | 0 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 | 0 | 0 | 0 | ... | 0 | 4 | 4 | 13 | 76.5% | 48 | 31 | 4 | 5 | 0 |
16 rows × 32 columns
Code:
# Show only the Shooting Accuracy of three teams: England, Italy and Russia
euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]
Output:
| | Team | Shooting Accuracy |
---|---|---|
3 | England | 50.0% |
7 | Italy | 43.0% |
12 | Russia | 22.5% |
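The `info()` output earlier lists Shooting Accuracy as `object`, meaning the percentages are stored as strings. To compare or sort them numerically they have to be converted first. A minimal sketch using the three values shown above; the new column name is an assumption for illustration.

```python
import pandas as pd

# The three rows shown above
accuracy = pd.DataFrame({
    "Team": ["England", "Italy", "Russia"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%"],
})

# Drop the trailing percent sign and convert to float
accuracy["Shooting Accuracy (%)"] = (
    accuracy["Shooting Accuracy"].str.rstrip("%").astype(float)
)

# idxmax returns the index label of the largest value
best = accuracy.loc[accuracy["Shooting Accuracy (%)"].idxmax(), "Team"]
print(best)
```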
This exercise was inspired by this page
Code:
import pandas as pd
Code:
# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}
Code:
army = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters', 'origin'])
Code:
army = army.set_index('origin')
army
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
California | Nighthawks | 1st | 52 | 42 | 957 | 5 | 2 | 0 | 24 |
Texas | Nighthawks | 2nd | 25 | 2 | 1099 | 62 | 3 | 1 | 31 |
Florida | Nighthawks | 2nd | 616 | 2 | 1400 | 26 | 3 | 1 | 2 |
Maine | Dragoons | 1st | 43 | 4 | 1592 | 73 | 2 | 0 | 3 |
Iowa | Dragoons | 1st | 234 | 7 | 1006 | 37 | 1 | 1 | 4 |
Alaska | Dragoons | 2nd | 523 | 8 | 987 | 949 | 2 | 0 | 24 |
Washington | Dragoons | 2nd | 62 | 3 | 849 | 48 | 3 | 1 | 31 |
Oregon | Scouts | 1st | 62 | 4 | 973 | 48 | 2 | 0 | 2 |
Wyoming | Scouts | 1st | 73 | 7 | 1005 | 435 | 1 | 0 | 3 |
Louisana | Scouts | 2nd | 37 | 8 | 1099 | 63 | 2 | 1 | 2 |
Georgia | Scouts | 2nd | 35 | 9 | 1523 | 345 | 3 | 1 | 3 |
Code:
army.veterans
# army['veterans']
Output:
origin
Arizona 1
California 5
Texas 62
Florida 26
Maine 73
Iowa 37
Alaska 949
Washington 48
Oregon 48
Wyoming 435
Louisana 63
Georgia 345
Name: veterans, dtype: int64
Code:
army[['veterans', 'deaths']]
Output:
origin | veterans | deaths |
---|---|---|
Arizona | 1 | 523 |
California | 5 | 52 |
Texas | 62 | 25 |
Florida | 26 | 616 |
Maine | 73 | 43 |
Iowa | 37 | 234 |
Alaska | 949 | 523 |
Washington | 48 | 62 |
Oregon | 48 | 62 |
Wyoming | 435 | 73 |
Louisana | 63 | 37 |
Georgia | 345 | 35 |
Code:
army.columns
Output:
Index(['regiment', 'company', 'deaths', 'battles', 'size', 'veterans',
'readiness', 'armored', 'deserters'],
dtype='object')
Code:
army.loc[['Maine', 'Alaska'], ['deaths', 'size', 'deserters']]
Output:
origin | deaths | size | deserters |
---|---|---|---|
Maine | 43 | 1592 | 3 |
Alaska | 523 | 987 | 24 |
Code:
army.iloc[3:7, 3:6]
Output:
origin | battles | size | veterans |
---|---|---|---|
Florida | 2 | 1400 | 26 |
Maine | 4 | 1592 | 73 |
Iowa | 7 | 1006 | 37 |
Alaska | 8 | 987 | 949 |
Code:
army.iloc[3:]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Florida | Nighthawks | 2nd | 616 | 2 | 1400 | 26 | 3 | 1 | 2 |
Maine | Dragoons | 1st | 43 | 4 | 1592 | 73 | 2 | 0 | 3 |
Iowa | Dragoons | 1st | 234 | 7 | 1006 | 37 | 1 | 1 | 4 |
Alaska | Dragoons | 2nd | 523 | 8 | 987 | 949 | 2 | 0 | 24 |
Washington | Dragoons | 2nd | 62 | 3 | 849 | 48 | 3 | 1 | 31 |
Oregon | Scouts | 1st | 62 | 4 | 973 | 48 | 2 | 0 | 2 |
Wyoming | Scouts | 1st | 73 | 7 | 1005 | 435 | 1 | 0 | 3 |
Louisana | Scouts | 2nd | 37 | 8 | 1099 | 63 | 2 | 1 | 2 |
Georgia | Scouts | 2nd | 35 | 9 | 1523 | 345 | 3 | 1 | 3 |
Code:
# Select every row before the 4th row, i.e. the first three rows
army.iloc[:3]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
California | Nighthawks | 1st | 52 | 42 | 957 | 5 | 2 | 0 | 24 |
Texas | Nighthawks | 2nd | 25 | 2 | 1099 | 62 | 3 | 1 | 31 |
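The slice `iloc[:3]` stops before position 3, which is why only three rows come back. Label slices with `loc` behave differently: the end label is included. A small sketch on a hypothetical four-row subset of the army data:

```python
import pandas as pd

# Four rows taken from the army table, deaths column only
army_small = pd.DataFrame(
    {"deaths": [523, 52, 25, 616]},
    index=pd.Index(["Arizona", "California", "Texas", "Florida"], name="origin"),
)

by_position = army_small.iloc[:3]      # positions 0, 1, 2 (end excluded)
by_label = army_small.loc[:"Texas"]    # Arizona through Texas (end included)
first_four = army_small.iloc[:4]       # use :4 if the 4th row should be included

print(by_position.index.tolist())
print(by_label.index.tolist())
print(len(first_four))
```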
Code:
army.iloc[: , 4:7]
Output:
origin | size | veterans | readiness |
---|---|---|---|
Arizona | 1045 | 1 | 1 |
California | 957 | 5 | 2 |
Texas | 1099 | 62 | 3 |
Florida | 1400 | 26 | 3 |
Maine | 1592 | 73 | 2 |
Iowa | 1006 | 37 | 1 |
Alaska | 987 | 949 | 2 |
Washington | 849 | 48 | 3 |
Oregon | 973 | 48 | 2 |
Wyoming | 1005 | 435 | 1 |
Louisana | 1099 | 63 | 2 |
Georgia | 1523 | 345 | 3 |
Code:
army[army['deaths']>50]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
California | Nighthawks | 1st | 52 | 42 | 957 | 5 | 2 | 0 | 24 |
Florida | Nighthawks | 2nd | 616 | 2 | 1400 | 26 | 3 | 1 | 2 |
Iowa | Dragoons | 1st | 234 | 7 | 1006 | 37 | 1 | 1 | 4 |
Alaska | Dragoons | 2nd | 523 | 8 | 987 | 949 | 2 | 0 | 24 |
Washington | Dragoons | 2nd | 62 | 3 | 849 | 48 | 3 | 1 | 31 |
Oregon | Scouts | 1st | 62 | 4 | 973 | 48 | 2 | 0 | 2 |
Wyoming | Scouts | 1st | 73 | 7 | 1005 | 435 | 1 | 0 | 3 |
Code:
army[(army['deaths']>500) | (army['deaths']<50)]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
Texas | Nighthawks | 2nd | 25 | 2 | 1099 | 62 | 3 | 1 | 31 |
Florida | Nighthawks | 2nd | 616 | 2 | 1400 | 26 | 3 | 1 | 2 |
Maine | Dragoons | 1st | 43 | 4 | 1592 | 73 | 2 | 0 | 3 |
Alaska | Dragoons | 2nd | 523 | 8 | 987 | 949 | 2 | 0 | 24 |
Louisana | Scouts | 2nd | 37 | 8 | 1099 | 63 | 2 | 1 | 2 |
Georgia | Scouts | 2nd | 35 | 9 | 1523 | 345 | 3 | 1 | 3 |
Code:
army[(army.regiment != 'Dragoons')]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
California | Nighthawks | 1st | 52 | 42 | 957 | 5 | 2 | 0 | 24 |
Texas | Nighthawks | 2nd | 25 | 2 | 1099 | 62 | 3 | 1 | 31 |
Florida | Nighthawks | 2nd | 616 | 2 | 1400 | 26 | 3 | 1 | 2 |
Oregon | Scouts | 1st | 62 | 4 | 973 | 48 | 2 | 0 | 2 |
Wyoming | Scouts | 1st | 73 | 7 | 1005 | 435 | 1 | 0 | 3 |
Louisana | Scouts | 2nd | 37 | 8 | 1099 | 63 | 2 | 1 | 2 |
Georgia | Scouts | 2nd | 35 | 9 | 1523 | 345 | 3 | 1 | 3 |
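The last few filters use `>`, `|` and `!=`. To exclude several values at once, `isin` can be negated with `~`. A minimal sketch on hypothetical toy rows:

```python
import pandas as pd

# Toy rows (hypothetical), modelled on the army table
units = pd.DataFrame({
    "regiment": ["Nighthawks", "Dragoons", "Scouts", "Dragoons"],
    "deaths": [523, 43, 62, 523],
})

# Exclude a single value
not_dragoons = units[units["regiment"] != "Dragoons"]

# Exclude several values: ~ inverts the boolean mask from isin
neither = units[~units["regiment"].isin(["Dragoons", "Scouts"])]

# | means "or"; each condition keeps its own parentheses
extremes = units[(units["deaths"] > 500) | (units["deaths"] < 50)]

print(not_dragoons["regiment"].tolist())
print(neither["regiment"].tolist())
print(extremes.index.tolist())
```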
Code:
army.loc[['Texas', 'Arizona']]
Output:
origin | regiment | company | deaths | battles | size | veterans | readiness | armored | deserters |
---|---|---|---|---|---|---|---|---|---|
Texas | Nighthawks | 2nd | 25 | 2 | 1099 | 62 | 3 | 1 | 31 |
Arizona | Nighthawks | 1st | 523 | 5 | 1045 | 1 | 1 | 1 | 4 |
Code:
army.loc[['Arizona'], ['deaths']]
# OR army.iloc[[0], [army.columns.get_loc('deaths')]]
Output:
origin | deaths |
---|---|
Arizona | 523 |
Code:
# army.loc['Texas', 'deaths'] returns the scalar value
# OR army.deaths.iloc[2]
# OR, to get a one-row Series:
army.iloc[[2], army.columns.get_loc('deaths')]
Output:
origin
Texas 25
Name: deaths, dtype: int64
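The alternatives in the last two exercises do not all return the same type: whether the row and column selectors are scalars or lists decides whether the result is a single value, a Series, or a DataFrame. A small sketch on a hypothetical three-row subset of the army data:

```python
import pandas as pd

# Three rows taken from the army table
army_small = pd.DataFrame(
    {"deaths": [523, 52, 25], "size": [1045, 957, 1099]},
    index=pd.Index(["Arizona", "California", "Texas"], name="origin"),
)

scalar = army_small.loc["Texas", "deaths"]        # scalar + scalar -> single value
series = army_small.loc[["Texas"], "deaths"]      # list + scalar   -> Series
frame = army_small.loc[["Texas"], ["deaths"]]     # list + list     -> DataFrame
via_at = army_small.at["Texas", "deaths"]         # .at: label-based scalar access
by_position = army_small["deaths"].iloc[2]        # explicit positional access

print(type(scalar).__name__, type(series).__name__, type(frame).__name__)
```

Using `.iloc[2]` rather than `[2]` on a Series with a string index makes the positional intent explicit.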
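TSV and CSV files
TSV: tab-separated values.
CSV: comma-separated values (the more common of the two).
Differences between TSV and CSV:
1) As the names suggest, TSV uses the tab character ('\t') as the field delimiter, while CSV uses the ASCII comma (',').
2) The standard TSV format registered with IANA does not allow tab characters inside field values.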
Using the read_csv() and read_table() functions
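A minimal sketch of how the two functions relate: `read_csv` defaults to a comma delimiter, `read_table` defaults to a tab, and `read_csv` with `sep='\t'` reads the same TSV data. The inline strings are hypothetical stand-ins for files on disk.

```python
import io

import pandas as pd

# The same toy data in CSV and TSV form (hypothetical)
csv_text = "item_name,quantity\nChips,1\nCanned Soda,2\n"
tsv_text = "item_name\tquantity\nChips\t1\nCanned Soda\t2\n"

from_csv = pd.read_csv(io.StringIO(csv_text))             # default sep=','
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")   # explicit tab delimiter
from_table = pd.read_table(io.StringIO(tsv_text))         # default sep='\t'

print(from_csv.equals(from_tsv), from_tsv.equals(from_table))
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.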
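These exercises rely heavily on iloc and loc. If you have forgotten how they work, see the earlier post, pandas真入门(2).
That is all for today's exercises. Data analysis takes a lot of practice. This is the foundation, and things only get harder later once business logic is involved, so make sure your basics are solid. My follower count also jumped quite a bit today, which caught me off guard, but thank you all for following; let's keep learning and improving together. Haha! I hope everyone makes good use of the summer break and keeps studying hard!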