Section 2: Data Structures in Python
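Basic data types
- Integer: int (NumPy/pandas use fixed-width dtypes such as int32 or int64)
- Float: float (NumPy/pandas dtypes such as float32 or float64)
- String: str
- Boolean: bool, with the values True and False
- Object: object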
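A quick way to check these types is the built-in `type()`. The sketch below also shows how plain Python `int` relates to the fixed-width names in the list above. A Python `int` has arbitrary precision. Names such as `int32` and `float64` are NumPy (and pandas) dtypes. The variable names `big` and `arr` are illustrative only.

```python
import numpy as np

# Built-in Python scalar types
print(type(42))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type("hello"))   # <class 'str'>
print(type(True))      # <class 'bool'>

# A plain Python int is not limited to 32 or 64 bits
big = 2 ** 100
print(big)

# Fixed-width numeric types come from NumPy
arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype)                              # int32
print(np.array([1.5, 2.5]).dtype)             # float64
print(np.array(['a', 1], dtype=object).dtype) # object
```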
Basic data structures in Python
Lists (list) | ★★★
ls1 = [1,2,3,4,5]
ls2 = ['a','b','c','d','e']
ls3 = ['a','b',[1,2],'d','e']
print(ls1)
print(ls2)
print(ls3)
[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']
['a', 'b', [1, 2], 'd', 'e']
Using a list in a for loop
for i in ls1:
    print(i+10)
11
12
13
14
15
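Beyond looping, lists support indexing, slicing and in-place modification. The snippet below redefines `ls1` and `ls3` so it runs on its own. The list comprehension produces the same values the loop printed. The name `plus10` is illustrative.

```python
ls1 = [1, 2, 3, 4, 5]
ls3 = ['a', 'b', [1, 2], 'd', 'e']

print(ls1[0])      # first element: 1
print(ls1[-1])     # last element: 5
print(ls1[1:3])    # slice, end index excluded: [2, 3]
print(ls3[2][1])   # element inside the nested list: 2

ls1.append(6)      # lists are mutable
print(ls1)         # [1, 2, 3, 4, 5, 6]

# Same values as the for loop above, collected into a new list
plus10 = [i + 10 for i in ls1[:5]]
print(plus10)      # [11, 12, 13, 14, 15]
```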
tolist() and list(): converting a Series to a Python list
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randint(10,50,(5,2)),columns=['A','B'])
df1
    A   B
0  43  21
1  44  13
2  37  15
3  15  28
4  12  10
print(df1['A'])
print('----------------------------')
print(df1['A'].tolist())
print('----------------------------')
print(list(df1['A']))
0 43
1 44
2 37
3 15
4 12
Name: A, dtype: int32
----------------------------
[43, 44, 37, 15, 12]
----------------------------
[43, 44, 37, 15, 12]
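For a Series, as shown above, the two calls return the same list. On a plain NumPy array they differ:
- `tolist()` converts every element to a built-in Python scalar and turns nested arrays into nested lists.
- `list()` only iterates over the first axis and keeps NumPy objects.

A minimal sketch; the array contents are illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3])
a = arr.tolist()
b = list(arr)
print(type(a[0]))                    # <class 'int'>
print(isinstance(b[0], np.integer))  # True: a NumPy scalar, not a Python int

arr2 = np.array([[1, 2], [3, 4]])
print(arr2.tolist())  # [[1, 2], [3, 4]], nested Python lists
print(list(arr2))     # a list holding two 1-D NumPy arrays
```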
Dictionaries (dict) | ★★★
- Made up of key-value pairs
- Written with curly braces {}
- A colon separates each key from its value
dic1 = {'A':1,'B':2}
dic2 = {'A':'中国','B':'美国'}
dic3 = {'A':[1,2,3],'B':[4,2,5]}
print(dic1)
print(dic2)
print(dic3)
{'A': 1, 'B': 2}
{'A': '中国', 'B': '美国'}
{'A': [1, 2, 3], 'B': [4, 2, 5]}
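Values are looked up by key rather than by position. A short sketch of common dict operations on `dic1`; the added key `'C'` is illustrative. Since Python 3.7, dicts preserve insertion order, so `keys()` and `values()` come back in the order the keys were first added.

```python
dic1 = {'A': 1, 'B': 2}

print(dic1['A'])          # look up a value by key: 1
print(dic1.get('C', 0))   # .get() returns a default instead of raising KeyError: 0

dic1['C'] = 3             # add a new key-value pair
dic1['A'] = 10            # assigning to an existing key overwrites its value

for key, value in dic1.items():
    print(key, value)

print(list(dic1.keys()))    # ['A', 'B', 'C']
print(list(dic1.values()))  # [10, 2, 3]
```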
Tuples (tuple)
tup1 = 4,5,6,7
tup2 = 'a','b','1',1
print(tup1)
print(tup2)
(4, 5, 6, 7)
('a', 'b', '1', 1)
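The key difference from a list is that a tuple is immutable: its elements cannot be reassigned after creation. Parentheses are optional when defining a tuple, as `tup1` shows, but are commonly written for readability. A short sketch; the names `single` and `a` to `d` are illustrative.

```python
tup1 = (4, 5, 6, 7)

print(tup1[0])    # indexing works like a list: 4

try:
    tup1[0] = 100     # tuples cannot be modified in place
except TypeError as e:
    print('TypeError:', e)

# Tuple unpacking assigns each element to its own name
a, b, c, d = tup1
print(a, d)       # 4 7

single = (1,)     # a one-element tuple needs a trailing comma
print(type(single))  # <class 'tuple'>
```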
Sets (set)
s1 = set([2,2,2,1,3,3,'a','a'])
print(s1)
{1, 2, 3, 'a'}
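Because a set keeps only unique elements, it is handy for deduplication and membership tests. A set is unordered, so the element order printed above is not guaranteed and can differ between runs. A sketch of common set operations; `s2` and `unique` are illustrative names.

```python
s1 = set([2, 2, 2, 1, 3, 3, 'a', 'a'])
s2 = {3, 4, 'a'}

print(3 in s1)    # membership test: True
print(s1 & s2)    # intersection: elements in both sets
print(s1 | s2)    # union: elements in either set
print(s1 - s2)    # difference: in s1 but not in s2

# Deduplicate a list (the order of the result is not guaranteed)
unique = list(set([5, 1, 5, 2, 1]))
print(sorted(unique))  # [1, 2, 5]
```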
Data structures in NumPy | array()
import numpy as np
arr1 = np.array([1,2,3,4,5])
arr1
array([1, 2, 3, 4, 5])
arr2 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
arr2
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
print(arr2.shape)
print(arr2.size)
print(arr2.dtype)
(2, 5)
10
int32
arr3 = arr2 + 100
arr3
array([[101, 102, 103, 104, 105],
[106, 107, 108, 109, 110]])
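`arr2 + 100` applies the addition to every element at once: the scalar is broadcast over the whole array. A short sketch of a few more array operations on the same `arr2`. Indexing uses `[row, column]`, and `axis=0` aggregates down the rows.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr2[0, 1])                # row 0, column 1: 2
print(arr2[:, 2])                # every row, column 2: [3 8]
print(arr2 * 2)                  # element-wise multiplication
print(arr2.sum(axis=0))          # column sums: 7, 9, 11, 13, 15
print(arr2.mean())               # mean of all elements: 5.5
print(arr2.reshape(5, 2).shape)  # (5, 2)
```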
Data structures in pandas
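One-dimensional data (Series) | ★★★★★
- A one-dimensional, array-like object
- Holds a sequence of values together with their data labels (the index)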
import numpy as np
import pandas as pd
s1 = pd.Series([4,7,-5,3])
s2 = pd.Series(np.random.random(5))
s3 = pd.Series(np.random.randn(5),index=list('ABCDE'))
s4 = pd.Series(np.random.randint(5,20,5),index=list('ABCDE'))
print(s1)
print('----------------------------')
print(s2)
print('----------------------------')
print(s3)
print('----------------------------')
print(s4)
0 4
1 7
2 -5
3 3
dtype: int64
----------------------------
0 0.327655
1 0.314315
2 0.118767
3 0.249609
4 0.005788
dtype: float64
----------------------------
A -0.462626
B 0.135683
C -0.417308
D 0.061270
E -0.687880
dtype: float64
----------------------------
A 6
B 9
C 14
D 18
E 16
dtype: int32
s1.values
array([ 4, 7, -5, 3], dtype=int64)
s1.index
RangeIndex(start=0, stop=4, step=1)
s2 > 0.5
0 False
1 False
2 False
3 False
4 False
dtype: bool
s2 + 10
0 10.327655
1 10.314315
2 10.118767
3 10.249609
4 10.005788
dtype: float64
s4['A']
6
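A Series can also be built from a dict, with the keys becoming the index. It can be filtered with a boolean mask such as `s2 > 0.5`. Elements can be accessed by label with `.loc` or by position with `.iloc`. The sketch uses fixed data instead of random numbers so the results are reproducible; the name `s` is illustrative.

```python
import pandas as pd

s = pd.Series({'A': 6, 'B': 9, 'C': 14, 'D': 18, 'E': 16})

print(s.loc['C'])        # by label: 14
print(s.iloc[0])         # by position: 6
print(s[s > 10])         # boolean mask keeps C, D and E
print(s.index.tolist())  # ['A', 'B', 'C', 'D', 'E']
print(s.mean())          # 12.6
```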
Two-dimensional data (DataFrame) | ★★★★★
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(5,20,(10,5)),columns=list('ABCDE'))
print(df)
    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
5  14  11   9  11   9
6  10  19   7   8   5
7   9  16  12   7   9
8  19   5  11   6   9
9   7  15  12  10   7
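`np.random.randint` draws random numbers, so your own `df` will contain different values from the output above. Fixing a seed with `np.random.seed(...)` before the call makes the numbers repeatable from run to run.

Besides a NumPy array plus `columns`, a DataFrame is often built from a dict of lists. Each key becomes a column name, much like `dic3` in the dict section, and the optional `index` argument sets the row labels. A sketch; `data` and `df2` are illustrative names.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 2, 5]}
df2 = pd.DataFrame(data, index=['x', 'y', 'z'])

print(df2)
print(df2.shape)             # (3, 2)
print(df2.columns.tolist())  # ['A', 'B']
print(df2['B'].max())        # 5
```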
df.head()
    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
df.tail()
    A   B   C   D   E
5  14  11   9  11   9
6  10  19   7   8   5
7   9  16  12   7   9
8  19   5  11   6   9
9   7  15  12  10   7
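`head()` and `tail()` return the first and last 5 rows by default; pass a number to change how many rows are shown. A sketch on a small, purely illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})

print(df.head(3))      # first 3 rows (index 0, 1, 2)
print(df.tail(2))      # last 2 rows (index 8, 9)
print(len(df.head()))  # 5, the default
```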
df.shape
(10, 5)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 10 non-null int32
1 B 10 non-null int32
2 C 10 non-null int32
3 D 10 non-null int32
4 E 10 non-null int32
dtypes: int32(5)
memory usage: 328.0 bytes
df.describe().round(2)
           A      B      C      D     E
count  10.00  10.00  10.00  10.00  10.0
mean   11.80  12.10  13.00  10.70   9.9
std     4.18   5.09   4.92   4.40   3.6
min     7.00   5.00   6.00   5.00   5.0
25%     9.25   7.25   9.50   7.25   7.5
50%    10.00  12.50  12.00  10.50   9.0
75%    14.75  15.75  18.00  14.00  12.0
max    19.00  19.00  19.00  17.00  16.0
df.index
RangeIndex(start=0, stop=10, step=1)
df.columns
Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
df.dtypes
A int32
B int32
C int32
D int32
E int32
dtype: object
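`shape`, `index`, `columns` and `dtypes` are attributes, so they are written without parentheses. `head()`, `info()` and `describe()` are methods and need them. Leaving the parentheses off a method gives you the bound method object instead of its result. A small sketch on an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

print(df.shape)       # attribute: (3, 2)
print(df.dtypes)      # attribute: one dtype per column
print(df.describe())  # method call: summary statistics

# Without parentheses you get the method itself, not its result
print(callable(df.describe))  # True
print(df.shape[0])            # shape is a plain tuple: 3 rows
```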
df[:5]
    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
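`df[:5]` slices rows by position, giving the same rows as `df.head()`. Square brackets with a column name select columns instead. `.loc` selects by label and `.iloc` by position, on both axes. The sketch hard-codes the first five rows and columns A to C of the `df` shown above, so the results are reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [7, 10, 10, 17, 15],
    'B': [12, 18, 6, 6, 13],
    'C': [18, 18, 6, 18, 19],
})

print(df['A'])           # one column -> Series
print(df[['A', 'C']])    # list of columns -> DataFrame
print(df[1:3])           # rows at positions 1 and 2
print(df.loc[0, 'B'])    # label-based: 12
print(df.iloc[-1, -1])   # position-based, last row and last column: 19
print(df[df['A'] > 10])  # boolean filter: rows 3 and 4
```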
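Key points of this section
- Defining lists and dicts: lists use square brackets [], dicts use curly braces {}
- Series and DataFrame: both names start with a capital letter
- DataFrame attributes vs. methods: methods are called with parentheses (df.head(), df.info()), attributes are not (df.shape, df.columns)
- Indexing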