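Python pandas DataFrame functions: column operations on a pandas DataFrame

13. Column Operations on a pandas DataFrame

This chapter looks at how to modify the columns of a DataFrame, including renaming, adding, and deleting them.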

13.1 Renaming columns with rename

Calling rename on a DataFrame returns a new DataFrame; the original DataFrame is not affected.

import numpy as np
import pandas as pd

# 10 x 5 grid of the integers 10..59, with row labels "a".."j"
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)

# rename returns a new DataFrame; df1 keeps its original column labels
df2 = df1.rename(columns={"ax": "close", "bx": "open"})
print(df2)
print("*" * 21)

Program output:

ax bx cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

close open cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

To modify the DataFrame itself instead of getting a copy, pass inplace=True.

import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)

# inplace=True changes df1 itself; the call returns None
df1.rename(columns={"ax": "close", "bx": "open"}, inplace=True)
print(df1)
print("*" * 21)

Program output:

ax bx cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

close open cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************
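As a small supplement to the two rename examples above, the sketch below shows two behaviors of rename that are easy to overlook: the columns argument also accepts a callable, and mapping keys that match no existing label are ignored by default. The frame df and the names upper and partial are made up for this illustration and do not come from the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 frame used only for this illustration.
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  columns=["ax", "bx", "cx"], index=["a", "b"])

# A callable is applied to every column label.
upper = df.rename(columns=str.upper)

# Keys that match no existing label ("zz" here) are ignored by default.
partial = df.rename(columns={"ax": "close", "zz": "unused"})

print(list(upper.columns))    # ['AX', 'BX', 'CX']
print(list(partial.columns))  # ['close', 'bx', 'cx']
print(list(df.columns))       # ['ax', 'bx', 'cx'], the original is untouched
```

If a missing key should be treated as a mistake, rename also accepts errors="raise".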

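13.2 Adding a column

In pandas, a column can be added to a DataFrame with [] assignment, the insert function, and other methods.

[] assignment appends the new Series as the last column of the original DataFrame.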

import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)

# 1-D array with one value per row; a new label is appended as the last column
nval = np.arange(100, 110)
df1["fx"] = nval
print(df1)

Program output:

ax bx cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

ax bx cx dx ex fx

a 10 11 12 13 14 100

b 15 16 17 18 19 101

c 20 21 22 23 24 102

d 25 26 27 28 29 103

e 30 31 32 33 34 104

f 35 36 37 38 39 105

g 40 41 42 43 44 106

h 45 46 47 48 49 107

i 50 51 52 53 54 108

j 55 56 57 58 59 109

The insert function, by contrast, places the new Series at a specified position.

import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)

# [] assignment appends "fx" at the end
nval = np.arange(100, 110)
df1["fx"] = nval
print(df1)
print("*" * 21)

# insert(loc, column, value) puts "gx" at position 1, in place
df1.insert(1, "gx", nval)
print(df1)
print("*" * 21)

Program output:

ax bx cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

ax bx cx dx ex fx

a 10 11 12 13 14 100

b 15 16 17 18 19 101

c 20 21 22 23 24 102

d 25 26 27 28 29 103

e 30 31 32 33 34 104

f 35 36 37 38 39 105

g 40 41 42 43 44 106

h 45 46 47 48 49 107

i 50 51 52 53 54 108

j 55 56 57 58 59 109

*********************

ax gx bx cx dx ex fx

a 10 100 11 12 13 14 100

b 15 101 16 17 18 19 101

c 20 102 21 22 23 24 102

d 25 103 26 27 28 29 103

e 30 104 31 32 33 34 104

f 35 105 36 37 38 39 105

g 40 106 41 42 43 44 106

h 45 107 46 47 48 49 107

i 50 108 51 52 53 54 108

j 55 109 56 57 58 59 109

*********************
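One detail the insert example above does not exercise: insert refuses a column name that already exists and raises ValueError, unless allow_duplicates=True is passed. The following is a minimal sketch of that behavior; the frame df and the variable duplicate_error are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Hypothetical two-column frame used only for this illustration.
df = pd.DataFrame({"ax": [1, 2], "bx": [3, 4]})

# Position 0 puts the new column first; the frame is modified in place.
df.insert(0, "first", [9, 9])

# Inserting a label that already exists raises ValueError by default.
duplicate_error = None
try:
    df.insert(1, "ax", [0, 0])
except ValueError as exc:
    duplicate_error = exc

print(list(df.columns))             # ['first', 'ax', 'bx']
print(duplicate_error is not None)  # True
```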

loc[] can also be used to add a new column.

import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21)

# Select all rows (:) and a column label that does not exist yet
nval = np.arange(100, 110)
df1.loc[:, "ix"] = nval
print(df1)
print("*" * 21)

Program output:

ax bx cx dx ex

a 10 11 12 13 14

b 15 16 17 18 19

c 20 21 22 23 24

d 25 26 27 28 29

e 30 31 32 33 34

f 35 36 37 38 39

g 40 41 42 43 44

h 45 46 47 48 49

i 50 51 52 53 54

j 55 56 57 58 59

*********************

ax bx cx dx ex ix

a 10 11 12 13 14 100

b 15 16 17 18 19 101

c 20 21 22 23 24 102

d 25 26 27 28 29 103

e 30 31 32 33 34 104

f 35 36 37 38 39 105

g 40 41 42 43 44 106

h 45 46 47 48 49 107

i 50 51 52 53 54 108

j 55 56 57 58 59 109
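The three approaches in this section all modify the DataFrame in place. When a modified copy is preferred, much like rename without inplace, the assign method is an option. The sketch below is an illustration under that assumption; the frame df and the column names fx and total are invented for the example.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
df = pd.DataFrame({"ax": [10, 15], "bx": [11, 16]}, index=["a", "b"])

# assign returns a new DataFrame; keyword names become column labels.
# A callable receives the frame and may compute from existing columns.
df2 = df.assign(fx=[100, 101], total=lambda d: d["ax"] + d["bx"])

print(list(df2.columns))       # ['ax', 'bx', 'fx', 'total']
print(df2["total"].tolist())   # [21, 31]
print(list(df.columns))        # ['ax', 'bx'], the original is unchanged
```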

13.3 Joining multiple columns with concat

The pandas concat function joins several DataFrames into one larger DataFrame.

import numpy as np
import pandas as pd

val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1)
print("*" * 21)
print(df2)
print("*" * 21)

# axis=1 places the frames side by side, matching rows by index label;
# df2[5:] holds only rows f..j and df1[:5] only rows a..e
df3 = pd.concat([df1, df2[5:], df1[:5], df2], axis=1)
print(df3)

Program output:

ax bx cx

a 10 11 12

b 13 14 15

c 16 17 18

d 19 20 21

e 22 23 24

f 25 26 27

g 28 29 30

h 31 32 33

i 34 35 36

j 37 38 39

*********************

cx dx ex

a 50 51 52

b 53 54 55

c 56 57 58

d 59 60 61

e 62 63 64

f 65 66 67

g 68 69 70

h 71 72 73

i 74 75 76

j 77 78 79

*********************

ax bx cx cx dx ex ax bx cx cx dx ex

a 10 11 12 NaN NaN NaN 10 11 12 50 51 52

b 13 14 15 NaN NaN NaN 13 14 15 53 54 55

c 16 17 18 NaN NaN NaN 16 17 18 56 57 58

d 19 20 21 NaN NaN NaN 19 20 21 59 60 61

e 22 23 24 NaN NaN NaN 22 23 24 62 63 64

f 25 26 27 65 66 67 NaN NaN NaN 65 66 67

g 28 29 30 68 69 70 NaN NaN NaN 68 69 70

h 31 32 33 71 72 73 NaN NaN NaN 71 72 73

i 34 35 36 74 75 76 NaN NaN NaN 74 75 76

j 37 38 39 77 78 79 NaN NaN NaN 77 78 79

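As the result shows, when the DataFrames being joined differ in structure, that is, when some of them lack a given row, the positions in that row with no data are filled with NaN.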
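The NaN filling described above comes from the default join="outer", which keeps the union of the row labels. As a hedged illustration, the sketch below contrasts it with join="inner", which keeps only the labels present in every input. The frames left and right are small made-up examples, not the df1 and df2 used above.

```python
import pandas as pd

# Hypothetical frames whose row labels overlap only on "b" and "c".
left = pd.DataFrame({"ax": [1, 2, 3]}, index=list("abc"))
right = pd.DataFrame({"dx": [20, 30, 40]}, index=list("bcd"))

# Default (outer): union of row labels, gaps filled with NaN.
outer = pd.concat([left, right], axis=1)

# inner: only the row labels present in every input are kept.
inner = pd.concat([left, right], axis=1, join="inner")

print(list(outer.index))               # ['a', 'b', 'c', 'd']
print(int(outer.isna().sum().sum()))   # 2
print(list(inner.index))               # ['b', 'c']
```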

13.4 Replacing the contents of a column

The values of a column can be replaced by assignment.

import numpy as np
import pandas as pd

val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)

# Overwrite the existing column "cx" of df1 with the "cx" column of df2
df1["cx"] = df2["cx"]
print(df1[:3])

Program output, in which the cx column of df1 has been replaced with the contents of cx from df2:

ax bx cx

a 10 11 12

b 13 14 15

c 16 17 18

*********************

cx dx ex

a 50 51 52

b 53 54 55

c 56 57 58

*********************

ax bx cx

a 10 11 50

b 13 14 53

c 16 17 56
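In the example above both frames share the same index, so the replacement looks positional. When the right-hand side is a Series, however, pandas matches values by index label rather than by position. The sketch below illustrates this with invented names (df, s, other); it is not part of the original example.

```python
import pandas as pd

# Hypothetical frame and Series used only for this illustration.
df = pd.DataFrame({"ax": [10, 13, 16]}, index=list("abc"))
s = pd.Series([1.0, 2.0, 3.0], index=list("cba"))  # labels in reverse order

# Assigning a Series aligns on labels: a gets 3.0, b gets 2.0, c gets 1.0.
df["cx"] = s

# Assigning the raw values ignores labels and is purely positional.
df["raw"] = s.to_numpy()

# Labels absent from the Series produce NaN.
other = pd.Series([7.0], index=["z"])
df["missing"] = other

print(df["cx"].tolist())                 # [3.0, 2.0, 1.0]
print(df["raw"].tolist())                # [1.0, 2.0, 3.0]
print(bool(df["missing"].isna().all()))  # True
```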

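13.5 Deleting columns

A column can be deleted from a DataFrame with the del statement, the DataFrame's pop function, or its drop function. del modifies the original DataFrame directly. pop also removes the column from the original DataFrame and returns the deleted column as a Series. drop can delete several columns at once and returns a new DataFrame.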

import numpy as np
import pandas as pd

val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])

# del removes the column from df1 in place
del df1["cx"]
print("*" * 21)
print(df1[:3])

# pop removes "cx" from df2 in place and returns it as a Series
df3 = df2.pop("cx")
print("+" * 21)
print(df2[:3])
print("-" * 21)
print(df3[:3])
print("/" * 21)

# Rebuild df1, then drop two columns; drop returns a new DataFrame
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df4 = df1.drop(["ax", "cx"], axis=1)
print(df1[:3])
print(df4[:3])

Program output:

*********************

ax bx cx

a 10 11 12

b 13 14 15

c 16 17 18

*********************

cx dx ex

a 50 51 52

b 53 54 55

c 56 57 58

*********************

ax bx

a 10 11

b 13 14

c 16 17

+++++++++++++++++++++

dx ex

a 51 52

b 54 55

c 57 58

---------------------

a 50

b 53

c 56

Name: cx, dtype: int64

/////////////////////

ax bx cx

a 10 11 12

b 13 14 15

c 16 17 18

bx

a 11

b 14

c 17
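As a complement to the drop call above, drop also accepts a columns keyword, which avoids having to remember axis=1, and an errors="ignore" option for labels that may not exist. The sketch below is an illustration with invented names (df, kept, safe, missing_error).

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
df = pd.DataFrame({"ax": [10, 13], "bx": [11, 14], "cx": [12, 15]})

# columns=[...] is equivalent to passing the labels with axis=1.
kept = df.drop(columns=["ax", "cx"])

# errors="ignore" skips labels that do not exist ("nope" here).
safe = df.drop(columns=["cx", "nope"], errors="ignore")

# By default a missing label raises KeyError.
missing_error = None
try:
    df.drop(columns=["nope"])
except KeyError as exc:
    missing_error = exc

print(list(kept.columns))  # ['bx']
print(list(safe.columns))  # ['ax', 'bx']
print(list(df.columns))    # ['ax', 'bx', 'cx'], drop returned copies
```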
