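【1】read_csv(): reads a CSV (comma-separated values) file into a DataFrame. It also supports importing only part of a file and iterating over the file in chunks.
Example:
import pandas
# Load the CSV file into a DataFrame
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# print(practice)
# Show the data type of each column
print(practice.dtypes)
Result: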
first int64
second int64
three int64
four int64
five int64
dtype: object
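The partial import and iteration that read_csv() supports can be sketched as follows. This is a minimal illustration, not the original author's code: the in-memory CSV text and the names csv_text, partial and chunk_sizes are hypothetical stand-ins for pandaTest.csv, while usecols, nrows and chunksize are standard read_csv() parameters.

```python
import io
import pandas

# Hypothetical in-memory CSV standing in for pandaTest.csv
csv_text = "first,second,three\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

# Partial import: keep two of the columns and only the first three data rows
partial = pandas.read_csv(io.StringIO(csv_text), usecols=["first", "three"], nrows=3)
print(partial)

# Iteration: read the file two rows at a time instead of all at once
chunk_sizes = []
for chunk in pandas.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
print(chunk_sizes)
```

Reading in chunks keeps memory use bounded, which may matter when the file is too large to load in one go.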
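【2】head(): returns the first n rows of the DataFrame (5 by default). Row labels start at 0.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# No argument given, so the first 5 rows are returned
head = practice.head()
print(head)
Result: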
first second three four five six seven eight nith ten
0 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10
2 1 2 3 4 5 6 7 8 9 10
3 1 2 3 4 5 6 7 8 9 10
4 1 2 3 4 5 6 7 8 9 10
【3】tail(): returns the last n rows of the DataFrame (5 by default).
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# No argument given, so the last 5 rows are returned
tail = practice.tail()
print(tail)
Result:
first second three four five six seven eight nith ten
24 1 2 3 4 5 6 7 8 9 10
25 1 2 3 4 5 6 7 8 9 10
26 1 2 3 4 5 6 7 8 9 10
27 1 2 3 4 5 6 7 8 9 10
28 1 2 3 4 5 6 7 8 9 10
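Both head() and tail() accept a row count in place of the default of 5. The sketch below assumes a small hypothetical frame built in memory (the names frame, top and bottom are illustrative only).

```python
import pandas

# Hypothetical frame with ten rows, labelled 0..9
frame = pandas.DataFrame({"first": range(10), "second": range(10, 20)})

top = frame.head(3)     # rows labelled 0, 1, 2
bottom = frame.tail(2)  # rows labelled 8, 9
print(top)
print(bottom)
```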
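【4】shape: an attribute (not a function) that gives the dimensions of the DataFrame as a (rows, columns) tuple.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# shape is accessed without parentheses
shape = practice.shape
print(shape)
Result: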
(29, 10)
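Because shape is a plain tuple, it can be unpacked directly. A small sketch, assuming a hypothetical three-row, two-column frame:

```python
import pandas

# Hypothetical frame: 3 rows, 2 columns
frame = pandas.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = frame.shape  # unpack the (rows, columns) tuple
print(rows, cols)

# The row count from shape agrees with len() on the same frame
print(len(frame) == rows)
```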
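【5】columns: an attribute that holds the column labels of the DataFrame.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# columns is an Index object containing the column names
col = practice.columns
print(col)
Result: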
Index(['first', 'second', 'three', 'four', 'five', 'six', 'seven', 'eight',
'nith', 'ten'],
dtype='object')
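【6】dtypes: an attribute that gives the data type of each column in the DataFrame.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# One entry per column: column name and its dtype
print(practice.dtypes)
Result: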
first int64
second int64
three int64
four int64
five int64
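Which dtype a column receives depends on its contents. As a rough illustration, assuming a hypothetical two-column CSV held in memory, a column of whole numbers is read as int64, while a numeric column with a missing value is read as float64 because NaN is a floating-point value:

```python
import io
import pandas

# Hypothetical CSV: column "b" has a missing value in its second row
csv_text = "a,b\n1,2\n3,\n"
frame = pandas.read_csv(io.StringIO(csv_text))

print(frame.dtypes)
a_type = str(frame.dtypes["a"])
b_type = str(frame.dtypes["b"])
```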
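【7】tolist(): converts a pandas Index or Series (here, the column labels) into a Python list.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# Convert the column Index into a plain Python list
colList = practice.columns.tolist()
print(colList)
Result: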
['first', 'second', 'three', 'four', 'five', 'six(g)', 'seven', 'eight', 'nith(mg)', 'ten']
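tolist() works the same way on a single column (a Series). The following sketch uses a hypothetical two-column frame; the names frame, names and values are illustrative.

```python
import pandas

# Hypothetical frame reusing two of the column names from the example
frame = pandas.DataFrame({"first": [1, 2], "six(g)": [6.1, 6.2]})

names = frame.columns.tolist()   # column labels as a list of str
values = frame["first"].tolist() # one column's values as a list
print(names)
print(values)
```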
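【8】endswith(): a Python string method that checks whether a string ends with the given suffix. Here it is used to pick out column names.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
colList = practice.columns.tolist()
print(colList)
# Collect the column names that end with "(g)"
newColumns = []
for col in colList:
    if col.endswith("(g)"):
        newColumns.append(col)
print(newColumns)
# Select only those columns from the DataFrame
print(practice[newColumns])
Result: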
['first', 'second', 'three', 'four', 'five', 'six(g)', 'seven', 'eight', 'nith(mg)', 'ten']
['six(g)']
six(g)
0 6.1
1 6.2
2 6.3
3 6.0
4 6.0
5 6.0
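The same column filter can be written more compactly. This sketch assumes a hypothetical three-column frame and shows two alternatives: a list comprehension, and the vectorized columns.str.endswith() accessor that pandas provides for string Index objects.

```python
import pandas

# Hypothetical frame reusing column names from the example
frame = pandas.DataFrame({
    "first": [1.1, 1.2],
    "six(g)": [6.1, 6.2],
    "nith(mg)": [9.1, 9.2],
})

# Alternative 1: list comprehension over the column names
gram_columns = [name for name in frame.columns if name.endswith("(g)")]

# Alternative 2: boolean mask built by the vectorized string accessor
mask = frame.columns.str.endswith("(g)")
selected = frame.loc[:, mask]

print(gram_columns)
print(selected)
```

Note that "nith(mg)" is not matched, since its last three characters are "mg)" rather than "(g)".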
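【9】sort_values(): sorts the DataFrame by the values in a given column.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# inplace=False returns a new sorted DataFrame and leaves practice unchanged;
# ascending=False sorts from largest to smallest
newPractice = practice.sort_values("first", inplace=False, ascending=False)
print(practice)
print("#########")
print(newPractice)
Result: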
first second three four five six(g) seven eight nith(mg) ten
0 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
1 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2
2 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3
#########
first second three four five six(g) seven eight nith(mg) ten
29 12.0 22.0 NaN NaN NaN NaN NaN NaN NaN NaN
2 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3
1 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2
0 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
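When the sort column contains missing values, sort_values() places them last by default, and the na_position parameter changes that. A minimal sketch on a hypothetical single-column frame:

```python
import pandas

# Hypothetical column with one missing value at row 1
frame = pandas.DataFrame({"first": [1.1, None, 12.0, 1.3]})

# Descending sort; NaN rows go to the end by default
descending = frame.sort_values("first", ascending=False)

# Same sort, but with NaN rows placed at the top
nan_first = frame.sort_values("first", ascending=False, na_position="first")

print(descending)
print(nan_first)
```

In both cases the original frame keeps its row order, because inplace defaults to False.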
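【10】isnull(): reports, for each value, whether it is missing (NaN or None). The result contains True where a value is missing and False otherwise.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# Check a single column; the commented line checks the whole DataFrame
print(practice["first"].isnull())
# print(practice.isnull())
Result: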
23 False
24 False
25 True
26 False
27 False
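The boolean result of isnull() is often used to count or select missing entries. The sketch below assumes a hypothetical two-column frame; summing works because True counts as 1.

```python
import pandas

# Hypothetical frame with some missing values
frame = pandas.DataFrame({
    "first": [1.0, None, 3.0],
    "second": [None, None, 6.0],
})

missing_mask = frame["first"].isnull()     # True where "first" is missing
missing_per_column = frame.isnull().sum()  # count of missing values per column
rows_missing_first = frame[missing_mask]   # rows where "first" is missing

print(missing_per_column)
print(rows_missing_first)
```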
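【11】len(): the Python built-in function; applied to a DataFrame, it returns the number of rows.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# Number of rows in the DataFrame (the header line is not counted)
print(len(practice))
Result: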
30
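【12】dropna(): a data-cleaning method that drops rows (or columns) containing NaN.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\pandaTest.csv")
# axis=0 drops rows; subset=["first"] only checks the "first" column for NaN
drop = practice.dropna(axis=0, subset=["first"])
print(drop)
Result: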
first second three four five six(g) seven eight nith(mg) ten
1 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2
2 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3
3 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
4 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
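The effect of subset is easiest to see next to a call without it. A minimal sketch, assuming a hypothetical frame in which each column has one missing value:

```python
import pandas

# Hypothetical frame: row 0 lacks "second", row 1 lacks "first"
frame = pandas.DataFrame({
    "first": [1.0, None, 3.0],
    "second": [None, 5.0, 6.0],
})

# Only rows with NaN in "first" are dropped
by_first = frame.dropna(axis=0, subset=["first"])

# Without subset, a row is dropped if any of its values is NaN
any_missing = frame.dropna()

print(by_first)
print(any_missing)
```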
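【13】to_datetime(): converts a column of the data file into standard datetime values.
Example:
import pandas
practice = pandas.read_csv("C:\\Users\\Lenovo\\Desktop\\practice.csv")
# Parse the "Data" column; note that practice now holds the resulting Series
practice = pandas.to_datetime(practice["Data"])
print(practice)
Result: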
0 1998-06-05
1 1998-06-06
2 1998-06-07
3 1998-06-08
4 1998-06-09
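When the date layout is known, it can be passed through the format parameter, and errors="coerce" turns unparseable entries into NaT instead of raising an error. The sketch below uses a hypothetical Series of strings, not the contents of practice.csv.

```python
import pandas

# Hypothetical date strings; the last entry is deliberately invalid
dates = pandas.Series(["1998-06-05", "1998-06-06", "not a date"])

# Explicit format; invalid entries become NaT rather than raising
parsed = pandas.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.dt.year)  # datetime columns expose components through .dt
```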