Reading CSV files with Python numpy: handling CSV files with the numpy module

Data:

X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
122.202972,29.98754304,舟山市,165
121.546246,29.87620299,宁波市,86
119.642848,29.08127199,金华市,184
118.869413,28.93892703,衢州市,72
121.416636,28.65889302,台州市,38
119.912514,28.455276,丽水市,49
120.695457,27.99819198,温州市,127
103.71468,27.341637,昭通市,23

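Simple read

import numpy as np

path = 'city_aic.csv'

# Pass the open file object to loadtxt so the UTF-8 encoding is honoured
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, delimiter=',')

print(data[:5])

Reading the file directly like this raises an error: ValueError: could not convert string to float: 'X'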

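Handling strings: dtype=str

The cause is that loadtxt treats every field as float by default. When it reads the first row of the CSV file it encounters 'X', tries to convert it to float, fails, and raises the error.

Passing dtype=str tells loadtxt to read every field as a string instead.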

import numpy as np

path = 'city_aic.csv'

# dtype=str keeps every field, including the header row, as a string
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',')

print(data[:5])

# Result

[['X' 'Y' 'NAME' 'LABEL']
 ['120.7512427' '30.75084798' '嘉兴市' '2']
 ['120.0830671' '30.89524644' '湖州市' '80']
 ['120.574379' '30.00700998' '绍兴市' '140']
 ['122.202972' '29.98754304' '舟山市' '165']]
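
Once everything has been read as strings, the numeric columns usually need to be converted back before any arithmetic. The sketch below shows one way to do that with astype. It assumes an in-memory copy of the first few rows of the sample data (csv_text) standing in for city_aic.csv, and the variable names are illustrative only.

```python
import io

import numpy as np

# Assumption: in-memory stand-in for the first rows of city_aic.csv.
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# Read every field as a string, header row included.
raw = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',')

header = raw[0]                      # first row holds the column names
coords = raw[1:, :2].astype(float)   # X and Y converted back to float
labels = raw[1:, 3].astype(int)      # LABEL converted to int

print(header)
print(coords.dtype, coords.shape)
print(labels)
```

The NAME column stays as strings, so this keeps the text while still allowing numeric work on the other columns.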

Skipping the first row: skiprows=1

import numpy as np

path = 'city_aic.csv'

# skiprows=1 drops the header line before parsing
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1)

print(data[:5])

# Result

[['120.7512427' '30.75084798' '嘉兴市' '2']
 ['120.0830671' '30.89524644' '湖州市' '80']
 ['120.574379' '30.00700998' '绍兴市' '140']
 ['122.202972' '29.98754304' '舟山市' '165']
 ['121.546246' '29.87620299' '宁波市' '86']]
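
Skipping the header discards the column names. If they are still needed, one possible approach is to read the header line separately with max_rows=1 and the data rows with skiprows=1. This is only a sketch: csv_text is an assumed in-memory copy of a few sample rows, and name_col is a hypothetical helper variable.

```python
import io

import numpy as np

# Assumption: in-memory stand-in for the first rows of city_aic.csv.
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# Column names: read only the first line, as strings.
header = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',', max_rows=1)

# Data rows: skip the header line.
rows = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',', skiprows=1)

# Look a column up by name instead of hard-coding its index.
name_col = header.tolist().index('NAME')
print(rows[:, name_col])
```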

Reading specific columns: the usecols parameter

The usecols parameter takes a tuple of column indices that selects which columns to read. The columns are returned in the order given in the tuple.

import numpy as np

path = 'city_aic.csv'

# usecols=(0, 2, 1, 3) returns the columns in the order X, NAME, Y, LABEL
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))

print(data[:5])

# Result

[['120.7512427' '嘉兴市' '30.75084798' '2']
 ['120.0830671' '湖州市' '30.89524644' '80']
 ['120.574379' '绍兴市' '30.00700998' '140']
 ['122.202972' '舟山市' '29.98754304' '165']
 ['121.546246' '宁波市' '29.87620299' '86']]
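
usecols can also be used to leave the text column out entirely, in which case the default float dtype works and no dtype=str is needed. A minimal sketch, assuming an in-memory copy of a few sample rows (csv_text) in place of city_aic.csv:

```python
import io

import numpy as np

# Assumption: in-memory stand-in for the first rows of city_aic.csv.
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# Select only the numeric columns X, Y and LABEL (indices 0, 1, 3);
# the NAME column (index 2) is never parsed, so float conversion succeeds.
numeric = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=(0, 1, 3))

print(numeric.dtype)
print(numeric[:, 2])   # the LABEL column, now as floats
```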

numpy slicing

numpy offers many ways to slice, but for a 2D array the Array[row, column] form is the easiest to understand and remember.

import numpy as np

path = 'city_aic.csv'

with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))

# First 5 rows
print(data[:5])

# Third column of the first 5 rows
print(data[:5, 2])

# Single element: row index 5 (the sixth row), third column
print(data[5, 2])

# Entire third column
print(data[:, 2])

print(type(data))

# Result

[['120.7512427' '嘉兴市' '30.75084798' '2']
 ['120.0830671' '湖州市' '30.89524644' '80']
 ['120.574379' '绍兴市' '30.00700998' '140']
 ['122.202972' '舟山市' '29.98754304' '165']
 ['121.546246' '宁波市' '29.87620299' '86']]
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299']
29.08127199
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299'
 '29.08127199' '28.93892703' '28.65889302' '28.455276' '27.99819198'
 '27.341637']
<class 'numpy.ndarray'>
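
The same Array[row, column] form also accepts negative indices, steps, and boolean masks in the row position. The sketch below illustrates this under a few assumptions: csv_text is an in-memory copy of the first five sample rows, the columns are kept in their original file order (so NAME is at index 2), and the threshold of 100 is arbitrary.

```python
import io

import numpy as np

# Assumption: in-memory stand-in for the first rows of city_aic.csv.
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
122.202972,29.98754304,舟山市,165
121.546246,29.87620299,宁波市,86
"""

# Columns stay in file order here: X, Y, NAME, LABEL.
data = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',', skiprows=1)

last_row = data[-1]              # negative index: the last row
every_other = data[::2, 2]       # step of 2 over rows, NAME column

# Boolean mask in the row position: rows whose LABEL exceeds 100.
labels = data[:, 3].astype(int)
mask = labels > 100
big_names = data[mask, 2]

print(last_row)
print(every_other)
print(big_names)
```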

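numpy.genfromtxt: reading CSV data

With the default float dtype, genfromtxt does not raise an error on fields it cannot parse as numbers. The header row and the NAME column become nan instead.

import numpy as np

path = 'city_aic.csv'

# encoding='utf-8' matches the encoding used with open() in the earlier examples
data = np.genfromtxt(path, delimiter=',', encoding='utf-8')

print(data[:5])

# Result

[[ nan nan nan nan]
 [120.7512427 30.75084798 nan 2. ]
 [120.0830671 30.89524644 nan 80. ]
 [120.574379 30.00700998 nan 140. ]
 [122.202972 29.98754304 nan 165. ]]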

Reading the data as str

# numpy.genfromtxt: read the CSV data as strings
# skip_header=0 keeps the header row; usecols reorders the columns to X, NAME, Y, LABEL
data = np.genfromtxt(path, delimiter=',', dtype=str, skip_header=0, usecols=(0, 2, 1, 3), encoding='utf-8')

print(data[:5])

print(type(data))

# Result

[['X' 'NAME' 'Y' 'LABEL']
 ['120.7512427' '嘉兴市' '30.75084798' '2']
 ['120.0830671' '湖州市' '30.89524644' '80']
 ['120.574379' '绍兴市' '30.00700998' '140']
 ['122.202972' '舟山市' '29.98754304' '165']]
<class 'numpy.ndarray'>
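
genfromtxt can also keep numbers and text together without forcing everything to str. One option, sketched here, is names=True with dtype=None, which yields a structured array whose fields are addressed by column name and typed per column. csv_text is an assumed in-memory copy of a few sample rows, and table is an illustrative variable name.

```python
import io

import numpy as np

# Assumption: in-memory stand-in for the first rows of city_aic.csv.
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# names=True takes field names from the header row;
# dtype=None lets genfromtxt infer a type for each column.
table = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,
    dtype=None,
    encoding='utf-8',
)

print(table.dtype.names)   # field names taken from the header
print(table['NAME'])       # text column
print(table['LABEL'])      # integer column
print(table['X'].mean())   # numeric columns support arithmetic directly
```

The trade-off is that the result is a 1D structured array rather than a 2D array, so it is indexed as table['NAME'][0] instead of data[0, 2].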

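np.savetxt(fname, X, fmt='%.18e', delimiter=' ')

fname: the file name or file handle to write to

X: the array to write to the file

fmt: the format used for each value, e.g. %d, %f, %e, %s

delimiter: the string separating columns; defaults to a single space

import numpy as np

a = np.arange(20).reshape(2, 10)

# np.savetxt returns None, so there is nothing to assign
np.savetxt('a.csv', a, fmt='%s', delimiter=',')

print(a)

print(type(a))

# Result

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
<class 'numpy.ndarray'>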
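
savetxt can also write a header line. The sketch below assumes hypothetical column names c0 to c9 and writes to a temporary directory so it does not touch any real file. Passing comments='' removes the '# ' prefix that savetxt puts in front of the header by default.

```python
import os
import tempfile

import numpy as np

a = np.arange(20).reshape(2, 10)

# Hypothetical column names, one per column of a.
header = ','.join(f'c{i}' for i in range(a.shape[1]))

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'a.csv')   # throwaway output location
    # fmt='%d' writes plain integers; comments='' leaves the header unprefixed.
    np.savetxt(out_path, a, fmt='%d', delimiter=',', header=header, comments='')
    with open(out_path, encoding='utf-8') as f:
        text = f.read()

print(text)
```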

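np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)

fname: the file name or file object to read from

dtype: the data type of the resulting array; defaults to float (the np.int alias has been removed from recent NumPy versions, so use int instead)

delimiter: the string separating columns; the default None splits on whitespace

unpack: if True, the result is transposed so that each column can be unpacked into its own variable

# Read back the a.csv file written by np.savetxt above
c = np.loadtxt('a.csv', delimiter=',', dtype=str)

print(c)

print(type(c))

# Result

[['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']
 ['10' '11' '12' '13' '14' '15' '16' '17' '18' '19']]
<class 'numpy.ndarray'>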
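
Reading the file back with dtype=int instead of dtype=str recovers the original integers, and unpack=True returns the data column by column. The following round trip is a sketch that writes to a temporary directory, which is an assumption made so the example is self-contained.

```python
import os
import tempfile

import numpy as np

a = np.arange(20).reshape(2, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'a.csv')   # throwaway output location
    np.savetxt(out_path, a, fmt='%d', delimiter=',')

    # dtype=int parses each field as an integer instead of a string.
    c = np.loadtxt(out_path, delimiter=',', dtype=int)

    # unpack=True transposes the result: one row per original column.
    cols = np.loadtxt(out_path, delimiter=',', dtype=int, unpack=True)

print(np.array_equal(a, c))
print(cols.shape)
```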

Complete example

import numpy as np

path = 'city_aic.csv'

with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))

# First 5 rows
print(data[:5])

# Third column of the first 5 rows
print(data[:5, 2])

# Single element: row index 5 (the sixth row), third column
print(data[5, 2])

# Entire third column
print(data[:, 2])

print(type(data))

# numpy.genfromtxt: read the CSV data as strings, header row included
data = np.genfromtxt(path, delimiter=',', dtype=str, skip_header=0, usecols=(0, 2, 1, 3), encoding='utf-8')

print(data[:5])

print(type(data))

a = np.arange(20).reshape(2, 10)

# np.savetxt returns None, so there is nothing to assign
np.savetxt('a.csv', a, fmt='%s', delimiter=',')

print(a)

print(type(a))

c = np.loadtxt('a.csv', delimiter=',', dtype=str)

print(c)

print(type(c))

# Result

[['120.7512427' '嘉兴市' '30.75084798' '2']
 ['120.0830671' '湖州市' '30.89524644' '80']
 ['120.574379' '绍兴市' '30.00700998' '140']
 ['122.202972' '舟山市' '29.98754304' '165']
 ['121.546246' '宁波市' '29.87620299' '86']]
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299']
29.08127199
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299'
 '29.08127199' '28.93892703' '28.65889302' '28.455276' '27.99819198'
 '27.341637']
<class 'numpy.ndarray'>
[['X' 'NAME' 'Y' 'LABEL']
 ['120.7512427' '嘉兴市' '30.75084798' '2']
 ['120.0830671' '湖州市' '30.89524644' '80']
 ['120.574379' '绍兴市' '30.00700998' '140']
 ['122.202972' '舟山市' '29.98754304' '165']]
<class 'numpy.ndarray'>
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
<class 'numpy.ndarray'>
[['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']
 ['10' '11' '12' '13' '14' '15' '16' '17' '18' '19']]
<class 'numpy.ndarray'>
