Data:
X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
122.202972,29.98754304,舟山市,165
121.546246,29.87620299,宁波市,86
119.642848,29.08127199,金华市,184
118.869413,28.93892703,衢州市,72
121.416636,28.65889302,台州市,38
119.912514,28.455276,丽水市,49
120.695457,27.99819198,温州市,127
103.71468,27.341637,昭通市,23
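Simple read
import numpy as np
path = 'city_aic.csv'
# Open the file as UTF-8 and let loadtxt parse the open handle.
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, delimiter=',')
print(data[:5])
Reading the file this way fails with: ValueError: could not convert string to float: 'X' (the exact wording may vary with the NumPy version).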
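Handling strings: dtype=str
By default, loadtxt treats every field as float. When it reads the first line of the CSV file and meets 'X', it tries to convert it to float, and the conversion fails with the error above.
Passing dtype=str makes loadtxt read every field as a string instead.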
import numpy as np
path = 'city_aic.csv'
with open(path, encoding='utf-8') as f:
    # dtype=str keeps every field as text, including the header row
    data = np.loadtxt(f, dtype=str, delimiter=',')
print(data[:5])
# Output
[['X' 'Y' 'NAME' 'LABEL']
['120.7512427' '30.75084798' '嘉兴市' '2']
['120.0830671' '30.89524644' '湖州市' '80']
['120.574379' '30.00700998' '绍兴市' '140']
['122.202972' '29.98754304' '舟山市' '165']]
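Reading everything as str leaves the numbers as text, so they usually need converting before any arithmetic. The following is a minimal sketch of one way to do that with astype; it assumes a small in-memory copy of the first rows of the data (csv_text is a stand-in for city_aic.csv, not part of the original example).

```python
import io
import numpy as np

# In-memory stand-in for city_aic.csv (header plus the first three rows).
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

data = np.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',')

header = data[0]   # the column names
rows = data[1:]    # the data rows, still strings

# Convert the numeric columns explicitly; NAME stays as text.
xy = rows[:, :2].astype(float)
labels = rows[:, 3].astype(int)

print(header)
print(xy.dtype, labels)
```

Keeping the conversion explicit makes it obvious which columns are numeric and which are text.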
Skipping the header row: skiprows=1
import numpy as np
path = 'city_aic.csv'
with open(path, encoding='utf-8') as f:
    # skiprows=1 drops the first line (the column names)
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1)
print(data[:5])
# Output
[['120.7512427' '30.75084798' '嘉兴市' '2']
['120.0830671' '30.89524644' '湖州市' '80']
['120.574379' '30.00700998' '绍兴市' '140']
['122.202972' '29.98754304' '舟山市' '165']
['121.546246' '29.87620299' '宁波市' '86']]
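Reading specific columns: the usecols parameter
usecols accepts a tuple of zero-based column indices that selects which columns to read. As the output shows, the columns come back in the order listed.
import numpy as np
path = 'city_aic.csv'
with open(path, encoding='utf-8') as f:
    # Read columns X, NAME, Y, LABEL in that order
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))
print(data[:5])
# Output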
[['120.7512427' '嘉兴市' '30.75084798' '2']
['120.0830671' '湖州市' '30.89524644' '80']
['120.574379' '绍兴市' '30.00700998' '140']
['122.202972' '舟山市' '29.98754304' '165']
['121.546246' '宁波市' '29.87620299' '86']]
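usecols can also be used to leave out the text column entirely, in which case the default float dtype works and no string handling is needed. A small sketch under that assumption, again using an in-memory stand-in (csv_text) for the file:

```python
import io
import numpy as np

# In-memory stand-in for city_aic.csv (header plus the first three rows).
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# Skip the header and read only X, Y and LABEL; NAME (index 2) is never converted.
numeric = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=(0, 1, 3))

print(numeric.dtype)
print(numeric.shape)
print(numeric[:, 2])   # the LABEL column as floats
```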
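NumPy slicing
NumPy offers many ways to slice, but for a 2-D array the array[rows, columns] form is the easiest to understand and remember.
import numpy as np
path = 'city_aic.csv'
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))
# First 5 rows
print(data[:5])
# Column 3 (index 2) of the first 5 rows
print(data[:5, 2])
# A single element: row index 5 (the 6th row), column index 2
print(data[5, 2])
# All of column 3 (index 2)
print(data[:, 2])
print(type(data))
# Output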
[['120.7512427' '嘉兴市' '30.75084798' '2']
['120.0830671' '湖州市' '30.89524644' '80']
['120.574379' '绍兴市' '30.00700998' '140']
['122.202972' '舟山市' '29.98754304' '165']
['121.546246' '宁波市' '29.87620299' '86']]
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299']
29.08127199
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299'
'29.08127199' '28.93892703' '28.65889302' '28.455276' '27.99819198'
'27.341637']
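The same array[rows, columns] form also accepts a boolean mask in the rows position, which is handy for picking rows by value. The sketch below is an illustration only: it rebuilds the first four rows by hand, and the threshold of 100 is an arbitrary choice.

```python
import numpy as np

# The first four rows as read above (columns: X, NAME, Y, LABEL), all strings.
data = np.array([
    ['120.7512427', '嘉兴市', '30.75084798', '2'],
    ['120.0830671', '湖州市', '30.89524644', '80'],
    ['120.574379', '绍兴市', '30.00700998', '140'],
    ['122.202972', '舟山市', '29.98754304', '165'],
])

# Convert LABEL to int so it can be compared numerically.
labels = data[:, 3].astype(int)
mask = labels > 100

# Rows where the mask is True, NAME column only.
selected_names = data[mask, 1]
print(selected_names)
```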
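numpy.genfromtxt: reading CSV data
Unlike loadtxt, genfromtxt does not raise an error on a field it cannot convert. With the default float dtype, such fields become nan.
import numpy as np
path = 'city_aic.csv'
data = np.genfromtxt(path, delimiter=',')
print(data[:5])
# Output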
[[ nan nan nan nan]
[120.7512427 30.75084798 nan 2. ]
[120.0830671 30.89524644 nan 80. ]
[120.574379 30.00700998 nan 140. ]
[122.202972 29.98754304 nan 165. ]]
Reading the data as str
# numpy.genfromtxt: read the CSV data as strings; skip_header=0 keeps the header row
data = np.genfromtxt(path, delimiter=',', dtype=str, skip_header=0, usecols=(0, 2, 1, 3))
print(data[:5])
print(type(data))
# Output
[['X' 'NAME' 'Y' 'LABEL']
['120.7512427' '嘉兴市' '30.75084798' '2']
['120.0830671' '湖州市' '30.89524644' '80']
['120.574379' '绍兴市' '30.00700998' '140']
['122.202972' '舟山市' '29.98754304' '165']]
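genfromtxt can also use the header row as field names and infer a type per column, which avoids choosing between all-float and all-str. This is a minimal sketch using names=True and dtype=None on an in-memory stand-in (csv_text) for the file; the result is a structured array rather than a plain 2-D array.

```python
import io
import numpy as np

# In-memory stand-in for city_aic.csv (header plus the first three rows).
csv_text = """X,Y,NAME,LABEL
120.7512427,30.75084798,嘉兴市,2
120.0830671,30.89524644,湖州市,80
120.574379,30.00700998,绍兴市,140
"""

# names=True takes field names from the first line;
# dtype=None asks genfromtxt to infer each column's type.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      names=True, dtype=None, encoding='utf-8')

print(table.dtype.names)
print(table['NAME'])         # text column, accessed by field name
print(table['LABEL'].sum())  # integer column, ready for arithmetic
```

Columns are accessed by name here (table['NAME']) instead of by position, so the array[rows, columns] slicing shown earlier does not apply to this structured result.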
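np.savetxt(fname, X, fmt='%.18e', delimiter=' ')
fname: the file to write to (a filename or an open file handle)
X: the array to save
fmt: the format for each written value, e.g. %d, %f, %e, %s
delimiter: the string separating columns; defaults to a space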
a = np.arange(20).reshape(2, 10)
# savetxt writes the file and returns None, so there is nothing to assign
np.savetxt('a.csv', a, fmt='%s', delimiter=',')
print(a)
print(type(a))
# Output
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]]
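For float data it is often useful to control the precision and to write a header line. The sketch below assumes a two-row array of coordinates taken from the data above and writes to a temporary directory (coords.csv is a hypothetical filename) so it does not touch the working directory.

```python
import os
import tempfile
import numpy as np

coords = np.array([[120.7512427, 30.75084798],
                   [120.0830671, 30.89524644]])

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'coords.csv')
    # fmt='%.4f' keeps four decimal places;
    # comments='' stops savetxt from prefixing the header line with '# '.
    np.savetxt(out_path, coords, fmt='%.4f', delimiter=',',
               header='X,Y', comments='', encoding='utf-8')
    with open(out_path, encoding='utf-8') as f:
        text = f.read()

print(text)
```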
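np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)
fname: the file to read (a filename or an open file handle)
dtype: the data type of the resulting array; defaults to float
delimiter: the string separating fields; the default None splits on whitespace
unpack: if True, the result is transposed so that each column can be unpacked into its own variable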
c = np.loadtxt('a.csv', delimiter=',', dtype=str)
print(c)
print(type(c))
# Output
[['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']
['10' '11' '12' '13' '14' '15' '16' '17' '18' '19']]
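The unpack parameter listed above is not exercised by the example, so here is a short sketch of it. It assumes a tiny three-column CSV held in memory (csv_text is illustrative) and reads it as integers.

```python
import io
import numpy as np

# A tiny three-column CSV held in memory.
csv_text = "0,1,2\n10,11,12\n"

# unpack=True transposes the result, so each column lands in its own variable.
col0, col1, col2 = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                              dtype=int, unpack=True)

print(col0)
print(col1)
print(col2)
```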
Complete example
import numpy as np
path = 'city_aic.csv'
with open(path, encoding='utf-8') as f:
    data = np.loadtxt(f, dtype=str, delimiter=',', skiprows=1, usecols=(0, 2, 1, 3))
# First 5 rows
print(data[:5])
# Column 3 (index 2) of the first 5 rows
print(data[:5, 2])
# A single element: row index 5 (the 6th row), column index 2
print(data[5, 2])
# All of column 3 (index 2)
print(data[:, 2])
print(type(data))
# numpy.genfromtxt: read the CSV data as strings; skip_header=0 keeps the header row
data = np.genfromtxt(path, delimiter=',', dtype=str, skip_header=0, usecols=(0, 2, 1, 3))
print(data[:5])
print(type(data))
a = np.arange(20).reshape(2, 10)
# savetxt writes the file and returns None, so there is nothing to assign
np.savetxt('a.csv', a, fmt='%s', delimiter=',')
print(a)
print(type(a))
c = np.loadtxt('a.csv', delimiter=',', dtype=str)
print(c)
print(type(c))
# Output
[['120.7512427' '嘉兴市' '30.75084798' '2']
['120.0830671' '湖州市' '30.89524644' '80']
['120.574379' '绍兴市' '30.00700998' '140']
['122.202972' '舟山市' '29.98754304' '165']
['121.546246' '宁波市' '29.87620299' '86']]
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299']
29.08127199
['30.75084798' '30.89524644' '30.00700998' '29.98754304' '29.87620299'
'29.08127199' '28.93892703' '28.65889302' '28.455276' '27.99819198'
'27.341637']
[['X' 'NAME' 'Y' 'LABEL']
['120.7512427' '嘉兴市' '30.75084798' '2']
['120.0830671' '湖州市' '30.89524644' '80']
['120.574379' '绍兴市' '30.00700998' '140']
['122.202972' '舟山市' '29.98754304' '165']]
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]]
[['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']
['10' '11' '12' '13' '14' '15' '16' '17' '18' '19']]