1. List
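Use sum(list) to add up a list. Depending on the data, you may first need to convert the elements to numbers (values read from a CSV file, for example, are strings).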
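import csv

with open("world_alcohol.csv", "r", newline="") as f:
    # csv.reader returns an iterator, which cannot be sliced; convert it to a list first
    world_alcohol = list(csv.reader(f))

years = []
for row in world_alcohol[1:]:  # skip the header row
    years.append(row[0])
total = sum(float(i) for i in years)  # the values are strings, so convert each one before summing
avg_year = total / len(years)
print(avg_year)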
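The same idea can be tried without the real file. This is a minimal sketch that assumes a small hypothetical CSV held in memory with io.StringIO (the sample rows are made up). It skips the header with next() instead of slicing, so it never has to build a list of every row.

```python
import csv
import io

# Hypothetical sample data standing in for world_alcohol.csv
sample = io.StringIO("Year,Value\n1986,0.5\n1989,1.5\n1985,2.0\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
years = [row[0] for row in reader]  # csv.reader yields strings, e.g. "1986"

# sum(years) would raise TypeError: sum() starts at 0 and cannot add int and str
total = sum(float(y) for y in years)
avg_year = total / len(years)
print(avg_year)
```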
2. Numpy
Use NumPy to read a CSV file into a numpy.ndarray
import numpy as np
world_alcohol = np.genfromtxt("world_alcohol.csv", delimiter=",")  # default dtype is float
print(type(world_alcohol))  # <class 'numpy.ndarray'>
Use NumPy to turn a list into a vector (a one-dimensional array), or a list of lists into a matrix (a two-dimensional array)
import numpy as np
vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
print(vector)
print(matrix)
Check the shape of an array
vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
vector_shape = vector.shape  # (3,)
matrix_shape = matrix.shape  # (3, 3)
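The shape tuple is easiest to read next to ndim, the number of dimensions. A short sketch using the same vector and matrix, plus reshape to show that a 1-D vector is not the same thing as a 3-by-1 column matrix:

```python
import numpy as np

vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# shape has one entry per dimension; ndim is the number of dimensions
print(vector.shape, vector.ndim)  # (3,) 1
print(matrix.shape, matrix.ndim)  # (3, 3) 2

# A vector's shape is (3,), not (3, 1); reshape makes an explicit column matrix
column = vector.reshape(3, 1)
print(column.shape)  # (3, 1)
```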
Each value in a NumPy array has to have the same data type.
numbers = np.array([1, 2, 3, 4])
print(numbers.dtype)  # an integer type, e.g. int64 on most 64-bit Linux and macOS systems
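To see what "the same data type" means in practice, here is a short sketch: when the input list mixes types, np.array converts every element to one common dtype instead of storing mixed values.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
mixed = np.array([1, 2, 3.5])     # one float makes every element a float
text = np.array([1, 2, "three"])  # one string makes every element a string

print(ints.dtype)        # an integer type, e.g. int64 on most 64-bit Linux/macOS builds
print(mixed.dtype)       # float64
print(text.dtype.kind)   # U (Unicode string)
print(text[0], type(text[0]))  # the number 1 is now stored as the string "1"
```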
About NaN and NA
When NumPy (here, genfromtxt) can't convert a value to a numeric data type such as float or integer, it uses the special value nan, which stands for Not a Number. NA stands for Not Available and describes a value that doesn't exist at all, such as an empty field. NumPy has no separate NA value: with the default float dtype, genfromtxt fills missing fields with nan as well. Both are forms of missing data.
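A minimal sketch of both cases, assuming a small hypothetical CSV passed in through io.StringIO instead of world_alcohol.csv. The text field cannot become a float, and the trailing empty field is missing, so with the default float dtype both end up as nan.

```python
import io
import numpy as np

# Hypothetical sample rows: a text column, plus one empty (missing) field at the end
data = io.StringIO("1986,Wine,1.5\n1989,Beer,\n")
arr = np.genfromtxt(data, delimiter=",")  # default dtype is float

print(arr)
print(np.isnan(arr))  # True wherever a value was unconvertible or missing
```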
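Read everything as strings
To avoid NaN and NA values, read all of the content in as strings: dtype="U75" stores every field as a Unicode string of up to 75 characters, and skip_header=1 skips the header row.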
import numpy as np
world_alcohol = np.genfromtxt("world_alcohol.csv", dtype="U75", skip_header=1, delimiter=",")
print(world_alcohol)
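A sketch of the same call on a small hypothetical CSV (io.StringIO stands in for the real file; the rows are made up). The text column stays readable, and a numeric column can still be converted back with astype when a calculation needs it.

```python
import io
import numpy as np

# Hypothetical CSV with a header row and a text column (not the real world_alcohol.csv)
data = io.StringIO("Year,Beverage,Value\n1986,Wine,1.5\n1989,Beer,0.5\n")

# dtype="U75": every field becomes a Unicode string of at most 75 characters
# skip_header=1: drop the first line so the column names are not mixed into the data
arr = np.genfromtxt(data, dtype="U75", skip_header=1, delimiter=",")
print(arr)

# Convert one column back to numbers only when needed
values = arr[:, 2].astype(float)
print(values.sum())
```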
Numpy Slicing
matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45]
])
# all rows, second column
print(matrix[:, 1])
# [10 25 40]
# all rows, first and second columns
print(matrix[:, 0:2])
# [[ 5 10]
#  [20 25]
#  [35 40]]
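The same [rows, columns] syntax works for selecting rows, sub-blocks, and counting from the end. A short sketch on the same matrix:

```python
import numpy as np

matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45],
])

row = matrix[1, :]          # second row
block = matrix[0:2, 1:3]    # first two rows, second and third columns
last_col = matrix[:, -1]    # a negative index counts from the end
print(row)
print(block)
print(last_col)
```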
Use a NumPy comparison to get a Boolean array, then use it to select rows
matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45]
])
# Boolean vector with one entry per row: True where the second column equals 25
second_column_25 = (matrix[:, 1] == 25)
print(matrix[second_column_25, :])
# [[20 25 30]]
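Several conditions can be combined element by element with & (and) or | (or). Each comparison needs its own parentheses, because & binds more tightly than > and <. A sketch on the same matrix:

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

mask = matrix[:, 0] > 10  # one True/False per row
print(mask)

# rows whose first column is > 10 AND whose third column is < 45
both = (matrix[:, 0] > 10) & (matrix[:, 2] < 45)
print(matrix[both, :])
```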
Replacing Values
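# world_alcohol is the string array read above with dtype="U75"
# in the first column, replace "1986" with "2014"
s1986 = (world_alcohol[:, 0] == "1986")
world_alcohol[s1986, 0] = "2014"
# in the fourth column, replace "Wine" with "Grog"
sWine = (world_alcohol[:, 3] == "Wine")
world_alcohol[sWine, 3] = "Grog"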
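Because the replacement relies on a file, here is a self-contained sketch of the same mask-and-assign pattern. The small array is a hypothetical stand-in for world_alcohol, with the beverage in the second column instead of the fourth.

```python
import numpy as np

# Hypothetical stand-in for world_alcohol: a small string array with dtype "U75"
data = np.array([
    ["1986", "Wine"],
    ["1989", "Beer"],
    ["1986", "Beer"],
], dtype="U75")

s1986 = data[:, 0] == "1986"  # Boolean mask over the rows
data[s1986, 0] = "2014"       # assign only to the masked cells, in place
sWine = data[:, 1] == "Wine"
data[sWine, 1] = "Grog"
print(data)
```

NumPy string arrays have a fixed width (75 characters for U75), so a replacement longer than that width would be silently truncated.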