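1. numpy.savetxt('filename', array) writes an array to the file filename, and numpy.loadtxt('filename', delimiter=',', usecols=sequence, unpack=True/False) reads it back. delimiter sets the column separator, usecols selects which columns to read, and unpack=True returns each selected column as a separate array. Both functions also work with CSV files, the format used for most stored data.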
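Here is a minimal round-trip sketch of the two functions. The file name and the numbers are made up for illustration. Note that savetxt's default format (fmt='%.18e') writes scientific notation, so the values come back exactly.

```python
import os
import tempfile

import numpy as np

# A small 2-column table: (price, volume). Illustrative numbers only.
data = np.array([[10.0, 100.0],
                 [11.0, 150.0],
                 [12.0, 200.0]])

# Hypothetical file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Each row of the array becomes one comma-separated line of text.
np.savetxt(path, data, delimiter=",")

# usecols picks columns by index; unpack=True transposes the result so each
# selected column comes back as its own 1-D array.
price, volume = np.loadtxt(path, delimiter=",", usecols=(0, 1), unpack=True)

# Without unpack, loadtxt returns all rows as one 2-D array.
table = np.loadtxt(path, delimiter=",")
```

If the file is meant to be read by people, passing something like fmt="%.2f" to savetxt gives more readable output.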
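2. numpy.average(arrayone, weights=arraytwo) computes the average of arrayone weighted by arraytwo, while numpy.mean(array) computes the plain, unweighted mean of array. This is useful, for example, for a volume-weighted average price (VWAP) or a time-weighted average price (TWAP).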
>>> c,v=numpy.loadtxt('apple.csv', delimiter=',', unpack=True)
>>> c
array([ 344.17, 345.17, 346.17, 347.17, 348.17, 349.17, 350.17,
351.17, 352.17])
>>> v
array([ 344.4, 345.4, 346.4, 347.4, 348.4, 349.4, 350.4, 351.4,
352.4])
>>> k=numpy.average(c,weights=v)
>>> k
348.18913509376193
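The sketch below checks numpy.average against the weighted-mean formula sum(price * weight) / sum(weight). The prices and volumes are illustrative. The time weights 1, 2, 3 (newer periods count more) are only one hypothetical way to build a time-weighted average, not a standard definition.

```python
import numpy as np

# Closing prices and traded volumes for three periods (illustrative numbers).
close = np.array([10.0, 11.0, 12.0])
volume = np.array([100.0, 100.0, 300.0])

# Volume-weighted average price: sum(price * volume) / sum(volume).
vwap = np.average(close, weights=volume)
manual = np.sum(close * volume) / np.sum(volume)

# Hypothetical time weighting: oldest period weight 1, newest weight 3.
t = np.arange(1, len(close) + 1)
twap = np.average(close, weights=t)

# The unweighted mean ignores volume entirely.
plain = np.mean(close)
```

Here the heavy volume in the last period pulls the VWAP (11.4) above the plain mean (11.0).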
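3. numpy.max(array) and numpy.min(array) return the maximum and minimum of array. numpy.ptp(array) returns its range ("peak to peak"), the difference between the maximum and the minimum. numpy.median(array) returns the median: the middle value of the sorted elements, or the mean of the two middle values when the count is even. numpy.var(array) computes the variance of array and numpy.std(array) its standard deviation. (Note the difference between population and sample variance. Population variance divides the sum of squared deviations by the number of elements n, while sample variance divides it by n-1, the degrees of freedom. Dividing by n-1 makes the sample variance an unbiased estimator of the population variance. By default NumPy's var() and std() divide by n (ddof=0), which gives the population version; pass ddof=1 to get the sample version.) The ndarray method array.mean() also computes the mean of array directly.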
>>> c
array([ 1., 1., 1., 5., 1., 1.])
>>> v
array([ 1., 1., 1., 5., 1., 1.])
>>> numpy.max(c)
5.0
>>> numpy.max(v)
5.0
>>> numpy.min(v)
1.0
>>> numpy.ptp(c)
4.0
>>> c
array([ 1., 1., 1., 5., 1., 1.])
>>> numpy.median(c)
1.0
>>> numpy.var(c)
2.2222222222222219
>>> numpy.var(c)==numpy.mean((c-c.mean())**2)  ## check var(): it equals the mean of the squared deviations
True
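This sketch reuses the array c from the session above to show what the ddof argument changes. The default divides the sum of squared deviations by n; ddof=1 divides it by n - 1.

```python
import numpy as np

c = np.array([1., 1., 1., 5., 1., 1.])
n = c.size
sq_dev = np.sum((c - c.mean()) ** 2)   # sum of squared deviations

pop_var = np.var(c)                  # default ddof=0: divides by n
sample_var = np.var(c, ddof=1)       # ddof=1: divides by n - 1
sample_std = np.std(c, ddof=1)       # square root of the sample variance

rng = np.ptp(c)                      # max - min
med = np.median(c)                   # middle value of the sorted data
```

For this array, pop_var is about 2.2222 (matching the session output) and sample_var is about 2.6667.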
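4. numpy.diff(array) computes the difference between each pair of adjacent elements of array, and numpy.log(array) computes the natural logarithm of each element. NumPy is designed around floating-point computation. Note how the converters parameter of numpy.loadtxt() is used below to turn date strings into numbers. numpy.where(array>num) returns the indices of the elements of array that are greater than num, as a tuple with one index array per dimension. numpy.take(array, arrayindexs) extracts the elements of array at the indices in arrayindexs. numpy.argmax(array) returns the index of the maximum value of array, and numpy.argmin(array) the index of the minimum. numpy.apply_along_axis() deserves a closer look, including whether it actually improves performance: it applies a function to each 1-D slice of an array, but the function is still called once per slice from Python.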
>>> v
array([ 1., 1., 1., 5., 1., 1.])
>>> numpy.diff(v)
array([ 0., 0., 4., -4., 0.])
>>> numpy.diff(v)/v[:-1]
array([ 0. , 0. , 4. , -0.8, 0. ])
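The last line of the session computes simple returns. A common companion, sketched here with the same array, is the log return, built from numpy.log and numpy.diff. The two are related by log_return = log(1 + simple_return).

```python
import numpy as np

v = np.array([1., 1., 1., 5., 1., 1.])

# Simple returns: (x[t] - x[t-1]) / x[t-1]; diff yields one element fewer than v.
simple = np.diff(v) / v[:-1]

# Log returns: log(x[t]) - log(x[t-1]), i.e. log(x[t] / x[t-1]).
log_ret = np.diff(np.log(v))
```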
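>>> import datetime
>>> def datestr2num(s):  ## converter: date string such as '2015/1/5' -> weekday number (Monday=0 ... Sunday=6)
	if isinstance(s, bytes):  ## older NumPy versions pass converters bytes rather than str
		s = s.decode()
	return datetime.datetime.strptime(s,'%Y/%m/%d').date().weekday()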
>>> dates,price=numpy.loadtxt('apple.csv',delimiter=',',usecols=(2,0),unpack=True,converters={2:datestr2num})
>>> dates
array([ 3., 4., 0., 1., 2., 3., 4., 0., 1., 2., 3., 4., 0.,
1., 2., 3., 4., 0., 1., 2., 3., 4., 0., 1., 2.])
>>> price
array([ 46.5, 47.5, 48.5, 49.5, 50.5, 51.5, 52.5, 53.5, 54.5,
55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5, 63.5,
64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5])
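Here is a self-contained version of the converter step. It reads the first three rows of the apple.csv listing in this post from an in-memory buffer. It assumes the file really is comma-separated, as the delimiter=',' calls imply, even though the listing is shown with spaces. The converter is keyed by the column's position in the file (2), not by its position in usecols.

```python
import datetime
import io

import numpy as np

def datestr2num(s):
    """Convert a 'YYYY/M/D' date string to its weekday (Monday=0 ... Sunday=6)."""
    # Older NumPy versions hand converters bytes, newer ones str; accept both.
    if isinstance(s, bytes):
        s = s.decode("ascii")
    return datetime.datetime.strptime(s, "%Y/%m/%d").date().weekday()

# First three rows of the apple.csv listing, assumed comma-separated.
csv_text = io.StringIO(
    "46.5,765.98,2015/1/1,48.99,44.11,47.66\n"
    "47.5,766.98,2015/1/2,49.99,45.11,48.66\n"
    "48.5,767.98,2015/1/5,50.99,46.11,49.66\n"
)

# Column 2 (dates) goes through the converter; column 0 is read as a float.
dates, price = np.loadtxt(csv_text, delimiter=",", usecols=(2, 0),
                          unpack=True, converters={2: datestr2num})
```

2015/1/1 was a Thursday, so the first weekday code is 3, as in the session output above.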
>>> numpy.zeros(5)  ## create an array of five zeros
array([ 0., 0., 0., 0., 0.])
>>> for i in range(5):  ## average opening price for each weekday
	indices = numpy.where(dates==i)
	prices = numpy.take(price,indices)
	avg = numpy.mean(prices)
	print("Day %d prices %s average %s" % (i, prices, avg))
Day 0 prices [[ 48.5 53.5 58.5 63.5 68.5]] average 58.5
Day 1 prices [[ 49.5 54.5 59.5 64.5 69.5]] average 59.5
Day 2 prices [[ 50.5 55.5 60.5 65.5 70.5]] average 60.5
Day 3 prices [[ 46.5 51.5 56.5 61.5 66.5]] average 56.5
Day 4 prices [[ 47.5 52.5 57.5 62.5 67.5]] average 57.5
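>>> numpy.argmax(prices)  ## prices still holds the day-4 values from the last loop pass; the index is into the flattened array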
4
>>> numpy.argmin(prices)
0
>>>
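The double brackets in the loop output come from passing the whole tuple returned by numpy.where to numpy.take, which produces a 2-D array of shape (1, n). The sketch below uses two made-up weeks of data. It shows this effect, the 1-D result you get by indexing the tuple, and a boolean-mask alternative that skips where/take altogether.

```python
import numpy as np

# Weekday codes and prices for two illustrative weeks (Mon=0 ... Fri=4).
dates = np.array([0., 1., 2., 3., 4., 0., 1., 2., 3., 4.])
price = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19.])

# With only a condition, np.where returns a tuple of index arrays (one per axis).
indices = np.where(dates == 0)
taken = np.take(price, indices)      # tuple -> 2-D result of shape (1, n)
flat = np.take(price, indices[0])    # take the first index array -> 1-D result

# argmax without an axis works on the flattened array.
best = np.argmax(taken)

# A boolean mask performs the same selection directly.
monday = price[dates == 0]

# Per-weekday averages, one mask per weekday code.
averages = np.array([price[dates == d].mean() for d in range(5)])
```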
+++++++++++++++++++++++++++++++++++++++++++++++++++++
apple.csv
opendata (unlabeled) date high low close
46.5 765.98 2015/1/1 48.99 44.11 47.66
47.5 766.98 2015/1/2 49.99 45.11 48.66
48.5 767.98 2015/1/5 50.99 46.11 49.66
49.5 768.98 2015/1/6 51.99 47.11 50.66
50.5 769.98 2015/1/7 52.99 48.11 51.66
51.5 770.98 2015/1/8 53.99 49.11 52.66
52.5 771.98 2015/1/9 54.99 50.11 53.66
53.5 772.98 2015/1/12 55.99 51.11 54.66
54.5 773.98 2015/1/13 56.99 52.11 55.66
55.5 774.98 2015/1/14 57.99 53.11 56.66
56.5 775.98 2015/1/15 58.99 54.11 57.66
57.5 776.98 2015/1/16 59.99 55.11 58.66
58.5 777.98 2015/1/19 60.99 56.11 59.66
59.5 778.98 2015/1/20 61.99 57.11 60.66
60.5 779.98 2015/1/21 62.99 58.11 61.66
61.5 780.98 2015/1/22 63.99 59.11 62.66
62.5 781.98 2015/1/23 64.99 60.11 63.66
63.5 782.98 2015/1/26 65.99 61.11 64.66
64.5 783.98 2015/1/27 66.99 62.11 65.66
65.5 784.98 2015/1/28 67.99 63.11 66.66
66.5 785.98 2015/1/29 68.99 64.11 67.66
67.5 786.98 2015/1/30 69.99 65.11 68.66
68.5 787.98 2015/2/2 70.99 66.11 69.66
69.5 788.98 2015/2/3 71.99 67.11 70.66
70.5 789.98 2015/2/4 72.99 68.11 71.66
+++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> opendata,highdata,lowdata,closedata=numpy.loadtxt('apple.csv',delimiter=',',usecols=(0,3,4,5),unpack=True)
>>> opendata
array([ 46.5, 47.5, 48.5, 49.5, 50.5, 51.5, 52.5, 53.5, 54.5,
55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5, 63.5,
64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5])
>>> highdata
array([ 48.99, 49.99, 50.99, 51.99, 52.99, 53.99, 54.99, 55.99,
56.99, 57.99, 58.99, 59.99, 60.99, 61.99, 62.99, 63.99,
64.99, 65.99, 66.99, 67.99, 68.99, 69.99, 70.99, 71.99,
72.99])
>>> lowdata
array([ 44.11, 45.11, 46.11, 47.11, 48.11, 49.11, 50.11, 51.11,
52.11, 53.11, 54.11, 55.11, 56.11, 57.11, 58.11, 59.11,
60.11, 61.11, 62.11, 63.11, 64.11, 65.11, 66.11, 67.11,
68.11])
>>> closedata
array([ 47.66, 48.66, 49.66, 50.66, 51.66, 52.66, 53.66, 54.66,
55.66, 56.66, 57.66, 58.66, 59.66, 60.66, 61.66, 62.66,
63.66, 64.66, 65.66, 66.66, 67.66, 68.66, 69.66, 70.66,
71.66])
>>> weekdate=numpy.loadtxt('apple.csv',delimiter=',',usecols=(2,),converters={2:datestr2num})
>>> weekdate
array([ 3., 4., 0., 1., 2., 3., 4., 0., 1., 2., 3., 4., 0.,
1., 2., 3., 4., 0., 1., 2., 3., 4., 0., 1., 2.])
>>> numpy.ravel(numpy.where(weekdate==0))[0]  ## index of the first Monday
2
>>> numpy.ravel(numpy.where(weekdate==4))[-1]  ## index of the last Friday
21
>>> weekdatearray=numpy.arange(2,22)  ## day indices from the first Monday (2) through the last Friday (21)
>>> weekdatearray
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21])
>>> weekdatearray=numpy.split(weekdatearray,4)  ## split the 20 indices into 4 weeks of 5 trading days
>>> weekdatearray
[array([2, 3, 4, 5, 6]), array([ 7, 8, 9, 10, 11]), array([12, 13, 14, 15, 16]), array([17, 18, 19, 20, 21])]
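The session above hard-codes 2, 22 and 4. The sketch below derives them from the weekday codes instead. It uses np.where(...)[0] in place of numpy.ravel, and reshape in place of split. It assumes that every week in the range has exactly five trading days, with no holidays, so the index range divides evenly into rows of five.

```python
import numpy as np

# Weekday codes as produced by the datestr2num converter for apple.csv.
weekdate = np.array([3., 4., 0., 1., 2., 3., 4., 0., 1., 2., 3., 4., 0.,
                     1., 2., 3., 4., 0., 1., 2., 3., 4., 0., 1., 2.])

first_monday = np.where(weekdate == 0)[0][0]
last_friday = np.where(weekdate == 4)[0][-1]

# Assumption: five trading days per week, so the range reshapes into rows of 5.
day_idx = np.arange(first_monday, last_friday + 1)
weeks = day_idx.reshape(-1, 5)   # row i holds the day indices of week i
```

Unlike the list returned by numpy.split, the result is already a single 2-D array.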
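>>> def sumerize(a,o,h,l,c):  ## a: day indices of one week; o/h/l/c: open/high/low/close arrays
	monday_open=o[a[0]]                      ## open on the first day of the week
	week_high=numpy.max(numpy.take(h,a))     ## highest high of the week
	week_low=numpy.min(numpy.take(l,a))      ## lowest low of the week
	friday_close=c[a[-1]]                    ## close on the last day of the week
	return ("apple ",monday_open,week_high,week_low,friday_close)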
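>>> weeksummary=numpy.apply_along_axis(sumerize,1,weekdatearray,opendata,highdata,lowdata,closedata)  ## calls sumerize once per row (axis=1); the string label turns the whole result into a string array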
>>> weeksummary
array([['apple ', '48.5', '54.99', '46.11', '53.66'],
['apple ', '53.5', '59.99', '51.11', '58.66'],
['apple ', '58.5', '64.99', '56.11', '63.66'],
['apple ', '63.5', '69.99', '61.11', '68.66']],
dtype='|S6')
>>> numpy.savetxt('applesumeray.csv',weeksummary,delimiter=',',fmt="%s")
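Because the tuple returned by sumerize contains the label "apple ", the result above is an array of strings (dtype '|S6'). Short string dtypes like this can truncate longer numbers. The sketch below builds the same weekly open/high/low/close summary with fancy indexing instead of apply_along_axis, which keeps the values as floats and avoids a per-row Python call. The daily data is made up for illustration.

```python
import numpy as np

# Hypothetical daily data for two five-day weeks.
o = np.arange(10.0, 20.0)            # opens
h = o + 2.0                          # highs
l = o - 2.0                          # lows
c = o + 1.0                          # closes
weeks = np.arange(10).reshape(2, 5)  # row i holds the day indices of week i

# Indexing with the 2-D index array gives one row per week, so each
# statistic is a single vectorized call instead of a Python loop.
monday_open = o[weeks[:, 0]]
week_high = h[weeks].max(axis=1)
week_low = l[weeks].min(axis=1)
friday_close = c[weeks[:, -1]]

# A numeric (float64) summary: one row per week, columns open/high/low/close.
summary = np.column_stack([monday_open, week_high, week_low, friday_close])
```

An array like summary can then be written with numpy.savetxt(..., delimiter=',', fmt='%.2f'), keeping any label such as the ticker outside the numeric array.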
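5. numpy.maximum() and numpy.minimum() compare two arrays element by element and return the larger (respectively smaller) value at each position. By contrast, numpy.max() and numpy.min() reduce a single array to one value.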
>>> numpy.maximum([2, 3, 4], [1, 5, 2])
array([2, 5, 4])
>>>
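This sketch covers both element-wise functions. It also shows broadcasting a scalar, which is a common way to floor values at 0. It then contrasts them with the reducing numpy.max.

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([1, 5, 2])

elementwise_max = np.maximum(a, b)   # larger value at each position
elementwise_min = np.minimum(a, b)   # smaller value at each position

# A scalar is broadcast against the array: negative values are floored at 0.
floored = np.maximum(np.array([-1.5, 0.5, 2.0]), 0)

# np.max reduces the whole array to a single value.
overall = np.max(a)
```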