This note does not cover DataFrame slicing, selection, or assignment; it only covers individual methods.
purpose | func | return | note |
---|---|---|---|
solution | fsolve(func,x0,args) | unpack[0] | linear equations in one unknown, two unknowns, … |
solution | bisect(func,a,b) | | requires func(a)*func(b)<0 |
optimize | minimize(func,x0,bounds,constraints) | res.fun=minimum value, res.x=x result | constraints: type 'ineq' means fun(x) $\geq$ 0; bounds: use None for infinity |
math | derivative(func,x0,dx,n,args) | | n=nth order |
math | quad(func,a,b,args) | unpack[0] | a=lower limit |
math | dblquad(func,a,b,gfun,hfun) | unpack[0] | the first argument of func is integrated first (inner integral); gfun/hfun give its bounds as functions, e.g. gfun=lambda x:3 |
reg | interp1d(x,y,kind) | | passes through every point; kind='quadratic','cubic' |
reg | np.polyfit(x,y,deg) polyval(paramt,xnew) | | paramt is the coefficient array, ordered from highest order $\to$ lowest order, then intercept |
>>> return: array([2.,])
func(a)*func(b)<0
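A minimal sketch of the two root finders, assuming a hypothetical linear function `f(x) = 2x - 4` (not from the original notes). It shows that `fsolve` returns an array that needs `[0]`, while `bisect` needs a bracket whose endpoints give opposite signs.

```python
from scipy.optimize import fsolve, bisect

# Hypothetical example function with a single root at x = 2
def f(x):
    return 2 * x - 4

# fsolve returns an array of solutions, so take element 0
root_fsolve = fsolve(f, x0=0.0)[0]

# bisect needs f(a) * f(b) < 0; here f(0) = -4 and f(5) = 6
root_bisect = bisect(f, 0.0, 5.0)

print(root_fsolve, root_bisect)
```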
constraints = (
{'type':'ineq','fun':lambda x:x[0]+x[1]},
{'type':'eq','fun':lambda x:x[0]+x[1]}
)
bounds = (
(0,None),
(None,None)
)
type = 'ineq' $\to \geq 0$
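To see how `constraints` and `bounds` fit together, here is a small sketch with a hypothetical objective (squared distance to the point (1, 2)) and a hypothetical constraint x0 + x1 <= 2. Because 'ineq' means `fun(x) >= 0`, the constraint is rewritten as `2 - x0 - x1 >= 0`. The choice of `method='SLSQP'` is an assumption made for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective: squared distance from the point (1, 2)
objective = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2

# 'ineq' means fun(x) >= 0, so x0 + x1 <= 2 becomes 2 - x0 - x1 >= 0
constraints = (
    {'type': 'ineq', 'fun': lambda x: 2 - x[0] - x[1]},
)
# None stands for "no bound" (infinity) on that side
bounds = ((0, None), (0, None))

res = minimize(objective, x0=np.array([0.0, 0.0]),
               method='SLSQP', bounds=bounds, constraints=constraints)

print(res.x)    # location of the minimum
print(res.fun)  # minimum value
```

The unconstrained minimum (1, 2) violates the constraint, so the solver should end up on the line x0 + x1 = 2.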
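n=order of the derivative (n=1 is the first derivative)
dx=step size of the finite difference (a smaller dx is usually, but not always, more accurate)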
DF.apply(
lambda x:derivative(
lambda s: BS(*(x[:4]),s,x[5]),
x[4],
dx=0.01
),
axis=1
)
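As far as I know, `scipy.misc.derivative` has been deprecated and removed in recent SciPy releases, so a hand-written central difference may be needed instead. The helper `central_diff` below is a hypothetical name, not a SciPy function; it is only a sketch of the first-derivative case (n=1) with step `dx`.

```python
import numpy as np

def central_diff(func, x0, dx=1e-3):
    """First derivative of func at x0 by a central difference with step dx."""
    return (func(x0 + dx) - func(x0 - dx)) / (2 * dx)

# d/dx sin(x) at x = 0 is cos(0) = 1
slope = central_diff(np.sin, 0.0, dx=0.01)
print(slope)
```

The same helper can be used inside `DF.apply` in place of `derivative`, with the lambda closing over the row values.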
a=lower bound
b=upper bound
quad(fun,0,np.inf)
# returns a tuple of (result, error estimate)
>>> return (21.33333,2.34e-13)
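A self-contained check of `quad` on an integrand with a known answer. The integrand `exp(-x)` is a hypothetical choice for illustration; its integral from 0 to infinity is exactly 1.

```python
import numpy as np
from scipy.integrate import quad

# quad returns (result, estimated absolute error)
value, abs_err = quad(lambda x: np.exp(-x), 0, np.inf)

print(value, abs_err)
```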
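gfun=lower-bound function of the inner integral; pass a function even for a constant bound, e.g. lambda x: 0 (recent SciPy versions reportedly also accept a plain float)
hfun=upper-bound function of the inner integral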
For the function below:
$fun(y,x)=\int_0^2 \int_0^1 xy^2\,dy\,dx$
The integration order is y first, then x, so y comes before x in the argument list of the lambda function (fun).
# y comes before x, matching the integration order
fun = lambda y,x: y**2*x
# limits are always given lower first, then upper
dblquad(fun,0,2,lambda x:0,lambda x:1)
# returns a tuple of (result, error estimate)
>>> return (0.66667,7.401e-15)
interp1d returns a function; call it on the new x values.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x=np.arange(0,10)
y=np.exp(-x/3.0)
f=interp1d(x,y)
xnew=np.arange(0,9,0.1)
ynew=f(xnew)
plt.plot(x,y,'o',xnew,ynew,'-')
polyfit returns the coefficients of the fitted polynomial; pass them to polyval to evaluate it.
polyfit result = [$x^k$-coef, $x^{k-1}$-coef, $\dots$, $x$-coef, intercept]
p2 = np.polyfit(x,y,2)
f2 = interp1d(x,y,kind='quadratic')
p3 = np.polyfit(x,y,3)
f3 = interp1d(x,y,kind='cubic')
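The difference between fitting and interpolating can be checked numerically. The sketch below reuses the sample data from the interp1d example; the variable names `max_interp_err` and `max_fit_err` are introduced here only for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.arange(0, 10)
y = np.exp(-x / 3.0)

# Regression: three coefficients [x^2 coef, x coef, intercept]
p2 = np.polyfit(x, y, 2)
# Interpolation: a callable that passes through every sample point
f2 = interp1d(x, y, kind='quadratic')

xnew = np.arange(0, 9, 0.1)
y_fit = np.polyval(p2, xnew)   # evaluate the fitted polynomial
y_interp = f2(xnew)            # evaluate the interpolant

# Error at the original sample points
max_interp_err = np.max(np.abs(f2(x) - y))
max_fit_err = np.max(np.abs(np.polyval(p2, x) - y))
print(max_interp_err, max_fit_err)
```

The interpolant reproduces the samples up to rounding, while the degree-2 fit only approximates them.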
Purpose | Method | args | note |
---|---|---|---|
generate | pd.read_csv | header, names, index_col, parse_dates (are dates still parsed when the date column is not the index?), sep=',' | Use double backslashes in Windows paths. Duplicate column names become a, a.1; a clash with the index name also counts as a duplicate |
generate | np.linspace | start,stop,num | splits the range into n points, i.e. n-1 segments |
generate | np.ones/zeros | np.ones(3)/np.ones((3,3)); a 1-D shape can be a plain int, a 2-D shape must be a tuple | |
gen rand | np.random.rand/randn | rand(3,3),randn(3,3,3) | |
gen rand | np.random.random/standard_normal | random((3,3))/standard_normal((3,3)) | |
gen rand | np.random.seed | | |
transform | np.arange | start,stop (exclusive),step | |
transform | np.append df.append | np.append flattens before appending by default; df.append can only add rows (df.append was removed in pandas 2.0, use pd.concat) | |
transform | np.concatenate pd.concat | list of tables, axis | |
transform | array.astype, df.astype | change element type | |
label | Series.rename(name/dict/func) df.rename(columns/index) | Series: a scalar name changes Series.name, func/dict changes the labels (index). DF: func/dict changes columns/index | DF.rename(index=str) converts the index labels to str |
del | df.drop | labels (index-like objects??),axis=0,inplace | the del statement cannot delete rows, because it cannot be used with .loc |
del | df.dropna | how,thresh,subset,inplace | thresh=2: keep rows with at least 2 non-NaN values; how='any': drop if any value is NaN; how='all': drop only if all values are NaN |
op | np.reshape | array.reshape(4,-1) | |
op | df.rolling | window,min_periods,axis | df['x'].rolling(10).mean() |
op | df.diff | | |
cal | np.any, np.all | axis=None | np.any(array, axis=0) |
cal | np.round | np.round(array,3) | |
cal | np.isin vs df.isin vs Series.isin | np.isin/Series.isin ignore labels; df.isin matches labels (when given a DataFrame, Series, or dict) | |
cal np | exp/sqrt/mean/std/ sum/cumsum/prod/cumprod | | |
cal np | maximum | element-wise comparison | |
fancy | df.apply | func, axis, result_type=None ('broadcast': output has the same shape as the input; 'expand': returns a DataFrame), args (extra arguments passed on to func) | works only by row or by column |
fancy | df.applymap | applied to every element | |
fancy | np.tile | tile(array,(2,2)); (2,2) is the repetition count per axis, so the array is doubled along both dimensions | |
fancy | df.where df.mask | where: entries that do not satisfy the condition are replaced by NaN or the given table; mask: entries that satisfy the condition are replaced by NaN or the given table | |
sort | df.sort_values, np.sort,np.argsort | df: by, axis, ascending; np.sort: ascending by default, returns a sorted copy (array.sort() sorts in place); np.argsort: ascending by default | |
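The `where`/`mask` and `dropna` rows are easy to mix up, so here is a small sketch on a hypothetical 3x2 table (the names `kept`, `hidden`, `rows_thresh`, and `rows_all` are made up for this example).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=list('ab'))
#      a    b
# 0  0.0  1.0
# 1  2.0  3.0
# 2  4.0  5.0

# where keeps entries that satisfy the condition; the others become NaN
kept = df.where(df > 2)
# mask replaces entries that satisfy the condition, here with -1
hidden = df.mask(df > 2, -1)

# thresh=2: keep rows that have at least 2 non-NaN values
rows_thresh = kept.dropna(thresh=2)
# how='all': drop a row only when every value in it is NaN
rows_all = kept.dropna(how='all')

print(kept, hidden, rows_thresh, rows_all, sep='\n\n')
```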
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(20).reshape(5,-1),columns=list('abcd'),index=list('abcde'))
# expand: specify name with dict
cal_df=data.apply(lambda x:{'sum':sum(x),'std':np.std(x)},
axis=1,
result_type='expand')
# expand: a list of functions also works; column name = function name (a lambda is named <lambda>)
data.apply([np.sum,np.std], axis=1, result_type='expand')
# to attach the results to the original table, just use pd.concat
rst = pd.concat([data,data.apply(lambda x:{'sum':sum(x),'std':np.std(x)},
axis=1,
result_type='expand')],
axis=1)
# The code below outputs a DataFrame!
# Each row returns a Series, and stacking the rows together gives a DataFrame
data.apply(lambda x: pd.Series([x['a']**2, x['c']**2+np.sum(x)],index=['cc','dd']),axis=1)
obj | args | eg |
---|---|---|
pd.Series | index, name (optional; None by default) | pd.Series([1,2,3],index=list('abc'),name='e') |
# a function that returns a function
fun = lambda a,b: lambda x: a*x+b
need to def a function to deal with.
# Python evaluates the right-hand side first and holds it in memory, then assigns to the left-hand side
data['a'], data['b'] = data['b'].copy(), data['a'].copy()
# the following also works
data['a'], data['b'] = data['b'], data['a'].copy()
# It is surprising that the following works, since one would expect labels to be matched, i.e. always b to b and a to a
# with [] indexing it works: labels are not matched
data[['a','b']]=data[['b','a']]
data[['a','b']]=data.loc[:,['b','a']]
# loc and iloc match labels, so the swap cannot be done directly
data.loc[:,['a','b']]=data.loc[:,['b','a']] # fails
data.loc[:,['a','b']]=data[['b','a']] # fails
data.loc[:,['a','b']]=data[['b','a']].values # works once the labels are stripped
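A self-contained version of the column swap on a hypothetical two-column table. It shows the label-free assignment through `.loc` and, as an alternative that is not in the original notes, swapping the labels with `rename` and then reordering the columns.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Option 1: strip the labels so the assignment is purely positional
swapped = data.copy()
swapped.loc[:, ['a', 'b']] = swapped[['b', 'a']].to_numpy()

# Option 2 (alternative): swap the column labels, then restore the column order
renamed = data.rename(columns={'a': 'b', 'b': 'a'})[['a', 'b']]

print(swapped)
print(renamed)
```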
np.random.randn(3/1)# error: 3/1 returns a float; use 3//1 instead
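# 2
# N+1 rows, because we step back from time T (period N) to time 0
fc = np.zeros((N+1,N+1))
fp = np.zeros((N+1,N+1))
# N+1 terminal nodes in total
j=np.arange(0,N+1,1)
Ss=S*(u**j)*(d**(N-j))
# terminal payoffs; row N is the last period, from which we step back to period 0
fc[N,:]=np.maximum(0,Ss-K)
fp[N,:]=np.maximum(0,K-Ss)
# 3
p1=1-p
ert=np.exp(-r*dt)
# range can stop anywhere, including -1 (exclusive), so i runs N-1,...,0
for i in range(N-1,0-1,-1):
    # value at this step = discounted expectation over the next step's up and down nodes
    # Why slice instead of updating every element of the row?
    # Only nodes 0..i exist at step i, so the rest of the row stays 0.
    fc[i,0:i+1]=ert*(p*fc[i+1,0+1:i+1+1]+p1*fc[i+1,0:i+1])
    fp[i,0:i+1]=ert*(p*fp[i+1,0+1:i+1+1]+p1*fp[i+1,0:i+1])
# 4
c=fc[0,0]
p=fp[0,0]  # note: p is rebound here from the probability to the put price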
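Step 1 (the inputs) is not shown in these notes, so the sketch below fills it in with assumed values and a Cox-Ross-Rubinstein parametrisation (`u = exp(sigma*sqrt(dt))`, `d = 1/u`). Those choices, and all numbers, are assumptions for illustration only. The recursion itself follows the steps above and prices European options, since there is no early-exercise check.

```python
import numpy as np

# Step 1 (assumed inputs, not from the original notes)
S, K, r, sigma, T, N = 100.0, 100.0, 0.05, 0.2, 1.0, 200
dt = T / N
u = np.exp(sigma * np.sqrt(dt))      # up factor (CRR assumption)
d = 1.0 / u                          # down factor
p = (np.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
ert = np.exp(-r * dt)                # one-step discount factor

# Step 2: terminal payoffs in row N
fc = np.zeros((N + 1, N + 1))
fp = np.zeros((N + 1, N + 1))
j = np.arange(0, N + 1)              # number of up moves at maturity
Ss = S * u**j * d**(N - j)
fc[N, :] = np.maximum(0, Ss - K)
fp[N, :] = np.maximum(0, K - Ss)

# Step 3: roll back from period N-1 to period 0
for i in range(N - 1, -1, -1):
    fc[i, :i + 1] = ert * (p * fc[i + 1, 1:i + 2] + (1 - p) * fc[i + 1, :i + 1])
    fp[i, :i + 1] = ert * (p * fp[i + 1, 1:i + 2] + (1 - p) * fp[i + 1, :i + 1])

# Step 4: prices at the root node
call_price, put_price = fc[0, 0], fp[0, 0]
print(call_price, put_price)
```

A quick sanity check is put-call parity: `call_price - put_price` should equal `S - K*exp(-r*T)`.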
Compare two tables of the same shape and take the element-wise larger value, replacing NaN with 0. Both tables contain NaN, and their labels differ.
$\max(data,data1,0)\quad \text{where 0 will replace nan}$
np.where/mask
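# setup
data = pd.DataFrame(np.arange(9).reshape(3,-1),columns=list('abc'))
data1 = pd.DataFrame(np.arange(9).reshape(3,-1),columns=list('efg'))
# data: odd values become NaN
data=data.mask(data%2==1)
# data1: multiples of 3 become NaN
data1=data1.mask(data1%3==0)
# solution: fill each table's NaN from the other table, then compare element-wise
# (.values on the second operand avoids label alignment, since the column labels differ)
np.maximum(data.mask(data.isna(),data1.values),data1.mask(data1.isna(),data.values).values).fillna(0)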
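An alternative sketch for the same exercise, not from the original notes: `np.fmax` ignores NaN when only one side is NaN, so working on the raw arrays sidesteps both the label mismatch and the double `mask`. Reusing the labels of `data` for the result is an arbitrary choice made here.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(9).reshape(3, -1), columns=list('abc'))
data1 = pd.DataFrame(np.arange(9).reshape(3, -1), columns=list('efg'))
data = data.mask(data % 2 == 1)      # odd values become NaN
data1 = data1.mask(data1 % 3 == 0)   # multiples of 3 become NaN

# fmax returns the non-NaN operand when exactly one side is NaN
larger = np.fmax(data.to_numpy(), data1.to_numpy())

# Positions that are NaN in both tables are still NaN; turn them into 0
result = pd.DataFrame(np.nan_to_num(larger, nan=0.0),
                      index=data.index, columns=data.columns)
print(result)
```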
type | plt | ax | fig |
---|---|---|---|
Title | plt.title(name) | ax.set_title(name) | |
x/y-label | plt.ylabel(name) | ax.set_ylabel(name) | |
limit | plt.ylim((0,1.3)) | | |
save | | | fig.savefig(path) |
config | plt.rcParams['figure.figsize']=[12,8] | | |
'-' solid line
'--' dashed line
'-.' dash-dot line
':' dotted line
'o' dot (circle marker)
# plt.plot
plt.plot(x,y,'k-',x1,y1,'go')
# DF.plot
DF.plot(ax=ax,style=['bo'])
# plt
plt.plot(x,y,'ko')
# DF scatter
DF.plot.scatter(
ax=ax,x='xname',y='yname',
style=['go'], label='y-axisname')
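The table of plt/ax/fig calls can be tried out with the object-oriented interface. This is a sketch with made-up data; the `Agg` backend is chosen only so the example runs without a display, and the title and label strings are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 20)

fig, ax = plt.subplots(figsize=(12, 8))
# two series in one call: black solid line, then green circle markers
ax.plot(x, x ** 2, 'k-', x, x, 'go')
ax.set_title('demo')       # plt.title equivalent
ax.set_ylabel('y')         # plt.ylabel equivalent
ax.set_ylim((0, 1.3))      # plt.ylim equivalent
# fig.savefig(path) would write the figure to disk
```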
# 1. DF can be assigned to Series?
df1['D']=df1[['D']].rolling(3).mean()
# 2. 2D array slicing
array[[1,2,3],[1,1,1]]# picks the values at positions (1,1), (2,1), (3,1)
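A short sketch of the indexing rule on a hypothetical 4x4 array: two index lists are paired element by element, whereas `np.ix_` builds the full row-by-column block.

```python
import numpy as np

array = np.arange(16).reshape(4, 4)

# Paired indices: elements (1,1), (2,1), (3,1)
picked = array[[1, 2, 3], [1, 1, 1]]

# Cross product of rows [1,2,3] and columns [0,1]: a 3x2 block
block = array[np.ix_([1, 2, 3], [0, 1])]

print(picked)
print(block)
```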