Python in Finance

This note does not cover DataFrame slicing, selection, or assignment; it only covers individual methods.

Science solutions

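purpose | func | return | note
solution | fsolve(func,x0,args) | array, unpack with [0] | linear equations in one unknown, in two unknowns, ...
solution | bisect(func,a,b) | root as a float | requires func(a)*func(b)<0
optimize | minimize(func,x0,bounds,constraints) | res.fun = minimum value, res.x = minimizing x | constraints: type 'ineq' means fun(x) $\geq$ 0; bounds: None stands for infinity
math | derivative(func,x0,dx,n,args) | | n = order of the derivative
math | quad(func,a,b,args) | tuple, unpack with [0] | a = lower limit
math | dblquad(func,a,b,gfun,hfun) | tuple, unpack with [0] | the first argument of func is integrated first; gfun is the lower ("ground") bound given as a function, e.g. gfun=lambda x:3
reg | interp1d(x,y,kind) | function | passes through every data point; kind='quadratic', 'cubic'
reg | np.polyfit(x,y,deg), polyval(paramt,xnew) | | paramt is the coefficient array, ordered from highest order $\to$ lowest order, then intercept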

find solution

scipy.optimize.fsolve(func,x0,args)

>>> return: array([2.,])

scipy.optimize.bisect(func,a,b)

Requires func(a)*func(b)<0, i.e. func changes sign between a and b.
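
As a minimal sketch of both root finders, the example below solves x**2 - 4 = 0 and a small linear system. The functions `f` and `system` and the starting values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.optimize import fsolve, bisect

# Hypothetical example equation: x**2 - 4 = 0
def f(x):
    return x**2 - 4

root_fsolve = fsolve(f, x0=1.0)[0]   # fsolve returns an array; unpack with [0]
root_bisect = bisect(f, 0, 5)        # f(0) = -4 and f(5) = 21 have opposite signs

# Hypothetical 2x2 linear system: x + y = 3, x - y = 1
def system(v):
    x, y = v
    return [x + y - 3, x - y - 1]

x_sol, y_sol = fsolve(system, x0=[0.0, 0.0])
print(root_fsolve, root_bisect, x_sol, y_sol)
```

fsolve needs only a starting guess, while bisect needs a bracketing interval but is guaranteed to converge for a continuous function once the sign change holds.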

find best

scipy.optimize.minimize(func,x0,bounds,constraints)

constraints = (
    {'type':'ineq','fun':lambda x:x[0]+x[1]},
    {'type':'eq','fun':lambda x:x[0]+x[1]}
)
bounds = (
    (0,None),
    (None,None)
)

type = 'ineq' $\to$ fun(x) $\geq 0$
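
A minimal sketch of a constrained minimization follows. The objective and the constraint x0 + x1 <= 2 are hypothetical; the inequality is rewritten as 2 - x0 - x1 >= 0 because 'ineq' constraints are interpreted as fun(x) >= 0.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective: squared distance to the point (1, 2.5)
objective = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2

constraints = (
    {'type': 'ineq', 'fun': lambda x: 2 - x[0] - x[1]},  # x0 + x1 <= 2
)
bounds = (
    (0, None),   # x0 >= 0, no upper bound
    (0, None),   # x1 >= 0, no upper bound
)

res = minimize(objective, x0=[0.0, 0.0], bounds=bounds, constraints=constraints)
print(res.x, res.fun)   # res.x = minimizing point, res.fun = minimum value
```

The unconstrained minimum (1, 2.5) violates the constraint, so the solver should land on the line x0 + x1 = 2, near (0.25, 1.75).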

Differentiation and integration

scipy.misc.derivative(func,x0,dx=1.0,n=1,args)

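n = order of the derivative
dx = spacing of the finite difference; it controls the accuracy
Note: scipy.misc.derivative was deprecated in SciPy 1.10 and removed in SciPy 1.12.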

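# derivative is scipy.misc.derivative; BS is a user-defined pricing function (not shown)
# that takes six positional arguments. For each row, BS is differentiated with respect
# to its fifth argument, evaluated at the row's fifth value.
DF.apply(
    lambda x: derivative(
        lambda s: BS(*x.iloc[:4], s, x.iloc[5]),
        x.iloc[4],
        dx=0.01
        ),
    axis=1
)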
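
Recent SciPy releases no longer ship scipy.misc.derivative, so the sketch below shows the kind of central finite difference it computed for n=1 and n=2. The helper names `central_diff` and `central_diff2` and the test function are hypothetical, not part of any library.

```python
import numpy as np

def central_diff(func, x0, dx=1e-5, args=()):
    """First derivative by a central finite difference (hypothetical helper)."""
    return (func(x0 + dx, *args) - func(x0 - dx, *args)) / (2 * dx)

def central_diff2(func, x0, dx=1e-4, args=()):
    """Second derivative by a three-point central difference (hypothetical helper)."""
    return (func(x0 + dx, *args) - 2 * func(x0, *args) + func(x0 - dx, *args)) / dx**2

# Hypothetical test function a * x**3, with a passed through args
f = lambda x, a: a * x**3
d1 = central_diff(f, 2.0, args=(1.0,))    # analytic value: 3 * 2**2 = 12
d2 = central_diff2(f, 2.0, args=(1.0,))   # analytic value: 6 * 2 = 12
print(d1, d2)
```

A smaller dx reduces truncation error but amplifies floating-point round-off, so dx is a trade-off rather than a pure accuracy knob.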

scipy.integrate.quad(func,a,b,args)

a=lower bound
b=upper bound

quad(fun,0,np.inf)
# returns a tuple of (result, error estimate)
>>> return (21.33333,2.34e-13)
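
A short sketch of quad with finite and infinite limits. Both integrands are hypothetical and were picked because their analytic values are easy to check.

```python
import numpy as np
from scipy.integrate import quad

# Integral of x**2 over [0, 4]; the analytic value is 64/3
value, abs_err = quad(lambda x: x**2, 0, 4)

# Extra parameters go through args; infinite limits use np.inf.
# The integral of exp(-a*x) over [0, inf) equals 1/a.
decay = lambda x, a: np.exp(-a * x)
value_inf, err_inf = quad(decay, 0, np.inf, args=(2.0,))
print(value, value_inf)
```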

scipy.integrate.dblquad(func,a,b,gfun,hfun)

gfun=lower bound of the inner integral, passed as a function of x even when it is a constant
hfun=upper bound of the inner integral, also a function of x
For the integral below:
$$\int_0^2 \int_0^1 x y^2 \, dy \, dx, \quad fun(y,x) = x y^2$$
the integration order is y first, then x, so put y ahead of x in the lambda function (fun).

# the inner variable y comes first in the argument list
fun = lambda y,x: y**2*x
# outer limits (a, b) first, then the inner lower and upper bound functions
dblquad(fun,0,2,lambda x:0,lambda x:1)
# returns a tuple of (result, error estimate)
>>> return (0.66667,7.401e-15)
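
As a sketch of why the inner bounds are functions, the example below repeats the constant-limit integral and then lets the upper inner limit depend on x. The second integral is a hypothetical variation, not taken from the notes above.

```python
from scipy.integrate import dblquad

# Integrand: inner variable y first, outer variable x second
integrand = lambda y, x: x * y**2

# Constant inner limits: x in [0, 2], y in [0, 1]; analytic value 2/3
const_val, const_err = dblquad(integrand, 0, 2, lambda x: 0, lambda x: 1)

# Inner limits depending on x: y in [0, x]; analytic value 32/15
var_val, var_err = dblquad(integrand, 0, 2, lambda x: 0, lambda x: x)
print(const_val, var_val)
```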

Interpolation

scipy.interpolate.interp1d(x,y,kind='linear') CHECK 1Cp65

It returns a function; call that function on new x values to interpolate.

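import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x=np.arange(0,10)
y=np.exp(-x/3.0)
f=interp1d(x,y)
# new x values must stay inside the range of the original x
xnew=np.arange(0,9,0.1)
ynew=f(xnew)
plt.plot(x,y,'o',xnew,ynew,'-')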

numpy.polyfit(x,y,deg), polyval(parameters,xnew)

It returns the coefficients of the fitted polynomial; pass them to polyval to evaluate the fit at new x values.
polyfit result = [$x^k$-coef, $x^{k-1}$-coef, $\dots$, $x$-coef, intercept]

p2 = np.polyfit(x,y,2)
f2 = interp1d(x,y,kind='quadratic')
p3 = np.polyfit(x,y,3)
f3 = interp1d(x,y,kind='cubic')
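
The sketch below contrasts the two approaches on the same sample data as the interp1d example: the interpolant passes through every point, while the least-squares polynomial generally does not. The variable names are illustrative.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.arange(0, 10)
y = np.exp(-x / 3.0)

p2 = np.polyfit(x, y, 2)                 # [x^2 coef, x coef, intercept]
f2 = interp1d(x, y, kind='quadratic')    # callable passing through every point

xnew = np.arange(0, 9, 0.1)
y_fit = np.polyval(p2, xnew)             # evaluate the fitted polynomial
y_interp = f2(xnew)                      # evaluate the interpolant

# Error at the original data points for each method
max_interp_err = np.max(np.abs(f2(x) - y))
max_fit_err = np.max(np.abs(np.polyval(p2, x) - y))
print(max_interp_err, max_fit_err)
```

Interpolation suits clean data that must be reproduced exactly; a low-degree fit suits noisy data where a smooth trend matters more.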

table operations

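Purpose | Method | args | note
generate | pd.read_csv | header, names, index_col, parse_dates (True parses only the index; pass a list of column names to parse other columns), sep=',' | use double backslashes in Windows paths; duplicate column names become a, a.1; duplicates include a clash with the index name
generate | np.linspace | start, stop, num | splits the range into n points, i.e. n-1 segments
generate | np.ones/zeros | np.ones(3) / np.ones((3,3)) | a 1-D shape can be a plain number; a 2-D shape must be a tuple
gen rand | np.random.rand/randn | rand(3,3), randn(3,3,3) |
gen rand | np.random.random/standard_normal | random((3,3)) / standard_normal((3,3)) |
gen rand | np.random.seed | |
transform | np.arange | start, stop (excluded), step |
transform | np.append, df.append | | np flattens by default before appending; df can only add rows (df.append was removed in pandas 2.0, use pd.concat)
transform | np.concatenate, pd.concat | list of tables, axis |
transform | array.astype, df.astype | | change element type
label | Series.rename(name/dict/func), df.rename(columns/index) | | Series: a name changes Series.name, a func/dict changes the labels (index); DF: a func/dict changes columns/index; DF.rename(index=str) converts the index labels to str
del | df.drop | labels (a single label or list-like), axis=0, inplace | the del statement cannot delete rows, because it cannot be combined with .loc
del | df.dropna | how, thresh, subset, inplace | thresh=2: keep entries with at least 2 non-NaN values; how='any' drops on any NaN, how='all' drops only when everything is NaN
op | np.reshape | | array.reshape(4,-1)
op | df.rolling | window, min_periods, axis | df['x'].rolling(10).mean()
op | df.diff | |
cal | np.any, np.all | axis=None | np.any(array, axis=0)
cal | np.round | | np.round(array,3)
cal | np.isin vs df.isin vs Series.isin | | np.isin/Series.isin ignore labels; df.isin takes labels into account
cal | np exp/sqrt/mean/std/sum/cumsum/prod/cumprod | |
cal | np.maximum | | element-wise comparison
fancy | df.apply | func, axis, result_type=None, 'broadcast' (output has the same shape as the input), 'expand' (returns a DataFrame), args (passed on to the applied function) | works only by row or by column
fancy | df.applymap | | applies to every element
fancy | np.tile | | tile(array,(2,2)): (2,2) is the repetition count per axis, doubling the array along both dimensions
fancy | df.where, df.mask | | where: entries that do not satisfy the condition are replaced with NaN or the given table; mask: entries that satisfy the condition are replaced with NaN or the given table
sort | df.sort_values, np.sort, np.argsort | df: by, axis, ascending | np.sort: ascending by default, returns a sorted copy (array.sort() sorts in place); np.argsort: ascending by default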
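
A few rows of the table, sketched on small hypothetical inputs so the notes can be checked directly:

```python
import numpy as np
import pandas as pd

a = np.arange(4).reshape(2, 2)           # [[0, 1], [2, 3]]

flat = np.append(a, [9])                 # flattens first, then appends
tiled = np.tile(a, (2, 2))               # repeated twice along each axis
points = np.linspace(0, 1, 5)            # 5 points, 4 segments

df = pd.DataFrame({'a': [1.0, np.nan, np.nan],
                   'b': [1.0, 2.0, np.nan],
                   'c': [1.0, 2.0, 3.0]})

kept = df.dropna(thresh=2)               # keep rows with at least 2 non-NaN values
no_nan = df.dropna(how='any')            # drop rows containing any NaN
where_df = df.where(df > 1)              # entries failing the condition become NaN
mask_df = df.mask(df > 1, 0)             # entries meeting the condition become 0
print(kept, no_nan, where_df, mask_df, sep='\n\n')
```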

DF.apply advanced usage

import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(20).reshape(5,-1),columns=list('abcd'),index=list('abcde'))

# expand: specify name with dict
cal_df=data.apply(lambda x:{'sum':sum(x),'std':np.std(x)}, 
                    axis=1, 
                    result_type='expand')
# expand: list of function also work, column name = function name, lambda func name = 
data.apply([np.sum,np.std], axis=1, result_type='expand')

# to attach the results to the original table, just use pd.concat
rst = pd.concat([data,data.apply(lambda x:{'sum':sum(x),'std':np.std(x)}, 
                    axis=1, 
                    result_type='expand')],
                axis=1)

# The code below returns a DataFrame:
# each row's result is a Series, and stacking the rows gives a DataFrame
data.apply(lambda x: pd.Series([x['a']**2, x['c']**2+np.sum(x)],index=['cc','dd']),axis=1)

Table Elements

obj | args | eg
pd.Series | index, name (not displayed by default) | pd.Series([1,2,3],index=list('abc'),name='e')

Other

nested lambda function

# a function that returns a function
fun = lambda a,b: lambda x: a*x+b
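
A brief usage sketch of the nested lambda; the numbers are arbitrary:

```python
# fun(a, b) builds and returns the line x -> a*x + b
fun = lambda a, b: lambda x: a * x + b

line = fun(2, 1)          # a = 2 and b = 1 are fixed inside the returned function
y_single = line(3)        # 2*3 + 1
y_direct = fun(2, 1)(3)   # both calls chained in one expression
print(y_single, y_direct)
```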

sqrt for negative numbers

math.sqrt raises a ValueError and np.sqrt returns nan for negative real inputs, so define a function to deal with them.
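
The notes do not say which result is wanted for a negative input, so the sketch below shows two possible conventions. Both helper names are hypothetical.

```python
import numpy as np

def signed_sqrt(x):
    """Hypothetical helper: sqrt of |x|, carrying the sign of x."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.sqrt(np.abs(x))

def complex_sqrt(x):
    """Hypothetical helper: principal complex square root."""
    return np.sqrt(np.asarray(x, dtype=complex))

vals = [-4.0, 0.0, 9.0]
s = signed_sqrt(vals)     # real-valued, sign preserved
c = complex_sqrt(vals)    # complex-valued
print(s, c)
```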

assigning values simultaneously

# Python first evaluates the right-hand side and holds the results in memory, then assigns to the left-hand side
data['a'], data['b'] = data['b'].copy(), data['a'].copy()
# the following also works
data['a'], data['b'] = data['b'], data['a'].copy()

# It is surprising that the following works, since one would expect labels to be matched, i.e. always b to b and a to a
# fancy indexing works: labels are not matched
data[['a','b']]=data[['b','a']] 
data[['a','b']]=data.loc[:,['b','a']] 

# loc and iloc match labels, so the swap cannot be done directly
data.loc[:,['a','b']]=data.loc[:,['b','a']] # fails
data.loc[:,['a','b']]=data[['b','a']] # fails
data.loc[:,['a','b']]=data[['b','a']].values # works: stripping the labels makes the swap possible

Cases that accept only integers

np.random.randn(3/1)  # error: 3/1 returns a float; use 3//1 instead

Sample Codes

Binomial Tree (ndarray operations)

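# 2 (S, K, r, dt, N, u, d and p are assumed to be defined in step 1, not shown)
# N+1 rows because we roll back from time T to time 0
fc = np.zeros((N+1,N+1)) 
fp = np.zeros((N+1,N+1))
# N+1 terminal prices in total; j = number of up moves
j=np.arange(0,N+1,1)
Ss=S*(u**j)*(d**(N-j))
# payoffs in the last period: row N is period N, to be rolled back to period 0
fc[N,:]=np.maximum(0,Ss-K)
fp[N,:]=np.maximum(0,K-Ss)
# 3
p1=1-p
ert=np.exp(-r*dt)
# range may stop at any position, including -1, so i runs N-1, ..., 0
for i in range(N-1,0-1,-1):
    # value this period = discounted expectation of next period's up and down values
    # slice explicitly instead of operating on the whole row:
    # period i has only i+1 nodes, and the slice shifted by one holds the up-move values
    fc[i,0:i+1]=ert*(p*fc[i+1,0+1:i+1+1]+p1*fc[i+1,0:i+1])
    fp[i,0:i+1]=ert*(p*fp[i+1,0+1:i+1+1]+p1*fp[i+1,0:i+1])
# 4 (note: p is reused here and now holds the put price, not the probability)
c=fc[0,0]
p=fp[0,0]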
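
The snippet above omits step 1, so here is a self-contained sketch of the same backward induction. It assumes Cox-Ross-Rubinstein factors (u = exp(sigma*sqrt(dt)), d = 1/u), which the notes do not confirm; the function name `crr_european` and all input values are hypothetical.

```python
import numpy as np

def crr_european(S, K, r, T, sigma, N):
    """European call and put by backward induction on a binomial tree (sketch)."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))      # assumed CRR up factor
    d = 1.0 / u                          # assumed CRR down factor
    p = (np.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = np.exp(-r * dt)

    fc = np.zeros((N + 1, N + 1))
    fp = np.zeros((N + 1, N + 1))
    j = np.arange(N + 1)                 # number of up moves at maturity
    ST = S * u**j * d**(N - j)
    fc[N, :] = np.maximum(0, ST - K)
    fp[N, :] = np.maximum(0, K - ST)

    # Roll back from period N-1 to period 0; period i has i+1 nodes
    for i in range(N - 1, -1, -1):
        fc[i, :i + 1] = disc * (p * fc[i + 1, 1:i + 2] + (1 - p) * fc[i + 1, :i + 1])
        fp[i, :i + 1] = disc * (p * fp[i + 1, 1:i + 2] + (1 - p) * fp[i + 1, :i + 1])
    return fc[0, 0], fp[0, 0]

call, put = crr_european(S=100.0, K=100.0, r=0.05, T=1.0, sigma=0.2, N=200)
# Put-call parity check: call - put should equal S - K*exp(-r*T)
parity_gap = (call - put) - (100.0 - 100.0 * np.exp(-0.05 * 1.0))
print(call, put, parity_gap)
```

Checking put-call parity is a cheap sanity test for the indexing, since a slice that is off by one breaks it immediately.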

Self-designed

Compare two tables of the same shape and take the element-wise larger value, replacing NaN with 0. Both tables contain NaN, and their labels differ.
$$\max(data, data1, 0) \quad \text{where 0 replaces NaN}$$
np.where/mask

# setup
data = pd.DataFrame(np.arange(9).reshape(3,-1),columns=list('abc'))
data1 = pd.DataFrame(np.arange(9).reshape(3,-1),columns=list('efg'))
# data: odd values become NaN
data=data.mask(data%2==1)
# data1: multiples of 3 become NaN
data1=data1.mask(data1%3==0)

# solution (.values on the second operand keeps np.maximum positional; pandas 2.x aligns two DataFrames on labels first)
np.maximum(data.mask(data.isna(),data1.values),data1.mask(data1.isna(),data.values).values).fillna(0)
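
An alternative sketch of the same task uses np.fmax, which returns the non-NaN operand when only one side is NaN. This is a different technique from the mask-based answer above; it works on the raw arrays so the differing column labels never come into play, and it keeps the labels of data for the result (an arbitrary choice).

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(9).reshape(3, -1), columns=list('abc'))
data1 = pd.DataFrame(np.arange(9).reshape(3, -1), columns=list('efg'))
data = data.mask(data % 2 == 1)      # odd values become NaN
data1 = data1.mask(data1 % 3 == 0)   # multiples of 3 become NaN

# np.fmax ignores a NaN when the other operand is a number;
# positions where both are NaN stay NaN and are then set to 0
larger = np.fmax(data.values, data1.values)
result = pd.DataFrame(np.nan_to_num(larger, nan=0.0),
                      index=data.index, columns=data.columns)
print(result)
```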

Matplotlib

type | plt | ax | fig
Title | plt.title(name) | ax.set_title(name) |
x/y-label | plt.ylabel(name) | ax.set_ylabel(name) |
limit | plt.ylim((0,1.3)) | |
save | | | fig.savefig(path)
config | plt.rcParams['figure.figsize']=[12,8] | |

line styles

'-' solid
'--' dashed
'-.' dash-dot
':' dotted
'o' dot

plot

# plt.plot
plt.plot(x,y,'k-',x1,y1,'go')
# DF.plot
DF.plot(ax=ax,style=['bo'])

scatter

# plt
plt.plot(x,y,'ko')

# DF scatter
DF.plot.scatter(
    ax=ax,x='xname',y='yname',
    style=['go'], label='y-axisname')
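
The pieces above can be combined on one explicit fig/ax pair, sketched below. The data, labels, and the commented-out output path are hypothetical, and the Agg backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend, assumed for scripts
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 1, 20)
df = pd.DataFrame({'xname': x, 'yname': x**2})

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, x, 'k--', label='y = x')                  # black dashed line
df.plot.scatter(ax=ax, x='xname', y='yname',
                color='g', label='y = x^2')          # green dots on the same axes
ax.set_title('Example')
ax.set_ylabel('y')
ax.set_ylim((0, 1.3))
ax.legend()
# fig.savefig('example.png')  # hypothetical output path
```

Working through ax rather than plt keeps every call tied to one axes object, which matters once a figure holds several subplots.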

check

# 1. Can a one-column DataFrame be assigned to a column?
df1['D']=df1[['D']].rolling(3).mean()

# 2. 2D array slicing
array[[1,2,3],[1,1,1]]# picks the values at (1,1), (2,1), (3,1)
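
Both checks can be tried on small hypothetical inputs, as in this sketch:

```python
import numpy as np
import pandas as pd

array = np.arange(16).reshape(4, 4)

picked = array[[1, 2, 3], [1, 1, 1]]   # pairs the index lists: (1,1), (2,1), (3,1)
block = array[1:4, 1:2]                # slicing instead keeps a 2-D block

df1 = pd.DataFrame({'D': [1.0, 2.0, 3.0, 4.0, 5.0]})
df1['D'] = df1[['D']].rolling(3).mean()   # one-column DataFrame assigned to a column
print(picked, block.shape, df1['D'].tolist())
```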
