Pitfalls when working with DataFrames

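  • After a groupby, call .reset_index(). The grouping keys become the index of the result, and .reset_index() turns them back into ordinary columns.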
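A minimal sketch of the effect, using made-up data (the column names `store` and `sales` are illustrative and not from the original notes):

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"store": ["a", "a", "b"], "sales": [1, 2, 3]})

grouped = df.groupby("store")["sales"].sum()  # Series whose index is 'store'
flat = grouped.reset_index()                  # DataFrame with 'store' and 'sales' columns

print(grouped.index.name)
print(list(flat.columns))
```

Without the reset, `store` can only be reached through the index, which tends to get in the way of later concat or merge steps.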

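  • Concatenating two dataframes: axis=1 joins them side by side (horizontal concatenation, adding columns on the right); axis=0 stacks them (vertical concatenation, adding rows below).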

new_store_11 = pd.concat([store_11,month_df],axis = 1,sort = False)
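As a quick check of the two axis values, here is a small sketch with throwaway frames (`left`, `right` and `more_rows` are hypothetical names, not from the original data):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})
more_rows = pd.DataFrame({"a": [5, 6]})

side_by_side = pd.concat([left, right], axis=1)  # same rows, one extra column
stacked = pd.concat([left, more_rows], axis=0)   # same column, rows appended below

print(side_by_side.shape)
print(stacked.shape)
```

Note that `stacked` keeps the original index labels (0, 1, 0, 1); passing `ignore_index=True` renumbers them.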
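  • When concatenating along axis=1, the two tables must have the same index. pandas aligns rows by index label, so with mismatched indexes the result is padded with NaN and is meaningless. For example: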
store_11.index = [3052084, 3052085, 3052086, 3052087, 3052088, 3052089, 3052090,
            3052091, 3052092, 3052093,
            ...
            3088954, 3088955, 3088956, 3088957, 3088958, 3088959, 3088960,
            3088961, 3088962, 3088963]
month_df.index = RangeIndex(start=0, stop=36880, step=1)

The fix is to give month_df the same index as store_11 (both must have the same number of rows):

month_df.index = store_11.index

Then concatenate:

new_store_11 = pd.concat([store_11,month_df],axis = 1,sort = False)
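The index problem can be reproduced on a tiny scale. This is only a sketch with invented frames `a` and `b`, standing in for store_11 and month_df:

```python
import pandas as pd

a = pd.DataFrame({"x": [10, 20]}, index=[100, 101])
b = pd.DataFrame({"y": [1, 2]})  # default RangeIndex: 0, 1

# Labels do not overlap, so pandas takes the union of both indexes
misaligned = pd.concat([a, b], axis=1)

# Reuse a's index on a copy of b (only valid when the lengths match)
b_fixed = b.copy()
b_fixed.index = a.index
aligned = pd.concat([a, b_fixed], axis=1)

print(misaligned.shape, int(misaligned.isna().sum().sum()))
print(aligned.shape, int(aligned.isna().sum().sum()))
```

Overwriting the index pairs rows purely by position, so it is only appropriate when the two tables are already in the same row order.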
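  • merge differs from concat: merge joins rows on the values of a shared column (or the index), so it only makes sense when both tables have such a key.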
month_sales_all = pd.merge(month_sales_all,get_month_sales(store_unique[i+1]),how = 'left',on = ['月份']) # how='left' keeps every row of the left df
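A small sketch of what how='left' does, with invented tables (the `month`, `sales_a` and `sales_b` columns are assumptions for illustration):

```python
import pandas as pd

left = pd.DataFrame({"month": [1, 2, 3], "sales_a": [10, 20, 30]})
right = pd.DataFrame({"month": [1, 3], "sales_b": [5, 7]})

# Every row of `left` is kept; months missing from `right` get NaN
merged = pd.merge(left, right, how="left", on=["month"])

print(merged)
```

Month 2 has no match in `right`, so its `sales_b` value is NaN while the row itself survives.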
  • Emulating switch/case in Python with a dictionary that maps each case to a function:
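def jia(x,y):    # add
    z = x+y
    return z
def jian(x,y):   # subtract; return the result so fun() can pass it on
    return x-y
def cheng(x,y):  # multiply
    return x*y
operator = {'+':jia,'-':jian,'*':cheng}
def fun(o,x,y):
    # look up the function registered for o and call it
    return operator.get(o)(x,y)
fun('+',3,4)     # 7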
def mon0():
    return 0
def mon1():
    return 31
def mon2():
    return 59
def mon3():
    return 90    
def mon4():
    return 120
def mon5():
    return 151
def mon6():
    return 181
def mon7():
    return 212
def mon8():
    return 243
def mon9():
    return 273
def mon10():
    return 304
def mon11():
    return 334

operator = {'0':mon0,'1':mon1,'2':mon2,'3':mon3,
            '4':mon4,'5':mon5,'6':mon6,'7':mon7,
            '8':mon8,'9':mon9,'10':mon10,'11':mon11}
def mon(o):
    return operator.get(o)()
print(mon('6'))
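One possible variation on the pattern above, sketched with hypothetical names (`OPERATIONS`, `apply_op`, `DAYS_BEFORE_MONTH`): handle unknown keys explicitly instead of letting `.get()` return None, and use a plain dict of values when each case only returns a constant. The month values are the ones from the functions above.

```python
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

# Dispatch table: case label -> function
OPERATIONS = {"+": add, "-": subtract, "*": multiply}

def apply_op(symbol, x, y):
    func = OPERATIONS.get(symbol)
    if func is None:
        # plays the role of the "default" branch of a switch
        raise ValueError(f"unsupported operator: {symbol!r}")
    return func(x, y)

# When every case just returns a constant, the dict can hold the values directly
DAYS_BEFORE_MONTH = {0: 0, 1: 31, 2: 59, 3: 90, 4: 120, 5: 151,
                     6: 181, 7: 212, 8: 243, 9: 273, 10: 304, 11: 334}

print(apply_op("+", 3, 4))
print(DAYS_BEFORE_MONTH[6])
```

Avoiding the name `operator` for the dict also keeps it from shadowing the standard library module of the same name.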
  • For time series, resample can be used to resample the data. Dates stored as strings must first be converted to a datetime type. See: https://blog.csdn.net/wangshuang1631/article/details/52314944
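#### Find all products that were sold every day ###
fullday_code = date[date['days_many'] == 308]

## Inspect cscode=30104 (jianbing pancakes), a product sold every day ##
data_30104 = data[data['cscode'] == 30104].copy()  # copy so the assignment below does not trigger SettingWithCopyWarning


## Required before resampling: convert the date strings to datetime
data_30104['filldate'] = pd.to_datetime(data_30104['filldate'])
am_day = data_30104.groupby(['filldate'])['am'].sum().reset_index()

###### Set filldate as the index
am_day = am_day.set_index('filldate')
am_week = am_day['am'].resample('7D').sum()  # the old how='sum' argument is no longer accepted
am_mon = am_day['am'].resample('1M').sum()   # newer pandas versions prefer the alias 'ME'
am_se = am_day['am'].resample('3M').sum()

## Plot quarterly, monthly, and weekly sales ##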
#plt.subplot(311)
#am_week.plot(figsize=(12,6),title='weekly_30104_am',legend=None)
#plt.subplot(312)
#am_mon.plot(figsize=(12,6),title = 'monthly_30104_am',legend = None)
#plt.subplot(313)
#am_se.plot(figsize=(12,6),title = 'sessionly_30104_am',legend = None)
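The resampling steps can be tried on synthetic data. This sketch reuses the `filldate` and `am` column names from above, but the values are invented (14 days with one unit sold per day):

```python
import pandas as pd

# Synthetic data: dates stored as strings, as in the raw table
dates = pd.date_range("2021-01-01", periods=14, freq="D").strftime("%Y-%m-%d")
df = pd.DataFrame({"filldate": dates, "am": 1})

# resample needs a datetime index, so convert the strings first
df["filldate"] = pd.to_datetime(df["filldate"])
daily = df.groupby("filldate")["am"].sum()  # Series indexed by date

weekly = daily.resample("7D").sum()         # totals per 7-day bin

print(weekly)
```

Calling resample on a string index raises a TypeError, which is why the to_datetime step cannot be skipped.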


Handling dates and times in python-pandas: https://blog.csdn.net/qq_22238533/article/details/77110626

https://www.jianshu.com/p/1e675360b494
