Pandas Tips

Add a column and set its values row by row

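import time

import matplotlib.pyplot as plt
import pandas as pd
import tushare as ts


def statisticsfordayofweek(code):
    """Plot how many down days fall on each weekday for the given stock code."""
    marketdata = ts.get_k_data(code)
    # Add a 'dayofweek' column, filling it in one row at a time
    for index, row in marketdata.iterrows():
        marketdata.loc[index, 'dayofweek'] = time.strftime(
            '%A', time.strptime(row.date, "%Y-%m-%d"))
    # Keep only the days that closed below the open
    downdata = marketdata[marketdata.close < marketdata.open]
    # Group by the first four letters of the weekday name and count the rows
    dayweekgroup = downdata['dayofweek'].groupby(
        downdata['dayofweek'].map(lambda x: x[0:4])).count()
    dayweekgroup.plot(kind="bar")
    # Use a font with Chinese glyphs and keep the minus sign rendering correctly
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False
    plt.show()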
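
The row-by-row loop above works, but on larger frames a vectorized version is usually faster. Below is a minimal sketch, assuming the `date` column holds `YYYY-MM-DD` strings as in the function above. The sample rows are made up for illustration and stand in for the result of `ts.get_k_data(code)`.

```python
import pandas as pd

# Hypothetical sample data standing in for the output of ts.get_k_data(code)
marketdata = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "open": [10.0, 10.5, 10.2],
    "close": [10.5, 10.2, 10.1],
})

# Parse the date strings once, then derive the weekday name for every row at once
marketdata["dayofweek"] = pd.to_datetime(marketdata["date"]).dt.day_name()

# Keep only the days that closed below the open, then count them per weekday
downdata = marketdata[marketdata["close"] < marketdata["open"]]
counts = downdata["dayofweek"].value_counts()
```

`counts` can then be plotted with `counts.plot(kind="bar")` just like `dayweekgroup` above.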

Get specific columns from a DataFrame

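data = pd.read_csv('./data/optdigits.tra', header=None)
# Note the use of iloc: x takes all rows and the columns at positions 0-63 (64 columns in total).
# In Python 3, data[range(64)].values reportedly fails with an invalid-slice error.
x, y = data.iloc[:, range(64)].values, data.iloc[:, 64].values
images = x.reshape(-1, 8, 8)
# np.int was removed in NumPy 1.24; the built-in int does the same job
y = y.ravel().astype(int)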
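
To try the `iloc` selection without the data file, here is a small sketch on synthetic data. The random frame is an assumption that only mimics the layout described above (64 feature columns followed by one label column), and `:64` is used as an equivalent positional slice for `range(64)`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for optdigits.tra: 5 rows, 64 pixel columns plus 1 label column
rng = np.random.default_rng(0)
pixels = rng.integers(0, 17, size=(5, 64))
labels = rng.integers(0, 10, size=(5, 1))
data = pd.DataFrame(np.hstack([pixels, labels]))

# All rows, first 64 columns by position; then the 65th column as the label
x = data.iloc[:, :64].to_numpy()
y = data.iloc[:, 64].to_numpy().astype(int)

# Each row of 64 values becomes one 8x8 image
images = x.reshape(-1, 8, 8)
```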

Build a new DataFrame from scratch and add data

dfresult = pd.DataFrame(columns=('domicile',
                                 'universe',
                                 'rawname',
                                 'suggestid',
                                 'legalname',
                                 'Similarity%'))
dfresult.loc[0] = {'domicile': 'UK',
                   'universe': 'ETF',
                   'rawname': 'IL Bright Start College Savings (Advisor) Advisor Age Based 15-17 Yrs Port',
                   'suggestid': 'F123456ABC',
                   'legalname': 'IL Bright Start College Savings (Advisor) Advisor Age Based',
                   'Similarity%': '81.75'}
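
When all rows are known up front, collecting them in a list and building the frame in one call is a common alternative to assigning with `loc` one row at a time. A minimal sketch, with made-up records that only reuse the column names from above:

```python
import pandas as pd

columns = ["domicile", "universe", "rawname", "suggestid", "legalname", "Similarity%"]

# Hypothetical records for illustration only
rows = [
    {"domicile": "UK", "universe": "ETF", "rawname": "Fund A raw name",
     "suggestid": "ID0001", "legalname": "Fund A", "Similarity%": 81.75},
    {"domicile": "US", "universe": "ETF", "rawname": "Fund B raw name",
     "suggestid": "ID0002", "legalname": "Fund B", "Similarity%": 92.5},
]

# Passing columns= fixes the column order regardless of dict key order
dfresult = pd.DataFrame(rows, columns=columns)
```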

Fix garbled Chinese characters when exporting to CSV

dfresult.to_csv('./output/result_%s.csv' % searchdate, encoding='utf_8_sig')
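
The `utf_8_sig` codec writes a UTF-8 byte order mark (BOM) at the start of the file, which is what lets spreadsheet tools such as Excel recognise the encoding. The sketch below checks this on a throwaway file; the frame contents and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame containing Chinese text
df = pd.DataFrame({"name": ["基金"], "score": [81.75]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "result.csv")
    df.to_csv(path, index=False, encoding="utf_8_sig")

    # Read the raw bytes back to inspect the start of the file
    with open(path, "rb") as f:
        raw = f.read()

    # Reading with the same codec strips the BOM again
    roundtrip = pd.read_csv(path, encoding="utf_8_sig")

# The UTF-8 BOM is the three bytes EF BB BF
has_bom = raw.startswith(b"\xef\xbb\xbf")
```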

Fix mismatched data types when merging

# docdate is assumed to be a dict built earlier, mapping document ID to effective date
dfresult = pd.DataFrame(columns=('documentid', 'effectivedatehat'))
indexfordf = 0
for docid in docdate:
    dfresult.loc[indexfordf] = {'documentid': docid,
                                'effectivedatehat': docdate[docid]}
    indexfordf += 1

# Cast the key column to int on both sides so the merge compares like with like
dfresult['documentid'] = dfresult['documentid'].apply(int)
dfsample = pd.read_csv('./output/sample/rawsupplementdocwitheffectivedate.csv', encoding='utf-8')
dfsample['documentid'] = dfsample['documentid'].apply(int)
dfmerge = pd.merge(dfsample, dfresult, on=['documentid'], how='left')
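
Here is a self-contained sketch of the same idea, with made-up frames in which one key column holds integers and the other holds strings. Depending on the pandas version, merging such columns directly either raises an error or matches nothing, so both keys are cast to the same dtype first. `astype` is used here as an assumed equivalent of `apply(int)`.

```python
import pandas as pd

# Hypothetical frames: the same IDs stored as int on one side and str on the other
dfsample = pd.DataFrame({"documentid": [101, 102, 103],
                         "title": ["a", "b", "c"]})
dfresult = pd.DataFrame({"documentid": ["101", "103"],
                         "effectivedatehat": ["2020-01-01", "2020-03-01"]})

# Align the key dtypes before merging
dfsample["documentid"] = dfsample["documentid"].astype("int64")
dfresult["documentid"] = dfresult["documentid"].astype("int64")

# Left join keeps every row of dfsample; unmatched rows get NaN
dfmerge = pd.merge(dfsample, dfresult, on="documentid", how="left")
```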
