[DataFrame] Adding a time-series index, filling values, Cartesian product, differencing, dropping invalid values, sorting

import pandas as pd

data = pd.read_csv("Airpassengers.csv")

Quickly add a time-series axis (index)


df = pd.DataFrame()
df['value'] = data['value']
# freq='M' stamps each row with the month end; pandas 2.2+ prefers the alias 'ME'.
df['sequence_num'] = pd.date_range("1949-1", periods=len(df), freq='M')
df = df.set_index('sequence_num')
print(df.head(5))
              value
sequence_num       
1949-01-31      112
1949-02-28      118
1949-03-31      132
1949-04-30      129
1949-05-31      121
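
If month-start timestamps are preferred over month-end ones, the same approach should work with the 'MS' alias. The following is a minimal sketch under stated assumptions: it uses the five values printed above in place of the CSV file, and the name df_ms is made up for illustration.

```python
import pandas as pd

# First five passenger counts from the output above, used instead of the CSV.
values = [112, 118, 132, 129, 121]

df_ms = pd.DataFrame({'value': values})
# 'MS' = month start: one timestamp per row, beginning in January 1949.
df_ms.index = pd.date_range('1949-01', periods=len(df_ms), freq='MS',
                            name='sequence_num')
print(df_ms)
```

Assigning the range directly to the index skips the intermediate column and the set_index call.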

Difference the DataFrame, then drop the invalid (NaN) values

df_tmp = df.diff().dropna()
print(df_tmp.head(5))
              value
sequence_num       
1949-02-28      6.0
1949-03-31     14.0
1949-04-30     -3.0
1949-05-31     -8.0
1949-06-30     14.0
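
To see why dropna is needed after diff, it can help to look at the intermediate result. This is a small sketch on the first five values from the output above rather than the full dataset; the names s, first_diff, clean and lag2 are illustrative only.

```python
import pandas as pd

s = pd.DataFrame({'value': [112, 118, 132, 129, 121]})

# The first row has no predecessor, so its difference is NaN.
first_diff = s.diff()
# dropna removes that NaN row, leaving four differences.
clean = first_diff.dropna()
# periods=2 subtracts the value two rows back, producing two NaN rows.
lag2 = s.diff(periods=2).dropna()

print(first_diff)
print(clean)
print(lag2)
```

The results are floats because NaN forces the integer column to be upcast, which is why the output above shows 6.0 rather than 6.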

Sort by value; ascending=True sorts in ascending order

df_tmp = df.sort_values('value', ascending=True)
print(df_tmp.head(5))
              value
sequence_num       
1949-11-30      104
1949-01-31      112
1950-11-30      114
1950-01-31      115
1949-02-28      118
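
Sorting by value scrambles the chronological order of the index. A short sketch, again on five illustrative values rather than the CSV, of sorting in descending order and then restoring the time order with sort_index:

```python
import pandas as pd

df_s = pd.DataFrame(
    {'value': [112, 118, 132, 129, 121]},
    index=pd.date_range('1949-01', periods=5, freq='MS'),
)

# ascending=False puts the largest value first.
desc = df_s.sort_values('value', ascending=False)
# sort_index orders the rows by their timestamps again.
restored = desc.sort_index()

print(desc)
print(restored)
```

Both calls return new frames, so df_s itself is left unchanged.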

Create an empty DataFrame

df_col = ['a', 'b', 'c']
df_a = pd.DataFrame(columns=df_col)
print(df_a)
Empty DataFrame
Columns: [a, b, c]
Index: []
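
An empty frame like this is often used as a placeholder before data arrives. The sketch below checks for emptiness and then builds a filled frame in one step from a list of rows; the row values are hypothetical and only there for illustration.

```python
import pandas as pd

cols = ['a', 'b', 'c']
empty = pd.DataFrame(columns=cols)
print(empty.empty)  # True
print(len(empty))   # 0

# Hypothetical rows: collect them first, then construct the frame once.
rows = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]
filled = pd.DataFrame(rows, columns=cols)
print(filled)
```

Building the frame once from collected rows tends to be simpler than growing an empty frame row by row.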

Quickly populate a DataFrame

# Dict insertion order sets the column order, so 'key' is listed first.
df_b = pd.DataFrame({'key': [0] * 5, 'p': list(range(5))})
print(df_b)
   key  p
0    0  0
1    0  1
2    0  2
3    0  3
4    0  4

Join two DataFrames (a shared constant key yields the Cartesian product)

df_c = pd.DataFrame({'key': [0] * 4, 'p': list(range(4))})
# Every row has key 0, so each row of df_b pairs with each row of df_c (5 x 4 = 20 rows).
df_bc = pd.merge(df_b, df_c, on='key')
print(df_bc.head(5))
   key  p_x  p_y
0    0    0    0
1    0    0    1
2    0    0    2
3    0    0    3
4    0    1    0
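
On pandas 1.2 or later, the dummy key column can be avoided with how='cross'. A minimal sketch, assuming that version requirement is met; the names left, right and cross are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'p': list(range(5))})
right = pd.DataFrame({'p': list(range(4))})

# how='cross' (pandas >= 1.2) pairs every left row with every right row.
cross = pd.merge(left, right, how='cross')
print(cross.shape)  # (20, 2)
print(cross.head(5))
```

The clashing column name p still gets the default _x and _y suffixes, so the result has columns p_x and p_y and no key column.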
