import pandas as pd
import time
data = pd.read_csv("Airpassengers.csv")
Quickly add a time-series index
df = pd.DataFrame()
df['value'] = data['value']
# freq='M' means month-end; pandas 2.2 and later prefer the alias 'ME'
df['sequence_num'] = pd.date_range("1949-1", periods=len(df), freq='M')
df = df.set_index('sequence_num')
print(df.head(5))
              value
sequence_num
1949-01-31      112
1949-02-28      118
1949-03-31      132
1949-04-30      129
1949-05-31      121
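The same step can be tried without the CSV file. The following is a minimal sketch, assuming only the five values printed above. Passing `pd.offsets.MonthEnd()` instead of a string alias is a choice made for this sketch, so that it does not depend on whether the installed pandas spells month-end as 'M' or 'ME'. The name `demo` is hypothetical.

```python
import pandas as pd

# First five values copied from the output above; no CSV file is needed.
values = [112, 118, 132, 129, 121]

# MonthEnd() is the offset object behind the month-end alias. A start date
# that is not itself a month end is rolled forward to the first month end.
idx = pd.date_range("1949-01", periods=len(values), freq=pd.offsets.MonthEnd())

demo = pd.DataFrame({"value": values}, index=idx)
demo.index.name = "sequence_num"
print(demo)
```

Building the index first and passing it to the constructor skips the intermediate column and the `set_index` call; the resulting frame has the same shape as the one above.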
Difference the DataFrame, then drop the invalid (NaN) values
df_tmp = df.diff().dropna()
print(df_tmp.head(5))
              value
sequence_num
1949-02-28      6.0
1949-03-31     14.0
1949-04-30     -3.0
1949-05-31     -8.0
1949-06-30     14.0
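To make the differencing step concrete, here is a small sketch on the first six values of the series (the sixth, 135, follows from the difference of 14.0 printed above). It also shows one possible way to undo the differencing with a cumulative sum. The names `s`, `d_clean` and `restored` are hypothetical.

```python
import pandas as pd

s = pd.Series([112, 118, 132, 129, 121, 135], name="value")

d = s.diff()          # the first element is NaN: it has no predecessor
d_clean = d.dropna()  # drops exactly that one leading NaN

# Undo the differencing: first value plus the running sum of the differences.
restored = pd.concat([s.iloc[:1].astype(float), s.iloc[0] + d_clean.cumsum()])

print(d_clean.tolist())
print(restored.tolist())
```

Note that `diff()` turns the integer column into floats, because NaN cannot be stored in a plain integer column.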
Sort by value; ascending=True sorts in ascending order
df_tmp = df.sort_values('value', ascending=True)
print(df_tmp.head(5))
              value
sequence_num
1949-11-30      104
1949-01-31      112
1950-11-30      114
1950-01-31      115
1949-02-28      118
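A short sketch of the sorting options, on a hypothetical five-row frame with letter labels instead of dates. It contrasts ascending and descending order and shows `nsmallest` as an alternative when only the few smallest rows are wanted.

```python
import pandas as pd

demo = pd.DataFrame({"value": [112, 118, 132, 129, 121]},
                    index=["a", "b", "c", "d", "e"])

ascending = demo.sort_values("value", ascending=True)    # smallest first
descending = demo.sort_values("value", ascending=False)  # largest first
smallest_two = demo.nsmallest(2, "value")                # two smallest rows only

print(ascending.index.tolist())
print(descending.index.tolist())
print(smallest_two.index.tolist())
```

`sort_values` returns a new DataFrame and leaves `demo` in its original order.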
Create an empty DataFrame
df_col = ['a', 'b', 'c']
df_a = pd.DataFrame(columns=df_col)
print(df_a)
Empty DataFrame
Columns: [a, b, c]
Index: []
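An empty frame like this one has columns but no rows. The sketch below checks that, and then shows one way to end up with a filled frame of the same layout: collect the rows in a plain list and build the DataFrame once. The sample rows and the name `df_filled` are made up for illustration.

```python
import pandas as pd

df_col = ["a", "b", "c"]
df_a = pd.DataFrame(columns=df_col)
print(df_a.empty, df_a.shape)  # no rows, three columns

# Hypothetical rows, gathered in a list before the frame is built.
rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
df_filled = pd.DataFrame(rows, columns=df_col)
print(df_filled)
```

Building the frame in one call avoids growing it row by row, which copies the data on every step.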
Quickly populate a DataFrame
df_b = pd.DataFrame({'p': list(range(5)), 'key': [0] * 5})
print(df_b)
   key  p
0    0  0
1    0  1
2    0  2
3    0  3
4    0  4
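The constant column does not have to be written out as a list. As a minimal sketch, a scalar in the dict is broadcast to the length of the other column; the names `df_explicit` and `df_scalar` are hypothetical.

```python
import pandas as pd

# Constant column spelled out as a list, as in the snippet above.
df_explicit = pd.DataFrame({"p": list(range(5)), "key": [0] * 5})

# Same frame, with the scalar 0 broadcast to every row.
df_scalar = pd.DataFrame({"p": list(range(5)), "key": 0})

print(df_scalar)
```

Recent pandas versions keep the dict's insertion order, so the columns may print as p, key rather than in the key, p order shown above.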
Join two DataFrames (merge on a shared key)
df_c = pd.DataFrame({'p': list(range(4)), 'key': [0] * 4})
df_bc = pd.merge(df_b, df_c, on='key')
print(df_bc.head(5))
   key  p_x  p_y
0    0    0    0
1    0    0    1
2    0    0    2
3    0    0    3
4    0    1    0
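Because every row in both frames carries the same key, the merge pairs each row of one frame with each row of the other: a Cartesian product of 5 × 4 = 20 rows. The sketch below gets the same pairing without the helper column by using `how="cross"`, which assumes pandas 1.2 or later. The names `left`, `right` and `pairs` are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"p": list(range(5))})
right = pd.DataFrame({"p": list(range(4))})

# how="cross" pairs every left row with every right row; no key column needed.
pairs = pd.merge(left, right, how="cross")

print(len(pairs))
print(pairs.head(5))
```

The overlapping column name `p` still gets the default `_x` and `_y` suffixes, so the result has the columns `p_x` and `p_y`.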