Monthly averages from daily data using Python pandas

I'm a Python user but a rookie in terms of using pandas. I'm hoping to use it more as I'm getting into working with a lot of time series and I've heard they're a whole lot easier to modify with pandas. I've read through some of the tutorials but they have yet to make sense. Hoping you can help me out with an example.

I have a text file with four columns: year, month, day and snow depth. This is daily data for a 30-year period, 1979-2009. I would like to calculate 360 (30yrs X 12 months) individual monthly averages using pandas techniques (i.e. isolating all the values for Jan-1979, Feb-1979,... Dec-2009 and averaging each). Can anyone help me out with some example code?

Thanks.

1979 1 1 3

1979 1 2 3

1979 1 3 3

1979 1 4 3

1979 1 5 3

1979 1 6 3

1979 1 7 4

1979 1 8 5

1979 1 9 7

1979 1 10 8

1979 1 11 16

1979 1 12 16

1979 1 13 16

1979 1 14 18

1979 1 15 18

1979 1 16 18

1979 1 17 18

1979 1 18 20

1979 1 19 20

1979 1 20 20

1979 1 21 20

1979 1 22 20

1979 1 23 18

1979 1 24 18

1979 1 25 18

1979 1 26 18

1979 1 27 18

1979 1 28 18

1979 1 29 18

1979 1 30 18

1979 1 31 19

1979 2 1 19

1979 2 2 19

1979 2 3 19

1979 2 4 19

1979 2 5 19

1979 2 6 22

1979 2 7 24

1979 2 8 27

1979 2 9 29

1979 2 10 32

1979 2 11 32

1979 2 12 32

1979 2 13 32

1979 2 14 33

1979 2 15 33

1979 2 16 33

1979 2 17 34

1979 2 18 36

1979 2 19 36

1979 2 20 36

1979 2 21 36

1979 2 22 36

1979 2 23 36

1979 2 24 31

1979 2 25 29

1979 2 26 27

1979 2 27 27

1979 2 28 27

Solution

You'll want to group your data by year and month, and then calculate the mean of each group:

import pandas as pd

# Read the file into a pandas.DataFrame,
# using 'one or more whitespace characters' as the separator
df = pd.read_csv("snow.txt", sep=r"\s+", names=["year", "month", "day", "snow_depth"])

# Show the first 5 rows of the DataFrame
print(df.head())

# Group data first by year, then by month
g = df.groupby(["year", "month"])

# For each group, calculate the average of only the snow_depth column
monthly_averages = g.aggregate({"snow_depth": "mean"})
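To check the grouping without the full file, here is a minimal sketch that runs the same idea on four rows copied from the sample data in the question. Reading from an in-memory string through `io.StringIO` is an assumption made only to keep the example self-contained; with the real file you would pass its path to `read_csv` instead.

```python
import io

import pandas as pd

# Four rows taken from the question's sample (last two days of Jan, first two of Feb)
sample = """\
1979 1 30 18
1979 1 31 19
1979 2 1 19
1979 2 2 19
"""

# Same parsing as for the real file, but reading from an in-memory string
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    names=["year", "month", "day", "snow_depth"],
)

# One mean per (year, month) pair; the result is a Series with a two-level index
monthly_averages = df.groupby(["year", "month"])["snow_depth"].mean()
print(monthly_averages)
```

Selecting the `snow_depth` column before calling `mean()` gives a Series indexed by `(year, month)`, so a single month can be looked up with something like `monthly_averages.loc[(1979, 1)]`.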

For more about the split-apply-combine approach in pandas, see the "Group by: split-apply-combine" section of the pandas user guide.

The pandas documentation describes a DataFrame as a "Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."

For your purposes, the difference between a numpy ndarray and a DataFrame is not too significant, but DataFrames have a bunch of functions that will make your life easier, so I'd suggest doing some reading on them.
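As one illustration of those conveniences, the sketch below builds real dates from the year, month and day columns and lets pandas do the monthly binning. It is an alternative to the groupby approach, not a replacement for it, and the tiny hand-typed DataFrame is an assumption standing in for the full file.

```python
import pandas as pd

# Stand-in for the parsed file: four rows copied from the question's sample
df = pd.DataFrame(
    {
        "year": [1979, 1979, 1979, 1979],
        "month": [1, 1, 2, 2],
        "day": [30, 31, 1, 2],
        "snow_depth": [18, 19, 19, 19],
    }
)

# to_datetime can assemble dates from columns named year, month and day
dates = pd.to_datetime(df[["year", "month", "day"]])

# Index the snow depths by date so time-based methods become available
series = df.set_index(dates)["snow_depth"]

# Resample to month-start bins ("MS") and average the values in each bin
monthly = series.resample("MS").mean()
print(monthly)
```

Note that `resample` also creates a bin, filled with NaN, for any month that has no observations inside the covered range, whereas the groupby version simply omits such months.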
