Python time series lag commands: time series, correlation and lag time

I am studying the correlation between a set of input variables and a response variable, price. These are all in time series.

1) Is it necessary that I smooth out the curve where the input variable is cyclical (autoregressive)? If so, how?

2) Once a correlation is established, I would like to quantify exactly how the input variable affects the response variable.

E.g.: "Once X increases by more than 10%, there is a 2% increase in y 6 months later."

Which python libraries should I be looking at to implement this - in particular to figure out the lag time between two correlated occurrences?

Example:

I already looked at statsmodels.tsa.ARMA, but it seems to deal with predicting only one variable over time. In scipy, the covariance matrix can tell me about the correlation, but it does not help with figuring out the lag time.

Solution

While part of the question is more statistics based, the bit about how to do it in Python seems at home here. I see that you've since decided to do this in R from looking at your question on Cross Validated, but in case you decide to move back to Python, or for the benefit of anyone else finding this question:

I think you were in the right area looking at statsmodels.tsa, but there's a lot more to it than just the ARMA package:

In particular, have a look at statsmodels.tsa.vector_ar for modelling multivariate time series. It is documented in the vector autoregression section of the statsmodels documentation.

The page above specifies that it's for working with stationary time series - I presume this means removing both trend and any seasonality or periodicity. The following link is ultimately readying a model for forecasting, but it discusses the Box-Jenkins approach for building a model, including making it stationary:

You'll notice that link discusses looking for autocorrelations (ACF) and partial autocorrelations (PACF), and then using the Augmented Dickey-Fuller test to test whether the series is now stationary. Tools for all three can be found in statsmodels.tsa.stattools (acf, pacf and adfuller). Likewise, statsmodels.tsa.arma_process has arma_acf and arma_pacf, which give the theoretical ACF and PACF of a specified ARMA process.

The above link also discusses using metrics like AIC to determine the best model; both statsmodels.tsa.vector_ar.var_model and statsmodels.tsa.ar_model report AIC (amongst other measures). The same criteria are used to choose the lag order in var_model, via the select_order method of VAR.

In addition, the pandas library is at least partially integrated into statsmodels and has a lot of time series and data analysis functionality itself, so it will probably be of interest. The time series section of the pandas documentation is the place to start.
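For the specific question of finding the lag between two series, a simple pandas-only approach is to shift one series and see at which shift the correlation peaks. The sketch below uses synthetic monthly data with a built-in 6-month delay; the helper lagged_correlations is a hypothetical name of my own, not a pandas function.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (an assumption for this sketch): y follows x
# with a delay of 6 months, plus noise.
rng = np.random.default_rng(2)
n, true_lag = 300, 6
idx = pd.date_range("2000-01-01", periods=n, freq="MS")
x = pd.Series(rng.normal(size=n), index=idx)
noise = pd.Series(rng.normal(size=n), index=idx)
y = 0.5 * x.shift(true_lag) + 0.1 * noise


def lagged_correlations(x, y, max_lag):
    """Correlation of x at time t-k with y at time t, for k = 0..max_lag."""
    return pd.Series({k: y.corr(x.shift(k)) for k in range(max_lag + 1)})


corrs = lagged_correlations(x, y, max_lag=12)

# The lag with the largest absolute correlation is the estimated delay.
best_lag = int(corrs.abs().idxmax())
print(corrs.round(2))
print("estimated lag (months):", best_lag)
```

One caveat, which ties back to the first question: if the input series is strongly autocorrelated or trending, raw lagged correlations can peak at misleading lags, so it is usually safer to difference or otherwise make the series stationary first.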
