There are two types of returns: simple returns and log returns.
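For reference, with P_t denoting the (adjusted) close price at time t, the two standard definitions are:

R_t = (P_t - P_{t-1}) / P_{t-1} = P_t / P_{t-1} - 1    (simple return)
r_t = ln(P_t / P_{t-1}) = ln(P_t) - ln(P_{t-1})        (log return)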
The best practice while working with stock prices is to use adjusted values, as they account for possible corporate actions, such as stock splits.
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=True,
group_by='column', auto_adjust=False, back_adjust=False, repair=False, keepna=False,
progress=True, period="max", show_errors=True, interval="1d", prepost=False,
proxy=None, rounding=False, timeout=10, **kwargs):
"""Download yahoo tickers
:Parameters:
tickers : str, list
List of tickers to download
period : str
Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
Either Use period parameter or use start and end
interval : str
Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
Intraday data cannot extend last 60 days
start: str
Download start date string (YYYY-MM-DD) or _datetime.
Default is 1900-01-01
end: str
Download end date string (YYYY-MM-DD) or _datetime.
Default is now
group_by : str
Group by 'ticker' or 'column' (default)
prepost : bool
Include Pre and Post market data in results?
Default is False
auto_adjust: bool
Adjust all OHLC automatically? Default is False
repair: bool
Detect currency unit 100x mixups and attempt repair
Default is False
keepna: bool
Keep NaN rows returned by Yahoo?
Default is False
actions: bool
Download dividend + stock splits data. Default is False
threads: bool / int
How many threads to use for mass downloading. Default is True
ignore_tz: bool
When combining from different timezones, ignore that part of datetime.
Default is True
proxy: str
Optional. Proxy server URL scheme. Default is None
rounding: bool
Optional. Round values to 2 decimal places?
show_errors: bool
Optional. Doesn't print errors if False
timeout: None or float
If not None stops waiting for a response after given number of
seconds. (Can also be a fraction of a second e.g. 0.01)
In this recipe, we show how to calculate both types of returns using Apple's stock prices.
import pandas as pd
import numpy as np
import yfinance as yf
# progress=True : Progress will show a progress bar.
# auto_adjust will overwrite the Close
# Adjust all OHLC automatically? Default is False
df = yf.download( 'AAPL',
start='2000-01-01',
end='2010-12-31',
progress=False
)
df
aapl = yf.download( 'AAPL',
start='2010-01-01',
#end='2010-12-31',
progress=False
)
aapl
pd.concat([aapl, df]).sort_index()   # DataFrame.append is deprecated; concat does the same job
aapl.to_csv('aapl.csv')
3. Calculate the simple and log returns using the adjusted close prices:
df['simple_rtn'] = df['Adj Close'].pct_change() # (p_t - p_t_previous)/p_t_previous
df['log_rtn'] = np.log( df['Adj Close']/df['Adj Close'].shift(1) )
df
The first row will always contain a not a number (NaN) value, as there is no previous price to use for calculating the returns.
We will also discuss how to account for inflation in the returns series. To do so, we continue with the example used in this recipe.
Inflation rate is the measure of the increase or rate of increase in the general price of selected goods and services over a determined period. Inflation can indicate a decline in the purchasing power or value of a nation's currency and is typically recorded and reported as a percentage.
Inflation rate is important because as the average cost of items increases, currency loses value as it takes more and more funds to acquire the same goods and services as before. This fluctuation in the value of the dollar impacts the cost of living and adversely affects the economy leading to slower economic growth.
The consumer price index (CPI) is a measure of the overall cost of the goods and services bought by a typical consumer.
The consumer price index is used to monitor changes in the cost of living over time. When the consumer price index rises, the typical family has to spend more money to maintain the same standard of living. Economists use the term inflation to describe a situation in which the economy’s overall price level is rising. The inflation rate is the percentage change in the price level from the previous period. The preceding chapter showed how economists can measure inflation using the GDP deflator. The inflation rate you are likely to hear on the nightly news, however, is calculated from the consumer price index, which better reflects the goods and services bought by consumers.
When the Bureau of Labor Statistics (BLS) calculates the consumer price index and the inflation rate, it uses data on the prices of thousands of goods and services. To see exactly how these statistics are constructed, let’s consider a simple economy in which consumers buy only two goods: hot dogs and hamburgers. Table 1 shows the five steps that the BLS follows.
In addition to the consumer price index (CPI) for the overall economy, the BLS calculates several other price indexes. It reports the index for specific metropolitan areas within the country (such as Boston, New York, and Los Angeles) and for some narrow categories of goods and services (such as food, clothing, and energy). It also calculates the producer price index (PPI), which measures the cost of a basket of goods and services bought by firms rather than consumers. Because firms eventually pass on their costs to consumers in the form of higher consumer prices, changes in the producer price index are often thought to be useful in predicting changes in the consumer price index.
#############
https://data.stats.gov.cn/easyquery.htm?cn=A01
https://data.stats.gov.cn/english/ks.htm?cn=A01
#############
USA Consumer Price Index (CPI) | Inflation Rate and Consumer Price Index
The inflation-adjusted return is the measure of return that takes into account the time period's inflation rate. The purpose of the inflation-adjusted return metric is to reveal the return on an investment after removing the effects of inflation.
Removing the effects of inflation from the return of an investment allows the investor to see the true earning potential of the security without external economic forces. The inflation-adjusted return is also known as the real rate of return, or the required rate of return adjusted for inflation (the real rate of return on an investment measures the increase in purchasing power that the investment provides).
The inflation-adjusted return is useful for comparing investments, especially between different countries because each country's inflation rate is accounted for in the return. In this scenario, without adjusting for inflation across international borders, an investor may get vastly different results when analyzing an investment's performance. The Inflation-adjusted return serves as a more realistic measure of an investment's return when compared to other investments.
Assume a bond investment is reported to have earned 2% in the previous year. This appears to be a gain. However, suppose that inflation last year was 2.5%. Essentially, this means the investment did not keep up with inflation, and it effectively lost 0.5%.
Assume also a stock that returned 12% last year and inflation was 3%. An approximate estimate of the real rate of return is 9%, or the 12% reported return less the inflation amount (3%).
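The 9% figure is the usual quick approximation (nominal return minus inflation); the exact inflation-adjusted value, using the formula introduced below, is slightly lower:

real return = (1 + 0.12) / (1 + 0.03) - 1 ≈ 0.0874, i.e., roughly 8.74% rather than 9%.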
We first download the monthly Consumer Price Index (CPI) values from Quandl and calculate the percentage change (simple return) in the index. We can then merge the inflation data with Apple's stock returns and account for inflation by using the following formula:

real_rtn_t = (1 + R_t) / (1 + π_t) - 1,   or approximately R_t - π_t

Here, R_t is the time t simple return and π_t is the inflation rate.
Calculating the inflation-adjusted return requires three basic steps.
Assume an investor purchases a stock on January 1 of a given year for $75,000. At the end of the year, on December 31, the investor sells the stock for $90,000. During the course of the year, the investor received $2,500 in dividends. At the beginning of the year, the Consumer Price Index (CPI) was at 700. On December 31, the CPI was at a level of 721.
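As a quick illustration of those three steps applied to the numbers above, here is a minimal sketch (the variable names are ours, not part of the recipe):

# Step 1: nominal return, including dividends
purchase, sale, dividends = 75_000, 90_000, 2_500
nominal_rtn = (sale + dividends - purchase) / purchase    # ~0.2333 (23.33%)

# Step 2: inflation over the holding period, from the CPI levels
cpi_start, cpi_end = 700, 721
inflation = (cpi_end - cpi_start) / cpi_start             # 0.03 (3%)

# Step 3: inflation-adjusted (real) return
real_rtn = (1 + nominal_rtn) / (1 + inflation) - 1        # ~0.1974 (19.74%)
print(f'nominal: {nominal_rtn:.2%}, inflation: {inflation:.2%}, real: {real_rtn:.2%}')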
Using inflation-adjusted returns is often a good idea because they put things into a very real-world perspective. Focusing on how investments are doing over the long term can often present a better picture of their past performance (rather than a day-to-day, weekly, or even monthly glance).
But there may be a good reason why nominal returns are preferred over those adjusted for inflation. Nominal returns are generated before any taxes, investment fees, or inflation. Since we live in a "here and now" world, these nominal prices and returns are what we deal with immediately to move forward. So, most people will want to get an idea of what the high and low price of an investment is, relative to its future prospects, rather than its past performance. In short, how the price fared when adjusted for inflation five years ago won't necessarily matter when an investor buys it tomorrow.
Example
The nominal rate of return on an investment is the return that the investment earns expressed in current dollars. For example, if you put $50 into an investment that promises to pay 3% interest, at the end of the year you will have $51.50 (the initial $50 plus a $1.50 return). Your nominal return is 3%, but this does not necessarily mean that you are better off financially at the end of the year because the nominal return does not take into account the effects of inflation.
To continue the example, assume that at the beginning of the year, one bag of groceries costs $50. During the year, suppose grocery prices rise by 3%. This means that by the end of the year one bag of groceries costs $51.50. In other words, at the beginning of the year you could have used your $50 either to buy one bag of groceries or to make the investment that promised a 3% return. If you invested your money rather than spending it on groceries, by the end of 1 year you would have had $51.50, still just enough to buy one bag of groceries. In other words, your purchasing power did not increase at all during the year. The real rate of return on an investment measures the increase in purchasing power that the investment provides. In our continuing example, the real rate of return is 0%, even though the nominal rate of return is 3%. In dollar terms, by investing $50 you increased your wealth by 3% to $51.50, but in terms of purchasing power you are no better off because your money is only enough to buy the same amount of goods that it could have bought before you made the investment. In mathematical terms, the real rate of return is approximately equal to the nominal rate of return minus the inflation rate.
Example
Suppose you have $50 today and are trying to decide whether to invest that money or spend it.
You decide to save and invest your money, so a year later you have $55. Unfortunately, during that time inflation caused the price of a latte to increase by 4.8% from $2.50 to $2.62 (= 2.50 × 1.048). At the new price, you can just about afford to buy 21 lattes (21 × $2.62 = $55.02). That extra latte represents an increase in your purchasing power of 5% (i.e., 21 is 5% more than 20), so your real return on the investment is 5%, because it enabled you to buy 5% more than you could before you invested. Notice that the real return is approximately equal to the difference between the investment's nominal return (10%) and the inflation rate (4.8%).
Execute the following steps to account for inflation in the returns series.
import pandas as pd
import quandl
QUANDL_API_KEY = 'sKqHwnHr8rNWK-3s5imS'
quandl.ApiConfig.api_key = QUANDL_API_KEY
df_all_dates = pd.DataFrame( index=pd.date_range( start='1999-12-31',
end='2010-12-31'
)
)
df = df_all_dates.join( df[ ['Adj Close'] ],
how ='left'
).fillna( method='ffill').asfreq('M')
#Convert time series to specified frequency
df
In other words: left-join the Adj Close prices onto the full calendar of dates, forward-fill the gaps (non-trading days), and then resample to month-end frequency.
4. Download the inflation data from Quandl:
# Download the inflation data from Quandl:
# Consumer Price Index (CPI)
df_cpi = quandl.get( dataset='RATEINF/CPI_USA',
start_date = '1999-12-01',
end_date = '2010-12-31'
)
df_cpi.rename( columns={'Value':'cpi'},
inplace=True
)
# Merge the inflation data to the prices:
df_merged = df.join(df_cpi, how='left')
df_merged
5. Calculate the simple returns and the inflation rate (the percentage change in the CPI between two consecutive periods):
df_merged['simple_rtn'] = df_merged['Adj Close'].pct_change()
df_merged['inflation_rate'] = df_merged.cpi.pct_change()
df_merged
6. Adjust the returns for inflation, using the formula above:
df_merged['real_rtn'] = (df_merged.simple_rtn + 1) \
                        / (df_merged.inflation_rate + 1) - 1
df_merged
The DataFrame contains all the intermediate results, and the real_rtn column contains the inflation-adjusted returns.
######################
import pandas as pd
import numpy as np
import yfinance as yf
# progress=True : Progress will show a progress bar.
# auto_adjust will overwrite the Close
# Adjust all OHLC automatically? Default is False
df = yf.download( 'AAPL',
start='2000-01-01',
#end='2010-12-31',
progress=False
)
df
df['simple_rtn'] = df['Adj Close'].pct_change() # (p_t - p_t_previous)/p_t_previous
df['log_rtn'] = np.log( df['Adj Close']/df['Adj Close'].shift(1) )
df
import pandas as pd
import quandl
QUANDL_API_KEY = 'sKqHwnHr8rNWK-3s5imS'
quandl.ApiConfig.api_key = QUANDL_API_KEY
df_all_dates = pd.DataFrame( index=pd.date_range( start=df.index[:1][0],
end=df.index[-1:][0]
)
)
df = df_all_dates.join( df[ ['Adj Close'] ],
how ='left'
).fillna( method='ffill').asfreq('M')
#Convert time series to specified frequency
# Download the inflation data from Quandl:
# Consumer Price Index (CPI)
df_cpi = quandl.get( dataset='RATEINF/CPI_USA',
start_date = df.index[:1][0],
end_date = df.index[-1:][0]
)
df_cpi.rename( columns={'Value':'cpi'},
inplace=True
)
# Merge the inflation data to the prices:
df_merged = df.join(df_cpi, how='left')
df_merged
inflation_rate = quandl.get( dataset='RATEINF/inflation_USA',
start_date = '1999-12-01',
)
inflation_rate.rename( columns={'Value':'inflation_rate'},
inplace=True
)
# Merge the inflation data to the prices:
df_merged = df_merged.join(inflation_rate, how='left')
df_merged['simple_rtn'] = df_merged['Adj Close'].pct_change()
df_merged['inflation_rate_from_cpi'] = df_merged.cpi.pct_change()
# real return: (1 + nominal return) / (1 + inflation) - 1
# note: the RATEINF inflation rate is quoted in percent, hence the division by 100
df_merged['real_rtn'] = (df_merged.simple_rtn + 1) \
                        / (df_merged.inflation_rate / 100 + 1) - 1
df_merged
import matplotlib.pyplot as plt
plt.style.use('seaborn')
fig, ax = plt.subplots( 1,1, figsize=(10,8) )
date='2021-01-01'
#df_merged.loc[df.index>'2019-01-01']['real_rtn'].plot()
ax.plot( df.index[df.index>date],
df_merged.loc[df.index>date]['inflation_rate'],
label='inflation_rate',
color='blue'
)
ax.plot( df.index[df.index>date],
df_merged.loc[df.index>date]['inflation_rate_from_cpi'],
label='inflation_rate_from_cpi',
color='green'
)
ax.plot( df.index[df.index>date],
df_merged.loc[df.index>date]['real_rtn'],
label='aapl real_rtn',
color='r'
)
plt.setp( ax.get_xticklabels(), rotation=45, horizontalalignment='right', fontsize=14 )
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(loc='best', fontsize=14)
plt.show()
The general rule of thumb for changing frequency can be broken down into the following: downsampling (moving to a lower frequency, for example from daily to monthly) requires aggregating the existing values, while upsampling (moving to a higher frequency) requires filling in or interpolating the newly created observations; a short sketch follows below.
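As a minimal sketch of that rule of thumb (assuming a daily DataFrame df with a simple_rtn column, as built in the previous recipe):

# downsampling: daily -> month-end, aggregating with the mean
monthly_mean_rtn = df['simple_rtn'].resample('M').mean()

# upsampling: month-end -> daily, forward-filling the newly created rows
daily_from_monthly = monthly_mean_rtn.resample('D').ffill()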
In this recipe, we present an example of how to calculate the monthly realized volatilities for Apple using daily returns and then annualize the values.
Realized volatility is the assessment of variation in returns for an investment product, obtained by analyzing its historical returns within a defined time period (that is, the observed volatility of the underlying returns from historical market prices).
The realized volatility, or actual volatility, in the market is caused by two components: a continuous volatility component and a jump component, which influence the stock prices. Continuous volatility in a stock market is affected by the intra-day trading volumes. For example, a single high-volume trade transaction can introduce a significant variation in the price of an instrument.
Analysts make use of high-frequency intraday data to determine measures of volatility at hourly/daily/weekly or monthly frequency. The data may then be utilized to forecast the volatility in returns.
It is measured by calculating the standard deviation from the average price of an asset in a given time period. Since volatility is non-linear, the realized variance is computed first, by aggregating the squared returns over the defined time period:

RV = Σ_{i=1..n} r_i²

where n is the number of observations (monthly/weekly/daily returns). Typically, 20-, 50-, and 100-day returns are calculated.
Realized volatility is the square root of the realized variance. Realized volatility is frequently used for daily volatility computed from intraday returns.
The results are then annualized. Realized variance is annualized by multiplying the daily realized variance by the number of trading days/weeks/months in a year. The square root of the annualized realized variance is the realized volatility.
Example #1
For example, suppose the realized volatility for two stocks with similar closing prices is calculated over 20, 50, and 100 days and then annualized, with values as follows:
Looking at the pattern of increasing volatility in the given time frame, it can be inferred that stock-1 has been trading with high variation in prices in recent times (i.e., 20 days), whereas stock-2 has been trading without any wild swings.
It is a measure of historical volatility and is therefore not forward-looking. It does not factor in any major “shocks” in the market that may arise in the future, which may affect the value of the underlying.
Realized volatility measures help to quantify the inherent price risk arising out of volume fluctuations and external factors of a stock based on its historical performance. Combined with implied volatility, it also helps determine option prices based on the volatility in the underlying stock.
The steps we need to take are as follows: obtain the log returns, compute the monthly realized volatility, and annualize it.
We assume you have followed the instructions from earlier recipes and have a DataFrame called df with a single log_rtn column and timestamps as the index.
1. Download the data and convert the adjusted close prices of the stock/asset to log returns:
import pandas as pd
import numpy as np
import yfinance as yf
# progress=True : Progress will show a progress bar.
# auto_adjust will overwrite the Close
# Adjust all OHLC automatically? Default is False
df = yf.download( 'AAPL',
start='2000-01-01',
end='2010-12-31',
auto_adjust=False,
progress=False
)
# keep only the adjusted close price
df = df.loc[:, ['Adj Close']]
df.rename( columns={'Adj Close':'adj_close'},
inplace=True
)
# converting returns from a stock/asset to logarithmic values
df['log_rtn'] = np.log( df['adj_close']/df['adj_close'].shift(1) )
df.drop('adj_close', axis=1, inplace=True)
df.dropna( axis=0, inplace=True )
df
2. Define the function for calculating the realized volatility, and 3. calculate the monthly realized volatility:
import pandas as pd

def realized_volatility(x):
    # realized variance for each month: the sum of squared log returns
    # realized volatility: the square root of the realized variance
    return np.sqrt(np.sum(x**2))
df_rv = df.groupby( pd.Grouper( freq='M') ).apply(realized_volatility)
df_rv.rename( columns={'log_rtn':'rv'},
inplace=True
)
df_rv
4. Annualize the values:
df_rv.rv = df_rv.rv * np.sqrt(12)
df_rv
import matplotlib.pyplot as plt
plt.style.use('seaborn')
fig, ax = plt.subplots(2, 1, sharex=True, figsize=(10,10))
ax[0].plot(df)
ax[0].set_title('daily log return(adjust_close)')
ax[1].plot( df_rv )
ax[1].set_title('realized volatility')
plt.show()
We can see that the spikes in the realized volatility (indicating that the stock was trading with high price variation at those points in time) coincide with some extreme log returns (which might be outliers).
Normally, we could use the resample method of a pandas DataFrame. Supposing we wanted to calculate the average monthly return, we could run df.log_rtn.resample('M').mean().
For the resample method, we can use any built-in aggregate functions of pandas, such as mean, sum, min, and max. However, our case is a bit more complex, so we defined a helper function called realized_volatility, and replicated the behavior of resample by using a combination of groupby, Grouper, and apply.
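For comparison, here is a sketch of the resample-based equivalent (assuming the df and realized_volatility objects defined above); it should yield the same monthly values as the groupby/Grouper approach:

# resample('M') groups the daily log returns by month-end;
# apply() then runs the custom aggregation on each monthly group
df_rv_resample = df['log_rtn'].resample('M').apply(realized_volatility)
df_rv_resample = df_rv_resample.to_frame('rv') * np.sqrt(12)   # annualize, as before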
###################
###################
While working with any kind of data, we often encounter observations that are significantly different from the majority, that is, outliers. They can be a result of a wrong tick (price), something major happening on the financial markets金融市场上发生的重大事件, an error in the data processing pipeline, and so on. Many machine learning algorithms and statistical approaches can be influenced by outliers, leading to incorrect/biased results. That is why we should handle the outliers before creating any models.
Execute the following steps to detect outliers using the 3σ (three sigma) approach, and mark them on a plot.
import pandas as pd
import yfinance as yf
df = yf.download( 'AAPL',
start='2000-01-01',
end='2010-12-31',
progress=False
)
df = df.loc[:, ['Adj Close']] # keep only the adjusted close price
df.rename( columns={'Adj Close':'adj_close'},
inplace=True
)
df['simple_rtn'] = df.adj_close.pct_change()
df
1. Calculate the rolling mean and standard deviation:
Simple moving average, which we will refer to as SMA, is a basic technical analysis indicator. The simple moving average, as you may have guessed from its name, is computed by adding up the price of an instrument over a certain period of time divided by the number of time periods. It is basically the price average over a certain time period, with equal weight being used for each price. The time period over which it is averaged is often referred to as the lookback period or history. Let's have a look at the following formula of the simple moving average:
SMA = ( Σ_{i=1..N} P_i ) / N

Here, the following applies:
P_i : the price at time period i
N : the number of prices added together (the number of time periods in the lookback window)
#########rolling window https://blog.csdn.net/Linli522362242/article/details/122955700
There are exactly 252 trading days in 2021. January and February have the fewest (19), and March the most (23), with an average of 21 per month, or 63 per quarter.
Out of a possible 365 days, 104 days(365/7=52*2=104) are weekend days (Saturday and Sunday) when the stock exchanges are closed. Seven of the nine holidays which close the exchanges fall on weekdays, with Independence Day being observed on Monday, July 5, and Christmas on Friday, December 24. There is one shortened trading session on Friday, November 26 (the day after Thanksgiving Day). ==> 365-104-7-2=252
########
Standard deviation, which will be referred to as STDEV, is a basic measure of price volatility that is used in combination with a lot of other technical analysis indicators to improve them. We'll explore that in greater detail in this section.
Standard deviation is a standard measure that is computed by measuring the squared deviation of individual prices from the mean price, and then finding the average of all those squared deviation values. This value is known as variance, and the standard deviation is obtained by taking the square root of the variance.
Larger STDEVs are a sign of more volatile markets or larger expected price moves.
To compute the standard deviation, first we compute the variance:

Variance = Σ_{i=1..n} ( P_i - SMA )² / n

Then, the standard deviation is simply the square root of the variance:

STDEV = sqrt( Σ_{i=1..n} ( P_i - SMA )² / n )

where:
P_i : the price at time period i
SMA : the simple moving average over the n time periods
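To connect these formulas with the pandas calls used below, here is a small check on made-up prices (a sketch); note that pandas' rolling std uses the sample formula (ddof=1) by default, while the formula above divides by n (ddof=0):

import numpy as np
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5])
n = len(prices)

sma = prices.sum() / n                          # simple moving average over the window
var_pop = ((prices - sma) ** 2).sum() / n       # variance as in the formula above
stdev_pop = np.sqrt(var_pop)

# pandas equivalent over a single full window; ddof=0 reproduces the formula above
stdev_pandas = prices.rolling(window=n).std(ddof=0).iloc[-1]
assert np.isclose(stdev_pop, stdev_pandas)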
# Calculate the rolling mean and standard deviation:
# 21 : the average number of trading days in a month
df_rolling = df[['simple_rtn']].rolling( window=21 )\
.agg(['mean',# SMA
'std'
])
# df_rolling.columns
# MultiIndex([('simple_rtn', 'mean'),
# ('simple_rtn', 'std')
# ],)
df_rolling.columns = df_rolling.columns.droplevel()
# droplevel()
# Return Series/DataFrame with requested index / column level(s) removed
df_rolling
2. Join the rolling metrics to the original data:
df_outliers = df.join(df_rolling, how='left')
df_outliers[:25]
3. Define a function for detecting outliers (using the 3σ approach):
We define a function that flags a given observation x as an outlier when x > μ + 3σ or x < μ - 3σ.
def identify_outliers(row, n_sigmas=3):
    '''
    Function for identifying the outliers using the 3 sigma rule.
    The row must contain the following columns/indices: simple_rtn, mean, std.

    Parameters
    ----------
    row : pd.Series
        A row of a pd.DataFrame, over which the function can be applied.
    n_sigmas : int
        The number of standard deviations above/below the mean used for detecting outliers.

    Returns
    -------
    0/1 : int
        An integer with 1 indicating an outlier and 0 otherwise.
    '''
    x = row['simple_rtn']
    mu = row['mean']
    sigma = row['std']

    if x > mu + n_sigmas * sigma or x < mu - n_sigmas * sigma:
        return 1
    return 0
4. Identify the outliers and extract their values for later use:
df_outliers['outlier'] = df_outliers.apply( identify_outliers,
axis=1 # for each row
)
outliers = df_outliers.loc[ df_outliers['outlier']==1,
['simple_rtn']
]
outliers
import matplotlib.pyplot as plt

fig = plt.figure( figsize=(10,8) )
plt.plot( df_outliers.index, df_outliers.simple_rtn,
color='blue', label='Normal'
)
plt.scatter( outliers.index, outliers.simple_rtn,
color='red', label='Anomaly'
)
plt.title("Apple's stock returns")
plt.legend( loc='lower right')
plt.show()
In the plot, we can observe outliers marked with a red dot. One thing to notice is that when there are two large returns in the vicinity, the algorithm identifies the first one as an outlier and the second one as a regular observation. This might be due to the fact that the first outlier enters the rolling window and affects the moving average/standard deviation.
In real-life cases, we should not only identify the outliers, but also treat them, for example, by capping them at the maximum/minimum acceptable value, replacing them with interpolated values, or by following any of the other possible approaches.
There are many different methods of identifying outliers in a time series, for example, using Isolation Forest, Hampel Filter, Support Vector Machines, and z-score (which is similar to the presented approach).
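As an example of one of these alternatives, here is a minimal z-score variant of the same idea (a sketch that reuses the df_outliers frame built above); it flags returns whose rolling z-score exceeds 3 in absolute value and should closely match the row-wise identify_outliers result:

# z-score of each daily return relative to the rolling mean and std
z_score = (df_outliers['simple_rtn'] - df_outliers['mean']) / df_outliers['std']
df_outliers['outlier_z'] = (z_score.abs() > 3).astype(int)

# fraction of rows on which the two approaches agree (expected to be ~1.0)
print((df_outliers['outlier_z'] == df_outliers['outlier']).mean())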
In the 3σ approach, for each time point, we calculated the moving average (μ) and standard deviation (σ) using the last 21 days of data (the earliest rows have missing values, because there is no full window of prior data and the very first day's return is itself missing). We used 21 as this is the average number of trading days in a month, and we work with daily data. However, we can choose a different value, and then the moving average will react faster or more slowly to changes. We can also use an (exponentially) weighted moving average if we find it more meaningful in our particular case. https://blog.csdn.net/Linli522362242/article/details/121406833
Stylized facts are statistical properties that appear to be present in many empirical asset returns (across time and markets). It is important to be aware of them because when we are building models that are supposed to represent asset price dynamics, the models must be able to capture/replicate these properties.
In the following recipes, we investigate the five stylized facts using an example of daily S&P 500 returns from the years 1985 to 2018.
We download the S&P 500 prices from Yahoo Finance (following the approach in the Getting data from Yahoo Finance recipe) and calculate returns as in the Converting prices to returns recipe.
The name of the fact (Non-Gaussian distribution of returns) is pretty much self-explanatory. It was observed in the literature that (daily) asset returns exhibit the following: negative skewness (large negative returns occur more often than large positive ones) and excess kurtosis (fat tails, meaning that extreme returns occur more often than a Gaussian distribution would predict).
Note:
The pandas implementation of kurtosis is the one that the literature refers to as excess kurtosis or Fisher's kurtosis. Using this metric, the excess kurtosis of a Gaussian distribution (also called the normal distribution) is 0, while the standard kurtosis is 3. This is not to be confused with the name of the stylized fact, excess kurtosis, which simply means kurtosis higher than that of the normal distribution (for which excess kurtosis = kurtosis - 3 = 0).
In social sciences, especially economics, a stylized fact is a simplified presentation of an empirical finding.[1] Stylized facts are broad tendencies that aim to summarize the data, offering essential truths while ignoring individual details.
A prominent example of a stylized fact is: "Education significantly raises lifetime income." Another stylized fact in economics is: "In advanced economies, real GDP growth fluctuates in a recurrent but irregular fashion".
However, scrutiny of the details will often produce counterexamples. In the case given above, holding a PhD may lower lifetime income, because of the years of lost earnings it implies and because many PhD holders enter academia instead of higher-paid fields. Nonetheless, broadly speaking, people with more education tend to earn more, so the above example is true in the sense of a stylized fact.
Stylized facts are widely used in economics, in particular to motivate the construction of a model and/or to validate it.
Run the following steps to investigate the existence of this first fact by plotting the histogram of returns and a Q-Q plot.
import pandas as pd
import numpy as np
import yfinance as yf
df = yf.download( '^GSPC',
start='1985-01-01',
end='2018-12-31',
progress=False
)
df = df[['Adj Close']].rename( columns={'Adj Close':'adj_close'} )
df['log_rtn'] = np.log( df.adj_close/df.adj_close.shift(1) )
df = df[['adj_close', 'log_rtn']].dropna( how='any' )
df
##################################################################
seaborn.distplot can help us to process the data into bins and show us a histogram as a result.
The seaborn.distplot function expects either a pandas Series, a single-dimensional numpy.array, or a Python list as input. Then, it determines the size (width) of the bins according to the Freedman-Diaconis rule (bin width = 2·IQR(x) / n^(1/3) and number of bins = (max(x) - min(x)) / bin width, where IQR = Q3 - Q1 is the interquartile range, n is the number of observations, and x denotes the data whose distribution is being plotted; the Freedman-Diaconis rule is well suited to long-tailed data because it is insensitive to outliers), and finally it fits a kernel density estimate (KDE) curve over the histogram.
KDE is a non-parametric method used to estimate the distribution of a variable. We can also supply a parametric distribution, such as beta, gamma, or normal distribution, to the fit argument.https://blog.csdn.net/Linli522362242/article/details/93617948
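As an illustration of the Freedman-Diaconis rule (a sketch assuming the df.log_rtn series from above), the bin width and bin count can be computed directly; numpy implements the same rule via bins='fd':

rtn = df['log_rtn'].dropna()
iqr = rtn.quantile(0.75) - rtn.quantile(0.25)          # interquartile range Q3 - Q1
bin_width = 2 * iqr / len(rtn) ** (1 / 3)              # Freedman-Diaconis bin width
n_bins = int(np.ceil((rtn.max() - rtn.min()) / bin_width))
print(f'bin width: {bin_width:.5f}, number of bins: {n_bins}')

# numpy's built-in implementation of the same rule
print(len(np.histogram_bin_edges(rtn, bins='fd')) - 1)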
##############
1. Calculate the normal Probability Density Function (PDF) using the mean and standard deviation of the observed returns:
import scipy.stats as scs
r_range = np.linspace( min(df.log_rtn), max(df.log_rtn),
num=1000
)
mu = df.log_rtn.mean()
sigma = df.log_rtn.std()
# scipy.stats.norm : A normal continuous random variable.
# The location (loc) keyword specifies the mean.
# The scale (scale) keyword specifies the standard deviation
norm_pdf = scs.norm.pdf( r_range, loc=mu, scale=sigma )
2. Plot the histogram and the Q-Q plot:
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
# https://blog.csdn.net/Linli522362242/article/details/93617948
sns.set(font_scale=1.)
# sns.set_style('ticks') # white, dark, whitegrid, darkgrid, ticks
fig, ax = plt.subplots( 1,2, figsize=(12,8) )
# histogram
# note: sns.displot()/sns.histplot() are the newer alternatives to distplot
# norm_hist: if True, the histogram height shows density rather than counts
#            (defaults to True when a KDE is drawn)
sns.distplot( df.log_rtn, kde=False, norm_hist=True, ax=ax[0], )
ax[0].plot( r_range, norm_pdf,
'g', lw=2, label=f'N({mu:.2f}, {sigma**2 : .4f})',
)
ax[0].legend(loc='upper left')
ax[0].set_title( 'Distribution of S&P 500 returns', fontsize=16 )
# Q-Q plot
qq = sm.qqplot( df.log_rtn.values, line='s', ax=ax[1],
markerfacecolor='b',
# markeredgecolor='k'
# alpha=0.3
)
# sm.qqline( qq.axes[1], line='45', #If line is not 45, x and y cannot be None.
# fmt='y--') #fmt = '[marker][line][color]'
ax[1].get_lines()[1].set_color('red')
ax[1].get_lines()[1].set_linewidth(3.0)
ax[1].set_title('Q-Q plot', fontsize=16)
plt.show()
The first step of investigating this fact was to plot a histogram visualizing the distribution of returns. To do so, we used sns.distplot while setting kde=False (which does not use the Gaussian kernel density estimate) and norm_hist=True (this plot shows density instead of the count).
To see the difference between our histogram and Gaussian distribution(or called Normal distribution), we superimposed a line representing the PDF of the Gaussian distribution with the mean(mu = df.log_rtn.mean()) and standard deviation(sigma = df.log_rtn.std()) coming from the considered return series.
First, we specified the range over which we calculated the PDF by using np.linspace (we set the number of points to 1,000; generally, the more points, the smoother the line) and then calculated the PDF using scs.norm.pdf. The default arguments correspond to the standard normal distribution, that is, zero mean (loc defaults to 0) and unit variance (scale defaults to 1). That is why we specified the loc and scale arguments as the sample mean and standard deviation, respectively.
To verify the existence of the previously mentioned patterns, we should look at the following: negative skewness (the left tail of the distribution being longer, with the mass of the distribution concentrated on the right) and excess kurtosis (fat tails).
After inspecting the histogram, we looked at the Q-Q (quantile-quantile) plot, on which we compared two distributions (theoretical and observed) by plotting their quantiles against each other. In our case, the theoretical distribution is the Gaussian (Normal) distribution and the observed one comes from the S&P 500 log returns.
To obtain the plot, we used the sm.qqplot function. If the empirical distribution is Normal, then the vast majority of the points will lie on the red line. However, we see that this is not the case, as points on the left side of the plot are more negative (that is, lower empirical quantiles are smaller) than expected in the case of the Gaussian distribution, as indicated by the line. This means that the left tail of the returns distribution is heavier than that of the Gaussian distribution. Analogous conclusions can be drawn about the right tail, which is also heavier than under normality.
If `distplot` is deprecated in your version of seaborn, use `histplot` instead (solution):
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
sns.set(font_scale=1.1)
#sns.set() # Reset all previous theme settings to default
# sns.set( context="notebook",
# style="darkgrid",
# font_scale=1.2, # increase the font scale#
# rc={ #'grid.color': '0.6',
# 'axes.labelcolor':'darkblue',
# #"lines.linewidth": 2.5, # increase the line width of the KDE plot
# }
# )
fig, ax = plt.subplots( 1,2, figsize=(12,8) )
# histogram
# sns.histplot() replaces the deprecated distplot
# stat='density' shows density rather than counts (the equivalent of norm_hist=True)
sns.histplot( df.log_rtn,
kde=False,
stat='density', #norm_hist=True,
ax=ax[0],
bins=50,
color='b'
)
ax[0].plot( r_range, norm_pdf,
'g', lw=2, label=f'N({mu:.2f}, {sigma**2 : .4f})',
)
ax[0].legend(loc='upper left')
ax[0].set_title( 'Distribution of S&P 500 returns', fontsize=16 )
# Q-Q plot
qq = sm.qqplot( df.log_rtn.values, line='s', ax=ax[1],
markerfacecolor='b',
# markeredgecolor='k'
# alpha=0.3
)
# sm.qqline( qq.axes[1], line='45', # If line is not 45, x and y cannot be None.
# fmt='y--') #fmt = '[marker][line][color]'
ax[1].get_lines()[1].set_color('red')
ax[1].get_lines()[1].set_linewidth(3.0)
ax[1].set_title('Q-Q plot', fontsize=16)
plt.subplots_adjust(wspace=0.3)
plt.show()
We can use the histogram (showing the shape of the distribution) and the Q-Q plot to assess the normality of the log returns. Additionally, we can print the summary statistics:
Perform the Jarque-Bera goodness of fit test on sample data.
The Jarque-Bera test tests whether the sample data has the skewness and kurtosis matching a normal distribution.
Note that this test only works for a large enough number of data samples (>2000) as the test statistic asymptotically has a Chi-squared distribution with 2 degrees of freedom.
# import scipy.stats as scs
# df['log_rtn'] = np.log( df.adj_close/df.adj_close.shift(1) )
jb_test = scs.jarque_bera( df.log_rtn.values )
print( '---------- Descriptive Statistics ----------' )
print( 'Range of dates:', min(df.index.date), '-', max(df.index.date) )
print( 'Number of observations:', df.shape[0] )
print( f'Mean: {df.log_rtn.mean():.4f}' )
print( f'Median: {df.log_rtn.median():.4f}' )
print( f'Min: {df.log_rtn.min():.4f}' )
print( f'Max: {df.log_rtn.max():.4f}' )
print( f'Standard Deviation: {df.log_rtn.std():.4f}' )
print( f'Skewness: {df.log_rtn.skew():.4f}' )
print( f'Kurtosis: {df.log_rtn.kurtosis():.4f}' )
print( f'Jarque-Bera statistic: {jb_test[0]:.2f} with p-value: {jb_test[1]:.2f}')
If the data comes from a normal distribution, the JB statistic asymptotically has a chi-squared distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that the data are from a normal distribution.
https://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm
We immediately see that the returns exhibit negative skewness and excess kurtosis. We also ran the Jarque-Bera test ( scs.jarque_bera ) to verify that returns do not follow a Gaussian distribution. With a p-value of zero, we reject the null hypothesis that sample data has skewness and kurtosis matching those of a Gaussian distribution(called Normal distribution).
At the 5% significance level (α = 0.05), we reject the null hypothesis that the log returns are normally distributed.
By looking at the metrics such as the mean, standard deviation, skewness, and kurtosis we can infer that they deviate from what we would expect under normality. Additionally, the Jarque-Bera normality test gives us reason to reject the null hypothesis stating that the distribution is normal at the 99% confidence level(α = 0.01).
In statistics, the Jarque-Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution. The test statistic is always nonnegative. If it is far from zero, it signals that the data do not have a normal distribution.
The test statistic JB is defined as

JB = (n / 6) · ( S² + (K - 3)² / 4 )

where n is the number of observations, S is the sample skewness, and K is the sample kurtosis. Skewness is a function of the data, i.e., a random variable, and kurtosis is likewise a function of the data, i.e., a random variable, so JB is itself a random variable. Which distribution does it follow? If the data comes from a normal distribution, the JB statistic asymptotically has a chi-squared distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that the data are from a normal distribution.
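To connect the formula with the scipy output used earlier, a quick manual computation might look like this (a sketch assuming df.log_rtn and scipy.stats as scs from above; small differences are expected because pandas uses bias-corrected estimators of skewness and kurtosis):

n = df['log_rtn'].count()
S = df['log_rtn'].skew()              # sample skewness
K_excess = df['log_rtn'].kurtosis()   # pandas returns excess (Fisher) kurtosis, i.e. K - 3

jb_manual = n / 6 * (S ** 2 + K_excess ** 2 / 4)
jb_scipy = scs.jarque_bera(df['log_rtn'].dropna())[0]
print(jb_manual, jb_scipy)            # the two statistics should be very close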
For small samples the chi-squared approximation is overly sensitive, often rejecting the null hypothesis when it is true. Furthermore, the distribution of p-values departs from a uniform distribution and becomes a right-skewed unimodal distribution, especially for small p-values. This leads to a large Type I error rate (a Type I error, or false positive, is rejecting the null hypothesis H0 when it is actually true, i.e., we mistakenly think H0 is false and wrongly reject it; see https://blog.csdn.net/Linli522362242/article/details/125662545). The table below shows some p-values approximated by a chi-squared distribution that differ from their true alpha levels for small samples.
Skewness describes the symmetry of a data distribution: it measures the degree of symmetry in the distribution.
Kurtosis measures how much of a probability distribution is centered around the middle (mean) versus the tails; positive or negative excess kurtosis will change the shape of the distribution accordingly.
Skewness instead measures the relative symmetry of a distribution around the mean.
Kurtosis also describes how peaked or flat the data distribution is.
Kurtosis explains how often observations in some data set fall in the tails vs. the center of a probability distribution. In finance and investing, excess kurtosis is interpreted as a type of risk, known as "tail risk," or the chance of a loss occurring due to a rare event, as predicted by a probability distribution. If such events turn out to be more common than predicted by a distribution, the tails are said to be "fat." Tail events have led experts to question the true probability distribution of returns for investable assets, and now many believe that the normal distribution is not a correct template.
Platykurtic (kurtosis < 3.0)
The final type of distribution is a platykurtic distribution. These types of distributions have short tails (a paucity of outliers). The prefix "platy-" means "broad," and it is meant to describe a short and broad-looking peak, but this is an historical error. Uniform distributions (1.3.6.6.2. Uniform Distribution or https://en.wikipedia.org/wiki/Continuous_uniform_distribution) are platykurtic and have broad peaks, but the beta(.5,1) distribution is also platykurtic and has an infinitely pointy peak.
Run the following code to investigate this second fact by plotting the log returns series.
1. Visualize the log returns series:
df.log_rtn.plot( title='Daily S&P 500 returns', figsize=(10,6) )
plt.show()
We can observe clear clusters of volatility—periods of higher positive and negative returns.
The first thing we should be aware of when investigating stylized facts is volatility clustering: periods of high returns alternating with periods of low returns, suggesting that volatility is not constant. To quickly investigate this fact, we plot the returns using the plot method of a pandas DataFrame.
# Displaying rolling statistics
def plot_rolling_statistics_ts(ts, titletext, ytext, window_size=12):
    ts.plot(color='red', label='Original', lw=0.5)
    ts.rolling(window_size).mean().plot(color='blue', label='Rolling Mean')
    ts.rolling(window_size).std().plot(color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.ylabel(ytext)
    plt.xlabel('Date')
    plt.setp(plt.gca().get_xticklabels(), rotation=30, horizontalalignment='right')
    plt.title(titletext)
    plt.show(block=False)
# df['log_rtn'] = np.log( df.adj_close/df.adj_close.shift(1) )
# the daily historical log returns
plot_rolling_statistics_ts( df.log_rtn,
                            'Daily S&P 500 returns rolling mean and standard deviation',
                            'Daily log returns',
                          )
The intuition behind the test is as follows. If the series y is stationary (or trend-stationary, e.g. Daily S&P 500 log-return ), then it has a tendency to return to a constant (or deterministically trending确定性趋势) mean. Therefore, large values will tend to be followed by smaller values (negative changes), and small values by larger values (positive changes).
The red line keeps fluctuating around the mean, and the mean does not change over time (in fact, the variance/standard deviation does not change over time either).
A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are constant over time.https://blog.csdn.net/Linli522362242/article/details/126113926
It looks like the daily S&P 500 log returns have no trend, seasonality, or cyclic behaviour (in general, the average length of cycles is longer than the length of a seasonal pattern, and the magnitudes of cycles tend to be more variable than the magnitudes of seasonal patterns). There are random fluctuations which do not appear to be very predictable, and no strong patterns that would help with developing a forecasting model.
A non-stationary time series data is likely to be affected by a trend or seasonality. Trending time series data has a mean that is not constant over time. Data that is affected by seasonality have variations at specific intervals in time. In making a time series data stationary, the trend and seasonality effects have to be removed. Detrending, differencing, and decomposition are such methods. The resulting stationary data is then suitable for statistical forecasting. https://blog.csdn.net/Linli522362242/article/details/126113926
There is also an extension of the Dickey–Fuller (DF) test called the Augmented Dickey-Fuller Test (ADF), which removes all the structural effects (autocorrelation) in the time series and then tests using the same procedure.
An Augmented Dickey-Fuller Test (ADF) is a type of statistical test that determines whether a unit root is present in time series data. Unit roots can cause unpredictable results in time series analysis. A null hypothesis is formed on the unit root test to determine how strongly time series data is affected by a trend.
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    print("Results of Dickey-Fuller Test:")
    df_test = adfuller(timeseries, autolag='AIC')
    print(df_test)
    df_output = pd.Series(df_test[0:4],
                          index=['Test Statistic',
                                 'p-value',
                                 '#Lags Used',
                                 'Number of Observations Used'])
    print(df_output)
test_stationarity(df.log_rtn)
https://blog.csdn.net/Linli522362242/article/details/126113926
The ADF test statistic value (-15.81989; lower values of the ADF statistic indicate stronger rejection of the null hypothesis) is less than the critical values (in particular, at the 5% level, -2.8618787387723104), and the p-value is less than 0.05. With these, we reject the null hypothesis that there is a unit root (that is, we do not find a unit root) and consider our data to be stationary. We recommend using daily returns when studying financial products.
Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals. It's conceptually similar to the correlation between two different time series, but autocorrelation uses the same time series twice: once in its original form and once lagged one or more time periods.
For example, if it's rainy today, the data suggests that it's more likely to rain tomorrow than if it's clear today. When it comes to investing, a stock might have a strong positive autocorrelation of returns, suggesting that if it's "up" today, it's more likely to be up tomorrow, too.
Naturally, autocorrelation can be a useful tool for traders to utilize; particularly for technical analysts.
Autocorrelation can also be referred to as lagged correlation or serial correlation, as it measures the relationship between a variable's current value and its past values.
As a very simple example, take a look at the five percentage values in the chart below. We are comparing them to the column on the right, which contains the same set of values, just moved up one row.
When calculating autocorrelation, the result can range from -1 to +1.
An autocorrelation of +1 represents a perfect positive correlation (an increase seen in one time series leads to a proportionate increase in the other time series), whereas an autocorrelation of -1 represents a perfect negative correlation (an increase seen in one time series results in a proportionate decrease in the other time series).
Autocorrelation measures linear relationships. Even if the autocorrelation is minuscule, there can still be a nonlinear relationship between a time series and a lagged version of itself.
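As a quick numerical illustration (a sketch assuming the S&P 500 df with the log_rtn column from earlier), pandas can compute a lagged autocorrelation directly:

# lag-1 autocorrelation: correlation between today's and yesterday's log return
print(df['log_rtn'].autocorr(lag=1))

# equivalently, the correlation between the series and a shifted copy of itself
print(df['log_rtn'].corr(df['log_rtn'].shift(1)))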
So why is autocorrelation important in financial markets? Simple. Autocorrelation can be applied to thoroughly analyze historical price movements, which investors can then use to predict future price movements. Specifically, autocorrelation can be used to determine if a momentum trading strategy makes sense.
Autocorrelation can be useful for technical analysis. That's because technical analysis is most concerned with the trends of, and relationships between, security prices using charting techniques. This is in contrast with fundamental analysis, which focuses instead on a company's financial health or management.
Technical analysts can use autocorrelation to figure out how much of an impact past prices for a security have on its future price.
Autocorrelation can help determine if there is a momentum factor at play with a given stock. If a stock with a high positive autocorrelation posts two straight days of big gains, for example, it might be reasonable to expect the stock to rise over the next two days, as well.
If the price of a stock with strong positive autocorrelation has been increasing for several days, the analyst can reasonably estimate the future price will continue to move upward in the recent future days. The analyst may buy and hold the stock for a short period of time to profit from the upward price movement.
Let's assume Rain is looking to determine if a stock's returns in their portfolio exhibit autocorrelation; that is, whether the stock's returns relate to its returns in previous trading sessions.
If the returns exhibit autocorrelation, Rain could characterize it as a momentum stock because past returns seem to influence future returns. Rain runs a regression with the prior trading session's return as the independent variable and the current return as the dependent variable, and finds a positive autocorrelation of 0.8.
Since 0.8 is close to +1, past returns seem to be a very good positive predictor of future returns for this particular stock.
Therefore, Rain can adjust their portfolio to take advantage of the autocorrelation, or momentum, by continuing to hold their position or accumulating more shares.
################
One common way for the "independence" condition in a multiple linear regression model to fail is when the sample data have been collected over time and the regression model fails to effectively capture any time trends. In such a circumstance, the random errors in the model are often positively correlated over time, so that each random error is more likely to be similar to the previous random error than it would be if the random errors were independent of one another. This phenomenon is known as autocorrelation (or serial correlation) and can sometimes be detected by plotting the model residuals versus time. We'll explore this further in this section and the next.
A time series is a sequence of measurements of the same variable(s) made over time. Usually the measurements are made at evenly spaced times, for example, monthly or yearly. Let us first consider the problem in which we have a y-variable measured as a time series. As an example, we might have y be a measure of global temperature, with measurements observed each year. To emphasize that we have measured values over time, we use "t" as a subscript rather than the usual "i", i.e., y_t means y measured in time period t. An autoregressive model is when a value from a time series is regressed on previous values from that same time series (the Auto-Regressive (AR) model assumes that the current value y_t is dependent on the previous values y_{t-1}, y_{t-2}, ..., y_{t-p}). For example, regressing y_t on y_{t-1}:

y_t = β_0 + β_1·y_{t-1} + ε_t

In this regression model, the response variable in the previous time period has become the predictor, and the errors have our usual assumptions about errors in a simple linear regression model: ε_t is white noise (note that β_0 is a constant). The order of an autoregression is the number of immediately preceding values in the series that are used to predict the value at the present time. So, the preceding model is a first-order autoregression, written as AR(1).
If we want to predict y this year (y_t) using measurements of global temperature in the previous two years (y_{t-1}, y_{t-2}), then the autoregressive model for doing so would be:

y_t = β_0 + β_1·y_{t-1} + β_2·y_{t-2} + ε_t

This model is a second-order autoregression, written as AR(2), since the value at time t is predicted from the values at times t-1 and t-2. More generally, a k-th-order autoregression, written as AR(k) or AR(p), is a multiple linear regression in which the value of the series at any time t is a (linear) function of the values at times t-1, t-2, ..., t-k.
https://blog.csdn.net/Linli522362242/article/details/121721868
To figure out the order of an AR model, you need to look at the PACF.
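Once the PACF suggests an order p, the AR(p) model can be fitted, for example with statsmodels' AutoReg (a sketch on simulated data; the simulated coefficients 0.6 and 0.3 are arbitrary choices of ours):

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# simulate an AR(2) process: y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + white noise
rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.5)

# fit an AR(2); the estimated lag coefficients should be close to 0.6 and 0.3
ar2_res = AutoReg(y, lags=2).fit()
print(ar2_res.params)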
The Moving Average (MA) model assumes that the current value y_t is dependent on the error terms, including the current error (ε_t, ε_{t-1}, ...), where ε_t is white noise (the current error). Because error terms are random, there is no linear relationship between the current value and the error terms.
To figure out the order of an MA model, you need to look at the ACF.
The coefficient of correlation between two values in a time series is called the autocorrelation function (ACF). For example, the ACF for a time series y_t is given by:

Corr( y_t, y_{t-k} ),  k = 1, 2, ...

This value of k is the time gap being considered and is called the lag. A lag-1 autocorrelation (i.e., k = 1 in the above) is the correlation between values that are one time period apart. More generally, a lag-k autocorrelation is the correlation between values that are k time periods apart.
The ACF is a way to measure the linear relationship between an observation at time t and the observations at previous times. If we assume an AR(k) model, then we may wish to only measure the association between y_t and y_{t-k} and filter out the linear influence of the random variables that lie in between (i.e., y_{t-1}, y_{t-2}, ..., y_{t-(k-1)}), which requires a transformation on the time series. Then, by calculating the correlation of the transformed time series, we obtain the partial autocorrelation function (PACF).
The PACF is most useful for identifying the order of an autoregressive model. Specifically, sample partial autocorrelations that are significantly different from 0 indicate lagged terms of y that are useful predictors of y_t. It is important that the choice of the order makes sense. For example, suppose you have blood pressure readings for every day over the past two years. You may find that an AR(1) or AR(2) model is appropriate for modeling blood pressure. However, the PACF may indicate a large partial autocorrelation value at a lag of 17, but such a large order for an autoregressive model likely does not make much sense.
20 : 1 Stock Split :https://seekingalpha.com/symbol/GOOG/historical-price-quotes
Amazon approved a 20-1 stock split and a $10 billion stock buyback on March 9. The board said the split would “give our employees more flexibility in how they manage their equity in Amazon and make the share price more accessible for people looking to invest.”
When a company splits its stock, that means it divides each existing share into multiple new shares. In a 20-1 stock split, every share of the company’s stock will be split into 20 new shares, each of which would be worth one twentieth of the original share value.
A stock split does not impact a company's market capitalization (the combined value of all its shares) and it doesn't change the value of each investor's stake in the company. It merely increases the number of shares outstanding and decreases the cost of each share. (In practice the quoted price is affected: after the split, each share trades at roughly one twentieth of the pre-split price.)
The data set (google_stock.txt) consists of n = 105 values which are the closing stock price of a share of Google stock during 2-7-2005 to 7-7-2005. We will analyze the dataset to identify the order of an autoregressive model. A plot of the stock prices versus time is presented in the figure below:
import yfinance as yf
import pandas as pd
start_date = '2005-01-01'   # start early enough to cover the 2005 window analyzed below
end_date = '2018-01-01'
SRC_DATA_FILENAME = 'goog_data.pkl'
try:
    goog_data2 = pd.read_pickle(SRC_DATA_FILENAME)
    print('File found...reading GOOG data')
except:
    print('File not found...downloading GOOG data')
    goog_data2 = yf.download('goog', start=start_date, end=end_date)
    goog_data2.to_pickle(SRC_DATA_FILENAME)
goog_data2
goog=goog_data2.loc['2005-02-07':'2005-07-07']
goog
If you are careful enough, you will also notice that the "Adj Close" price is now the same as the "Close" price for this period, which is also a consequence of the later split: https://online.stat.psu.edu/stat462/sites/onlinecourses.science.psu.edu.stat462/files/data/google_stock/index.txt
The stock price downloaded from yfinance now differs from what was downloaded previously because of the 20-for-1 split.
goog=goog_data2.loc['2005-02-07':'2005-07-07']
import matplotlib.ticker as ticker
goog['Adj Close'].plot(ls='-', lw=0.8, marker='o', label='Daily price')
# limit the x-axis to at most 12 major ticks
plt.gca().xaxis.set_major_locator(ticker.MaxNLocator(12))
plt.gca().grid(True)
plt.legend( loc='best' )
plt.xlabel('Date')
plt.ylabel('price' )
plt.setp( plt.gca().get_xticklabels(), rotation=30, horizontalalignment='right' )
plt.show()
Consecutive values appear to follow one another fairly closely, suggesting an autoregression model could be appropriate. We next look at a plot of partial autocorrelations for the data:
import statsmodels.tsa.api as smt
from matplotlib import pyplot
smt.graphics.plot_pacf( goog['Adj Close'],
lags=26,
alpha=0.05, # α
method='ywm', # Yule-Walker without adjustment. Default.
ax=pyplot.gca(),
auto_ylims=True,
zero=False # Flag indicating whether to include the 0-lag autocorrelation.
)
#plt.autoscale(enable=True, axis='x', tight=True)
plt.show()
Here we notice that there is a significant spike at a lag of 1 and much lower spikes for the subsequent lags. Thus, an AR(1) model would likely be feasible for this data set.
Approximate bounds can also be constructed (as given by the red lines in the plot above) for this plot to aid in determining large values. Approximate (1−α)×100% significance bounds are given by ±z_{1-α/2} / √n. Values lying outside of either of these bounds are indicative of an autoregressive process.
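For reference, these bounds can be reproduced by hand (a sketch assuming the goog DataFrame from above and α = 0.05):

import numpy as np
from scipy.stats import norm

n = len(goog['Adj Close'])
alpha = 0.05
bound = norm.ppf(1 - alpha / 2) / np.sqrt(n)   # ~ 1.96 / sqrt(n) for alpha = 0.05
print(f'significance bounds: +/- {bound:.4f}')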
We next create a lag-1 price variable and consider a scatterplot of price versus this lag-1 variable:
plt.scatter( goog['Adj Close'].shift(1), goog['Adj Close'])
plt.xlabel('lag-1-price', fontsize=14)
plt.ylabel('price', fontsize=14 )
plt.show()
There appears to be a strong linear pattern, affirming that a first-order autoregression model, AR(1), could be appropriate.
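As a numeric cross-check (a sketch that reuses the goog DataFrame built above), the lag-1 sample autocorrelation can be read off directly with pandas:
# Pearson correlation between the series and its one-day lag
lag1_corr = goog['Adj Close'].autocorr(lag=1)
print(f'lag-1 autocorrelation: {lag1_corr:.3f}')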
The PACF is most useful for identifying the order of an autoregressive model. Specifically, sample partial autocorrelations that are significantly different from 0 indicate lagged terms of y that are useful predictors of y_t. It is important that the choice of the order makes sense. For example, suppose you have blood pressure readings for every day over the past two years. You may find that an AR(1) or AR(2) model is appropriate for modeling blood pressure. However, the PACF may indicate a large partial autocorrelation value at a lag of 17, but such a large order for an autoregressive model likely does not make much sense.
Let y_t = the annual number of worldwide earthquakes with magnitude greater than 7 on the Richter scale for n = 100 years (earthquakes.txt, data obtained from https://earthquake.usgs.gov). The plot below gives a time series plot of this dataset.
earthquakes = pd.read_table('earthquakes.txt')
earthquakes.set_index('Year', inplace=True)
earthquakes

import matplotlib.ticker as ticker

earthquakes['Quakes'].plot(ls='-', lw=0.8, marker='o')
# limit the x-axis to roughly a dozen tick labels so the years stay readable
plt.gca().xaxis.set_major_locator(ticker.MaxNLocator(12))
plt.gca().grid(True)
plt.xlabel('Year')
plt.ylabel('Quakes')
plt.setp( plt.gca().get_xticklabels(), rotation=30, horizontalalignment='right' )
plt.show()
https://online.stat.psu.edu/stat462/node/188/
The plot below gives a plot of the PACF (partial autocorrelation function), which can be interpreted to mean that a third-order autoregression may be warranted since there are notable partial autocorrelations for lags 1 and 3.
import statsmodels.tsa.api as smt

smt.graphics.plot_pacf( earthquakes['Quakes'],
                        lags=26,
                        alpha=0.05,
                        method='ywm',  # Yule-Walker without adjustment (the default)
                        ax=plt.gca(),
                        auto_ylims=True,
                        zero=True      # include the lag-0 autocorrelation (always 1)
                      )
plt.show()
The next step is to do a multiple linear regression with number of quakes as the response variable and lag-1, lag-2, and lag-3 quakes as the predictor variables. In the results below we see that the lag-3 predictor is significant at the 0.05 level (p-value < 0.05 and the lag-1 predictor p-value is also relatively small).
smt.graphics.plot_acf( earthquakes['Quakes'],
                       lags=26,
                       alpha=0.05,
                       ax=plt.gca(),
                       auto_ylims=True,
                       zero=True      # include the lag-0 autocorrelation (always 1)
                     )
plt.show()
Plots of this kind are sometimes called “lollipop plots”.
Autocorrelation is the correlation between a time series and a lagged version of itself. Both the ACF and the PACF start at a lag of 0, which is the correlation of the time series with itself and therefore equals 1.
The difference between ACF and PACF is the inclusion or exclusion of indirect correlations in the calculation.
The autocorrelation between y_t and y_{t-k} includes the indirect effects of the intermediate values y_{t-1}, y_{t-2}, ..., y_{t-k+1}, whereas the partial autocorrelation measures only the direct relationship between y_t and y_{t-k}, with the effect of the intermediate lags removed.
Additionally, you can see a shaded blue area in the ACF and PACF plots. This area depicts the 95% confidence interval and acts as a significance threshold: anything inside the blue area is statistically indistinguishable from zero, and anything outside it is statistically non-zero (statistically significant).
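To read the same information numerically (a sketch that reuses the earthquakes['Quakes'] series loaded above), statsmodels exposes acf and pacf functions whose values match the lollipop plots:
from statsmodels.tsa.stattools import acf, pacf
import numpy as np

acf_vals = acf(earthquakes['Quakes'], nlags=10)
pacf_vals = pacf(earthquakes['Quakes'], nlags=10, method='ywm')

bound = 1.96 / np.sqrt(len(earthquakes))   # approximate 95% significance bound
print('lags where |PACF| exceeds the bound:',
      [lag for lag, v in enumerate(pacf_vals) if lag > 0 and abs(v) > bound])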
Based on these plots, an AR(3) model looks reasonable for the earthquake series. We first fit it as a multiple linear regression using sm.OLS:
# Build a DataFrame of lag-1, lag-2, and lag-3 predictors
eq = pd.DataFrame({'lag1Quakes': earthquakes['Quakes'].shift(1),
                   'lag2Quakes': earthquakes['Quakes'].shift(2),
                   'lag3Quakes': earthquakes['Quakes'].shift(3),
                  })
eq.dropna(inplace=True)
eq
import statsmodels.api as sm
# https://www.statology.org/sklearn-linear-regression-summary/

x = sm.add_constant(eq)   # insert a column of 1s for the intercept
model = sm.OLS( earthquakes['Quakes'].iloc[3:], x ).fit()
model.summary()
Quakes AR(3) = 6.4492 + 0.1642 lag1Quakes + 0.0713 lag2Quakes + 0.2693 lag3Quakes
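As a sketch of how these coefficients are used (reusing the model and earthquakes objects from above), a one-step-ahead forecast simply plugs the three most recent observations into the fitted equation:
params = model.params                            # const, lag1Quakes, lag2Quakes, lag3Quakes
last3 = earthquakes['Quakes'].iloc[-3:].values   # the three most recent years, oldest first

forecast = (params['const']
            + params['lag1Quakes'] * last3[-1]   # lag 1: most recent year
            + params['lag2Quakes'] * last3[-2]   # lag 2
            + params['lag3Quakes'] * last3[-3])  # lag 3
print(f'next-year forecast of Quakes: {forecast:.2f}')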
Alternatively, we can use AutoReg:
from statsmodels.tsa.ar_model import AutoReg
# https://vitalflux.com/autoregressive-ar-models-with-python-examples/
train_data = earthquakes['Quakes'].values
#
# Instantiate and fit the AR model with training data
# Autoregressive AR-X(p) model
# Estimate an AR-X model using Conditional Maximum Likelihood (OLS).
ar_model = AutoReg(train_data, lags=3,
trend='c',# ‘c’ - Constant only.
missing='drop',# dropna
)
ar_model=ar_model.fit('nonrobust',#The class OLS covariance estimator that assumes homoskedasticity
use_t=True #use the Student’s t distribution that accounts for model degree of freedom
)
#
# Print Summary
#
print(ar_model.summary())
Because use_t=True, the values shown in the z column of the summary are t-statistics.
Quakes AR(3) = 6.4492 + 0.1642 y.L1 + 0.0713 y.L2 + 0.2693 y.L3
use_t bool, optional
A flag indicating that inference should use the Student’s t distribution that accounts for model degree of freedom. If False, uses the normal distribution. If None, defers the choice to the cov_type. It also removes degree of freedom corrections from the covariance estimator when cov_type is ‘nonrobust’.
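In practice the flag only changes which reference distribution the critical values and p-values come from; a minimal sketch with scipy (assuming the AR(3) fit above keeps 97 usable observations and estimates 4 coefficients):
from scipy.stats import norm, t

df_resid = 97 - 4                 # assumed: 100 years, 3 lags dropped, 4 estimated coefficients
print(norm.ppf(0.975))            # ≈ 1.960, the critical value when use_t=False
print(t.ppf(0.975, df_resid))     # slightly larger critical value when use_t=True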
from statsmodels.tsa.ar_model import AutoReg
# https://vitalflux.com/autoregressive-ar-models-with-python-examples/
train_data = earthquakes['Quakes'].values
#
# Instantiate and fit the AR model with training data
# Autoregressive AR-X(p) model
# Estimate an AR-X model using Conditional Maximum Likelihood (OLS).
ar_model = AutoReg(train_data, lags=3,
trend='c',# ‘c’ - Constant only.
missing='drop',# dropna
)
ar_model = ar_model.fit('nonrobust', # the OLS covariance estimator that assumes homoskedasticity
                        use_t=False  # use the normal distribution, so the z column reports z-statistics
                       )
#
# Print Summary
#
ar_model.summary()
from statsmodels.tsa.statespace.sarimax import SARIMAX
ar_model = SARIMAX( earthquakes['Quakes'].values,
#order: The (p,d,q) order of the model for
#the autoregressive, differences, and moving average components.
#d is always an integer, while p and q may either be integers or lists of integers.
order = (3,0,0), # AR(3)
trend='c',
missing='drop',
enforce_stationarity=False,
concentrate_scale=True, # concentrate the scale (error variance) out of the likelihood, so sigma2 is not reported
).fit(cov_type='oim')
ar_model.summary()
From SARIMAX to ARIMA
from statsmodels.tsa.arima.model import ARIMA
ar_model = ARIMA( earthquakes['Quakes'].values,
#order: The (p,d,q) order of the model for
#the autoregressive, differences, and moving average components.
#d is always an integer, while p and q may either be integers or lists of integers.
order = (3,0,0), # AR(3)
trend='c',
missing='drop',
enforce_stationarity=False,
concentrate_scale=True, # concentrate the scale (error variance) out of the likelihood, so sigma2 is not reported
).fit(cov_type='oim')
ar_model.summary()
MA(3)
Reference: https://towardsdatascience.com/interpreting-acf-and-pacf-plots-for-time-series-forecasting-af0d6db4061c (the ACF/PACF interpretation there suggests MA(q=3))
mpf6_Time Series Data_quandl_更正kernel PCA_AIC_BIC_trend_log_return_seasonal_decompose_sARIMAx_ADFull (LIQING LIN's blog on CSDN)
Rules for identifying ARIMA models
from statsmodels.tsa.arima.model import ARIMA
ma_model = ARIMA( earthquakes['Quakes'].values,
#order: The (p,d,q) order of the model for
#the autoregressive, differences, and moving average components.
#d is always an integer, while p and q may either be integers or lists of integers.
order = (0,0,3), # MA(3)
trend='c',
missing='drop',
enforce_stationarity=False,
concentrate_scale=True, # concentrate the scale (error variance) out of the likelihood, so sigma2 is not reported
).fit(cov_type='oim')
ma_model.summary()
Alternatively, fitting the same model with SARIMAX gives the same result:
from statsmodels.tsa.statespace.sarimax import SARIMAX
ma_model = SARIMAX( earthquakes['Quakes'].values,
#order: The (p,d,q) order of the model for
#the autoregressive, differences, and moving average components.
#d is always an integer, while p and q may either be integers or lists of integers.
order = (0,0,3), # MA(3)
trend='c',
missing='drop',
enforce_stationarity=False,
concentrate_scale=True, # concentrate the scale (error variance) out of the likelihood, so sigma2 is not reported
).fit(cov_type='oim')
ma_model.summary()
Quakes MA(3) = 12.8397 + 0.1450 ma.L1 + 0.0874 ma.L2 + 0.2163 ma.L3
from statsmodels.tsa.arima.model import ARIMA
ma_model = ARIMA( earthquakes['Quakes'].values,
#order: The (p,d,q) order of the model for
#the autoregressive, differences, and moving average components.
#d is always an integer, while p and q may either be integers or lists of integers.
order = (0,0,3), # MA(3)
trend='c',
missing='drop',
enforce_stationarity=False,
#concentrate_scale=True, # left unconcentrated so sigma2 (the error variance) appears in the summary
).fit(cov_type='oim')
ma_model.summary()
sigma2 in the summary is the estimated variance of the residuals (the white-noise error term). The normality of the residuals, against the alternative of non-normality, is tested separately by the Jarque-Bera statistic reported in the same summary.
An MA(q) model takes the form y_t = μ + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}, where ε_t ∼ WN(0, σ²) is a white noise process and θ_q is the coefficient of the error term at time t-q.
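A minimal sketch (simulated data, not the earthquake series) illustrating the textbook property that an MA(3) process has an ACF that cuts off after lag 3; the theta values roughly echo the fitted coefficients above:
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(42)
n = 2000
eps = rng.normal(0, 1, n + 3)                 # white noise

theta = [0.145, 0.087, 0.216]                 # assumed MA coefficients for the simulation
y = eps[3:] + theta[0]*eps[2:-1] + theta[1]*eps[1:-2] + theta[2]*eps[:-3]

print(np.round(acf(y, nlags=6), 3))           # lags 1-3 noticeable, lags 4+ close to zero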
################
Investigate the third fact: the absence of autocorrelation in returns.
Autocorrelation (also known as serial correlation) measures how similar a given time series is to a lagged version of itself over successive time intervals.
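As a sketch of this definition (assuming the df with the log_rtn column computed in the earlier recipe), the lag-k autocorrelation is just the Pearson correlation between the series and its k-lagged copy:
# lag-1 autocorrelation computed two ways; both should agree
manual  = df['log_rtn'].corr(df['log_rtn'].shift(1))
builtin = df['log_rtn'].autocorr(lag=1)
print(f'{manual:.4f} vs {builtin:.4f}')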
1. Define the parameters for creating the autocorrelation plots:
import statsmodels.tsa.api as smt
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

#1. Define the parameters for creating the autocorrelation plots:
N_lags = 50
Significance_Level = 0.05

# plt.rc("figure", figsize=(10,8))
curr_fig, curr_ax = plt.subplots(figsize=(10, 8))

#2. Run the following code to create the ACF plot of log returns:
# plot_acf: plots lags on the horizontal axis and the correlations on the vertical axis.
smt.graphics.plot_acf( df.log_rtn,
                       lags=N_lags,
                       alpha=Significance_Level,
                       title='Autocorrelation', zero=True,
                       auto_ylims=True, # adjust the y-axis limits to the ACF values automatically
                       ax=curr_ax,
                       color='blue',
                       vlines_kwargs={"colors": 'black'}
                     )
# recolor the confidence band
for item in curr_ax.collections:
    if type(item) == PolyCollection:
        item.set_facecolor('green')
plt.show()
https://blog.csdn.net/Linli522362242/article/details/121406833
To investigate whether there is significant autocorrelation in returns, we created the autocorrelation plot using plot_acf from the statsmodels library. We inspected 50 lags and used the default alpha=0.05 , which means that we also plotted the 95% confidence interval. Values outside of this interval can be considered statistically significant.
This supports the third fact: only a few values lie outside the confidence interval (we do not look at lag 0) and can be considered statistically significant, so we can conclude that there is essentially no autocorrelation in the log returns series.
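A complementary numeric check is the Ljung-Box test, which jointly tests the first m autocorrelations (a sketch using statsmodels; H0 is that there is no autocorrelation up to the given lag):
from statsmodels.stats.diagnostic import acorr_ljungbox

lb_results = acorr_ljungbox(df['log_rtn'].dropna(), lags=[10, 20, 50])
print(lb_results)   # large p-values fail to reject H0 of no autocorrelation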
The autocorrelation of the squared and absolute returns is small and decreasing (autocorrelation values range from -1 to +1).
Investigate this fourth fact by creating the ACF plots of squared and absolute returns.
import statsmodels.tsa.api as smt
from matplotlib.collections import PolyCollection
plt.rcParams.update({'font.size': 12})
fig, ax = plt.subplots( 2,1, figsize=(12,12) )
smt.graphics.plot_acf( df.log_rtn**2,
lags=N_lags,
alpha=Significance_Level,
#title='Autocorrelation-df.log_rtn**2',
zero=True,
auto_ylims=True,#adjusts automatically the y-axis limits to ACF values
ax=ax[0],
color='blue',
vlines_kwargs={"colors": 'black'},
)
ax[0].set(ylabel='Squared Returns')
ax[0].set_title('Autocorrelation-df.log_rtn**2', fontsize=14)
smt.graphics.plot_acf( np.abs(df.log_rtn),
lags=N_lags,
alpha=Significance_Level,
zero=True,
auto_ylims=True,#adjusts automatically the y-axis limits to ACF values
ax=ax[1],
color='blue',
vlines_kwargs={"colors": 'black'},
)
ax[1].set_title('Autocorrelation-np.abs(df.log_rtn)', fontsize=14)
ax[1].set( xlabel='Lag',
ylabel='Absolute Returns',
)
for curr_ax in ax:
    for item in curr_ax.collections:
        if type(item) == PolyCollection:
            item.set_facecolor('green')
plt.show()
We can observe the small and decreasing values (< 0.5) of autocorrelation for the squared and absolute returns, which are in line with the fourth stylized fact.
For the fifth fact, run the following steps to investigate the existence of the leverage effect.
To investigate it, we used the moving standard deviation (calculated using the rolling method of a pandas DataFrame) as a measure of historical volatility. We used windows of 21 and 252 days, which correspond to one month and one year of trading data.
1. Calculate volatility measures as rolling standard deviations (a verification sketch follows the code below):
To compute the standard deviation, first we compute the variance: σ² = (1/(N-1)) Σ_{i=1}^{N} (r_i - r̄)².
Then, the standard deviation is simply the square root of the variance: σ = √σ².
# Calculate volatility measures as rolling standard deviations:
# 252 trading days per year
# 21 trading days per month
df['Moving_std_252'] = df[['log_rtn']].rolling(window=252).std()
df['Moving_std_21'] = df[['log_rtn']].rolling(window=21).std()
df
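As a quick check of the formula against pandas (a sketch using the last 21 observations; pandas' rolling std also uses the N-1 denominator):
import numpy as np

window = df['log_rtn'].iloc[-21:]
manual_std = np.sqrt(((window - window.mean())**2).sum() / (len(window) - 1))
print(manual_std, df['Moving_std_21'].iloc[-1])   # the two values should match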
plt.style.use('seaborn')
plt.rcParams.update({'font.size': 12})
fig, ax = plt.subplots( 3,1,
figsize=(10,10),
sharex=True
)
df.adj_close.plot( ax=ax[0] )
ax[0].set( title='S&P 500 time series(adj_close)',
ylabel='Price ($)',
)
df.log_rtn.plot( ax=ax[1] )
ax[1].set( ylabel='Log returns (%)' )
df['Moving_std_252'].plot( ax=ax[2], color='darkred',
                           label='Moving Volatility 252d'
                         )
df['Moving_std_21'].plot( ax=ax[2], color='g',
                          label='Moving Volatility 21d'
                        )
ax[2].set( ylabel='Moving Volatility',
xlabel='Date'
)
ax[2].legend()
plt.show()
We can now investigate the leverage effect by visually comparing the price series to the (rolling) volatility metric:
This fact states that most measures of an asset's volatility are negatively correlated with its returns, and we can indeed observe a pattern of increased volatility when the prices go down and decreased volatility when they are rising.
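To put a number on this visual impression (a sketch reusing the 21-day rolling volatility computed above), we can correlate the log returns with the contemporaneous change in rolling volatility; the leverage effect implies a negative value:
vol_change = df['Moving_std_21'].diff()
corr = df['log_rtn'].corr(vol_change)
print(f'corr(log returns, change in 21d volatility): {corr:.3f}')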
We present another method of investigating the leverage effect (fact 5). To do so, we use the VIX (the Chicago Board Options Exchange (CBOE) Volatility Index), a popular metric of the stock market's expectation of volatility. The measure is implied by the prices of options on the S&P 500 index. We take the following steps:
1. Download and preprocess the prices of the S&P 500 and the VIX:
df = yf.download( ['^GSPC', '^VIX'],
start='1985-01-01',
end='2018-12-31',
progress=False
)
df = df[['Adj Close']]
df.columns = df.columns.droplevel(0)
df = df.rename(columns={'^GSPC': 'sp500', '^VIX': 'vix'})
df
2. Calculate the log returns (we can just as well use percentage change-simple returns):
df['log_rtn'] = np.log( df.sp500 / df.sp500.shift(1) )
df['vol_rtn'] = np.log( df.vix / df.vix.shift(1) )
df.dropna( how='any', axis=0, inplace=True )
df
3. Plot a scatterplot with the returns on the axes and fit a regression line to identify the trend:
Pearson's r (the Pearson product-moment correlation coefficient): r = Σ_i (x_i - x̄)(y_i - ȳ) / ( √(Σ_i (x_i - x̄)²) · √(Σ_i (y_i - ȳ)²) ).
We additionally calculate the correlation coefficient between the two series and include it in the title:
import seaborn as sns

# corr(): compute the pairwise correlation of columns, excluding NA/null values.
corr_coeff = df.log_rtn.corr( df.vol_rtn )
ax = sns.regplot( x='log_rtn', y='vol_rtn', data=df,
                  line_kws={'color': 'red'}
                )
ax.set( title=f'S&P 500 vs. VIX ($\\rho$ = {corr_coeff:.2f})',
ylabel='VIX log returns',
xlabel='S&P 500 log returns'
)
# plt.tight_layout()
plt.show()
Alternatively, we can compute Pearson's r by hand (using the formula above) and fit the regression line with statsmodels:
import statsmodels.api as sm
x_u = df.log_rtn - df.log_rtn.mean()
y_u = df.vol_rtn - df.vol_rtn.mean()
corr_coeff = x_u.dot(y_u) / ( np.sqrt( np.sum(x_u**2) ) * np.sqrt( np.sum(y_u**2) ) )
df.plot( figsize=(10,6),
kind='scatter',
x='log_rtn', y='vol_rtn'
)
# add a constant so the fitted line has an intercept, matching the regplot above
X = sm.add_constant( df['log_rtn'].values )
ols_fit = sm.OLS( df['vol_rtn'].values, X ).fit()
plt.plot( df['log_rtn'],
          ols_fit.fittedvalues,
          color='r',
        )
plt.title( f'S&P 500 vs. VIX ($\\rho$ = {corr_coeff:.2f})', fontsize=14 )
plt.xlabel('S&P 500 log returns', fontsize=14)
plt.ylabel('VIX log returns', fontsize=14)
plt.show()
We can see that both the negative slope of the regression line and a strong negative correlation between the two series confirm the existence of the leverage effect in the return series.