t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR

     In the last chapter, we learned how to design trading strategies, create trading signals, and implement advanced concepts, such as seasonality in trading instruments. Understanding those concepts in greater detail is a vast field comprising stochastic processes, random walks, martingales, and time series analysis, which we leave to you to explore at your own pace.

     So what's next? Let's look at an even more advanced method of prediction and forecasting: statistical inference and prediction, better known as machine learning. The fundamentals of machine learning were developed in the 1800s and early 1900s and have been worked on ever since. Recently, there has been a resurgence of interest in machine learning algorithms and applications owing to the availability of extremely cost-effective processing power and the easy availability of large datasets. Machine learning is a massive field at the intersection of linear algebra, multivariate calculus, probability theory, and frequentist and Bayesian statistics, and an in-depth treatment of it is beyond the scope of a single book. Machine learning methods, however, are surprisingly accessible in Python and quite intuitive to understand, so we will explain the intuition behind the methods and see how they find applications in algorithmic trading. But first, let's introduce some basic concepts and notation that we will need for the rest of this chapter.

This chapter will cover the following topics:

  • Understanding the terminology and notations
  • Creating predictive models that predict price movement using linear regression methods
  • Creating predictive models that predict buy and sell signals using linear classification methods

Understanding the terminology and notations

     To develop ideas quickly and build intuition, we have a simple and completely hypothetical dataset of the height, weight, and race of a few random samples obtained from a survey. Let's have a look at the dataset:
[Figure: the hypothetical dataset of height, weight, and race]

Let's examine the individual fields:

  • Height in inches and weight in lbs are continuous data types because they can take on any values, such as 65, 65.123, and 65.3456667.
  • Race, on the other hand, would be an example of a categorical data type, because there are a finite number of possible values that can go in the field. In this example, we assume that possible race values are Asian, African, and Caucasian.
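     To make the two data types concrete, here is a minimal pandas sketch of such a dataset (all values are invented for illustration):

import pandas as pd

# hypothetical survey samples (heights in inches, weights in lbs, values made up)
samples = pd.DataFrame( { 'Height': [72.0, 66.123, 70.3456667, 64.0],      # continuous
                          'Weight': [180.0, 150.5, 205.2, 140.0],          # continuous
                          'Race':   ['Asian', 'African', 'Caucasian', 'Asian'] } )
samples['Race'] = samples['Race'].astype('category')                       # categorical

print( samples.dtypes )   # Height/Weight are float64, Race is category
print( samples )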

     Now, given this dataset, say our task is to build a mathematical model that can learn from the data we provide it with. The task or objective in this example is to learn the relationship between a person's weight and their height and race. Intuitively, it should be obvious that height will have a major role to play (taller people are much more likely to be heavier), and race should have very little impact. Race may have some impact on the height of an individual, but once the height is known, knowing their race provides very little additional information in guessing/predicting a person's weight. In this particular problem, note that in the dataset, we are also provided with the weight of the samples in addition to their height and race.

     Since the variable we are trying to learn how to predict is known, this is known as a supervised learning problem. If, on the other hand, we were not provided with the weight variable and were asked to predict whether, based on height and race, someone is more likely to be heavier than someone else, that would be an unsupervised learning problem. For the scope of this chapter, we will focus on supervised learning problems only, since that is the most typical use case of machine learning in algorithmic trading.

     Another thing to address in this example is the fact that, in this case, we are trying to predict weight as a function of height and race. So we are trying to predict a continuous variable. This is known as a regression problem, since the output of such a model is a continuous value. If, on the other hand, say our task was to predict the race of a person as a function of their height and weight, in that case, we would be trying to predict a categorical variable type. This is known as a classification problem, since the output of such a model will be one value from a set of finite discrete values.

     When we start addressing this problem, we will begin with a dataset that is already available to us and will train our model of choice on this dataset. This process (as you've already guessed) is known as training your model. We will use the data provided to us to guess the parameters of the learning model of our choice (we will elaborate more on what this means later). This is known as statistical inference of these parametric learning models. There are also non-parametric learning models, where we try to remember the data we've seen so far to make a guess as regards new data.

     Once we are done training our model, we will use it to predict weight for datasets we haven't seen yet. Obviously, this is the part we are interested in. Based on data in the future that we haven't seen yet, can we predict the weight? This is known as testing your model and the datasets used for that are known as test data. The task of using a model where the parameters were learned by statistical inference to actually make predictions on previously unseen data is known as statistical prediction or forecasting.

     We need to be able to understand the metrics used to differentiate between a good model and a bad model. There are several well-known and well-understood performance metrics for different models. For regression prediction problems, we should try to minimize the differences between the predicted value and the actual value of the target variable. These differences are known as residual errors; larger errors mean worse models and, in regression, we try to minimize the sum of these residual errors, or the sum of the squares of these residual errors (squaring has the effect of penalizing large outliers more strongly, but more on that later). The most common metric for regression problems is R^2, which tracks the ratio of explained variance vis-à-vis unexplained variance, but we save that for more advanced texts.

     In the simple hypothetical prediction problem of guessing weight based on height and race, let's say the model predicts the weight to be 170 and the actual weight is 160. In this case, the error is 160 - 170 = -10, the absolute error is |-10| = 10, and the squared error is (-10)^2 = 100. In classification problems, we want to make sure our predictions are the same discrete value as the actual value. When we predict a label that is different from the actual label, that is a misclassification or error. Obviously, the higher the number of accurate predictions, the better the model, but it gets more complicated than that. There are metrics such as the confusion matrix (https://blog.csdn.net/Linli522362242/article/details/120093948), the Receiver Operating Characteristic (the ROC curve plots the true positive rate, another name for recall, against the false positive rate, i.e. TPR vs FPR), and the area under the curve (https://blog.csdn.net/Linli522362242/article/details/103786116; a perfect classifier will have a ROC AUC equal to 1, whereas a purely random classifier will have a ROC AUC equal to 0.5), but we save those for more advanced texts. Let's say, in the modified hypothetical problem of guessing race based on height and weight, that we guess the race to be Caucasian while the correct race is African. That is then considered an error, and we can aggregate all such errors to find the aggregate error across all predictions, but we will talk more about this in later parts of the book.
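     As a quick sanity check of these definitions (all numbers here are made up), the following snippet computes the residual, absolute, and squared error for the weight example, plus an accuracy/misclassification count for a handful of hypothetical race predictions:

import numpy as np

# regression example: predicted weight 170, actual weight 160
actual, predicted = 160, 170
error = actual - predicted               # -10
abs_error = abs( error )                 # 10
squared_error = error**2                 # 100
print( error, abs_error, squared_error )

# classification example: hypothetical race labels
y_true = np.array( ['African', 'Asian', 'Caucasian', 'Asian'] )
y_pred = np.array( ['Caucasian', 'Asian', 'Caucasian', 'African'] )
print( (y_true != y_pred).sum() )        # 2 misclassifications
print( (y_true == y_pred).mean() )       # 0.5 accuracy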

     So far, we have been speaking in terms of a hypothetical example, but let's tie the terms we've encountered so far to how they apply to financial datasets. As we mentioned, supervised learning methods are most common here because, in historical financial data, we are able to measure the price movements from the data. If we are simply trying to predict whether the price moves up or down from the current price, then that is a classification problem with two prediction labels – Price goes up and Price goes down. There can also be three prediction labels: Price goes up, Price goes down, and Price remains the same. If, however, we want to predict the magnitude and direction of price moves, then this is a regression problem, where an example of the output could be Price moves +10.2 dollars, meaning the prediction is that the price will move up by $10.2. The training dataset is generated from historical data, and the testing dataset can be historical data that was not used in training the model, as well as live market data during live trading. We measure the accuracy of such models with the metrics we listed above, in addition to the PnL generated from the trading strategies. With this introduction complete, let's now look into these methods in greater detail, starting with regression methods.

Exploring our financial dataset

     Before we start applying machine learning techniques to build predictive models, we need to perform some exploratory data wrangling on our dataset with the help of the steps listed here. This is often a large and underestimated prerequisite when it comes to applying advanced methods to financial datasets.

  1. Getting the data: We'll continue to use Google stock data that we've used in our previous chapter:
    import pandas as pd
    from pandas_datareader import data
    
    def load_financial_data( start_date, end_date, output_file='', stock_symbol='GOOG' ):
        if len(output_file) == 0:
            output_file = stock_symbol+'_data_large.pkl'
            
        try:
            df = pd.read_pickle( output_file )
            print( "File data found. . . reading {} data".format(stock_symbol) )
        except FileNotFoundError:
            print( "File not found. . . downloading the {} data".format(stock_symbol) )
            df = data.DataReader( stock_symbol, 'yahoo', start_date, end_date)
            df.to_pickle( output_file )
                
        return df
          In the code, we revisited how to download the data and implemented a method, load_financial_data, which we can use moving forward. It can also be invoked, as shown in the following code, to download 17 years of daily Google data:
    goog_data = load_financial_data( start_date='2001-01-01',
                                     end_date='2018-01-01',
                                   )
    goog_data.head()
         The code downloads daily GOOG data over a period of 17 years. Now, let's move on to the next step.
  2. Creating objectives/trading conditions that we want to predict: Now that we know how to download our data, we need to operate on it to extract our target for the predictive models, also known as a response or dependent variable; effectively, what we are trying to predict.

         In our hypothetical example of predicting weight, weight was our response variable. For algorithmic trading, the common target is to be able to predict what the future price will be so that we can take positions in the market right now that will yield a profit in the future. If we model the response variable as future price - current price, then we are trying to predict the direction of the future price with regard to the current price (does it go up, does it go down, or does it remain the same), as well as the magnitude of the price change. So, these variables look like +10, +3.4, -4, and so on. This is the response variable methodology that we will use for regression models, but we will look at it in greater detail later. Another variant of the response variable would be to simply predict the direction but ignore the magnitude, in other words, +1 to signify the future price moving up, -1 to signify the future price moving down, and 0 to signify that the future price remains the same as the current price. That is the response variable methodology that we will use for classification models, but we will explore that later. Let's implement the following code to generate these response variables:

    ---The classification response variable is +1 if the close price tomorrow is higher than the close price today, and -1 if the close price tomorrow is lower than the close price today.

    ---For this example, we assume that the close price tomorrow is not the same as the close price today, which we can choose to handle by creating a third categorical value, 0.
     
    import numpy as np

    def create_classification_trading_condition( df ):
        df['Open-Close'] = df.Open - df.Close
        df['High-Low'] = df.High - df.Low
        df = df.dropna( axis=0)
        X = df[ ['Open-Close', 'High-Low'] ]
        #             the close price tomorrow > the close price today 
        Y = np.where( df['Close'].shift(-1) > df['Close'], 
                      1, -1
                    )
        return (X,Y)
    The regression response variable is Close price tomorrow-Close price today for each day.

    ---It is a positive value if the price goes up tomorrow, a negative value if the price goes down tomorrow, and zero if the price does not change.
    ---The sign of the value indicates the direction, and the magnitude of the response variable captures the magnitude of the price move.
    def create_regression_trading_condition( df ):
        df['Open-Close'] = df.Open - df.Close
        df['High-Low'] = df.High - df.Low
        # the difference between the close price tomorrow and the close price today
        df['Target'] = df['Close'].shift(-1) - df['Close']
        
        df = df.dropna( axis=0 ) # the last item after doing shift(-1) will be nan 
        X = df[ ['Open-Close', 'High-Low'] ]
        Y = df[['Target']]
        return (df, X,Y)
  3. Partitioning datasets into training and testing datasets:
         One of the key questions regarding a trading strategy is how it will perform on market conditions or datasets that the trading strategy has not seen. Trading performance on datasets that have not been used in training the predictive model is often referred to as out-sample performance for that trading strategy. These results are considered representative of what to expect when the trading strategy is run in live markets. Generally, we divide all of our available datasets into multiple partitions, and then we evaluate models trained on one dataset over a dataset that wasn't used in training it (and optionally validated on yet another dataset after that). For the purpose of our models, we will be partitioning our dataset into two datasets: training and testing.

    ---We used a default split ratio of 80%, so 80% of the entire dataset is used for training, and the remaining 20% is used for testing.
    ---There are more advanced splitting methods that account for the distribution of the underlying data (for example, to avoid ending up with a training/testing dataset that is not truly representative of actual market conditions); see the TimeSeriesSplit sketch after the following code.
    from sklearn.model_selection import train_test_split
    
    def create_train_split_group( X, Y, split_ratio=0.8 ):
        # shuffle : bool, default=True
        #   whether or not to shuffle the data before splitting;
        #   if shuffle=False then stratify must be None
        # stratify : https://blog.csdn.net/Linli522362242/article/details/103387527
        return train_test_split( X, Y, shuffle=False, # the stock data is time-series data
                                 train_size = split_ratio
                               )
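         As an example of one such time-aware splitting scheme (see the note above), scikit-learn's TimeSeriesSplit produces walk-forward folds in which the test period always follows the training period. This is only a sketch of an alternative (assuming X and Y are pandas objects); the rest of the chapter keeps the simple 80/20 split:

    from sklearn.model_selection import TimeSeriesSplit

    def create_walk_forward_splits( X, Y, n_splits=5 ):
        # each fold trains on an expanding window of past data and
        # tests on the period immediately following it
        tscv = TimeSeriesSplit( n_splits=n_splits )
        for train_index, test_index in tscv.split( X ):
            yield ( X.iloc[train_index], X.iloc[test_index],
                    Y.iloc[train_index], Y.iloc[test_index] )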


Creating predictive models using linear regression methods

     Now that we know how to get the datasets that we need, how to quantify what we are trying to predict (objectives), and how to split data into training and testing datasets to evaluate our trained models on, let's dive into applying some basic machine learning techniques to our datasets: 

  • First, we will start with regression methods, which can be linear as well as non-linear.
  • Ordinary Least Squares (OLS, https://blog.csdn.net/Linli522362242/article/details/111307026) is the most basic linear regression model, which is where we will start from.
  • Then, we will look into Lasso and Ridge regression (https://blog.csdn.net/Linli522362242/article/details/104070847 and https://blog.csdn.net/Linli522362242/article/details/111307026), which are extensions of OLS that include regularization and shrinkage features (we will discuss these aspects in more detail later).
  • Elastic Net is a combination of both Lasso and Ridge regression methods.
  • Finally, our last regression method will be decision tree regression, which is capable of fitting non-linear models.

Ordinary Least Squares

     Given m observations of the target variable (an m x 1 vector y), with m rows of feature values each of dimension 1 x n, OLS seeks to find the n x 1 vector of weights W that minimizes the residual sum of squares of the differences between the target variable and the values predicted by the linear approximation:

  • W* = argmin_W || y − XW ||^2, which is the best fit for the equation y ≈ XW, where
    --- X is the m x n matrix of feature values,
    --- W is the n x 1 matrix/vector of weights/coefficients assigned to each of the feature values, and
    --- y is the m x 1 matrix/vector of the target variable observations in our training dataset.

    Here is an example of the matrix operations involved for m = 4 and n = 2 :
    [Figure: the matrix equation y ≈ XW written out for m = 4 and n = 2]
  •  Intuitively, it is very easy to understand OLS with a single feature variable and a single target variable by visualizing it as trying to draw a line that has the best fit.
  • OLS is just a generalization of this simple idea in much higher dimensions, where m is tens of thousands of observations, and n is thousands of features values.
  • The typical setup is that m is much larger than n (many more observations than feature values); otherwise the solution is not guaranteed to be unique.
  • There are closed-form solutions to this problem, where W = (X^T X)^(-1) X^T y (Equation 4-4, the Normal Equation: https://blog.csdn.net/Linli522362242/article/details/104005906), but in practice these are better implemented by iterative solutions; we'll skip the details for now (see the short NumPy sketch after this list).
  • The reason why we prefer to minimize the sum of the squares of the error terms is so that massive outliers are penalized more harshly and don't end up throwing off the entire fit.
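     To make the closed-form solution concrete, here is a minimal NumPy sketch on randomly generated data (purely illustrative; the feature matrix, true weights, and noise level are all made up) that solves W = (X^T X)^(-1) X^T y and compares it with scikit-learn's OLS fit:

import numpy as np
from sklearn.linear_model import LinearRegression

m, n = 1000, 2                                   # m observations, n features
rng = np.random.RandomState(42)
X = rng.randn(m, n)
true_w = np.array([1.5, -2.0])
y = X @ true_w + 0.1 * rng.randn(m)              # linear target plus a little noise

# closed-form OLS via the normal equation: W = (X^T X)^(-1) X^T y
w_closed_form = np.linalg.inv(X.T @ X) @ X.T @ y

# scikit-learn's LinearRegression solves the same least-squares problem
ols = LinearRegression(fit_intercept=False).fit(X, y)
print(w_closed_form, ols.coef_)                  # the two should match closely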

     There are several underlying assumptions for OLS:

  • the target variable is a linear combination of the feature values, that is, y = XW + ε,
  • the independence of the feature values themselves, and
  • normally distributed error terms.

The following diagram is a very simple example showing a relatively close linear relationship between two arbitrary variables. Note that it is not a perfect linear relationship, in other words, not all data points lie perfectly on the line, and we have left out the X and Y labels because these can be any arbitrary variables. The point here is to demonstrate an example of what a linear relationship visualization looks like. Let's have a look at the following diagram:
[Figure: scatter plot of two arbitrary variables showing an approximately linear relationship]

1. Start by loading up the Google data in the code, using the same method that we introduced in the previous section:

goog_data = load_financial_data( start_date='2001-01-01',
                                 end_date='2018-01-01',
                                 output_file='goog_data_large.pkl'
                               )
goog_data.head()


2. Now, we create and populate the target variable vector, Y, for regression in the following code. Remember that what we are trying to predict in regression is the magnitude and the direction of the price change from one day to the next:

# def create_regression_trading_condition( df ):
#     df['Open-Close'] = df.Open - df.Close
#     df['High-Low'] = df.High - df.Low
#     df = df.dropna( axis=0 )
#     X = df[ ['Open-Close', 'High-Low'] ]
#     # the difference between the close price tomorrow and the close price today
#     Y = df['Target'] = df['Close'].shift(-1) - df['Close']
#     return (df, X,Y)

goog_data, X, Y = create_regression_trading_condition( goog_data )
goog_data.head()


 

3. With the help of the code, let's quickly create a scatter plot for the two features we have: High-Low price of the day and Open-Close price of the day against the target variable, which is Price-Of-Next-Day - Price-Of-Today (future price):

import matplotlib.pyplot as plt

pd.plotting.scatter_matrix( goog_data[['Open-Close', 'High-Low', 'Target']],
                            grid=True,
                            figsize=(10,6),
                            diagonal='kde'# kernel density estimate
                          )# computing an estimate of a continuous probability distribution
                           # that might have generated the observed data
plt.show()

[Figure: scatter matrix of Open-Close, High-Low, and Target, with KDE plots on the diagonal]

     Using this scatter matrix, we can now quickly eyeball how the data is distributed and whether it contains outliers. For example, we can see in the kde plot (the lower-right subplot in the scatter matrix) that the Target variable seems to be normally distributed but contains several outliers. Besides, the relationship between the features (Open-Close and High-Low) and the target variable (Target) does not appear to be linear. https://blog.csdn.net/Linli522362242/article/details/111307026
#####################

import seaborn as sns
import matplotlib.pyplot as plt


cols = ['Open-Close', 'High-Low', 'Target']

# g = sns.pairplot( goog_data[cols],
#                   # If True, don’t add axes to the upper (off-diagonal) triangle of the grid 
#                   corner=False,
#                   height=1.5,
#                   aspect=2, # Aspect * height gives the width (in inches) of each facet.
#                   diag_kind='kde'
#                 )

g = sns.PairGrid( goog_data[cols],
                  height=1.5,
                  aspect=2
                  )
# g = g.map( sns.scatterplot )
g.map_diag( sns.kdeplot )
g.map_offdiag( plt.scatter ) # since I want to display all xlabels and ylabels
# g.map_offdiag( sns.scatterplot )


# sns.despine( left=False, 
#              bottom=False,
#              # right=False,top=False
#            )

# remove the upper-triangle axes manually (an alternative to the corner option);
# g.fig.get_axes() returns the nine AxesSubplot objects of the 3 x 3 PairGrid
for ax in g.fig.get_axes():
    if ax.get_geometry()[2] in [2,3,6]:
        ax.remove()

xlabels,ylabels = [],[]        
for ax in g.axes[-1,:]: # g.axes[-1,:] get the last row axes  
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in g.axes[:,0]: # g.axes[:,0] get the first column axes
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)

for row in range( len(ylabels) ):
    for col in range( len(xlabels) ):
        if g.axes[row,col] != None :
            g.axes[row,col].xaxis.set_label_text( xlabels[col] )
            g.axes[row,col].yaxis.set_label_text( ylabels[row] )

plt.subplots_adjust( top=1.5 )
plt.show()

[Figure: seaborn PairGrid with KDE plots on the diagonal and plt.scatter off-diagonal]
Using g.map_offdiag( sns.scatterplot ) instead of g.map_offdiag( plt.scatter ) gives:
[Figure: the same PairGrid drawn with sns.scatterplot off-diagonal]
#####################

4. Finally, as shown in the code, let's split 80% of the available data into the training feature value and target variable set ( X_train , Y_train ), and the remaining 20% of the dataset into the out-sample testing feature value and target variable set ( X_test , Y_test ):

X_train, X_test, Y_train, Y_test = create_train_split_group( X,Y, split_ratio=0.8 )
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

X_train.head()


Y_train.head()

5. Now, let's fit the OLS model as shown here and observe the model we obtain: 

# If scikit-learn is not installed: conda install -c anaconda scikit-learn
from sklearn import linear_model

# Fit the model
ols = linear_model.LinearRegression()
ols.fit( X_train, Y_train )

 https://blog.csdn.net/Linli522362242/article/details/104005906

6. The coefficients are the optimal weights assigned to the two features by the fit method. We will print the coefficients as shown in the code:

print( 'Intercept: \n', ols.intercept_,
       '\nCoefficients: \n', ols.coef_
     )

7. The next block of code quantifies two very common metrics that test the goodness of fit for the linear model we just built. Goodness of fit means how well a given model fits the data points observed in training and testing data. A good model is able to closely fit most of the data points, and the errors/deviations between observed and predicted values are very low. Two of the most popular metrics for linear regression models are mean_squared_error, which is what we explored as our objective to minimize when we introduced OLS, and R-squared (R^2), which is another very popular metric that measures how well the fitted model predicts the target variable when compared to a baseline model whose prediction output is always the mean of the target variable based on the training data, that is, ȳ = (1/m) Σ_i y_i.
############################https://blog.csdn.net/Linli522362242/article/details/111307026
Let's compute the MSE of our training and test predictions:

from sklearn.metrics import mean_squared_error

# y_train_pred and y_test_pred are the model's predictions on the training and test sets
print('MSE train: %.3f, test: %.3f' % ( mean_squared_error(y_train, y_train_pred),
                                        mean_squared_error(y_test, y_test_pred)
                                      ) )

        You can see that the MSE on the training dataset is 19.96, and the MSE on the test dataset is much larger, with a value of 27.20, which is an indicator that our model is overfitting the training data in this case. However, please be aware that the MSE is unbounded, in contrast to the classification accuracy, for example. In other words, the interpretation of the MSE depends on the dataset and feature scaling. For example, if the house prices were presented as multiples of 1,000 (with the K suffix), the same model would yield a lower MSE compared to a model that worked with unscaled features. To further illustrate this point, ($10K − $15K)^2 < ($10,000 − $15,000)^2.

     Thus, it may sometimes be more useful to report the coefficient of determination (R^2), which can be understood as a standardized version of the MSE, for better interpretability of the model's performance. In other words, R^2 is the fraction of the response variance that is captured by the model. The R^2 value is defined as:
  R^2 = 1 − SSE/SST

  • Here, SSE is the sum of squared errors (or the sum of squared residuals), SSE = Σ_i (y_i − ŷ_i)^2;
    this yields a list of squared errors, which is then summed and equals the unexplained variance.
  • SST is the total sum of squares (the total variance), SST = Σ_i (y_i − ȳ)^2, where ȳ is the average of the actual y values.

Let's quickly show that R^2 is indeed just a rescaled version of the MSE:
     R^2 = 1 − SSE/SST = 1 − [ (1/m) Σ_i (y_i − ŷ_i)^2 ] / [ (1/m) Σ_i (y_i − ȳ)^2 ] = 1 − MSE/Var(y)
     For the training dataset, R^2 is bounded between 0 and 1, but it can become negative for the test dataset. If R^2 = 1, the model fits the data perfectly, with a corresponding MSE = 0 (since Var(y) > 0).

     Evaluated on the training data, the R^2 of our model is 0.765, which doesn't sound too bad. However, the R^2 on the test dataset is only 0.673, which we can compute by executing the following code:

from sklearn.metrics import r2_score
 
print('R^2 train: %.3f, test: %.3f' % ( r2_score(y_train, y_train_pred),
                                        r2_score(y_test, y_test_pred)
                                      ) )


     The coefficient of determination, R^2 (or r^2), is, in statistics, a measure that assesses the ability of a model to predict or explain an outcome in the linear regression setting. More specifically, R^2 indicates the proportion of the variance in the dependent variable (Y) that is predicted or explained by linear regression and the predictor variable (X, also known as the independent variable).

     The coefficient of determination shows only association. As with linear regression, it is impossible to use R^2 to determine whether one variable causes the other. In addition, the coefficient of determination shows only the magnitude of the association, not whether that association is statistically significant.

     In general, a high R2 value indicates that the model is a good fit for the data, although interpretations of fit depend on the context of analysis.

  • An R^2 of 0.35, for example, indicates that 35 percent of the variation in the outcome has been explained just by predicting the outcome using the covariates included in the model.
  • That percentage might be a very high portion of variation to predict in a field such as the social sciences;
  • in other fields, such as the physical sciences, one would expect R^2 to be much closer to 100 percent. The theoretical minimum R^2 is 0. However, since linear regression is based on the best possible fit, R^2 will always be greater than zero, even when the predictor (X) and outcome (Y) variables bear no relationship to one another.



     R^2 increases when a new predictor variable is added to the model, even if the new predictor is not associated with the outcome. To account for that effect, the adjusted R^2 (typically denoted with a bar over the R in R^2) incorporates the same information as the usual R^2, but then also penalizes for the number (k) of predictor variables included in the model. As a result, R^2 increases as new predictors are added to a multiple linear regression model, but the adjusted R^2 increases only if the increase in R^2 is greater than one would expect from chance alone; it decreases when a predictor improves the model by less than expected by chance. In such a model, the adjusted R^2 is the most realistic estimate of the proportion of the variation that is predicted by the covariates included in the model.
############################

     We will skip the exact formulas for computing R^2 but, intuitively, the closer the R^2 value is to 1, the better the fit, and the closer the value is to 0, the worse the fit. Negative R^2 values mean that the model fits worse than the baseline model. Models with negative R^2 values usually indicate issues in the training data or process and cannot be used:
VVVVVVVVVVVVVV

from sklearn.metrics import mean_squared_error, r2_score

# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_train,
                                                        ols.predict( X_train )
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_train,
                                          ols.predict(X_train)
                                        )
     )
# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_test,
                                                        ols.predict(X_test)
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_test,
                                          ols.predict(X_test)
                                        )
     )

 ValueError: Input contains NaN, infinity or a value too large for dtype('float32').


# Y_predicted holds the model's predictions for X_test
error_list = Y_test - Y_predicted
np.any( np.isnan( error_list ) )

error_list[-3:]


Y_predicted[-3:]

Y_test[-3:]

I forgot to dropna after df.shift(-1), so the last Target value is NaN.

If we do not dropna in the data-processing step, we have to drop the final NaN row manually:

from sklearn.metrics import mean_squared_error, r2_score

# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_train,
                                                        ols.predict( X_train )
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_train,
                                          ols.predict(X_train)
                                        )
     )
# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_test[:-1],
                                                        ols.predict(X_test[:-1])
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_test[:-1],
                                          ols.predict(X_test[:-1])
                                        )
     )

Negative values mean that the model fits worse than the baseline model. Models with negative values usually indicate issues in the training data or process and cannot be used
^^^^^^^^^^^^^^^^^^^^

from sklearn.metrics import mean_squared_error, r2_score

# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_train,
                                                        ols.predict( X_train )
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_train,
                                          ols.predict(X_train)
                                        )
     )
# The mean square error
print( "Mean squared error: %.2f" % mean_squared_error( Y_test,
                                                        ols.predict(X_test)
                                                      )
     )
# Explained variance score: 1 is perfect prediction
print( "Variance score: %.2f" % r2_score( Y_test,
                                          ols.predict(X_test)
                                        )
     )

 

8. Finally, as shown in the code, let's use it to predict prices and calculate strategy returns:

  • Normal log returns
    Log returns between two times 0 < s < t are normally distributed.
  • Log-normal values
    At any time t > 0, the values S_t are log-normally distributed.

The regression response variable is Close price tomorrow-Close price today for each day.
---It is a positive value if the price goes up tomorrow, a negative value if the price goes down tomorrow, and zero if the price does not change.
---The sign of the value indicates the direction, and the magnitude of the response variable captures the magnitude of the price move. 

# # the difference between the close price tomorrow and the close price today 
# df['Target'] = df['Close'].shift(-1) - df['Close']
goog_data['Predicted_Signal'] = ols.predict(X)
# Normal log returns         log( Close price today ) - log( Close price yesterday )
goog_data['GOOG_Returns'] = np.log( goog_data['Close']/goog_data['Close'].shift(1) )

def calculate_return( df, split_value, symbol ):
    cum_goog_return = df[split_value:][ "%s_Returns" % symbol ].cumsum() * 100
    # Calculates the log returns of the trading strategy 
    # given the prediction values and the benchmark log returns.
    #                        log actual return today   * Predicted return today    
    df['Strategy_Returns'] = df["%s_Returns" % symbol] * df['Predicted_Signal'].shift(1)
    # for classification
    # df['Strategy_Returns']=df["%s_Returns" % symbol] * np.sign( df['Predicted_Signal'].shift(1) )
    return cum_goog_return

def calculate_strategy_return( df, split_value, symbol ):
    cum_strategy_return = df[split_value:]['Strategy_Returns'].cumsum() * 100
    return cum_strategy_return

cum_goog_return = calculate_return( goog_data, 
                                    split_value=len(X_train), symbol='GOOG' )
cum_strategy_return = calculate_strategy_return( goog_data, 
                                                 split_value=len(X_train), symbol='GOOG' )

def plot_chart( cum_symbol_return, cum_strategy_return, symbol ):
    plt.figure( figsize=(15,6) )
    plt.plot( cum_symbol_return, label='%s Returns' % symbol )
    plt.plot( cum_strategy_return, label='Strategy Returns' )
    plt.legend()
    plt.show()
    
plot_chart( cum_goog_return, cum_strategy_return, symbol='GOOG' )

The simplified approach taken here does not account for transaction costs.

[Figure: cumulative GOOG returns versus cumulative strategy returns for the OLS model]

      Here, we can observe that the simple linear regression model, using only the two features Open-Close and High-Low, produces positive returns. However, it does not outperform the Google stock's return, because the stock has been increasing in value since inception. But since that cannot be known ahead of time, the linear regression model, which does not assume/expect increasing stock prices, is a good investment strategy.

# Sharpe ratio: The risk-adjusted return. This ratio is important 
# because it compares the return of the strategy with a risk-free strategy
def sharpe_ratio( symbol_returns, strategy_returns ):
    strategy_std = strategy_returns.std()
    sharpe = (strategy_returns-symbol_returns) / strategy_std
    return sharpe.mean()

print( sharpe_ratio(cum_strategy_return, cum_goog_return) )

 

goog_data.shift(1).tail(10)

VVVVVVVVVVVVVV

def calculate_return( df, split_value, symbol ):
    cum_goog_return = df[split_value:][ "%s_Returns" % symbol ].cumsum() * 100
    # Calculates the log returns of the trading strategy 
    # given the prediction values and the benchmark log returns.
    #                        log actual return today   * Predicted return today    
    # df['Strategy_Returns'] = df["%s_Returns" % symbol] * df['Predicted_Signal'].shift(1)
    # for classification
    df['Strategy_Returns']=df["%s_Returns" % symbol] * np.sign( df['Predicted_Signal'].shift(1) )
    return cum_goog_return

cum_goog_return = calculate_return( goog_data, 
                                    split_value=len(X_train), symbol='GOOG' )
cum_strategy_return = calculate_strategy_return( goog_data, 
                                                 split_value=len(X_train), symbol='GOOG' )
plot_chart( cum_goog_return, cum_strategy_return, symbol='GOOG' )

print( sharpe_ratio(cum_strategy_return, cum_goog_return) )

[Figure: cumulative GOOG returns versus strategy returns when trading on the sign of the prediction]

 np.sign( goog_data['Predicted_Signal'].shift(1) )[-10:]

^^^^^^^^^^^^^^^^^^^^

Regularization and shrinkage – LASSO and Ridge regression 

     Now that we have covered OLS, we will try to improve on that by using regularization and coefficient shrinkage using LASSO and Ridge regression. One of the problems with OLS is that occasionally, for some datasets,

  • the coefficients assigned to the predictor variables can grow to be very large. Also,
  • OLS can end up assigning non-zero weights to all predictors and the total number of predictors in the final predictive model can be a very large number.

     Regularization tries to address both problems, that is, the problem of too many predictors and the problem of predictors with very large coefficients. Too many predictors in the final model is disadvantageous because it leads to overfitting, in addition to requiring more computations to predict. Predictors with large coefficients are disadvantageous because a few predictors with large coefficients can overpower the entire model's prediction, and small changes in predictor values can cause large swings in predicted output. We address this by introducing the concepts of regularization and shrinkage. 

     Regularization is the technique of introducing a penalty term on the coefficient weights and making that a part of the mean squared error, which regression tries to minimize. Intuitively, what this does is that it will let coefficient values grow, but only if there is a comparable decrease in MSE values. Conversely, if reducing the coefficient weights doesn't increase the MSE values too much, then it will shrink those coefficients. The extra penalty term is known as the regularization term, and since it results in a reduction of the magnitudes of coefficients, it is known as shrinkage.

     Depending on the type of penalty term involving the magnitudes of the coefficients, it is either L1 regularization or L2 regularization. When the penalty term is the sum of the absolute values of all coefficients, this is known as L1 regularization (LASSO), and when the penalty term is the sum of the squared values of the coefficients, this is known as L2 regularization (Ridge regression). It is also possible to combine both L1 and L2 regularization, and that is known as elastic net regression. To control how much penalty is added because of these regularization terms, we tune the regularization hyperparameter. In the case of elastic net regression, there are two regularization hyperparameters, one for the L1 penalty and the other for the L2 penalty.
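     Elastic net is mentioned above but not demonstrated elsewhere in this chapter. As a minimal sketch (reusing the X_train and Y_train created for the OLS model, with hyperparameter values chosen arbitrarily), scikit-learn exposes the two penalties through alpha (the overall regularization strength) and l1_ratio (the mix between the L1 and L2 penalties):

from sklearn.linear_model import ElasticNet

# l1_ratio=1.0 is pure LASSO, l1_ratio=0.0 is (essentially) pure Ridge
enet = ElasticNet( alpha=0.1, l1_ratio=0.5 )
enet.fit( X_train, Y_train )
print( "Coefficients: \n", enet.coef_ )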
##############
https://blog.csdn.net/Linli522362242/article/details/104070847

  • If α = 0, then Ridge regression is just linear regression.
  • If α is very large, then all weights end up very close to zero and the result is a flat line going through the data's mean.
  • alpha : float, default=1.0 (sklearn.linear_model.Lasso)
    Constant that multiplies the L1 term. Defaults to 1.0. alpha = 0 is equivalent to an ordinary least squares fit, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised; given this, you should use the LinearRegression object.

Note how increasing α leads to flatter (i.e., less extreme, more reasonable) predictions; this reduces the model’s variance(preferring a simpler model, try to underfit) but increases its bias.

      In the preceding figure, we can see that the contour of the cost function touches the L1 diamond at w_1 = 0. Since the contours of an L1-regularized system are sharp, it is more likely that the optimum (that is, the intersection between the ellipses of the cost function and the boundary of the L1 diamond) is located on the axes, which encourages sparsity. The mathematical details of why L1 regularization can lead to sparse solutions are beyond the scope of this book. If you are interested, an excellent section on L2 versus L1 regularization can be found in section 3.4 of The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer. https://blog.csdn.net/Linli522362242/article/details/108230328

     By increasing the regularization strength via the regularization parameter λ, we shrink the weights towards zero and decrease the dependence of our model on the training data. Let's illustrate this concept in the following figure for the L2 penalty term.
 

     The quadratic L2 regularization term is represented by the shaded ball. Here, our weight coefficients cannot exceed our regularization budget: the combination of the weight coefficients (W = w1, w2, w3, ..., wn) cannot fall outside the shaded area. On the other hand, we still want to minimize the cost function (such as the sum-of-squared-errors cost J(w) = (1/2) Σ_i (y_i − ŷ_i)^2, where the 1/2 term is just added for our convenience: https://blog.csdn.net/Linli522362242/article/details/96480059). Under the penalty constraint, our best effort is to choose the point where the L2 ball intersects with the contours of the unpenalized cost function. The larger the value of the regularization parameter λ gets, the faster the penalized cost function grows, which leads to a narrower L2 ball. For example, if we increase the regularization parameter λ towards infinity, the weight coefficients will become effectively zero, denoted by the center of the L2 ball. To summarize the main message of the example: our goal is to minimize the sum of the unpenalized cost function plus the penalty term, which can be understood as adding bias and preferring a simpler model to reduce the variance (trying to underfit) in the absence of sufficient training data to fit the model.

Written in matrix form, the ridge cost is J(W) = (y − XW)^T (y − XW) + λ W^T W.
It is easy to see that the ridge regression solution is W = (X^T X + λI)^(-1) X^T y (the Ridge regression closed-form solution; in essence, a non-negative factor is added to the main diagonal elements of the predictors' X^T X matrix).
https://blog.csdn.net/Linli522362242/article/details/111307026
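     As a quick numerical check of this closed-form expression (a sketch on synthetic data, not part of the trading example), we can compare (X^T X + λI)^(-1) X^T y against scikit-learn's Ridge with the intercept disabled:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(200)

lam = 10.0
# closed-form ridge solution: W = (X^T X + lambda * I)^(-1) X^T y
w_ridge = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1])) @ X.T @ y

ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(w_ridge, ridge.coef_)   # the two solutions should agree closely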
##############

     Let's apply Lasso regression to our dataset and inspect the coefficients in the following code. With a regularization parameter of 0.1, we see that the first predictor gets assigned a coefficient that is roughly half of what was assigned by OLS:

import numpy as np
import pandas as pd
from pandas_datareader import data

def load_financial_data( start_date, end_date, output_file='', stock_symbol='GOOG' ):
    if len(output_file) == 0:
        output_file = stock_symbol+'_data_large.pkl'
        
    try:
        df = pd.read_pickle( output_file ).astype(np.float32)
        print( "File data found. . . reading {} data".format(stock_symbol) )
    except FileNotFoundError:
        print( "File not found. . . downloading the {} data".format(stock_symbol) )
        df = data.DataReader( stock_symbol, 'yahoo', start_date, end_date)
        df.to_pickle( output_file )
            
    return df

goog_data = load_financial_data( start_date='2001-01-01',
                                 end_date='2018-01-01',
                               )
# re-create the regression features and target defined earlier in this chapter
goog_data, X, Y = create_regression_trading_condition( goog_data )

from sklearn.model_selection import train_test_split

def create_train_split_group( X, Y, split_ratio=0.8 ):
    # shuffle : bool, default=True
    #   whether or not to shuffle the data before splitting;
    #   if shuffle=False then stratify must be None
    # stratify : https://blog.csdn.net/Linli522362242/article/details/103387527
    return train_test_split( X, Y, shuffle=False, # the stock data is time-series data
                             train_size = split_ratio
                           )

X_train, X_test, Y_train, Y_test = create_train_split_group( X,Y, split_ratio=0.8 )

from sklearn import linear_model

# Fit the model
lasso = linear_model.Lasso( alpha=0.1 )

lasso.fit(X_train, Y_train)

# The coefficients
print( "Coefficients: \n", lasso.coef_ )

 

     If the regularization parameter is increased to 0.6, the coefficients shrink much further, to [0., -0.00540562], and the first predictor gets assigned a weight of 0, meaning that predictor can be removed from the model. L1 regularization has this additional property of being able to shrink coefficients to 0, thus having the extra advantage of being useful for feature selection; in other words, it can shrink the model size by removing some predictors.
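     To see this shrinkage effect directly, the following small sketch (reusing the same X_train and Y_train) prints the LASSO coefficients for a few increasing values of the regularization parameter; with a large enough alpha, both coefficients are eventually driven to zero:

from sklearn import linear_model

for alpha in [0.1, 0.3, 0.6, 1.0]:
    lasso = linear_model.Lasso( alpha=alpha )
    lasso.fit( X_train, Y_train )
    # larger alpha -> stronger L1 penalty -> smaller (and eventually zero) coefficients
    print( "alpha=%.1f coefficients:" % alpha, lasso.coef_ )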

# # the difference between the close price tomorrow and the close price today 
# df['Target'] = df['Close'].shift(-1) - df['Close']
goog_data['Predicted_Signal'] = lasso.predict(X)
# Normal log returns         log( Close price today ) - log( Close price yesterday )
goog_data['GOOG_Returns'] = np.log( goog_data['Close']/goog_data['Close'].shift(1) )

def calculate_return( df, split_value, symbol ):
    cum_goog_return = df[split_value:][ "%s_Returns" % symbol ].cumsum() * 100
    # Calculates the log returns of the trading strategy 
    # given the prediction values and the benchmark log returns.
    #                        log actual return today   * Predicted return today    
    df['Strategy_Returns'] = df["%s_Returns" % symbol] * df['Predicted_Signal'].shift(1)
    # for classification
    # df['Strategy_Returns']=df["%s_Returns" % symbol] * np.sign( df['Predicted_Signal'].shift(1) )
    return cum_goog_return

def calculate_strategy_return( df, split_value, symbol ):
    cum_strategy_return = df[split_value:]['Strategy_Returns'].cumsum() * 100
    return cum_strategy_return

cum_goog_return = calculate_return( goog_data, 
                                    split_value=len(X_train), symbol='GOOG' )
cum_strategy_return = calculate_strategy_return( goog_data, 
                                                 split_value=len(X_train), symbol='GOOG' )

def plot_chart( cum_symbol_return, cum_strategy_return, symbol ):
    plt.figure( figsize=(15,6) )
    plt.plot( cum_symbol_return, label='%s Returns' % symbol )
    plt.plot( cum_strategy_return, label='Strategy Returns' )
    plt.axhline( y=0, ls='--', alpha=0.2)
    plt.legend()
    plt.show()
    
plot_chart( cum_goog_return, cum_strategy_return, symbol='GOOG' )

[Figure: cumulative GOOG returns versus cumulative strategy returns for the Lasso model]

Now, let's apply Ridge regression to our dataset and observe the coefficients:

from sklearn import linear_model

# Fit the model
ridge = linear_model.Ridge( alpha=10000 )

ridge.fit(X_train, Y_train)

# The coefficients
print( "Coefficients: \n", ridge.coef_ )

 

Decision tree regression

     The disadvantage of the regression methods we've seen so far is that they are all linear models, meaning they can only capture relationships between predictors and target variables if the underlying relationship between them is linear.

     Decision tree regression can capture non-linear relationships, thus allowing for more complex models. Decision trees get their name because they are structured like an upside-down tree, with decision nodes or branches and result nodes or leaf nodes. We start at the root of the tree and then, at each step, we inspect the value of our predictors and pick a branch to follow to the next node. We continue following branches until we get to a leaf node and our final prediction is then the value of that leaf node. Decision trees can be used for classification or regression, but here, we will look at using it for regression only. 
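     The chapter does not walk through code for this model, but a minimal sketch with scikit-learn (reusing the regression X_train/Y_train split from before, with max_depth chosen arbitrarily to limit overfitting) could look like the following:

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# a shallow tree keeps the non-linear model from memorizing the training data
tree_reg = DecisionTreeRegressor( max_depth=3, random_state=0 )
tree_reg.fit( X_train, Y_train )

print( "MSE train: %.2f, test: %.2f" % ( mean_squared_error(Y_train, tree_reg.predict(X_train)),
                                         mean_squared_error(Y_test, tree_reg.predict(X_test)) ) )
print( "R^2 train: %.2f, test: %.2f" % ( r2_score(Y_train, tree_reg.predict(X_train)),
                                         r2_score(Y_test, tree_reg.predict(X_test)) ) )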

Creating predictive models using linear classification methods 

     In the first part of this chapter, we reviewed trading strategies based on regression machine learning algorithms. In this second part, we will focus on classification machine learning algorithms, another supervised machine learning approach that utilizes known datasets to make predictions. Instead of the output variable of the regression being a numerical (or continuous) value, the classification output is a categorical (or discrete) value. We will use the same method as in the regression analysis, by finding the mapping function (f) such that whenever there is new input data (x), the output variable (y) for the dataset can be predicted.

In the following subsections, we will review three classification machine learning methods:

  • K-nearest neighbors
  • Support vector machine
  • Logistic regression

K-nearest neighbors

     K-nearest neighbors (or KNN) is a supervised method. Like the prior methods we saw in this chapter, the goal is to find a function predicting an output, y, from an unseen observation, x. Unlike a lot of other methods (such as linear regression), this method doesn't use any specific assumption about the distribution of the data (it is referred to as a non-parametric classifier). 

     Recall, by contrast, the underlying assumptions that OLS makes:

  • the target variable is a linear combination of the feature values, that is, y = XW + ε,
  • the independence of the feature values themselves, and
  • normally distributed error terms.

     The KNN algorithm is based on comparing a new observation to the K most similar instances, where similarity is defined by a distance metric between two data points. One of the most frequently used metrics is the Euclidean distance, given by the following formula:

 d(x,y) = sqrt( (x1−y1)^2 + (x2−y2)^2 + … + (xn−yn)^2 )

     When we review the documentation of the Python function, KNeighborsClassifier, we can observe different types of parameters:

One of them is the parameter p, which selects the type of distance:

  • When p=1, the Manhattan distance is used. The Manhattan distance is the sum of the horizontal and vertical distances between two points.
  • When p=2, which is the default value, the Euclidean distance is used.
  • When p>2, this is the Minkowski distance, which is a generalization of the Manhattan and Euclidean metrics: d(x,y) = ( |x1−y1|^p + |x2−y2|^p + … + |xn−yn|^p )^(1/p) (see the short sketch after this list).
  • http://scikitlearn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html.
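     The following small sketch (with two made-up points) shows how the three choices of p relate:

import numpy as np

def minkowski_distance( x, y, p ):
    # p=1 -> Manhattan, p=2 -> Euclidean, larger p -> general Minkowski
    return np.sum( np.abs(x - y)**p )**(1.0/p)

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print( minkowski_distance(x, y, p=1) )   # 7.0   (Manhattan)
print( minkowski_distance(x, y, p=2) )   # 5.0   (Euclidean)
print( minkowski_distance(x, y, p=3) )   # ~4.50 (Minkowski with p=3)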

     The algorithm calculates the distance between a new observation and all the training data. The new observation is assigned to the group of the K points that are closest to it. Then, conditional probabilities are calculated for each class, and the new observation is assigned to the class with the highest probability. The weakness of this method is the time it takes to associate a new observation with a given group.

     In the code, in order to implement this algorithm, we will use the functions we declared in the first part of this chapter:

1. Let's get the Google data from January 1, 2001 to January 1, 2018: 

goog_data = load_financial_data( start_date = '2001-01-01',
                                 end_date = '2018-01-01',
                                 output_file = 'goog_data_large.pkl'
                               )
goog_data


2. We create the rule for when the strategy will take a long position (+1) and a short position (-1), as shown in the following code:

def create_classification_trading_condition( df ):
    df['Open-Close'] = df.Open - df.Close
    df['High-Low'] = df.High - df.Low
    df = df.dropna( axis=0)
    X = df[ ['Open-Close', 'High-Low'] ]
    #             the close price tomorrow > the close price today 
    df['Target'] = Y = np.where( df['Close'].shift(-1) > df['Close'], 
                                 1, -1
                               )
    return (df, X,Y)
goog_data, X, Y = create_classification_trading_condition(goog_data)
goog_data


3. We prepare the training and testing dataset as shown in the following code:  

X_train, X_test, Y_train, Y_test = create_train_split_group( X, Y, split_ratio=0.8 )
print( X_train.shape, X_test.shape, Y_train.shape, Y_test.shape )


4. In this example, we choose a KNN with K=15. We will train this model using the training dataset as shown in the following code:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier( n_neighbors=15 )
knn.fit( X_train, Y_train )

accuracy_train = accuracy_score( Y_train, knn.predict(X_train) )
accuracy_test = accuracy_score( Y_test, knn.predict(X_test) )

print( accuracy_train, accuracy_test )

 

5. Once the model is created, we are going to predict whether the price goes up or down and store the values in the original data frame, as shown in the following code:

goog_data['Predicted_Signal'] = knn.predict(X)
goog_data

6. In order to compare the strategy using the KNN algorithm, we will use the return of the GOOG symbol itself, without the strategy, as shown in the following code:

#                            log( Close price today ) - log( Close price yesterday )
goog_data['GOOG_Returns'] = np.log( goog_data['Close']/goog_data['Close'].shift(1) )

def calculate_return( df, split_value, symbol ):
    cum_goog_return = df[split_value:][ "%s_Returns" % symbol ].cumsum() * 100
    # Calculates the log returns of the trading strategy 
    # given the prediction values and the benchmark log returns.
    #                        log return today * yesterday's predicted signal (the position held today)
    df['Strategy_Returns'] = df["%s_Returns" % symbol] * df['Predicted_Signal'].shift(1)
    # for classification
    # df['Strategy_Returns']=df["%s_Returns" % symbol] * np.sign( df['Predicted_Signal'].shift(1) )
    return cum_goog_return

def calculate_strategy_return( df, split_value ):
    cum_strategy_return = df[split_value:]['Strategy_Returns'].cumsum() * 100
    return cum_strategy_return

cum_goog_return = calculate_return( goog_data, split_value = len(X_train), symbol='GOOG' )
cum_strategy_return = calculate_strategy_return( goog_data, split_value = len(X_train) )

def plot_chart( cum_symbol_return, cum_strategy_return, symbol ):
    plt.figure( figsize=(15,6) )
    plt.plot( cum_symbol_return, label='%s Returns' % symbol )
    plt.plot( cum_strategy_return, label='Strategy Returns' )
    plt.axhline( y=0, ls='--', alpha=0.2)
    plt.legend()
    plt.show()
    
plot_chart( cum_goog_return, cum_strategy_return, symbol='GOOG' )

Note that the simplified approach taken here does not account for transaction costs or slippage; a rough illustration of such an adjustment follows the chart below.

t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第23张图片
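     As a rough illustration only (not part of the original example), a fixed per-trade cost could be deducted whenever the predicted signal changes; the 10-basis-point figure below is an arbitrary assumption:

# Hypothetical adjustment: deduct an assumed 10 bps cost each time the position changes
transaction_cost = 0.001   # assumed, not taken from the original example
trades = goog_data['Predicted_Signal'].diff().abs() > 0
goog_data['Strategy_Returns_Net'] = goog_data['Strategy_Returns'] - trades * transaction_cost
cum_strategy_return_net = goog_data[len(X_train):]['Strategy_Returns_Net'].cumsum() * 100
print( cum_strategy_return_net.iloc[-1] )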

Support vector machine

     A support vector machine (SVM, https://blog.csdn.net/Linli522362242/article/details/104151351) is a supervised machine learning method. As previously seen, we can use this method for regression, but also for classification. The principle of this algorithm is to find a hyperplane that separates the data into two classes.

Let's have a look at the following code, which implements it:

from sklearn.svm import SVC

# Fit the model
svc = SVC()
svc.fit( X_train, Y_train )

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None) https://blog.csdn.net/Linli522362242/article/details/104280075
The soft-margin SVM minimizes (1/2)‖w‖² + C Σᵢ ξᵢ subject to the margin constraints, where the slack variables ξᵢ measure how far each instance violates the margin.

  • C float, default=1.0

     Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.
A smaller C value leads to a wider street (margin) but more margin violations (too many or too large ξᵢ; note that we want to minimize the cost function, or hinge loss).
A higher C value makes the classifier commit fewer margin violations but ends up with a smaller margin.
t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第24张图片

  • kernel {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’ (Gaussian Radial Basis Function, φ(x, ℓ) = exp(−γ‖x − ℓ‖²))

     Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
Figure 5-8. Similarity features using the Gaussian RBF https://blog.csdn.net/Linli522362242/article/details/104280075
     It is a bell-shaped function varying from 0 (very far away from the landmark) to 1 (at the landmark). Now we are ready to compute the new features. For example, let's look at the instance x1 = −1: it is located at a distance of 1 from the first landmark (x1 = −2) and a distance of 2 from the second landmark (x1 = 1). Therefore, with γ = 0.3, its new features are x2 = exp(−0.3 × 1²) ≈ 0.74 and x3 = exp(−0.3 × 2²) ≈ 0.30 (a quick numerical check of these values follows this parameter discussion). The plot on the right of Figure 5-8 shows the transformed dataset (dropping the original features). As you can see, it is now linearly separable.

  • gamma γ {‘scale’, ‘auto’} or float, default=’scale’

    Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,  
    • if ‘auto’, uses 1 / n_features.
    • Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

Figure 5-9. SVM classifiers using an RBF kernel

This model is represented on the bottom left of Figure 5-9. The other plots show models trained with different values of the hyperparameters gamma (γ) and C. Increasing gamma (γ) makes the bell-shaped curve narrower (see the left plot of Figure 5-8), and as a result each instance's range of influence is smaller: the decision boundary ends up being more irregular, wiggling around individual instances. Conversely, a small gamma (γ) value makes the bell-shaped curve wider, so instances have a larger range of influence, and the decision boundary ends up smoother. So γ acts like a regularization hyperparameter:

  • if your model is overfitting, you should reduce it (reduces the model’s variance, increases the model’s bias), and
  • if it is underfitting, you should increase it (similar to the C hyperparameter).
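As a quick numerical check of the RBF similarity values quoted above (≈ 0.74 and ≈ 0.30), assuming γ = 0.3 as in Figure 5-8:

import numpy as np

gamma = 0.3                                   # assumed value from Figure 5-8
x, landmark1, landmark2 = -1.0, -2.0, 1.0
x2 = np.exp( -gamma * (x - landmark1)**2 )    # distance 1 -> ~0.74
x3 = np.exp( -gamma * (x - landmark2)**2 )    # distance 2 -> ~0.30
print( round(x2, 2), round(x3, 2) )           # 0.74 0.3

With the parameters covered, we now use the fitted SVC to forecast the signal and compute the strategy returns, reusing the helper functions from the KNN example: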
# Forecast value
goog_data['Predicted_Signal'] = svc.predict(X)
#                            log( Close price today ) - log( Close price yesterday )
goog_data['GOOG_Returns'] = np.log( goog_data['Close']/goog_data['Close'].shift(1) )

cum_goog_return = calculate_return( goog_data, split_value = len(X_train), symbol='GOOG' )
cum_strategy_return = calculate_strategy_return( goog_data, split_value = len(X_train) )

goog_data

In this example, the following applies:

  • Instead of instantiating the KNeighborsClassifier class as we did for the KNN method, we used the SVC class.
  • The class constructor has several parameters that adjust the behavior of the method to the data you will work on.
  • The most important one is the kernel parameter. This defines the method of building the hyperplane.
  • In this example, we just use the default values of the constructor.
t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第25张图片
plot_chart( cum_goog_return, cum_strategy_return, symbol='GOOG' )

 t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第26张图片
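     Since we stuck with the default kernel, C, and gamma here, a natural extension (not part of the original example) is to tune these hyperparameters on the training split, for instance with scikit-learn's GridSearchCV; the grid values below are arbitrary illustrations, not tuned recommendations:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; for time-ordered data, a TimeSeriesSplit cross-validator
# (sklearn.model_selection.TimeSeriesSplit) is usually preferable to the default k-fold
param_grid = { 'C': [0.1, 1.0, 10.0],
               'gamma': ['scale', 0.01, 0.1],
               'kernel': ['rbf', 'linear'] }
grid = GridSearchCV( SVC(), param_grid, cv=5, scoring='accuracy' )
grid.fit( X_train, Y_train )
print( grid.best_params_, grid.best_score_ )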

Logistic regression

     Logistic regression is a supervised method used for classification. Building on linear regression, logistic regression transforms its output using the logistic sigmoid function, returning a probability value that can then be mapped to discrete classes:
t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第27张图片https://blog.csdn.net/Linli522362242/article/details/96480059 Note: the weight update is calculated based on all samples in the training set 

     In practical classification tasks, linear logistic regression and linear SVMs often yield very similar results. Logistic regression tries to maximize the conditional likelihoods of the training data, which makes it more prone to outliers than SVMs, which mostly care about the points that are closest to the decision boundary (the support vectors). On the other hand, logistic regression has the advantage of being a simpler model that can be implemented more easily. Furthermore, logistic regression models can be easily updated, which is attractive when working with streaming data. https://blog.csdn.net/Linli522362242/article/details/107755405
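     To make the sigmoid-to-class mapping concrete, here is a minimal sketch (the 0.5 threshold is the usual convention; scikit-learn's predict method applies it for us):

import numpy as np

def sigmoid( z ):
    # Logistic function: maps any real-valued score to a probability in (0, 1)
    return 1.0 / ( 1.0 + np.exp(-z) )

z = np.array( [-2.0, 0.0, 3.0] )          # example linear scores w.x + b
probs = sigmoid( z )                       # approximately [0.12, 0.50, 0.95]
classes = np.where( probs > 0.5, 1, -1 )   # threshold at 0.5 -> class labels
print( probs, classes )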

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression


from pandas_datareader import data
start_date = '2001-01-01'
end_date = '2018-01-01'
SRC_DATA_FILENAME='goog_data_large.pkl'

try:
    goog_data = pd.read_pickle(SRC_DATA_FILENAME)
    print('File data found...reading GOOG data')
except FileNotFoundError:
    print('File not found...downloading the GOOG data')
    goog_data = data.DataReader('GOOG', 'yahoo', start_date, end_date)
    goog_data.to_pickle(SRC_DATA_FILENAME)

goog_data['Open-Close']=goog_data.Open-goog_data.Close
goog_data['High-Low']=goog_data.High-goog_data.Low
goog_data=goog_data.dropna()
X=goog_data[['Open-Close','High-Low']]
Y=np.where(goog_data['Close'].shift(-1)>goog_data['Close'],1,-1)

split_ratio=0.8
split_value=int(split_ratio * len(goog_data))
X_train=X[:split_value]
Y_train=Y[:split_value]
X_test=X[split_value:]
Y_test=Y[split_value:]


logistic=LogisticRegression()
logistic.fit(X_train, Y_train)
accuracy_train = accuracy_score(Y_train, logistic.predict(X_train))
accuracy_test = accuracy_score(Y_test, logistic.predict(X_test))
print('Accuracy: ',accuracy_train, accuracy_test)


goog_data['Predicted_Signal']=logistic.predict(X)
goog_data['GOOG_Returns']=np.log(goog_data['Close']/goog_data['Close'].shift(1))


def calculate_return(df,split_value,symbol):
    cum_goog_return= df[split_value:]['%s_Returns' % symbol].cumsum() * 100
    df['Strategy_Returns']= df['%s_Returns' % symbol] * df['Predicted_Signal'].shift(1)
    return cum_goog_return

def calculate_strategy_return(df,split_value):
    cum_strategy_return = df[split_value:]['Strategy_Returns'].cumsum() * 100
    return cum_strategy_return

cum_goog_return=calculate_return(goog_data,split_value=len(X_train),symbol='GOOG')
cum_strategy_return= calculate_strategy_return(goog_data,split_value=len(X_train))


def plot_chart(cum_symbol_return, cum_strategy_return, symbol):
    plt.figure(figsize=(10,5))
    plt.plot(cum_symbol_return, label='%s Returns' % symbol)
    plt.plot(cum_strategy_return,label='Strategy Returns')
    plt.legend()
    plt.show()

plot_chart(cum_goog_return, cum_strategy_return, symbol='GOOG')

def sharpe_ratio(symbol_returns, strategy_returns):
    # Excess of the strategy's cumulative return over the benchmark's,
    # scaled by the standard deviation of the strategy's cumulative return
    strategy_std = strategy_returns.std()
    sharpe = (strategy_returns - symbol_returns) / strategy_std
    return sharpe.mean()

print('Sharpe Ratio: ', sharpe_ratio(cum_goog_return, cum_strategy_return))

t3_Predicting the Markets w ML_sklearn_scatter_PairGrid_R-squared_log returns_Lasso_ridge_KNN_SVM_LR_第28张图片
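     The sharpe_ratio helper above compares the two cumulative return series directly. A more conventional variant (shown here only as an alternative, assuming 252 trading days per year) works on the daily log returns and annualizes the ratio with sqrt(252):

import numpy as np

def annualized_sharpe( daily_returns, trading_days=252 ):
    # Conventional Sharpe ratio: mean daily return over its standard deviation, annualized
    daily_returns = daily_returns.dropna()
    return np.sqrt( trading_days ) * daily_returns.mean() / daily_returns.std()

print( 'Annualized Sharpe (strategy):', annualized_sharpe(goog_data['Strategy_Returns'][split_value:]) )
print( 'Annualized Sharpe (GOOG):    ', annualized_sharpe(goog_data['GOOG_Returns'][split_value:]) )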

Summary

     In this chapter, we got a basic understanding of how to use machine learning in trading. We started by going through the essential terminology and notation. We learned how to create predictive models that predict price movement using linear regression methods, building several examples with Python's scikit-learn library. We then saw how to create predictive models that predict buy and sell signals using linear classification methods, and demonstrated how to apply these machine learning methods to a simple trading strategy. We also went through the tools that we can use to create a trading strategy. The next chapter will introduce trading rules that can help to improve your trading strategies.
