Using Deep Learning to Create a Stock Trading Bot

Plenty of articles cover stock-price prediction, but this one offers two things I have not seen discussed elsewhere:

The use of confidence intervals in stock trading to set stop-loss and take-profit levels
The use of Alpaca in stock trading to track profits and test trading strategies

Both are important tools for the next generation of machine learning trading algorithms.

Concept:

The program consists of three main parts:

The Data Setup

The data is downloaded with the yfinance library at daily intervals and includes the opening, high, low, and closing price of the asset. It is then normalized and reshaped to fit the neural network.

The Neural Network

The neural network is a convolutional LSTM network: the convolutional layer extracts local features from each window of prices, and the LSTM layer models how those features evolve over time. This architecture suits price data because some of its patterns are local in shape while others depend on time.

Creating Orders

The neural network predicts the daily opening and closing prices. If the predicted opening price is higher than the predicted closing price, the program short sells the stock. If the predicted closing price is higher than the predicted opening price, the program buys the stock.
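The trading rule can be sketched in a few lines. This is a minimal illustration, not the article's order code: choose_side is a hypothetical helper name, and returning None when the two predictions are equal is an assumption, since the text does not say what happens in that case.

```python
def choose_side(pred_open, pred_close):
    """Map a predicted open/close pair to an order side."""
    if pred_close > pred_open:
        return "buy"    # price is expected to rise during the day
    if pred_close < pred_open:
        return "sell"   # price is expected to fall, so short sell
    return None         # assumption: no predicted move means no trade

print(choose_side(100.0, 102.5))  # buy
print(choose_side(100.0, 98.0))   # sell
```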

After training the network, I compute its loss on the test set and use this value as a confidence interval to determine the stop-loss and take-profit values. Orders are sent to the Alpaca API with the requests library.

With the key concepts in place, let's move to the code.

The Code:

Step 1 | Prerequisites:

from numpy import array
from numpy import hstack
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras import callbacks
from sklearn.model_selection import train_test_split
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from IPython.display import clear_output
import datetime
import statistics
import time 
import os
import json
import yfinance as yf
from keras.models import model_from_json
import requests
from keras.models import load_model
from matplotlib import pyplot as plt

The program has quite a few prerequisites. Names are imported one at a time to keep the namespace small. Note that clear_output from IPython.display is only useful in Jupyter notebooks; if you run the code as a script, you can drop that import.

Step 2 | Access Data:

def data_setup(symbol,data_len,seq_len):
    end = datetime.datetime.today().strftime('%Y-%m-%d')
    # Request more calendar days than rows needed, presumably to allow for days without trading
    start = datetime.datetime.strptime(end, '%Y-%m-%d') - datetime.timedelta(days=(data_len/0.463))
    orig_dataset = yf.download(symbol,start,end)
    close = orig_dataset['Close'].values
    open_ = orig_dataset['Open'].values
    high = orig_dataset['High'].values
    low = orig_dataset['Low'].values
    # Fix the column order so that column 0 is Open and column 1 is Close,
    # the order in which the targets and minmax are read later on.
    # Assumes yfinance returns flat column names (Open, High, Low, Close, ...).
    prices = orig_dataset[['Open','Close','High','Low']]
    dataset,minmax = normalize_data(prices)
    data = dataset.values
    n_steps = seq_len
    X, y = split_sequences(data, n_steps)
    n_features = X.shape[2]
    n_seq = len(X)
    print(X.shape)
    # Add a sub-sequence axis of length 1 for the TimeDistributed layers
    X = X.reshape((n_seq,1, n_steps, n_features))
    # Keep only the Open and Close columns as prediction targets
    true_y = y[:, :2]
    return X,true_y,n_features,minmax,n_steps,close,open_,high,low

This function downloads the data from yfinance and splits it into its price columns. It also reshapes the data into the form:

(n_seq,1, n_steps, n_features)

This four-dimensional array is the input shape that the convolutional LSTM network expects.

Step 3 | Prepare Data:

Accessing the data is only half of the challenge. The other half is putting the data into the correct format and splitting it into training and testing datasets.

def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # The window covers rows i .. end_ix-1; row end_ix is its target
        end_ix = i + n_steps
        # Stop once there is no row left to serve as the target
        if end_ix > len(sequences)-1:
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

This function turns the price table into supervised time series samples: each input is a window of n_steps consecutive rows, and its target is the row that immediately follows the window.
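To make the windowing concrete, here is a small pure-Python sketch on toy data. The helper name sliding_windows and the five made-up (open, close) rows are assumptions for illustration; the logic mirrors split_sequences without NumPy.

```python
def sliding_windows(rows, n_steps):
    """Pair each window of n_steps rows with the row that follows it."""
    X, y = [], []
    for start in range(len(rows) - n_steps):
        end = start + n_steps
        X.append(rows[start:end])  # n_steps consecutive rows as input
        y.append(rows[end])        # the next row as the target
    return X, y

# Toy data: five days of (open, close) prices
rows = [(10, 11), (11, 12), (12, 13), (13, 14), (14, 15)]
X, y = sliding_windows(rows, n_steps=3)

print(len(X))  # 2
print(X[0])    # [(10, 11), (11, 12), (12, 13)]
print(y[0])    # (13, 14)
```

Five rows with a window of three yield only two samples, because the last window needs one more row to act as its target.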

def normalize_data(dataset):
    # Work on a copy so the caller's DataFrame is left untouched
    dataset = dataset.copy()
    # Replace the column names with integer positions 0, 1, 2, ...
    dataset.columns = list(range(len(dataset.columns)))
    dataset = dataset.astype('float32')
    minmax = list()
    for column in dataset:
        value_min = dataset[column].min()
        value_max = dataset[column].max()
        # Remember the bounds so predictions can be converted back to prices
        minmax.append([value_min, value_max])
        # Min-max scaling: (value - min) / (max - min)
        dataset[column] = (dataset[column] - value_min) / (value_max - value_min)
    return dataset,minmax

This function rescales every column to values between 0 and 1. This matters because many stocks have skyrocketed or nosedived, so raw prices span very different ranges. Without normalization, data points with larger values would dominate training, which could create a blind spot and hurt predictions. The normalization is computed as:

value = (value - minimum) / (maximum - minimum)

Here minimum and maximum are the smallest and largest values of the feature.
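As a worked example, the sketch below scales four made-up closing prices and then inverts the scaling. The helper names min_max_scale and min_max_unscale are hypothetical; the arithmetic is the same min-max formula that normalize_data applies per column.

```python
def min_max_scale(values):
    """Scale values to [0, 1] and return the bounds needed to undo it."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, (lo, hi)

def min_max_unscale(scaled, bounds):
    """Inverse of min_max_scale."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]

closes = [100.0, 125.0, 150.0, 200.0]  # made-up prices
scaled, bounds = min_max_scale(closes)
restored = min_max_unscale(scaled, bounds)

print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(restored)  # [100.0, 125.0, 150.0, 200.0]
```

Keeping the bounds around is what later allows a normalized prediction to be turned back into a price.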

def enviroment_setup(X,y):
    # Hold out 33% of the samples for testing; train_test_split shuffles by default
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    return X_train, X_test, y_train, y_test

This function uses sklearn's train_test_split function to shuffle the samples and divide them into training and testing datasets, with 33% held out for testing.
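For readers who want to see what a shuffled hold-out split does, here is a rough pure-Python sketch. It is not sklearn's implementation: shuffled_split is a hypothetical helper, and sklearn's own shuffling and rounding may differ.

```python
import random

def shuffled_split(samples, test_size=0.33, seed=0):
    """Shuffle a copy of the samples, then hold out roughly test_size of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = round(len(shuffled) * test_size)  # assumption: simple rounding
    return shuffled[n_test:], shuffled[:n_test]

train, test = shuffled_split(range(9))
print(len(train), len(test))  # 6 3
```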

Step 4 | Create Neural Network:

def initialize_network(n_steps,n_features,optimizer):
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(2))
    model.compile(optimizer=optimizer, loss='mse')
    return model

This is the basic architecture of the convolutional LSTM network. The optimizer I found to work best with this network is Adam.
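It can help to trace how the data shape changes from layer to layer. The sketch below is a hand calculation rather than a call into Keras: layer_output_shapes is a hypothetical helper, it omits the batch axis, and it assumes the default 'valid' padding and a pooling stride equal to the pool size.

```python
def layer_output_shapes(n_steps, n_features, filters=64, kernel_size=1,
                        pool_size=2, lstm_units=50):
    """Trace per-sample shapes through the network, batch axis omitted."""
    n_seq = 1                             # data_setup feeds one sub-sequence per sample
    conv_len = n_steps - kernel_size + 1  # 'valid' convolution
    pool_len = conv_len // pool_size      # non-overlapping pooling windows
    return [
        ("input", (n_seq, n_steps, n_features)),
        ("conv1d", (n_seq, conv_len, filters)),
        ("max_pooling1d", (n_seq, pool_len, filters)),
        ("flatten", (n_seq, pool_len * filters)),
        ("lstm", (lstm_units,)),
        ("dense", (2,)),                  # predicted open and close
    ]

for name, shape in layer_output_shapes(n_steps=10, n_features=4):
    print(name, shape)
```

With a kernel size of 1 the convolution does not shorten the window, so the pooling layer is what halves the number of time steps before the LSTM sees them.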

Step 5 | Train Neural Network:

def train_model(X_train,y_train,model,epochs):
    # Placeholder: replace with the directory where the model should be stored
    dirx = 'something directory'
    os.chdir(dirx)
    h5='Stocks'+'_best_model'+'.h5'
    # Save the weights whenever the validation loss reaches a new minimum
    checkpoint = callbacks.ModelCheckpoint(h5, monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=True, mode='auto')
    # Stop once the validation loss has not improved for a quarter of the epochs
    earlystop = callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=epochs//4, verbose=0, mode='auto', baseline=None, restore_best_weights=True)
    callback = [earlystop,checkpoint]
    # Store the architecture as JSON; named json_path so the json module is not shadowed
    json_path = 'Stocks'+'_best_model'+'.json'
    model_json = model.to_json()
    with open(json_path, "w") as json_file:
        json_file.write(model_json)
    history = model.fit(X_train, y_train, epochs=epochs, batch_size=len(X_train)//4, verbose=2,validation_split = 0.3, callbacks = callback)
    return history

For the training function, I used the criminally underused ModelCheckpoint callback to save the best weights of the model. Change the dirx variable to the directory where you want to store your model.
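The interplay of the two callbacks can be illustrated without Keras. The following is a simplified model of their behavior, not Keras's own code: simulate_callbacks is a hypothetical helper, the validation losses are made up, and min_delta is taken to be 0 as in the training function.

```python
def simulate_callbacks(val_losses, patience):
    """Return (epoch of best weights, epoch at which training stops)."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New minimum: this is where the checkpoint would save weights
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: stop early
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

val_losses = [0.9, 0.7, 0.8, 0.75, 0.72, 0.6]  # made-up values
print(simulate_callbacks(val_losses, patience=3))  # (1, 4)
```

In this run training stops at epoch 4, before the lower loss at epoch 5 is ever reached, which shows why the patience value deserves some care.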

Step 6 | Evaluation and Prediction:

def load_keras_model(dataset,model,loss,optimizer):
    dirx = 'something directory'
    os.chdir(dirx)
    # Rebuild the architecture from JSON, then load the best weights saved by the checkpoint
    with open(dataset+'_best_model'+'.json', 'r') as json_file:
        loaded_model_json = json_file.read()
    model = model_from_json(loaded_model_json)
    model.compile(optimizer=optimizer, loss=loss, metrics = None)
    model.load_weights(dataset+'_best_model'+'.h5')
    return model

def evaluation(exe_time,X_test, y_test,X_train, y_train,history,model,optimizer,loss):
    model = load_keras_model('Stocks',model,loss,optimizer)
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    train_loss = model.evaluate(X_train, y_train, verbose=0)
    # Turn each loss on normalized prices into a percentage score: 100 - loss*100
    eval_test_loss = round(100-(test_loss*100),1)
    eval_train_loss = round(100-(train_loss*100),1)
    eval_average_loss = round((eval_test_loss + eval_train_loss)/2,1)
    print("--- Training Report ---")
    # plot_loss is not defined in this article; it is assumed to plot the loss curves stored in history
    plot_loss(history)
    print('Execution time: ',round(exe_time,2),'s')
    print('Testing Accuracy:',eval_test_loss,'%')
    print('Training Accuracy:',eval_train_loss,'%')
    print('Average Network Accuracy:',eval_average_loss,'%')
    return model,eval_test_loss

After saving the best weights, the program loads the model again to make sure it uses those weights. It then evaluates the model on the test data, which it has not seen during training. Finally, it prints a set of values that summarize how training went. Note that the reported accuracy is 100 minus the loss times 100, so it is a score derived from the loss rather than a classification accuracy.

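def market_predict(model,minmax,seq_len,n_features,n_steps,data,test_loss):
    # Use the most recent window as the model input
    pred_data = data[-1].reshape((len(data[-1]),1, n_steps, n_features))
    pred = model.predict(pred_data)[0]
    appro_loss = list()
    for i in range(len(pred)):
        # Undo the min-max scaling to get a price in USD
        pred[i] = pred[i] * (minmax[i][1] - minmax[i][0]) + minmax[i][0]
        # test_loss is the percentage score from evaluation(); convert it back to a loss
        # and scale it to the price range to get an error margin in USD
        appro_loss.append(((100-test_loss)/100) * (minmax[i][1] - minmax[i][0]))
    return pred,appro_loss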

This function makes the program's prediction. It applies the inverse of the normalization to express the predicted prices in USD, and it scales the test loss by each feature's price range to get an approximate error margin in USD.
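A quick numeric example shows how the error margin comes about. The function name error_margin and the numbers are made up for illustration; the formula is the one used for appro_loss in market_predict.

```python
def error_margin(score_percent, value_min, value_max):
    """Convert the percentage score back to a loss and scale it to the price range."""
    loss = (100 - score_percent) / 100     # undo the 100 - loss*100 conversion
    return loss * (value_max - value_min)  # express the loss in price units

# Made-up numbers: a 99.5% score on a stock that ranged from 100 to 300 USD
margin = error_margin(score_percent=99.5, value_min=100.0, value_max=300.0)
print(round(margin, 2))  # 1.0
```

Under these assumptions the program would treat prices within about 1 USD of the prediction as the expected range.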

Step 7 | Create Order:

BASE_URL = 'https://paper-api.alpaca.markets'
API_KEY = 'XXXXXXXX'
SECRET_KEY = 'XXXXXXXX'
ORDERS_URL = '{}/v2/orders'.format(BASE_URL)
HEADERS = {'APCA-API-KEY-ID':API_KEY,'APCA-API-SECRET-KEY':SECRET_KEY}

These are the basic parameters and endpoints for placing Alpaca orders. The base URL points to Alpaca's paper trading environment, so no real money is involved. You can generate your own API key and secret key in your Alpaca account.

def create_order(pred_price,company,test_loss,appro_loss):
    # Convert to plain floats so the values can be serialized to JSON
    open_price,close_price = float(pred_price[0]),float(pred_price[1])
    # appro_loss holds one error margin per predicted value; index 1 belongs to the close
    margin = float(appro_loss[1])
    if open_price > close_price:
        side = 'sell'
        take_profit,stop_loss = close_price - margin,close_price + margin
    elif open_price < close_price:
        side = 'buy'
        take_profit,stop_loss = close_price + margin,close_price - margin
    else:
        # No predicted move, so there is nothing to trade
        return None
    order = {
        'symbol':company,
        'qty':round(20*(test_loss/100)),
        'side':side,
        'type':'market',
        'time_in_force':'day',
        # Alpaca attaches exit prices through a bracket order with nested price fields
        'order_class':'bracket',
        'take_profit':{'limit_price':round(take_profit,2)},
        'stop_loss':{'stop_price':round(stop_loss,2)}
            }
    r = requests.post(ORDERS_URL, json = order,headers = HEADERS)
    print(r.content)
    return r

This function applies the take-profit and stop-loss idea:

Take profit and limit losses as the close price fluctuates around the predicted closing price.

In the original figure, the distance between the center and each border of the red area is the loss value. The borders act as the stop-loss and take-profit values, since they mark the range within which the program predicts the price will fluctuate.
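The band around the predicted close can be written as a small helper. This is a sketch under stated assumptions: exit_levels is a hypothetical name, the prices are made up, and the margin is taken to be the error margin in USD described above.

```python
def exit_levels(side, pred_close, margin):
    """Return (take_profit, stop_loss) as a band around the predicted close."""
    if side == "buy":
        # Long position: profit above the prediction, loss below it
        return pred_close + margin, pred_close - margin
    if side == "sell":
        # Short position: profit below the prediction, loss above it
        return pred_close - margin, pred_close + margin
    raise ValueError("side must be 'buy' or 'sell'")

print(exit_levels("buy", 105.0, 1.5))   # (106.5, 103.5)
print(exit_levels("sell", 105.0, 1.5))  # (103.5, 106.5)
```

For a short position the two levels swap, because a falling price is the profitable direction.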

Conclusion:

Machine learning and stock trading go hand in hand, as both are about predicting complex patterns.

I hope that more people will use the Alpaca API and confidence intervals when it comes to algorithmic trading.

Thank you for reading my article!

Translated from: https://medium.com/analytics-vidhya/using-deep-learning-to-create-a-stock-trading-bot-a96e6351d31c