t9_Creating a Backtester in Python_hdf_h5py_tables_dataframe_PostgreSQL_thread_no attribute ‘cursor‘

     By now, we know how to implement a trading strategy idea. We learned how to write the code to make it run in a trading system. The final step before going live with a trading strategy is backtesting. Whether you want to be more confident in the performance of your strategy or you want to show your managers how well your trading idea performs, you will have to run a backtester on a large amount of historical data.

     In this chapter, you will learn how to create a backtester. You will improve your trading algorithm by running different scenarios with large amounts of data to validate the performance of your trading strategy. Once a model is implemented, it is necessary to test whether the trading robot behaves as expected in the trading infrastructure.

     In this chapter, we will learn how backtesting works, and then we will talk about the assumptions you will need to consider when creating a backtester. Finally, we will provide a backtester example by using a momentum trading strategy.

In this chapter, we will cover the following topics:

  • Learning how to build a backtester
  • Learning how to choose the correct assumptions
  • Evaluating what the value of time is
  • Backtesting the dual-moving average trading strategy

Learning how to build a backtester

     Backtesting is key in the creation of trading strategies. It assesses how profitable a trading strategy is by using historical data. It helps to optimize the strategy by running simulations that show risk and profitability before any capital is put at risk. If the backtest returns good results (high profits with reasonable risk), it makes a case for taking the strategy live. If the results are not satisfactory, the backtester can help to find the issues.

     Trading strategies define rules for entry and exit into a portfolio of assets. Backtesting helps us to decide whether it is worth going live with these trading rules. It provides us with an idea of how a strategy might have performed in the past. The ultimate goal is to filter out bad strategy rules before we allocate any real capital.

     Backtesting simulates a run of a trading strategy using past market data. Most of the time, we consider a backtester to be a model of reality. We make assumptions based on experience. But if the model is not close enough to reality, the trading strategies will end up not performing as well as expected, which will result in financial losses.

     The first part we will cover in this chapter is getting the data. The data will be stored in many different forms and, depending on them, we will need to adapt our backtester.

     Backtesters use data heavily. In trading, getting 1 terabyte of data a day is pretty common, and just reading this amount of data from a hard disk takes a long time. If you are looking for a specific range of dates or for specific symbols, it is very important to have an efficient index on the dates, the symbols, or other attributes. Financial data consists of values associated with a particular time, called time series. Regular relational databases are not efficient at reading these time series. We will review a few ways to handle time series.

In-sample versus out-of-sample data

     When building a statistical model, we use cross-validation to avoid overfitting. Cross-validation imposes a division of data into two or three different sets. One set will be used to create your model, while the other sets will be used to validate the model's accuracy. Because the model has not been created with the other datasets, we will have a better idea of its performance.

     When testing a trading strategy with historical data, it is important to set aside a portion of the data for testing. In a statistical model, the initial data used to create the model is called training data. For a trading strategy, we call it the in-sample data. The testing data is called out-of-sample data. As with cross-validation, this provides a way to test the performance of a trading strategy by resembling real-life trading as far as possible, because the test runs on new data.

     The following diagram represents how we divide the historical data into two different sets. We will build our trading strategy using the in-sample data. Then, we will use this model to validate our model with the out-of-sample data:
[Figure: historical data divided into in-sample and out-of-sample sets]
     When we build a trading strategy, it is important to set aside between 70% and 80% of the data to build the model (in-sample data). Once the trading model is built, its performance is tested on the out-of-sample data (the remaining 20-30% of the data).
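The split described above can be sketched in a few lines. This is only an illustration: the function name split_in_out_of_sample and the price list are hypothetical, and the key assumption is that the data is ordered by time and is never shuffled, so the out-of-sample part always comes after the in-sample part.

```python
def split_in_out_of_sample(prices, in_sample_fraction=0.8):
    """Split a chronologically ordered series without shuffling it."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be between 0 and 1")
    cut = int(len(prices) * in_sample_fraction)
    # Everything before the cut builds the model; the rest validates it.
    return prices[:cut], prices[cut:]


# Hypothetical daily closing prices, oldest first.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 104.0, 102.7, 105.2, 106.0, 104.9]
in_sample, out_of_sample = split_in_out_of_sample(prices, 0.8)
print(len(in_sample), len(out_of_sample))
```

The same slicing works on a pandas data frame with goog_data.iloc[:cut] and goog_data.iloc[cut:].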

Paper trading (forward testing)

     Paper trading (also known as forward performance testing) is the final step of the testing phase. We plug the trading strategy into the real-time environment of our system and we send fake orders. After a day of trading, we have the logs of all the orders and compare them to what they were supposed to be. This step is useful because it allows us to test the strategy while using the entire trading system.

      This phase is a last test of the trading strategy before investing real money. Its benefits are the absence of any financial risk, and the fact that the trading strategy creator can acquire confidence and practice in a stress-free environment while building new datasets that will be used for further analysis. Unfortunately, performance obtained by paper trading is not directly correlated to the market. It is difficult to know whether an order would have been filled, and at what price. Indeed, during a period of high market volatility, most orders can be rejected. Additionally, orders could be filled at a worse price (negative slippage).

Naive data storage

     One of the most intuitive ways to store data is to use a flat file on the hard disk. The problem with this approach is that the hard disk will need to traverse a vast area to get to the part of the file corresponding to the data you would like to use for your backtesting. Having indexes can help enormously in looking up the correct segment to read.
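To illustrate why an index helps, here is a minimal sketch that keeps a sorted list of dates next to the rows and uses binary search to find a date range instead of scanning every row. The rows and the helper rows_between are hypothetical; a real flat file would store byte offsets rather than in-memory rows.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical rows sorted by date: (date, close).
rows = [
    (date(2016, 11, 7), 782.52),
    (date(2016, 11, 8), 790.51),
    (date(2016, 11, 9), 785.31),
    (date(2016, 11, 10), 762.56),
]
# The index: one sorted key per row.
date_index = [d for d, _ in rows]


def rows_between(start, end):
    """Return the rows whose date lies in [start, end] using binary search."""
    lo = bisect_left(date_index, start)
    hi = bisect_right(date_index, end)
    return rows[lo:hi]


selected = rows_between(date(2016, 11, 8), date(2016, 11, 9))
print(selected)
```

The lookup cost grows with the logarithm of the number of rows, whereas a scan grows linearly.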

HDF5 file

     The Hierarchical Data Format (HDF) is a file format designed to store and manage large amounts of data. It was designed in the 90s at the National Center for Supercomputing Applications (NCSA), and then NASA decided to use this format. Portability and efficiency for time series storage were key in the design of this format. The trading world rapidly adopted this format, in particular, High-Frequency Trading (HFT) firms, hedge funds, and investment banks. These financial firms rely on gigantic amounts of data for backtesting, trading, and any other kinds of analysis.

     This format allows HDF users in finance to handle very large datasets, to obtain access to a whole section or a subsection of the tick data. Additionally, since it is a free format, the number of open source tools is significant.

The hierarchical structure of HDF5 uses two major types of objects:

  • Datasets: Multidimensional arrays of a given type
  • Groups: Container of other groups and/or datasets

The following diagram shows the hierarchical structure of the HDF5:
[Figure: hierarchical structure of an HDF5 file, with groups containing other groups and datasets]
     To get the dataset's content, we can access it like a regular file using the POSIX syntax /path/file . The metadata is also stored in groups and datasets. The HDF5 format uses B-trees to index datasets, which makes it a good storage format for time series, especially financial asset price series.

     In the code, we will describe an example of how to use an HDF5 file in Python. We will use the load_financial_data function we used in this book to get the GOOG prices. We store the data frame in an HDF5 file under the key goog_data. Then, we use the h5py library to read this file and its attributes, and we print the data content of the file.

     This code gets the GOOG financial data and stores it in the data frame goog_data. First, install the required libraries:

pip install --user h5py==3.1.0


pip install tables

hd5pandareader.py

#!/bin/python3
import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt
import h5py

def load_financial_data( start_date, end_date, output_file ):
    try:
        df = pd.read_pickle( output_file )
        print( 'File data found...reading GOOG data')
    except FileNotFoundError:
        print('File not found...downloading the GOOG data')
        df = data.DataReader( 'GOOG', 'yahoo', start_date, end_date )
        df.to_pickle( output_file )
    return df

goog_data = load_financial_data( start_date='2014-01-01',
                                 end_date = '2018-01-01',
                                 output_file='goog_data_t9.pkl'
                               )
goog_data.head()


 In this part of the code, we store the data frame goog_data in the file goog_data_t9.h5:

goog_data.to_hdf( 'goog_data_t9.h5',# File path or HDFStore object.
                  key='goog_data',  # Identifier for the group in the store.
                  mode='w',
                  format='table',
                  data_columns=True
                )
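Note: if to_hdf emits a NaturalNameWarning, it is because the column name 'Adj Close' contains a space.

 Solution: rename 'Adj Close' to 'Adj_Close' before writing.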

h = h5py.File('goog_data_t9.h5')

print(h['goog_data']['table'])

 

print(h['goog_data']['table'][:])

for attributes in h['goog_data']['table'].attrs.items():
    print(attributes)


pd.read_hdf('goog_data_t9.h5','goog_data')


     Despite being portable and open source, the HDF5 file format has some important caveats:

  • The likelihood of getting corrupted data is high. When the software handling the HDF5 file crashes, it is possible to lose all the data located in the same file.
  • It has limited features. It is not possible to remove arrays.
  • It offers low performance. There is no use of operating system caching.

     Many financial companies still use this standardized file. It will remain on the market for a few years. Next, we will talk about the file storage alternative: databases.

Databases

     Databases are made to store data. Financial data is time series data, and most databases do not handle time series data in the most efficient way. The biggest challenge associated with storing time series data is scalability, because large volumes of data arrive at a high rate. We have two main groups of databases: relational and non-relational databases.

Relational databases

     Relational databases have tables that can be written and accessed in many different ways without having the need to reorganize the database structure. They usually use Structured Query Language (SQL). The most widely used databases are Microsoft SQL Server, PostgreSQL, MySQL, and Oracle.

     Python has many libraries capable of using any of these databases. We will use PostgreSQL as an example. The psycopg2 library, a PostgreSQL adapter for Python, lets Python code run SQL queries:

1. We will use the GOOG data prices to create the database for GOOG data: 

goog_data.head(n=10)

2. To create a table in SQL, we will use the following command. You will need to install PostgreSQL on your machine first:
https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

https://get.enterprisedb.com/postgresql/postgresql-10.19-1-windows-x64.exe

 Next, install the psycopg2 driver:

pip install psycopg2

Then, run the following script to create the table:

#!/usr/bin/python3 -u
"""OHLC data feed."""
import cgitb
import psycopg2

# Connect to PostgreSQL
conn = psycopg2.connect(dbname='goog_stock', user='postgres', password='LlQ54951')  # set the appropriate credentials
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE "GOOG"
    (
        dt timestamp without time zone NOT NULL,
        high numeric NOT NULL,
        low numeric NOT NULL,
        open numeric NOT NULL,
        close numeric NOT NULL,
        volume numeric NOT NULL,
        adj_close numeric NOT NULL,
        CONSTRAINT "GOOG_pkey" PRIMARY KEY (dt)
    );
""")
conn.commit()
conn.close()


     This command will create a SQL table named GOOG . The primary key of this table will be the timestamp, dt .

goog_data=goog_data.reset_index() # df.set_index()
goog_data


goog_data.dtypes


 Note:

  • Unquoted column names in PostgreSQL cannot contain spaces, so a name such as 'Adj Close' has to be changed.
  • The column names of the goog_data DataFrame differ from the column names of the "GOOG" table we just created, so we rename them to match:
goog_data.columns = ['dt','high','low','open','close','volume','adj_close']
goog_data

https://docs.sqlalchemy.org/en/14/core/engines.html#postgresql

pip install sqlalchemy

Restart your Jupyter kernel after installing SQLAlchemy. Otherwise, you may see one of the following errors:
AttributeError: 'Engine' object has no attribute 'cursor'
or:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?

# If you see AttributeError: 'Engine' object has no attribute 'cursor',
# restart the Jupyter notebook kernel.
from sqlalchemy import create_engine
                                    # username:password
engine = create_engine(r'postgresql://postgres:LlQ54951@localhost:5432/goog_stock')
goog_data.to_sql(name='GOOG', con=engine, if_exists='append', index=False) # schema='public'

 https://docs.sqlalchemy.org/en/14/dialects/postgresql.html

goog_db.py 
https://www.psycopg.org/docs/module.html

#!/usr/bin/python3 -u
"""OHLC data feed."""
import cgitb
import psycopg2

# Connect to PostgreSQL
conn = psycopg2.connect(dbname='goog_stock', user='postgres', password='LlQ54951', port='5432')  # set the appropriate credentials
cursor = conn.cursor()

SQL = '''
        SELECT dt,high,low,open,close,volume, adj_close
        FROM "GOOG"  
        WHERE dt BETWEEN '2016-11-08' AND '2016-11-09'  
        ORDER BY dt  
        LIMIT 100;
      '''

def query_ticks(date_from=None, date_to=None, period=None, limit=None):
    """Dummy arguments for now.  Return OHLC result set."""
    cursor.execute(SQL)
    ohlc_result_set = cursor.fetchall()

    return ohlc_result_set

def format_as_csv(ohlc_data, header=False):
    """Dummy header argument.  Return CSV data."""
    # Header must match the seven columns selected by the query, in order.
    csv_data = 'dt,high,low,open,close,volume,adj_close\n'

    for row in ohlc_data:
        csv_data += ('%s, %s, %s, %s, %s, %s, %s\n' %
                     (row[0], row[1], row[2], row[3], row[4], row[5], row[6])
                    )

    return csv_data

if __name__ == '__main__':
    cgitb.enable()
    
    ohlc_result_set = query_ticks()
#   # ['dt','high','low','open','close','volume','adj_close']    
#     print(ohlc_result_set)
#     [( datetime.datetime(2016, 11, 8, 0, 0), 
#        Decimal('795.6329956054688'), 
#        Decimal('780.1900024414062'), 
#        Decimal('783.4000244140625'), 
#        Decimal('790.510009765625'), 
#        Decimal('1366900.0'),
#        Decimal('790.510009765625')
#      ), 
#      ( datetime.datetime(2016, 11, 9, 0, 0),
#        Decimal('791.2269897460938'),
#        Decimal('771.6699829101562'),
#        Decimal('779.9400024414062'),
#        Decimal('785.3099975585938'),
#        Decimal('2607100.0'),
#        Decimal('785.3099975585938')
#      )
#     ]
    csv_data = format_as_csv(ohlc_result_set)

    print('Content-Type: text/plain; charset=utf-8\n')
    print(csv_data)

    cursor.close()
    conn.close()

https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html#pandas.read_sql

import pandas as pd
import psycopg2

# Connect to PostgreSQL
conn = psycopg2.connect(dbname='goog_stock', user='postgres', password='LlQ54951', port='5432')  # set the appropriate credentials
cursor = conn.cursor()

SQL = '''
        SELECT dt,high,low,open,close,volume, adj_close
        FROM "GOOG"  
        WHERE dt BETWEEN '2016-11-08' AND '2016-11-09'  
        ORDER BY dt  
        LIMIT 100;
      '''
pd.read_sql( SQL,
             conn,
             parse_dates=["dt"]
           )
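The queries above embed the date range directly in the SQL string. A common alternative is to pass the dates as query parameters. The sketch below shows the idea with Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server; the table name goog, the reduced column set, and the sample rows are assumptions made for the illustration. Note that sqlite3 uses ? as the placeholder, whereas psycopg2 uses %s.

```python
import sqlite3

# In-memory database used as a stand-in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE goog (
        dt TEXT PRIMARY KEY,   -- ISO dates sort correctly as text
        close REAL NOT NULL
    )
""")

# Hypothetical sample rows.
rows = [
    ("2016-11-07", 782.52),
    ("2016-11-08", 790.51),
    ("2016-11-09", 785.31),
    ("2016-11-10", 762.56),
]
conn.executemany("INSERT INTO goog (dt, close) VALUES (?, ?)", rows)
conn.commit()

# The date range is passed as parameters instead of being pasted into the SQL.
query = "SELECT dt, close FROM goog WHERE dt BETWEEN ? AND ? ORDER BY dt"
result = conn.execute(query, ("2016-11-08", "2016-11-09")).fetchall()
print(result)
conn.close()
```

With parameters, the dummy date_from and date_to arguments of a function such as query_ticks could be wired into the query without string formatting.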

 

     The main issue with relational databases is speed. They are not designed to work with large amounts of data indexed by time. To speed things up, we will need to use non-relational databases.

Non-relational databases

     Non-relational databases are very widespread. Because data is increasingly based on time series, this type of database has developed rapidly during the last decade. In finance, one of the best-known databases for time series is KDB, which is designed for high performance with time series. There are many other competitors, including InfluxDB, MongoDB, Cassandra, TimescaleDB (which is built on PostgreSQL), OpenTSDB, and Graphite.

All of these databases have their pros and cons:
[Table: comparison of the pros and cons of time series databases]

     As shown in the table, it is difficult to choose an alternative to KDB. 

Learning how to choose the correct assumptions

     Backtesting is a required step for deploying trading strategies. We use the historical data stored in databases to reproduce the behavior of the trading strategy. The fundamental assumption is that any methodology that functioned in the past is probably going to function in the future. Any strategies that performed ineffectively in the past are probably going to perform inadequately in the future. This section investigates what applications are utilized in backtesting, what sort of information is obtained, and how to utilize them.

      A backtester can be a for-loop or event-driven backtester system. It is always important to consider how much time you will spend in order to achieve higher accuracy. It is impossible to obtain a model corresponding to reality; a backtester will just be a model of reality. However, there are rules to be followed in order to be as close as possible to the real market:

  • Training/testing data: As with any model, you should not test your model with the data you used to create it. You need to validate your model on unseen data to limit overfitting. When we use machine learning techniques, it is easy to overfit a model; that is why it is essential to use cross-validation to get a realistic estimate of your model's accuracy.
  • Survivorship-bias free data: If your strategy is a long-term position strategy, it is important to use survivorship-bias free data. This will prevent you from focusing on winners alone without considering the losers.
  • Look-ahead data: When you build a strategy, you should not look ahead to make a trading decision. It is easy to make this mistake by using numbers calculated from the whole sample. For example, an average computed over all the data includes prices you would not have had at decision time; the average should be calculated using only the prices received before placing the order.
  • Market change regime: The parameters of the distributions used to model stocks are not constant in time because the market changes regime.
  • Transaction costs: It is important to consider the transaction costs of your trading. They are very easy to forget, and ignoring them can turn a strategy that is profitable in a backtest into one that loses money on the real market.
  • Data quality/source: Since there are many financial data sources, data composition differs a lot. For instance, OHLC data from Google Finance is an aggregation of many exchange feeds. It will be difficult to obtain the same highs and lows with your trading system. To have a match between your model and reality, the data you use for backtesting must be as close as possible to the data you will trade on.
  • Money constraint: Always consider that the amount of money you trade is not infinite. Additionally, if you use a credit/margin account, the positions you can take will be limited.
  • Average daily volume (ADV): The average number of shares traded over a day for a given ticker. The quantity of shares you choose to trade should be based on this number so as to avoid any impact on the market.
  • Benchmark testing: In order to test the performance of your trading strategy, you will compare it against another type of strategy or just against the return of some indexes.
    If you trade futures, do not test against the S&P 500.
    If you trade in airlines, you should check whether the airline industry as a whole performs better than your model.
  • Initial condition assumption: In order to have a robust way of making money, your results should not depend on the day or the month you start your backtesting. More generally, you should not assume that the initial condition is always the same.
  • Psychology: Even if we are building a trading robot, when we trade for real, there is always a way to override what the algorithm is doing. Based on the backtest, a trading strategy can go through a large drawdown and still bring in a lot of profit a few days later, provided we maintain the position. A computer has no problem taking that risk, but it is more difficult for a human. Therefore, psychology can play a large role in the performance of a strategy.
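The look-ahead rule in the list above is easy to break by accident, so here is a minimal sketch of the difference. The helper trailing_average is hypothetical: at each index it averages only the prices strictly before that index, whereas the full-sample mean uses prices that were not yet known.

```python
def trailing_average(prices, window):
    """Average of the `window` prices strictly before each index.

    Returns None while there is not enough history.
    """
    averages = []
    for i in range(len(prices)):
        if i < window:
            averages.append(None)
        else:
            past = prices[i - window:i]  # excludes prices[i] and anything later
            averages.append(sum(past) / window)
    return averages


prices = [10.0, 11.0, 12.0, 13.0, 14.0]

# Look-ahead bias: this number depends on prices from the future.
full_sample_mean = sum(prices) / len(prices)

# No look-ahead: each value depends only on earlier prices.
no_look_ahead = trailing_average(prices, 2)
print(full_sample_mean, no_look_ahead)
```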

     On top of the prior rules, we will need to assume how we expect the market to behave. When you present a trading strategy to anyone, it is important to specify what these assumptions are.

     One of the first assumptions you need to consider is the fill ratio. When we place an order, the chance of getting it executed varies with the type of strategy. If you trade with a high-frequency trading strategy, you may have 95% of your orders rejected. If you trade when there is important news on the market (such as FED announcements), you may have most of your orders rejected. Therefore, you will need to give a lot of thought to the fill ratio of your backtester.
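One simple way to express a fill ratio assumption in a backtester is to keep each order with a fixed probability. The sketch below is only one possible model, not a description of how a real market fills orders; the function simulate_fills, the order dictionaries, and the 5% ratio (taken from the 95% rejection example above) are assumptions for the illustration.

```python
import random


def simulate_fills(orders, fill_ratio, seed=42):
    """Keep each order with probability `fill_ratio` (a crude fill model)."""
    rng = random.Random(seed)  # fixed seed so that a backtest is repeatable
    return [order for order in orders if rng.random() < fill_ratio]


# Hypothetical orders produced by a strategy.
orders = [{"id": i, "side": "buy", "quantity": 10} for i in range(100)]

filled = simulate_fills(orders, fill_ratio=0.05)
print(len(filled), "orders filled out of", len(orders))
```

Running the backtest with several fill ratios shows how sensitive the profit and loss is to this assumption.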

     Another important consideration arises when you create a market making strategy. Unlike strategies that take liquidity from the market, a market making strategy adds liquidity. It is therefore important to make an assumption about when your order will be executed (or whether it will be executed at all). This assumption adds a condition to the backtester. We may need additional data, for instance, the trades that were done in the market at a given time. This information helps us decide whether a given market making order would have been executed.

     We can also add latency assumptions. A trading system relies on many components; each component has its own latency and adds more when communicating with the others. We can model the latency of any component of the trading system, the network latency, and the latency for an order to be executed and acknowledged.
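A latency assumption can be sketched by filling each order at the price observed a fixed number of ticks after the decision. The function apply_latency, the tick prices, and the two-tick delay are hypothetical; real latency varies from order to order and is usually measured in time rather than in ticks.

```python
def apply_latency(ticks, order_ticks, latency_ticks):
    """Return (decision_tick, fill_tick, fill_price) for each order.

    An order decided at tick i is filled at the price of tick i + latency_ticks.
    Orders whose fill would fall after the end of the data are dropped.
    """
    fills = []
    for i in order_ticks:
        j = i + latency_ticks
        if j < len(ticks):
            fills.append((i, j, ticks[j]))
    return fills


# Hypothetical tick prices and the ticks at which the strategy decided to trade.
ticks = [100.0, 100.5, 101.0, 100.8, 101.2]
fills = apply_latency(ticks, order_ticks=[0, 3], latency_ticks=2)
print(fills)
```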

     The list of assumptions can be pretty long but it will be very important to show these assumptions to explain how likely your trading strategy will perform on the real market.

Backtesting the dual-moving average trading strategy

     The dual-moving average trading strategy

  • places a buy order when the short moving average crosses above the long moving average (average(self.small_window) > average(self.large_window)), and
  • places a sell order when the cross happens in the other direction.

This section presents the backtesting implementation of the dual-moving average strategy. We will present the implementation of a for-loop backtester and an event-based backtester.

For-loop backtest systems

     The for-loop backtester is a very simple infrastructure. It reads price updates line by line and calculates more metrics out of those prices (such as the moving average at the close). It then makes a decision on the trading direction. The profit and loss is calculated and displayed at the end of this backtester. The design is very simple and can quickly discern whether a trading idea is feasible.

An algorithm to picture how this kind of backtester works is shown here:

for each tick coming to the system (price update) :
    create_metric_out_of_prices()
    buy_sell_or_hold_something()
    next_price()

     We will create a ForLoopBackTester class. This class will handle, line by line, all the prices of the data frame. We will need two windows (deques) capturing the prices to calculate the two moving averages. We will store the history of profit and loss, cash, and holdings to draw a chart to see how much money we will make.

import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt
import h5py
from collections import deque

def load_financial_data( start_date, end_date, output_file ):
    try:
        df = pd.read_pickle( output_file )
        print('File data found...reading GOOG data')
    except FileNotFoundError:
        print('File not found...downloading the GOOG data')
        df = data.DataReader('GOOG', 'yahoo', start_date, end_date)
        df.to_pickle( output_file )
    return df

goog_data = load_financial_data( start_date='2001-01-01',
                                 end_date='2018-01-01',
                                 output_file='goog_data.pkl'
                               )
goog_data


     The create_metrics_out_of_prices function calculates the long moving average (100 days) and the short moving average (50 days). When the short window moving average is higher than the long window moving average (average(self.small_window) > average(self.large_window)), we generate a long signal (self.long_signal=True).

The buy_sell_or_hold_something function will place orders.

  • The buy order will be placed when there is a long signal and we have a short position or no position.
    if self.long_signal and self.position<=0
  • The sell order will be placed when there is no long signal and we have a long position.
    elif self.position>0 and not self.long_signal
  • This function will keep track of the position, the holdings (self.holdings = self.position * price_update['price']), and the total of holdings and cash (self.total = (self.holdings + self.cash )).

Let's now define the ForLoopBackTester class as shown. This class will have the data structure to support the strategy in the constructor. We will store the historic values for profit and loss, cash, positions, and holdings. We will also keep the real-time profit and loss, cash, position, and holding values:

def average(lst):
    return sum(lst)/len(lst)

class ForLoopBackTester:
    def __init__(self):
        self.small_window=deque()
        self.large_window=deque()
        self.list_position=[]
        self.list_cash=[]
        self.list_holdings=[]
        self.list_total=[]
        
        self.long_signal=False
        self.position=0
        self.cash=10000
        self.total=0
        self.holdings=0
        
    def create_metrics_out_of_prices( self, price_update ):
        self.small_window.append( price_update['price'] )
        self.large_window.append( price_update['price'] )
        
        if len(self.small_window)>50:
            self.small_window.popleft()
            
        if len(self.large_window)>100:
            self.large_window.popleft()
        
        if len(self.small_window) == 50:
            # similar to short_move_average > long_move_average
            if average(self.small_window) > average(self.large_window):
                self.long_signal=True
            else:
                self.long_signal=False
            return True # if it is tradable
        return False
    
    def buy_sell_or_hold_something( self, price_update ):
        if self.long_signal and self.position<=0:
            print( str(price_update['date']) + 
                   " send buy order for 10 shares price=" + 
                   str(price_update['price'])
                 )
            self.position += 10
            self.cash -= 10*price_update['price']
        elif self.position>0 and not self.long_signal:
            print( str(price_update['date']) + 
                   " send sell order for 10 shares price=" + 
                   str(price_update['price'])
                 )
            self.position -= 10
            self.cash += 10*price_update['price']
            
        self.holdings = self.position * price_update['price']
        self.total = (self.holdings + self.cash )
        print( '%s total=%d, holding=%d, cash=%d' %
               ( str(price_update['date']), self.total, self.holdings, self.cash )
             )
        self.list_position.append( self.position )
        self.list_cash.append( self.cash )
        self.list_holdings.append( self.holdings )
        self.list_total.append( self.holdings+self.cash )
        
naive_backtester = ForLoopBackTester()
for line in zip( goog_data.index, goog_data['Adj Close'] ):
    date = line[0]
    price = line[1]
    price_information = {'date':date,
                         'price':float(price)
                        }
    is_tradable = naive_backtester.create_metrics_out_of_prices(price_information)
    if is_tradable:
        naive_backtester.buy_sell_or_hold_something( price_information )

 
... ...

fig = plt.figure( figsize=(10,6))
plt.plot( naive_backtester.list_total,
          label='Holdings+Cash using Naive BackTester'
        )
plt.legend()
plt.show()

     When we run the code, we will obtain the following curve. This curve shows that this strategy makes around a 50% return ((15000-10000)/10000) over the range of years used for the backtest. This result is obtained by assuming a perfect fill ratio. Additionally, we don't have any mechanism preventing drawdowns or large positions. This is the most optimistic approach when we study the performance of trading strategies:

[Figure: holdings plus cash over time for the naive (for-loop) backtester]
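The return quoted above is simple arithmetic on the first and last values of the equity curve, and the drawdown mentioned alongside it can be computed from the same series. The sketch below uses a hypothetical equity list; with the backtester above, naive_backtester.list_total could be passed instead. The function names are assumptions for the illustration.

```python
def total_return(equity_curve):
    """Return over the whole period, e.g. (15000 - 10000) / 10000 = 0.5."""
    return (equity_curve[-1] - equity_curve[0]) / equity_curve[0]


def max_drawdown(equity_curve):
    """Largest peak-to-trough loss, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical equity curve (holdings + cash) starting with 10000 in cash.
equity = [10000.0, 11000.0, 9900.0, 12000.0, 15000.0]
print(total_return(equity), max_drawdown(equity))
```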
     To gain more confidence in how the strategy will perform in the market, we need a backtester that takes into account the characteristics of the trading system (more generally, the specificities of the trading setup of the company where you work) and market assumptions. To make things closer to real-life scenarios, we will need to backtest the trading strategy by using most of the trading system components. Additionally, we will include the market assumptions in a market simulator.

Advantages

     The for-loop backtester is very simple to comprehend. It can be easily implemented in any programming language. The main functionality of this type of backtester is to read a file and calculate new metrics based on price alone. Complexity and the need for calculating power are very low. Therefore, execution does not take too long and it is quick to obtain results regarding the performance of the trading strategies.

Disadvantages

     The main weakness of the for-loop backtester is its accuracy in relation to the market. It neglects transaction costs, transaction time, the bid and offer prices, and volume. The likelihood of making a mistake by reading a value ahead of time (look-ahead bias) is also fairly high.

     Although a for-loop backtester is simplistic, it is easy to write, and we should still use it to eliminate low-performance strategies. If a strategy does not perform well with a for-loop backtester, it will most likely perform even worse on a more realistic backtester.
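
     To make the look-ahead bias concrete, here is a small sketch comparing a loop that accidentally reads the next price with one that only uses data available at decision time. The price series and both function names are invented for illustration; the trading rule is a toy rule, not the dual moving average strategy.

```python
prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]


def pnl_with_lookahead(prices):
    """Buggy: the decision at step i reads prices[i + 1], which is not known yet."""
    pnl = 0.0
    for i in range(len(prices) - 1):
        if prices[i + 1] > prices[i]:          # peeks at the future
            pnl += prices[i + 1] - prices[i]
    return pnl


def pnl_without_lookahead(prices):
    """The decision at step i uses prices up to i only, then holds for one step."""
    pnl = 0.0
    for i in range(1, len(prices) - 1):
        if prices[i] > prices[i - 1]:          # past and current data only
            pnl += prices[i + 1] - prices[i]
    return pnl


print(pnl_with_lookahead(prices))     # 10.0
print(pnl_without_lookahead(prices))  # -2.0
```

     The first version looks very profitable only because it trades on information it could not have had. An off-by-one index in a for-loop backtester is enough to produce this effect.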

     Since it is important to have a backtester that's as realistic as possible, we will learn how an event-driven backtester works in the following section.

Event-driven backtest systems

     An event-driven backtester uses almost all the components of the trading system. Most of the time, this type of backtester encompasses all the trading system components (such as the order manager system, the position manager, and the risk manager). Since more components are involved, the backtester is more realistic.

     The event-driven backtester is close to the trading system we implemented in Chapter 7, Building a Trading System in Python (https://blog.csdn.net/Linli522362242/article/details/122295708). We left the code of the TradingSimulation.py file empty. In this section, we will see how to write that missing code.

     We will have a loop calling all the components one by one. The components will read the input one after the other and will then generate events if needed. All these events will be inserted into a queue (we'll use the Python deque object). The events we encountered when we coded the trading system were the following:

  • Tick events – When we read a new line of market data
  • Book events – When the top of the book is modified
  • Signal events – When it is possible to go long or short
  • Order events – When orders are sent to the market
  • Market response events – When the market response comes to the trading system
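
     The way events flow through a queue can be sketched with a single deque. This is a simplified illustration, not the Chapter 7 code: the event dictionaries, the run function, and the toy rule that emits a buy signal when the bid rises are all assumptions made for the example.

```python
from collections import deque


def run(ticks):
    """Process ticks one by one; return the order in which events were handled."""
    events = deque()
    handled = []
    last_bid = None
    for price in ticks:
        # A tick event: a new line of market data arrives.
        events.append({'type': 'tick', 'price': price})
        # Drain every event this tick produced before reading the next tick.
        while events:
            event = events.popleft()
            handled.append(event['type'])
            if event['type'] == 'tick':
                # The tick changes the top of the book, which creates a book event.
                events.append({'type': 'book', 'bid_price': event['price']})
            elif event['type'] == 'book':
                # Toy rule (assumption): signal a buy when the bid moved up.
                if last_bid is not None and event['bid_price'] > last_bid:
                    events.append({'type': 'signal', 'side': 'buy'})
                last_bid = event['bid_price']
    return handled


print(run([10.0, 11.0]))  # ['tick', 'book', 'tick', 'book', 'signal']
```

     Because each event is only created after the one that caused it has been handled, a component never sees data from a later tick.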


 The pseudo code for an event-driven backtesting system is as follows: 

from chapter7.LiquidityProvider import LiquidityProvider
from chapter7.TradingStrategy import TradingStrategy
from chapter7.MarketSimulator import MarketSimulator
from chapter7.OrderManager import OrderManager
from chapter7.OrderBook import OrderBook
from collections import deque

def main() :
    lp_2_gateway = deque()
    ob_2_ts = deque()
    ts_2_om = deque()
    ms_2_om = deque()
    om_2_ts = deque()
    gw_2_om = deque()
    om_2_gw = deque()

    # LiquidityProvider == new order or updated order via lp_2_gateway ==> OrderBook.py
    # order{id,buy or sell, price, quantity, action}
    lp = LiquidityProvider(lp_2_gateway)
    # here, the lp_2_gateway is gw_2_ob
    # Liquidity(order) == gw_2_ob.popleft() ==> OrderBook(handle_new, modify,delete)
    # OrderBook : check_generate_top_of_book_event ==> book_event via ob_to_ts.popleft() ==> TradingStrategy
    ob = OrderBook(lp_2_gateway, ob_2_ts)
    # TradingStrategy( signal(the bid price(sell at the current bid price)
    #                             > 
    #                         the offer price(buy at the current ask price)
    #                        ), create_orders, execution ==> ts_2_om)
    # handle_input_from_bb checks whether there are book events in deque ob_2_ts
    # OrderManager == om_2_ts.popleft() ==> TradingStrategy(handle_response_from_om)
    ts = TradingStrategy(ob_2_ts, ts_2_om, om_2_ts)
    # OrderManager(order) <== gw_2_om/om_2_gw ==> Market(handle_order_from_gw)
    # collect the order from the gateway (the order manager) through the om_2_gw channel
    ms = MarketSimulator(om_2_gw, gw_2_om)
    # ts_2_om.popleft() ==> OrderManager == om_2_gw ==> Gateway Out ==> venue(Market)
    # gw_2_ms ==> Order_Manager == om_2_ts ==> TradingStrategy
    om = OrderManager(ts_2_om, om_2_ts, om_2_gw, gw_2_om)
    lp.read_tick_data_from_data_source()

    # keep calling the components in turn for as long as there is market data
    while len(lp_2_gateway) > 0:
        ob.handle_order_from_gateway()
        ts.handle_input_from_bb()
        om.handle_input_from_ts()
        ms.handle_order_from_gw()
        om.handle_input_from_market()
        ts.handle_response_from_om()
        lp.read_tick_data_from_data_source()

if __name__ == '__main__':
    main()

TradingStrategyDualMA.py

     We will code the TradingStrategyDualMA class, modeled on the TradingStrategy class that we coded in Chapter 7, Building a Trading System in Python. This class keeps track of two series of values: the values for paper trading and the values for backtesting:

# Python program to get average of a list
from collections import deque

def average(lst):
    return sum(lst)/len(lst)

class TradingStrategyDualMA:
    def __init__(self, ob_2_ts, ts_2_om, om_2_ts):
        self.orders = []
        self.order_id = 0
        
        self.current_bid = 0
        self.current_offer = 0
        self.ob_2_ts = ob_2_ts
        self.ts_2_om = ts_2_om
        self.om_2_ts = om_2_ts
        
        self.position = 0
        self.pnl = 0
        self.cash = 10000
        self.list_position=[]
        self.list_cash=[]
        self.list_holdings=[]
        self.list_total=[]        
        
        self.long_signal = False
        self.total = 0
        self.holdings = 0
        self.small_window = deque()
        self.large_window = deque()       

        self.paper_position = 0
        self.paper_pnl = 0
        self.paper_cash = 10000
        self.list_paper_position=[]
        self.list_paper_cash=[]
        self.list_paper_holdings=[]
        self.list_paper_total=[]
    
    # calculates the short and long moving averages    
    def create_metrics_out_of_prices( self, price_update ):
        self.small_window.append( price_update )
        self.large_window.append( price_update )
        
        if len(self.small_window)>50:
            self.small_window.popleft()
            
        if len(self.large_window)>100:
            self.large_window.popleft()
        
        if len(self.small_window) == 50:
            # similar to short_move_average > long_move_average
            if average(self.small_window) > average(self.large_window):
                self.long_signal=True
            else:
                self.long_signal=False
            return True # if it is tradable
        return False
    
    
    def create_order( self, book_event, quantity, side ):
        self.order_id += 1
        ord = {
            'id': self.order_id,
            'price': book_event['bid_price'],
            'quantity': quantity,
            'side': side,
            'action': 'to_be_sent'
        }
        self.orders.append( ord.copy() )
        
    def buy_sell_or_hold_something( self, book_event ):
        # Based on the signal, we will place an order and we will keep track of 
        # the paper trading position, cash, and profit and loss. 
        if self.long_signal and self.paper_position<=0 :
            # actual: min( book_event['bid_quantity'], book_event['offer_quantity'] )
            self.create_order( book_event, book_event['bid_quantity'], 'buy' ) # buy at the quoted bid price
            self.paper_position += book_event['bid_quantity']
            self.paper_cash -= book_event['bid_quantity'] * book_event['bid_price']
            
        elif not self.long_signal and self.paper_position>0 :
            self.create_order( book_event, book_event['bid_quantity'], 'sell' ) # sell at the quoted bid price
            self.paper_position -= book_event['bid_quantity'] 
            self.paper_cash += book_event['bid_quantity'] * book_event['bid_price']
            
        self.paper_holdings = self.paper_position * book_event['bid_price']
        self.paper_total = ( self.paper_holdings + self.paper_cash )
        
        self.list_paper_position.append( self.paper_position )
        self.list_paper_cash.append( self.paper_cash )
        self.list_paper_holdings.append( self.paper_holdings )
        self.list_paper_total.append( self.paper_holdings+self.paper_cash )
        
        # Trading( execution() ) using Event-Based BackTester
        # records the value of the backtested values of position, cash, and profit and loss
        self.list_position.append( self.position )
        self.holdings = self.position*book_event['bid_price']
        
        self.list_holdings.append( self.holdings )
        self.list_cash.append( self.cash )
        self.list_total.append( self.holdings+self.cash )

        
    def signal( self, book_event ):
        if book_event['bid_quantity'] != -1 and book_event['offer_quantity'] != -1:
            if self.create_metrics_out_of_prices( book_event['bid_price'] ): # if it is tradable
                self.buy_sell_or_hold_something( book_event )
    
    def execution( self ):
        orders_to_be_removed=[]
        for index, order in enumerate( self.orders ):
            if order['action'] == 'to_be_sent': # order from TradingStrategy to OrderManager
                # Send order = 'new'
                order['status'] = 'new'
                order['action'] = 'no_action'
                # ts_2_om : send orders from TradingStrategy to the OrderManager
                # then, the TradingStrategy will receive order updates from OrderManager
                if self.ts_2_om is None:
                    print( 'Simulation mode')
                else:
                    self.ts_2_om.append( order.copy() )
                    
            if order['status'] == 'rejected' or order['status'] == 'cancelled':
                orders_to_be_removed.append( index )
                
            if order['status'] == 'filled': # this order has been executed
                orders_to_be_removed.append( index )
                #                          go long                     go short
                pos = order['quantity'] if order['side'] == 'buy' else -order['quantity']
                self.position += pos
                self.pnl -= pos * order['price']
                self.cash -= pos * order['price']
                
                self.holdings = self.position * order['price']
            
        # remove finished orders after the loop, so the list is not modified while iterating
        for order_index in sorted( orders_to_be_removed, reverse=True ):
            del ( self.orders[order_index] )

    def handle_book_event( self, book_event ):
        if book_event is not None:
            self.current_bid = book_event['bid_price']
            self.current_offer = book_event['offer_price']
            
            self.signal( book_event )
            self.execution()
              
    # OrderBook(book_event) == ob_2_ts ==> TradingStrategy( signal, create_orders, execution ==> ts_2_om)
    # handle_input_from_bb checks whether there are book events in deque ob_2_ts
    def handle_input_from_bb( self, book_event=None ):
        # ob_2_ts : take the book events from OrderBook to the TradingStrategy
        if self.ob_2_ts is None:
            print( 'simulation mode' )
            self.handle_book_event( book_event )
        else:
            if len( self.ob_2_ts ) > 0:
                # handle_book_event returns None, so one call per book event is enough
                self.handle_book_event( self.ob_2_ts.popleft() )

    def lookup_orders( self, id ):
        count = 0
        for o in self.orders:
            if o['id'] == id:
                return o, count # count == the index of the order( o['id'] == id )
            count += 1
        return None, None
    
    def handle_market_response( self, order_execution ):
        print(order_execution)
        order, _ = self.lookup_orders( order_execution['id'] )
        if order is None:
            print( 'Error: the order was not found!')
            return
        # else
        order['status'] = order_execution['status'] # order updates : filled or rejected
        self.execution()
    
    # OrderManager == om_2_ts ==> TradingStrategy
    def handle_response_from_om( self ):
        # om_2_ts : the TradingStrategy receives order updates from OrderManager
        if self.om_2_ts is not None:
            self.handle_market_response( self.om_2_ts.popleft() )
        else:
            print( 'simulation mode' )
    
    def get_pnl( self ):        # unrealized profit
        return self.pnl + self.position * (self.current_bid + self.current_offer)/2

     In this section, we will create an EventBasedBackTester class. This class places a queue between the components of the trading system. As in our first Python trading system, the role of these queues is to pass events between two components. For instance, the gateway sends the market data to the book through a queue. Each tick (price update) is considered an event. The event we implemented in the book is triggered each time the top of the order book changes: the book then passes on a book event indicating the change. Each queue is implemented using the deque class from the collections module. All the trading components are linked to one another by these queues.

     The input for our system will be the Yahoo Finance data collected by the DataReader function of pandas_datareader. Because this data doesn't contain any orders, we will transform it with the process_data_from_yahoo function, which takes a price and converts it into a bid order and an ask order.

      The orders will be queued in the lp_2_gateway queue. Because we need to simulate the fact that these orders disappear after each iteration, we will also delete them. The process_events function ensures that all the events generated by a tick have been processed by calling the call_if_not_empty function. This function takes two arguments:

  • A queue: This queue is checked for emptiness. As long as it is not empty, the second argument is called.
  • A function: This is the reference to the function that is called when the queue is not empty.

from chapter7.LiquidityProvider import LiquidityProvider
from chapter7.MarketSimulator import MarketSimulator
from chapter7.OrderManager import OrderManager
from chapter7.OrderBook import OrderBook
from chapter9.TradingStrategyDualMA import TradingStrategyDualMA
from collections import deque

import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt

def call_if_not_empty(deq, fun):
    while (len(deq) > 0):
        fun()

class EventBasedBackTester:
    def __init__(self):
        self.lp_2_gateway = deque()
        self.ob_2_ts = deque()
        self.ts_2_om = deque()
        self.om_2_gw = deque()
        
        self.gw_2_om = deque()
        self.om_2_ts = deque()

        self.lp = LiquidityProvider( self.lp_2_gateway )
        self.ob = OrderBook( self.lp_2_gateway, self.ob_2_ts)
        self.ts = TradingStrategyDualMA( self.ob_2_ts, self.ts_2_om, self.om_2_ts )
        self.ms = MarketSimulator( self.om_2_gw, self.gw_2_om )
        self.om = OrderManager( self.ts_2_om, self.om_2_ts,\
                                self.om_2_gw, self.gw_2_om )

    def process_events(self):
        while len( self.lp_2_gateway )>0:
            call_if_not_empty( self.lp_2_gateway, self.ob.handle_order_from_gateway )
            call_if_not_empty( self.ob_2_ts, self.ts.handle_input_from_bb )
            call_if_not_empty( self.ts_2_om, self.om.handle_input_from_ts )
            call_if_not_empty( self.om_2_gw, self.ms.handle_order_from_gw )
            
            call_if_not_empty( self.gw_2_om, self.om.handle_input_from_market )
            call_if_not_empty( self.om_2_ts, self.ts.handle_response_from_om )        

    def process_data_from_yahoo(self,price):

        order_bid = {
            'id': 1,
            'price': price,
            'quantity': 1000,
            'side': 'bid',
            'action': 'new'
        }
        order_ask = {
            'id': 1,
            'price': price,
            'quantity': 1000,
            'side': 'ask',
            'action': 'new'
        }
        
        self.lp_2_gateway.append( order_ask )
        self.lp_2_gateway.append( order_bid )
        self.process_events()
        
        order_ask['action']='delete'
        order_bid['action'] = 'delete'
        self.lp_2_gateway.append( order_ask )
        self.lp_2_gateway.append( order_bid )

eb = EventBasedBackTester()


def load_financial_data(start_date, end_date,output_file):
    try:
        df = pd.read_pickle( output_file )
        print('File data found...reading GOOG data')
    except FileNotFoundError:
        print('File not found...downloading the GOOG data' )
        df = data.DataReader( 'GOOG', 'yahoo', start_date, end_date )
        df.to_pickle( output_file )
    return df

goog_data=load_financial_data( start_date='2001-01-01',
                               end_date = '2018-01-01',
                               output_file='goog_data.pkl'
                             )

for line in zip( goog_data.index,goog_data['Adj Close'] ):
    date=line[0]
    price=line[1]
    price_information={'date' : date,
                       'price' : float(price)
                      }
    eb.process_data_from_yahoo( price_information['price'] )
    eb.process_events()


fig = plt.figure( figsize=(10,6) )
plt.plot( eb.ts.list_paper_total, label="Paper Trading using Event-Based BackTester" )
plt.plot( eb.ts.list_total, label="Trading using Event-Based BackTester" )
plt.legend()
plt.show()

Note: It is difficult to know in advance whether an order will be filled, and at what price. Indeed, during a period of high market volatility, most orders may be rejected. Additionally, orders may be filled at a worse price (negative slippage).
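
     One way to express these two uncertainties in a simulator is a fill probability combined with a price penalty. The sketch below is our own illustration, not part of the MarketSimulator class: simulate_fill, its fill_ratio and slippage parameters, and the 1% slippage value are all assumptions.

```python
import random


def simulate_fill(order, fill_ratio=0.9, slippage=0.01, rng=random):
    """Return a copy of the order, either filled at a worse price or rejected."""
    result = dict(order)
    if rng.random() < fill_ratio:
        # Negative slippage: a buy fills higher, a sell fills lower than requested.
        sign = 1 if order['side'] == 'buy' else -1
        result['price'] = order['price'] * (1 + sign * slippage)
        result['status'] = 'filled'
    else:
        result['status'] = 'rejected'
    return result


rng = random.Random(42)  # seeded generator so that runs are repeatable
order = {'id': 1, 'side': 'buy', 'price': 100.0, 'quantity': 10}
print(simulate_fill(order, fill_ratio=1.0, slippage=0.01, rng=rng))
print(simulate_fill(order, fill_ratio=0.0, slippage=0.01, rng=rng))
```

     Passing a seeded random.Random instance keeps the backtest reproducible, which matters when comparing two versions of a strategy under the same market assumptions.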

[Figure: paper trading total versus event-based backtester total]

      We will now modify the market assumptions by changing the fill ratio used by the market simulator. We set the fill ratio to 10%, and we will see that the profit and loss is profoundly impacted. Since most of our orders are not filled, we do not make money where the trading strategy was supposed to make money:

chapter7/MarketSimulator.py

from random import randrange

class MarketSimulator:
    def __init__(self, om_2_gw=None, gw_2_om=None):
        # om_2_gw : from OrderManager to Gateway Out ==> venue(Market)
        # gw_2_om : from Gateway In to OrderManager
        self.orders = []
        self.om_2_gw = om_2_gw
        self.gw_2_om = gw_2_om
        
    def lookup_orders( self, order ):
        count = 0
        for o in self.orders:
            if o['id'] == order['id']:
                return o, count # count == the index of the matching order
            count += 1
        return None, None
    
    def fill_all_orders( self, ratio=100 ):
        orders_to_be_removed = []
        for index, order in enumerate( self.orders ):
            # randrange(100) yields 0..99, so "< ratio" fills about ratio% of the orders
            if randrange(100) < ratio:
                order['status'] = 'filled'
            else:
                order['status'] = 'cancelled'
                
            orders_to_be_removed.append( index )
            
            # gw_2_om : send update order via Gateway In to OrderManager
            if self.gw_2_om is not None:
                self.gw_2_om.append( order.copy() )
            else:
                print( 'simulation mode' )
                
        for i in sorted( orders_to_be_removed, reverse=True ):
            del (self.orders[i])
            
    
    def handle_order( self, order ):
        
        o, offset = self.lookup_orders( order )
        
        if o is None:
            if order['action'] == 'New':# new order 
                order['status'] = 'accepted' # update the order status
                self.orders.append( order )
                # gw_2_om : from Gateway In to OrderManager
                if self.gw_2_om is not None:
                    self.gw_2_om.append( order.copy() )# send the market response to the order manager
                    ##########################
                    self.fill_all_orders( 10 ) # fill ratio of 10%; processes all orders, including the amended orders
                else:
                    print('simulation mode')
                return
            elif order['action'] == 'Cancel' or order['action'] == 'Amend':
                print( 'Order id - not found - Rejection' )
                # gw_2_om : from Gateway In to OrderManager
                if self.gw_2_om is not None:
                    self.gw_2_om.append( order.copy() )
                else:
                    print('simulation mode')
                return
        #  If an order already has the same order ID, the order will be dropped    
        elif o is not None: # found the order id
            if order['action'] == 'New':
                print( 'order id={} was {}'.format( o['id'],
                                                    o['status']
                                                  )
                     )
                print('Duplicate order id - Rejection')
                return
            elif order['action'] == 'Cancel':  # cancel the order
                o['status'] = 'cancelled'
                # gw_2_om : from Gateway In to OrderManager
                if self.gw_2_om is not None:
                    self.gw_2_om.append( o.copy() )
                else:
                    print( 'simulation mode' )
                del ( self.orders[offset] )    # delete the order
                print( 'Order cancelled' )
            elif order['action'] == 'Amend':
                print('You can amend the order here!')
                o['status'] = 'accepted' # amend the order status
                if self.gw_2_om is not None:
                    self.gw_2_om.append( o.copy() ) # send the market response to the order manager
                    # will be processed after the new order coming and call self.fill_all_orders( 100 ) 
                else:
                    print( 'simulation mode' )
                print( 'Order amended' )
    
    # OrderManager(order) == om_2_gw ==> Market
    # collect the order from the gateway (the order manager) through the om_2_gw channel
    def handle_order_from_gw( self ):
        # om_2_gw : from OrderManager to Gateway Out ==> venue(Market)
        if self.om_2_gw is not None:
            self.handle_order( self.om_2_gw.popleft() )
        else:
            print( 'simulation mode' )

     We then rerun the same EventBasedBackTester script shown earlier, unchanged, against this modified market simulator and plot the result:


fig = plt.figure( figsize=(10,6) )
plt.plot( eb.ts.list_paper_total, label="Paper Trading using Event-Based BackTester" )
plt.plot( eb.ts.list_total, label="Trading using Event-Based BackTester" )
plt.legend()
plt.show()

[Figure: paper trading total versus event-based backtester total with a 10% fill ratio]

     The chart reminds us of the importance of having a fast system. With this fill ratio, most of the orders we place are not filled, which negatively impacts the profit and loss of the trading strategy.

 Advantages

     Because we use all the components, the result corresponds more closely to reality. One of the critical components is the market simulator (MarketSimulator.py). This component must reflect the same assumptions as the market we trade on. We can add the following parameters to the market simulator:

  • Latency to send an acknowledgement
  • Latency to send a fill
  • An order filling condition
  • A volatility filling condition
  • A market making estimate
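
     Some of these parameters can be grouped in one object that the market simulator consults. The following sketch is purely illustrative: MarketAssumptions, its field names, and the default values are hypothetical, and the market making estimate is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MarketAssumptions:
    """Hypothetical container for simulator parameters; all values are assumptions."""
    ack_latency: timedelta = timedelta(milliseconds=5)    # latency to send an acknowledgement
    fill_latency: timedelta = timedelta(milliseconds=20)  # latency to send a fill after the ack
    fill_ratio: float = 1.0                               # order filling condition, in [0, 1]
    max_volatility: float = 0.05                          # no fills above this volatility


def response_times(sent_at, assumptions):
    """Return the simulated times of the acknowledgement and of the fill."""
    ack_at = sent_at + assumptions.ack_latency
    fill_at = ack_at + assumptions.fill_latency
    return ack_at, fill_at


def can_fill(volatility, assumptions):
    """Volatility filling condition: refuse fills when the market is too volatile."""
    return volatility <= assumptions.max_volatility


assumptions = MarketAssumptions()
sent_at = datetime(2018, 6, 29, 8, 15, 27)
ack_at, fill_at = response_times(sent_at, assumptions)
print(ack_at, fill_at, can_fill(0.02, assumptions))
```

     Keeping the assumptions in one place makes it easy to rerun the same backtest under a more pessimistic set of parameters.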

The advantages of the event-based backtester are as follows:

  • Look-ahead bias elimination: since we receive events one at a time, we cannot look at data ahead of time.
  • Code encapsulation: because we use objects for the different parts of the trading system, we can change the behavior of the trading system simply by changing the objects. The market simulation object is one such example.
  • Risk control: we can insert a position/risk management system and check that we do not breach its limits.
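
     As an illustration of the last point, a position limit check could sit between the trading strategy and the order manager. The PositionLimitCheck class below is a hypothetical sketch; it only assumes orders shaped like the dictionaries used in this chapter, with side and quantity keys.

```python
class PositionLimitCheck:
    """Hypothetical risk check: refuse orders that would breach a position limit."""

    def __init__(self, max_position):
        self.max_position = max_position

    def allows(self, current_position, order):
        # A buy increases the position, a sell decreases it.
        signed_qty = order['quantity'] if order['side'] == 'buy' else -order['quantity']
        return abs(current_position + signed_qty) <= self.max_position


risk = PositionLimitCheck(max_position=1000)
print(risk.allows(800, {'side': 'buy', 'quantity': 100}))   # True
print(risk.allows(800, {'side': 'buy', 'quantity': 300}))   # False
print(risk.allows(800, {'side': 'sell', 'quantity': 300}))  # True
```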

Disadvantages

     Even though the advantages are numerous, this type of event-based system is difficult to code. In particular, if there are threads in the trading system, we need to make them deterministic. For instance, let's assume the trading system times out an order that doesn't get a response within 5 seconds. The usual way to code this functionality is to have a thread count 5 seconds and then time out. If we use this thread in backtesting, the time it measures must not be the real time: as we read the ticks, the relevant time is the simulated time.

     Additionally, it requires a lot of handling, such as log management, unit testing, and version control. The execution of this system can be very slow.

Evaluating what the value of time is

     As we saw in the previous parts of this chapter, backtester accuracy is critical when we build a trading strategy. The two main sources of discrepancies between the paper trading of your trading strategy and its actual performance are as follows:

  • The market behavior that we face when the trading strategy goes live
  • The trading system that you use to trade

     We saw that the market impact can be mitigated by making assumptions about how the market will respond. This part is very challenging because it rests only on assumptions. As for the second cause of discrepancies, the trading system itself, there is an easier solution: we can use the trading system as it is as the backtester. We will bring all the main trading components together and have them communicate with one another as if they were in production.

     In production, we can get the time from the computer's clock. For instance, we can timestamp a book event arriving at the trading strategy by calling the now function of the datetime module in Python. As another example, suppose we place an order. Because it is uncertain whether the market will respond to this order, we use a timeout system. This timeout system calls a function after a given period of time if the trading system has received no acknowledgement from the market. To accomplish this, we usually spawn a thread that counts the seconds up to the timeout. If the state of the order has not changed to acknowledged by the end of the count, this thread calls a callback function, onTimeOut. This callback handles what should occur when an order times out on the market. Mocking this timeout system in the backtester is more challenging. Because we cannot use the real-time clock of the machine to count up to the timeout, we need to use a simulated clock during the whole process.
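
     To see why simulated time makes a timeout deterministic, consider the following thread-free sketch. It is a deliberately different technique from the thread-based approach described above, shown only for comparison: the SimulatedTimeouts class and its method names are hypothetical, and deadlines are checked each time the simulated time advances.

```python
from datetime import datetime, timedelta


class SimulatedTimeouts:
    """Thread-free timers driven by simulated time (an illustrative alternative)."""

    def __init__(self):
        self.pending = {}  # order id -> (deadline, callback)

    def start(self, order_id, now, delay, callback):
        self.pending[order_id] = (now + delay, callback)

    def cancel(self, order_id):
        # Called when the acknowledgement arrives before the deadline.
        self.pending.pop(order_id, None)

    def advance(self, now):
        """Fire every callback whose deadline is at or before the simulated time."""
        expired = [oid for oid, (deadline, _) in self.pending.items() if deadline <= now]
        for oid in expired:
            _, callback = self.pending.pop(oid)
            callback(oid)


timed_out = []
timers = SimulatedTimeouts()
t0 = datetime(2018, 6, 29, 8, 15, 27)
timers.start(1, t0, timedelta(seconds=5), timed_out.append)
timers.start(2, t0, timedelta(seconds=5), timed_out.append)
timers.cancel(2)                           # order 2 was acknowledged in time
timers.advance(t0 + timedelta(seconds=3))  # nothing fires yet
timers.advance(t0 + timedelta(seconds=6))  # order 1 times out
print(timed_out)  # [1]
```

     The result depends only on the timestamps fed to advance, never on how fast the machine runs, so two runs over the same data always time out the same orders.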

     The following diagram shows how the backtester will work with the new simulated clock component handling the time. Each time a component needs to get the time, it will call a function, getTime. This function will return the simulated time (being the time of the last tick read by the LiquidityProvider class):
[Diagram: backtester components obtaining the time from the simulated clock]

1. We will implement the simulated clock as the SimulatedRealClock class. Each time the trading system is started in backtest mode, we will create the SimulatedRealClock instance with the simulated=True argument.

  • If the trading system runs in real time to place orders on the market, the SimulatedRealClock class will be created without arguments or with the simulated=False argument.
  • When the time is given by a simulated time, the time will come from the order timestamps:

simulatedclock.py

from datetime import datetime

class SimulatedRealClock:
    def __init__(self, simulated=False):
        self.simulated = simulated
        self.simulated_time = None
        
    def process_order(self, order):  # strptime(date_string, format)         
        self.simulated_time = datetime.strptime( order['timestamp'], '%Y-%m-%d %H:%M:%S.%f' )
        
    def getTime(self):
        if not self.simulated:
            return datetime.now()
        else:
            return self.simulated_time

realtime = SimulatedRealClock()
print( realtime.getTime() )

simulatedtime = SimulatedRealClock( simulated=True )
simulatedtime.process_order( {'id':1,
                              'timestamp':'2018-06-29 08:15:27.243860'
                             }
                           )
print( simulatedtime.getTime() )

 

     When coding a trading system, whenever you need the current time, always go through a reference to the SimulatedRealClock class and use the value returned by its getTime function.
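
     As an illustration of that rule, here is a minimal sketch of a component that stamps book events through the clock instead of calling datetime.now() itself. The BookEventStamper class, its stamp method, the received_at key, and the fields of the sample event are hypothetical names introduced only for this example; SimulatedRealClock is a compact copy of the class listed above.

```python
from datetime import datetime


class SimulatedRealClock:
    """Compact copy of the class from simulatedclock.py above."""
    def __init__(self, simulated=False):
        self.simulated = simulated
        self.simulated_time = None

    def process_order(self, order):
        self.simulated_time = datetime.strptime(order['timestamp'],
                                                '%Y-%m-%d %H:%M:%S.%f')

    def getTime(self):
        return self.simulated_time if self.simulated else datetime.now()


class BookEventStamper:
    """Hypothetical component: it only knows the clock, never datetime.now()."""
    def __init__(self, clock):
        self.clock = clock

    def stamp(self, book_event):
        stamped = dict(book_event)                 # leave the caller's dict untouched
        stamped['received_at'] = self.clock.getTime()
        return stamped


# Backtest mode: the time is whatever the last processed order said it was.
clock = SimulatedRealClock(simulated=True)
clock.process_order({'id': 1, 'timestamp': '2018-06-29 08:15:27.243860'})

stamper = BookEventStamper(clock)
event = stamper.stamp({'id': 1, 'price': 219, 'quantity': 10, 'side': 'bid'})
print(event['received_at'])
```

     Switching the same component to production only requires passing it a clock created with SimulatedRealClock(); the component's own code does not change.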

2. The following code implements an order management system that times out 5 seconds after sending an order. We first show how to create a TimeOut class that counts to the timeout value and calls a function when a timeout occurs. This TimeOut class is a thread, which means that it executes concurrently with the main program. The arguments to build this class are the SimulatedRealClock object, the time at which the timeout expires, and a function that will be called as a callback, fun. The class runs a loop for as long as the current time is earlier than the time to stop the countdown. Once the current time reaches that time and the TimeOut class has not been disabled, the callback function is called. If the TimeOut class has been disabled because the response to the order arrived in the system, the callback function is not called. Note that we compare the time to stop the timer with the current time by using the getTime function from the SimulatedRealClock class:

omstimeout.py

from chapter9.simulatedclock import SimulatedRealClock
import threading
from time import sleep
from datetime import datetime, timedelta

class TimeOut( threading.Thread ):
    def __init__( self, sim_real_clock, time_to_stop, fun ):
        super().__init__()
        self.sim_real_clock = sim_real_clock
        self.time_to_stop = time_to_stop
        self.callback = fun
        self.disabled = False
        
    def run(self):
        
        # Keep polling while the TimeOut has not been disabled and
        # the current time is still earlier than the time to stop the countdown
        while not self.disabled and self.sim_real_clock.getTime() < self.time_to_stop:
            print('TimeOut is sleeping!')
            sleep(1)
            
        print('TimeOut is alive:',self.is_alive())    
        if not self.disabled: # fire the callback only if no response disabled the timer
            self.callback()
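
     The two exits of that loop can be exercised in isolation. The sketch below is not the chapter's code: it copies the TimeOut logic, drops the print calls, and adds a poll_interval parameter (an assumption made here so that the example finishes quickly instead of sleeping one second per iteration). The clock is a compact copy of SimulatedRealClock in simulated mode, and the names answered, unanswered, and fired are illustrative.

```python
import threading
from datetime import datetime, timedelta
from time import sleep


class SimulatedRealClock:
    """Compact copy of the class from simulatedclock.py."""
    def __init__(self, simulated=False):
        self.simulated = simulated
        self.simulated_time = None

    def process_order(self, order):
        self.simulated_time = datetime.strptime(order['timestamp'],
                                                '%Y-%m-%d %H:%M:%S.%f')

    def getTime(self):
        return self.simulated_time if self.simulated else datetime.now()


class TimeOut(threading.Thread):
    """Same loop as the listing above; poll_interval is an added assumption."""
    def __init__(self, sim_real_clock, time_to_stop, fun, poll_interval=0.01):
        super().__init__()
        self.sim_real_clock = sim_real_clock
        self.time_to_stop = time_to_stop
        self.callback = fun
        self.poll_interval = poll_interval
        self.disabled = False

    def run(self):
        while not self.disabled and self.sim_real_clock.getTime() < self.time_to_stop:
            sleep(self.poll_interval)
        if not self.disabled:
            self.callback()


clock = SimulatedRealClock(simulated=True)
clock.process_order({'id': 1, 'timestamp': '2018-06-29 08:15:27.243860'})
fired = []

# Exit 1: the response arrives before the deadline, so the timer is disabled.
answered = TimeOut(clock, clock.getTime() + timedelta(seconds=5),
                   lambda: fired.append('answered'))
answered.start()
answered.disabled = True
answered.join()          # wait until run() has returned

# Exit 2: simulated time jumps five minutes ahead, past the deadline.
unanswered = TimeOut(clock, clock.getTime() + timedelta(seconds=5),
                     lambda: fired.append('unanswered'))
unanswered.start()
clock.process_order({'id': 2, 'timestamp': '2018-06-29 08:20:27.243860'})
unanswered.join()

print(fired)  # ['unanswered']
```

     Calling join() makes the main program wait for each timer thread, which keeps the result deterministic; only the timer that was never disabled invokes its callback.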

3. The OMS class that we implement next is just a small subset of what an order manager service can do. This OMS class is in charge of sending an order. Each time an order is sent, a 5-second timeout is created (timedelta(0,5)). This means that the onTimeOut function will be called if the OMS does not receive a response to the order placed on the market within 5 seconds. Note that we build the TimeOut class by using the getTime function from the SimulatedRealClock class:

omstimeout.py

class OMS:
    def __init__( self, sim_real_clock ):
        self.sim_real_clock = sim_real_clock
        # timedelta(0, 5) is timedelta(days=0, seconds=5): the deadline is 5 seconds after the OMS is created
        self.five_sec_order_time_out_management = TimeOut( sim_real_clock, 
                                                           sim_real_clock.getTime()+timedelta(0,5),
                                                           self.onTimeOut # pass the function itself, without ()
                                                           # self.onTimeOut() would pass its return value (None) and
                                                           # later raise TypeError: 'NoneType' object is not callable
                                                         )
        
    def send_order( self ):
        self.five_sec_order_time_out_management.disabled = False
        # Once a thread object is created, its activity must be started by calling the thread’s start() method.
        # This invokes the run() method in a separate thread of control.
        self.five_sec_order_time_out_management.start() # TimeOut.run()
        print('send order')
        
    def receive_market_reponse( self ):
        self.five_sec_order_time_out_management.disabled = True
        
    def onTimeOut( self ): # callback
        print( 'Order Timeout Please Take Action!' )
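
     In the listing above, the TimeOut object and its deadline are created once, in __init__, so the 5 seconds run from the construction of the OMS. A threading.Thread can also be started only once, so a second call to send_order on the same OMS would raise RuntimeError. One possible variation, offered as a sketch and not as the chapter's implementation, creates a fresh TimeOut inside send_order. The names PerOrderTimeoutOMS, FakeClock, pending, timeouts_seen, and poll_interval are hypothetical, and the method is spelled receive_market_response here.

```python
import threading
from datetime import datetime, timedelta
from time import sleep


class FakeClock:
    """Hypothetical stand-in for SimulatedRealClock in simulated mode."""
    def __init__(self, start):
        self.now = start

    def getTime(self):
        return self.now


class TimeOut(threading.Thread):
    """Same loop as the TimeOut listing; poll_interval is an added assumption."""
    def __init__(self, sim_real_clock, time_to_stop, fun, poll_interval=0.01):
        super().__init__()
        self.sim_real_clock = sim_real_clock
        self.time_to_stop = time_to_stop
        self.callback = fun
        self.poll_interval = poll_interval
        self.disabled = False

    def run(self):
        while not self.disabled and self.sim_real_clock.getTime() < self.time_to_stop:
            sleep(self.poll_interval)
        if not self.disabled:
            self.callback()


class PerOrderTimeoutOMS:
    """Hypothetical OMS variant: one new TimeOut thread per order sent."""
    def __init__(self, sim_real_clock, timeout=timedelta(seconds=5)):
        self.sim_real_clock = sim_real_clock
        self.timeout = timeout
        self.pending = None
        self.timeouts_seen = 0

    def send_order(self):
        # The deadline is measured from the moment the order is sent.
        deadline = self.sim_real_clock.getTime() + self.timeout
        self.pending = TimeOut(self.sim_real_clock, deadline, self.onTimeOut)
        self.pending.start()

    def receive_market_response(self):
        if self.pending is not None:
            self.pending.disabled = True

    def onTimeOut(self):
        self.timeouts_seen += 1


clock = FakeClock(datetime(2018, 6, 29, 8, 15, 27))
oms = PerOrderTimeoutOMS(clock)

oms.send_order()                    # first order is acknowledged in time
oms.receive_market_response()
oms.pending.join()

clock.now += timedelta(minutes=5)   # much later, a second order is sent
oms.send_order()
clock.now += timedelta(seconds=6)   # no response within 5 simulated seconds
oms.pending.join()

print(oms.timeouts_seen)  # 1
```

     The trade-off is one short-lived thread per order; a production system with many open orders would more likely track deadlines in a single structure, which is outside the scope of this example.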

 When we run the following code to verify whether that works, we create two cases:

  • Case 1: This will use the OMS in real time by using SimulatedRealClock in real-time mode.
  • Case 2: This will use the OMS in simulated mode by using SimulatedRealClock in simulated mode.

4. In the following code, Case 1 will trigger a timeout after 5 seconds, and Case 2 will trigger a timeout when the simulated time is later than the time that triggers the timeout:

if __name__ == '__main__':
    
    print('case 1: real time')
    simulated_real_clock=SimulatedRealClock()# now+timedelta(0,5)
    
    oms=OMS(simulated_real_clock) 
    oms.send_order() # self.five_sec_order_time_out_management.disabled = False
    for i in range(10):
        print( ' do something else: %d' % (i) )
        sleep(1)

    print('\ncase 2: simulated time')
    simulated_real_clock=SimulatedRealClock(simulated=True)
    simulated_real_clock.process_order( {'id' : 1,
                                         'timestamp' : '2018-06-29 08:15:27.243860'
                                        }
                                      )
    oms = OMS(simulated_real_clock)# getTime return simulated_time : '2018-06-29 08:15:27.243860' # then + 5s
    oms.send_order() # self.five_sec_order_time_out_management.disabled = False
    simulated_real_clock.process_order( {'id': 1, 
                                         'timestamp': '2018-06-29 08:20:27.243860'
                                        }
                                      )

[Figure: console output of the two cases]
Note: while the thread is alive, TimeOut keeps working; once run() returns after the last sleep, the thread is dead. If the timer was disabled before that point, 'Order Timeout Please Take Action!' is not printed.
     When a backtester reuses the code of the trading system, it is very important to use a class capable of handling both simulated and real time. Running the same trading system in both modes gives more accurate backtests and builds better confidence in your trading strategy.

Summary

     In this chapter, we highlighted how important backtesting is. We talked about two sorts of backtesters: a for-loop backtester, and an event-based backtester. We showed the two main differences and we implemented an example of both. This chapter concludes the creation path of a trading strategy. We initially introduced how to create a trading strategy idea, and then we explained how to implement a trading strategy. We followed that by explaining how to use a trading strategy in a trading system and then we finished our learning experience by showing how we can test a trading strategy.

     In the next chapter, we will conclude this book by talking about your next steps in the algorithmic trading world.
