Reposted from "Portfolio Backtesting: Using SAS® to Generate Randomly Populated Portfolios for Investment Strategy Testing", by Xuan Liu and Mark Keintz
Abstract
One of the most regularly used SAS programs at our business school assesses the investment
returns from a randomly populated set of portfolios covering a student-specified historic period,
rebalancing frequency, portfolio count, size, and type. The SAS program demonstrates how to
deal with dynamically changing conditions, including periodic rebalancing, replacement of
delisted stocks, and shifting of stocks from one type of portfolio to another.
The application is a good example of the effective use of hash tables, especially for tracking
holdings and investment returns.
1. What is Backtesting and how does it work?
Backtesting is the process of applying an investment strategy to historical financial information
to assess the results (i.e. change in value). That is, it answers the question “what if I had applied
investment strategy X during the period Y?”
The backtest application developed at the Wharton School, used for instructional rather than
research purposes, is currently applied only to publicly traded stocks. Later, in the “Creating a
Backtest” section, we go over the more important considerations in creating a backtest program.
As an example, a user might request a backtest of 4 portfolios, each with 20 stocks, for
the period 2000 through 2008. The four portfolios might come from a cross-classification of (a) the
top 20% and bottom 20% of market capitalization with (b) the top 20% and bottom 20%
of book-to-market ratios. The user might rebalance the stocks every 3 months (i.e. redivide the
investment equally among the stocks) and refill (i.e. replace no-longer eligible stocks) every 6
months.
2. Source file for Backtesting
The source file used for backtesting is prepared by merging monthly stock data (for monthly
prices and returns), “event” data (to track when stocks stopped or restarted trading), and annual
accounting data filed with the SEC (for book equity), as shown in Figure 2.1.
This yields a monthly file with changes in price, monthly cumulative returns, and yearly
changes in book value. Because the data is sorted by stock identifier (STOCK_ID) and DATE,
a monthly cumulative return (CUMRET0) can be calculated for each stock in the
dataset from the single-month returns (RETURN), as below. CUMRET0 will be used later to
determine the actual performance of each portfolio.
/* Calculation of monthly cumulative returns */
data monthly_cumreturns;
  set monthly_file;
  by stock_id date;
  retain cumret0;
  /* Restart the running product at the first record of each stock */
  if first.stock_id then cumret0 = 1;
  /* A missing return leaves the cumulative return unchanged */
  if not missing(return) then cumret0 = cumret0 * (1 + return);
run;
As mentioned earlier, users may restrict portfolios to specific percentile ranges of variables
like market capitalization (MARKETCAP). These percentiles (deciles in this example) are
generated via PROC RANK for each refill date, as below:
/* Portfolio deciles using the market cap criterion */
proc sort data=monthly_cumreturns out=source;
  by date stock_id;
run;

/* GROUPS=10 assigns decile ranks 0 (lowest) through 9 (highest) within each date */
proc rank data=source out=temp groups=10;
  by date;
  var marketcap;
  ranks rMarketcap;
run;
The resulting dataset (TEMP) contains the original variables plus the decile rank rMarketcap
for each stock on each date.
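Given the decile ranks in TEMP, a user's request for the top or bottom 20% of market capitalization can be turned into a simple screen. The step below is only a minimal sketch: the output dataset CAP_GROUPS and the label variable CAP_GROUP are hypothetical names, and it assumes that the two highest deciles (8 and 9) form the top 20% and the two lowest (0 and 1) the bottom 20%.

```sas
/* Sketch: keep only stocks in the top or bottom 20% of market cap. */
/* CAP_GROUPS and CAP_GROUP are hypothetical names for illustration. */
data cap_groups;
  set temp;
  length cap_group $6;
  /* Stocks with a missing market cap receive a missing rank; drop them first, */
  /* because a missing value compares lower than any number in SAS.            */
  if missing(rMarketcap) then delete;
  if rMarketcap >= 8 then cap_group = 'top20';
  else if rMarketcap <= 1 then cap_group = 'bot20';
  else delete;   /* the middle 60% is not eligible in this example */
run;
```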
3. Creating a Backtest
Once the primary file has been created, the backtest can be defined through these parameters:
Structural Parameters:
a. Date range of the investment.
b. Number of portfolios and amount invested in each portfolio.
c. Number of stocks in each portfolio.
d. Rebalancing Frequency: For portfolios designated as "equally weighted",
the stocks in each portfolio are periodically reallocated so they have
equal value. (Portfolios that are "value weighted" are not rebalanced.)
e. Refilling Frequency: The frequency of determining whether a stock still
qualifies for a portfolio (see "Portfolio Criteria" below) and replacing the
stock if it doesn't.
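One way to carry these structural parameters through a program is to hold them in macro variables. The sketch below is an assumption, not the authors' code: all macro variable names, the SCHEDULE dataset, and the invested amount are hypothetical, and the other values simply mirror the example given earlier (4 portfolios of 20 stocks, 2000 through 2008, rebalancing every 3 months, refilling every 6 months).

```sas
/* Hypothetical macro variables holding the structural parameters */
%let start_date   = '01JAN2000'd;  /* a. start of the investment period      */
%let end_date     = '31DEC2008'd;  /*    end of the investment period        */
%let n_portfolios = 4;             /* b. number of portfolios                */
%let invest_amt   = 1000000;       /*    amount per portfolio (placeholder)  */
%let n_stocks     = 20;            /* c. stocks per portfolio                */
%let rebal_freq   = 3;             /* d. months between rebalancings         */
%let refill_freq  = 6;             /* e. months between refills              */

/* Restrict the source file to the investment window and flag event months */
data schedule;
  set source;
  where date between &start_date and &end_date;
  months_in = intck('month', &start_date, date);   /* months since the start */
  /* Month 0 is the start date itself, when the portfolios are first drawn */
  is_rebal  = (mod(months_in, &rebal_freq)  = 0);
  is_refill = (mod(months_in, &refill_freq) = 0);
run;
```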
This introduces a number of programming tasks. The primary tasks are:
1. For the start date and each refill date, generate percentiles for the portfolio criteria.
2. At the start date, randomly draw stocks for each portfolio from the qualifying stocks.
3. Track monthly cumulative return (i.e. cumulative increase or decrease) in the value of each
stock in each portfolio. Each stock is tracked so that rebalancing can be done, if needed.
4. If a stock stops trading at any point, reallocate its residual value to the rest of the portfolio.
5. At every “refill” point, keep all stocks in the portfolio that are still eligible (buy and hold)
and randomly select replacements for all stocks no longer eligible.
By default, all available securities are considered for inclusion in the backtest. The universe can
be filtered by adding one or more screens based on the portfolio criteria (expressed in deciles in
this paper). Multiple portfolios can be created by dividing securities into distinct partitions based
on the value of one or two metrics. For example, using two metrics, book-to-market and price,
with 2 partitions for book-to-market and 3 partitions for price will result in 6 portfolios. Once the
portfolios are constructed, the performance of each portfolio will be analyzed.
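The 2 x 3 example can be sketched with two PROC RANK steps. This is an illustration under assumptions: BM (book-to-market) and PRICE are assumed variable names, the dataset names are hypothetical, SOURCE is assumed to be sorted by DATE, and the two metrics are ranked independently of each other, which may differ from how the original application forms its partitions.

```sas
/* Sketch: 2 book-to-market partitions x 3 price partitions = 6 portfolios. */
/* BM and PRICE are assumed variable names; SOURCE must be sorted by DATE.  */
proc rank data=source out=bm_ranked groups=2;
  by date;
  var bm;
  ranks rBM;        /* 0 or 1 */
run;

proc rank data=bm_ranked out=both_ranked groups=3;
  by date;
  var price;
  ranks rPrice;     /* 0, 1 or 2 */
run;

data portfolio_cells;
  set both_ranked;
  /* A stock missing either metric cannot be assigned to a partition */
  if missing(rBM) or missing(rPrice) then delete;
  portfolio_id = rBM * 3 + rPrice + 1;   /* cell number, 1 through 6 */
run;
```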
4. Portfolios are populated by randomly selected securities
During the creation of a backtest, the securities within a portfolio are randomly selected. This is
made possible by generating a random number for each stock_id:
/* Randomization of the stocks */
proc sort data=inds out=outds;
  by stock_id date;
run;

%let seed = 10;

data randomized_stocks / view=randomized_stocks;
  set outds;
  by stock_id;
  retain ru;
  /* Draw one random number per stock and carry it across that stock's records */
  if first.stock_id then ru = ranuni(&seed);
run;
inds is the input dataset with one record per stock_id and date. outds is the same data sorted
by stock_id and date, and the view randomized_stocks adds the random variable ru, generated
from the seed. Each stock_id receives a single random value that stays constant across its records.
Running the step with a different seed generates a new set of random numbers for the stock_ids
(see Table 4.1).
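With one random number per stock, drawing a portfolio amounts to ordering the candidates by ru and keeping the first n. The sketch below assumes hypothetical macro variables N_STOCKS and START_DATE and hypothetical dataset names, and it assumes the monthly records carry the month-end date shown. It ignores any portfolio criteria, so every stock present on the start date is a candidate.

```sas
/* Sketch: draw &n_stocks stocks at random on the start date. */
%let n_stocks   = 20;              /* hypothetical portfolio size       */
%let start_date = '31JAN2000'd;    /* hypothetical (month-end) start date */

/* Order the stocks available on the start date by their random number */
proc sort data=randomized_stocks(where=(date = &start_date)) out=candidates;
  by ru;
run;

/* The first n records in random order form the initial portfolio */
data initial_portfolio;
  set candidates(obs=&n_stocks);
run;
```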
5. The refill process
Investors buy and hold securities for a certain period of time. During the holding period, some
stocks may disappear due to delisting or may no longer meet the criteria used to set up
the portfolio. In either case, the size of the portfolio shrinks. To bring the portfolio back to its original
size, a refill process is performed on each user-specified date.
One problem that can distort the refill process is that a stock can cease
trading (become “delisted”) and later reappear on the market. If the stock retained the same
randomly assigned priority used in the initial sampling, it would be included in the refill
event after its re-entry on the market. To avoid this problem we used the following
approach: tie the random number to the date on which the stock enters the market, and assign a stage
variable that counts the stock's appearances. Whenever the stock reappears, generate a new
random number for that stock. Sort the stock pool by date and random number. When it is
time for a refill, the first n stocks (n is the number of stocks requested by the user) are selected
to form the desired portfolio.
/* Randomization procedure used for the portfolio Buy & Hold and Refill process */
data stocks_held(drop=lagdate);
  set stocks;
  by stock_id;
  retain ru stage;
  lagdate = lag(date);
  if first.stock_id then do;
    /* First appearance of the stock */
    stage = 1;
    ru = date + ranuni(&seed);
  end;
  else if intck('month', lagdate, date) > 1 then do;
    /* Gap of more than one month: the stock has re-entered the market */
    stage = stage + 1;
    ru = date + ranuni(&seed);
  end;
run;

proc sort data=stocks_held;
  by date ru;
run;
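Once STOCKS_HELD is sorted by date and ru, the selection step described above can be sketched as a counter that restarts on each refill date. Because ru starts from the date of (re-)entry, stocks that entered earlier sort ahead of later entrants, which appears to be what keeps existing holdings in place. The macro variable N_STOCKS, the output dataset REFILL_PICKS, the counter PICK, and the choice of June and December as refill months are all assumptions for illustration, and the sketch does not apply any portfolio eligibility criteria.

```sas
/* Sketch: on each refill date keep the first &n_stocks stocks in (date, ru) order. */
%let n_stocks = 20;   /* hypothetical portfolio size */

data refill_picks;
  set stocks_held;                  /* already sorted by date and ru */
  where month(date) in (6, 12);     /* assumed refill months (every 6 months) */
  by date;
  if first.date then pick = 0;      /* restart the counter on each refill date */
  pick + 1;                         /* sum statement: PICK is retained automatically */
  if pick <= &n_stocks;             /* subsetting IF keeps the first n stocks */
run;
```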
6. Rebalance
Rebalancing brings a portfolio back to its original asset allocation mix. This is necessary
because over time the values of the individual investments drift away from that allocation. Table 6.1 illustrates
a simple example for an equal-weighted portfolio with two stocks.
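The arithmetic of equal-weight rebalancing can also be sketched in a few lines. The numbers below are hypothetical and are not taken from Table 6.1: two stocks start with 50 each, one gains 20% and the other loses 10%, and rebalancing then splits the new total equally. The dataset and variable names are likewise assumptions.

```sas
/* Sketch: equal-weight rebalancing of a two-stock portfolio (hypothetical numbers) */
data rebalance_demo;
  input stock_id $ value_before ret;
  value_after = value_before * (1 + ret);   /* value at the rebalancing date */
  datalines;
A 50 0.20
B 50 -0.10
;
run;

/* Each stock is reset to the portfolio's average value; PROC SQL remerges */
/* the summary statistic back onto every row.                              */
proc sql;
  create table rebalanced as
  select stock_id,
         value_after,
         mean(value_after) as value_rebalanced
  from rebalance_demo;
quit;
```

Here stock A grows to 60 and stock B falls to 45, so the portfolio is worth 105 and each stock is reset to 52.5.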