An Introduction to the Two Sigma (Using News to Predict Stock Movements) AI Competition on Kaggle

Can we use the content of news analytics to predict stock price performance? The ubiquity of data today enables investors at any scale to make better investment decisions. The challenge is ingesting and interpreting the data to determine which data is useful, finding the signal in this sea of information. Two Sigma is passionate about this challenge and is excited to share it with the Kaggle community.

The most exciting part of this competition is that Kaggle now runs the submitted models against live, future market data. This makes it a meaningful test of whether AI can understand the market better than humans do, much as AI now plays Go better than the top-ranked human player.

Two Sigma Investments LP is a New York City-based hedge fund that uses a variety of technological methods, including artificial intelligence, machine learning, and distributed computing, for its trading strategies. The firm is run by John Overdeck and David Siegel. (from Wikipedia)
Assets under management: 51 billion USD (2017)

Data for this competition comes from the following sources:

  1. Market data provided by Intrinio.
  2. News data provided by Thomson Reuters. Copyright ©, Thomson Reuters, 2017. All Rights Reserved.
    Use, duplication, or sale of this service, or data contained herein, except as described in the Competition Rules, is strictly prohibited.

Evaluation

In this competition, you must predict a signed confidence value, $\hat{y}_{ti} \in [-1, 1]$, which is multiplied by the market-adjusted return of a given assetCode over a ten-day window. If you expect a stock to have a large positive return, compared to the broad market, over the next ten days, you might assign it a large, positive confidenceValue (near 1.0). If you expect a stock to have a negative return, you might assign it a large, negative confidenceValue (near -1.0). If unsure, you might assign it a value near zero.
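One simple way to obtain a value in this range, assumed here for illustration and not prescribed by the competition, is to take a classifier's estimated probability that the return is positive and rescale it linearly from [0, 1] to [-1, 1]. The function name below is hypothetical.

```python
import numpy as np

def confidence_from_probability(p_up):
    """Rescale a probability of a positive return (0..1) to a confidence in [-1, 1].

    Assumption: p_up comes from some binary classifier; the competition
    itself does not mandate this mapping.
    """
    p = np.clip(np.asarray(p_up, dtype=float), 0.0, 1.0)
    return 2.0 * p - 1.0

# 0.5 (no opinion) maps to 0, certainty of a rise to 1, certainty of a fall to -1.
confidence = confidence_from_probability([0.0, 0.5, 1.0])
```

With this mapping, a probability of 0.5 yields a confidence of zero, which matches the advice to stay near zero when unsure.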

For each day in the evaluation time period, we calculate:
$$x_t = \sum_i \hat{y}_{ti} \, r_{ti} \, u_{ti}$$
The confidence value $\hat{y}_{ti}$ expresses how strongly you expect the stock price to rise or fall. With full confidence in a rise, the value is 1, so multiplying it by a positive return $r_{ti}$ captures the whole return. Likewise, with full confidence in a fall, the value is -1, and multiplying it by a negative return again gives a positive contribution.

$u_{ti}$ indicates whether the stock is available for trading on a particular date. For example, if Google were suspended on 24 January 2019, its value would be 0 for that day, so it would not enter the calculation at all.

In this formula, $r_{ti}$ is the 10-day market-adjusted leading return for day $t$ for instrument $i$, and $u_{ti}$ is a 0/1 universe variable (see the data description for details) that controls whether a particular asset is included in scoring on a particular day.

Your submission score is then calculated as the mean divided by the standard deviation of your daily xt values:
$$\text{score} = \frac{\bar{x}_t}{\sigma(x_t)}$$
Why do Two Sigma and Kaggle divide the mean of the daily $x_t$ values by their standard deviation? Because this ratio favors models that not only earn a good overall return but also perform steadily from day to day, similar in spirit to a Sharpe ratio.
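The scoring described above can be sketched in a few lines of pandas. This is a minimal illustration, not Kaggle's own scoring code: the function name and the toy numbers are made up, and two details are assumptions, namely the use of the population standard deviation (`ddof=0`) and returning 0 when the standard deviation is zero.

```python
import pandas as pd

def competition_score(df):
    """Mean divided by std of the daily sums x_t = sum_i(y_ti * r_ti * u_ti)."""
    contribution = df["confidenceValue"] * df["returnsOpenNextMktres10"] * df["universe"]
    x_t = contribution.groupby(df["time"]).sum()  # one value per day
    std = x_t.std(ddof=0)  # assumption: population standard deviation
    return 0.0 if std == 0 else x_t.mean() / std  # assumption: score 0 if std is 0

# Toy data: three days, two instruments per day (all values invented).
toy = pd.DataFrame({
    "time": ["d1", "d1", "d2", "d2", "d3", "d3"],
    "confidenceValue": [1.0, -1.0, 0.5, 0.5, 1.0, 1.0],
    "returnsOpenNextMktres10": [0.02, -0.01, 0.02, -0.04, 0.02, 0.05],
    "universe": [1.0, 1.0, 1.0, 0.0, 1.0, 0.0],
})
score = competition_score(toy)
```

In this toy case the daily values are 0.03, 0.01 and 0.02, so the score is about 2.449. Rows with `universe` equal to 0 contribute nothing, whatever their confidence value.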

Market Data

The marketdata contains a variety of returns calculated over different timespans. All of the returns in this set of marketdata have these properties:

  1. Returns are always calculated either open-to-open (from the opening time of one trading day to the open of another) or close-to-close (from the closing time of one trading day to the close of another).
  2. Returns are either raw, meaning that the data is not adjusted against any benchmark, or market-residualized (Mktres), meaning that the movement of the market as a whole has been accounted for, leaving only movements inherent to the instrument.
  3. Returns can be calculated over any arbitrary interval. Provided here are 1 day and 10 day horizons.
  4. Returns are tagged with ‘Prev’ if they are backwards looking in time, or ‘Next’ if forwards looking.
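The four properties above are encoded directly in the return column names, such as returnsOpenNextMktres10. As a small sketch, the names can be decoded with a regular expression; the helper name and the keys of the returned dictionary are hypothetical, chosen only for illustration.

```python
import re

# returns + (Close|Open) + (Prev|Next) + (Raw|Mktres) + horizon in days
RETURN_COLUMN = re.compile(r"^returns(Close|Open)(Prev|Next)(Raw|Mktres)(\d+)$")

def parse_return_column(name):
    """Decode a return column name; returns None if the name does not match."""
    match = RETURN_COLUMN.match(name)
    if match is None:
        return None
    price, direction, adjustment, days = match.groups()
    return {
        "price": price.lower(),  # "open" or "close"
        "direction": "backward" if direction == "Prev" else "forward",
        "market_residualized": adjustment == "Mktres",
        "horizon_days": int(days),
    }

target = parse_return_column("returnsOpenNextMktres10")
```

Here `target` describes a forward-looking, market-residualized, open-to-open return over 10 days, while a non-return column such as volume yields None.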

Within the marketdata, you will find the following columns:

  1. time(datetime64[ns, UTC]) - the current time (in marketdata, all rows are taken at 22:00 UTC)
  2. assetCode(object) - a unique id of an asset
  3. assetName(category) - the name that corresponds to a group of assetCodes. These may be “Unknown” if the corresponding assetCode does not have any rows in the news data.
  4. universe(float64) - a boolean indicating whether or not the instrument on that day will be included in scoring. This value is not provided outside of the training data time period. The trading universe on a given date is the set of instruments that are available for trading (the scoring function will not consider instruments that are not in the trading universe). The trading universe changes daily.
  5. volume(float64) - trading volume in shares for the day
  6. close(float64) - the close price for the day (not adjusted for splits or dividends)
  7. open(float64) - the open price for the day (not adjusted for splits or dividends)
  8. returnsClosePrevRaw1(float64) - see returns explanation above
  9. returnsOpenPrevRaw1(float64) - see returns explanation above
  10. returnsClosePrevMktres1(float64) - see returns explanation above
  11. returnsOpenPrevMktres1(float64) - see returns explanation above
  12. returnsClosePrevRaw10(float64) - see returns explanation above
  13. returnsOpenPrevRaw10(float64) - see returns explanation above
  14. returnsClosePrevMktres10(float64) - see returns explanation above
  15. returnsOpenPrevMktres10(float64) - see returns explanation above
  16. returnsOpenNextMktres10(float64) - 10 day, market-residualized return. This is the target variable used in competition scoring. The market data has been filtered such that returnsOpenNextMktres10 is always not null.
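To make the link between the price columns and the raw return columns concrete, the sketch below computes a 1-day close-to-close raw return from close for each assetCode. It assumes consecutive rows of an asset are consecutive trading days; because close is not adjusted for splits or dividends, the result need not match the provided returnsClosePrevRaw1 exactly. The function name, asset codes and prices are invented.

```python
import pandas as pd

def raw_close_return_1d(market):
    """Backward-looking 1-day close-to-close return per assetCode."""
    ordered = market.sort_values(["assetCode", "time"])
    previous_close = ordered.groupby("assetCode")["close"].shift(1)
    # The first row of each asset has no previous close, so its return is NaN.
    return (ordered["close"] / previous_close - 1.0).reindex(market.index)

# Toy frame with two made-up instruments over two days.
market = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-04", "2016-01-05"], utc=True),
    "assetCode": ["A.N", "A.N", "B.N", "B.N"],
    "close": [100.0, 110.0, 50.0, 45.0],
})
result = raw_close_return_1d(market)
```

The second row of each asset holds its return (a 10% gain for A.N and a 10% loss for B.N in this toy data). The market-residualized columns cannot be reproduced this way, since the adjustment against the market is not described in the data.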

Current Leaderboard

Kaggle has also filtered out every score above 1.51, treating such unreasonable results as cheating. All submitted models are now being tested against future data, so let us watch the leaderboard closely while we wait for the results.

Leaderboard link:
https://www.kaggle.com/c/two-sigma-financial-news/leaderboard
