Tuesday, May 27, 2014

Backtesting dilemmas: pyalgotrade review

Ok, moving on to the next contestant: PyAlgoTrade

First impression: actively developed, pretty good documentation, more than enough features (TA indicators, optimizers, etc.). Looks good, so I went on with the installation, which also went smoothly.

The tutorial seems to be a little bit out of date, as the first command, yahoofinance.get_daily_csv(), throws an error about an unknown function. No worries, the documentation is up to date and I found that the missing function has been renamed to yahoofinance.download_daily_bars(symbol,year,csvFile). The problem is that this function only downloads data for a single year instead of everything from that year to the current date. So it is pretty useless for building a long price history.
After I downloaded the data myself and saved it to csv, I needed to adjust the column names because apparently pyalgotrade expects Date,Adj Close,Close,High,Low,Open,Volume to be in the header. That is all minor trouble.
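The header fix takes only a few lines of pandas. A minimal sketch, assuming the downloaded data sits in a DataFrame indexed by date with Yahoo-style column names. The helper name to_pyalgotrade_csv and the sample rows are made up for illustration; only the header layout comes from what pyalgotrade appeared to expect.

```python
import io
import pandas as pd

# Header layout that pyalgotrade appeared to expect (after the Date column).
EXPECTED_COLUMNS = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]

def to_pyalgotrade_csv(df, target):
    """Write df with a 'Date' index column and the expected column order.

    Hypothetical helper, not part of pyalgotrade.
    """
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError("missing columns: %s" % missing)
    out = df[EXPECTED_COLUMNS].copy()
    out.index.name = "Date"
    out.to_csv(target)

# Made-up sample rows, only to show the resulting header.
sample = pd.DataFrame(
    {"Open": [100.0, 101.0], "High": [102.0, 103.0], "Low": [99.0, 100.5],
     "Close": [101.5, 102.5], "Volume": [1000, 1200], "Adj Close": [101.0, 102.0]},
    index=pd.to_datetime(["2014-05-22", "2014-05-23"]),
)
buffer = io.StringIO()
to_pyalgotrade_csv(sample, buffer)
header = buffer.getvalue().splitlines()[0]
print(header)
```

Passing a file path instead of the StringIO buffer writes the same content to disk.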

Next up is performance testing with the SMA strategy that is provided in the tutorial. My dataset consists of 5370 days of SPY:

%timeit myStrategy.run()
1 loops, best of 3: 1.2 s per loop

That is actually pretty good for an event-based framework.

But then I tried searching the documentation for the functionality needed to backtest spreads and multiple-asset portfolios and just could not find any. Then I tried to find a way to feed a pandas DataFrame as an input to a strategy, and it turns out this is not possible, which is again a big disappointment. I did not state it as a requirement in the previous post, but I have now come to the realisation that pandas support is a must for any framework that works with time series data. Pandas was a reason for me to switch from Matlab to Python and I never want to go back.

Conclusion: pyalgotrade does not meet my requirement for flexibility. It looks like it was designed with classic TA and single-instrument trading in mind. I don’t see it as a good tool for backtesting strategies that involve multiple assets, hedging etc.

Monday, May 26, 2014

Backtesting dilemmas

A quantitative trader faces quite a few challenges on the way to a successful trading strategy. Here I’ll discuss a couple of dilemmas involved in backtesting. A good trading simulation must:
  1. Be a good approximation of the real world. This one is of course the most important requirement.
  2. Allow unlimited flexibility: the tooling should not stand in the way of testing out-of-the-box ideas. Everything that can be quantified should be usable.
  3. Be easy to implement & maintain. It is all about productivity and being able to test many ideas to find one that works.
  4. Allow for parameter scans, walk-forward testing and optimisations. This is needed for investigating strategy performance and stability depending on strategy parameters.
The problem with satisfying all of the requirements above is that #2 and #3 conflict with each other. There is no tool that can do everything without the cost of high complexity (= low maintainability). Typically, a third-party point-and-click tool will severely limit the freedom to test custom signals and odd portfolios, while at the other end of the spectrum a custom-coded DIY solution will take tens of hours or more to implement, with a high chance of ending up with cluttered and unreadable code. So in an attempt to combine the best of both worlds, let’s start somewhere in the middle: use an existing backtesting framework and adapt it to our taste.
In the following posts I’ll be looking at three possible candidates I’ve found:
  • Zipline is widely known and is the engine behind Quantopian
  • PyAlgoTrade seems to be actively developed and well-documented
  • pybacktest is a light-weight vector-based framework that might be interesting because of its simplicity and performance.
I’ll be looking at the suitability of these tools by benchmarking them against a hypothetical trading strategy. If none of these options fits my requirements, I will have to decide whether I want to invest in writing my own framework (at least by looking at the available options I’ll know what does not work) or stick with custom code for each strategy.
First one for the evaluation is Zipline.
My first impression of Zipline and Quantopian is a positive one. Zipline is backed by a team of developers and is tested in production, so quality (few bugs) should be great. There is good documentation on the site and an example notebook on GitHub.
To get the hang of it, I downloaded the example notebook and started playing with it. To my disappointment I quickly ran into trouble at the first example, Simplest Zipline Algorithm: Buy Apple. The dataset has only 3028 days, but running this example just took forever. Here is what I measured:
dma = DualMovingAverage()
%timeit perf = dma.run(data)

1 loops, best of 3: 52.5 s per loop
I did not expect stellar performance, as zipline is an event-based backtester, but almost a minute for 3000 samples is simply too slow. This kind of performance would be prohibitive for any kind of scan or optimization. Another problem would arise when working with larger datasets like intraday data or multiple securities, which can easily contain hundreds of thousands of samples.
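For contrast, this is roughly what a vector-based scan looks like: the whole price series is processed with array operations instead of one event at a time. A minimal sketch, not a benchmark. It assumes a recent pandas (the rolling method), uses synthetic random-walk prices instead of real data, ignores transaction costs, and the function name sma_crossover_returns is my own.

```python
import numpy as np
import pandas as pd

def sma_crossover_returns(prices, fast, slow):
    """Daily strategy returns: long when fast SMA > slow SMA, else flat."""
    fast_ma = prices.rolling(fast).mean()
    slow_ma = prices.rolling(slow).mean()
    # Shift by one day so today's position uses yesterday's signal only.
    position = (fast_ma > slow_ma).astype(float).shift(1).fillna(0.0)
    return position * prices.pct_change().fillna(0.0)

# Synthetic random-walk prices, roughly the size of the dataset above.
rng = np.random.RandomState(0)
prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 3000))))

# Small parameter scan: total return for each (fast, slow) pair.
results = {}
for fast in (10, 20, 50):
    for slow in (100, 150, 200):
        daily = sma_crossover_returns(prices, fast, slow)
        results[(fast, slow)] = (1.0 + daily).prod() - 1.0

best = max(results, key=results.get)
print(best, results[best])
```

Each backtest here is a handful of array operations over the full series, which is why a grid of parameter combinations stays cheap compared with replaying every bar through an event loop.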
Unfortunately, I will have to drop Zipline from the list of usable backtesters, as it fails my requirement #4 by a wide margin.
In the following post I will be looking at PyAlgoTrade.
Note: My current system is a couple of years old, running an AMD Athlon II X2 @ 2800 MHz with 3 GB of RAM. With vector-based backtesting I’m used to calculation times of less than a second for a single backtest and a minute or two for a parameter scan. A basic walk-forward test with 10 steps and a parameter scan over a 20x20 grid would take a whopping 66 hours with zipline. I’m not that patient.
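The estimate is simple multiplication. The sketch below assumes that every backtest in the scan takes as long as the single run measured above, and that the 66 hours come from rounding the run time up to a full minute; with the measured 52.5 s the total is closer to 58 hours.

```python
# Back-of-the-envelope estimate for a walk-forward test with an event-based run time.
walk_forward_steps = 10
grid_points = 20 * 20  # 20x20 parameter grid
total_runs = walk_forward_steps * grid_points

def total_hours(seconds_per_run):
    """Total wall-clock hours if every run takes seconds_per_run."""
    return total_runs * seconds_per_run / 3600.0

print(total_runs)                    # 4000
print(round(total_hours(60.0), 1))   # 66.7
print(round(total_hours(52.5), 1))   # 58.3
```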