Raw OHLCV bars flow into rolling features, then a model score, then an activation threshold before trades are taken.
The pipeline only works when feature lag and the threshold stay in the right order.

Synthetic data generation

Each scenario is based on a statistical model fitted to real historical data. The generator draws a fresh path from that model every time you press "New price path."

Returns are based on a GARCH-family volatility process: each bar's variance depends on the previous bar's variance. On top of that sits a small autoregressive drift, so consecutive returns are not fully independent.
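
As a minimal sketch, the combination of a GARCH(1,1) variance recursion and an AR(1) drift can be written as below. The function name and all parameter values are illustrative, not the site's fitted coefficients:

```python
import math
import random

def simulate_returns(n, mu=0.0002, phi=0.1, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Simulate n returns with GARCH(1,1) variance and a small AR(1) drift.

    Illustrative parameters only; each site scenario uses its own fitted values.
    """
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    prev_ret = 0.0
    rets = []
    for _ in range(n):
        shock = math.sqrt(var) * rng.gauss(0.0, 1.0)
        r = mu + phi * prev_ret + shock            # AR(1) drift on top of the GARCH shock
        var = omega + alpha * shock ** 2 + beta * var  # next bar's variance depends on this bar's
        prev_ret = r
        rets.append(r)
    return rets
```

Because each bar's variance feeds on the previous bar's, large moves cluster together, which is the volatility-clustering property the generator is trying to preserve.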

Prices are constructed bar-by-bar. Volume is generated by another autoregressive model whose level depends on return magnitude and current volatility, with intraday session shaping applied for sub-daily frequencies. Timestamps follow a business-day calendar with realistic session hours.

The result is a full OHLCV path that aims to preserve the fat tails, volatility clustering, volume-volatility correlation, and gap structure of the original market. Each scenario has its own fitted model, so energy futures behave differently from a growth equity.

Feature engineering

From the generated (or imported) OHLCV bars, the site computes a library of rolling features at each point:

  • Returns - simple bar-over-bar returns
  • Momentum - cumulative return over a lookback period
  • Realized volatility - rolling standard deviation of returns
  • Volume z-score - how unusual current volume is relative to its recent history
  • Range z-score - how the current bar's high-low range compares to its recent distribution
  • Moving-average distance - how far price sits from a moving average, expressed as a ratio
  • Calendar signals - sine and cosine encodings based on time-of-day for intraday bars, day-of-year for daily bars, and week-of-year for weekly bars
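
A few of these features reduce to plain rolling-window arithmetic. The function below is a simplified illustration with a hypothetical name and a fixed window, not the site's implementation:

```python
import math

def rolling_features(closes, volumes, window=20):
    """Compute momentum, realized volatility, and a volume z-score per bar.

    Returns None for bars without enough history. Illustrative sketch only.
    """
    rets = [0.0] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    feats = []
    for i in range(len(closes)):
        if i < window:
            feats.append(None)  # not enough history yet
            continue
        w_rets = rets[i - window + 1:i + 1]
        momentum = closes[i] / closes[i - window] - 1.0      # cumulative return over the window
        mean_r = sum(w_rets) / window
        real_vol = math.sqrt(sum((r - mean_r) ** 2 for r in w_rets) / (window - 1))
        w_vol = volumes[i - window:i]                        # trailing window, excludes current bar
        mv = sum(w_vol) / window
        sv = math.sqrt(sum((v - mv) ** 2 for v in w_vol) / (window - 1)) or 1.0  # guard flat volume
        vol_z = (volumes[i] - mv) / sv                       # how unusual is current volume
        feats.append({"momentum": momentum, "real_vol": real_vol, "vol_z": vol_z})
    return feats
```

Note the volume window deliberately excludes the current bar, so the z-score measures the current bar against its past rather than against itself.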

You choose which features to include, and several default feature sets are provided. Optional settings let you apply winsorization or standardization before training.

What the model is predicting

The target is based on triple barriers (vertical time barrier, stop loss, profit target). The site simulates a directional trade from each bar, entering at the next open, with stop and profit-target exits, and measures the gross outcome in risk multiples. The model learns to predict this trade outcome, not the price move. This means the target already embeds the stop distance, reward/risk ratio, and max holding period set in the execution panel.
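A minimal long-only version of this labeling might look as follows. Everything here is a simplification: the helper name is hypothetical, the stop is checked before the target within a bar, and entry gaps are ignored, whereas the real simulator may resolve intrabar ambiguity differently:

```python
def triple_barrier_outcome(opens, highs, lows, closes, i, stop, rr=2.0, max_hold=10):
    """Gross outcome in R for a long trade signaled at bar i, entered at bar i+1's open.

    stop is an absolute price distance; rr is the reward/risk multiple.
    Requires at least one bar after i. Illustrative sketch only.
    """
    entry = opens[i + 1]
    stop_px = entry - stop
    target_px = entry + rr * stop
    last = min(i + max_hold, len(closes) - 1)      # vertical time barrier
    for j in range(i + 1, last + 1):
        if lows[j] <= stop_px:                     # stop-loss hit: lose 1R
            return -1.0
        if highs[j] >= target_px:                  # profit target hit: gain rr R
            return rr
    return (closes[last] - entry) / stop           # timeout: mark at the close
```

The returned value is exactly the quantity the regressor is trained on, which is why the stop distance, reward/risk ratio, and holding period are baked into the target.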

Training

The primary model is a gradient-boosted regression tree. At each boosting round a shallow decision tree is fitted to the residuals - the gap between the current prediction and the actual target. Each tree's contribution is scaled by a learning rate, so the ensemble builds up gradually. Trees are depth-limited and enforce a minimum leaf size, which keeps individual splits from memorizing noise. Optional L2 regularization further penalizes large leaf values, and a feature fraction setting can restrict each tree to a random subset of features for additional variance control.

The best boosting round is selected by tracking Pearson correlation between predictions and realized trade outcomes on the validation set at every iteration. Because the target is a continuous trade outcome in risk multiples, a regressor is trained rather than a classifier. Only the trees up to and including the best round are used for scoring. If later rounds show declining validation correlation, the site flags the drop as overfitting.
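The boosting loop with correlation-based round selection can be sketched with depth-1 trees (stumps). This illustrates the idea only; the site's implementation uses deeper trees and the extra regularization options described above:

```python
import math

def fit_stump(X, resid):
    """Best single-split regression tree on the residuals (feature, threshold, left/right means)."""
    best = (0, X[0][0], 0.0, 0.0, float("inf"))
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [r for row, r in zip(X, resid) if row[f] <= t]
            right = [r for row, r in zip(X, resid) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[4]:
                best = (f, t, lm, rm, sse)
    return best[:4]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def boost(X_tr, y_tr, X_va, y_va, rounds=50, lr=0.1):
    """Boost stumps on residuals; keep only trees up to the best validation round."""
    pred_tr = [0.0] * len(y_tr)
    pred_va = [0.0] * len(y_va)
    trees, best_round, best_corr = [], 0, -2.0
    for m in range(rounds):
        resid = [y - p for y, p in zip(y_tr, pred_tr)]          # gap to the current prediction
        f, t, lm, rm = fit_stump(X_tr, resid)
        trees.append((f, t, lm, rm))
        pred_tr = [p + lr * (lm if row[f] <= t else rm) for row, p in zip(X_tr, pred_tr)]
        pred_va = [p + lr * (lm if row[f] <= t else rm) for row, p in zip(X_va, pred_va)]
        c = pearson(pred_va, y_va)                               # validation correlation per round
        if c > best_corr:
            best_corr, best_round = c, m + 1
    return trees[:best_round], best_corr
```

Truncating the ensemble at the best round is the early-stopping step; a long stretch of rounds after the peak with declining correlation is what the site flags as overfitting.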

The evaluation pipeline can also train a simpler linear reference on the same data and feature set: a ridge-penalized regressor fitted by gradient descent on standardized inputs, which serves as a baseline check against the tree.
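A minimal version of that ridge baseline, assuming batch gradient descent and per-feature standardization (names and hyperparameters are illustrative):

```python
def ridge_gd(X, y, lam=0.1, lr=0.01, epochs=2000):
    """Ridge regression via batch gradient descent on standardized inputs. Sketch only."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [max(1e-12, (sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5) for j in range(d)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]  # standardized features
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        preds = [sum(wj * zj for wj, zj in zip(w, z)) + b for z in Z]
        errs = [p - t for p, t in zip(preds, y)]
        # MSE gradient plus L2 penalty on the weights (intercept is not penalized)
        grad_w = [2 * (sum(e * z[j] for e, z in zip(errs, Z)) / n + lam * w[j]) for j in range(d)]
        grad_b = 2 * sum(errs) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, means, stds
```

If the boosted tree cannot beat this linear baseline out of sample, the nonlinear structure it found in training is probably noise.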

Validation

Two validation modes are available.

Single chronological split divides the timeline into roughly 60% train, 20% validation, and 20% test. A single model is trained on the first block; the validation block selects the best boosting round and calibrates the activation threshold; the test block provides an unseen evaluation set.

Walk-forward is the default and more realistic option. It reserves 40% of the data for initial training and 5% for initial validation, then rolls forward in 5% steps. At each step the model is retrained on all data up to the new validation window, the best round is re-selected, and the threshold is recalibrated. Predictions for each forward step come from the model that was trained without seeing that step. This mirrors how a live system would periodically retrain on new data.
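The split arithmetic reduces to index pairs: each step trains on everything before the window and predicts inside it. The percentages follow the description above; the helper name and rounding choices are my own:

```python
def walk_forward_splits(n, init_train=0.40, init_val=0.05, step=0.05):
    """(train_end, val_end) pairs: train on [0, train_end), predict on [train_end, val_end).

    Defaults mirror the 40% initial train / 5% window / 5% step scheme. Sketch only.
    """
    train_end = int(n * init_train)
    val_len = max(1, int(n * init_val))
    step_len = max(1, int(n * step))
    splits = []
    while train_end + val_len <= n:
        splits.append((train_end, train_end + val_len))
        train_end += step_len   # each retrain sees all data up to the new window
    return splits
```

Because every window's predictions come from a model trained strictly before it, concatenating the windows gives a fully out-of-sample prediction series.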

Threshold calibration

The model outputs a continuous score for every bar. Not every score is worth trading. The site derives an activation threshold from the validation scores: it looks at the distribution of absolute scores and picks a percentile that matches your target trade frequency. Only bars whose absolute score exceeds the threshold generate a trade.
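Percentile-based calibration reduces to picking the k-th largest absolute validation score, where k is set by the target trade frequency. A sketch with an assumed 20% default rate:

```python
def calibrate_threshold(val_scores, target_trade_rate=0.2):
    """Threshold such that roughly target_trade_rate of bars clear it. Sketch only."""
    abs_scores = sorted((abs(s) for s in val_scores), reverse=True)
    k = max(1, int(len(abs_scores) * target_trade_rate))
    return abs_scores[k - 1]  # bars at or above this trade (>= vs > is a boundary convention)
```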

In walk-forward mode the threshold is recalibrated at each retraining step, so it adapts to shifts in score magnitude over time. The diagnostics panel shows the impact of the threshold on net PnL.

Trade simulation

When a bar's score clears the threshold, the site opens a trade in the direction of the score (long for positive, short for negative, or long-only if you restrict the book). Entry is at the open of the next bar.

Each trade has three possible exits:

  • Stop-loss at a configurable distance from entry, measured in ATR or return standard deviation multiples.
  • Profit target at a configurable reward/risk multiple of the stop distance.
  • Timeout if the max trade duration expires before either level is hit, closing at the bar's close price.

For intraday frequencies an optional session-close rule forces exits at the end of the trading day.

All results are expressed in risk multiples (R). A gross result of 3R means the trade captured three times its stop distance. Transaction costs and slippage are deducted as fixed R values per trade, with slippage scaled by exit type: zero for limit-fill profit targets, full for stop-outs, half for timeouts.
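The cost accounting above fits in a few lines. The cost and slippage values here are placeholders, not the site's defaults:

```python
def net_r(gross_r, exit_type, cost_r=0.05, slippage_r=0.04):
    """Net trade result in R: fixed cost plus exit-dependent slippage.

    Scaling follows the rule above: 0x for targets, 1x for stops, 0.5x for timeouts.
    Illustrative default values.
    """
    scale = {"target": 0.0, "stop": 1.0, "timeout": 0.5}[exit_type]
    return gross_r - cost_r - scale * slippage_r
```

The asymmetry reflects execution reality: a limit order at the target fills at its price or better, while a stop triggers into a moving market.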

Trades within each phase (train, validation, walk-forward / test) are confined to their phase boundaries. A trade opened in the validation window cannot run into the test window.

Diagnostics and stress tests

The site reports several layers of diagnostic output:

  • Signal quality. Rank correlation between scores and realized trade outcomes, and a quintile breakdown showing whether higher scores actually lead to better outcomes.
  • Threshold sensitivity. What happens to trade count and mean R/trade as the threshold is scaled from 0.72x to 1.28x of the calibrated level.
  • Cost sensitivity. Net P&L at 0.5x, 1x, 1.5x, and 2x the configured transaction cost, showing how quickly costs erode the edge.
  • Stress subsets. Performance on bars where realized volatility is in the top third; performance on bars where Amihud illiquidity, |return| / (close x volume), is in the highest third of eligible out-of-sample rows; and performance after injecting noise into the model scores.
  • Phase breakdown. Separate trade counts and gross R for train, validation, and test / walk-forward phases, so you can see how much the edge degrades out of sample.

Seed robustness

Synthetic paths are generated from a random seed. A single seed can produce a lucky or unlucky path. The robustness check reruns the entire pipeline - data generation, training, threshold calibration, and trade simulation - across five different seeds and reports mean R/trade for each. If the result changes significantly across seeds, the edge is not stable.

Monte Carlo resampling

After the main pipeline finishes, the page resamples the realized trade returns with replacement across thousands of iterations. Each iteration draws trades at random (with replacement) from the pool of completed trades and builds a cumulative P&L curve. The 10th, 50th, and 90th percentile final outcomes are reported, along with the actual simulation paths at those percentiles. This gives a distributional view of what the strategy's equity curve might look like if the same kind of trades were repeated in a different order.
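The bootstrap itself is short: draw trades with replacement, sum them into a final P&L, and read off percentiles. The sketch below reports only the percentile finals, not the full percentile paths the page also shows; names and defaults are hypothetical:

```python
import random

def bootstrap_percentiles(trade_returns, n_iter=2000, seed=0):
    """Resample realized trades with replacement; return (p10, p50, p90) final P&L in R."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_iter):
        # one resampled "history": same trade count, trades drawn with replacement
        finals.append(sum(rng.choice(trade_returns) for _ in trade_returns))
    finals.sort()
    def pct(p):
        return finals[min(len(finals) - 1, int(p / 100 * len(finals)))]
    return pct(10), pct(50), pct(90)
```

A wide gap between the 10th and 90th percentile finals means the realized equity curve owed a lot to trade ordering and a few outsized trades.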

What the site does not do

The synthetic price generator is fitted to real market data, but it is still a statistical reproduction. It preserves aggregate properties - volatility clustering, fat tails, volume patterns - but cannot reproduce event-driven moves or structural regime changes that were absent from the fitting window.

The validation set selects both the best boosting round and the threshold. This double use means validation metrics carry a mild upward bias. Walk-forward mode reduces this by recalibrating at each step.

Slippage and transaction costs are flat approximations. Real execution costs vary with order size, time of day, venue, and market conditions. The site's cost model is enough to teach the relationship between edge and friction, but should not be mistaken for a realistic execution simulator.

Everything runs client-side in the browser. The boosted-tree implementation is hand-written, not a production-grade library. It handles the datasets the site generates, but is not optimized for scale.