A score scatter plot with a decision threshold, illustrating how to check whether higher scores map to better trade outcomes.
Look for rank order first. If the best scores do not line up with better outcomes, the signal is weak.

Start with a baseline

If the signal cannot beat a simple baseline out of sample, stop there. A ridge model or a plain linear regressor gives you a clean reference point. If the fancy model only wins on paper, the baseline is telling you what the market already knew.
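Here is a minimal sketch of that comparison, assuming a single feature and mean squared error on a held-out slice as the yardstick. The helper names (`fit_ridge_1d`, `mse`, `beats_baseline`) and the `alpha` penalty value are illustrative choices, not taken from any library.

```python
from statistics import mean


def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit for one feature. Returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, my - slope * mx


def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


def beats_baseline(model_pred, x_train, y_train, x_test, y_test, alpha=1.0):
    """Compare a candidate model's held-out error with the ridge baseline.

    Returns (wins, model_error, baseline_error). The baseline is fit on the
    training slice only and scored on the same test slice as the model.
    """
    slope, intercept = fit_ridge_1d(x_train, y_train, alpha)
    base_pred = [slope * v + intercept for v in x_test]
    base_err = mse(base_pred, y_test)
    model_err = mse(model_pred, y_test)
    return model_err < base_err, model_err, base_err
```

Pass the candidate model's test-period predictions as `model_pred`. If `wins` comes back false, the extra complexity has not earned its place yet.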

Check chronology

Use walk-forward validation or at least a chronological split. Random shuffles let the model train on data from after the period it is tested on, which makes the score look cleaner than it is. If performance collapses as time moves forward, the signal was probably keyed to one regime.
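One way to set this up is an expanding-window splitter. The sketch below assumes rows are already sorted by time. The function name `walk_forward_splits` and its `min_train` parameter are hypothetical, chosen for illustration.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_range, test_range) pairs in time order.

    Each fold trains on everything before its test window, so no test
    index ever precedes a train index. The last fold absorbs any remainder.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = n_samples if k == n_folds - 1 else train_end + test_size
        yield range(0, train_end), range(train_end, test_end)


# Usage: score each fold separately and look at the sequence, not the mean.
for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
    print(list(train_idx), list(test_idx))
```

Reading the per-fold scores in order is the point: a steady decline from the first fold to the last is the regime dependence described above.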

Read the score, not just the average

A useful signal should rank better opportunities higher, not just print one good summary number. Check the score distribution, the hit rate in the top buckets, and whether the result stays stable across folds.
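A bucket check along these lines can be sketched in a few lines. It assumes a "hit" means an outcome greater than zero, which is a definition picked for this example. The function names are illustrative.

```python
def hit_rate_by_bucket(scores, outcomes, n_buckets=5):
    """Hit rate per score bucket, with the highest-scoring bucket first.

    A hit is an outcome > 0 (an assumption for this sketch). The last
    bucket absorbs any remainder when the sample does not divide evenly.
    """
    pairs = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    size = len(pairs) // n_buckets
    if size < 1:
        raise ValueError("fewer observations than buckets")
    rates = []
    for b in range(n_buckets):
        start = b * size
        end = len(pairs) if b == n_buckets - 1 else start + size
        chunk = pairs[start:end]
        rates.append(sum(1 for _, o in chunk if o > 0) / len(chunk))
    return rates


def is_rank_ordered(rates, tolerance=0.0):
    """True if hit rates never rise as we move to lower-score buckets."""
    return all(a + tolerance >= b for a, b in zip(rates, rates[1:]))
```

Run it once per validation fold. A signal whose top bucket leads in one fold and trails in the next is not stable, whatever the pooled average says.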

Then test the trade rule

A score becomes a signal only when it crosses a threshold and turns into a trade. That cutoff should be tied to the trade frequency you actually want, then checked again after costs and slippage are applied.
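As a rough illustration, the cutoff can be derived from a target share of observations to trade, and the resulting trades re-scored net of friction. The flat `cost_per_trade` standing in for both costs and slippage is a simplifying assumption, and both function names are hypothetical.

```python
def threshold_for_frequency(scores, target_fraction):
    """Cutoff such that roughly target_fraction of scores are at or above it."""
    if not scores or not 0 < target_fraction <= 1:
        raise ValueError("need scores and 0 < target_fraction <= 1")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_fraction * len(ranked)))  # number of trades to take
    return ranked[k - 1]


def net_returns(scores, gross_returns, threshold, cost_per_trade):
    """Per-trade returns after a flat cost, for scores at or above the cutoff."""
    return [
        r - cost_per_trade
        for s, r in zip(scores, gross_returns)
        if s >= threshold
    ]


# Usage: pick the cutoff on training data, then apply it to later data.
scores = [0.1, 0.5, 0.9, 0.3, 0.7]
gross = [0.01, 0.02, 0.03, -0.01, 0.005]
cutoff = threshold_for_frequency(scores, target_fraction=0.4)
net = net_returns(scores, gross, cutoff, cost_per_trade=0.01)
print(cutoff, sum(net) / len(net))
```

The usage lines fit and apply the cutoff on the same data only to keep the example short. In practice the cutoff would come from the training window.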

The blunt rule

If the signal only works when the split, threshold, or cost model flatters it, the signal is not ready.
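One hedged way to apply that rule is to sweep a grid of thresholds and cost assumptions and count how many settings still leave a positive mean net return. The function name `fraction_profitable` and the flat per-trade cost are assumptions made for this sketch.

```python
def fraction_profitable(scores, gross_returns, thresholds, costs):
    """Share of (threshold, cost) settings with a positive mean net return.

    A setting that produces no trades counts as not profitable.
    """
    if not thresholds or not costs:
        raise ValueError("need at least one threshold and one cost")
    wins = 0
    total = 0
    for t in thresholds:
        for c in costs:
            net = [r - c for s, r in zip(scores, gross_returns) if s >= t]
            total += 1
            if net and sum(net) / len(net) > 0:
                wins += 1
    return wins / total
```

A value near 1.0 suggests the result does not hinge on one lucky setting. A value near zero with a single winning cell is the flattering case. Repeating the sweep on each chronological fold covers the split dimension as well.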

Common mistakes

  • Judging only the headline score. That hides weak rank order and fragile thresholds.
  • Skipping the baseline. If a simpler model gets the same result, use the simpler model.
  • Ignoring friction. Gross performance is not tradable performance.