Why the linear model matters
A linear regressor is not a toy. It is a speed test for your features. If the linear model captures most of the achievable score, the model family probably mattered less than the signal itself. That is useful information.
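The speed test is cheap to run. A minimal sketch with numpy, on hypothetical synthetic data (three engineered features, only two carrying signal), shows how far ordinary least squares alone already gets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 500 samples, 3 features; the third carries no signal.
X = rng.normal(size=(500, 3))
true_beta = np.array([0.5, -0.3, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=500)

# Ordinary least squares with an intercept, via np.linalg.lstsq.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# In-sample R^2: the share of variance the linear fit alone explains.
resid = y - X1 @ beta
r2 = 1.0 - resid.var() / y.var()
```

If `r2` is already most of what any model achieves on these features, the remaining headroom for a fancier model family is small by construction.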
Where trees earn their keep
Trees help when the relationship is lumpy. Thresholds, regime changes, and feature interactions are where boosting can pick up extra edge. But that extra edge comes with a cost: more knobs, more chances to overfit, and more ways to fool yourself with validation noise.
- Linear models reward clean features.
- Boosted trees reward messy relationships.
- Both can fail on the same bad target.
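A lumpy relationship can be made concrete. In the sketch below (synthetic data, my construction, not from the text), the target jumps at a threshold: a straight line fits the step imperfectly, while a single decision stump, the basic building block of a boosted tree, found by exhaustive split search, captures it almost exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# A "lumpy" relationship: the target jumps at the threshold x = 0.
x = rng.uniform(-1, 1, size=400)
y = np.where(x > 0, 1.0, -1.0) + rng.normal(scale=0.2, size=400)

# Linear fit (intercept + slope) via least squares.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
lin_r2 = 1.0 - ((y - A @ beta) ** 2).mean() / y.var()

# One decision stump: exhaustive search over candidate split points.
best_sse, best_split = np.inf, None
for s in np.unique(x):
    left, right = y[x <= s], y[x > s]
    if len(left) == 0 or len(right) == 0:
        continue
    sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    if sse < best_sse:
        best_sse, best_split = sse, s

stump_r2 = 1.0 - best_sse / ((y - y.mean()) ** 2).sum()
```

The stump recovers the threshold near zero and outscores the line. That gap is the edge trees are buying; everything after it is the cost side of the trade.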
How to compare them properly
Use the same feature set, the same train/validation split, the same cost assumptions, and the same trade rules. If one model gets easier data or a friendlier threshold, the comparison is fake.
In practice, the right question is not whether the tree has a higher score. It is whether the extra complexity survives out of sample after the linear baseline has already taken its best shot.
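One way to enforce the shared protocol is to make it a single function that every model must pass through. The harness below is a sketch under assumed names (`evaluate`, `fit_linear`, `cost_per_trade` are all mine), with a deliberately crude trade rule, go long when the prediction is positive, applied identically to every candidate:

```python
import numpy as np

def evaluate(fit, X_tr, y_tr, X_va, y_va, cost_per_trade=0.001):
    """Score one model under the shared protocol: same split, same costs,
    same trade rule. `fit` returns a predict function."""
    predict = fit(X_tr, y_tr)
    pred = predict(X_va)
    position = np.sign(pred)                       # identical trade rule
    trades = np.abs(np.diff(position, prepend=0))  # position changes pay costs
    pnl = position * y_va - trades * cost_per_trade
    return pnl.mean()

# Hypothetical linear fitter; any competing model plugs into the same slot.
def fit_linear(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta
```

Because the split, the costs, and the trade rule live inside `evaluate`, no model can quietly get easier data or a friendlier threshold.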
The blunt rule
If the linear model can do most of the job, the tree has to earn the rest. If the tree cannot earn it, stop tuning and fix the signal.
A worked comparison
Suppose both models see the same lagged returns, volatility, and volume features. The linear baseline might get a modest but stable score across folds. The boosted tree may lift the headline metric, but if the lift disappears after costs or in later windows, the extra complexity was mostly noise.
That is the clean test: same inputs, same split, same costs, same trade rule. Anything else is a vanity comparison.
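The "lift disappears in later windows" failure is easy to check mechanically. With illustrative, made-up per-fold scores (think cost-adjusted scores from walk-forward folds, ordered in time), the test is whether the tree still beats the baseline in the later folds, not just in aggregate:

```python
import numpy as np

# Hypothetical per-fold, cost-adjusted scores, ordered in time.
# The tree's early lift fades in the later, out-of-sample windows.
linear_scores = np.array([0.30, 0.28, 0.31, 0.29, 0.30])
tree_scores   = np.array([0.45, 0.40, 0.31, 0.27, 0.26])

# The blunt rule as a check: the extra complexity survives only if the
# tree still wins on the later folds.
later = slice(len(linear_scores) // 2, None)
lift_survives = tree_scores[later].mean() > linear_scores[later].mean()
print(lift_survives)
```

Here the tree wins the headline average but loses the later folds, so by the rule above the complexity was mostly noise and the linear baseline stays.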
Common mistakes
- Giving the tree more help. Different features make the comparison meaningless.
- Judging only raw score. A lift that vanishes after costs is not a lift.
- Tuning the baseline less. The linear model deserves the same care.