Only read this post if you want to dig deep into AI Factor models.
Interesting Paper: Design choices, machine learning, and the cross-section of stock returns —> https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5031755
Learnings:
“the abnormal return relative to the market (RET-MKT) exhibits the highest portfolio returns”
Use relative strength as the target (for example, 6-month relative strength) if you aim for total return.
Use CAPM beta-adjusted returns as the target if you aim for low risk (agree 100%: if you define Sortino1Y as the target, the ML will capture low-volatility features and spit out a low-vola AI Factor portfolio strategy).
“…the recommended target variable depends on the prediction goals.”
“If the aim is to forecast higher relative raw returns, as is common in cross-sectional stock return studies, the abnormal return relative to the market is more suitable than the excess return over the risk-free rate.” “Conversely, if the goal is to achieve high market-risk adjusted returns, CAPM beta-adjusted returns are preferable, as the feature importance shows that this target effectively captures the low-risk effect.”
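The two target definitions quoted above can be made concrete. Below is a minimal sketch (not the paper's code) with toy return series; RET-MKT is the return minus the market return, and the CAPM beta-adjusted target subtracts beta times the market excess return from the stock's excess return:

```python
import numpy as np

def abnormal_return_vs_market(ret, mkt):
    """RET-MKT: abnormal return relative to the market."""
    return np.asarray(ret) - np.asarray(mkt)

def capm_beta_adjusted_return(ret, mkt, rf):
    """Excess return minus (estimated beta) * market excess return."""
    ret, mkt, rf = map(np.asarray, (ret, mkt, rf))
    ex_ret, ex_mkt = ret - rf, mkt - rf
    # Beta from the sample covariance of excess returns (ddof=1 to match np.cov).
    beta = np.cov(ex_ret, ex_mkt)[0, 1] / np.var(ex_mkt, ddof=1)
    return ex_ret - beta * ex_mkt

# Toy monthly returns, purely illustrative.
ret = np.array([0.02, -0.01, 0.03, 0.00])
mkt = np.array([0.01, -0.02, 0.02, 0.01])
rf = np.full(4, 0.001)
print(abnormal_return_vs_market(ret, mkt))
print(capm_beta_adjusted_return(ret, mkt, rf))
```

In a real pipeline you would compute these per stock per date and feed them in as the ML target column.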
“Finally, non-linear ML models significantly outperform linear OLS models only when using abnormal returns relative to the market as the target variable, employing continuous target returns, or adopting expanding training windows.”
“We document that the composite non-linear model (ENS ML) outperforms the linear model (OLS) only under the following conditions: (i) the target variable is defined as the abnormal return over the market (Target = RET-MKT), (ii) the target variable is a continuous return (Target Transformation = Raw), and (iii) an expanding training window (Window = Expanding) is used.” Agree —> but only on mid to big caps!
P123 AI Factor Model: relative strength as target (6MRel in P123 terms) + Extra Trees as the ML model (= non-linear!) + Target Variable = Date in conjunction with Rank as preprocessor; predictor trained from 2005 - 2019, then a 5-year OOS backtest of the predictor (Extra Trees Fast 1). —> On small caps (excluded in the paper!), total return as target (3 months!) + LightGBM I - III gives better results than 3-6 month relative return as target!
P123 AI Factor Model: predictor trained from 2005 - 2019, then a 5-year OOS backtest of the predictor (LightGBM III) —>
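The workflow above (cross-sectional Rank preprocessing, a non-linear tree ensemble, expanding training window, out-of-sample prediction) can be sketched with scikit-learn's Extra Trees. Everything here is simulated toy data, not P123 data, and the feature/target construction is a simplified stand-in:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Toy panel: 24 dates x 50 stocks x 3 hypothetical features, plus a noisy target
# standing in for 6-month relative strength.
n_dates, n_stocks, n_feat = 24, 50, 3
X = rng.normal(size=(n_dates, n_stocks, n_feat))
y = 0.5 * X[..., 0] - 0.3 * X[..., 1] + rng.normal(scale=0.5, size=(n_dates, n_stocks))

def rank_by_date(a):
    """Cross-sectional rank per date, scaled to [0, 1] (analogue of P123's Rank preprocessor)."""
    order = a.argsort(axis=1).argsort(axis=1)
    return order / (a.shape[1] - 1)

Xr = np.stack([rank_by_date(X[..., j]) for j in range(n_feat)], axis=-1)
yr = rank_by_date(y)

# Expanding window: at each date t, train on all prior dates and predict date t OOS.
preds = []
for t in range(12, n_dates):
    model = ExtraTreesRegressor(n_estimators=50, random_state=0)
    model.fit(Xr[:t].reshape(-1, n_feat), yr[:t].ravel())
    preds.append(model.predict(Xr[t]))
```

Swapping `ExtraTreesRegressor` for a LightGBM regressor (as in the small-cap variant above) changes only the model line; the rank preprocessing and expanding loop stay the same.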
Study: most important features (factors): “trend factor (TrendFactor), momentum (Mom12m), beta (Beta), short-term reversal (STreversal), and analyst earnings revisions (Analyst Revisions)”
I would add forward earnings yield, EPS Estimate Variability CQ, unlevered free cash flow to EV, MktCap, ROA%Q (all stuff from the Pre-defined Ranking System “Small and Micro Cap Focus”).
Other important features:
CurQEPSStdDev/abs(CurQEPSMean) —> very important!
CurFYEPSMean/Price
(SMA(150,21)-SMA(150,252))/ATRN(150)
(OperCashFlTTM - CapExTTM + (1-TaxRate%TTMInd/100)*IntExpTTM)/EV
$CYCVE
EBITDAYield
#AnalystsPriceTarget
Beta1Y
Surprise%Q1
Daily Volatility 6M
Daily Volatility 1Y
Price Standard Deviation 1Y
—> On a small-cap system, the ML will capture these and build a low-vola system!
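Two of the features listed above are plain ratios and are easy to make concrete. A minimal sketch with hypothetical input values (the P123 item names are only echoed in the comments; these are not real company figures):

```python
def unlevered_fcf_to_ev(oper_cash_fl_ttm, capex_ttm, tax_rate_pct, int_exp_ttm, ev):
    """(OperCashFlTTM - CapExTTM + (1 - TaxRate%TTMInd/100) * IntExpTTM) / EV"""
    return (oper_cash_fl_ttm - capex_ttm + (1 - tax_rate_pct / 100) * int_exp_ttm) / ev

def eps_estimate_dispersion(cur_q_eps_std_dev, cur_q_eps_mean):
    """CurQEPSStdDev / abs(CurQEPSMean): analyst estimate variability."""
    return cur_q_eps_std_dev / abs(cur_q_eps_mean)

# Illustrative numbers only: $100M operating cash flow, $30M capex,
# 25% tax rate, $8M interest expense, $1B enterprise value.
print(unlevered_fcf_to_ev(100.0, 30.0, 25.0, 8.0, 1000.0))  # -> 0.076
print(eps_estimate_dispersion(0.05, -0.50))                  # -> 0.1
```

Note that `abs()` in the dispersion ratio matters: quarters with a small negative mean estimate would otherwise flip the sign of the feature.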
But —> “Therefore, feature pre-selection had little improvement, and [non linear] machine learning algorithms can effectively disregard redundant features.”
Not my experience on small caps: if you add low-volatility features, the ML will capture them and build less aggressive, lower-vola systems.
Most important: “In contrast, post-publication adjustments, feature selection, and training sample size have minimal impact on the outperformance of non-linear models.” Post-publication factors do much less well after they have been documented; non-linear ML models can mitigate that effect (THIS IS A MONSTER LEARNING!)
Use long training periods (on validation and predictor training!)
”Based on our results, an expanding window is superior, in particular for methods that allow non-linearities and interactions.”
“These findings indicate that more complex [e.g. non linear] machine-learning models require larger training datasets to robustly capture non-linearities and interactions in the data.”
“…that aligning the training sample with the evaluation sample is sufficient.”
e.g. it is fine to train on the SP500 and then run the system on the SP500 universe (I agree; I did not have success otherwise).
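The expanding-vs-rolling distinction the paper tests can be sketched as two split generators; the function names and parameters are my own, not from the paper:

```python
def expanding_splits(n_periods, min_train):
    """Yield (train_indices, test_index): training window grows every period."""
    for t in range(min_train, n_periods):
        yield list(range(t)), t

def rolling_splits(n_periods, window):
    """Yield (train_indices, test_index): training window has fixed length."""
    for t in range(window, n_periods):
        yield list(range(t - window, t)), t

exp = list(expanding_splits(6, 3))
rol = list(rolling_splits(6, 3))
print(exp[-1])  # -> ([0, 1, 2, 3, 4], 5)
print(rol[-1])  # -> ([2, 3, 4], 5)
```

The paper's point is that the expanding variant, which hands the model an ever-larger training sample, is what lets the non-linear models pull ahead of OLS.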
All right, here you have it.
All the best in 2025 and best regards.
Andreas