Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Great write-up, thank you. Do you have rough measures for what constitutes high/mid/low- dimensional data? And how do you use XGBoost et al for multi-step forecasting, I.e. in scenarios where you want to predict multiple time steps in the future?


Because they're so cheap to train, you can just use n models if you want to predict n steps ahead.

In sklearn, if you have a single-output regressor, use this for ergonomics: https://scikit-learn.org/stable/modules/generated/sklearn.mu...

The added benefit is that you optimize each regressor towards its own target timestep t+1 ... t+n. A single loss on the aggregate of all timesteps is often problematic


There's been recent advances in joint fitting of multi-output regression forests ("vector leaf"): https://xgboost.readthedocs.io/en/stable/tutorials/multioutp...

In theory, this might suit the multi-step forecast use case.


I've found that it works well to add the prediction horizon as a numerical feature (e.g. # of days), and them replicate each row for many such horizons, while ensuring that all such rows go to the same training fold.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: