
This approach makes sense for prediction: one could obviously split the data and run the predictions in a distributed fashion. But how does this work for training the linear model mentioned here, in a distributed fashion, with scikit-learn?
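To make the prediction half of that concrete: each row is scored independently once the model is fixed, so the data can be cut into partitions and scored in parallel with no coordination. The following is only a minimal sketch of that idea, not the post's code. It assumes a logistic-style linear model with made-up coefficients (`WEIGHTS`, `BIAS`), and uses a local thread pool as a stand-in for Spark executors; all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical coefficients standing in for an already-trained linear model.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict_partition(rows):
    """Score one partition; needs only the fixed model and its own rows."""
    scores = []
    for row in rows:
        z = BIAS + sum(w * x for w, x in zip(WEIGHTS, row))
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic link
    return scores


def split(rows, n_parts):
    """Cut rows into at most n_parts contiguous chunks."""
    size = max(1, math.ceil(len(rows) / n_parts))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def predict_distributed(rows, n_parts=3):
    """Score every partition in parallel, then concatenate in order."""
    partitions = split(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # map() returns results in the same order as the input partitions.
        results = pool.map(predict_partition, partitions)
    return [score for part in results for score in part]


# Made-up feature rows for illustration.
rows = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0], [-1.0, 4.0], [2.0, 2.0]]
scores = predict_distributed(rows)
```

Training does not decompose this way: fitting the coefficients needs information from every partition (for example, gradients summed across workers on each iteration), which is why the training case is the harder question.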


There are approaches that use Spark to distribute hyperparameter tuning and cross-validation: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-l...

However, for the example in this post, I would recommend using the logistic regression provided by MLlib to scale up.



