This approach makes sense for predicting the data. Obviously, one could split th... | Hacker News


mrbonner on May 16, 2019 | on: Scalable Python Code with Pandas UDFs

This approach makes sense for prediction: obviously, one could split the data and run prediction on each piece in parallel. But how does this work for training the linear model mentioned here with scikit-learn in a distributed fashion?

bweber on May 16, 2019

There are approaches for using Spark to distribute hyperparameter tuning and cross validation: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-l...

However, for the example in this post, I would recommend using the logistic regression provided by MLlib to scale up.
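To make the first suggestion concrete, here is a minimal sketch of why hyperparameter tuning and cross-validation distribute easily even when a single fit does not: every (parameter, fold) pair is an independent task. Everything in it is an assumption for illustration, not code from the post. A thread pool stands in for Spark executors (so there is no real speedup here), a small hand-written logistic regression stands in for scikit-learn, and names such as `fit_logreg`, `grid_search`, and `make_data` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logreg(rows, l2, epochs=200, lr=0.5):
    """Single-worker fit: batch gradient descent with an L2 penalty."""
    n = len(rows)
    w, b = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in rows:
            err = predict_proba((w, b), x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, rows):
    hits = sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in rows)
    return hits / len(rows)


def evaluate(task):
    """One independent task: train on k-1 folds, score on the held-out fold."""
    l2, fold, folds = task
    train = [r for i, f in enumerate(folds) if i != fold for r in f]
    return l2, accuracy(fit_logreg(train, l2), folds[fold])


def grid_search(rows, l2_grid, k=3, workers=4):
    folds = [rows[i::k] for i in range(k)]
    tasks = [(l2, fold, folds) for l2, fold in product(l2_grid, range(k))]
    # The pool is a stand-in for a cluster: each task could run on any worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, tasks))
    scores = {l2: 0.0 for l2 in l2_grid}
    for l2, acc in results:
        scores[l2] += acc / k  # mean accuracy across folds
    return max(scores, key=scores.get), scores


def make_data(n=120, seed=0):
    # Synthetic, linearly separable data (label is 1 when x0 + x1 > 0).
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        rows.append((x, 1 if x[0] + x[1] > 0 else 0))
    return rows
```

Calling `grid_search(make_data(), [0.0, 0.01, 1.0])` returns the best penalty and the mean cross-validated accuracy per candidate. Note that each individual fit still runs on one worker; this pattern parallelizes the search, not the training of a single model, which is why a natively distributed implementation is the better fit when the training data itself is too large for one machine.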
