Hacker News | bweber's comments

This article doesn't even mention one of my favorite tools on GCP which is Dataflow. It's really easy to build batch and streaming workflows that can be used for data and machine learning pipelines. It really shows how well the different services within GCP are integrated, since it's trivial to set up PubSub or BigQuery as sources and sinks.


Sounds like the author is using a similar stack to mine, markdown -> PDF, epub. I wrote a bit more about the writing process and considerations for traditional publishers here: https://news.ycombinator.com/item?id=22027026


Please leave a review if you purchased on Amazon.


For paper size, I decided to use 6x9 from the start. I also didn't consider the epub format until the end, and then used the "print replica" feature on Kindle Direct to create a kindle version, which lacks text resizing and a few other features. Once I settled on a page size, I wrote each chapter independently and made sure to avoid any widowed text or code samples. I decided that orphaned text would be fine, given that the size of the page is relatively small.

I didn't really need to write any scripts, beyond using the sample bookdown project. I did use a custom book class and made some tweaks for the code formatting, but these were mostly LaTeX changes.


Virtual environments are useful when setting up a single machine, but many of the tools covered in the book do not directly support venv, such as Lambda functions, Cloud Dataflow, and Databricks. In general, the goal is to get readers to explore tools beyond Conda for setting up environments and dependencies.

Marketing will be a challenge. There's been great reception here, but I expect sales to taper quickly and then paid sponsorship will be necessary to continue generating sales. I'm currently testing out Amazon Advertising, but I don't seem to have bids high enough to get to my target budget.


Thanks. I originally planned on covering more topics related to DevOps, such as CI/CD for model deployment, but felt that this might be a bit of a stretch for some readers, and it's an area where I have less experience. Glad to hear it's useful from a full-stack perspective.


Sounds like there are a few requests for this, so I'll look into authoring a post on it, and also talk about my motivation for going the self-publishing route.

You can follow me on Medium for this update: https://medium.com/@bgweber


thank you - look forward to it


Here's the complete source for the last text I authored using this pipeline: https://github.com/bgweber/StartupDataScience/tree/master/bo...

You can use the same tooling to create an epub output, but the formatting will be substantially different.


I might do one on the trade-offs between self-publishing and working with a publisher. With Kindle Direct Publishing, it's pretty straightforward; it's more about the tooling used to produce the text and the process of using Kindle Direct.


There are approaches for using Spark to distribute hyperparameter tuning and cross validation: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-l...
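The reason this parallelizes well is that each (parameter, fold) combination is an independent scikit-learn fit, which is exactly what the Spark integration farms out to workers. A minimal single-machine version of the search being distributed, where the dataset and parameter grid are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Every point in the grid is fit and cross-validated independently,
# so the work distributes cleanly across Spark workers.
search = GridSearchCV(
    LogisticRegression(max_iter=500),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=3,
)
search.fit(X, y)
```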

However, for the example in this post, I would recommend using the logistic regression provided by MLlib to scale up.

