What is the project about, how will this improve curl and Hyper, how was it done, what lessons can be learned, what more can we expect in the future and how can newcomers join in and help?
Participating speakers in this webinar are:
Daniel Stenberg, founder and lead developer of curl.
Josh Aas, Executive Director at ISRG / Let’s Encrypt.
We also think there should be more ablation studies in deep learning research. To make that easier, we are working on automating the process of disabling model components with a framework called Maggy. Pass in your Keras model, specify which layers/components to mask out, and we will generate all the ablated models and train them in parallel.
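As a sketch of what generating leave-one-out ablation trials looks like (the component names and helper below are illustrative, not Maggy's actual API):

```python
# Illustrative leave-one-out ablation: produce one model variant per
# component, each with that component disabled. The layer names and
# this helper are hypothetical, not Maggy's real interface.

def ablation_configs(components):
    """Yield (ablated_component, remaining_components) pairs,
    one per leave-one-out trial."""
    for ablated in components:
        yield ablated, [c for c in components if c != ablated]

layers = ["conv1", "conv2", "dense1", "dropout"]
trials = list(ablation_configs(layers))
for ablated, active in trials:
    # In the framework, each of these variants would train in parallel.
    print(f"ablate {ablated}: train model with {active}")
```

Each yielded configuration corresponds to one model to build and train, which is what makes the whole study embarrassingly parallel.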
https://databricks.com/session_eu20/parallel-ablation-studie...
https://github.com/logicalclocks/maggy
One of the developers of the Hopsworks Feature Store here. We wrote this article because we keep getting the question of how a feature store differs from a key-value store. If we missed anything or got anything wrong, please let us know here.
This looks like a highly specialised tool. How is it going to integrate with a Data Scientist's favourite tools, such as Jupyter notebooks, Pandas or Spark and especially ML frameworks like TensorFlow, SkLearn etc.?
Great question. Right now we have pushed hard to make SQL the primary interface to our feature store, and we can output Pandas and other formats at query time. However, we are working on integrated/hosted Jupyter notebooks, and we are excited to keep collaborating with the community on better feature-oriented endpoints/interfaces. There is so much room to innovate here.
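As a toy illustration of SQL as the query interface, here is a self-contained sketch using an in-memory SQLite database with made-up feature tables; a real feature store would typically return the joined result as a Pandas DataFrame:

```python
# Minimal sketch: features live in tables, and a SQL query assembles a
# training set by joining them on the entity key. Table and column
# names are hypothetical, not the Hopsworks schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_features (user_id INT, avg_spend REAL);
    CREATE TABLE click_features (user_id INT, clicks_7d INT);
    INSERT INTO user_features VALUES (1, 42.5), (2, 13.0);
    INSERT INTO click_features VALUES (1, 7), (2, 3);
""")
rows = conn.execute("""
    SELECT u.user_id, u.avg_spend, c.clicks_7d
    FROM user_features u JOIN click_features c USING (user_id)
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [(1, 42.5, 7), (2, 13.0, 3)]
```

The appeal of SQL as the interface is exactly this: the join logic stays in the store, and the client only chooses the output format.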
Too bad they don't provide any reasoning for this labelling. It also isn't clear to me over what time span they observed individuals to make the label decision, i.e. are they considering sessions or just single clicks with subsequent bookings?
I am working on a small Python framework to efficiently distribute hyperparameter search on a Spark cluster. We haven't released the first version yet, but will do so in the next two weeks. https://github.com/logicalclocks/maggy
A limitation of existing hyperparameter search algorithms is that they are typically stage- or generation-based. For example, if genetic algorithms are used for hyperparameter search, one has to wait for all models in a generation to finish before a new generation of candidate parameters can be produced from the best-performing individuals. However, some trials will have suboptimal parameters, which often becomes apparent early in training, so they could stop early. In a stage-based scheme, though, an early-stopped worker cannot be given a new set of parameters right away and sits idle instead.
Compared to stage-based approaches like genetic optimization, Maggy (the framework) will support asynchronous algorithms that can provide a new candidate set of parameters as soon as a worker finishes evaluating a combination, without waiting for all models in the stage to finish. To make this possible, we establish communication between the driver and the executors in Spark. The driver collects performance metrics during training, which lets us stop badly performing models early and reassign the executor a new, more promising set of parameters (a new trial) right away, instead of waiting for the stage to finish.
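A toy, single-process sketch of that scheme: the driver watches each trial's metric during training, stops bad trials early, and hands a freed worker the next candidate immediately instead of letting it idle until a stage ends. The candidate list, loss curve, and threshold below are stand-ins, not Maggy's real API:

```python
# Simulated asynchronous trial assignment with early stopping.
# Everything here is illustrative: in Maggy the "driver" and "workers"
# would be the Spark driver and executors communicating over RPC.

def candidates():
    """Stand-in optimizer: hand out the next candidate on demand."""
    for lr in [0.1, 0.05, 0.01, 0.005, 0.2, 0.02]:
        yield {"lr": lr}

def run_trial(params, max_steps=5, stop_threshold=0.8):
    """Simulated training loop that reports its metric every step.
    Returns (final_loss, early_stopped)."""
    for step in range(1, max_steps + 1):
        # Fake loss curve: improves with steps, worse far from lr=0.01.
        loss = 1.0 / (step + 1) + abs(params["lr"] - 0.01) * 5
        if loss > stop_threshold:
            return loss, True   # stopped early: the worker is free again
    return loss, False

results = []
for params in candidates():  # a freed worker asks for a new trial at once
    loss, stopped = run_trial(params)
    results.append((loss, params, stopped))

best_loss, best_params, _ = min(results, key=lambda r: r[0])
early_stopped = sum(1 for _, _, s in results if s)
print("best:", best_params, "| trials stopped early:", early_stopped)
```

The point of the asynchrony is visible in the loop: a trial that is stopped early consumes only a fraction of its training budget, and its worker immediately pulls the next candidate rather than waiting for the slowest trial in a stage.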