
I'm the founder and a core developer of Pachyderm, so I can weigh in on how it compares (there is, of course, some potential bias here). I was also at Airbnb around the time we released Airflow, so I got to see it being built up close and used the system it replaced quite a bit as well.

I think it's fair to say that Mara and Airflow are both in the same category: DAG (directed acyclic graph) schedulers for Python. Python makes a ton of sense as the language to focus on, since it's the de facto lingua franca of data science. I'd also put Luigi in that bucket, although I think Airflow has eroded its mind-share quite a bit. All of them target the data pipeline use case, which is very well represented as a DAG, but the actual management of the data is left up to the user. They (Mara, Airflow, or Luigi) schedule a task for you once all the tasks it depends on have completed, but you have to figure out where to store your data so that downstream tasks can find what their upstream tasks output. At Airbnb we used HDFS as this storage layer, often with Hive or Presto on top; storing in S3 is also a common pattern.
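To make that concrete, here's a minimal Airflow-style sketch (Airflow 2-style imports; the DAG name, S3 bucket, and paths are hypothetical). The scheduler only enforces the ordering of tasks; the data handoff happens because both tasks agree on the same storage path.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical storage locations -- the scheduler knows nothing about them;
    # the tasks simply agree on the paths.
    RAW_PATH = "s3://my-bucket/events/raw.parquet"
    CLEAN_PATH = "s3://my-bucket/events/clean.parquet"

    def extract():
        # ... pull data from some source and write it to RAW_PATH ...
        pass

    def transform():
        # ... read RAW_PATH, clean it, and write CLEAN_PATH ...
        pass

    with DAG("events_pipeline", start_date=datetime(2017, 1, 1),
             schedule_interval="@daily") as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)

        # Airflow runs transform only after extract succeeds; where the data
        # lives in between is entirely up to your code.
        extract_task >> transform_task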

Pachyderm is also a DAG scheduler, but we're a lot more prescriptive about where you store the data and a lot less prescriptive about what languages and tools you use. Pachyderm ships with its own distributed filesystem (pfs) that we use for storage; it does a few things that other storage solutions can't. In particular, it version-controls your data and records "provenance", i.e. where data comes from. For example, if you train a machine learning model, then its provenance is the data you used to train it. In terms of processing we're much less prescriptive, because we let users express their code as a Docker container rather than only having bindings for one language. So you can use anything that you can fit in a Docker container. Data is exposed to your code via the local filesystem, so regardless of language you have a very natural interface to your data: system calls on files.
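A rough sketch of what user code inside a pipeline container looks like, assuming the usual convention of input repos mounted under /pfs/<repo> and output written to /pfs/out (the repo name "training_data" here is hypothetical):

    import os

    # Inside the container, input data appears as plain files under /pfs/<repo>,
    # and anything written to /pfs/out becomes the pipeline's output, with its
    # provenance (the exact input commits) recorded by Pachyderm.
    INPUT_DIR = "/pfs/training_data"
    OUTPUT_DIR = "/pfs/out"

    def main():
        for name in os.listdir(INPUT_DIR):
            with open(os.path.join(INPUT_DIR, name)) as f:
                records = f.read().splitlines()

            # ... train a model / transform the records here ...

            with open(os.path.join(OUTPUT_DIR, name), "w") as out:
                out.write("\n".join(records))

    if __name__ == "__main__":
        main()

The point is that your code is ordinary file I/O in whatever language you like; Pachyderm handles materializing the right committed data at those paths and tracking where the output came from.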

Hope this helps clarify the differences between the various systems, and thanks for your interest in Pachyderm. Swing by our users' Slack channel [0] if you'd like some help getting started with it.

[0] http://slack.pachyderm.io/



Random question:

I want to automate some workflows on my local machine. Besides the obvious option of just writing a script, I'm interested in a system where I could describe my workflow as a DAG and then have an easy (web?) UI where I could mark which DAG nodes have changed (e.g. my data pre-processing code) and have it automatically run all of the nodes that (recursively) depend on them as inputs, while skipping those whose inputs have not changed.

I am passing very little actual data between these jobs; they mostly write data to a (distributed) file system, so at most I need to pass some paths around.

Some of the stages require launching a remote job and polling to find out if it has completed.

Is there a good system for doing this? Now that I've described it, I could probably hack it together with a command-line UI without too much difficulty, but having a pretty UI for launching and monitoring jobs would be great.
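Something like this rough Python sketch is what I have in mind by "hack it together" (the node names and commands are made up; a stage could just as well launch a remote job and poll it):

    import subprocess
    import time
    from graphlib import TopologicalSorter  # Python 3.9+

    # Hypothetical DAG: node -> list of nodes it depends on.
    DAG = {
        "preprocess": [],
        "train": ["preprocess"],
        "evaluate": ["train"],
        "report": ["evaluate", "preprocess"],
    }

    # Hypothetical commands; each stage reads/writes paths on the file system.
    COMMANDS = {
        "preprocess": ["python", "preprocess.py"],
        "train": ["python", "train.py"],
        "evaluate": ["python", "evaluate.py"],
        "report": ["python", "report.py"],
    }

    def downstream_of(changed):
        """The changed nodes plus everything that (recursively) depends on them."""
        dirty = set(changed)
        # Walk in topological order so a node is dirty if any dependency is.
        for node in TopologicalSorter(DAG).static_order():
            if any(dep in dirty for dep in DAG[node]):
                dirty.add(node)
        return dirty

    def run(changed):
        dirty = downstream_of(changed)
        for node in TopologicalSorter(DAG).static_order():
            if node not in dirty:
                continue  # inputs unchanged, skip this node
            proc = subprocess.Popen(COMMANDS[node])
            while proc.poll() is None:  # stand-in for polling a remote job
                time.sleep(1)

    run({"preprocess"})

The missing piece is really just the pretty UI for marking nodes as changed and watching the runs.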



