I do this at my job.. Disclaimer: I’m a web dev (“architect”) who does some lightweight data engineering tasks to facilitate views in some of my apps.
My pipelines are very simple (no DAG-like dependencies across pipelines). I could just have separate scripts, but instead I have a monorepo of pipelines that implement an interface with Extract, Transform, Load methods. I run this as a single process that runs pipelines on a schedule and has an HTTP API for manually triggering pipelines.
At some point I felt guilty that I am doing something nobody else seems to do, and that I had rolled my own poor-man’s orchestrator. I played around with Dagster and it was pretty nice but I decided it was overkill for my needs (however I definitely think the actual data analysis team at my company should switch from Jenkins to Dagster heh…)
On a separate note, all of my pipelines Load into Elasticsearch, which I’m using as a data warehouse. I’ve realized this is another unconventional decision I’ve made, but it also seems to work well for my use-cases.
My pipelines are very simple (no DAG-like dependencies across pipelines). I could just have separate scripts, but instead I have a monorepo of pipelines that implement an interface with Extract, Transform, Load methods. I run this as a single process that runs pipelines on a schedule and has an HTTP API for manually triggering pipelines.
At some point I felt guilty that I am doing something nobody else seems to do, and that I had rolled my own poor-man’s orchestrator. I played around with Dagster and it was pretty nice but I decided it was overkill for my needs (however I definitely think the actual data analysis team at my company should switch from Jenkins to Dagster heh…)
On a separate note, all of my pipelines Load into Elasticsearch, which I’m using as a data warehouse. I’ve realized this is another unconventional decision I’ve made, but it also seems to work well for my use-cases.