I've had exactly the same experience, it's a nice language but using it for things it's not suited for like data exploration makes no sense to me.
Production data pipelines, on the other hand, make sense, but only after exercising them well and, as you say, making sure there's good test coverage if you're implementing things like numerical routines.
Do you have experience implementing ETL pipelines in Go? I think it'd be a better fit for us over our current language, but I'm curious to hear from people who've actually done it.
Yes. It works fairly well. With that said, I've got a feeling that things would be a lot easier to change around if we weren't using it; we end up writing a lot of code to do relatively simple things.
I do this at my job. Disclaimer: I’m a web dev (“architect”) who does some lightweight data engineering tasks to facilitate views in some of my apps.
My pipelines are very simple (no DAG-like dependencies across pipelines). I could just have separate scripts, but instead I have a monorepo of pipelines that implement an interface with Extract, Transform, Load methods. I run this as a single process that runs pipelines on a schedule and has an HTTP API for manually triggering pipelines.
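For anyone curious what that looks like, here's a minimal sketch of the pattern; the interface and the demo pipeline are hypothetical stand-ins, not the actual code, but the shape (each pipeline implements Extract, Transform, Load, and a runner drives them in order) is the same:

```go
package main

import (
	"fmt"
	"strings"
)

// Pipeline is a hypothetical sketch of the interface described above:
// every pipeline in the monorepo implements these three methods.
type Pipeline interface {
	Name() string
	Extract() ([]string, error)
	Transform(rows []string) ([]string, error)
	Load(rows []string) error
}

// Run drives one pipeline end to end, wrapping any error with the
// pipeline name and the stage that failed.
func Run(p Pipeline) error {
	raw, err := p.Extract()
	if err != nil {
		return fmt.Errorf("%s: extract: %w", p.Name(), err)
	}
	rows, err := p.Transform(raw)
	if err != nil {
		return fmt.Errorf("%s: transform: %w", p.Name(), err)
	}
	if err := p.Load(rows); err != nil {
		return fmt.Errorf("%s: load: %w", p.Name(), err)
	}
	return nil
}

// demoPipeline is a toy implementation for illustration only.
type demoPipeline struct{ loaded []string }

func (d *demoPipeline) Name() string               { return "demo" }
func (d *demoPipeline) Extract() ([]string, error) { return []string{"a", "b"}, nil }
func (d *demoPipeline) Transform(rows []string) ([]string, error) {
	out := make([]string, len(rows))
	for i, r := range rows {
		out[i] = strings.ToUpper(r)
	}
	return out, nil
}
func (d *demoPipeline) Load(rows []string) error { d.loaded = rows; return nil }

func main() {
	p := &demoPipeline{}
	if err := Run(p); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(p.loaded, ","))
}
```

In the real setup a single process would hold a slice of these and invoke Run on a ticker, plus an HTTP handler that triggers a named pipeline on demand.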
At some point I felt guilty that I was doing something nobody else seems to do, and that I had rolled my own poor-man’s orchestrator. I played around with Dagster and it was pretty nice, but I decided it was overkill for my needs (however, I definitely think the actual data analysis team at my company should switch from Jenkins to Dagster heh…)
On a separate note, all of my pipelines Load into Elasticsearch, which I’m using as a data warehouse. I’ve realized this is another unconventional decision I’ve made, but it also seems to work well for my use-cases.
It depends on what you're doing, right? The commenter here replied to me, and we're processing really large data files that are deliberately not in a SQL database due to their size; only artefacts of these files eventually make it into a time-series DB. For us Go works well and is performant without any great difficulty. For domain-specific analytics we generally use Python, and Go just calls out to an API to run them.
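The Go-calls-Python-over-HTTP split can be sketched like this; the analytics service and its endpoint are hypothetical (here faked with httptest so the snippet is self-contained), but the pattern is just JSON over HTTP:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

func main() {
	// Stand-in for the Python analytics service: sums the values it receives.
	// In reality this would be a separate Python process behind a real URL.
	svc := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Values []float64 `json:"values"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		var sum float64
		for _, v := range req.Values {
			sum += v
		}
		json.NewEncoder(w).Encode(map[string]float64{"sum": sum})
	}))
	defer svc.Close()

	// Go side: post the artefact values and read the analytics result back.
	body, _ := json.Marshal(map[string][]float64{"values": {1, 2, 3.5}})
	resp, err := http.Post(svc.URL+"/analyze", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Sum float64 `json:"sum"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Sum)
}
```

The nice property of this split is that the Go pipeline stays a dumb, fast shoveler of data, and all the domain-specific numerics live where the libraries are.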
> it's a nice language but using it for things it's not suited for like data exploration
Pedantic point, but this is an issue of library support, not the language.
For whatever reason, data scientists and ML researchers decided to write their libraries and tools in a kindergartner's language with meaningful whitespace, dynamic typing, and various other juvenile PL features specifically aimed at five-year-olds.
Nobody would really use a compiled language for this; the edit-compile-run cycle just takes too long. Prior to Python, people mostly used MATLAB and Mathematica for that sort of work on the physics/engineering side, and R and Stata/SPSS on the bio and stats side. MATLAB, Mathematica, Stata and SPSS are all commercial, and R has exactly the same problems as Python with environment management and compiled binaries: if you use it today you end up doing a lot of manual compilation of dependencies and putting them on the PATH, on Linux at least.
Python became popular because the key scientific ecosystem libraries closely copied MATLAB's APIs, which made it easy to pick up, and because it was free. Anaconda made a distribution that was easy to install, with all the dependencies compiled for you, that worked on Linux/Mac/Windows, which made it much easier to use than R. The other interactive languages around at the time were Ruby, which was heavily web-dev focused, and Perl. Node didn’t yet exist.
Once you have an ecosystem in a language it’s very hard to supplant. You need big resources to go against the grain. That no big company has decided to pour lots of money into alternatives even despite the problems probably tells us that it’s not viewed as being worth it.