Hacker News | reinhardt's comments

Also curious why every comment mentions the number of rows as the only factor that matters. A 100M-row table with 3 integer columns is quite different from one with 50+ columns, 5 of which are text fields up to a few MB long.


Getting a cyclic import error is not a bug, it's a feature alerting you that your code structure is like spaghetti and you should refactor it to break the cycles.


That's not a problem, let alone the biggest one. You should just use relative imports explicitly.


It is a problem because the stdlib does not use relative imports for other stdlib modules, and neither do most third-party packages, so you get broken regardless of what you do in your own code.
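For concreteness, a minimal sketch of the breakage (the layout and names are hypothetical, not from this thread): a top-level module named after a stdlib module gets picked up by everyone else's absolute imports.

    # Hypothetical layout:
    #
    #   project/
    #     json.py   -> local module that shadows the stdlib's json
    #     main.py   -> this file
    #
    # The script's directory comes first on sys.path, so every absolute
    # "import json" (yours, the stdlib's, or a third-party package's)
    # now resolves to project/json.py instead of the standard library.
    import json

    json.loads('{"a": 1}')  # AttributeError: module 'json' has no attribute 'loads'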


I haven't used Airflow for years but it used to be quite clunky, not sure how much it's improved since. I'd look into Prefect and/or Dagster first, both are more modern alternatives built with Airflow's shortcomings in mind.


I'd guess career-progression points, or even keep-getting-a-paycheck points at worst.


This low-stakes exchange won't affect much; it's about putting Mr. Incomp in his place. Remember, it's a PM we're talking about, not "the Boss."


> Its a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow???

The bidder/pacer state is not necessarily massive, and it certainly does not consist of all the gazillions of past events. Depending on the strategy/bidding model, it can range from a few MB to several GB, which fits in memory on a single beefy node.

> Google Fi/Spanner and BigTable have certainly been developed to support these issues.

I doubt any external store can be used under such tight latency constraints (2-10ms) and such high throughput (millions of RPS). Perhaps Aerospike, but even that is a stretch to put in the hot path. At this scale you're pretty much limited to keeping the state in memory and updating it asynchronously every couple of minutes/hours.
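Roughly this pattern, sketched in Python (the names and the refresh interval are illustrative, not anyone's actual system):

    import threading
    import time

    class PacerState:
        # The bid hot path only touches self._state (an in-memory dict);
        # a daemon thread periodically swaps in a fresh snapshot.
        def __init__(self, loader, refresh_interval_s=300):
            self._loader = loader                  # slow call that hits the external store
            self._interval = refresh_interval_s
            self._state = loader()                 # initial snapshot
            self._lock = threading.Lock()
            threading.Thread(target=self._refresh_loop, daemon=True).start()

        def _refresh_loop(self):
            while True:
                time.sleep(self._interval)
                new_state = self._loader()         # slow, but off the hot path
                with self._lock:
                    self._state = new_state        # atomic swap; readers never wait on I/O

        def get(self, key):
            # Called per bid request: pure in-memory lookup, no network round trip.
            with self._lock:
                return self._state.get(key)

    # Dummy loader standing in for the real fetch from the state store:
    state = PacerState(lambda: {"campaign_123": 0.42}, refresh_interval_s=300)
    print(state.get("campaign_123"))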

Source: I also work in ad tech.


Why PostgreSQL only? The mara-DB dependency [1] claims to support more.

[1] https://github.com/mara/mara-db


(author here)

Currently there is a hard dependency on Postgres for Mara's bookkeeping tables. I'm working on dockerizing the example project to make the setup easier.

For ETL, MySQL, Postgres & SQL Server are supported (and it's easy to add more).


I'm a bit confused about this. What if the target is HDFS? Why this dependency on SQL databases for ETL?


> Airflow requires task queues (e.g. celery), message broker (e.g. rabbitmq), a web service, a scheduler service, and a database. You also need worker clusters to read from your task queues and execute jobs.

All of these are supported, but the scheduler is pretty much the only hard requirement.

Source: I've been running Airflow for the last two years without a worker cluster, without celery/rabbitmq installed, and sometimes without even an external database (i.e. a plain SQLite file).
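Concretely, the minimal setup looks roughly like this (config key names follow the Airflow 1.x-era layout I'm assuming here; newer releases have moved some of them, so treat it as a sketch rather than copy-paste):

    import os

    # No Celery/RabbitMQ: run tasks inside the scheduler process.
    os.environ["AIRFLOW__CORE__EXECUTOR"] = "SequentialExecutor"

    # No external database: point the metadata DB at a plain SQLite file.
    os.environ["AIRFLOW__CORE__SQL_ALCHEMY_CONN"] = "sqlite:////tmp/airflow.db"

    # After initializing the metadata DB, `airflow scheduler` (plus the
    # webserver if you want the UI) is the only process you need to run.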


Yet another reason to trim off old jobs after some point; the primary one being that nobody cares to wade through 3+ page resumes.


I think resumes should be treated less as a report card and more as a brochure. Hiring managers have little time, so keeping it focused on relevant highlights and selling the candidate for that particular job is the entire point.

A 15-page menu isn't better than a 1-page menu... A spa advertising every stone in its parking lot doesn't make you think nice things about its mud baths...


In which case you can just compare the dicts without performing the multiplication (which happens to be the costliest part for arbitrary-precision integers).
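A sketch of what "compare the dicts" means, assuming the context was the prime-product trick for anagram checking (the function name is mine):

    from collections import Counter

    def is_anagram(a: str, b: str) -> bool:
        # Counter builds the same letter -> count dict you'd otherwise feed into
        # the prime product; comparing the dicts directly skips the big-integer
        # multiplication entirely.
        return Counter(a) == Counter(b)

    print(is_anagram("listen", "silent"))  # True
    print(is_anagram("listen", "siren"))   # False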


Exactly.

