That looks like a nifty service, but we were an AWS shop and the bandwidth intensity of ETL would have made a GCP-hosted service cost-prohibitive. That said, at least once a month I tried to convince our CTO to let me move our data workloads over to GCP because of all the managed services they have available.
I was primarily munging data between FTP drops, S3, RDS, and Redshift, which mainly fell into free buckets for internal data transfer.
If the cost of Composer is an issue, ping me. Running a static environment _does_ have a cost, but for serious ETL it should be pretty inexpensive all things considered. You _should_ be able to use Airflow (in GCP or anywhere else) to call on other services like S3/Redshift and let them do the work in place, without moving the data through Airflow, keeping network tx low.
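To make that concrete, a DAG along these lines keeps Airflow in the orchestration seat while Redshift does the actual load straight from S3. Just a sketch, assuming a recent Airflow with the Amazon provider installed; the bucket, key, table, and connection names are placeholders:

```python
# Sketch only: Airflow orchestrates, Redshift does the load. The COPY runs
# inside Redshift, pulling straight from S3, so no row data flows through
# the Airflow workers -- just the command and its status.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="vendor_dump_to_redshift",         # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    S3ToRedshiftOperator(
        task_id="copy_leads",
        s3_bucket="my-etl-landing",           # placeholder bucket
        s3_key="vendor_dumps/leads.csv",      # placeholder key
        schema="staging",
        table="leads_raw",
        copy_options=["CSV", "IGNOREHEADER 1"],
        redshift_conn_id="redshift_default",  # connections configured in Airflow
        aws_conn_id="aws_default",
    )
```

The task just issues the COPY and waits for the result, so the only traffic Airflow itself generates is metadata.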
If it's network traffic for the actual data moving to and from, that's unfortunately an artifact of how public clouds price.
Engineer at Astronomer.io here. We offer Airflow as a managed cloud service as well as an affordably priced Enterprise Edition to run on your own infrastructure wherever you'd like. Check us out - and feel free to reach out to me personally if you have any questions.
I did multi-cloud, with the data stuff in GCP (mostly GCS and BigQuery) and the rest on EC2 -- costs weren't really an issue, but I guess if you're moving TBs around daily, that's the problem?
Not quite TBs daily, but close. We were in B2B lead generation, and a lot of my ETL workloads involved heavy text normalization and standardization, then source layering to ultimately stitch together as complete and accurate a record as possible based on the heuristics we had available.
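For anyone curious what "source layering" means in practice, it's roughly this kind of thing. A toy sketch with made-up vendors, fields, and priorities, not our actual heuristics:

```python
# Toy sketch of source layering: for each field, take the value from the
# highest-priority vendor that actually has one. Names and priorities are made up.

SOURCE_PRIORITY = {"vendor_a": 0, "vendor_b": 1, "vendor_c": 2}  # lower = more trusted

def normalize(value):
    """Stand-in for the real text normalization: trim and collapse whitespace."""
    if value is None:
        return None
    cleaned = " ".join(str(value).split())
    return cleaned or None

def layer_records(records):
    """Merge per-vendor records for one entity into a single 'best' record."""
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY.get(r["source"], 99))
    merged = {}
    for rec in ordered:
        for field, value in rec.items():
            if field == "source":
                continue
            value = normalize(value)
            if value is not None and field not in merged:
                merged[field] = value
    return merged

if __name__ == "__main__":
    rows = [
        {"source": "vendor_b", "email": "jane@example.com", "title": None},
        {"source": "vendor_a", "email": None, "title": "VP, Marketing "},
        {"source": "vendor_c", "email": "j.doe@old-domain.com", "title": "VP Mktg"},
    ]
    print(layer_records(rows))
    # -> {'title': 'VP, Marketing', 'email': 'jane@example.com'}
```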
Providers of that type of data essentially live in a world of "dump dataset to csv[1] periodically, place the csv on the FTP account for whoever is currently paying us for it". No deltas for changed or net-new records, no per-customer formatting requests, nothing. So the entire thing had to be re-processed every single time from every single vendor and then upserted into our master data.
[1] Hell, they usually didn't even provide basic technical information, like the character encoding the data was stored or exported in, or whether it uses database-style character escapes (any potentially special character is escaped with a backslash) or csv-style escapes (everything is interpreted literally except a double quote, which is escaped with a second double quote).
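For anyone who hasn't had the pleasure, here's roughly what that difference looks like when you go to parse it. A small sketch using Python's csv module, with made-up sample rows:

```python
import csv
import io

# Two vendors shipping the "same" row with incompatible escaping conventions,
# and usually no note about which one (or which character encoding) they used.
csv_style = 'name,note\nAcme Corp,"said ""call us"""\n'   # csv-style: quotes doubled
db_style = 'name,note\nAcme Corp,said \\"call us\\"\n'    # dump-style: backslash escapes

rows_csv = list(csv.reader(io.StringIO(csv_style)))       # default dialect handles ""
rows_db = list(csv.reader(io.StringIO(db_style),
                          escapechar="\\", doublequote=False))

print(rows_csv[1])  # ['Acme Corp', 'said "call us"']
print(rows_db[1])   # ['Acme Corp', 'said "call us"']
```

Same logical row either way, but feed one file through the other file's settings and you get silently mangled data.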
Tangentially, I've usually seen "multicloud" for that and "hybrid cloud" for combining remote cloud services with on-premises resources in a blended system.