That looks like a nifty service, but we were an AWS shop and the bandwidth intensity of ETL would have made a GCP-hosted service cost-prohibitive. That said, at least once a month I tried to convince our CTO to let me move our data workloads over to GCP because of all the managed services they have available.
I was primarily munging data between FTP drops, S3, RDS, and Redshift, which mainly fell into free buckets for internal data transfer.
If the cost of Composer is an issue, ping me. Running a static environment _does_ have a cost, but for serious ETL it should be pretty inexpensive all things considered. You _should_ be able to use Airflow (in GCP or anywhere else) to call on other services like S3/Redshift and let them do the work in place, without moving the data through Airflow, keeping network tx low.
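To make that concrete, a DAG along these lines keeps Airflow in the orchestration seat while Redshift does the actual load straight from S3. Just a sketch, assuming a recent Airflow with the Amazon provider installed; the bucket, key, table, and connection names are placeholders:

```python
# Sketch only: Airflow orchestrates, Redshift does the load. The COPY runs
# inside Redshift, pulling straight from S3, so no row data flows through
# the Airflow workers -- just the command and its status.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="vendor_dump_to_redshift",         # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    S3ToRedshiftOperator(
        task_id="copy_leads",
        s3_bucket="my-etl-landing",           # placeholder bucket
        s3_key="vendor_dumps/leads.csv",      # placeholder key
        schema="staging",
        table="leads_raw",
        copy_options=["CSV", "IGNOREHEADER 1"],
        redshift_conn_id="redshift_default",  # connections configured in Airflow
        aws_conn_id="aws_default",
    )
```

The task just issues the COPY and waits for the result, so the only traffic Airflow itself generates is metadata.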
If it's network traffic for the actual data moving to and from, that's unfortunately an artifact of how public clouds price.
Engineer at Astronomer.io here. We offer Airflow as a managed cloud service as well as an affordably priced Enterprise Edition to run on your own infrastructure wherever you'd like. Check us out - and feel free to reach out to me personally if you have any questions.
I did multi-cloud, with the data stuff in GCP (mostly GCS and BigQuery) and the rest on EC2 -- costs weren't really an issue, but I guess if you're moving TBs around daily, that's the problem?
Not quite TBs daily, but close. We were in B2B lead generation, and a lot of my ETL workloads involved heavy text normalization and standardization, then source layering to ultimately stitch together as complete and accurate a record as possible based on the heuristics we had available.
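For anyone curious what "source layering" means in practice, it's roughly this kind of thing. A toy sketch with made-up vendors, fields, and priorities, not our actual heuristics:

```python
# Toy sketch of source layering: for each field, take the value from the
# highest-priority vendor that actually has one. Names and priorities are made up.

SOURCE_PRIORITY = {"vendor_a": 0, "vendor_b": 1, "vendor_c": 2}  # lower = more trusted

def normalize(value):
    """Stand-in for the real text normalization: trim and collapse whitespace."""
    if value is None:
        return None
    cleaned = " ".join(str(value).split())
    return cleaned or None

def layer_records(records):
    """Merge per-vendor records for one entity into a single 'best' record."""
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY.get(r["source"], 99))
    merged = {}
    for rec in ordered:
        for field, value in rec.items():
            if field == "source":
                continue
            value = normalize(value)
            if value is not None and field not in merged:
                merged[field] = value
    return merged

if __name__ == "__main__":
    rows = [
        {"source": "vendor_b", "email": "jane@example.com", "title": None},
        {"source": "vendor_a", "email": None, "title": "VP, Marketing "},
        {"source": "vendor_c", "email": "j.doe@old-domain.com", "title": "VP Mktg"},
    ]
    print(layer_records(rows))
    # -> {'title': 'VP, Marketing', 'email': 'jane@example.com'}
```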
Providers of that type of data essentially live in a world of "dump dataset to csv[1] periodically, place the csv on the FTP account for whoever is currently paying us for it". No deltas for changed or net-new records, no per-customer formatting requests, nothing. So the entire thing had to be re-processed every single time from every single vendor and then upserted into our master data.
[1] Hell, they usually didn't even provide basic technical information, like the character encoding the data was stored or exported in, or whether it uses database-style character escapes (any potentially special character is escaped with a backslash) or csv-style escapes (everything is interpreted literally except a double quote, which is escaped with a second double quote).
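For anyone who hasn't had the pleasure, here's roughly what that difference looks like when you go to parse it. A small sketch using Python's csv module, with made-up sample rows:

```python
import csv
import io

# Two vendors shipping the "same" row with incompatible escaping conventions,
# and usually no note about which one (or which character encoding) they used.
csv_style = 'name,note\nAcme Corp,"said ""call us"""\n'   # csv-style: quotes doubled
db_style = 'name,note\nAcme Corp,said \\"call us\\"\n'    # dump-style: backslash escapes

rows_csv = list(csv.reader(io.StringIO(csv_style)))       # default dialect handles ""
rows_db = list(csv.reader(io.StringIO(db_style),
                          escapechar="\\", doublequote=False))

print(rows_csv[1])  # ['Acme Corp', 'said "call us"']
print(rows_db[1])   # ['Acme Corp', 'said "call us"']
```

Same logical row either way, but feed one file through the other file's settings and you get silently mangled data.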
Tangentially, I've usually seen "multicloud" for that and "hybrid cloud" for combining remote cloud services with on-premises resources in a blended system.