This is some of the most insightful high-level thinking about AWS services that I've seen. No taxonomy is perfect, but this is invaluable for gaining a panoramic perspective on the vast sprawl of AWS.
I'm interested in hearing opinions about this principle in the context of data engineering:
> This also means leaning heavily into all the service offerings and orchestration tooling that is afforded to you by your platform.
I've built a data lake and several ETL pipelines using AWS native services (Kinesis, Lambda, Athena). It works, but it's a bit...fiddly. I spend a lot of time configuring these services and handling various failure modes. I've been wondering if I should be looking at third-party vendors like Fivetran or Matillion for ETL.
Does anyone who's worked with AWS data engineering services have thoughts on the trade-off between AWS native services and third-party vendors in this area?
I can strongly attest to Snowflake. I regret that AWS doesn't offer the same features without making us jump through a maze of services to emulate the same concept.
Thanks for sharing, I've heard many good things about Snowflake. In the past I've seen them as more of a Redshift competitor (data warehouse, as opposed to a data lake) but if they can simplify data ingest then I am definitely interested.
They're fundamentally different mainly in the model of decoupling storage and compute completely, but in a far simpler way than Redshift Spectrum, I feel. Some of their features, like zero-copy clone, are just not possible in AWS and make it extremely simple to do pipeline management in a way that (at least to me) makes the most sense.
It's also the most democratizable model I have seen - anyone who knows the slightest amount of SQL can be set up to explore the data in minutes.
The elephant in the room is that you need to use SQL. Their Spark connectors are as of now useless, so you either have to go with dbt, some homebrew SQL-stringing mess, or something like SQLAlchemy. We're currently developing some wrappers around SQLAlchemy to make this a bit less painful, but it's still so worth it.
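To give a flavor of what I mean by wrapping the SQL stringing (this is a hypothetical sketch, not our actual code — the real thing sits on top of SQLAlchemy and a live Snowflake connection; all the table names here are made up):

```python
# Hypothetical sketch of a thin wrapper that renders pipeline steps as
# Snowflake SQL strings instead of hand-concatenating them everywhere.
# Identifiers are illustrative only.

def clone_table(source: str, target: str) -> str:
    """Render a zero-copy clone statement (a metadata-only copy in Snowflake)."""
    return f"CREATE OR REPLACE TABLE {target} CLONE {source}"

def insert_select(target: str, select_sql: str) -> str:
    """Render a load step as INSERT ... SELECT."""
    return f"INSERT INTO {target} {select_sql}"

if __name__ == "__main__":
    # A toy pipeline step: clone prod into a scratch table, then load into it.
    print(clone_table("analytics.events", "scratch.events_dev"))
    print(insert_select("scratch.events_dev",
                        "SELECT * FROM raw.events WHERE load_date = CURRENT_DATE"))
```

The point isn't the two-line functions themselves; it's that once every step is a rendered statement, cloning a whole environment for a dev run becomes one cheap call.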
Understanding the AWS bill is far harder than it should be. That said, there are some resources that weren't mentioned in this blog post. The ultimate source of truth is the AWS Cost & Usage Report [1], which can be delivered in Parquet and queried with SQL via Athena.
Although the Cost & Usage Report alone can solve many billing mysteries, in some cases it's also necessary to go to CloudTrail logs to determine exactly which user or application incurred charges.
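For example, a query along these lines surfaces the biggest spenders for a month (the database and table names are placeholders for whatever your report definition created; the column and partition names follow the standard CUR-on-Athena schema):

```sql
-- Top 10 services by unblended cost for one month
SELECT line_item_product_code,
       ROUND(SUM(line_item_unblended_cost), 2) AS cost
FROM cur_database.cur_table        -- placeholder names
WHERE year = '2019' AND month = '6'  -- CUR partitions are strings
GROUP BY line_item_product_code
ORDER BY cost DESC
LIMIT 10;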
The writing in this post is superb. One of my favorite lines:
"Windows is the Superbowl Halftime Show of operating systems. Given what everyone got paid, and how many people were involved, you’d think it would be a lot more memorable."
As someone who spends a lot of time with Windows...yeah. But the post is really an amazing blend of technology and sentiment, kind of reminiscent of Neal Stephenson at his best. Makes me want to dust off my PDP-11 emulator and take another crack at Unix V6.
It's both beautiful writing, and beautiful screenshots. Something in me jumped for joy as I was scrolling, and I didn't even have most of those systems (went from a TRS-80 Model 1 to a Sanyo MBC-550 to a 386 running MS-DOS 6.22, and then on to a bunch of variants of Windows.)
The author deliberately used the term "LISP Machine", with all of the letters of Lisp capitalized, but that's an anachronism (and a fairly good shibboleth). When they were developed they ran Lisp Machine Lisp, not LISP. And as far as I know they never used the specific appellation "LISP Machines", where the word Lisp was written in all capitals but the word machines was not.
Edit: It's hard to distinguish someone's tone on the internet, but I did not intend to be captious or mean-spirited. My point is that there are enough differences between LISP and Lisp Machine Lisp, a descendant of Maclisp, that the two are not interchangeable.
If LispMs are for monks or nuns, is ITS for sorcerers?
I especially liked how you could patch a running kernel, but only if you entered a specific keystroke sequence the right way the first time. If you muffed it, a flag got set and the system would disallow future attempts even if you got it right later. It's more like a puzzle in a text adventure than an OS security mechanism.
And, of course, the command shell was a machine code debugger, but that really wasn't hugely weird in itself.
Also worth mentioning is the GnuWin32 project (http://gnuwin32.sourceforge.net/). It provides native Windows builds of many GNU utilities, including a few handy ones missing from the Git distribution like file and wget.
If you're serious about using Vim on Windows, check out gvim (http://www.vim.org/download.php#pc) which adds some nice Windows integration.
I would also echo the author's point about avoiding Cygwin. It gives you the worst of both worlds: you can't access Windows facilities, and many Linux programs don't quite work right. A far better solution is to use a native Windows shell (Powershell) together with native Windows builds of various GNU utilities (from a Git install, GnuWin32, MinGW, etc.). Powershell gives you fluent access to a wide variety of Windows-specific APIs, and you don't have to sacrifice the convenience of a Linux command-line environment.
Your zip example actually demonstrates why Powershell is significantly more powerful than Bash (or any other Unix shell). I presume that if asked "How do you zip a file in Bash?" you would reply with something like this:
zip archive.zip folder
The equivalent Powershell code would look like this:
7z a archive.zip folder
Pretty much exactly the same, since in both cases the shell isn't doing anything other than invoking a standalone executable. So this doesn't really tell us anything at all about the relative merits of Bash and Powershell. A better comparison would be this: how do you zip a folder in Python? Again via Stackoverflow (http://stackoverflow.com/questions/1855095/how-to-create-a-z...):
import os
import zipfile

def zipdir(path, zipf):
    # Walk the tree and add each file to the open archive
    for root, dirs, files in os.walk(path):
        for file in files:
            zipf.write(os.path.join(root, file))

if __name__ == '__main__':
    # Use a context manager so the archive is always closed,
    # and avoid shadowing the builtin 'zip'
    with zipfile.ZipFile('Python.zip', 'w') as zipf:
        zipdir('tmp/', zipf)
Compare that to your Powershell example, and I think you'll agree that the .NET API is nicer to use (heh, don't say that very often). But to return to the subject, how do you zip a folder in Bash? YOU CAN'T. Bash can't zip folders. It can't call libraries that can zip folders. It can only invoke programs that can zip folders. And any shell, even the wretched cmd.exe, can do that.
I have plenty of problems with Powershell but it is by far the best effort at a shell that the world has yet seen. I come from a Linux background and when asked what I like about working on Windows, I reply with "well, it's got a nice shell." Usually gets me a weird stare, but if we're really comparing shells to shells (and not shells to a wide variety of standalone utilities) then Powershell blows everything else out of the water.
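To make the "call libraries" point concrete, here's a minimal sketch of zipping a folder from Powershell through the .NET API directly (paths are illustrative; on versions before Powershell 5 you need the Add-Type line to load the assembly):

```powershell
# Load the .NET compression assembly (ships with .NET 4.5+)
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Create archive.zip from the contents of .\tmp - no external zip program involved
[System.IO.Compression.ZipFile]::CreateFromDirectory('tmp', 'archive.zip')
```

No child process, no parsing another tool's output — the shell is talking to the library itself, which is exactly what Bash cannot do.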
I actually don't like PowerShell; I don't like the verbosity of it all. It's easy enough to get most Unix tools in Windows (MinGW) variants... put these into a folder added to your path, and most stuff just works the same, or very similar. I tend to use the command-line tools and piping, or I'll use node.js scripts, which work pretty much the same everywhere I need them...
The only differences are nssm for Windows services, and init.d for Linux... haven't had to set up any background/startup services on my Mac yet, so not sure what it uses.
I don't get why people hate windows so much as a rule, I really like the Win7 UI (not a fan of 8, but can see how some would be)... Also, warming up to Unity (more than win8)
I'll add another recommendation for MediaWiki. It is a good low-friction way to establish documentation for internal services and libraries. The markup syntax is rather idiosyncratic, but powerful enough for most of our needs. "Does it have a wiki page" is now a standard question for any new service or library in our shop.
I quite enjoyed this, it's a nice piece of "inside baseball" for high energy theoretical physics. For those who don't have time to read 20 pages of jargon from another specialty, here's an excerpt from the end of the paper that I think communicates the key point:
"I think that string theory is a wonderful theory. I have a tremendous admiration for the people that have been able to build it. Still, a theory can be awesome, and physically wrong. The history of science is full of beautiful ideas that turned out to be wrong. The awe for the math should not blind us. In spite of the tremendous mental power of the people working in it, in spite of the string revolutions and the excitement and the hype, years go by and the theory isn’t delivering physics. All the key problems remain wide open. The connection with reality becomes more and more remote. All physical predictions derived from the theory have been contradicted by the experiments. I don’t think that the old claim that string theory is such a successful quantum theory of gravity holds anymore. Today, if too many theoreticians do strings, there is the very concrete risk that all this tremendous mental power, the intelligence of a generation, is wasted following a beautiful but empty fantasy."
I spent a few years during my undergrad working in a high energy experiment lab, and this was a frequently debated subject. If the LHC does find supersymmetry, it will be interesting to see the impact on the current theoretical landscape.
I enjoyed this as well, and I know absolutely nothing about the field. I think that's why the writing is so enjoyable: I understood the conversation without understanding most of the details.
I for one would greatly enjoy more such writing on HN, it's a refreshing change from some of the bloggier posts that make the majority of the frontpage.