
Am I the only one enforcing a strict no-database-in-kubernetes policy?


You're not the only one, obviously. But that stance is not really relevant anymore. Just about every database as a service company will run their databases in kubernetes now.


Honest question: Why is that? What is wrong with your DB running in Kubernetes and what do you suggest as alternative?


I consider anything in kubernetes disposable. I want to be able to lift and shift anything in a new cluster without any migration in a matter of minutes.

Anything "stateful" like a database breaks this paradigm.

I have nothing against databases used as a cache that can be "re-filled" upon re-creation, but I believe anything holding business-critical data should be held outside of a kubernetes cluster. Why? Because being one command away from deleting your StatefulSet, Helm release, etc. scares the shit out of me.

You can of course minimize the risk with correct RBAC and proper backup/restore procedures, but that requires a lot of staff and effort I can't spare.
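As a rough illustration of the RBAC point, here is a minimal sketch of a namespaced Role that lets a user edit StatefulSets but not delete them. Kubernetes RBAC is additive, so "no delete" simply means leaving the `delete` and `deletecollection` verbs out. The role name, the `databases` namespace and the `allows` helper are hypothetical, and the Role only helps if users are not also bound to something broader such as `cluster-admin`.

```python
import json
from typing import Dict

# Verbs granted on StatefulSets; "delete" and "deletecollection" are deliberately absent.
EDIT_WITHOUT_DELETE = ["get", "list", "watch", "create", "update", "patch"]


def statefulset_role(namespace: str, name: str = "statefulset-editor-no-delete") -> Dict:
    """Build a Role manifest (as a dict) for the given namespace. Names are illustrative."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["statefulsets"],
                "verbs": list(EDIT_WITHOUT_DELETE),
            }
        ],
    }


def allows(role: Dict, api_group: str, resource: str, verb: str) -> bool:
    """Hypothetical local check: does any rule in this Role grant the verb?"""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in role["rules"]
    )


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(statefulset_role("databases"), indent=2))
```

This does not cover Helm releases or backups; it only shows that removing the delete path is a few lines of manifest.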

So until I can be reassured that I have all the tooling to recover rapidly from any catastrophic failure/mishap, and that all this tooling is tested monthly, I enforce using managed database services.


if you set a retain policy on persistentvolumes it will prevent your volumes from being deleted even when you delete the owning objects. cloud providers will keep the virtual drive in those cases even if you delete everything
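a minimal sketch of the retain policy mentioned above, assuming an existing PersistentVolume (the name `pv-example` is a placeholder). it builds the merge patch that sets `spec.persistentVolumeReclaimPolicy` to `Retain` and the matching `kubectl patch pv` command line. for dynamically provisioned volumes the same effect comes from `reclaimPolicy: Retain` on the StorageClass.

```python
import json
from typing import List


def retain_patch() -> str:
    """JSON merge patch that switches a PersistentVolume's reclaim policy to Retain."""
    return json.dumps({"spec": {"persistentVolumeReclaimPolicy": "Retain"}})


def kubectl_patch_argv(pv_name: str) -> List[str]:
    """Argument vector for patching one PersistentVolume; pv_name is a placeholder."""
    return ["kubectl", "patch", "pv", pv_name, "-p", retain_patch()]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing touches a real cluster.
    print(" ".join(kubectl_patch_argv("pv-example")))
```

with `Retain`, deleting the claim leaves the volume in a released state and the backing disk stays around until someone removes it by hand.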

regardless, it’s the wrong thing to fear. this is at the level of logging in every user as root on your servers and databases because proper user management would require extra staff and efforts you can’t spare.


There are databases where individual nodes are disposable. A lot of managed dbs aren’t exactly zero ops either, and they can be hard to service without downtime too.


Performance is a common reason. There are architectural incompatibilities between high-performance database engines and the way Kubernetes is designed to work. Kubernetes does not respect the traditional contract between the OS and the database engine, transparently interfering with that interface in adversarial ways that degrade important optimizations. Ironically, the performance loss can be substantially worse than running databases in a properly configured virtual machine -- virtual machines have some overhead but they otherwise don't interfere with the proper functioning of this software. Kubernetes wasn't designed to efficiently support the syscall and hardware utilization patterns of I/O intensive applications and this is evident throughout, even requiring non-standard hacks to set things up in Kubernetes that really shouldn't be necessary.

Deploying databases in Kubernetes is fine for many applications, I've done both. Not every application that uses a database is data intensive.


Do you have any specific examples of Kubernetes interfering with this contract? I've not heard of this kind of behaviour before.


All high-performance database kernels do full kernel bypass e.g. they control their storage hardware, CPU affinity/context-switching, memory, network, etc explicitly. For all practical purposes Linux turns into little more than a device driver. This enables integer factor gains in performance via various optimizations. Ironically, it also makes the code simpler because behavior is explicit.

Linux is specifically designed to support this type of usage. The necessary syscalls were added decades ago, originally to support databases. Kubernetes intercepts these syscalls because they break its abstractions; while they appear to function like the underlying kernel syscall, the resultant behavior is not the same and generally unsuitable for these types of database architectures. The practical effect is degraded and unpredictable performance because it violates invariants that core optimizations rely on.

This has been kicked around by Kubernetes people for years, including within my own orgs because we use a lot of Kubernetes. No one has ever been able to make this type of software achieve comparable performance, even when we've used a lot of hack-y workarounds. Kubernetes was not designed to allow software to interact with the Linux kernel in this way. Consequently, this type of software is deployed on VMs or bare metal in practice, even if everything else is on Kubernetes.


Sounds like parent misunderstood how PDs work in k8s or maybe is referring to THPs - still doesn’t make a ton of sense


No. You are not alone. There are still a number of organizations that are cautious about deploying databases in Kubernetes. That said, various third-party surveys, as well as anecdotal evidence from what we are seeing at Crunchy Data, suggest that deploying databases on Kubernetes is increasingly common. The degree to which it makes sense often depends on whether or not the organization is standardizing around Kubernetes for their deployment model more generally.


I'm hoping these kinds of policies continue to be phased out.

The Kubernetes world has changed a lot in the past few years in ways that make databases-in-k8s more appealing. Such as:

- Kubernetes "eating the world", meaning some teams may not even have good options for databases outside k8s (particularly onprem).

- Infrastructure-as-code being more prevalent. Since you already have to use k8s manifests for the rest of your app, adding another IaC tool to set up RDS may be undesirable.

- The rise of microservices, where companies may have hundreds of services that need their own separate data stores (many of which don't see high enough traffic to justify the cost of a managed database service).

- Excellent options like the bitnami helm charts: https://github.com/bitnami/charts or apparently Vitess (haven't used it myself): https://vitess.io/

Obviously if the use-case is a few huge, highly-tuned, super-critical databases, managed database services are perfect for that. But IMO a blanket ban might be restricting adoption of some more modern development practices.


I recently wrote a piece [0] on why I believe you should run your databases in Kubernetes. FYI.

[0]: https://thenewstack.io/kubernetes-will-revolutionize-enterpr...


No, other people also need consistent performance and a simpler system that works in practice.


I have a strict policy of not running my own database at all. Does that fit within your policy, or does it violate your policy if my database-as-a-service vendor uses a container orchestration platform?


No, it does not. I don't run databases in kubernetes because I don't trust myself to be able to recover from disasters.

How database-as-a-service vendors run their services is none of my business as long as they deliver the performance I need and working backup/recovery procedures.


why is that?


Because resource-intensive stateful workloads with persistent data are basically k8s’s Achilles heel. It’s not that k8s can’t handle them, it’s just that you get pretty much no benefits from k8s and so the extra configuration overhead is rarely worth it compared to running an external db cluster.



Wow, thank you! I have read the whole article but I couldn't wrap my head around it. I found the "rod story" confusing, then I followed your link (and also watched the related YouTube video). This really makes things clearer. I am really not into maths, but this version seems much easier to understand and very logical.


Nope, error: aborting due to 3 previous errors; 31 warnings emitted

