Probably because it has all but zero filters on the input and output. It took widespread media outrage about "grok show me her in a bikini" to at least create a filter that bans such things.
While I love Redis as a versatile tool for external data structures, it's still lacking in two areas IMHO:
One, it would be cool to be able to embed it, similar to sqlite, directly into applications.
Two, the HA story is so much more complicated than it should be. I totally acknowledge that concurrency and distributed computing are hard, but it should not require reading heaps of documentation and understanding two entirely separate multi-node approaches only to figure out there are lots of subtle strings attached that make it impractical for many applications.
What would be the point of embedding Redis into an application? What's the advantage of using Redis over using the builtin (or third party) data structures of the language the application is developed in?
I'm asking as a non-webdev who never quite got what Redis actually does, but would love to learn.
To me the thing I like about Redis is that it gives you a storage engine very suitable for caches; it handles TTLs and memory pressure, as well as built-in serialization with the ability to get better performance by allowing for some data loss. At the same time, many users will be deploying small programs to individual machines. If you could just have Redis be embedded this would make it very operationally simple: no additional daemons and a single file to backup if you want to.
It would also be useful because of the ability to switch modalities. When running a multi node service, you can use Redis to share data between nodes and use Redis pubsub as a communication bus. If you wanted to support a simple single node configuration too, then it wouldn't need to be a special case, it could just go through the same mechanism but with an embedded Redis instance.
It's pretty similar to SQLite: being able to embed more or less a complete storage engine into your app can be very convenient and powerful.
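To make the "same mechanism, embedded instance" idea a bit more concrete, here is a minimal sketch of an in-process message bus. `LocalBus` and its method names are hypothetical and not part of any Redis client; the assumption is that application code only ever calls `subscribe` and `publish`, so a networked implementation with the same two methods could be swapped in for multi-node deployments. Returning the receiver count from `publish` mirrors what Redis PUBLISH reports.

```python
from collections import defaultdict


class LocalBus:
    """Hypothetical in-process stand-in for a pub/sub bus (single node only)."""

    def __init__(self):
        # channel name -> list of callbacks registered for that channel
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callable that receives every message sent to `channel`."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver `message` synchronously and return how many subscribers got it."""
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

Usage would look like `bus.subscribe("events", handler)` followed by `bus.publish("events", payload)`; nothing here survives a restart or crosses process boundaries, which is exactly the part an embedded Redis would have to provide.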
Well the most basic redis replacement would be just a global hashmap to replace GET and SET, possibly with a background thread to periodically delete expired keys. But obviously that stops working as soon as you get a second node.
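For what it's worth, a rough sketch of that "global hashmap plus background thread" could look like the following. `TTLStore` and all of its parameters are made up for illustration; the injectable `clock` is only there so expiry can be exercised without sleeping. It lives in one process only, which is precisely the limitation mentioned above.

```python
import threading
import time


class TTLStore:
    """Hypothetical in-process GET/SET store with optional per-key expiry."""

    def __init__(self, sweep_interval=1.0, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at or None)
        self._lock = threading.Lock()
        self._clock = clock
        self._stop = threading.Event()
        # Daemon thread that periodically deletes expired keys.
        self._sweeper = threading.Thread(
            target=self._sweep_loop, args=(sweep_interval,), daemon=True
        )
        self._sweeper.start()

    def set(self, key, value, ttl=None):
        """Store a value; `ttl` is in seconds, None means no expiry."""
        expires_at = None if ttl is None else self._clock() + ttl
        with self._lock:
            self._data[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None if the key is missing or already expired."""
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            value, expires_at = item
            if expires_at is not None and self._clock() >= expires_at:
                del self._data[key]
                return None
            return value

    def sweep(self):
        """Delete all expired keys and return how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [
                key
                for key, (_, expires_at) in self._data.items()
                if expires_at is not None and now >= expires_at
            ]
            for key in expired:
                del self._data[key]
        return len(expired)

    def _sweep_loop(self, interval):
        # Event.wait returns False on timeout, True once close() was called.
        while not self._stop.wait(interval):
            self.sweep()

    def close(self):
        """Stop the background sweeper."""
        self._stop.set()
```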
The entire value of redis IMO is that it ISN'T inside your normal application, but rather some shared storage that all nodes can use to coordinate and that survives deploys, but that provides more ergonomic data structures than SQL databases. Caches are only one type of such shared data, but things like feature flags, circuit breakers and rate limiters are also super common (and super useful).
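As an illustration of the rate limiter case, here is a minimal fixed-window sketch built on the common INCR-then-EXPIRE pattern. Everything named here (`allow_request`, `InMemoryCounters`, the key layout) is an assumption for the example. The limiter only needs an object with `incr(key)` and `expire(key, seconds)`; a shared Redis client would be what makes the count global across nodes, while the in-memory stand-in below just lets the logic run without a server.

```python
import time


class InMemoryCounters:
    """Stand-in exposing only the two calls the limiter needs (not shared!)."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        # Increment and return the new count, starting from zero.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        # Recorded only; this stand-in never actually evicts anything.
        self.ttls[key] = seconds
        return True


def allow_request(store, client_id, limit, window_seconds=60, now=None):
    """Return True while `client_id` is within `limit` for the current window."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window}"  # hypothetical key naming scheme
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the counter clean itself up later.
        store.expire(key, window_seconds)
    return count <= limit
```

With a per-process store like this one, each node would enforce its own separate limit, which is the argument for keeping the counters in shared storage.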
Neat. Write that up, match parity, and give all the function calls with the same name as redis, and you're both happy! You get to hand roll something, he gets to use a library that others have perfected over the years!
Unfortunately I have never really used Erlang outside of deploying RabbitMQ. I mostly use Go, Rust, Python, sometimes C/C++.
However, Mnesia seems like it is quite a bit more of a complete distributed database engine than Redis. To me the nicest thing about Redis is just the convenience of what it offers: very fast data structures, serialized, optimized (at least by default) for cases where speed is more important than durability. It is simple on many levels and somewhat constrained in scope. Mnesia seems to be aiming more generally in the distributed database category.
Really it would be more like Nebulex/Cachex which provide a really nice caching interface across ETS (what Mnesia is built off of) or other data stores.
Probably because Redis gives you a very well-defined/understood set of rich data structures with built-in behavior like TTL, atomic operations, eviction, and persistence. These things are otherwise usually scattered across native types, helper classes, or entirely separate libraries.
A language's own native data structures are generally much more capable and vast. 99%+ of developers use only a very limited set of those capabilities. This approach packages the most used ones into a nice, consistent DSL.
It's similar in effect to what busybox does to shell utilities, though the motives are different.
Doesn’t it depend on the language? Actually I am thinking of the standard library… Python’s is kinda huge and some parts are hard (for me) to grasp. Golang’s seems pretty simple.
Can you name a single language that can talk to redis and doesn't have these in a form of a library that integrates with an app better than mystical embedded redis?
Every language that can talk to redis most likely has a library for this, and it probably works much better with the rest of the application than "embedded redis". If it doesn't, it probably has a C FFI, and there are "fast, robust and well understood" implementations in C.
Sure. But if Redis was embeddable you'd get a robust C-FFI style implementation of those data structures which has been tested a lot more than some random library that has almost no existing users or active maintenance.
(I'm not personally sold on embedded Redis myself, but the question was "Aren’t your own programming language’s constructs much more well-defined / understood?")
A few nice things about doing this in no particular order:
Embedding would make local dev/CI integration testing convenient.
Embedding replicated Redis with each application instance would give you HA benefits while reducing infra-management complexity.
Embedded redis (even via local RPC) is still going to be faster than a lot of languages or frameworks’ built-in data structures. Large array operations in, say, Python are gonna be slower than RPCing to Redis (assuming that the data structures are built gradually and not built all at once); to beat Redis you’d have to use numpy or something, which is definitely preferable, but is extra work if your app already uses Redis for other things.
Just like choosing SQLite over e.g. LMDB or RocksDB, embedded Redis would be a nice future proofing option for small apps during the prototype phase; less would have to be changed to move Redis out of the app than if a different cache or persistence service were chosen.
I mostly use redis for pub/sub communication between services. If the app wasn't a collection of knative functions, and instead a monolith, it would be cool to also use redis for event based communication.
In practice, mostly scaling sessions and ephemeral data (caching) across multiple instances of a microservice on multiple machines. Separating the kv store and the application allows upgrading each application while retaining availability and avoiding loss of session data.
For simple cases, it is probably a total overkill to even consider it, but for something heavier, embedding the database gives you a chance to trivially migrate later to a separate database server.
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary.
Network hops are not free! Those milliseconds are an eternity compared to local function calls.
The optimal architecture is something like what Service Fabric or Orleans can do with their distributed dictionary types: reads are generally in-process and take only nanoseconds (but writes require a synchronous replica copy to a remote host.)
Obviously this requires load balancers to steer traffic consistently, but that’s a common feature… outside of the public clouds where they forgot latency exists.
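A toy sketch of the read/write split described above: reads are served from an in-process dict, and a write only becomes visible locally after it has been copied synchronously to every replica. This is not the Service Fabric or Orleans API; `Replica` and `LocallyReadDict` are invented names, and the "remote host" is simulated by a plain object.

```python
class Replica:
    """Stand-in for a remote host; a real one would sit behind a network call."""

    def __init__(self):
        self.data = {}
        self.available = True

    def apply(self, key, value):
        if not self.available:
            raise ConnectionError("replica unreachable")
        self.data[key] = value


class LocallyReadDict:
    """Hypothetical dictionary with in-process reads and replicated writes."""

    def __init__(self, replicas):
        self._local = {}
        self._replicas = list(replicas)

    def get(self, key, default=None):
        # Read path: a plain dictionary lookup, no network hop involved.
        return self._local.get(key, default)

    def set(self, key, value):
        # Write path: copy to every replica first; if any copy fails, the
        # exception propagates and the local value is left untouched.
        for replica in self._replicas:
            replica.apply(key, value)
        self._local[key] = value
```

The sketch ignores partial failures across several replicas and any form of recovery, so it only shows where the latency is paid, not how to make the copies consistent.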
A typical (meaningful) example might be communication between threads or actors in a single process, or idempotent tests.
As with SQLite, an external xxx that does this for you is certainly better, etc. but it’s convenient sometimes, to have an application that doesn’t go “now before you run this install Postgres…”.
It’s seldom useful for a web app where you control everything.
> One, it would be cool to be able to embed it, similar to sqlite, directly into applications.
I've found myself wanting this on several occasions too. I.e. wanting all my rust backend processes (k8s pods) to have some minimal shared state, without having to spin up a Redis cluster. I've talked to Claude about it a couple of times, and it descends into something like, "you gotta use Raft or CRDTs, and pick 2 out of 3 from CAP". Which honestly seems pretty fair, and indicates to me that I'm dreaming for something magical.
Nonetheless, it is nice to hear someone else asking for this. If this is indeed feasible (even if simple/limited), then I'd be interested to try it.
I don't know if that'll make you feel any better but yeah, you're indeed asking for the impossible! You need consensus between your nodes that store state _somehow_, either these nodes are Redis and it does that for you, or these nodes are your pods and you need to do consensus yourself (zookeeper might help, but you're definitely in "complicated stuff" territory).
Spinning up an in-memory (no persistence) Redis cluster in your k8s should be easy enough, hopefully?
And yes, adding a Redis cluster is fine, it is just another moving part to manage. But given that the alternative is made out of unobtainium, I guess that is just the way of it :-)
Genuinely interested why we need HA in redis; why not just read round robin from multiple non-HA instances?
Redis (and memcache) are memory caches and should be treated like that, not like highly consistent distributed session store.
> Redis (and memcache) are memory caches and should be treated like that
If you haven't come across Kvrocks yet, it may be worth a look: https://github.com/apache/kvrocks https://kvrocks.apache.org/ . It's a database with a Redis-compatible wire protocol, but the database is stored on disk. This means your working set is not limited by RAM and can be a few orders of magnitude larger! On modern SSDs this is still very fast. I think it improves the durability story as well. But the big win is the orders of magnitude larger database space.
As I've been improving my side project https://totalrealreturns.com/ recently I've ended up using both Redis and Kvrocks together. Redis is great for small global state that needs to be super fast. Kvrocks is great for larger bulk data storage (large precomputed datasets), but also supports a lot of the Redis data structures as well as Lua scripts.
It’s not. Imagine a web app that stores your user information in a session store, mapped by your cookie-provided session ID. Your web app searches redis 1 for the session id, but since that key is on redis 2, the lookup fails and the application thinks there is no such session, and rejects the request.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
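A rough sketch of the sharding case, with plain dicts standing in for independent Redis instances. `shard_for`, `ShardedSessions`, and the `down` set are all made up for illustration. Hashing the session ID means every app server agrees on which instance holds a session, but when that one instance is unreachable, the sessions mapped to it cannot be looked up anywhere else.

```python
import hashlib


def shard_for(session_id, shard_count):
    """Map a session ID to a shard index in a stable, process-independent way."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedSessions:
    """Hypothetical session store spread over several non-replicated instances."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]
        self.down = set()  # indexes of instances that are currently unreachable

    def _shard(self, session_id):
        index = shard_for(session_id, len(self.shards))
        if index in self.down:
            raise ConnectionError(f"shard {index} is down")
        return self.shards[index]

    def put(self, session_id, data):
        self._shard(session_id)[session_id] = data

    def get(self, session_id):
        # Returns None for unknown sessions; raises if the owning shard is down.
        return self._shard(session_id).get(session_id)
```

Marking the owning shard as down (for example `store.down.add(shard_for(sid, 3))`) makes `get(sid)` raise even though the other instances are healthy, which is the availability gap being pointed out.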
But that is his point.
If you cannot find the session id in redis, you log in again.
If your Redis server crashes, you start a new one and everyone just logs in again. No data is lost.
This discussion is a bit weird. We started off from, Redis should have better availability guarantees. Specifically to avoid the degradation of service you described.
But that requires running on multiple instances, which in turn requires to share the data across all replicas.
These two concerns are not mutually exclusive, the kind of database or data stored within it doesn't give any availability guarantees on its own. Even a single Postgres instance, which I suppose fits your understanding of a real database, is a single point of failure and not a highly available setup: If your database server goes down, clients get errors and the database is thus unavailable.
> The app would look up in both databases. If it exists in any, there would be a session.
And if you find the session with differing values in both databases, how do you know which one is up-to-date?
You need an algorithm to pick which data is right, such as electing a master instance.
And that brings us back to the original discussion: to manage sessions (unlike caches) in a highly available way, you need to set up HA (or reimplement it, which obviously is a bad idea). You can't read round robin from multiple non-HA instances.
For the project I've been working on for more than 15 years, we make extensive use of the pub/sub functionality for distributing live data. Pub/sub scales well across the cluster. Publish to one, and it goes out to subscribers on any of the nodes that they've connected to.
With millions of users, high availability is critical for this functionality.
Redis doesn't necessarily have to be used as a cache. Streams, for example, make it a great message queue; but a single-node message queue is a single point of failure and thus not viable for many setups.
That you do. Until you realise that there is only a single writer in that scenario, it doesn’t address any sharding concerns, you need to use compatible clients that opt into the sentinel protocol, during failover you’ll see client errors… there’s lots of room for improvement on redis HA.
With the amount of problems I had using Redis Sentinel, I really wish there was another way. On multiple occasions, with completely different deployments, it got itself into a non-repairable state where the only option was to drop it and set up the replicas manually. I was hoping someone would do a Patroni-like project for Redis, but I've not found it yet. I've moved all persistent data to PostgreSQL and use a number of Valkeys behind Envoy proxy as a cache.
I suggest you take a look at rdsync (https://github.com/yandex/rdsync); it's exactly what you want: a Patroni-like high-availability tool for Valkey/Redis. It uses ZooKeeper for external coordination. We use it in our large deployment, and with a couple of patches you will forget about the need to take manual actions to resolve broken states.
To be honest, at any scale, this really does help me not wake up at night to fix broken states by hand as an occasional on-call engineer. Note, though, that rdsync is mainly for Valkey up to 9.1; there were Redis patches for 7.2 (the last BSD version).
Redis has many use cases, and acting as a cache is only one of them. One very common usage is as a backend for background worker jobs. That can need HA.
> Two, the HA story is so much more complicated than it should be.
Really? I am curious, how would you simplify it? It’s a very well defined problem and all the “solutions” are very complicated and with many strings attached. I have managed one or two systems that came in different modalities and you had to pick your poison and had to make sure the other engineering teams understood the trade offs. Some were more successful than others, but “easy” never crossed my mind.
Redis is single threaded and doesn’t concern itself with these things, directly, exactly because antirez understood the trade offs and made all the right choices.
How would you improve the HA story without sacrificing ease of use and performance on a single thread?
What kind of an answer is that? This software is perfect the way it is, you’re just too inept to hold it right?
A high availability protocol should not leak into the client. It should be able to discover other nodes. It should not land in broken states so easily. It should not limit the number of writers. It should not error during failover.
Are these hard problems? Yes. Should we just accept that things are hard because that’s how the gods have given them to us? No.
High availability and abstraction complexity are orthogonal.
Redis is a low-level concurrency primitive, and it made certain choices in dealing with CAP.
It might be single-threaded, but it can easily absorb 100,000+ requests per second.
I've built systems that handle billions of dollars of online payments flow, active-active, with six nines of uptime reliability on top of Redis. It does what it says on the tin, and it doesn't need to be everything for everybody. This is a hard domain and you're going to have to deal with different problems and tradeoffs.
If you want something higher level, there are other systems to reach for.
It still doesn't reflect the design philosophy at all, though. A wacky approximation of early MacOS that offers nonfunctional UI affordances doesn't fit my bill of No obscurantist programming languages and styles, or simple, maintainable software akin to machines that need to work under all circumstances in the far north.
I was also a little disappointed with the philosophy's goals in general, which seem to be mostly the personal preferences of a lone-wolf style open source developer, not a universal approach to software design.
When you describe my programming and design philosophy as "the personal preferences of a lone-wolf style open source developer, not a universal approach to software design", I consider that the absolute best compliment I could have ever hoped for!
A "universal" approach to software design is the problem I am addressing, not the solution. Coming up with your own philosophy of design and implementation that works for you, and hopefully works for others, is how we get better software.
I'm not arguing with that, I think; I agree with your general sentiment and apparently read many of the same books you read as well. Yet I still believe there's value in a shared understanding of what quality software is, and what ideals to strive for in its conception.
> I was also a little disappointed with the philosophy's goals in general, which seem to be mostly the personal preferences of a lone-wolf style open source developer, not a universal approach to software design.
How would a universal approach to software design be in any way appropriate for this?
I like the general concept of software that treats its users as responsible adults, in the sense of not restricting them in how they can use the software; the analogy to machines that must work in remote areas with an extreme climate and no connection to the outside world is an apt one. Rejecting complexity in favour of maintainability, allowing users to reach in and modify things if necessary: those things I feel could be sharpened into proper, universal guiding principles.
The perspective. From within the simulation there is no point in making the distinction, but from the outside it does. For another example - running a program on a virtual machine or a physical computer is the same to the program, but very different to debug when you see hardware errors.
I feel like systemd units could use a layer of abstraction above them, so instead of editing the files manually, a tool would do it, some kind of declarative CLI or something. Probably not really a concern in the age of LLMs anymore, but it feels just slightly too tedious every time.
This is something I think about a lot, especially how one could pull it off without tearing down anonymity online. Having some sort of "proof of humanity" is a hard problem to solve.
Never in history have we had products as addictive as those. There are people making a career studying how to exploit people's weaknesses, to induce them to buy products.