PostgreSQL Row Level Security (imfeld.dev)
61 points by swyx on April 16, 2023 | 20 comments


Hmm.. some bits of this don't ring true to my understanding... for example:

"There are a lot of reasons that you don't want to coopt your database's role management to manage your app's users as well.

It doesn't scale. Postgres isn't designed to have thousands of roles"

While the general advice not to conflate the database's role management system with your app's is certainly the common practice, the claim that the database doesn't scale with many roles doesn't seem to be well supported. As a counterpoint to this part of the argument:

https://www.cybertec-postgresql.com/en/creating-1-million-us...

"The bottom line is really that PostgreSQL can easily handle a million users without any problems."

The Cybertec conclusion is more in line with what I'd expect. Database roles are really just data, and if PostgreSQL can't handle a few tables with thousands (or millions) of records, you've got bigger scalability problems than role scalability to worry about.
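For a sense of scale, creating roles in bulk is just a loop. A hypothetical sketch (not taken from the article; role names and count are made up):

    -- Bulk-create many login roles with a PL/pgSQL DO block.
    DO $$
    BEGIN
      FOR i IN 1..100000 LOOP
        EXECUTE format('CREATE ROLE app_user_%s LOGIN', i);
      END LOOP;
    END
    $$;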

Maybe the author is conflating the well known scalability issues with the number of concurrent connections to PostgreSQL with the number of defined roles.


That's only true for postgres itself. The postgres tooling and library ecosystem deals less well with 1000s of roles, as it assumes it can load all roles into memory and show them in a list. It gets annoying quickly.

This is a common theme even with mature tech. If you walk off the beaten path, the core tech copes well enough, but you limit your ecosystem choices. I'd advise against it; there are surprising hidden costs.


> The postgres tooling and library ecosystem deals less well with 1000s of roles

Do you have an example of such tool?

> as it assumes it can load all roles into memory and show them in a list

Which tool has trouble loading millions of users in memory? This should be a few megabytes of RAM usage at most


It's more the UI/list aspect. Take e.g. pgAdmin and click open the list of users. I have a DB with maybe 1000 users, and even at that low number it gets annoying to navigate. Compare with e.g. LDAP tools that let you organize users in a tree.

Now 1000 is still survivable, but the experience made it clear to me that one doesn't want one's whole corporation in one long list.

To be clear: This is not a problem of the postgres ecosystem, which has very high quality standards and deserves much praise. But you get a feel that it wasn't intended to be used that way.


Author here. This page was actually just some notes I threw together in a couple of hours combining some research on RLS with my own existing knowledge, so it's quite possible there are some inaccuracies.

My "source" for thousands of users not scaling well was simply a few comments I found scattered around; there wasn't really anything empirical to back up that line. I'll remove it. Thanks for the note!


best way to learn is to get knowledgeable people from HN reviewing your doc :)


> conflating the well known scalability issues with the number of concurrent connections to PostgreSQL with the number of defined roles.

They are related in some way: since a connection is tied to a user, you will need to open a connection for every active user.

That means solutions like pgbouncer, which pool connections, might not work as well as they do for a typical app where most connections are interchangeable.


There's no need to create a new connection per role.

Simply use `SET ROLE`/`RESET ROLE`.

https://www.postgresql.org/docs/15/sql-set-role.html
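For example (role names here are just placeholders; the connecting role needs permission to switch to the roles in question):

    -- Sketch: reuse one connection for many application users.
    SET ROLE alice;
    SELECT * FROM orders;  -- runs with alice's privileges and RLS policies
    RESET ROLE;

    SET ROLE bob;
    SELECT * FROM orders;  -- now with bob's
    RESET ROLE;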


It does seem like that adds two round trips (SET ROLE, RESET ROLE) to each database interaction, though.


"They are related in some way, since a connection is tied to a user, you will need to open a connection for every active user."

This isn't exactly correct. While every connection will be made with a role with permission to connect, it is not necessary for every role taking action on the database to have its own individual connection. As @paulddraper points out, there are commands which appropriately privileged database roles can use to switch their role context.

For example, if I wanted to take advantage of row level security with many thousands of users, I could (not saying you should, just a hypothetical for this discussion) do the following. 1) For each application user, create a unique database role and assign privileges as desired. 2) Open a connection pool with a reasonable number of PostgreSQL connections, using a special role designated just for establishing connections, with enough privilege to change its role but otherwise as unprivileged as possible. 3) As transactions for users come in and get assigned a PostgreSQL connection from the pool, the first thing that happens at the beginning of the transaction is that the connection switches the current role to that of the user requesting the transaction. 4) At the end of the transaction, the role is reset back to the role owning the connection. This role switch then repeats for every transaction processed.

Such a process may have its own issues, but it is technically feasible and allows a small set of connections to serve any user who shows up, using that user's database role for transaction processing. The number of connections can be much, much smaller than the number of database roles that might need to process something. Of course, having a million database roles might have implications for how many database connections you really need to service them, but it's not one connection per active application user.
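A minimal sketch of that per-transaction switch, assuming a pooled role (here called app_pool, a hypothetical name) that is a member of the per-user roles. SET LOCAL ROLE reverts automatically at COMMIT or ROLLBACK, so step 4 happens for free:

    -- The pool opens its connections once as app_pool.
    -- Then, per incoming transaction:
    BEGIN;
    SET LOCAL ROLE user_12345;  -- hypothetical per-user role for the requester
    -- ... run the user's statements under that role's privileges and RLS policies ...
    COMMIT;  -- the connection reverts to app_pool automatically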


It's not that postgres can't handle large tables, it's more that many of the ways it uses system tables aren't particularly optimized for them being large, so you're operating outside the territory the table/query design was probably built for.

Like most things postgres, it'll probably still work better than it has any right to for a while, but it'd be extremely unsurprising if it caused you unexpected problems at some point. (And if it does, they'll likely be excitingly novel, and you'll probably have to dig pretty deep to understand what's going on.)


so is a good rule of thumb "put your users in postgres first until 100-500k, then revisit if you need to move to a dedicated solution"?


Probably not unless you like rearchitecting your entire security system when you already have 500k users!


That's probably easier than if you have only 2k. Although the special cases in the 500k might squeal a bit, they're more ignorable.


> There are a lot of reasons that you don't want to coopt your database's role management to manage your app's users as well.

> Roles are global in the entire Postgres system, not scoped to a particular database.

A workaround is the db_user_namespace option

https://www.postgresql.org/docs/current/runtime-config-conne...

> db_user_namespace (boolean)

> This parameter enables per-database user names. It is off by default. This parameter can only be set in the postgresql.conf file or on the server command line.

> If this is on, you should create users as username@dbname. When username is passed by a connecting client, @ and the database name are appended to the user name and that database-specific user name is looked up by the server. Note that when you create users with names containing @ within the SQL environment, you will need to quote the user name.

> With this parameter enabled, you can still create ordinary global users. Simply append @ when specifying the user name in the client, e.g., joe@. The @ will be stripped off before the user name is looked up by the server.

> db_user_namespace causes the client's and server's user name representation to differ. Authentication checks are always done with the server's user name so authentication methods must be configured for the server's user name, not the client's. Because md5 uses the user name as salt on both the client and server, md5 cannot be used with db_user_namespace.

> Note

> This feature is intended as a temporary measure until a complete solution is found. At that time, this option will be removed.

It's a bit concerning that the documentation states that the option is intended to be removed; PostgreSQL has a ton of misfeatures that, in the name of maintaining backwards compatibility, aren't removed just because new, better features are introduced.

However, I suppose that if a better way to scope roles is created, migration tools will be made available
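Based on the documentation quoted above, usage looks roughly like this (database, user name, and password are placeholders):

    -- In postgresql.conf (per the quoted docs, only settable there
    -- or on the server command line):
    --   db_user_namespace = on

    -- Within SQL, the @ in the name must be quoted:
    CREATE USER "joe@mydb" PASSWORD 'secret';

    -- A client connecting to database mydb as user "joe" is then looked up
    -- by the server as "joe@mydb". A global user is created as usual and
    -- specified by the client as "joe@".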


Thanks, I hadn’t seen that!


Performance is a huge issue with RLS, as it acts like an optimization barrier. For security reasons the RLS rules are applied first, and then most or all of your indexes simply don't get used. It's not like a single WHERE clause with injected RLS checks.


Hmm, I thought this only really applied when you use functions which Postgres can’t reason about. Hence the LEAKPROOF keyword to help with this case.

Either way, I think it underscores that RLS shouldn’t be used for business logic. I’ll update the page to emphasize that more.
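For context, a rough sketch of the kind of setup involved (table, column, and setting names are hypothetical). As I understand it, the planner will only evaluate a non-leakproof function from the query's predicates ahead of the policy filter, where it could drive an index scan, if that function is marked LEAKPROOF:

    -- Hypothetical tenant-isolation policy:
    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

    CREATE POLICY tenant_isolation ON orders
      USING (tenant_id = current_setting('app.tenant_id')::int);

    -- Marking a predicate function as leaking nothing about the rows it
    -- rejects lets the planner push it below the policy barrier:
    ALTER FUNCTION my_predicate(text) LEAKPROOF;  -- hypothetical function; superuser only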


Lots of dislike for this feature in the comments, or maybe it's this particular implementation? We use RLS at my work to good effect, although it did take some time to bed in with performance adjustments. In that situation it's used as a way to prevent any statements returning cross account records rather than for fine grained authorisation.

I've also been playing with supabase recently and that leans heavily on RLS policies. It works well there, but I won't be hitting any potential performance at scale issues.


row rever?



