bear9628's comments

bear9628 · on March 21, 2024

You can get an overview of the platform in this Post: https://xata.io/blog/serverless-postgres-platform#the-platfo...

Zombodb is a really cool project, but no, we don't use it. We use logical replication and triggers to also capture schema changes with your records into an event stream. The event stream is used to send your data to Elasticsearch (and create/update the index). See: https://xata.io/blog/serverless-postgres-platform#logical-re...

Stay tuned, we are planning to open source this component as well.

bear9628 · on March 21, 2024

> I haven't worked with pgzx, but it's possible that memory management is easier. Zig uses memory arenas, and so does Postgres. If one can map directly to the other, it would be a huge win.

Yes. This was indeed a great motivator for using Zig. It was quite easy to integrate Zig with the Postgres memory management. This way we can use the Zig standard library or other Zig libraries without a second thought. Another advantage is that in Postgres memory context cleanup and error handling are somewhat well integrated with each other. This gives us some peace of mind as almost any Postgres function you might want to use in your extension is likely to raise an exception.

bear9628 · on March 21, 2024

One of the reasons why we are using Zig is that we can consume the C API mostly as is.

When importing the C headers Zig translates the C headers into Zig declarations. Unfortunately this is not always possible for C macros, and the reason why we have to maintain those structs. And this is where we have to step in with Custom Zig code. But most of the time we actually consume the C APIs as is.

> If we were to provide zig consumable ways of creating the struct from C headers, would that help?

Yes, that would be awesome. I'm curious how they will look like in the future.

We have had the most difficulties with the module magic, function info, and varatt/Datum macros. Fortunately you have to solve the module magic/function info "only once". The Datum conversion and VARATT macros are more troublesome. We have some conversion support for a number of common zig types. But ideally we would like users to be able to use the C APIs as is, while we provide some type directed default conversions for convenience.

The main problems we've been facing with C macro translations are type conversions/casts in macros, especially if the underyling struct heavily uses union (for example VARATT macros). In some cases translating inline functions instead of macros might work better, due to the translator having more type information available. We fixed some of the translations manually. You can find them in varatt.zig and datum.zig (where we opted to implement the text to cstring translation ourselves).

Data structures like lists, slist, dlist, hash tables are quite consumable as is. We have some typed wrappers for those and provide iterators. Macros with control flow can not be reused, but I think this is fair, especially as the foreach macros are a very common C patterns. All in all we have had no troubles with them.

anarazel · on March 21, 2024

Hi,

> > If we were to provide zig consumable ways of creating the struct from C headers, would that help?

> Yes, that would be awesome.

I don't know much about zig and won't have a whole lot of time to learn - but if you can outline what is required to make C macros [un]usable, it might be possible to improve something. Either on its own, or as part of future work.

> I'm curious how they will look like in the future.

There's quite a few things.

For one, I'd like to introduce a faster function calling infrastructure for the simple cases (mainly small number of arguments, without SRF support). That'll need to be declared in the function info struct.

For another, eventually I want to support a different encoding for variable length types. Including making it reasonably efficient to have variable-length integers.

> We have had the most difficulties with the module magic, function info, and varatt/Datum macros. Fortunately you have to solve the module magic/function info "only once". The Datum conversion and VARATT macros are more troublesome. We have some conversion support for a number of common zig types. But ideally we would like users to be able to use the C APIs as is, while we provide some type directed default conversions for convenience.

Ugh, the varatt stuff doesn't look easily maintainable long-term. It looks like you just need it for getDatumTextSliceZ()?

At the same time, I don't really know why you need it? Most of this should be doable via C functions, and the parts that are not, you could easily wrap yourself - you already seem to have some C code as part of pgzx.

bear9628 · on March 21, 2024

Edit: I wasn't finished before submitting the response by accident :)

> > > If we were to provide zig consumable ways of creating the struct from C headers, would that help? > > Yes, that would be awesome. > I don't know much about zig and won't have a whole lot of time to learn - but if you can outline what is required to make C macros [un]usable, it might be possible to improve something. Either on its own, or as part of future work.

Hm. Sometimes it is difficult to tell until you try to use a macro. This is because the compiler ignores code (no type checking) that is not used. Difficult to explain, but assume that you write a program that emits typed code that is eventually compiled. This is what enables comptime, and "best effort" C header imports.

The toolchain tries to convert macros into inline functions. That means any macro that contains some form of control flow or opens/closes a code block can't be used. Most obvious ones are the foreach loops, PG_TRY and friends or the PG_RETURN_X macros (luckily we can just use the XGetDatum functions).

Union types are difficult as I said. But maybe this is rather a Zig problem.

Sometimes using consts. For example when working with the varattrib variants we have bit wise operations and shift for example:

``` #define VARSIZE_4B(PTR) \ ((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF) ```

Now the 2 and the bit pattern might be translated into different types (e.g. int), which might not be compatible with va_header (which is an uint32). Sometimes the types for the constants look ok, sometimes not. Maybe this is something that could still improved in Zig, not sure. I haven't tried this, but I wonder what would happen if I annotate the types for the constatns in the macro (which might not make them more readible :) ).

We later decided to allow mixing C with Zig code in case we need some kind of "complex" wrapping in C. This might not be fully ideal, but fortunately Zig is also a C compiler which allows us to fallback to C if we find something to complicated.

> For one, I'd like to introduce a faster function calling infrastructure for the simple cases (mainly small number of arguments, without SRF support). That'll need to be declared in the function info struct.

This sounds great. In pgzx we actually allow developers to capture the function call info as argument in their function implementation (not shown in our examples). For example if someone wants to use the collation, do some checks on nargs, implement a function with variable number of arguments.

But out of the box we try to derive input and return types and conversions at compile time. I would have to see how the new API looks like, but I think we still would be able to continue to automatically derive the conversions to extract the arguments into values in Zig.

> Ugh, the varatt stuff doesn't look easily maintainable long-term. It looks like you just need it for getDatumTextSliceZ()?

> At the same time, I don't really know why you need it? Most of this should be doable via C functions, and the parts that are not, you could easily wrap yourself - you already seem to have some C code as part of pgzx.

True. We introduced C into the code base later in our development. The project is still very new and we might revisit some choices on the Datum encoding.

The `getDatumTextSliceZ` actually resembles the `text_to_cstring` function, which we might want to use in the future instead. In `Zig` a string is a slice, which is a fat pointer (pointer + length field). The type `[:0]const u8` represents a slice with 0 terminator (fun fact, Zig gives you a stack traces if you forget to write the terminator into your buffer). Initially we implemented this function directly so we can directly initialize the fat pointer without having to get the string length after doing the conversion.

We added C to our code base later in time to allow us to wrap simple cases more easily without having to reproduce Postgres code in Zig. I guess we should revisit `getDatumTextSliceZ` :). Either have a small C wrapper over `text_to_cstring` that also returns the length or just bite the bullet and do a `strlen` after.

Another motivator to try to fix the VARATT/VARDATA macros was to allow developers to use those in their own extensions. Looking at some extensions in contrib we find e.g. `VARDATA_ANY` or `VARSIZE_ANY_EXHDR` being used quite a bit.

bear9628 · on March 21, 2024

No TLE support is a not a goal. Languages/extensions in Postgres should only be marked as trusted if you can guarantee that they do not access Postgres internals or even the file filesystem. With Zig and in pgzx in particular we can not guarantee this to be the case.

With pgzx we provide you a number of utilities on top of the C API. But depending on the extension you are working on you might heavily use the Postgres internals and their C APIs as is.

Postgres has support for a number of languages. Some having their own strenghts or weaknesses. One strength of Zig is its closeness to C, while providing a many quality of live improvements over C (optional types instead of null pointer, generic data structures, ...). This makes it a formidable choice for extensions that want to integrate deeply with the Postgres internals.

anarazel · on March 21, 2024

If I suddenly had unlimited resources and attention, I'd really like to provide infrastructure to write trusted functions by running them via a webassembly runtime, with a small injected safe surface for accessing postgres functionality. That'd make it a lot easier to write trusted PLs in a variety of languages. Still probably needs a bit of per-language sugar on top, but a lot less than doing it from scratch.

Edit: Minor grammar fix

michelpp · on March 21, 2024

I don't know much about it, but I think this has been tried?

https://dylibso.com/blog/pg-extism/

Looking at the repo though it looks like it hasn't been updated in a while.

nilslice · on March 21, 2024

we're actually working on a new version of this, which _does_ support TLE! if you're interested, please join https://extism.org/discord and we can get you a preview.

bear9628 · on March 21, 2024

Oh wow, that is really cool.

bear9628 · on March 21, 2024

Hehe :)

I also have WASM in mind when thinking about future TLE support :)

pjmlp · on March 22, 2024

Oracle, IBM and Microsoft have been doing that for decades with JVM, CLR and TIMI, yet another use case where Webassembly is a follower.