Could you explain what a git trailer is if not appended to the message body? My understanding is that trailers are just key-value pairs in a particular format at the end of the message; there's not an alternative storage mechanism.
Even so, trailers or message body might be moot: rerolling the committer timestamp should be sufficient!
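To illustrate why the timestamp alone is enough, here is a minimal sketch, assuming the goal is simply to obtain a different commit id and that the repository uses SHA-1 object ids. It hashes a parentless commit object by hand; the identity, message, and the `commit_id` and `reroll` helpers are made up for the example, and the tree is git's well-known empty tree.

```python
import hashlib


def commit_id(tree: str, author: str, committer: str, message: str) -> str:
    """Hash a parentless commit object the way git does for SHA-1 repositories."""
    body = (
        f"tree {tree}\n"
        f"author {author}\n"
        f"committer {committer}\n"
        f"\n"
        f"{message}"
    ).encode()
    # Git prefixes the object with its type and byte length, then a NUL byte.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical values: the empty-tree id plus a made-up identity and message.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHO = "Example Dev <dev@example.com>"
MSG = "Fix the thing\n"


def reroll(start: int, count: int) -> list[str]:
    """Keep the author date fixed and bump only the committer timestamp."""
    return [
        commit_id(TREE, f"{WHO} {start} +0000", f"{WHO} {start + i} +0000", MSG)
        for i in range(count)
    ]


if __name__ == "__main__":
    for cid in reroll(1700000000, 3):
        print(cid)
```

The message and author line never change here, yet every one-second bump of the committer timestamp yields a different id.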
OCI artifacts, using the same protocol as container registries. It's a protocol designed for versioning (tagging) content addressable blobs, associating metadata with them, and it's CDN friendly.
Homebrew uses OCI as its backend now, and I think every package manager should. It has the right primitives you expect from a registry to scale.
A lot of folks think this, but did you also implement EDNS0?
The golang team also thought DNS clients were simple, and it led to almost ten years of difficult-to-debug panics in Docker, Mesos, Terraform, Consul, Heroku, Weave and countless other services and CLI tools written in Go. (Search "cannot unmarshal DNS message" and marvel at the thousands of forum threads and GitHub issues that all bottom out at Go implementing the original DNS spec and not following later updates.)
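For anyone wondering what EDNS0 adds on the wire, here is a rough sketch, assuming the relevant gap is that a client following only the original spec is limited to 512-byte UDP responses. It builds a query that carries an OPT pseudo-record advertising a larger UDP payload size. The `build_query` helper, the query id, and the 1232-byte default are illustrative choices, not taken from any particular client.

```python
import struct


def build_query(name: str, qtype: int = 1, udp_size: int = 1232,
                query_id: int = 0x1234) -> bytes:
    """Build a DNS query with one question and an EDNS0 OPT record."""
    # Header: id, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0,
    # ARCOUNT=1 because the OPT record lives in the additional section.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)

    # QNAME: length-prefixed labels terminated by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS=IN

    # OPT pseudo-record: root name, TYPE=41, CLASS field reused as the
    # advertised UDP payload size, TTL field reused for extended
    # rcode/version/flags (all zero here), RDLEN=0 (no options).
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


if __name__ == "__main__":
    print(build_query("example.com").hex())
```

Sending the OPT record is only half of it: the client also has to accept and parse responses up to the size it advertised, and still fall back to TCP when the answer is truncated.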
This also happens when switching virtual desktops: even with reduced animations, there is a 100ms+ delay before any input on the new desktop is sent to the correct app.
Oh, this is quite similar to an online parser I'd written a few years ago[1]. I have some worked examples on how to use it with the now-standard Chat Completions API for LLMs to stream and filter structured outputs (aka JSON). This is the underlying technology for a "Copilot" or "AI" application I worked on in my last role.
Like yours, I'm sure, these incremental or online parser libraries are orders of magnitude faster[2] than alternatives for parsing LLM tool calls, for a very simple reason: the alternative approaches repeatedly parse the entire concatenated response. That requires buffering the entire payload and repeatedly allocating new objects, and for an N-token response you parse the first token N times! All of the "industry standard" approaches here are quadratic, which is going to scale quite poorly as LLMs generate larger and larger responses to meet application needs and users want low-latency outputs.
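A back-of-the-envelope sketch of the quadratic blowup, counting only characters scanned and ignoring allocation. The chunk size and count are made up, and `reparse_cost` and `incremental_cost` are illustrative helpers rather than anything from either library.

```python
def reparse_cost(chunks: list[str]) -> int:
    """Characters scanned if the whole buffered response is re-parsed per chunk."""
    scanned = 0
    buffered = ""
    for chunk in chunks:
        buffered += chunk          # the entire payload has to stay buffered
        scanned += len(buffered)   # and is scanned again from the start
    return scanned


def incremental_cost(chunks: list[str]) -> int:
    """Characters scanned if each chunk is fed to an online parser exactly once."""
    return sum(len(chunk) for chunk in chunks)


if __name__ == "__main__":
    chunks = ["abcd"] * 1000  # hypothetical 1000-chunk response
    print(reparse_cost(chunks))      # 2002000
    print(incremental_cost(chunks))  # 4000
```

With 1000 four-character chunks, re-parsing scans about 500x more characters than the incremental approach, and that ratio grows linearly with the number of chunks.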
One of the most useful features of this approach is filtering LLM tool calls on the server and passing through a subset of the parse events to the client. This makes it relatively easy to put moderation, metadata capture, and other requirements in a single tool call, while still providing low latency streaming UI. It also avoids the problem with many moderation APIs where for cost or speed reasons, one might delegate to a smaller, cheaper model to generate output in a side-channel of the normal output stream. This not only doesn't scale, but it also means the more powerful model is unaware of these requirements, or you end up with a "flash of unapproved content" due to moderation delays, etc.
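Roughly what that server-side filtering looks like, as a sketch. The event shape here (a JSON path plus a text fragment) and the `split_events` helper are hypothetical and not the API of either library; the field names in the sample tool call are made up too.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (JSON path, text fragment): hypothetical event shape


def split_events(events: Iterable[Event],
                 public_prefixes: Tuple[str, ...],
                 captured: Dict[str, str]) -> Iterator[Event]:
    """Forward events under public paths; accumulate everything else server-side."""
    for path, fragment in events:
        if path.startswith(public_prefixes):
            yield path, fragment  # streamed to the client as soon as it is parsed
        else:
            # Moderation flags, metadata, etc. never leave the server.
            captured[path] = captured.get(path, "") + fragment


if __name__ == "__main__":
    stream = [
        ("$.moderation.flagged", "false"),
        ("$.reply", "Hel"),
        ("$.reply", "lo"),
        ("$.metadata.topic", "greeting"),
    ]
    private: Dict[str, str] = {}
    for event in split_events(stream, ("$.reply",), private):
        print(event)
    print(private)
```

If the schema puts the moderation field before the reply, the server can inspect `private` before forwarding the first reply fragment, which is one way to avoid the flash of unapproved content.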
I found that it was extremely helpful to work at the level of parse events, but I recognize that building partial values is also important. So I'm working on something similar in Rust[3], taking a more holistic view and building more of an "AI SDK" akin to Vercel's.
Not using the NVDEC and NVJPG units to decompress weights into registers? And you say you're using the whole GPU. There are entire blocks on the silicon going idle!
Ha, made me chuckle. For those wondering seriously about this, it’s not a viable optimization: weights are not readily compressible via JPEG/DCT, and there are a limited number of these units on the chip, which bottlenecks throughput, meaning it would be far slower than simply reading uncompressed weights from HBM.
Good fun. Now I wish RT cores were programmable with some form of PTX, but for now it's OptiX or die. I've managed to do fun stuff with it, but it's like pulling teeth.
I won a GPU hackathon back in 2019 doing something very similar to this, although the other way around: I was compressing weights using hardware modules.
“Never believe that anti-Semites are completely unaware of the absurdity of their replies. They know that their remarks are frivolous, open to challenge. But they are amusing themselves, for it is their adversary who is obliged to use words responsibly, since he believes in words. The anti-Semites have the right to play. They even like to play with discourse for, by giving ridiculous reasons, they discredit the seriousness of their interlocutors. They delight in acting in bad faith, since they seek not to persuade by sound argument but to intimidate and disconcert. If you press them too closely, they will abruptly fall silent, loftily indicating by some phrase that the time for argument is past.” - Jean-Paul Sartre
Words to remember when one hears it's "just locker room talk" or "boys being boys" or "just a joke, lighten up".
Always exciting to see Robert Escriva's next database project. Building on top of the incredible engineering of S3 is clever; I'm definitely going to be looking at and learning from the implementation in wal3.
Looking forward to the benchmarks on using S3 for piece and the design of the scale out architecture.