Show HN: WarpBuild – x86-64 and arm GitHub Action runners for 30% faster builds (warpbuild.com)
154 points by suryao on Dec 8, 2023 | 87 comments
Hey HN, I’m Surya and I’m excited to show you WarpBuild!

WarpBuild provides fast, secure `x86-64` and `arm64` GitHub Actions runners. They speed up your workloads by 30% at half the cost, and it takes ~2 minutes to get started.

We’ve been seeing pretty good results since we opened up signups a week ago and I’ve shared some numbers publicly here [1].

Currently, we support Linux runners for GitHub organizations (not personal accounts); macOS support is coming soon (~Jan).

The way the runners work is deceptively simple: runners are assigned to hardware that is ideal for build workloads, with fast NVMe disks and high single-core performance.

The runners are allocated on VMs, not containers. This gives better performance and enables use cases requiring (1) nested virtualization for running Firecracker and other hypervisors, (2) k8s without relying on kind, and (3) Android emulators on `arm64` instances in test workflows.
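
To illustrate the nested-virtualization point, a job can check for KVM directly. A minimal sketch (the runner label here is illustrative):

  jobs:
    kvm-check:
      runs-on: warp-ubuntu-latest-x64-4x
      steps:
        # /dev/kvm is only exposed when the runner is a VM with nested
        # virtualization enabled; container-based runners typically can't offer it.
        - run: test -e /dev/kvm && echo "KVM is available"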

We've also released a GitHub Action called `Action-Debugger` that lets you SSH into a running workflow to simplify pesky debugging [2].
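
It's meant to be a drop-in step. A minimal sketch (the `@v1` ref is an assumption; pin whatever the repo documents):

  steps:
    # Pauses the job and prints SSH connection details (tmate-style).
    - uses: WarpBuilds/action-debugger@v1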

The same set of packages that you'd get on GitHub-hosted runners is pre-configured (on x86-64 runners), so everything works out of the box with no modifications needed.

A very minor detail that I'm rather proud of, and that I'd love your thoughts on improving further, is the onboarding flow, which makes it easy to move workflows to WarpBuild. We've also put a lot of effort into workflow start-up time, where we are as fast as or faster than GitHub.
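
For context, moving a workflow over is essentially a one-line `runs-on` change; something like this (the exact label depends on the runner size you pick):

  jobs:
    build:
      # was: runs-on: ubuntu-latest
      runs-on: warp-ubuntu-latest-x64-4x
      steps:
        - uses: actions/checkout@v4
        - run: make build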

[1] https://x.com/suryaoruganti/status/1732932591001735419 [2] https://github.com/WarpBuilds/action-debugger, h/t to tmate

Making builds faster by providing optimal hardware and configurations across CI providers is the first step in our mission to make build engineering better.

I’d love your feedback on the product and thoughts on other CI pain points we could solve to enable better collaboration and developer experience.



Congrats on the launch! There do seem to be a number of other entrants in this space: https://github.com/neysofu/awesome-github-actions-runners#li...

What makes you stand out from the pack? The VM approach seems very cool - is this unique in the space? Do you have different approaches that provide speedups or security benefits not possible with other third party runner systems? Any benchmarks against competitors?

Separately, I'm curious about how you address VM startup speed. Do you boot VMs on demand, or do you have a pool of booted VMs awaiting jobs?

Anyways, it's exciting to see new approaches in the space! Wishing you and the team the best of luck!


VMs are a necessity if you are serious about security and isolation guarantees. I'd hope everyone else uses them too.

I haven't run benchmarks but this comment provides a glimpse - https://news.ycombinator.com/item?id=38571518

VM startup speed has many levels to it. Right now we take the inefficient approach of keeping a warm pool, though we have items on the roadmap to fix this properly.

In terms of speed-ups, we are doing things differently. For instance, we are baking in container layer caching natively so that users benefit automatically. This leads to speed-ups of 2-10x depending on how the Dockerfile is structured for caching.
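
To illustrate what "structured for caching" means, here is a hypothetical Node app Dockerfile ordered so the expensive dependency layer is reused when only source changes:

  FROM node:20-slim
  WORKDIR /app
  # Dependency manifests first: this layer (and the install below)
  # stays cached as long as the lockfile is unchanged.
  COPY package.json package-lock.json ./
  RUN npm ci
  # Source changes only invalidate the layers from here down.
  COPY . .
  RUN npm run build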

This is just the first step - we have a very exciting roadmap :-)


> VMs are a necessity if you are serious about security and isolation guarantees. I'd hope everyone else also uses it.

How do you ensure that the VMs are clean on every run? Do you boot up a fresh clean install?

How do you make sure your host machines are clean too? What's the cadence for resetting those host machines?


They are ephemeral VMs and are alive only for the duration of a single job. They are not reused.


We are essentially competing in the same space with https://dime.run/

We also use VMs. But they are persistent. So you always see your runners as Online in GitHub UI.

We achieve this by investing in virtualization technology so that idle runner VMs do not consume too many resources. Disclaimer: I used to work for Google Cloud.


Just out of curiosity, are there any of these 3rd party GitHub Actions runner services that support persistent disks or have some kind of very fast local cache that can be shared across runners? The majority of time in my workflows is spent downloading the same Docker images and dependencies to the runner over and over. I've found GitHub's own cache to be fairly slow and lackluster.


We already have a local container cache for speeding up Docker Hub pulls.

Automated container layer caching is coming in ~2 weeks.

This will be present transparently so you'll be able to get the goodness with zero changes to your current actions.


sounds great, will definitely check it out in the new year!


We have persistent disk support in https://dime.run


Awesome, just signed up for your waitlist :)


Congrats on the launch! I've spent some time recently with great success speeding up CI for my teams via alternate actions runners, and the increase in efficiency that comes with dramatic reductions in build times is worth it. When the cost is the same (or less), it's an absolute no-brainer.

How do you differentiate from BuildJet, which takes a similar approach?


We've had a few customers migrate over from BuildJet because WarpBuild is in active development. For instance, we are adding support for macOS runners in Jan.

Our mission is broader than just fast runners - it's about better CI dev ex. This includes surfacing recommendations that would optimize build times, insights into the critical paths of workflows and more.

We're also investing in tooling to overcome issues that currently exist, such as an action to ssh into running workflows for easy debugging.


Awesome. Do docker image layers persist across build runs? Github, BuildJet, etc. use ephemeral runners, so subsequent runs have to re-pull everything from scratch, which is where most of my actions' time is spent now. If you're able to persist these across runs, that'd be a reason to switch alone.


We have this with https://depot.dev out of the box. You connect to a native BuildKit and run your Docker image build on native Intel and Arm CPUs with fast persistent SSD cache orchestrated across builds. It’s immediately there on the next build without having to save/load it over the network.


Not yet, but coming soon (~2 weeks)


This will, by itself, immediately sell me. I’ve spent countless hours and lots of deep deep reading trying to get satisfactory results on GitHub Actions, with no success. From what I’ve seen, plenty of other people are in the same boat.


I'll keep HN posted!


I migrated from BuildJet this week because BuildJet's caching is broken. Installing cached pnpm dependencies takes about 12s on GitHub and WarpBuild runners. It takes 2m on BuildJet, about half of the total runtime, effectively negating BuildJet's cost savings over GitHub.

I reported this issue to BuildJet over a week ago and haven’t received any response.


Exactly my experience as well: https://x.com/crohr/status/1732442731715113374

In tests with my GitHub Action [1], which spawns ephemeral runners for any workflow, I found BuildJet's bandwidth to be 10 to 20 times slower than machines on AWS.

[1]: https://github.com/runs-on/action


I'm currently evaluating BuildJet. I'm curious about this caching issue. Were you using actions/cache or buildjet/cache?

https://buildjet.com/for-github-actions/docs/guides/migratin...


We used BuildJet cache for months. It's possible it was always broken and I only noticed a few days ago. I tried both, and neither actually cached data. I even tried forking and upgrading the BuildJet variant, to no avail.

I spent a solid couple hours trying to fix this before moving to WarpBuild.


You can continue to use actions/cache if using WarpBuild :)
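
For anyone landing here later, a minimal sketch of that stock setup for pnpm (the store path and cache key are illustrative):

  steps:
    # Reads the pnpm version from package.json's "packageManager" field,
    # or pass `version:` explicitly.
    - uses: pnpm/action-setup@v2
    - uses: actions/cache@v3
      with:
        path: ~/.local/share/pnpm/store
        key: pnpm-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}
    - run: pnpm install --frozen-lockfile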


Thanks for your trust! I'm here to ensure you have a good experience with WarpBuild and for feedback/requests.


First attempt feedback: onboarding is amazing, I love how easy it is to create a PR with the VM.

- GitHub's runners are faster for us (~30% faster)

- Some of our tests failed while waiting for a Dockerized server to come up

- It takes several minutes before all jobs are running (I have a pipeline with 6 parallel jobs; a few started with a 2-minute delay).


Thanks for the feedback!

We are currently seeing fairly heavy load. The HN hug of death is real. I tweaked some settings, and the startup delays should be back in the sub-10-second range in a few minutes.


I suspected that :) Where do you guys host the servers that run GitHub Actions? (Startup time is better now, but speed is still much worse than GitHub.)


Right now, we are on a public cloud. Eventually we will move things onto our own infra, with overflow onto public clouds.

Try out the jobs once again - you should be okay (I think :) )


Hi, quick question:

> Runners are assigned to hardware that is ideal for build workloads with [...] high single-core performance

In my kind of projects (C++, Rust, C) the builds are highly parallelizable, so single-core performance is generally not what you want if you can instead get a lot of cores.

The main bottleneck in my own build pipelines on GitHub was how painful it is to use containers, and how "helpful to idiots but not experts" a lot of the GitHub Actions docs are (Microsoft's style, I guess?).

Good luck though!


If you need machines with a high number of CPUs, you can check https://github.com/runs-on/action, which allows you to select any EC2 instance type as an ephemeral runner. Plus, it's the cheapest on the market, open source, and not a SaaS.


We do have high-core options too (up to 16x), but nothing crazy high like GPU machines. You'll probably still see good benefits.

Could you elaborate on the pain points with using containers?


Yes! Everything changes when using containers, whereas in GitLab everything is based on containers to begin with.


We support machines with large core counts in https://dime.run so check it out :)


This looks promising, but it doesn't work on personal accounts. I'm not ready to install it on my organization account just yet.

Can I expect complex caching actions like https://github.com/DeterminateSystems/magic-nix-cache to work as quickly as they do on GitHub?


Yes. Graham is a great guy and I'm working with him to ensure it does.
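
For reference, the wiring on the workflow side is the standard pair of steps (a sketch; pin whichever refs you prefer):

  steps:
    - uses: actions/checkout@v4
    - uses: DeterminateSystems/nix-installer-action@main
    # Transparently substitutes a GitHub-Actions-backed Nix cache.
    - uses: DeterminateSystems/magic-nix-cache-action@main
    - run: nix build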


Awesome work! Congratulations on the launch. This reminds me a lot of https://depot.dev

I'm not officially affiliated with them at all. But I'm a big fan of their product.

It appears that one difference, though, is that Depot is more focused on just Docker builds and y'all are more generalized runners. Is that right?


That's true. We have a slightly different focus: CI workloads. However, the goodness of depot.dev comes from BuildKit remote builders and a remote cache. That'll be natively integrated into our runners in ~2 weeks.

So you'll get that goodness when running CI with zero changes to your actions needed.


1. Can you share information about the specific hardware/CPUs used and where you are hosted?

2. Running untrusted workloads is a huge security challenge. Can you share the technology you are using for isolation and how you have approached mitigation of security threats?


Would you still have a viable product if GitHub decides to performance-optimize their action runners?

I get that performance is important, and if MS puts their weight behind it I can see them fixing their stuff and basically removing the market for 3rd party solutions.

Or is this maybe a “hey MS buy us?” thing?


There is a performance-cost envelope that we are pushing, which I believe GitHub will be hard-pressed to match.

Also, this is the first step in our broader objective to (a) support all CI providers and (b) provide ecosystem support and tooling for efficient build engineering. The latter takes the form of additional tools, recommendations to incorporate into workflow design, build insights, etc. We are just getting started.


It seems half-baked to me. I logged in, and it says only org accounts are supported. I'm interested because I cannot use Windows and Mac M1 instances on GitHub Actions, but I failed to find any docs mentioning supported instance types.


Hey, thanks for the feedback. It was my decision not to support individual accounts, since GitHub's own runners usually suffice for those needs.

The docs are WIP; they'll be updated in the next couple of days.


Oh neat, I came across BuildJet the other day.

I was trying to cross-compile a side project (https://github.com/marcus-crane/october) for Linux arm64, but doing so threw up some instruction-set errors.

I had parked the idea of supporting Linux arm64 since GitHub has no runners for it, but I threw it at BuildJet and it spat out a working build with no problems!

Given it only needs to run on release, for a small open source project, being charged something like 1 cent per build is surprisingly reasonable compared to having no runner at all / having to spin up a self-hosted runner :)


Check us out! We put some effort into the onboarding to make it super easy to use WarpBuild runners.


Oh, maybe I hadn't made it clear enough that I have already been successfully using it :)


<3


One of the reasons we didn't go with buildjet was their concurrency limits (https://buildjet.com/for-github-actions/docs/about/pricing#c...) and the pricing on extending those limits.

We are a small company, but our autoscaling cluster for GitHub Actions on AWS will scale up to >500 vCPUs during the work day when there are a lot of PRs going in.

I don't see it documented anywhere; what are your concurrency limits on accounts?


We don't enforce concurrency limits. It is not something that we want our users to think about. I'd hate to worry about it too.

In general, we should be able to deal with spiky workloads of that scale without issue in a couple of minutes.

I'd love for you to try us out.


> The runners are allocated on VMs, not containers. This provides faster performance

What? I haven't benchmarked it lately, but containers should (almost?) always have less overhead and better performance than VMs

(I do agree that VMs are far more flexible and let you do privileged things; it's only perf that I question)


From a first-principles perspective, the stack is as follows:

VMs on WarpBuild: bare metal > hypervisor > runner workload

Containers: bare metal > (cloud VM [1]) > k8s worker node OS > containerd > container OS > runner workload

This assumes containers are running on k8s, which is an okay assumption in this case. The perf penalty of using a VM is much lower.

Note: if you are referring to VM spin-up time, that is a whole other story, and we have taken some pains to mitigate it and achieve comparable spin-up times.

[1] if using a non-bare metal ec2/gce instance, say


I like your product and I think VMs are a fine choice, but this representation seems somewhat inaccurate. For one: don't the VMs you launch have their own OS?

And also while containerd and potentially some Linux distro in a container are involved, they aren't really adding runtime overhead. containerd (via runc) instructs the Linux kernel to isolate the workload processes, but doesn't really sit in between them and the kernel. Further, the OS in the container doesn't have its own kernel and most of the time not even its own init. It's really just a set of libraries and binaries.

I believe that you run workloads faster than competitors relying on containers, but it doesn't seem to me that containers are the problem. If you installed Linux on the same baremetal host, I'm convinced you can get that same performance in a container that you can get in a VM on that host.


You're right - I missed that VM OS in my illustration up there.

Benchmarks are hard to come by, but IIRC each VMM adds a 3-5% overhead. The gatekeeping for permissions and sandboxing done at each level costs active compute cycles, and that adds a little overhead.

Something on the order of 10% may not be large, but there's a difference. [1]

Disk I/O is another major factor, btw, since virtual filesystems can be ... flaky.

[1] https://www.vmware.com/pdf/hypervisor_performance.pdf


What's the story with the LICENSE file in this repo <https://github.com/WarpBuilds/warpbuild-agent/blob/main/cmd/...> which is not only zero bytes but also down in a subdirectory?


That was an oversight. Fixed; thanks for pointing it out.


How do people rationalize using a service like this for anything other than toy projects? Sending your source code to some service, then adopting and executing the artifacts it produces, means this is the central, most critical aspect of your security story. For real projects it doesn't stand even a moment's scrutiny.


In this world of SaaS, we're already sharing many of our crown jewels with other services anyway. Hosted DBs, auth services - this is just another in a long line. Some might say that data is more important than a snapshot of the code anyway (depends on the domain).

Call me old school, but I'm with you. A vendor would have to be exceptionally well regarded over a long time to get my trust in such a scenario.


It's easier to imagine that Microsoft isn't going to monkey with your private data, and that they in all likelihood have a 24x7 security team actively prowling around looking for intruders. But a fly-by-night outfit has no reputation at stake and probably has no ability to realize that their whole junk has been pwned by Chinese intelligence or whomever.


A lot of smaller teams/companies don't really care about security, sadly... and is this really that different from GitHub Actions, CircleCI, Vercel, anything...?


It has the difference of not having been around as long and accruing reputation. That's nothing inherent, and it will come with time, of course. Of those, the only other difference is that GitHub Actions has the advantage of being first party; if you use GitHub, then there's no increase in security exposure by using their own actions.


That's fair!


I am not sure I see the problem. There is a business with a reputation behind the service. There is a contract, and it's no different from a contract with another provider like GitHub.

Who are you to judge what counts as a "real project", BTW? It might not suit your security profile, but it might suit other businesses that have different security concerns.


We are currently on BuildJet and it has made a significant difference for our Rust builds. But I would switch in a heartbeat for Windows support, and even more so for macOS. Mind you, only if it is per-minute pricing. The GitHub Actions macOS machines are sooo bad that at this point I'll take anything else.


A few users who were on BuildJet have tried us out, and they like it. It's an easy switch if you want to check it out.


What do you need macOS for? Have you considered cross-compilation?

Zig makes cross-compilation for Rust straightforward: https://actually.fyi/posts/zig-makes-rust-cross-compilation-...
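
For a concrete sketch of that approach on a Linux runner, via cargo-zigbuild (an assumption on my part; it drives Zig as the cross-linker and works for many pure-Rust crates, though anything linking macOS frameworks still needs the SDK):

  - run: |
      # The PyPI package bundles a Zig toolchain alongside the wrapper.
      pip3 install cargo-zigbuild
      # Assumes rustup is preinstalled, as on the standard Ubuntu images.
      rustup target add aarch64-apple-darwin
      cargo zigbuild --release --target aarch64-apple-darwin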


Anything that interacts with the kernel needs to be tested on the actual OS. Similarly if you call native libraries. And Apple Clang masquerading as GCC, or not having all the upstream LLVM patches, is also annoying.


Sure, but now your interaction with macOS is limited to automated testing and not compilation + testing. Depending on the application, this could be worthwhile.

For example, at work, we cross-compile C/C++/Fortran/Rust code in R packages. We compile for supported versions of R, so that ends up being tens of thousands of packages that we need to compile.

By cross-compiling we saved a lot of work and nearly eliminated our need for macOS.


A very common use case for macos runners is iOS builds.


You can cross-compile iOS apps.


Are you having any slowdown right now? I have been waiting for a 32x to pick up a job for some time now:

  Requested labels: warp-ubuntu-latest-x64-32x
  Job defined at: .../workflows/ci.yml@refs/pull/294/merge
  Waiting for a runner to pick up this job...


We only do 2x, 4x, and 16x right now. We can add 32x, but generally haven't seen much demand for it.


Ideally it shouldn't be advertised then, e.g. https://www.warpbuild.com/pricing


Thank you so much for the speedy response on all the mediums I tried to reach out through. Downgraded to 16x for now; checking if it'll work.


It's a great idea. I'd want even faster though; GitHub Actions are quite a bit slower than my Air M2. If you spun up a fleet of top-of-the-line Hetzner boxes, I'd expect it'd be 100% faster than Actions. 30% faster at half the cost is just a bit too small a gain to make the leap.


There are 2 factors here:

1) I was being conservative in my promises. For instance, we have users who reported GHA runtimes going down from ~25m to ~9m [1].

2) Local builds have the amazing advantage of caching between subsequent runs. CI workflows are ephemeral, which introduces a performance penalty. However, we are working on something that automagically caches some builds, especially container builds, which would enable further gains of 2x-10x.

Hope this clarifies!

[1] https://x.com/suryaoruganti/status/1730419264132370556


I've been using the GitHub Actions cache to store build artifacts (object files) between builds. It takes a bit of fiddling, but it's possible.
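
One hedged recipe for that fiddling, using ccache so object files survive across runs (paths and keys are illustrative):

  - uses: actions/cache@v3
    with:
      path: ~/.ccache
      key: ccache-${{ runner.os }}-${{ github.sha }}
      restore-keys: ccache-${{ runner.os }}-
  - run: |
      sudo apt-get update && sudo apt-get install -y ccache
      export CCACHE_DIR=~/.ccache
      # Route compiler invocations through ccache.
      cmake -B build -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
      cmake --build build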


Yeah, interested to compare this with optimized and properly cached GitHub Hosted Runner builds.


My main CI pain point is that all the CIs have some kind of crappiness to them, and you are wedded to piles of YAML config that are not portable.

ArgoCD might be the way to go (run in Kubernetes) since at least it is hackable if it doesn’t do what you need or there is a bug.


https://github.com/philips-labs/terraform-aws-github-runner

This works extremely well. It just spins up spot (or normal) instances as needed.


How can it be 30% faster when the vast, vast majority of the execution time is spent doing things like building, bundling, etc., i.e. things independent of the action runner's performance?

Edit: Never mind. Misunderstood what this is.


Thanks for taking the time to read my little launch and glad that it's clearer what we do.

Try it out if you can, you will like the results :-)


Pardon my ignorance (I'm on mobile), and maybe I'm misunderstanding what this does exactly, but does this support multi-OS builds? Like having Windows, Mac, and Linux runners for native builds using Rust?


I added your RSS feed so I can check back in when macOS runners are added; it just has one article, which is "TODO".


Anyone able to compare with BuildJet? Happy to see more competition in the space. :)


I don't have benchmarks but here's another comment from the thread. https://news.ycombinator.com/item?id=38571518


When it says "Get 2000 build minutes for free," is that per month?


Yes. And to clarify, to ensure no surprises: it's 2000 build minutes on the 2x runner. Minutes scale inversely with runner size, so that's 1000 min on the 4x and 250 min on the 16x. You get twice as many minutes if using the Arm runners.

May be easier to think of it as an $8 credit. It shows up as such in the dashboard.


macOS support will be a big deal. GitHub's macOS runners are straight trash. They are painfully slow and horribly expensive. In some cases I've wondered if it made sense to build part of my project on Linux and then build the last bit on macOS, since macOS minutes are 10x as expensive _and_ the same operation is noticeably slower. In the end the savings weren't worth the complexity and development time. That said, GitHub should be ashamed of how bad their macOS runners are.


Absolutely! We're working on it



