The GitHub Load Balancer

NicoJuicy · on Sept 22, 2016

I notice a lot of negativity arround here. Don't know why that is... But i'll take my 5 cents on it.

NIH - Not invented here and redoing an opensource project.

- Github said they used HAProxy before, i think the use case of github could very well be unique. So they created something that works best for them. They don't have to re-engineer an entire code base. When you work on small projects, you can send a merge request to do changes. I think this is something bigger then just a small bugfix ;). Totally understand them there for creating something new

- They used opensource based on number of open source projects including, haproxy, iptables, FoU and pf_ring. That is what opensource is, use opensource to create what suits you best. Every company has some edge cases. I have no doubt that Github has a lot of them ;)

Now,

Thanks GitHub for sharing, i'll follow up on your posts and hope to learn a couple of new things ;)

otoburb · on Sept 22, 2016

Given this is based on HAProxy and seems to improve the director tier of a typical L4/L7 split design, I'm led to believe GLB is an improved TCP-only load balancer.

But they also talk about DNS queries, which are still mainly UDP53, so I'm hoping GLB will have UDP load-balancing capability as gravy on top. I excluded zone transfers, DNSSEC traffic or (growing) IPv6 DNS requests on TCP53 because, at least in carrier networks, we're still seeing a tonne of DNS traffic that still fits within plain old 512-byte UDP packets.

Looking forward to seeing how this develops.

EDIT: Terrible wording on my part to imply that GLB is based off of HAProxy code. I meant to convey that GLB seems to have been designed with deep experience working with HAProxy as evidenced by the quote: "Traditionally we scaled this vertically, running a small set of very large machines running haproxy [...]".

dcgudeman · on Sept 22, 2016

Where did it say that it is based on HAProxy?

bogomipz · on Sept 22, 2016

If you look under "stay tuned" it says:

"Now that you have a taste of the system that processed and routed the request to this blog post we hope you stay tuned for future posts describing our director design in depth, improving haproxy hot configuration reloads and how we managed to migrate to the new system without anyone noticing."

That leads me to believe it involves HAProxy.

otoburb · on Sept 22, 2016

EDIT: I read the post more carefully - the reference to DNS was merely to highlight that often times a single public [V]IP can get overloaded and common mitigation strategies.

jimjag · on Sept 22, 2016

I am increasingly bothered by the "not invented here" syndrome where instead of taking existing projects and enhancing them, in true open source fashion, people instead re-create from scratch.

It is then justified that their creation is needed because "no one else has these kinds of problems" but then they open source them as if lots of other people could benefit from it. Why open source something if it has an expected user base of 1?

Again, I am not surprised by this. They whole push of Github is not to create a community which works together on a single project in a collaborative, consensus based method, but rather lots of people doing their own thing and only occasionally sharing code. It is no wonder that they follow this meme internally.

alexandros · on Sept 22, 2016

here's an alternative interpretation: Due to their unique requirements, they were forced to investigate alternative approaches to the problem, and the approach they went with, they believe, is useful to a lot more people than just themselves. This is both plausible and matches the announcement, so why take a bad faith approach to someone releasing some code to the open source? Would you prefer if they kept it closed?

And since we're talking about github, haven't they already launched a highly successful pair of projects in atom/electron, in areas where both had competition? why start with negativity before we see what they come out with?

saurik · on Sept 22, 2016

Huh? Atom/Electron is another great example of GitHub duplicating a ton of existing projects (whether dependencies such as CEF and node-webkit or high-level solutions such as ACE) without seemingly having any interest at all in joining those existing projects. Just because someone is successful at doing this does not make what they are doing any more reasonable: if anything it should just put them in a similar place in your mind to the Microsoft of the 90s which many people here would denigrate. GitHub's model of "open source"--the one which it is, devastatingly, teaching to an entire generation of developers--is only about code being available as opposed to being about community and collaborative design. Asking if one would prefer an alternative where the code is simply kept closed source ignores the premise of the compliant: that the code being an advertised separate project undermines the premise that working with an existing project to solve a problem tons of those users almost certainly also have. :/

alexandros · on Sept 22, 2016

Open source does not entail any responsibility to work or not work together with any pre-existing project. What you describe is more cathedral than bazaar. There are thousands of possible reasons, from architectural choices to personalities, that someone may have not chosen to work with an existing project. I will not fault them for giving me a superior result, for free (in both senses). To compare this with Microsoft in the nineties is sheer madness.

cstejerean · on Sept 22, 2016

Or maybe they wanted to spend their energy actually solving their problems rather than trying to persuade the maintainers of other projects about their approach. For example, could they have proposed some changes to IPVS to get rid of multicast for state sharing? Maybe. But then they'd spend all that time arguing with other users of that project about the relative merits of each. Instead they built a new solution and users now can have a choice, including the choice to take some of these ideas and apply them to the other projects if they are clearly superior.

I would accuse them of NIH if they simply reinvented another wheel when there was a perfectly acceptable solution already out there. But it doesn't seem like that was the case. Instead they clearly evaluated the existing solutions, found shortcomings, and decided to solve those problems for themselves, and then publish the resulting code. I see nothing wrong with that approach.

olalonde · on Sept 22, 2016

> Due to their unique requirements [...] This is both plausible

I'm skeptical that Github's load balancing requirements are that different from any of the other large file hosts and SaaS companies. But it's possible and I'm not in a position to tell. That being said, the "NIH syndrome" is a largely overlooked problem in our industry and I think it's reasonable to raise concerns over new projects that may be reinventing the wheel.

positr0n · on Sept 22, 2016

Some of the requirements they list are pretty unique to github since they serve .git to git clients as well as http to http clients (which is what most SaaS companies do). Like a very long running git clone from someone with slow internet not having its connection dropped.

sitkack · on Sept 23, 2016

As person who has used github on totally lousy connections in remote parts of the world, it faired far better than most Isomorphic web apps. People in Nepal can use github to actually get work done. Most current generation web apps won't even load.

contingencies · on Sept 23, 2016

I second this from China. Apple Appstore based OSX updates are literally impossible here. Annoyingly, they are required to upgrade Xcode.

sitkack · on Sept 23, 2016

One technique I used is to download updates directly via

    wget -c --limit-rate=200K http://support.apple.com/downloads/DL1833/en_US/osxupd10.10.5.dmg

-c resumes download where left off, the rate limiting can be used to match the speed of your connection or intermediaries reducing stalls, drops and angry network users. I would often keep a list of URLs in `download-queue.txt` and use the above flags with -i to load the list of urls, letting it run overnight at some much lower speed.

contingencies · on Sept 23, 2016

A download manager would be fine if there was a non-opaque method of determining the download address, and they actually let you do that (I know the developer site for instance uses some combination of cookies/referrers to limit external access to binaries presented for download).

sitkack · on Sept 23, 2016

I haven't tested this specifically, but I did something similar

https://gist.github.com/dbr/294151

extracting cookies out of Safari so I could download developer binaries from apple.

ckdarby · on Sept 22, 2016

Have you ever contributed to HaProxy? Have you ever tried committing massive alterations to major open source projects?

It isn't as simple as here's my massive rewrite, click the accept button and everything works out for the open source community.

Let me be the first to say that the level of politics, circle jerking and knowing people is ridiculous.

polpo · on Sept 22, 2016

Given the good reaction to an out-of-the-blue patch from me on the HAProxy mailing list, I'd imagine that contributing even major changes to HAProxy probably would go rather well. It's one of the best open source development communities I've experienced. Welcoming, but still highly focused on quality contributions. The quality and performance of HAProxy reflects this approach.

paulddraper · on Sept 23, 2016

HAProxy is exceptionally good in this regard.

zepolen · on Sept 23, 2016

As someone who's contributed multiple patches to multiple open source projects, this is 100% truth.

I have >3 month old pull requests to add tests which have never been looked at - whereas someone who knows the project maintainer will get a PR looked and merged the next day.

senko · on Sept 22, 2016

"Contributing to open source is hard, let's write our own and open source it instead."

JungleGymSam · on Sept 23, 2016

Sounds like you're suggesting you can't use your own modified fork until your PR is accepted.

eikenberry · on Sept 22, 2016

NIH syndrome is overrated. A small application doing specifically what you want is usually more robust, easier to maintain and easier extend as you need than a pre-existing application that includes most of what you need as a subset of its functionality.

jcastro · on Sept 22, 2016

> in true open source fashion, people instead re-create from scratch.

True open source fashion is also the freedom to work on whatever you want.

"Just think about it, if they would have contributed to foo instead of working on bar, then foo would be twice as good!" keeps being thrown around every time someone announces something new around here.

mobiuscog · on Sept 23, 2016

This is just ridiculous.

Open Source is about just that - allowing anyone and everyone to have access to the source and (if free) making their own version.

There never needs to be justification to write something from scratch, even if it's been done a million times before.

You can appreciate open-source, you can wish that proprietary code was open source, but you never have the right to tell people what they should do, nor are you ever likely to be correct as to what they should do - you are not them.

brazzledazzle · on Sept 22, 2016

It may not based on haproxy but they definitely used it prior to switching. If we're being generous in our interpretation that would indicate that they found that it didn't work for them. I'm not saying your overall point isn't valid but I wouldn't be so quick to judge.

gjvc · on Sept 22, 2016

by that rationale, Apache httpd would have stayed Apache httpd, and we wouldn't have gotten nginx.

paulddraper · on Sept 23, 2016

Shudder.

gjvc · on Sept 23, 2016

imagine a world full of sendmail, not postfix

bogomipz · on Sept 22, 2016

I'm not sure I agree, they mention HAProxy and Foo over UDP so they are leveraging existing open source technologies. Custom additions to suit ones particular use case isn't necessarily the same as NIH syndrome.

jimjag · on Sept 22, 2016

They mention HAProxy, but it doesn't look like it's based on it.

logicalstack · on Sept 22, 2016

Joe from GitHub here, we'll talk about it later posts but GLB is based on a number of open source projects including, haproxy, iptables, FoU and pf_ring.

Many existing open source solutions are optimized for short lived HTTP requests and don't address the long running connection issue (like a large git clone). We wanted something better for our use case.

halifaxbeard · on Sept 22, 2016

I'm currently working with GitHub Support on dealing with zip downloads of a 5GB repo failing after 2-3 minutes, with curl error "transfer closed with outstanding read data remaining".

Sure about the long running connection issue being solved? :-)

jimjag · on Sept 22, 2016

That is good to know. Thx.

serge2k · on Sept 22, 2016

> I am increasingly bothered by the "not invented here" syndrome

I'm bothered by the increasing prevalence of "never invent here".

FOSS is great, but if it's not meeting your needs then writing your own is perfectly valid.

Retra · on Sept 23, 2016

And often one of your needs is "have people on hand who know exactly how it works", and the easiest way to achieve that is to have those people build it from scratch. That's the only way to actually ensure it makes your needs priority #1.

jedberg · on Sept 23, 2016

> It is then justified that their creation is needed because "no one else has these kinds of problems" but then they open source them as if lots of other people could benefit from it. Why open source something if it has an expected user base of 1?

Two reasons.

1) Recruiting. Check out our awesome code! Don't you want to work on this too?

2) Our unique problem today will be the problems of everyone in three years.

This has borne out with the Netflix opes source. At the time it was a problem unique to Netflix -- now a bunch of people are using that software or derivatives.

dismantlethesun · on Sept 22, 2016

For me, NIH is about the idea of already having deep knowledge of the problem domain so a solution is relatively straight forwards. Sure it'll take X time, and there might be some hiccups, but its all about effort and not having to learn anything new.

Joining a pre-existing project is more reasonable when you just can't replicate the basis of the project without learning a lot of new things.

That's the reason why there's a plethora of compile to JS languages, but only a few actual javascript virtual machines.

merb · on Sept 22, 2016

basically a load balancer must often do special stuff and most companies building custom software around haproxy, nginx whatever to support their needs.

im4w1l · on Sept 23, 2016

Why would you start a new company when you can join and enhance an existing one?

Scaevolus · on Sept 22, 2016

Related presentations/papers about large scale load balancing:

Facebook: https://www.usenix.org/conference/srecon15europe/program/pre...

Google: http://static.googleusercontent.com/media/research.google.co...

gwright · on Sept 23, 2016

While I understand that NIH syndrome is a real thing, it is very dissapointing to read many of the comments here.

I think very few HN readers are really in a position to have an informed opinion regarding Github's decision to build new piece of software rather than using an existing system.

Personally I find this area quite interesting to read about because it is very difficult to build highly available, scalable, and resilient network service endpoints. Plain old TCP/IP isn't really up to the job. Dealing with this without any cooperation from the client side of the connection adds to the difficulty.

I look forward to hearing more about GLB.

Ianvdl · on Sept 22, 2016

Given the title and the length of the post I was expecting a lot more detail.

> Over the last year we’ve developed our new load balancer, called GLB (GitHub Load Balancer). Today, and over the next few weeks, we will be sharing the design and releasing its components as open source software.

Is it common practice to do this? Most recent software/framework/service announcements I've read were just a single, longer post with all the details and (where applicable) source code. The only exception I can think of is the Windows Subsystem for Linux (WSL) which was discussed over multiple posts.

logicalstack · on Sept 22, 2016

Joe from GitHub here, frankly there is a lot we want to talk about and release and it was simply too much for one post. We'd like to give it a proper treatment and single very long post won't do that. Also, it allows us to get folks interested in the project and give us time to prepare our code for release. It's a surprisingly big job.

otterley · on Sept 23, 2016

Personally, I would have preferred you waited until you could release all the documents at once. I admit I was interested, but I've seen too many people and organizations start a conversation but never finish it or show the goods. It's misleading and unfair to dangle a solution when all you really have is a problem.

Take, for example, this post from CoreOS back in March 2016 that suggested that they might know a way to improve systemd-journald performance: https://coreos.com/blog/eliminating-journald-delays-part-1.h...

It smelled suspicious, but its release generated a bunch of noise on HN anyway. And they never followed up with subsequent parts, which suggests to me that they never found a solution in the first place.

I'm not suggesting that GitHub is blowing smoke -- if you truly have a solution, that's great! But there's no harm in gathering documentation and source code and cleaning it up and waiting until it's good and ready to go. Otherwise, I frankly mistrust the motives and abilities of those involved. Call me cynical if you must.

To paraphrase from another industry, "sell no wine before its time." There's a lot of wisdom there that is equally applicable to products in our industry too.

JdeBP · on Sept 29, 2016

As if by magic, part 2 has just appeared. (-: See https://news.ycombinator.com/item?id=12603322 .

falsedan · on Sept 22, 2016

eBay released a series of posts on how they manage their Jenkins installations [0][1][2] (part four is missing).

I think people choose this pattern when:

* engineers implement something cool

* management want engineering content to promote the company/drive recruitment

* the engineers are pressed for time/not professional technical writers

Some bigger companies stagger the context when it's really hefty and makes sense to chunk it up, plus it drives repeat visitors. For smaller companies, the schedule usually slips, and the first post is usually "look how hard this problem is! wow it's really really hard! isn't is amazing that we even tried to fix it? OK see you next time!".

Yes, I think GitHub and eBay are small companies.

[0]: http://www.technology-ebay.de/the-teams/mobile-de/blog/tamin... [1]: http://www.technology-ebay.de/the-teams/mobile-de/blog/tamin... [2]: http://www.technology-ebay.de/the-teams/mobile-de/blog/tamin...

emilburzo · on Sept 22, 2016

I don't know if it's common practice, but I've noticed this trend starting to gain momentum.

See the product unveiling from Apple, GoPro and probably others that I haven't been following.

Hype and slow release, it's the new clickbait.

seangrogg · on Sept 22, 2016

As a JS developer I see these all the time with respect to frameworks/libraries. Relay and GraphQL were both announced with high-level overviews long before being open-sourced. Angular 2 is another. It seems to be the case that such announcements are more to generate interest in the subject which then helps suss out requirements that may not have been considered by the developing team.

wtarreau · on Oct 2, 2016

Did people really read the article ? For me it was pretty clear, maybe it involves some regular load-balancing terms that people are not familiar with, because I'm seeing a lot of bullshit written in the comments, but here is what is described there :

- in a traditional L4/L7 load balancing setup (typically what is described in my very old white paper "making applications scalable with load balancing"), the first layer (L3-4 only, stateless or stateful) is often called the "director".

- the second level (L7) necessarily is based on a proxy.

For the director part, LVS used to be used a lot over the last decade, but over the last 3-4 years we're seeing ECMP implemented almost in every router and L3 switch, offering approximately the same benefits without adding machines.

ECMP has some drawbacks (breaks all connections during maintenance due to stateless hashing).

LVS has other drawbacks (requires synchronization, cannot learn previous sessions upon restart, sensitivity to SYN floods).

Basically what they did is something between the two for the director, involving consistent hashing to avoid having to deal with connection synchronization without breaking connections during maintenance periods.

This way they can hack on their L7 layer (HAProxy) without anyone ever noticing because the L4 layer redistributes the traffic targeting stopped nodes, and only these ones.

Thus the new setups is now user->GLB->HAProxy->servers.

And I'm very glad to see that people finally attacked the limitations everyone has been suffering from at the director layer, so good job guys!

gumby · on Sept 22, 2016

They talk about running on "bare metal" but when I followed that link it looked like they were simply running under Ubuntu. Is it so much a given that everything is going to be virtualized?

When I think of "bare metal" I think of a single image with disk management, network stack, and what few services they want all running in supervisory mode. Basically the architecture of an embedded system.

wmf · on Sept 22, 2016

Yes, it is assumed that all startups are running in EC2 us-east-1 and "bare metal" is the accepted term for non-virtualized systems.

colemickens · on Sept 23, 2016

I don't get it. What else is bare metal meant to mean? "bare metal" = "embedded system"? What does "embedded system" mean then? I guess my age / cloud-nativeness is showing?

unwind · on Sept 23, 2016

In the embedded space, there often isn't any type of OS or kernel between the application code and the hardware resources ("the metal").

If I want to send out a character through the board's serial debugging port, I don't do an open()/write()/close(), I poke the UART's transmit register.

When they said "bare metal", I too thought they ran without OS which had been kind of cool.

gumby · on Sept 23, 2016

Thanks for the clarification. Yet another dilution of a technical term, sigh.

I was quite excited when I read it, and felt quite let down when I followed up.

Veen · on Sept 23, 2016

In the hosting world, bare metal generally means dedicated servers and clusters of dedicated servers. It's just a shorthand for "not cloud".

malodyets · on Sept 23, 2016

I wondered the same thing: "Wow, they have their own git kernel?" But no.

yladiz · on Sept 23, 2016

I'm of two minds about this. Part of me agrees with many of the commenters here, in that Not Invented Here syndrome was probably in effect during the development of this. I don't really know Github's specific use case, and I don't know the various open source load balancers outside of Haproxy and Nginx, but I would be surprised if their use case hasn't been seen before and can be handled with the current software (with some modification, pull requests, etc.). On the other hand, I would guess Github would research into all of this, contact knowledgeable people in the business, and explore their options before spending resources on making an entirely new load balancer. Maybe it really is difficult to horizontally scale load balancing, or load balance on "commodity hardware".

That being said, why introduce a new piece of technology without actually releasing it if you're planning to release it, without giving a firm deadline? This isn't a press release, this is a blog post describing the technical details of the load balancer that is apparently already in production and working, so why not release the source when the technology is introduced?

p1mrx · on Sept 22, 2016

GitHub only speaks IPv4, so I would be extra-skeptical about using any of their networking code to support a modern service.

_gtly · on Sept 22, 2016

I'm curious if they looked into pf / CARP as part of their research into allowing horizontal scalability for an ip. See: https://www.openbsd.org/faq/pf/carp.html

logicalstack · on Sept 22, 2016

CARP and similar systems require an active/passive configuration which we did not want since it needs at least twice as many hosts, half of which are not doing any work. We had similar issues with our former Git storage system based on DRDB (http://githubengineering.com/introducing-dgit/).

pfsync, lvs and etc uses multicast to share connection state which we also wanted to avoid.

voltagex_ · on Sept 22, 2016

Why do you want to avoid multicast?

ctrlrsf · on Sept 23, 2016

You would need a large L2 network to support it, or you'd have to route it at L3, which is not trivial.

treve · on Sept 22, 2016

I half expect a comment here explaining why Gitlab does it better ;)

sytse · on Sept 22, 2016

:) We're not doing this better.

We're struggling with our load balancers right now. We're using Azure load balancers and then HAproxy. But the Azure ones sometimes don't work. Luckily the new network type on Azure supports floating IPs so we can set something up ourselves https://gitlab.com/gitlab-com/infrastructure/issues/466

jdc0589 · on Sept 25, 2016

do you guys have a blog post or anything talking about why you settled on Azure? I've used both azure and aws a decent amount now, and GitLab falls outside the "right place for the right tool" mentality I've been operating with. Most of our infrastructure is in Azure, but it's also mainly windows (azure has the best windows VM pricing with a moderate was agreement, which isn't surprising). There are some regulatory considerations for us too, buts that's anither conversation.

dogismycopilot · on Sept 22, 2016

I would love to see the solution. We're also desiring to run HAProxy in Azure with keepalived (even in unicast mode). The black-box "windows based" load balancer that Azure offers is quite limited.

sytse · on Sept 22, 2016

Cool, we're trying to do all the infrastructure work in the open under https://gitlab.com/groups/gitlab-com

pcarranza · on Sept 23, 2016

Infrastructure issues are here: https://gitlab.com/gitlab-com/infrastructure

The one about the load balancer is https://gitlab.com/gitlab-com/infrastructure/issues/467 but we don't have a lot of data published just yet as we are currently figuring it out.

jedberg · on Sept 23, 2016

Awesome. The whole time I was reading I was thinking "they need Rendezvous hashing". And then bam, last paragraph mentions that is in fact what they are using.

lifeisstillgood · on Sept 22, 2016

I love using GitHub and appreciate the impact it is and has had. But this post is what is wrong with the web today. They have taken a distributed-at-it's-plumbing technology, and centralised it so much that now we need to innovate new load balancing mechanisms.

Years ago I worked at Demon Internet and we tried to give every dial up user a piece of webspace - just a disk always connected. Almost no one ever used them. But it is what the web is for. Storing your Facebook posts and your git pushes and everything else.

No load balancing needed because almost no one reads each repo.

The problem is it is easier to drain each of my different things into globally centralised locations, easier for me to just load it up on GitHub than keep my own repo on my cloud server. Easier to post on Facebook than publish myself.

But it is beginning to creak. GitHub faces scaling challenges, I am frustrated that some people are on whatsapp and some slack and some telegram, and I cannot track who is talking to me.

The web is not meant to be used like this. And it is beginning to show.

bm1362 · on Sept 22, 2016

This reminds me of the criticism of Dropbox [1] when it was first announced on HN - I think you're not the norm.

[1] https://news.ycombinator.com/item?id=8863

petercooper · on Sept 22, 2016

Years ago I worked at Demon Internet and we tried to give every dial up user a piece of webspace - just a disk always connected. Almost no one ever used them.

I did. That was one of the best features of Demon at the time, when 10MB of Web space could cost you £100+ per year :-) Thanks for your part in making it work!

RodericDay · on Sept 22, 2016

I started coding 3 years ago, and only started playing around with VPSs a year ago. It's hard to explain how... unaware I was, that I could just have a little "plot of land" on the internet, not managed by anyone other than me (and DO).

auxbuss · on Sept 22, 2016

This is very enlightening. One of things that everyone on HN think as obvious, yet is far from that to most folk; even those with a fair degree of technical knowledge.

audleman · on Sept 22, 2016

Are you saying instead of having Github, we should all be hosting our own Git repos?

> But it is beginning to creak. GitHub faces scaling challenges,

I don't agree that Github facing scaling issues means the web is creaking. More like old wooden boats are being replaced by big, sturdy battleships. I think the web is getting stronger thanks to engineers facing the challenges coming their way.

> I am frustrated that some people are on whatsapp and some slack and some telegram, and I cannot track who is talking to me.

If you're annoyed by people messaging on you through multiple platforms, it seems the solution would be to only have one provider. But you earlier call that "what is wrong with the web today," and that we should have distributed systems.

lifeisstillgood · on Sept 22, 2016

>>> Are you saying instead of having Github, we should all be hosting our own Git repos?

Well, yes. That's the point. It was designed as an entirely distributed setup. It's crazy that in order to post a message to my neighbours I have to send data to Facebook in SV and just as crazy that two devs on the same team need to write their code commits in a load balanced mega server in ... Err ... Washington? Wherever.

And I don't mind having lots of clients but I object to no open standards, incompatible and frequently unavailable APIs and lack of control over my messages and how they are dealt with. I want procmail for messaging platforms ! And I want a pony !

josegonzalez · on Sept 22, 2016

No one is forcing you to use github or facebook. I host my own gitlab installation for certain private repositories and I still send emails to some people when coordinating outings.

lifeisstillgood · on Sept 23, 2016

I Am not feeling forced to use it. I use it because it is easier for me as an individual developer. I use readthedocs because they have better uptime than my own servers.n All the reasons i use GitHub are good choices for me.

I get the economics of centralised vs decentralised service provision - it's just ironic that GitHub is facing load balancing problems precisely because they have taken a distributed technology and made it, de facto, a centralised technology.

We can imagine a perfect storm of GitHub going down just as someone pulls a vital package from npm and Google losing jquery CDN; all Of a sudden the web will stop working.

It's amazing how fragile we can make a system designed to be resilient - I presume there is a real cost with keeping things distributed that a good economist could explain to me

jstimpfle · on Sept 23, 2016

Multiple platforms != distributed

Yes, multiple platforms is overhead. That's why I mostly use mail, basically never IM, even for the most short-lived or informal conversations.

It's no problem at all for me having email conversations with people whose mailboxes are hosted at a diverse set of providers.

auxbuss · on Sept 22, 2016

You're not alone in thinking this (and I was one of those Demon customers back in the day :-) ). I've had pub chats with many tech folk over the years about how we might enable this. It needs to be cheap, of course, like 5 USD pa cheap. But, perhaps, with today's container tech that's viable. It needs other things too.

A fully distributed "web" of personally managed data is where we'll get to one day. It might need a few cycles of centralisation and distribution, though.

It's also one of the reasons why we must not let the "privacy is dead" and "back door encryption" folk have their way.

To be fair, though, I don't think GitHub is a big part of that problem.

lifeisstillgood · on Sept 23, 2016

Those are good pub chats aren't they :-)

I was at PyCon UK, and watched 60 kids connecting up microbits to RPis and inventing ways to send signals over BTLE without a stack. I would like to solve the whole worlds problems, but if the past twenty years have taught us anything, it's that a few kids can invent the fire and we will all follow. So inthinknwe are going to be in good hands.

jimjag · on Sept 22, 2016

++1!!

contingencies · on Sept 23, 2016

I am intrigued by their opening statement of multiple POPs, but the lack of multi-POP discussion further in the system description.

My understanding is that the likes of, for example, Cloudflare or EC2 have a pretty solid system in place for issuing geoDNS responses (historical latency/bandwidth, ASN or geolocation based DNS responses) to direct random internet clients to a nearby POP. Building such a system is not that difficult, I am fairly confident many of us could do so given some time and hardware funding.

Observation #1: No geoDNS strategy.

Observation #2: Limited global POPs.

Given that the inherently distributed nature of git probably makes providing a multi-pop experience easier than for other companies, I wonder why Github's architecture does not appear to have this licked. Is this a case of missing the forest for the trees?

lamontcg · on Sept 23, 2016

Why not just use DNS load balancing over VIPs served by HA pairs of load balancers?

Back in the day we did this with Netscalers doing L7 load balancing in clusters, and then Cisco Distributed Directors doing DNS load balancing across those clusters.

It can take days/weeks to bleed off connections from a VIP that is in the DNS load balancing, but since you've got an H/A pair of load balancers on every VIP you can fail over and fail back across each pair to do routine maintenance.

That worked acceptably for a company with a $10B stock valuation at the time.

manigandham · on Sept 28, 2016

Company stock value has nothing to do with their scaling, performance and customized processing requirements.

madmulita · on Sept 23, 2016

We are in the process of moving all of our infrastructure to OpenStack, OpenShift, Ansible, DevOps, Microservices, Docker, Agile, SDN and what not.

There are some brainiacs pushing these magic solutions on us and one of the promises is load balancing is not an issue, even better, it's not even being talked about.

Please, please, tell me there's something I'm missing.

squiguy7 · on Sept 22, 2016

I know they mentioned their SYN flood tool but I recently saw a similar project from a hosting provider and thought it was neat [1]. It seems like everyone wants their own solution to this when it is a very common and non-trivial problem.

[1]: https://github.com/LTD-Beget/syncookied

bogomipz · on Sept 22, 2016

Do the Directors use Anycast then? That wasn't clear to me.

jssjr · on Sept 22, 2016

Anycast usually implies traffic will be directed to the nearest node advertising that prefix. The GLB directors leverage ECMP which provides the ability to balance flows across many available paths.

bogomipz · on Sept 22, 2016

Anycast and ECMP work together in the context of load balancing. ECMP without Anycasted destination IPs would be pointless for horizontally scaling your LB tier.

What Anycast means is just that multiple hosts share the same IP address - as opposed to unicast. When all the nodes sharing the same IP are on the same subnet "nearest" is kind of irrelevant. So the implication is different.

jssjr · on Sept 22, 2016

Sure. Feel free to call it anycast then. I usually hear anycast routing used in the context of achieving failover or routing flows to the closest server/POP, but there is probably a more formal definition in an RFC that I'll be pointed to shortly. =)

We are using BGP to advertise prefixes for GLB inside the data center to route flows to the directors. In our case all of the nodes are not on the same subnet (or at least not guaranteed to be) which is one of the reasons why we chose to avoid solutions requiring multicast. I expect Joe and Theo will get into more details about that in a future post though.

bogomipz · on Sept 23, 2016

Are you running Quagga or Bird on the director instances then? I'm looking forward to reading more about it.

logicalstack · on Sept 23, 2016

We use Quagga.

jerkstate · on Sept 23, 2016

This is really cool work, I worked with a team that implemented an ECMP hashing scheme using a set of IPs kept alive by VRRP in a previous lifetime, so I have a bit of familiarity with the space and a few questions.

The article says the L4 layer uses ECMP with consistent/rendezvous hashing. is this vendor implemented or implemented by you using openflow or something similar? How does graceful removal at the director layer work? I know you would have to start directing incoming SYNs to another group, but how do you differentiate non-SYN packets that started on the draining group vs. ones that started on the new group?

If you are using L4 fields in the hash, how do you handle ICMP? This approach could break PMTU discovery because a icmp fragmentation needed packet sent in response to a message sent to one of your DSR boxes might hash to a different box, unless considerations have been made.

tadelle · on Sept 22, 2016

[flagged]

nathankleyn · on Sept 22, 2016

They mention in the post that they will be open-sourcing this code over the coming weeks.

tadelle · on Sept 22, 2016

I'm talking about entire GitHub not components or Load Balancer...

cortesoft · on Sept 22, 2016

I think the model of "keep our core business software proprietary and open source all non-core software" is a good one, and one we should encourage all companies to share.

If they open sourced their core site, how would they make any money?

the_duke · on Sept 22, 2016

Agreed.

Considering the amazing value Github provides for the IT world, I really don't care if they release their code.

Isn't it enought that they have revolutionized development workflows, project hosting and made contributing and collaborating so much easiert?

(to be clear, GH wasn't the first site of it's kind, but the network effect of having one site that that basically every developer uses daily -and be it just via one of the documentation site on GH pages- really was revolutionary)

fps · on Sept 22, 2016

I use github every day for professional and personal projects, and typically like what they've done, but Sourceforge.net was the github of the early 2000's, where almost every developer had an account and where you could get free development and distribution tools for your project. Look at what it is now (or was before it was re-opened a few months ago.) People's open source projects were being leveraged to push malware on unsuspecting users. We might not think it's so great to have all this centralized code storage in a few more years when the new hotness replaces github and everyone's zombie products are being used maliciously again.

the_duke · on Sept 22, 2016

Mhm, I used Sourceforge from time to time back then.

But I remember it as always clumsy (even before they got bought and started the ad craziness), and only a small subset of projects was on Sourceforge, in my experience at least.

fps · on Sept 22, 2016

It would eat into their enterprise "behind the firewall" sales, but github.com's biggest asset is that it's where people already have accounts and already have development tools integrated to. That doesn't change when you open source your platform.

tadelle · on Sept 22, 2016

Just the same way GitLab makes money. Sells enterprise version of their software and support. Community version could be good...

tomschlick · on Sept 22, 2016

Because that's not their business model and not everything you use has to be FOSS unless you're RMS

tadelle · on Sept 22, 2016

But they could release trimmed community version at least ;)

tomschlick · on Sept 22, 2016

Why? Open sourcing GH would probably be a huge undertaking for a considerable amount of their team (removing proprietary code, making sure no credentials are in there, etc).

nkassis · on Sept 22, 2016

Not to mention the fact that the whole of github is probably a bunch of disparate little project worked on by individual teams that would require a lot of effort to turn into something that could be run by an individual person. Probably makes sense to expose some of those projects on their own as open source project like they are doing here.

MichaelGG · on Sept 22, 2016

They already have a version they ship to customers (Enterprise) so presumably they have done the work of removing secrets.

noselasd · on Sept 22, 2016

Then that is a discussion that does not belong here.

voidfunc · on Sept 22, 2016

Why do you think they should open source it?

falcolas · on Sept 22, 2016

My two cents? Because they have based their entire business and clientele around an open source piece of technology. Their contributions back to the community which gave them the core of their company feels... anemic.

Even if their entire stack was open sourced, GitHub would still be a profitable company simply because it's GitHub. They're hosting hundreds of thousands of repos on bare metal and providing tools around those repos allowing for the maintenance of a community - that's why they are making money; not because of some secret sauce in the code. Otherwise, GitLabs would have eaten their lunch by now.

I could definitely be wrong, and they do have 8ish pages of projects they offer up themselves, but so many are just meaningless dumps of one-off projects.

dpcx · on Sept 22, 2016

By that definition Google and Facebook should do the same. Or asking a farmer to give away their seeds for a similar reason.

tadelle · on Sept 22, 2016

Why not? They could release community version at least.

sanderjd · on Sept 22, 2016

I have never worked on a piece of proprietary software that would be in a state where it could be released publicly in any form without a very large outlay of development time and money. I'd be very surprised if github is any different. "Why not" is for things that are easy to do. Things that are hard and / or expensive to do require strong justification.

gm-conspiracy · on Sept 22, 2016

Honest question - do you think if the proprietary software you worked on had been planned from the beginning to be released publicly, a very large outlay would be unnecessary?

dismantlethesun · on Sept 22, 2016

Not him, but if I were writing something that was going to be publicly released then I could imagine the team would:

a) Have to include a lot more documentation

b) Be a lot more paranoid about best practices, and writing generalizable code even if it doesn't fit their particular slice of the market

c) Never have the chance to use a closed source solution for part of the problem (e.g. no we can't have half our app actually be a series of Oracle specific stored procedures, or buy a UI package for $20,000 rather than use a heavily modified version of Bootstrap).

sanderjd · on Sept 22, 2016

Not at all - if it were part of the strategy from the outset, different decisions would be made along the way. I wasn't trying to say that it's necessarily harder to build software for public release, just that it's hard to publicly release software that was not built that way. But I guess I don't really know for sure - I've never worked on a project like that.

SteveNuts · on Sept 22, 2016

Because if you say to management "hey, why don't we release the code that our entire business is built off of for anyone to host for themselves for free!" you'd be laughed out of the building on your way to the psych ward.

gm-conspiracy · on Sept 22, 2016

Their entire business is already built upon software that anyone can host themselves for free.

Retra · on Sept 23, 2016

Then what's the problem? You already have access to their core business software.

SteveNuts · on Sept 22, 2016

So github has no custom/proprietary code or value adding features?

benologist · on Sept 22, 2016

They only said they will release components of it as open source. They do not say if a coherent piece of software will ever be released but today would have been the traditional time to do so for actually open source projects.

It looks like marketers leveraging open source on their marketing blog that leverages developers.

jssjr · on Sept 22, 2016

A "whole" piece of software would require you to have made the same data center design decisions we've made at GitHub. While some of our choices are opinionated, I think you'll find the GLB architecture adheres to the unix philosophy of individual components each doing one thing well.

Either way, I hope the upcoming engineering-focused posts are interesting and informative! Developing GLB was a challenging engineering project and if open-sourcing it means other companies can benefit from our work and spend more time developing their products, then I'll consider that a success.

alsadi · on Sept 22, 2016

I never like github approach, they alway use larger hammers