That might be the trigger but it wouldn't be the root cause.
For example, web and backend servers for cloud services shouldn't be affected by whether people are sat in one location or another. However, if those systems require lots of maintenance to keep running and people are less available due to COVID, then you'd see a rise in downtime. But that would mean it's not COVID that's the problem, it's the amount of maintenance required, and COVID only surfaced that problem.
They do have some latency or slowness issues, but I couldn't find anything like a whole-system-down incident.
One of the comments here reminded me of the 2017 incident:
https://about.gitlab.com/blog/2017/02/10/postmortem-of-datab...
They should have improved a lot by now, but I am still curious why such large or frequent downtimes are happening to GitHub. Is it due to making it more open for teams with private repos and adding more perks, along with quarantine and WFH things?
That GitLab downtime happened when we had deadlines, luckily git isn't a centralised platform so we merged our changes on a new GitHub repo we created.
Also the GitLab sluggishness reminds me of their daemon which kills the server to control memory leaks[1], although this probably isn't the main cause of the platform's slowness.
Could you provide details on how you plan to be more reliable with a self-hosted solution? What kind of architecture would you use? How many people will be involved in maintaining it?
This is a mistake I see a lot of companies make, even going further than self-hosting. For example, replacing off-the-shelf software with an internal product.
"This is not good, we have to do this ourselves". And in the end it costs way more, and you end up with an inferior solution because it's not your core business, and nobody really has any time to work on it.
Very sad really.
The first thing to ask is exactly what you asked here. I can't upvote this enough.
It's not that I'm against it, quite the contrary, but in the last few months I've experienced a physical server crash, a random kernel panic and a failed upgrade of GitLab. Together they took 10-20 hours of work and didn't result in any data loss, but the downtime was clearly higher than what GitHub has (and nobody on the team was on holiday at that moment). On top of that, any maintenance like software or system upgrades has to be done outside team working hours to limit the impact. Real self-hosting is not that easy!
A failed upgrade of GitLab? Can you give details - I got used to monthly Gitlab updates going very smoothly and now I’m a little afraid. What went wrong?
Last one we had was because the ssh connection was lost during the apt upgrade (gitlab omnibus ce).
Another one I remember was when some new configuration key was needed and prevented one of the components from starting (took ~2 hours to find what it was).
It doesn't happen often, so don't worry too much. Gitlab omnibus works well, but it is hard to investigate or fix because it's huge and not a tech stack we understand well, since we mostly use gitlab-ctl commands. So when something goes wrong, it's hard.
Git is decentralized by default, so all you really need is for your self-hosted version to not crash at the same time as github does. That's not too hard to pull off.
I would choose self-hosting for small to medium size teams any day. I can't fathom why people choose not to self-host at this scale. Your data. Your control. Your network. Your infrastructure. Your responsibility. Are people becoming more afraid of responsibility these days?
Small to medium size teams should be hyper-focused on developing their product and bringing in customers and revenue. Spending time on commoditized infrastructure is only a good idea if that saves time in the short to medium term.
But hey, with the number of outages GitHub has had recently, that break-even point is coming closer and closer.
> Your control. Your network. Your infrastructure. Your responsibility
These are all the reasons why I don't want to self-host my own git. I can live with it being down every now and then. And when it is down, I don't want it to be my job to fix it. I've got more important stuff to worry about.
> Are people becoming more afraid of responsibility these days?
I wouldn't say people have become more afraid, they just don't see the reason to bother. I'd choose GitHub or Gitlab for small teams at any time. I'd probably even be fine with the free versions.
I see little to no reason for self-hosting in a small team. I cannot imagine a performant server, bandwidth and the employee it takes to maintain it to be cheaper than the 10€/month/user for a hosted solution.
Going from zero servers to one requires an employee with the skills, but going from 100 to 101 (my situation) does not. It requires less than 1% more employee time, since more is likely to be automated etc.
Many small companies already have VMs running other services.
> I cannot imagine a performant server, bandwidth and the employee it takes to maintain it to be cheaper than the 10€/month/user for a hosted solution.
Gitlab and gitea don't have particularly high performance requirements, especially when you consider how absurdly cheap high-core-count CPUs, RAM and NVMe SSDs are these days… and that's assuming you don't just chuck it into your existing cloud infrastructure and call it a day.
But even self-hosted redundant gitlab setups for several hundred users can be done with cheap commodity servers (or dedicated hosting providers like Hetzner) for <5€/user/month, and maintenance is on the order of 1-2 hours per month.
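To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 10€/user/month hosted price and the 1-2 hours of monthly maintenance come from the comments above; the server price (100€/month) and the admin hourly rate (80€/h) are made-up assumptions for illustration, and the model ignores initial setup and incident handling.

```python
import math


def hosted_cost(users: int, price_per_user: float = 10.0) -> float:
    """Monthly cost of a hosted plan at a flat per-user price (EUR)."""
    return users * price_per_user


def self_hosted_cost(server_cost: float, admin_hours: float, hourly_rate: float) -> float:
    """Monthly cost of self-hosting: server rental plus admin time (EUR).

    Simplification: the cost is treated as fixed, independent of team size.
    """
    return server_cost + admin_hours * hourly_rate


def break_even_users(server_cost: float, admin_hours: float, hourly_rate: float,
                     price_per_user: float = 10.0) -> int:
    """Smallest team size at which self-hosting costs no more than hosted."""
    monthly = self_hosted_cost(server_cost, admin_hours, hourly_rate)
    return math.ceil(monthly / price_per_user)


if __name__ == "__main__":
    # Assumed figures: 100 EUR/month server, 2 admin hours/month at 80 EUR/h.
    monthly = self_hosted_cost(server_cost=100, admin_hours=2, hourly_rate=80)
    print(f"self-hosted: {monthly:.2f} EUR/month")
    print(f"break-even team size: {break_even_users(100, 2, 80)} users")
    print(f"per user at 200 users: {monthly / 200:.2f} EUR/month")
```

Under those assumptions the break-even sits at a few dozen users. With different assumptions (a more expensive admin, or a bad month with 20 hours of firefighting) the picture changes quickly, which is probably why both sides of this thread can be right.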
It's literally a money sink for wishy-washy benefits if you're a small/medium sized team. It's why Heroku became a billion dollar company: time and money are better spent elsewhere at the start.
Self hosting quickly becomes a mess of weird gatekeeping and bad maintenance. Almost never have I seen well maintained self hosted infrastructure. Most of the time it is installed, gatekept and then left to bitrot without any upgrades or maintenance.
Self hosting GitHub is an option as well. We've been self hosting GitHub Enterprise for years and it has had no downtime other than during scheduled updates (at night, when nobody's at work).
> We've been self hosting GitHub Enterprise for years and it has had no downtime other than during scheduled updates (at night, when nobody's at work).
Is it free to self host like GitLab or Gitea? If not, what does it offer that you can't get from those?
> On-premise or own-cloud hosting is quite important for some customers.
Yes it is, and GitLab and Gitea both offer that, which brings us back to the question I asked, what does GitHub Enterprise offer that you can't get from GitLab and Gitea?
Self-hosting requires quite a bit of scale to be more reliable; otherwise you'll most likely still have outages, possibly longer ones, just at different times.
That surely depends how much they use Github, and when they need the service available.
Our self-hosted build and artifact servers have not gone down without planning in many years. Planned outages are brief, for a small team it's not inconvenient to announce a 10 minute outage for an upgrade.
Very often that's ok if you can choose when the outages occur. With most you can because outages tend to be for regular maintenance and upgrades far more often than hardware failures. A small geographically concentrated company only needs 9 fives.
Even for bigger companies that need more uptime they can at least give a shout out to everyone that it will be unavailable from X to Y, it doesn't just go down randomly with no notice whenever github decides to push a release.
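For anyone who wants to play with the numbers, here is a small Python sketch of the availability arithmetic. It assumes "9 fives" is a play on "five nines" and means 55.5555555% availability, and it assumes an 8-hour, 5-day working week; neither figure is stated in the thread.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years
HOURS_PER_WEEK = 7 * 24    # 168


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year allowed by a given availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def business_hours_share(hours_per_day: float = 8, days_per_week: float = 5) -> float:
    """Percentage of the week during which the team is actually working."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK * 100


if __name__ == "__main__":
    five_nines = downtime_hours_per_year(99.999)
    nine_fives = downtime_hours_per_year(55.5555555)  # assumed reading of "9 fives"
    print(f"five nines: {five_nines * 60:.2f} minutes of downtime per year")
    print(f"nine fives: {nine_fives:.0f} hours of downtime per year")
    print(f"working hours cover {business_hours_share():.1f}% of the week")
```

The point of the joke survives the arithmetic: a team that only works office hours in one time zone is at work for roughly a quarter of the week, so a service can be down a lot, as long as the downtime is scheduled outside those hours.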
If you think self-hosting doesn't randomly go down, you have never done self-hosting.
In the last few months we had a server that physically crashed and had to be replaced, we had a bad kernel that panicked randomly and a gitlab upgrade that failed in the middle.
Is your team equipped and ready to handle these? If not, don't self-host anything critical to your company, or your costs, downtimes and losses will be much much higher.
this always comes up, and it's one of those things that's obviously true on the surface until you understand context.
I have a self-hosted gitlab instance and it has much higher uptime than github in the last year. At some point a weekly downtime is going to trump any potential random outages.
The major difference might be your ability to affect those outages (positive or negative) because humans are the primary cause of outages, and you can restore service to a gimped passable state if it's your own thing.
I mean, hardware nowadays is quite reliable. We have a simple server with three disks in raid1, and it basically just works. Gitlab has regular backups, so at worst we would have one day of lost work.
It's surprisingly easy to set up and run, even with redundancy, integrates well with Jira, Confluence. Was fast enough, stable enough, had everything one needed (offered more than a hosted solution such as github). IIRC kept git completely separate, so it worked even during updates or restarts.
Disclaimer: I passionately hate Atlassian products, but I don't feel like using Stash/Bitbucket is a bad choice in any way.
I actually enjoyed BitBucket for smaller projects. They offered free private repos earlier than GitHub did, so that was the main selling point for me.
The self hosted version of BitBucket comes in at a pretty reasonable price. I don't know that for sure (and I can't check thanks to GH being down) but I'd imagine the self-hosted version of GitHub Enterprise is quite a bit more expensive than Bitbucket or Gitlab.
At work, we have an employer-provided (some groups also have their own server/setup) GitLab instance. I work on there much more than on GitHub, and am really liking it. Everything seems neatly integrated, especially the CI/CD is nice.
For all I care, GitLab can remain as it is right now. My primary fear is for it to become too bloated, trying to be all things to all people (project management, CI/CD pipeline, bugtracker, Wiki, Git repository, ...).
I would be happy to find out about people's experiences rolling their own GitLab, like what pitfalls to look out for. We are on GitLab CE 13.1.3. The biggest bug plaguing us has been CI/CD failures [0] for no apparent reason; but this is solved by a
    retry:
      max: 1
      when: runner_system_failure
block in the CI YAML, and seems to have been fixed for new releases.
I just set up auto deployment of my Android app to the app store via gitlab-ci, very cool. At this point I'm not sure if I should spend the time to replicate this in github actions on the copy of the repo.
It's important to note that it's the website that is down. Git itself worked through all the outages I can remember. Unfortunately, at least the last time I think, the integrations didn't.
Always have an extra customer, like the flowershop downstairs. Let her borrow your wifi in exchange for some office flowers. Now she is technically your customer.
When your shit goes down and nothing works you can still write "some of our customers are experiencing issues" in the statuspage as the flowershop still has wifi (hopefully).
I still don't understand people who always mention Microsoft's acquisition. Until the official statement, it isn't Microsoft failure. Don't blame them.
I think you're right in that blaming Microsoft without any evidence is probably a mistake, but I also think "until the official statement, it isn't Microsoft failure" is putting a bit too much faith in corporate PR as a source of truth.
I have to agree that this feels like an abnormal amount of downtime the past few months -- it would be interesting to see some actual data over a longer term.
You know if you use Microsoft products every day, and every day they let you down, and every day you experience the worst most unintuitive design in the world, and every day you have to deal with their reliability issues, and then MS acquires github, and github starts to behave like everything else MS touches ...
Clearly something about how MS runs is responsible for their past outcomes, why is it a stretch to assume it is responsible for another similar outcome?
It is like saying we don't know the rotation of the earth is why the sun rose this morning because we have not had an official investigation into the matter.
It's just your own feelings. I haven't had any issue with Microsoft products (and yes I use their services every day in my working life).
All I can see is you making a guess full of sentiment and negative thoughts. So as I said, don't blame them until there's an official statement.
How old are you? Were you using tech during the Win95 times? BSODs, Office dying on you (Norton crashward anyone?), plug and pray? Hotmail eating all your mail without a warning, Internet Explorer malware, NetBios worm (which infected your pc as soon as you installed Windows) and oooh so many more.
There is data about this, which makes it more than pure anecdotes.
Nope. Github is down. I don't feel it is down, it is down. Office 365 falls apart when two people try to edit a spreadsheet at once. Outlook does not allow you to unreject a meeting invite. Windows won't report DLL errors over a powershell terminal. Azure takes 10 minutes to delete a VM instance. Azure DevOps is so poorly designed that nobody can figure out how to find a repo without someone explaining it to them first. I can go on, but none of this is my feelings; these are actual things that people have to put up with from MS every day.
I have noticed a pattern: when I generate markdown from org-mode with the 'text' language selected for highlighting and then push, github hangs like crazy. I don't think I'm crazy in thinking it might be me. I push frequently to my blog and am starting to notice a correlation.
I export this into the below markdown.
#+BEGIN_SRC text -n :f "translate-shell -s fr -t en" :async :results verbatim code
I learned some French so that I can talk to
you during tennis. I hope I know enough so you
will not get bored.
#+END_SRC
When I get a page build failure it's usually my fault for creating .
This is the markdown which was pushed to my blog. The 'Page Build failure' messages take a long time to arrive in my inbox and I can see that the page build is hanging.
{{< highlight text "linenos=table, linenostart=1" >}}
I learned some French so that I can talk to
you during tennis. I hope I know enough so you
will not get bored.
{{< /highlight >}}
What is going on that people expect something better from Microsoft? Really this is quite on par with the quality they deliver. The only surprising thing is that people are surprised by this.
Their subpar uptime dates way back before Microsoft acquired them. Nothing to do with Microsoft. To be fair, I am sure they have unique scaling and infrastructure problems.
Well, there goes my night. I was waiting on a build triggered by Github actions and was wondering what was up.
I guess this is my sign to get some sleep.
Microsoft needs to slow things down and focus on stability. This really isn't good. I need these weekend and late night hours for my side hustle. I already have enough trouble as is, I don't need an injection of additional difficulty. (That's just my frustration; I can't imagine what y'all are all going through.)
They're making some very frustrating choices lately. Their redesign broke READMEs with tables (which now require horizontal scrolling), and they don't seem to care about all the repos they impacted.
I don't need two CI/CD pipelines. Nor do I have time to build something like that. I need to spend my time working on the core product.
The risk here is that my deploy SLA is tied to Github. I accepted this risk as the cadence of my deploys tends to be once to twice a day on average and Github is usually available (many nines).
I made a choice, and now that choice is biting me. Now I'll evaluate if I should spend the time and effort to migrate off it. If this is the last of it and Microsoft makes a commitment to not break things, I'll likely stay as the effort to move is nonzero. If this begins to happen every month, on the other hand...
Maybe my question was unclear, but I was not asking why you are not using two services; I was asking why you are not using an alternative, and I meant instead of GitHub, not in addition to GitHub.
> If this is the last of it and Microsoft makes a commitment to not break things, I'll likely stay as the effort to move is nonzero.
Commitment without liability is not a commitment. It is just empty words. And I don't see Microsoft making any commitments with liability on their part for a free service.
Simplicity. Microsoft has a marketplace where they bundle a bunch of services together where developers already are. I didn't have to go elsewhere to look. It's a real competitive advantage they're building. That said, I may begin to look elsewhere.
> And I don't see Microsoft making any commitments with liability on their part for a free service
It's not a free service. I pay roughly $50/mo for the features I use.
> And I don't see Microsoft making any commitments with liability on their part for a free service.
I imagine they take their SLAs very seriously. Especially in light of this string of outages. If they don't, they're going to lose customers and good will.
What are the liabilities on them for failing to meet their SLA? Taking things seriously is not a liability. Specifically in this case, what is their liability to you?
Move off Rails now, GitHub; it is clearly not working for you. It took you this far, now go move to something that doesn't sh*t the bed twice every month.
Rails is (apparently still) a popular target for haters, but for projects at Github's scale it's rarely a code logic/framework-level blunder that's taking the service down. It's generally a cascade of failures in things like multiple database systems, auto-scaling, dns/caching, etc.
There are projects bigger than GitHub that do all of these things and still don't fail as much as GitHub. Honestly, both of us are speculating. You can't say for sure if it's not Rails and I can't say if it's not anything else.
> for projects at Github's scale it's rarely a code logic/framework-level blunder that's taking the service down. It's generally a cascade of failures in things like multiple database systems, auto-scaling, dns/caching, etc.
This is just guessing. You can't honestly tell what is causing the issue.
Not guessing, postmortems are regularly posted on this site and elsewhere. These kinds of failures are generally not someone using some "magic" (or whatever other pejorative term) feature of a framework.
Flask (python) or even CodeIgniter (php) are my frameworks of choice for this sort of thing. They may be a bit old and organizing a large project could be difficult but nothing can beat them on performance!
Indeed. Anything that has so much magic in it is bound to bite you in the a. There will be a time when there is so much code and magic that no one really understands the system well.
Twitter too was on Rails in the initial days, which helped it take off, and then they realised they needed to move to the JVM (Scala) to scale, and they have been performing brilliantly since then.
Now that MS has bought them, why don't they use the brilliant tech MS has (.NET, .NET Core) and be more reliable?
Wow, what a read. I’m sure GitLab has improved and grown a lot since then, but that’s like a comprehensive list of what not to do from an IT perspective. Not ever checking to see if your backups are working, wiping production DBs without a second thought, this post has it all.
I'm not privy to their management decisions, but I have used windows and azure, and I use office 365 every day, and this seems about right for Microsoft Quality.
I'm sure they have progressed a lot since, but last time we evaluated a self-hosted git "hub" solution at work, Gitlab couldn't handle our main repository (history starting in the 90s, but probably more importantly loads of junk and a 10GB size on disk). We went with Stash (now Bitbucket) instead.
I prefer gitlab too but github is where most projects are, and that plays an overwhelmingly huge role in deciding where devs (and so other projects too) will be.. :/
It could change, but would require a pretty huge fuck-up or a lot of time and many not-so-small mistakes from github.
How come we first get the almighty git software for free, then the next thing happening is people flocking to "hosted" services that provide very little on top of git for the vast majority of projects (apart from the social networking effect of course, but that also means your project has tens or hundreds of issues and build status "failing"). Then people bitch about gh on HN in a consumerish "coke-vs-pepsi" manner. Have F/OSS devs no self-respect? If you're into F/OSS, why do you lead your peers into a github silo where clicks to the content you create with lots and lots of effort in your spare time only helps big data, and which is even blocking indie search engines?
"small" fuckups like this one have outsized impacts due to the amount of projects that use github. And the amount of people who just refuse to even consider gitlab is staggering.
But the UI of Gitlab is very bloated. I feel that the UI of Github is less bloated than Gitlab's.
Gitea is OK, but there is no one big instance (yes, I know about codeberg). Sourcehut is made by one man in his one-man shop, and it lacks many features that one needs. Also, it is alpha software.
> But the UI of Gitlab is very bloated. I feel that the UI of Github is less bloated than Gitlab's.
Bloated how? There is a lot of functionality in gitlab, and maybe there is a fair argument that the product itself is bloated, but the UI seems pretty appropriate for the functionality. Maybe I'm just used to it though.
Probably both. I remember the very first versions of Gitlab when it was just announced on habr.com, and it used to be a minimalistic (more in terms of UI) version of Github. As the functionality grew, things got a bit out of control.
Currently the UI looks more like it's been designed by people who write code rather than those who have spent considerable time thinking through user interactions and designing based on that. So it's not bloated in terms of the code and libraries, but rather all of the functionality is just thrown at the user as is without much thought.
Gitlab is functional and has more features/functionalities than Github.
But UI-wise, Github is very slim, fast and quick. It will function without JS. Gitlab, although functional, takes a long time to load a page (because JS is used a lot). Making it a PWA would solve the problem, but it still has not been made a PWA.
https://www.githubstatus.com/uptime/kr09ddfgbfsf?page=2
Wonder what the trigger was for the reliability hit. Actions went GA in Nov 2019, so it's something else (or possibly a combination of things).