brianwski · on Jan 31, 2023

Disclaimer: I work at Backblaze.

> Do we know if these drives are the same as you would purchase in a retail outlet

Seagate (for instance) won't sell ANYBODY drives directly, so they force us to get bids from various resellers and distributors. So when we pick the lowest price, I don't think Seagate knows who the drives are going to, but there might be a trick in there somewhere I am not aware of.

toomuchtodo · on Jan 31, 2023

Have you considered becoming a distributor yourself? I would assume at a certain purchasing scale, the cost benefit is apparent, but perhaps not.

Kye · on Jan 31, 2023

This is why every major tech company eventually becomes a domain registrar.

jjeaff · on Jan 31, 2023

Do all major tech companies really register that many domains? I assumed that becoming a registrar was mainly about pushing your control up the pipeline to reduce risks like losing your domain name, rather than about saving money on registrations.

brianwski · on Dec 20, 2022

Disclaimer: I work at Backblaze, and I was here first.

> I'd worry they would sue for...

If you are referring to Backblaze, we're not going to "sue" anybody for anything.

We (Backblaze) have dealt with a bunch of frivolous lawsuits (and patent trolls) over the years suing us, and OMG we're not going to instigate any lawsuits over some honest person legitimately reporting some issue and being helpful. It isn't going to happen.

Our reputation is important to us. Not just for Backblaze: the individuals who founded Backblaze and the people who work here now base our entire existence and careers, and the number one marketing effort at Backblaze, on trying to be "the good guys", being transparent, and acting like it. There is no possible world where we try to suppress a screwup like this through legal means. That would be a PR debacle of epic proportions.

If something went wrong, let's shine a spotlight on that cockroach and figure it out together. I'm not sure of the exact drive we are all talking about, but my first guess would be that a customer ordered a $189 "USB Restore" (all their data shipped to them on an encrypted drive), we (Backblaze) shipped the customer a USB restore drive, and they are subsequently selling it (after copying their restore off of it) on the open market. If it is above 8 TBytes this is absolutely *NOT* the case and we should get to the bottom of it. Without lawyers mucking up the situation.

brianwski · on Sept 15, 2022

> I doubt I'd get anywhere close to ROI with SSDs

I think most people assume the SSDs are more expensive, but depending on their application they choose SSDs for the OTHER properties, like SSDs are quieter, and faster, and like you mention use less power.

There was this narrative for a long time that SSDs failed way more often, or that beating them up with reads and writes would make them fail while hard drives could take the same number of reads and writes and not fail. So a subset of people chose hard drives for durability regardless of cost. Personally I never saw any evidence of SSDs failing more often than hard drives, and PLENTY of counter evidence. For instance, I wrote a program that unintentionally overwrote the same fixed length 16 byte file in the same location over and over again too often, and it failed more on hard drives than on SSDs. (The failure mode was not whole disk failure; the 16 byte file itself mysteriously stopped being writable on some computers after a long time of working.) I had some friends that worked at Tivo and they said they had the same bug/shortcoming early on. The Tivo would overwrite the same section of disk over and over and it failed LONG before the drive manufacturers said it would fail. The drive manufacturer said they needed to vary the location of where they wrote the data to get anywhere near the published numbers.

I had assumed the file system layer would take care of that kind of management for me, but I guess not so much.

jbotz · on Sept 15, 2022

> the same fixed length 16 byte file in the same location

On the SSDs you weren't writing to the same location. The firmware keeps mapping your writes to different locations for wear leveling. So if those 16 bytes are the only thing you're writing your SSD would last a very long time indeed!
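
To make the wear-leveling point concrete, here is a toy model, not how any real SSD firmware works. The class name, the 8-block device, and the "pick the least-worn block" policy are all assumptions made up for illustration; real flash translation layers are far more involved.

```python
class ToyFlashTranslationLayer:
    """Toy model: each logical write is redirected to the least-worn physical block."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # wear per physical block
        self.mapping = {}                          # logical block -> physical block

    def write(self, logical_block):
        old = self.mapping.get(logical_block)
        in_use = set(self.mapping.values())
        # Candidates: blocks not holding live data for some OTHER logical block.
        candidates = [p for p in range(len(self.erase_counts))
                      if p not in in_use or p == old]
        target = min(candidates, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1
        self.mapping[logical_block] = target
        return target


ftl = ToyFlashTranslationLayer(physical_blocks=8)
for _ in range(8000):
    ftl.write(logical_block=0)  # the same "16 byte file", over and over

# Wear ends up spread across all 8 blocks instead of landing on one.
print(ftl.erase_counts)
```

Without the remapping, all 8000 writes would hit a single physical location, which is roughly the situation the hard drive (and the Tivo) was in.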

> I had assumed the file system layer would take care of that kind of management for me

Depends on the file system; some do, some don't, and for some use cases that isn't what you would want, so you need to choose your filesystem (or its options) appropriately. A journaling file system (like ext4 with -j) would never write to the same spot (but without -j it would), and a more advanced one with COW like btrfs usually won't either (but COW can be turned off).

brianwski · on Sept 14, 2022

> Capitalism does not reward hard work. Smart work, maybe, but not hard work.

I'd say a person successful/rewarded in a capitalistic society (defined by accruing more wealth than they started with and not just spending down some large family inheritance they were born into) probably has some combination of: luck, smart (intelligent) decisions, hard work, and a good "setup" going into the working world, like parents who didn't abuse them growing up and at least helped pay for their education, etc. Maybe that last one is just a subset of luck; we certainly do not choose our parents.

Sure, you can be missing one of the above list and do Ok, but probably not two. Statistically a smart, hard working person who had the world's crappiest upbringing probably won't be that successful compared with somebody who had all three going for them. But if you look at 20 "success stories" it is extremely rare the person didn't work hard. I think it's an important component of being rewarded in capitalism. It does a disservice to people to say "slacking off or working hard results in identical outcomes".

To support at least half of your point, you cannot be stupid and work hard and be successful. But a smart person who works hard will most likely do better than a smart person who doesn't work hard.

LordDragonfang · on Sept 14, 2022

>you cannot be stupid and work hard and be successful.

Unless you are born into wealth, of course. One of your three/four criteria. Lots of examples of people grifting and grinding to fail upwards who have no special talent or intelligence of their own. Some are very powerful, even.

brianwski · on May 4, 2022

Disclaimer: I work at Backblaze but more on the backup client side that runs on laptops, not in the datacenter storage side.

> Is there a failure list for write heavy drives?

To be clear about what these drive failure stats are and what they are not: Backblaze runs a data storage service with about 214,000 hard drives in it right now. We don't run any specific tests or induce issues on purpose FOR the drive failure stats, we just report what occurred in our datacenter.

Sometimes readers think we're carefully running a "study", but it's more just what we have experienced as honestly as we can offer it up. If the reads and writes and seeks our drives experience matches your particular application, great! Or maybe it is just interesting to read.

Now, we do save (and publish) all the raw data, and some other awesome people out there have done various analysis on it, which always makes us happy also. You can find the raw data here: https://www.backblaze.com/b2/hard-drive-test-data.html At this point it goes back almost a full decade.

stonecharioteer · on May 5, 2022

Is there something you published regarding regular consumer drives? I ask this because I cannot find / afford high capacity enterprise drives in India, and I am forced to make multiple backups on cheap external hard drives.

I'm building https://github.com/stonecharioteer/renfield for this purpose. Before I get around to it, I'm trying out git-annex, but I must say I don't like the git approach to files.

isomorphic · on May 4, 2022

It looks like the data does contain SMART 241, total LBAs written, for at least some drives.

Someone with some time could correlate that to failure rate. My hypothesis is of course it's correlated--but by how much?
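
A rough starting point for that correlation might look like the sketch below. It assumes the column names used in the published daily CSVs (`serial_number`, `failure`, `smart_241_raw`); the function name and the median split are my own choices, and the sample rows are invented placeholders, so whatever it prints says nothing about real drives.

```python
import csv
import io
from statistics import median


def failure_rate_by_lbas_written(csv_file):
    """Split drives at the median of smart_241_raw and return the fraction
    of drives that ever failed in the (low, high) halves."""
    last_lbas = {}  # serial_number -> largest smart_241_raw seen
    failed = set()  # serial numbers with failure == 1 on any day
    for row in csv.DictReader(csv_file):
        raw = row.get("smart_241_raw") or ""
        if not raw:
            continue  # not every drive reports SMART 241
        serial = row["serial_number"]
        last_lbas[serial] = max(last_lbas.get(serial, 0), int(raw))
        if row["failure"] == "1":
            failed.add(serial)

    cutoff = median(last_lbas.values())
    low = [s for s, v in last_lbas.items() if v <= cutoff]
    high = [s for s, v in last_lbas.items() if v > cutoff]

    def rate(serials):
        return sum(s in failed for s in serials) / len(serials) if serials else 0.0

    return rate(low), rate(high)


# Invented placeholder rows, only here so the sketch runs end to end.
SAMPLE = """date,serial_number,model,failure,smart_241_raw
2022-05-01,A1,MODEL_X,0,100
2022-05-02,A1,MODEL_X,0,150
2022-05-01,B2,MODEL_X,0,9000
2022-05-02,B2,MODEL_X,1,9500
2022-05-01,C3,MODEL_Y,0,
"""

low_rate, high_rate = failure_rate_by_lbas_written(io.StringIO(SAMPLE))
print(low_rate, high_rate)
```

A serious version would at least control for drive age and model, since older drives have both more LBAs written and more time to fail.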

brianwski · on May 4, 2022

> Someone with some time could correlate SMART data to failure rate.

We looked into it a little, some notes written up here: https://www.backblaze.com/blog/what-smart-stats-indicate-har...

Short summary is there are a few SMART stats that seem to predict failure way more than others, which is probably obvious. But we aren't PhDs in statistics and it isn't our area of focus, so....

This guy wrote a paper based on the Backblaze SMART data: https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?acc...

These 5 guys wrote another paper based on the Backblaze SMART data to train up a Bayesian network to predict failures: https://ieeexplore.ieee.org/document/8489097

This is another article of predicting hard drive failures using the Backblaze SMART data: https://karthikna.github.io/Prediction-of-Hard-Drive-Failure...

I can't comment on their findings, but it's DEFINITELY an interesting thing to study now that we have almost 10 years of these drive stats across a pretty big drive farm.

brianwski · on March 7, 2022

Disclaimer: I work at Backblaze. Here is a histogram of backup sizes of our Backblaze Personal Backup Customers if it is useful:

https://i.imgur.com/GiHhrDo.gif

You have to zoom in to see the meaningful information. It can be super surprising to developers and IT people (like us) to see a lot of customers have less data than you might think.

When we started we had no idea what might come of all of this, so it was "stressful". :-) So I welcome another backup client developer to our club, and hope this can help them out.

magicalhippo · on March 8, 2022

They aren't that many, but I'm assuming he's starting up so not sure how many he can handle. I recall Crashplan having enough >10TB guys that they decided "unlimited" wasn't, and that was many years ago.

Personally I have 3TB of essential data, and about 10TB if I wanted to backup most of my computer.

Interesting graph though, thanks for sharing!

quinncom · on March 8, 2022

That url just shows a bunch of memes. Can you reupload?

brianwski · on Dec 15, 2021

Disclaimer: I'm the engineer you are commenting on, I just want to straighten out one misunderstanding.

> This quote especially seemed childish and immature to me:

>

> > "And this one makes me actively angry, because both Microsoft and Apple will happily throw away portions of your files and not tell you about it"

>

> No, this is NOT Microsoft's or Apple's fault. It is 100% HIS fault for not understanding what's going on. Even if a file flush is correctly requested

Several times you seem to jump to the assumption that I don't understand fsync and disk flushing and that is the core issue. You aren't understanding what I'm criticizing. Here is an example of what bothers me:

You take a picture at your wedding, and you store it on your hard drive. You like the photo, it means a lot to you, and you use it as the background for your desktop FOR FIVE YEARS. You have rebooted hundreds of times, and it's always the background for your desktop. Then one day 5 years after your wedding, you reboot your laptop, and it seems to take a little longer to boot, and then after you sign into the laptop half the image you use for your desktop background is scrambled. The middle of it looks like dirty snow. You didn't get any reports of any issues from the OS manufacturer, but now one of your photos is corrupted.

This isn't because the software that wrote the photo 5 years earlier forgot to flush the picture to disk. It just isn't. Behind the scenes, as your laptop was booting from an aging drive, it probably encountered some issue, and it went about fixing the problem as best it could - which I have no problem with. My issue is the drive lost some data, and if the OS manufacturer would let you know this occurred you could take useful actions like order a new drive, prepare a restore from a few weeks ago before that issue occurred, etc.

> No, this is NOT Microsoft's or Apple's fault.

It isn't their fault that the drive is going bad, I agree. Drives go bad, that's why we have backups. My issue is that the OS manufacturers try to cover up too much, keep too much hidden from the user, and don't let the user know data loss has occurred (or might have occurred). And yes, I hold them accountable for not telling customers what is going on. It isn't anybody's "fault" that it occurred, but there is a responsibility to let customers know about it so the customer can take appropriate actions so they don't lose data (or more data).

I try to write incredibly paranoid software. Part of the reason is that the "average" environment the Backblaze client runs in is more unstable than what most software developers are used to. The whole point of backups is to run when the computer is going sideways, it has bad RAM, it's losing disk sectors, or a customer's cat likes sleeping on the keyboard because it's warm, and the fans are clogged with cat fur. And because the family has teenage children that don't know about computer security problems, they download and install unstable junk from all over the internet because why not? That's the environment my software runs in, and I take my job of trying to protect my customer's data very seriously.

brianwski · on Dec 15, 2021

Disclaimer: I work at Backblaze so I'm biased and you should keep me honest.

> if you run into big trouble, you can go get the box back for a really fast restore

Backblaze provides this service for our customers! Customers can ask for an 8 TByte encrypted USB attached external hard drive to be prepared in the Backblaze datacenter with all their data beautifully restored on it, then we FedEx this anywhere in the world to them. This is a free service as long as the customer returns the USB drive to Backblaze in a reasonable amount of time (a couple months, and we can work with customers if they need longer). Or customers can pay $189 (which includes the drive, the data, and world wide shipping) and keep the 8 TByte USB drive.

You can read more about the USB hard drive restores here: https://www.backblaze.com/blog/usb-hard-drive-restore/ and why it is a free service here: https://help.backblaze.com/hc/en-us/articles/217665948-Resto...

What we often see is that if a customer's laptop is stolen or crashes, they sign into the Backblaze website to download the 3 or 4 individual files they were working on when the laptop was stolen. Let's say that is a term paper due the next day. That way they are back up and running IN SECONDS. Then the customer orders a free USB drive with 8 TBytes of their data which will show up in a few days. They can live without their wedding photos and their music for 3 days, but that term paper has to be handed in.

brianwski · on Dec 15, 2021

> Symlinks have been supported since Vista

I'm not aware of even Windows 10 or Windows 11 being able to create Symlinks on FAT32 and exFAT, but I could be wrong? But it doesn't matter, the code as written is cross platform, it works on Apple Journaled File Systems, APFS, exFAT, etc. We then compile it for Macintosh, compile it for Windows, and compile it for Linux.

We can then spend extra time and carefully detect each filesystem and each platform and then make the optimization if we can. And this is a valid criticism that we have not done this yet. But no matter what we need this general code that will always work FIRST; the links are a space optimization to save valuable SSD space when it is possible.

brianwski · on Dec 15, 2021

> > Backblaze ships with 21 identical copies of the same executable

>

> This seems needlessly dismissive. I feel like they definitely know that you can execute the same binary multiple times.

I'm the programmer at Backblaze that made the copies on purpose, I wrote some extra code to do this, and it's meant to help us debug certain things. Yes they are identical, the installer only ships with one copy of the executable, the installer then makes the copies on purpose. I get to explain this from time to time. :-)

In Windows when you want to know what is going on behind the scenes, you can bring up Task Manager and look at the different names of the different processes that are running. On the Macintosh this is called Activity Monitor, same sort of thing. The different names for the executables are for different "threads" which have different roles. Backblaze is multi-threaded to get higher performance.

The parent coordination process is called "bztransmit". But when doing the actual transmission it spawns the bztrans_thread01, bztrans_thread02, bztrans_thread03, etc.

So BEFORE I made multiple copies of the executables with different names, a customer would say "bztransmit is hung" or "bztransmit is using up too much memory". There was very little visibility into this. But now that I made multiple copies with different executable names, when the same customer says "bztrans_thread03 is hung" or "bztransmit is using too much memory", we have immediately narrowed down what to look at.

Here is a screenshot showing what "Chrome" looks like to me in Windows, and how it compares to what Backblaze's bztransmit looks like to me in Windows: https://i.imgur.com/KOJHJ9Q.jpg In that screenshot, you can see there is the "main thread", and "worker threads". Meanwhile chrome is just one big list of processes all named the same thing (see the screenshot). I prefer the Backblaze system, but I understand it upsets some customers that prefer the chrome experience.

That's it. It's not some huge mystery.

One question asked here was do we know you can launch the same executable twice? Yes, and we do that. The bztrans_thread05 is launched for thread 05, thread 25, thread 45, thread 65, etc. It's THREAD_NUMBER mod 20. Here is what it looks like to hit 500 Mbits/sec upload speeds, this isn't photoshopped, it's a real screenshot on my development computer: https://i.imgur.com/hthLZvZ.gif
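
For anyone who wants the mod 20 rule spelled out, here is a minimal sketch. The function and constant names are made up for illustration, and it assumes plain modulo, so thread 20 would land on a "bztrans_thread00" name; whether the real client numbers that slot differently is not something this sketch knows.

```python
EXECUTABLE_COUNT = 20  # worker executable names that get re-used


def executable_for_thread(thread_number):
    """Map a worker thread number onto one of the 20 executable names."""
    return f"bztrans_thread{thread_number % EXECUTABLE_COUNT:02d}"


for n in (5, 25, 45, 65):
    print(n, executable_for_thread(n))  # all four map to bztrans_thread05
```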

Another question is: why not use hard links or symbolic links? That's the only real optimization possible here, everything else was on purpose. The answer is not an excuse, it's just an explanation if you are curious. The software we develop at Backblaze is cross platform, so what we like to do is make the most general form first that will always work, then if customers complain or we want to refine it we add special-purpose code per file system or per platform. The most general thing to do is make full copies. We could then go on to make links on the Mac WHEN POSSIBLE and the equivalent on Windows WHEN POSSIBLE, but it never became a large priority. The reason I can't use one technology is we support several file systems, and not all of them are the same or support the same technology for links.
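
The "general form first, links WHEN POSSIBLE" idea can be sketched roughly like this. This is not the Backblaze installer code; `link_or_copy` is a hypothetical helper, and it simply tries a hard link and falls back to a full copy whenever the file system or platform refuses.

```python
import os
import shutil
import tempfile


def link_or_copy(source, destination):
    """Create destination as a hard link to source if the file system allows
    it, otherwise as a full copy. Returns which strategy was used."""
    try:
        os.link(source, destination)
        return "hardlink"
    except OSError:
        # For example a file system with no hard link support: fall back to
        # the general form that always works.
        shutil.copy2(source, destination)
        return "copy"


# Demo in a throwaway directory with a stand-in "executable".
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "bztransmit")
    with open(original, "wb") as f:
        f.write(b"pretend this is the executable")
    strategy = link_or_copy(original, os.path.join(workdir, "bztrans_thread01"))
    print(strategy)
```

Either way the caller ends up with a file under the new name, so the rest of the code does not need to care which branch ran.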

Every feature we have is the result of prioritizing it over working on other things. Until recently, we did not have a lot of funding or an infinite supply of programmers, so we had to choose what order to implement each feature in. I'm not saying we got all the priorities correct, or that we did things in the correct order. I don't really even think there is one correct order. For example, some individual home user customers prefer saving 180 MBytes of their valuable boot SSD space over me implementing single sign on for our corporate customers. On the other hand some corporate customers DEMANDED single sign on or they wouldn't purchase the product at all. They are both correct, but there is only one of me, so we made some judgement calls and left the multiple copies and worked on single sign on. Some customers were happy, some are miserable.

We do have open client reqs for both Windows and Mac programmers, so if you would like to make a good salary, full benefits, and help us out, come join us! :-)

rgovostes · on Dec 15, 2021

Thanks for the explanation (and congrats on the IPO).

Consider also that having multiple distinct binaries interferes with (dis)allowing Backblaze to reach the internet with process-based firewalls like Little Snitch, because each copy needs to be configured separately.

Some tools, like Docker, have a function for collecting relevant diagnostics. Perhaps it would be useful to you to migrate towards a solution like that rather than asking the user to identify a misbehaving process on their own.

brianwski · on Dec 15, 2021

> and congrats on the IPO

Thanks, that was super exciting for us. After 14 years, I claim (and this is controversial) that we're no longer a startup and now we're just a mid-sized publicly traded company. :-)

> process-based firewalls like Little Snitch, because each copy needs to be configured separately

Yeah, that was actually a surprise and unfortunate. What the Mac architect (one of my business partners) and I think is that now that it is nice and stable, we might go down to 1 or 2 bztrans_thread executables, and one bztransmit. That seems like a better tradeoff where we waste much MUCH less disk space, and it is only 3 executables to allowlist in Little Snitch, and it achieves basically what we want now that it's stable and working well.

Originally there were 10 threads MAXIMUM, and we made 10 copies. And each copy was linking with shared libraries so it was only 10 MBytes of disk space which nobody noticed. Then Windows lost their friggin' minds with one of their releases and forced us to link statically which bloated it way up to 5 or 10 MBytes per executable. Then we went to 20 threads maximum and the whole thing was silly. When we went to 100 threads maximum we said "enough" and went to mod 20 for re-using executable names.

By the way, ALL OF THIS could be avoided if Microsoft and Apple provided an API to set the name displayed in Task Manager/Activity Monitor. Maybe that's a security issue, I don't know. But frankly wouldn't it be SUPER TOTALLY USEFUL if chrome displayed the current web page loaded in the process name of each and every chrome process? Then you would know which one to kill when something goes sideways.

fouc · on Dec 15, 2021

>By the way, ALL OF THIS could be avoided if Microsoft and Apple provided an API to set the name displayed in Task Manager/Activity Monitor.

The problem is process lists should be showing the true state of the computer. It wouldn't be a good idea to hide the actual executable name. But it sounds like it could be useful to add another column for "label", so that threads could set a label and offer more insight on the process list.

brianwski · on Dec 15, 2021

> useful to add another column for "label",

Yeah, that would work really well. When you look at the "services" control panel in Windows, there are two columns. One column is "Name" of the service, and another column is a longer explanation with the column header "Description". I put a small description in there for bzserv (our service) plus a URL to our company website. I think this is just being polite; customers who don't recognize what "bzserv" is can immediately find more information on it.