Hacker News | bbillings's comments

This is the correct answer. Source: I am a FB employee.


Thanks.

So, how much does the 4PB of data go down to when it goes into hive? My guess would be something like 200TB.

Is it zipped (or lz4/lzo/zopfli/lzma whatever)? Or is it just "distilled"?


99% of raw data is useless. Just as a rule.

Most of it likely gets tossed into a program to determine if somebody actually needs to do anything, or if something is actually breaking.

For example, raw user interaction data doesn't really grow. While the event is likely ~1 KB or so of raw data, in the end you're just incrementing a 64-bit counter.
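
A rough sketch of that reduction, assuming JSON event lines with made-up `type` and `object_id` fields (this is an illustration of the idea, not any real pipeline): each roughly 1 KB event collapses into a single counter increment.

```python
import json
from collections import Counter


def aggregate(raw_events):
    """Collapse raw JSON event lines into per-(type, object) counters."""
    counts = Counter()
    for line in raw_events:
        event = json.loads(line)
        # Everything except the key is thrown away; only the count survives.
        counts[(event["type"], event["object_id"])] += 1
    return counts


# Hypothetical events, padded so each is roughly 1 KB of raw data.
raw_events = [
    json.dumps({"type": "like", "object_id": 42, "user_agent": "x" * 900}),
    json.dumps({"type": "like", "object_id": 42, "user_agent": "y" * 900}),
    json.dumps({"type": "view", "object_id": 7, "user_agent": "z" * 900}),
]

counts = aggregate(raw_events)
raw_bytes = sum(len(e) for e in raw_events)
```

Here `raw_bytes` is a few kilobytes, while `counts` holds two small integers.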

This is a baseless, exaggerated post, but it should shed some light on the idea.


> 99% of raw data is useless. Just as a rule.

Well, as a non-facebook user, I think 99.9999% of facebook data is useless :) But facebook is in the habit of tracking everyone's surfing habits across the web (through "like" links), and I assume they do more than just "increment a counter with it", even if they don't keep every single detail.


http://www.joelonsoftware.com/articles/fog0000000069.html

Is a pretty good explanation of why we don't rewrite the whole site. A rewrite would involve every product dev stopping what they are doing and probably spending at least 18 months reimplementing their product in a different language. During that time all forward progress on products would be halted. I am absolutely certain it would be an unmitigated disaster for the company to try to rewrite the site in another language.

Instead we have a small team of people making PHP really nice to work in. HipHop gives us full control over the language and lets us add any constructs we like. We make sure stock PHP runs, but we can also make the additions to the language that we need.

I don't really understand devs who think reimplementing in another language is easy or the right thing to do. It's almost always the wrong thing in my experience, and it's always a lot harder and takes a lot longer than expected.


I'm not sure why this is such a common misconception. I almost never work 70, or 60 hour weeks. 50, often, but that's usually my own fault for making promises to others I want to keep.

I've managed inside Engineering at Facebook and am now back to just being a straight Engineer, and my advice to everyone I work with is the same: Don't kill yourself working 70/80 hour weeks. We pay you to give 40 solid hours, and that is what I want. What I ask of you is that you set expectations for the people and teams around you as accurately as you can. Don't make a bunch of promises you have to work crazy hours to keep.

The majority of our most successful engineers have a very good work-life balance.


Because it seems like every mainstream news article about engineering at Facebook is written like this one: http://www.fastcompany.com/3005165/how-facebook-survived-34-...

Maybe it's Facebook PR's fault.


So we do lock-downs still, but it's not about working crazy hours, it's about focus. Under lock-down you are allowed to push the other stuff to the side for a while. For instance I typically do 2-3 interviews a week, help with training new interviewers, etc. This can easily be 5-6 hours of total time out of my week. This is part of being an engineer at Facebook: onboarding sessions, new engineer mentoring, etc. This is all part of normal workdays, unless you are in a lock-down. Then you focus on getting done whatever it is you need to get done and don't worry about the rest for a while.

I personally disagree pretty heavily on sleeping in the office and working crazy hours. I find it very counterproductive and generally think it leads to bad decisions, bad code, and ultimately bad products. It is certainly the exception, and not something you should be doing if you aren't 22 years old and/or more than a little crazy. I personally haven't worked crazy hours at Facebook since probably 2009.

Talking about this sort of thing from a PR perspective has to balance getting people excited about our environment with explaining our goals and day-to-day. I've found it's something the media outlets love to play up. The normal day-in-the-life of an engineer is fun, but chill, and not particularly newsworthy.


    I personally disagree pretty heavily on
    sleeping in the office and working crazy hours.
I agree completely. Facebook's tried to recruit me on at least one occasion and I turned them (you?) down flatly for this reason (that, and I don't want to ever have to write PHP).

I get that you don't have complete control of the press you get, but maybe it'd be worth focusing some attention on these work-life balance issues in the outlets you do have complete control over.

    getting people excited about our environment 
1 billion users. Boom, done.


Yes, but no. Since moving to HipHop, which is a translated/compiled version of PHP, we have been able to fork the language to clean it up. We have a strongly typed version, generators, etc.

Syntactically it looks a lot like PHP and is backward compatible for many things, but it's not really PHP anymore. The HipHop team has done a ton to make the language a lot more enjoyable to work in.


I did not know you went to such extent. At that scale that makes sense. Do you plan to open source it?


https://www.youtube.com/watch?v=Dwek7dZDFN0 is a pretty good talk about everything coming down the pipe. Generics, Strict typing, Collections, etc.

To my knowledge (I'm not directly on the HipHop team) it will be open sourced. Additionally, we generally present a PHP patch as well to allow it in future versions of standard PHP.


One of our struggles at Facebook was actually with circular includes that made our PHP codebase super tasty spaghetti.

File A includes File B includes File C includes File A, multiplied by about 10,000 files.

It was very scary trying to change a core library during this time, because it was almost impossible to figure out everywhere it was included and impossible to test all the code that touched it. Additionally, we were basically loading up our entire init stack on every page load and async request, because as soon as you loaded one file, all the circular dependencies would pull in the entire stack. This had a big performance overhead prior to our switch to HipHop.

To this end we developed a new include system for library files that forces developers to be sane. Every module in our library files must explicitly include everything that it needs, and circular dependencies are forbidden. If module A requires module B, module B cannot require module A.
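
For anyone wanting to check the same rule on their own codebase, here is a minimal sketch of a cycle check over a declared dependency graph. It is a generic depth-first search, not the actual Facebook tooling; `find_cycle` and the `deps` mapping are names made up for the example.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of module names, or None.

    `deps` maps each module name to the list of modules it requires.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(module):
        color[module] = GREY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # `dep` is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in deps:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


tangled = {"A": ["B"], "B": ["C"], "C": ["A"]}
clean = {"A": ["B", "C"], "B": ["C"], "C": []}
```

`find_cycle(tangled)` reports the A, B, C, A loop and `find_cycle(clean)` returns `None`. The recursion is fine for a sketch, but a graph with very long dependency chains would need an explicit stack to stay under Python's recursion limit.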

Making this change took a long time (in some places we are still untangling the code), but our core code is now infinitely more manageable and most importantly testable.


I worked on a large PHP codebase for a few years, and that sounds very familiar. We never did get it untangled enough to test before I left.

Do you have a static analyzer or something like it that enforces the no-circular-dependencies rule, or is it just a procedural thing?


Yes, we have a static analyzer that runs when we submit a diff or try to commit, but in our "require_module" we also check for circular dependencies at run time. If you create one, the module stack gets dumped into your log and the request promptly exits with an error message.
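
The run-time half of that can be sketched roughly as below. This is an illustration only: the `ModuleLoader` class and `CircularDependencyError` are invented for the example, and it raises an exception carrying the module stack instead of logging and exiting as the real "require_module" is described as doing.

```python
class CircularDependencyError(Exception):
    """Raised when a module is required while it is still loading."""


class ModuleLoader:
    def __init__(self, modules):
        # `modules` maps a module name to a function that takes the loader
        # and runs the module body (which may require other modules).
        self.modules = modules
        self.loaded = set()
        self.stack = []  # modules currently being loaded, outermost first

    def require_module(self, name):
        if name in self.loaded:
            return
        if name in self.stack:
            # Requiring a module that is still mid-load means a cycle.
            chain = self.stack[self.stack.index(name):] + [name]
            raise CircularDependencyError(" -> ".join(chain))
        self.stack.append(name)
        try:
            self.modules[name](self)
            self.loaded.add(name)
        finally:
            self.stack.pop()


# Hypothetical modules: "a" and "b" require each other, "c" stands alone.
modules = {
    "a": lambda loader: loader.require_module("b"),
    "b": lambda loader: loader.require_module("a"),
    "c": lambda loader: None,
}
```

Calling `ModuleLoader(modules).require_module("a")` raises `CircularDependencyError` with the offending chain in the message, while requiring `"c"` loads normally.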


That sounds like it could be useful to the PHP community. Are there plans to open source it?


I'll look into it. Not sure how tightly coupled it is with some Facebook specific stuff.

http://phabricator.org/ has a very similar system if you are curious. epriestley made some nice updates to it in there as well.


How does Facebook use Autoload? Does it help in this situation?


Actually, we do know when you are on the Facebook corp network and can filter by that, but we rarely gate on that. We have a system we call gatekeeper for controlling the launch of new features. If I'm coding up something new I'll usually just add my own user ID (at Facebook we call GUIDs FBIDs) first. The gatekeeper system is a very full-featured roll-out system. I can launch to just employees, 1% of users worldwide, Facebook users in Peru, a viral growth mechanism, etc.

We also maintain a robust employee list that is cached in APC on every web host, so you can always call an is_employee-style function on any user ID. The careers site in particular has some employee-only functionality that this endpoint is probably checking.
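
To make the shape of such a roll-out check concrete, here is a toy sketch. It is not gatekeeper itself: the `Gate` class, its rule names, and the hashing scheme are all assumptions for illustration, and a plain in-process `frozenset` stands in for the employee list cached in APC.

```python
import hashlib


def bucket(gate_name, user_id, buckets=100):
    """Map (gate, user) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


class Gate:
    """A feature gate that passes a user if any one of its rules matches."""

    def __init__(self, name, user_ids=(), employees=False,
                 countries=(), percent=0):
        self.name = name
        self.user_ids = set(user_ids)    # explicit allowlist of user IDs
        self.employees = employees       # open to all employees
        self.countries = set(countries)  # open to users in these countries
        self.percent = percent           # open to this percent of all users

    def passes(self, user, employee_ids):
        return (
            user["id"] in self.user_ids
            or (self.employees and user["id"] in employee_ids)
            or user["country"] in self.countries
            or bucket(self.name, user["id"]) < self.percent
        )


# Stand-in for the per-host cached employee list (hypothetical IDs).
EMPLOYEE_IDS = frozenset({2})

gate = Gate("new_feature", user_ids=[1], employees=True, countries=["PE"])
developer = {"id": 1, "country": "US"}
employee = {"id": 2, "country": "US"}
peruvian = {"id": 3, "country": "PE"}
outsider = {"id": 4, "country": "US"}
```

With these rules the developer, the employee, and the user in Peru all pass, and the outsider does not. Hashing the gate name together with the user ID keeps a user's percentage bucket stable across requests while varying it between features.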

