
While there's a ton of talk about scaling on here, it isn't needed by the vast majority of people and even Facebook's way isn't that complicated when you get down to it.

Basically, the only change needed is to have MySQL trigger the delete from memcached on replication. Your slave database receives the updated record, saves it in its tables, and then hits memcached with a delete command for that key. While it does mean diving into the MySQL code, by the time you reach the size at which you need such a feature, you can just hire a MySQL expert.
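To make the flow concrete, here is a rough sketch of the idea, not Facebook's actual patch: a toy replica that stores the replicated row and then deletes the matching cache key. The `Replica` class, the `read_through` helper, and the plain dict standing in for memcached are all made up for illustration.

```python
class Replica:
    """Toy stand-in for a MySQL slave that invalidates the cache on replication."""

    def __init__(self, cache):
        self.rows = {}      # the slave's own tables
        self.cache = cache  # dict standing in for memcached (an assumption)

    def apply_replicated_update(self, key, value):
        # 1. Save the replicated record in the slave's tables.
        self.rows[key] = value
        # 2. Hit the cache with a delete for that key, so the next
        #    reader is forced to fetch the fresh row.
        self.cache.pop(key, None)


def read_through(key, replica, cache):
    """On a cache miss, read from the replica and re-cache the value."""
    if key not in cache:
        cache[key] = replica.rows[key]
    return cache[key]


cache = {}
replica = Replica(cache)

replica.apply_replicated_update("user:1:name", "alice")
first = read_through("user:1:name", replica, cache)   # miss, then cached

replica.apply_replicated_update("user:1:name", "alicia")
second = read_through("user:1:name", replica, cache)  # old entry was deleted
```

The point of doing the delete on the slave is that the cache entry only disappears once the local database already holds the new row.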



This is probably a bad idea in a lot of cases, and FB's replication is probably rather complicated, most likely relying on some sort of system that serves stale data when it can and asks for fresh data when it needs it.

Number one, if you're after high scalability, you don't want to run a query when you already know how the underlying data has changed, so any kind of invalidation trigger would be ill-advised. Further, if you're replicating data over a large distance, what's the chance that you'll ever have the right data in the right state? From the moment the data is changed on the master you probably need to fire off an invalidation, if not an update to that data, which is where some sort of collection class to manage this is probably more suitable.

Invalidation will often stop working well, even in a small replicated environment, if replication takes more than a very small amount of time, which happens quite easily if a server is overloaded. Imagine I change my friends, wait on my process for replication, invalidate, and then some friend of mine views my page and re-caches my friend list from a MySQL server that hasn't yet replicated my newest data. It's fairly solvable, but even then you aren't sure that an update will go through on memcached, so you need some sanity check to know whether the data is actually what you want to be seeing.

I guess I don't really have a point, just more open-ended problems for consideration for those of you wishing to scale. I think it's all pretty common sense, but it's by no means simple. Best of luck.
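The friend-list race described above can be reproduced in a few lines. This is only a toy model under loud assumptions: plain dicts stand in for the master, one lagging replica, and memcached, and the version number attached to each value is a hypothetical way of doing the sanity check, not something any of these systems provide out of the box.

```python
# Toy model: dicts stand in for the master, a lagging replica, and memcached.
# Each value is stored as (version, data); the version is an assumed convention.
master = {"friends:me": (1, ["ann"])}
replica = dict(master)
cache = dict(master)


def write(key, data):
    """Update the master, then invalidate the cache right away."""
    version = master[key][0] + 1
    master[key] = (version, data)
    cache.pop(key, None)
    return version


def read(key):
    """On a cache miss, re-cache whatever the replica currently holds."""
    if key not in cache:
        cache[key] = replica[key]
    return cache[key]


def replicate():
    """Replication finally catches up; note that it does not touch the cache."""
    replica.update(master)


def looks_fresh(key, expected_version):
    """Sanity check: is the cached version at least the one we just wrote?"""
    return read(key)[0] >= expected_version


new_version = write("friends:me", ["ann", "bob"])  # I change my friends
seen_by_friend = read("friends:me")                # friend re-caches old list
replicate()                                        # too late for the cache

still_stale = cache["friends:me"] != master["friends:me"]
fresh = looks_fresh("friends:me", new_version)
```

Here the cache keeps serving the old friend list even after replication completes, and the version comparison is what lets the writer notice it.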


There are two reasons there's a ton of talk about scaling here: a) everybody wants to start their own Facebook someday, so they think they need to know this stuff, and b) scalability is actually kinda neat to study. Reorder them as you like, but I think both are factors.

That being said, we are rapidly moving toward a world where more and more data is stored and queried. Scalability will therefore be something you increasingly need to know about in order to build a significant application, regardless of your entrepreneurial ambitions.


Yeah I was surprised at how simple their solution sounds.

But then I realized I was imagining a single MySQL DB at each datacenter. In reality they must have pretty big clusters at each.



