1. Yes. It's a matter of doing this right, which will take some time. 2. Yes. Th...

ww520 · on Nov 10, 2012

Thanks for your and jdoliner's detailed answers! I hope I didn't ask too many questions. :) I'll respond to both here.

For 2 and 3, I don't think I made myself clear, so let me clarify. A common db problem with multiple clients is handling concurrent updates to the same piece of data. E.g., both client1 and client2 read D as D=15 at the same time. Client1 adds 1 to D and saves it as 16. Then client2 adds 1 to its stale copy of D and also saves it as 16, which is wrong. It should be 17.

A conditional update is one feature a db usually provides to let clients deal with this problem, i.e. the update only goes through if a certain condition is met and aborts otherwise: update D=16 if D==15. Client1 would succeed while client2 would fail, at which point it can retry the whole read-increment-update cycle with the newly read value.
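To make the D=15 example concrete, here is a minimal sketch of a conditional update with retry. The `Store` class and its `update_if` method are hypothetical stand-ins for whatever the db provides, not a real database API, and the interleaving is played out by hand in a single thread, so the check-and-write step needs no locking here (a real store would have to make it atomic).

```python
class Store:
    """Toy in-memory key-value store (hypothetical, not a real database API)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value

    def update_if(self, key, expected, new_value):
        """Write new_value only if the current value still equals expected."""
        if self._data[key] != expected:
            return False  # someone else changed it; the caller must retry
        self._data[key] = new_value
        return True


def increment(store, key):
    """Repeat the read-increment-update cycle until the conditional update succeeds."""
    while True:
        current = store.read(key)
        if store.update_if(key, current, current + 1):
            return current + 1


store = Store()
store.write("D", 15)

# Both clients read D=15 at the same time.
seen_by_client1 = store.read("D")
seen_by_client2 = store.read("D")

# Client1's update goes through; client2's is rejected because D is no longer 15.
client1_ok = store.update_if("D", seen_by_client1, seen_by_client1 + 1)
client2_ok = store.update_if("D", seen_by_client2, seen_by_client2 + 1)

# Client2 retries the whole cycle with a fresh read.
final_value = increment(store, "D")
```

With a blind write instead of `update_if`, client2 would have overwritten 16 with 16 and one increment would be lost.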

The litmus test for whether a db system can handle this problem is to try implementing sequential ID generation with multiple clients running at the same time.
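A rough sketch of that litmus test, using threads as the concurrent clients. The `Counter` class is a hypothetical in-memory stand-in for the db, with a lock that makes the check-and-write step atomic; the client and iteration counts are arbitrary.

```python
import threading


class Counter:
    """Hypothetical in-memory stand-in for a db row holding the last issued ID."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # makes check-and-write atomic

    def read(self):
        with self._lock:
            return self._value

    def update_if(self, expected, new_value):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new_value
            return True


def next_id(counter):
    """Optimistic read-increment-update cycle, retried on conflict."""
    while True:
        current = counter.read()
        if counter.update_if(current, current + 1):
            return current + 1


def run_clients(num_clients=8, ids_per_client=200):
    counter = Counter()
    results = []
    results_lock = threading.Lock()

    def client():
        mine = [next_id(counter) for _ in range(ids_per_client)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


ids = run_clients()
```

If the system passes the test, `ids` contains every number from 1 to 1600 exactly once, with no duplicates and no gaps.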

For 8, if the query is parsed into a query execution plan, you can ship the plan to all equivalent replicas and ask them to estimate the execution cost based on their current load. After they reply, pick the lowest-cost one and send it the execute command. Even a simple approach of asking all replicas for their machine load and picking the lowest one would give adaptive utilization of all the servers.
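As a sketch of the selection step only: the `Replica` class and its cost model (plan cost inflated by current load) are made up for illustration, and a real system would ask each replica over the network rather than call a local method.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    load: float  # current load, 0.0 (idle) to 1.0 (saturated)

    def estimate_cost(self, plan_cost):
        # Hypothetical cost model: the base plan cost grows with current load.
        return plan_cost * (1.0 + self.load)


def pick_replica(replicas, plan_cost):
    """Ask every replica for an estimate and return the cheapest one."""
    return min(replicas, key=lambda r: r.estimate_cost(plan_cost))


replicas = [Replica("a", 0.9), Replica("b", 0.2), Replica("c", 0.5)]
chosen = pick_replica(replicas, plan_cost=100.0)
```

Here the least loaded replica, `b`, gets the execute command.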

For 9, a Bloom filter is a relatively simple technique that can dramatically reduce the amount of data shipped across peers to do a join. You basically filter out the vast majority of the non-matching data before shipping.
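Roughly, one peer builds a Bloom filter over its join keys and sends that small filter to the other peer, which then ships only the rows that might match. A minimal sketch follows; the filter size, hash count, and sample data are arbitrary choices for illustration, not tuned values.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Peer A holds one side's join keys; peer B holds the rows to be joined.
peer_a_keys = {3, 7, 42}
peer_b_rows = [(k, f"row-{k}") for k in range(100)]

# Peer A builds the filter and sends it to peer B (128 bytes here).
bloom = BloomFilter()
for key in peer_a_keys:
    bloom.add(key)

# Peer B ships only the rows whose key might match.
shipped = [row for row in peer_b_rows if bloom.might_contain(row[0])]

# Peer A does the exact join, which discards any false positives.
joined = [row for row in shipped if row[0] in peer_a_keys]
```

Because the filter can return false positives but never false negatives, the final exact match on peer A is still required; the filter only cuts down what crosses the network.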

It's a good start. Good luck going forward!

Guillaume86 · on Nov 10, 2012

Your example of a conditional update can be addressed using an atomic update:

  r.table('tv_shows')
   .filter({ name: 'Star Trek TNG' })
   .update({ episodes: r('episodes').add(1) })
   .run()

http://www.rethinkdb.com/docs/advanced-faq/#atomic

ww520 · on Nov 11, 2012

I think the atomicity model here works like a transaction on the whole document, where all the changes to the attributes of a document are updated all at once.

The scenario I described has to do with read consistency, where the value read by a client should not change between the time of the read and the time of the update. The usual way of handling it is to take a write lock for the duration to prevent updates from others, but that degrades concurrency. The other way is optimistic locking (or a conditional update), which lets the client detect a change during that window and retry with the new value.

coffeemug · on Nov 11, 2012

My point was that you don't have to do that with rethink because the entire query gets executed on the server. You don't have to take the value down to the client, make the change, and then send it back. The entire update gets evaluated on the server and the server handles atomicity in various ways (depending on the query).

ww520 · on Nov 12, 2012

That approach would only work if all the logic to compute the update can be expressed in the update query. It breaks down if the read-eval-update cycle involves the client, and there are many scenarios that do.

E.g. the client reads a value, displays it to the user, gets input from the user that is based on the old value, and stores the updated value. If another user doing the same thing has already changed it, the client would like to know that and let the user retry with the new current value.

muhqu · on Nov 12, 2012

I think you, ww520, have a very good point here, and I'm also interested in what RethinkDB can offer for this usage scenario. From what I read in the ReQL command reference, it should be possible to do something like:

  r.table('foo').get(5).update({ 'bar': r.branch(r['baz'] == 0, "foo", r.error("invalid baz!"))})

have not tested it, but this is how I understand it...

muhqu · on Nov 12, 2012

Do you think something like the following should work with RethinkDB?

  r.table('foo')
   .get(5)
   .update({
     '_rev': r.branch(r['_rev'] == 5,
       r('_rev').add(1),
       r.error("invalid revision")
     ),
     'name': "awesome name"
   })

the basic idea is that `name` should be updated to "awesome name" and `_rev` should be incremented by 1, but only if `_rev` is 5, otherwise an "invalid revision" error should be thrown.
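To spell out the intended semantics without a server, here is a small in-memory sketch of the same revision check. `Table`, `update_if_rev`, and `RevisionConflict` are hypothetical names for illustration, not RethinkDB API; the sketch only mimics what the query above is expected to do.

```python
class RevisionConflict(Exception):
    """Raised when the stored _rev differs from the one the client read."""


class Table:
    """Toy in-memory table (hypothetical, not the RethinkDB driver)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc):
        self._docs[doc_id] = dict(doc, _rev=1)

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # return a copy, like a client-side read

    def update_if_rev(self, doc_id, expected_rev, changes):
        doc = self._docs[doc_id]
        if doc["_rev"] != expected_rev:
            raise RevisionConflict("invalid revision")
        doc.update(changes)
        doc["_rev"] = expected_rev + 1
        return doc["_rev"]


table = Table()
table.insert(5, {"name": "old name"})

# Two users read the same document at revision 1.
user1_view = table.get(5)
user2_view = table.get(5)

# User 1 saves first and bumps the revision to 2.
table.update_if_rev(5, user1_view["_rev"], {"name": "awesome name"})

# User 2's save is based on a stale revision and is rejected.
try:
    table.update_if_rev(5, user2_view["_rev"], {"name": "other name"})
    user2_conflict = False
except RevisionConflict:
    user2_conflict = True

# User 2 re-reads, sees the current value, and retries.
fresh = table.get(5)
new_rev = table.update_if_rev(5, fresh["_rev"], {"name": "other name"})
```

The error is what lets the client notice the conflict and show the user the current value before retrying.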

ww520 · on Nov 13, 2012

That would work. I didn't realize you can raise error on the row. Good work!

coffeemug · on Nov 12, 2012

Yes, this will work.

muhqu · on Nov 12, 2012

awesome, thanks!