Thanks for your and jdoliner's detailed answers! Hope I didn't ask too many questions. :) I'll respond to both here.
For 2 and 3, I think I didn't make myself clear, so let me clarify. A common db problem with multiple clients is dealing with concurrent updates to the same piece of data. E.g. both client1 and client2 read D as D=15 at the same time. Client1 adds 1 to D, getting 16, and saves it. Then client2 adds 1 to its stale copy of D, also getting 16, and saves it as 16, which is wrong. It should be 17.
Conditional update is one feature a db usually provides to let clients deal with this problem, i.e. the update only goes through if a certain condition is met, and otherwise aborts. For example: update D=16 if D==15. Client1 would succeed while client2 would fail, at which point it can retry the whole read-increment-update cycle with the newly read value.
The litmus test to see whether a db system can handle this problem is to try to implement sequential ID generation run by multiple clients at the same time.
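To make the litmus test concrete, here is a minimal sketch in plain Python. It is not RethinkDB code: `VersionedCell`, `compare_and_set`, `next_id` and `run_clients` are hypothetical names, and an in-memory lock stands in for whatever the db does internally. It only shows the retry loop a client would run against a conditional update.

```python
import threading


class VersionedCell:
    """Toy in-memory stand-in for one stored value (hypothetical, not a RethinkDB API)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        """Conditional update: store `new` only if the value still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False  # someone else updated first; caller must retry
            self._value = new
            return True


def next_id(cell):
    """Repeat the read-increment-update cycle until the conditional update succeeds."""
    while True:
        current = cell.read()
        if cell.compare_and_set(current, current + 1):
            return current + 1


def run_clients(cell, clients=8, ids_per_client=200):
    """Run several concurrent 'clients' and collect every ID they were issued."""
    issued = []
    issued_lock = threading.Lock()

    def client():
        for _ in range(ids_per_client):
            new_id = next_id(cell)
            with issued_lock:
                issued.append(new_id)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return issued
```

If the conditional update works, the IDs handed out have no gaps and no duplicates, no matter how the clients interleave.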
For 8, if the query is parsed into a query execution plan, you can ship the plan to all equivalent replicas and ask them to estimate the execution cost based on their current load. After they reply, pick the lowest-cost one and send it the execute command. Even a simple approach of asking all replicas for their machine load and picking the least loaded one would give adaptive utilization of all the servers.
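A rough sketch of that selection step in Python, under made-up assumptions: `Replica`, `estimate_cost` and `pick_replica` are hypothetical names, and the cost model (base cost scaled by load) is only a placeholder for whatever estimate a real replica would report.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica descriptor; `load` is 0.0 (idle) to 1.0 (saturated)."""
    name: str
    load: float

    def estimate_cost(self, base_cost):
        # Placeholder cost model: a busier replica reports a higher cost
        # for the same plan. A real estimate would depend on the plan itself.
        return base_cost * (1.0 + self.load)


def pick_replica(replicas, base_cost):
    """Ask every replica for an estimate and return the cheapest one."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda replica: replica.estimate_cost(base_cost))
```

The simpler variant (just ask for machine load) is the same `min` call keyed on `replica.load`.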
For 9, a Bloom filter is a relatively simple technique that can dramatically reduce the amount of data shipped across peers to do a join. You basically filter out the vast majority of the non-matching data before shipping.
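Here is a small Python sketch of the idea, assuming two peers that each hold one side of the join. `BloomFilter` and `bloom_join` are hypothetical names, and the filter size and hash count are arbitrary illustrative choices, not tuned values.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_join(left_rows, right_rows, key):
    """Join two row lists on `key`, shipping only rows that pass the filter."""
    # Peer A summarizes its join keys in a filter and ships only the filter.
    bloom = BloomFilter()
    for row in left_rows:
        bloom.add(row[key])

    # Peer B ships back only rows whose key might be present on peer A.
    shipped = [row for row in right_rows if bloom.might_contain(row[key])]

    # Peer A does the exact join, which also discards any false positives.
    by_key = {}
    for row in left_rows:
        by_key.setdefault(row[key], []).append(row)
    joined = [(left, right) for right in shipped for left in by_key.get(right[key], [])]
    return joined, len(shipped)
```

The second return value is the number of rows that actually crossed the wire, which is the number you'd compare against `len(right_rows)` to see the savings.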
I think the atomicity model here works like a transaction on the whole document, where all the changes to the attributes of a document are updated all at once.
The scenario I described has to do with read consistency, where the value read by a client should not change between the time of the read and the time of the update. The usual way of handling it is to take a write lock for the duration to prevent updates from others, but that degrades concurrency. The other way is to use optimistic locking (or conditional update), which lets the client detect a change made in the meantime and retry with the new value.
My point was that you don't have to do that with rethink because the entire query gets executed on the server. You don't have to take the value down to the client, make the change, and then send it back. The entire update gets evaluated on the server and the server handles atomicity in various ways (depending on the query).
That approach would only work if all the logic to compute the update can be expressed in the update query. It breaks down if the read-eval-update cycle involves the client, and many scenarios do involve the client.
E.g. the client reads a value, displays to the user, gets input from the user which is based on the old value, and stores the updated value. If another user doing the same thing has already changed it, the client would like to know that and let the user retry, with the new current value.
I think you, ww520, have a very good point here, and I'm also interested in what RethinkDB can offer for this very usage scenario.
From what I read in the ReQL command reference, it should be possible to do something like:
The basic idea is that `name` should be updated to "awesome name" and `_rev` should be incremented by 1, but only if `_rev` is 5; otherwise an "invalid revision" error should be thrown.
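To pin down the intended semantics, here is a minimal sketch in plain Python rather than ReQL. `DocumentStore`, `update_if_rev` and `InvalidRevision` are hypothetical names, and the lock is only a stand-in for per-document atomicity on the server; this is not how RethinkDB implements it.

```python
import threading


class InvalidRevision(Exception):
    """Raised when the caller's revision no longer matches the stored one."""


class DocumentStore:
    """Toy in-memory document store with a `_rev` field (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc_id, fields, rev=1):
        with self._lock:
            self._docs[doc_id] = dict(fields, _rev=rev)

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id])

    def update_if_rev(self, doc_id, expected_rev, changes):
        """Apply `changes` and bump `_rev`, but only if `_rev` equals `expected_rev`."""
        with self._lock:
            doc = self._docs[doc_id]
            if doc["_rev"] != expected_rev:
                raise InvalidRevision(
                    f"invalid revision: expected {expected_rev}, found {doc['_rev']}"
                )
            doc.update(changes)
            doc["_rev"] = expected_rev + 1
            return dict(doc)
```

A client that read the document at `_rev` 5, showed it to the user, and then writes back would call `update_if_rev(doc_id, 5, {"name": "awesome name"})`. If another user got there first, the call raises and the client can re-read and let the user retry.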
2. Yes. There is no special command; you just combine update and branch (http://www.rethinkdb.com/api/#py:control_structures-branch). Here's an example in Python:
This will set attribute bar to 1 if baz is 0, or to 2 otherwise. Everything is atomic on that document.
3. Currently the server doesn't support sequential (or even loosely sequential) id autogeneration. You'd have to do that on the clients, by using a timestamp, for example.
4. I don't know yet how to do this really efficiently. It's relatively easy to do on a single shard, but cross-shard boundaries make this really hard.
5. Any client can connect to any server. The server will then parse and route the query. There is no central server, everything is peer-to-peer. The client library doesn't know about multiple servers now, so responsibility is on the user to hit a random server. Alternatively you can run "rethinkdb proxy" on localhost and connect the client to that. The proxy will then route queries to proper nodes in the cluster.
6. In the web UI, if you click on the table and reshard, everything will be rebalanced. You don't even have to add or remove shards, it'll just rebalance data for the number of shards you have. The UI has a bar graph with shard distribution, so you can see how balanced things are.
7. Currently there is no authentication support - we expect users to use proper firewall/ssh tunneling precautions.
8. Yes, that's how queries get routed. Currently this isn't very smart, but it will get much better over time. If something breaks for you performance-wise, just reach out and we'll fix it.
9. No, not yet. If you run eq_join on a small subset of the data (99% of OLTP workloads) it will be very fast. Other joins work ok, but there's A LOT of room for optimization.
Phew!