It looks almost like SQL; the main philosophical differences:
a) it embraces order (that is, every query result has an implicit "running index" field (called i, starting at 0).
This single handedly solves a lot of inconsistencies in practical SQL having to do with order, which is abhorred by the relational data model, thus not a first class concept, but is often required in practice, and thus inconsistently bolted on.
b) columns can nest, and do not have to have the same type - which means that aggregations like count, sum, max, min and distinct are not special in any way.
c) there's a simple underlying programming language, so if you have an intermediate select result that you need twice, you just give it a name by prepending a "name: " to the select.
temp: select from grades where age>10;
b1: select from temp where eyecolor=`blue;
b2: select from temp where eyecolor=`brown;
(compare to the mess that is correlated sub-queries, or alternatively, horrible "create temporary table x as ..."
I definitely see why (a) is good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language; I think most databases do support the ROW_NUMBER() function now though. I think (c) has been introduced as Common Table Expressions in SQL.
I'm not sure I fully understand (b) - do you have a link so I can learn more?
As you can probably guess, I'm very interested in this stuff!
> good for time-series data, but I (personally) think that it shouldn't be a first-class part of a relational query language;
First of all, there's some tautology here - a "relational database" (and similarly, relational calculus, relational algebra, etc), BY DEFINITION deal with sets of tuples ("relations") which are (again, by definition) unordered. So let's just ignore the word "relational" in this discussion.
> I think most databases do support the ROW_NUMBER() function now though.
Many do. But do compare (sql 2005):
SELECT * FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS ROW_NUMBER,
COLUMNS
FROM tablename
) foo
WHERE ROW_NUMBER <= 10
To (kdb+ / q):
select from tablename orderby asc sort_key where i<10;
And usefulness of order goes why beyond time-series data (and sorting): let's say you have a tiered pricing scheme for widgets you sell:
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
b) not duplicate data, but have crazy subselects (among all with count > from_count, select the one with maximum to_count) or stuff like that.
When you actually have order, you have operator that embrace that order - e.g. kdb+/q's "bin" which finds the "bin" (as in "bucket", not as in binray) something fits in:
select unit_price[from_count bin order_count] from table
There are many other use cases involving running sums (e.g, you have a table of weights and priority; select the list of highest priority items whose weight sums to 100lbs or less).
Order is really really missing in SQL, but it's one of those things people are not aware of because they've never used anything that does support it properly.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables). Many one-to-many relationships in sql that need additional tables can just be done within the same table in kdb+/q
regarding Common Table Expressions - I wasn't aware of them, they do help a lot. The syntax is horrible, but I guess they do work...
edit: more info on kdb+/q can be found in http://kx.com/q/d/kdb+1.htm - you have to get to section 8 before they start discussing the query language, but it's short and to the point. There's a lot more in http://kx.com/q/d/ if you are interested.
a) duplicate the range data (have a "from-count", "to-count" fields for each record, risking that you might have holes or overlaps)
That's never how I have done it. Since this is a tiered pricing model, you just:
select * from widget_price where min_size < ? ORDER BY min_size desc limit 1;
We actually do something almost identical for sales tax rate changes in LedgerSMB. You look to the date of the transaction and take the most recent rate. No range types, etc.
Where range types are handy (and why I am looking forward to them in 9.2) is where you have to deal with things like a financial transaction that should be amortized over a period of time. This would allow you to adjust the transaction incrementally as a reporting, rather than an accounting, function. But I wouldn't use them for cases where you don't want overlaps or gaps. The best thing there is to just put in floor values and select the next highest floor.
There you go. Again order by and limit/offset do pretty much what you want without doing crazy inline views like you are doing in your example. BTW, I see inline views as an antipattern in SQL best avoided if you can.
nested columns just means what it sounds like: that you can put anything in a cell (including lists, tables, lists of tables, lists of lists of lists of tables).
PostgreSQL supports this btw although back in 7.3 or so if I remember, I found that tables with tuples in columns were write-only but that was fixed pretty quickly (first with a "don't do that" check and then with a real fix).
The point of CTE's is to give you a stable intermediate result set.
.... and you've just embraced (in a limited inconsistent way). using "ORDER" is outside of the relational model, where order plays no part.
ORDER was there from the beginning, but CTEs, nested columns, window functions, connect-by recursive selects and similar stuff is being added because SQL and the relational model are actually quite limited when it comes to real world problems.
Of course, SQL will keep getting extended to solve real world problems; however, that does not mean SQL is "the best solution out there" (or "the worst solution except all others that were tried").
Regarding ordering. Relations are defined to not be in a meaningful order. This doesn't mean you couldn't define some derivative that is ordered or that the relations might not be ordered in some way that is ignored by the relational math.
Suppose we have a relation R. We may order this relation physically in order to help the computer retrieve data faster, clustering on an index for example. However, clustering on an index does not mean that we are guaranteed to get the same order back when we do a select * from.... (we probably will but we aren't guaranteed to). And if we add a join or other relational transformation, the order will probably not be the same. In other words, ordering is outside the scope of relational math per se.
However that doesn't mean you can't have pre-ordered relations. It just means the ordering is meaningless as far as the math goes. The ordering may however be of great practical importance as the computer goes about grabbing the relevant tuples from the relation.
Similarly I don't see a reason why ordering can't happen after the relational math is done either, in this case for humans.
What this tells me is that the relational model is not entirely complete in itself in that ordering is an orthogonal consideration largely ignored which means there are certain questions you cannot answer directly with relational math (such select from R a relation L such that it includes the tuples with the five highest values of R(2) lower than 25. I think that's mostly what you are getting at. But that's a matter of relational math being incomplete for real-world scenarios, not SQL (since SQL implementations do provide for ordering).
There are certain things I really like about SQL at least compared to other programming languages. I am not sure it is the best possible db query language. Personally I have always thought Quel was elegant. However compared to app languages, I would far rather spend my time debugging 200-1000 line SQL statements than 200-1000 line subroutines in Perl, Ruby, or C++.....
a is accomplished by windowing functions. These can also do running totals among other things which is really helpful in accounting environments.
b has been supported since at least 8.3. I think a column can actually be an array of complex types in 8.4 and higher (it can be a tuple in 8.0-8.3 at least though I reported a bug in this in 7.3 which resulted in a "don't do that" check).
c is handled using common table expressions.
Examples for b and c:
CREATE TABLE foo (id int, value text);
CREATE TABLE bar (id int, values foo[]);
INSERT INTO bar (values) values ({row(1, 'test')}); -- not sure if this is quite the syntax. Might take some playing around with.
My point was (and still is) that standard SQL is horrible, and there are way better solutions.
Real world usage makes SQL vendors extend SQL to make it less sucky; some of these extensions were later encoded into standard, and some are still proprietary.
Windowing functions are nice and all, but are a complex solution to a problem that would hardly exist if you actually embraced order as fundamental.
Ok, how about the most useful kdb+ extension (which I forgot about earlier): foreign key chasing: if table t has field a which has a foreign key reference to table s (which has field b which has a foreign key reference to table r (which has field c which has a foreign key ...)
in kdb+, you do:
select a.b.c from t
Does pgsql have something similar? Or do you have to spell out all the joins?
can foreign key chasing handle composite primary/foreign key joins?
You can build something to do this in PostgreSQL using stored procedures and a (a.b).c syntax but that's kind of advanced stuff. To do this you have to create a b function such that b(a) returns tuple of type b which has column or function c.
Example:
create table address (...)
create table employee (...., address_id);
create function address(employee) returns address as $$...$$;
select (employee.address).country from employee; will then return the country field from the address returned by address(employee).
So yeah, kinda, if you build your own.
edit: I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so. I don't know what the performance ramifications would be of throwing this into the column list.
> can foreign key chasing handle composite primary/foreign key joins?
Yes. The only requirement for foreign key chasing to work is that it uniquely identifies one record in the foreign table. Whether that key is atomic or composite is of no consequence.
(internally kdb+ stores a pointer to the foreign record when it verifies the existence of said record on insert, so it doesn't have to do a join query - it always knows exactly which record to bring in. So in practice, it is very efficient regardless of what kind of indexes you might have in place, the size or the composition of the foreign key field)
> So yeah, kinda, if you build your own. I would be willing to bet you could make an implicit join operator of this sort also but I haven't done so.
pgsql is a wonderful beast. I really like it. And I would be even happier if they adopted some kdb+/q syntax and semantics, though I don't think that's likely to happen.
SQL is fundamentally about sets, thinking of it in any other way provokes unhappiness. Your differences are set-subverting add-ons which could be accomplished by any higher level front end to SQL.
SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway. These are the sort of people who use PHP and MySQL.
Nope. "Relational Algebra" / "Relational Calculus" / "The Relational Model" is about sets.
SQL is about bags (orderless like sets, but each item might be repeated multiple times). It's also about order ("ORDER BY" clause) in a horrible inconsistent way.
> SQL is comparable to assembly language. Most people don't need it and wouldn't know how to use it properly anyway
No, SQL is not comparable to assembly in any meaningful way (you could replace "assembly language" with "danish" in your statement and would be equally true)
While assembly language is more verbose, it is more fundamental than everything else in the sense that eventually everything must be expressed in assembly language (machine code, actually, which is equivalent to a proper subset of assembly language) to be executed. Thus, going down to assembly language might be more up-front work, but it is guaranteed that you can match or improve on run-time results from any other language.
SQL is an inconsistent abstraction that makes some things simple, some things hard, and some things essentially impossible -- and many of the things it does do, it does in a way that's inherently inefficient. (And don't tell me about the possible smart query optimizer - it doesn't really exist any more than Intel's Itanium optimizer that makes code properly utilize the VLIW; or a Unicorn).
QUEL was pretty clean. Far more consistent than SQL.
It died, mostly because SQL was used by Oracle and IBM DB2 which were better marketed than Ingres. The DB world ended up standardizing on SQL because of this.
It is cleaner, although it doesn't feel _that_ different to SQL to my eye. I guess if we're now thinking about replacing SQL then I'd want to get a lot in return. (I do know that I'm sort of moving the goalposts here - thank you for contributing the QUEL info)
By the way, the fundamental concern I have with SQL is the issue of ambiguity regarding NULLs. We are taught that NULL means one of two things, but really it means one of three:
* Unknown value
* Not applicable value
* Value does not exist
This is a big issue, because you would expect operators to treat these cases differently. known || unknown is obviously unknown, but known || not_applicable should probably be known, and known || does_not_exist should be known. In sane RDBMS's there is a possibility of magic values which provides a sane way to handle not_applicable (for example an empty string as distinct from NULL and yes I am calling into question the sanity of Oracle). However, you still have the fact that the first and third cases are ambiguous although you hope not in any given query (the third case implies a missing value from an outer join), the ambiguity could in fact happen.
This is a fundamentally broken aspect of SQL. The problem with ambiguity is that if your data is ambiguous mathematically, then it cannot be reliably transformed using math.
What are you comparing it to?