> The only grudge I have with PostrgreSQL is that when confronted with this problem PostgreSQL fans respond with "We've got MVCC so it is supposed to be slow. Stop thinking it's supposed to be fast." not with "Hmm... Maybe we should introduce new facility similar to indexes where you could define what you want to count and the db will count by this conditions and provide results for you fast."
Plausible, but somewhat bothersome. It could be a special case of the materialized view problem, but it would take some convincing that to suggest that a special count-optimizing physical structure is worth the additional knob and maintenance burden it brings with it. Not impossible to convince, but it would require some tight argumentation with ample evidence.
In my quick assessment, the need for faster counts over lots of data is subsumed by materialized views, which unfortunately is a very tough feature to write, but has the advantage of generally considered being being worth it. A cut-down version of materialized views that just supports counting on various dimensions that adds more bulk to the planner as well as requiring execution maintenance and bug-fixing seems less-worth it. Somewhere in-between is a variant that supports more aggregate functions besides count(), particularly ones where one can define an inverse transition function (which in the case of count is "add one for every record").
There are many interesting features one could add -- and this is one of them -- but at the end of the day there is going to be an assessment by the long-time community members who have the privilege of fixing obscure bugs for years as to:
* Who is going to write it?
* How long will it take vs. other features?
* Is a feature is going to be maintained to a level of quality we find acceptable into the foreseeable future?
* Is the impact worthwhile?
* Could this in any way be done as an extension?
The level of the bar of quality has changed over time, too, pretty much inexorably moving up. Getting LISTEN/NOTIFY in now would be much harder than when it showed up in...Postgres95 (and, granted, it was pretty busted back then, apparently, along with...a lot of stuff). LISTEN/NOTIFY turned out to have a good impact/complexity ratio, so it gets good maintenance, but in a hypothetical world where it was not already committed with a good history of service and use I bet it would have required quite some convincing to add.
There is some old code in existence now that would not be accepted again today. A more sad example is dusty implementation of hash indexes that thankfully few people use or see reason to use. Hash indexes possess ample warnings in the manual to not use them. Yet, it doesn't seem reasonable to deprecate them for people that rely on them, and nobody seems to care enough to improve it, or even convincingly whine about improving it. The community is pretty sensitive to these code barnacles, so if one goes in wanting such a feature, one has to be well-prepared to argue both feasibility of implementation and size and duration of impact.
People ask "why others are using MySQL instead proper database like PostgreSQL?"
Some of the people try and get burned because features that MySQL did without anyone noticing now take so much time that one wonders if he triggered some strange bug that caused db to do full table scan where one is unnecessary. Then he goes to the internet and look for clues and he sees response "oh, it's how it supposed to work, nobody is working on because it does not seem like a problem" over and over.
I think that feature might have an impact.
Humble question from person who never tried to build a database engine:
Do you know why db can't just keep track of number of entries in a full or partial primary key indexes and use them to give count fast ?
> Some of the people try and get burned because features that MySQL did without anyone noticing now take so much time that one wonders if he triggered some strange bug that caused db to do full table scan where one is unnecessary. Then he goes to the internet and look for clues and he sees response "oh, it's how it supposed to work, nobody is working on because it does not seem like a problem" over and over.
Just about every non-trivial database implementation will have a pathology that others do not. In this case, InnoDB is was faster because it supported index-only scans, a feature that was very useful, but hard to implement in PG because of some details of its MVCC implementation -- now there is an implementation in 9.2 that gives it a shot at being in the same class of performance even on wide-row tables. However, both are still much slower than MyISAM: if you search for "why is count slow on innodb?" you'll get a lot of hits -- it suffers much the same defect vs. MyISAM (which has poor SMP concurrency, allowing it to count things as a luxury) that Postgres does.
Perhaps "not a problem" is a lazy answer for "it's hard to get exactly right, and nobody in the same class of implementation is bothering, and this has been discussed to death and is more work than you realize, so don't expect anyone to implement it soon unless you intend to argue it (unless you are also a long-term maintainer) and do it." As you can see...that was many more words, including giving background on two MySQL storage engines.
> Do you know why db can't just keep track of number of entries in a full or partial primary key indexes and use them to give count fast ?
It could. But you'd have to spec out a brand new top-level set of utility commands (like CREATE INDEX) to make this physical structure, it now is yet another knob that can drastically change performance characteristics, the planner gets to scan the projections and qualifiers for yet another little optimization, someone else gets to maintain it (unless you are maintainer), and then there's going to be the person that asks "why isn't is this snapshot-isolated version not even nearly as fast as caching the count in memcached?" (or, if one does it the other way, the reverse), and then finally when you get around to implementing a more general set of functionality you'll be stuck with this old form for basically eternity (depending on your release policy) much like hash indexes. Don't forget to add backup-and-restore support, manual pages, and EXPLAIN and/or other diagnostic output, if necessary.
On the plus side, it could end the endless howling about this problem.
Personally, I think we can get enough hooks in place that someone should be able to implement this secondary structure in a satisfying way without coupling it inextricably into the database, and that's a project I'd have enthusiasm for.
If someone else made this feature in another popular database that serves similar use cases and it got used a lot, then I think the understanding of the upsides would be such that such a feature is more likely to happen. Index-scans fall into that category, and a special structure not seen in other databases doesn't appear to at this time.
Plausible, but somewhat bothersome. It could be a special case of the materialized view problem, but it would take some convincing that to suggest that a special count-optimizing physical structure is worth the additional knob and maintenance burden it brings with it. Not impossible to convince, but it would require some tight argumentation with ample evidence.
In my quick assessment, the need for faster counts over lots of data is subsumed by materialized views, which unfortunately is a very tough feature to write, but has the advantage of generally considered being being worth it. A cut-down version of materialized views that just supports counting on various dimensions that adds more bulk to the planner as well as requiring execution maintenance and bug-fixing seems less-worth it. Somewhere in-between is a variant that supports more aggregate functions besides count(), particularly ones where one can define an inverse transition function (which in the case of count is "add one for every record").
There are many interesting features one could add -- and this is one of them -- but at the end of the day there is going to be an assessment by the long-time community members who have the privilege of fixing obscure bugs for years as to:
* Who is going to write it?
* How long will it take vs. other features?
* Is a feature is going to be maintained to a level of quality we find acceptable into the foreseeable future?
* Is the impact worthwhile?
* Could this in any way be done as an extension?
The level of the bar of quality has changed over time, too, pretty much inexorably moving up. Getting LISTEN/NOTIFY in now would be much harder than when it showed up in...Postgres95 (and, granted, it was pretty busted back then, apparently, along with...a lot of stuff). LISTEN/NOTIFY turned out to have a good impact/complexity ratio, so it gets good maintenance, but in a hypothetical world where it was not already committed with a good history of service and use I bet it would have required quite some convincing to add.
There is some old code in existence now that would not be accepted again today. A more sad example is dusty implementation of hash indexes that thankfully few people use or see reason to use. Hash indexes possess ample warnings in the manual to not use them. Yet, it doesn't seem reasonable to deprecate them for people that rely on them, and nobody seems to care enough to improve it, or even convincingly whine about improving it. The community is pretty sensitive to these code barnacles, so if one goes in wanting such a feature, one has to be well-prepared to argue both feasibility of implementation and size and duration of impact.