
In certain domains, the trend has been to give up constant factors in order to increase programmer productivity (e.g., Python pays a 10x slowdown but is batteries-included).

So in that case I would use this data structure even if it weren't faster. I can't count the number of times I have had to mess with inserting negative priorities into a min heap to create a max heap! We should just have one data structure that does both.
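
For the record, the dance looks like this in plain Python (a minimal sketch using the stdlib heapq, which only does min heaps):

    import heapq

    nums = [5, 1, 9, 3]

    # Min heap: works out of the box.
    min_heap = list(nums)
    heapq.heapify(min_heap)
    smallest = heapq.heappop(min_heap)  # 1

    # Max heap: negate on the way in, negate again on the way out.
    max_heap = [-x for x in nums]
    heapq.heapify(max_heap)
    largest = -heapq.heappop(max_heap)  # 9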

(though taking this idea to its logical extreme means we should just use an order statistic tree for everything, since it not only gives you log(n) min/max but also log(n) find-kth and rank-of-x)



So, kind of a non sequitur, but I occasionally think about Qt's QList<T>. When introduced, it was unusual in reserving space at the beginning as well as the end, so you had constant-time inserts/removals at the beginning. That, plus storing pointers for any T larger than a pointer, also reduced the constant factor on insertions/deletions from the middle. Since it was always stored as a list of pointer-sized items, some code could be shared between the implementations for different types, so your program's binary could be a bit smaller. I think they said at the initial release that they had benchmarked a bunch of options in some real code and this design won.

That was waaay at odds with the mindset of the STL, which seemed to me (and I think to a lot of folks) like the state of the art in containers at the time: with the STL, if you prepend/insert much you need to use a deque or linked list, you decide pointer vs. value each time you declare a type, and if you have to ask about the constant factor on an insert in the middle of a list, you've probably already lost.

(Qt has/had some other fun pragmatic things, like copy-on-write types in a lot of places. I didn't really do nearly enough with it to discover the pitfalls, but was fun to read.)

So I think about that, about how there are likely some trees out there that could just as well be sorted arrays, about adaptive structures that change their behavior based on dynamically observing how they're used (one recently discussed here: http://blog.erlang.org/the-new-scalable-ets-ordered_set/), even about tricks in JS string implementations and VMs. I don't have answers, but I sometimes wonder if we should be thinking more in terms of tools that may not be perfect on all axes but soften or eliminate performance cliffs we take for granted. Not that the zero-overhead, fine-tuned types are going away, just that they aren't the only imaginable approach.

I wonder if we'll see progress in that sort of imperfect-but-resilient tool over time. I think the long-run trend is towards some form of programming being accessible to more people more easily (like how, as you note, folks do real work in Python now). That certainly fits with trying to make your tools resilient to "misuse"--or, better, properly supporting more patterns so they're not even misuse. No conclusions, just hand-waving, but I do wonder if anyone working on software construction tools will eventually more usefully wave their hands in that direction :P


You would probably enjoy reading about an alternative implementation of the STL's std::deque, called a tiered vector[1][2].

It supports O(1) push_front/pop_front/push_back/pop_back/operator[]/at (as you would expect from a deque) but also O(sqrt(N)) insert and remove from the middle!
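
Here's a toy sketch of the idea in Python, just to make it concrete (names are mine; the papers use circular arrays per tier for O(1) indexing and grow the tier size with sqrt(N), both of which are simplified away here):

    from collections import deque

    class TieredVector:
        """Toy tiered vector: a list of bounded blocks ("tiers").
        A middle insert shifts elements within one tier, then spills
        one element per tier rightwards, so it touches each tier at
        most once instead of shifting O(N) elements."""

        def __init__(self, tier_size=32):
            self.tier_size = tier_size
            self.tiers = [deque()]

        def insert(self, i, value):
            k = 0
            # Find the tier containing position i.
            while k < len(self.tiers) - 1 and i > len(self.tiers[k]):
                i -= len(self.tiers[k])
                k += 1
            self.tiers[k].insert(i, value)  # shift within one tier only
            # Cascade the overflow: one pop/appendleft per full tier.
            while len(self.tiers[k]) > self.tier_size:
                if k + 1 == len(self.tiers):
                    self.tiers.append(deque())
                self.tiers[k + 1].appendleft(self.tiers[k].pop())
                k += 1

        def __getitem__(self, i):
            for tier in self.tiers:
                if i < len(tier):
                    return tier[i]  # circular arrays would make this O(1)
                i -= len(tier)
            raise IndexError(i)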

The papers are from 1999 and 2001, but I only learned about them from a recent HN post[3] where someone rediscovered it (20 years too late). I still wonder why this design lost out to the std::deque we have in the STL today.

[1] http://www.cphstl.dk/Report/Deque-first-attempt/cphstl-repor...

[2] https://www.ics.uci.edu/~goodrich/pubs/wads99.pdf

[3] https://news.ycombinator.com/item?id=20872696


This made me search around for unsorted B-tree-like collections. Turns out Clojure[1] and the im immutable-collections crate in Rust[2] use a structure called an RRB-tree, and searching for 'blist' (because why not) turned up a Python module[3]. So, yes, a thing!

The tricky bit is working well when used like a plain vector, too. Clojure's RRB-tree special-cases repeated small appends at the end to help with this, and the Rust page mentions a clever chunking technique for append/prepend. And I imagine largeish leaf pages might help with the average prepend/append/iteration overhead, at some cost when inserting in the middle?

Anyway, just neat that it's been taken further than I realized!
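
(For anyone curious what it looks like in use: blist[3] keeps the plain list API, so something like this works, assuming the package is installed:)

    from blist import blist  # pip install blist; a B+tree-backed list

    xs = blist(range(1_000_000))
    xs.insert(500_000, "middle")  # advertised O(log n), vs. list's O(n) shift
    first, last = xs[0], xs[-1]   # indexing stays cheap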

[1] https://infoscience.epfl.ch/record/213452/files/rrbvector.pd...

[2] https://docs.rs/im/15.0.0/im/struct.Vector.html

[3] https://pypi.org/project/blist/


If you haven't already seen it, you'd probably really enjoy Chandler Carruth's "Efficiency with Algorithms, Performance with Data Structures" talk from CppCon 2014: https://youtu.be/fHNmRkzxHWs


Along the lines of adaptive structures that change depending on how they're used, I thought this article about how JavaScript arrays work in V8 was interesting: https://ryanpeden.com/how-do-javascript-arrays-work-under-th...

V8's Array does different things depending on whether the array is sparse or packed and what data types it contains.


Yeah--JS seems to allow enough weird stuff with arrays that fast paths for when things aren't weird seem inevitable. I admit when I clicked I still found V8 had more implementations than I expected, haha.

What they do with strings is kind of neat, because if you do a lot of concats in a row (potentially an O(n^2) series of operations done the naïve way), they defer actually concatenating all the data until you try to get a character out or such (the string is just a "ConsString" pointing to the source strings): https://gist.github.com/mraleph/3397008
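
A toy Python version of that trick, just for flavor (the real ConsString lives inside the VM and handles a lot more cases):

    class ConsString:
        """Lazy concatenation node: building one is O(1); the copying
        is deferred until someone needs the flat string, so n appends
        cost one O(total) join instead of O(n^2) copying."""

        def __init__(self, left, right):
            self.left, self.right = left, right  # each a str or ConsString
            self.flat = None

        def __str__(self):
            if self.flat is None:
                parts, stack = [], [self]
                while stack:  # iterative walk; concat chains get deep
                    node = stack.pop()
                    if isinstance(node, ConsString):
                        stack.append(node.right)  # right popped after left
                        stack.append(node.left)
                    else:
                        parts.append(node)
                self.flat = "".join(parts)
            return self.flat

    s = "a"
    for _ in range(100_000):
        s = ConsString(s, "b")  # O(1) per concat
    text = str(s)               # one O(n) flatten at the end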

It's also impressive how much dynamic instrumentation they can do. Beyond finding hot code and guessing types, they're even e.g. using it to guess which garbage will be long-lived (for "pretenuring" to avoid needing to copy it).

There's a limit to how much clever stuff you can do implicitly since you don't want to impose big costs for code that's doing everything right; something like QList isn't that adventurous but it's also clearly wrong when you want to compactly store a bunch of small structs in memory. And like the linked post on V8's thing notes, when you patch over a problem in one specific case the 'fix' can be fragile. Still, interesting area.


Surely you should be able to just use a different comparator instead (> instead of <).


Not in Python! But I guess using negatives is the same as using a different key (which is what Python uses instead of comparators).

The real problem is if you need min and max at the same time. Then you not only need both a min heap and a max heap, you also need to track what was already popped from either heap! This is because in Python you can't delete an element that isn't at the root. Tracking "tombstones" lets you pop from one heap, then know to ignore that element the next time it comes up in the other heap.

This is of course super complicated to maintain and I get it wrong all the time. Luckily it only shows up in algo interview problems.
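
For concreteness, the version I always fumble looks something like this (a sketch, names mine; the part I reliably get wrong is that each heap needs its own tombstone counter):

    import heapq
    from collections import Counter

    class MinMaxHeap:
        """Min and max views over one multiset: every value goes into
        both heaps; popping from one side records a tombstone so the
        stale twin is skipped when it surfaces on the other side."""

        def __init__(self):
            self.min_heap, self.max_heap = [], []
            self.dead_min = Counter()  # stale copies still in min_heap
            self.dead_max = Counter()  # stale copies still in max_heap

        def push(self, x):
            heapq.heappush(self.min_heap, x)
            heapq.heappush(self.max_heap, -x)

        def pop_min(self):
            # Discard values already removed via pop_max.
            while self.min_heap and self.dead_min[self.min_heap[0]]:
                self.dead_min[self.min_heap[0]] -= 1
                heapq.heappop(self.min_heap)
            x = heapq.heappop(self.min_heap)
            self.dead_max[x] += 1  # its twin in max_heap is now stale
            return x

        def pop_max(self):
            # Discard values already removed via pop_min.
            while self.max_heap and self.dead_max[-self.max_heap[0]]:
                self.dead_max[-self.max_heap[0]] -= 1
                heapq.heappop(self.max_heap)
            x = -heapq.heappop(self.max_heap)
            self.dead_min[x] += 1  # its twin in min_heap is now stale
            return x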


Sounds like Python is not batteries-included compared to other languages in this regard!


Do any languages give you min/max at the same time with their standard library data structures?

(setting the key function with "key=lambda x: -x" is really easy and works just like changing a comparator function in other languages, so I dunno why you guys are talking about it)


Yes, actually. Both C++ and Java have red-black trees. In C++ it's std::set; in Java it's TreeSet.

A BST supports log(n) min/max/insert/remove (among other things).
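
Python has nothing equivalent in the stdlib, but for what it's worth the third-party sortedcontainers package gives you the same operations (assuming it's installed):

    from sortedcontainers import SortedList  # pip install sortedcontainers

    s = SortedList([5, 1, 9, 3])
    s.add(4)               # roughly log-time insert, per its docs
    lo, hi = s[0], s[-1]   # min and max at the same time
    s.remove(9)            # remove a non-root element, unlike heapq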


Looks like 50x to me. https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Taking 27 milliseconds just to start the interpreter.



