
While I might not agree with the conclusion of the OP, the performance of a programming language should always be paired with its "average" programmer. A lot of people tend to straw-man a language they dislike and iron-man (?) a language they like. For an average programmer, I think it might be possible that Java produces more performant/maintainable code than C++.


This is very true. I've seen colleagues port Python to C++ and wonder why it's slower.

Of course Python is slow, but it calls into efficient C routines for a lot of things that just aren't available out of the box in C++. So what does the average programmer do? They implement their own, inefficient version in C++.

Now you've saved the overhead of copying your data from the Python world to the C world and back, but the main part of your computation is just not on the level of numpy.

So naturally C++ can be much faster than Python, because it doesn't have this overhead, but you have to do it right, which may be more effort.
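
To make the failure mode concrete, here is a minimal C++ sketch of both sides, assuming a CBLAS implementation such as OpenBLAS is installed (function names are hypothetical): numpy's matrix multiply ends up in a routine like cblas_dgemm, while a naive port becomes the triple loop.

    // The "average programmer" port: correct, but typically far slower than a
    // tuned BLAS for large n (no blocking, no vectorization tuning, poor cache use).
    #include <cblas.h>   // assumes a CBLAS such as OpenBLAS; link with -lcblas
    #include <vector>

    void matmul_naive(const std::vector<double>& a, const std::vector<double>& b,
                      std::vector<double>& c, int n) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
    }

    // "Doing it right": call the same kind of routine numpy delegates to.
    void matmul_blas(const std::vector<double>& a, const std::vector<double>& b,
                     std::vector<double>& c, int n) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, a.data(), n, b.data(), n, 0.0, c.data(), n);
    }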


When I moved from C/C++ to Java in the '90s, I was amazed how casually things like hash tables were used. A single function might create a couple of hash tables, do some data juggling and sorting, and solve a problem in like 20 lines of readable, performant code. In my experience, C coders almost invariably employ less performant algorithms and data structures because of the incidental complexity involved in memory allocation, pointer indirection, etc.
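
A rough illustration of the idiom being described, sketched in C++ (the thread's common language) rather than Java, with a hypothetical name (topWords); the point is just the shape of the code: a hash table, some juggling, a sort, ~20 readable lines.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Count word frequencies, then return words sorted by descending count.
    std::vector<std::pair<std::string, int>>
    topWords(const std::vector<std::string>& words) {
        std::unordered_map<std::string, int> counts;
        for (const auto& w : words)
            ++counts[w];  // the hash table does the heavy lifting
        std::vector<std::pair<std::string, int>> out(counts.begin(), counts.end());
        std::sort(out.begin(), out.end(),
                  [](const auto& a, const auto& b) { return a.second > b.second; });
        return out;
    }

    int main() {
        for (const auto& [word, n] : topWords({"a", "b", "a", "c", "a", "b"}))
            std::cout << word << ": " << n << '\n';
    }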


Absolutely. A few years ago I got back deep into C++ after 18 years of mostly Java, and I'm amazed - it's like traveling back in time. We (a large C++ platform) are looping over vectors instead of using hashtables/maps, copying strings all the time (we have 4 major types of strings plus some minor ones), and don't get me started on multithreading and locking/synchronization - such an easy and natural thing in Java. Even these days, people in the C++ world still seem hesitant to use locking, and as a result of that hesitance they push themselves into a myriad of multithreading issues when they risk and/or are forced to use threads. It's a sight to behold when a customer waits for that one core/thread to finish an operation on a 400-core, 16TB-RAM machine - if it were Java, all 400 cores would be burning, the performance would still feel kind of so-so, and the customer would be building up the nerve to upgrade to 800 cores and 32TB :)
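
For what it's worth, the Java-style "just take the lock" pattern has been straightforward in C++ since C++11, which supports the view that the hesitance is cultural rather than technical. A minimal sketch (names hypothetical, compile with -pthread), not production code:

    #include <mutex>
    #include <thread>
    #include <vector>

    class Counter {
        std::mutex m_;
        long value_ = 0;
    public:
        // RAII lock_guard: roughly the C++ equivalent of a Java synchronized method.
        void increment() {
            std::lock_guard<std::mutex> lock(m_);
            ++value_;
        }
        long get() {
            std::lock_guard<std::mutex> lock(m_);
            return value_;
        }
    };

    int main() {
        Counter c;
        std::vector<std::thread> threads;
        for (int i = 0; i < 8; ++i)  // burn all the cores, Java-style
            threads.emplace_back([&c] {
                for (int j = 0; j < 100000; ++j) c.increment();
            });
        for (auto& t : threads) t.join();
        return c.get() == 800000 ? 0 : 1;
    }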

> pointer indirection

30+ years ago, as a student at university, I loved how pointers let you code algorithms efficiently and expressively compared to the non-pointer languages of the time. In industry today the situation is the complete opposite, as correctly noted by the parent.


In many cases n is small enough that looping over a vector is faster than a hash lookup. The vector loop is very cache friendly, while the hash lookup typically involves a cache miss or two.


It would be, for a vector where the values are stored inline in the vector entries. Instead it's usually pointers to strings or objects, so you pull every string from memory through a pointer (i.e. cache-unfriendly), compare, etc... And n is usually small until a customer actually generates a couple orders of magnitude more of those objects than you ever had in test/dev :)
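
A minimal sketch of the two lookups this sub-thread is comparing; lookupVec and lookupMap are hypothetical names, and real timings depend on n, key size, and allocation patterns, so treat this as an illustration rather than a benchmark:

    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Small-n friendly: entries stored contiguously and scanned linearly.
    // (Note the parent's caveat: std::string still owns heap data beyond the
    // small-string optimization, which is where cache-unfriendliness creeps back in.)
    int lookupVec(const std::vector<std::pair<std::string, int>>& v,
                  const std::string& key) {
        for (const auto& [k, val] : v)
            if (k == key) return val;
        return -1;
    }

    // Wins for large n: O(1) average, but typically a cache miss or two per probe.
    int lookupMap(const std::unordered_map<std::string, int>& m,
                  const std::string& key) {
        auto it = m.find(key);
        return it != m.end() ? it->second : -1;
    }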


I feel like often the opposite can be true as well. Obviously hashtables are better for large collections, but collections are often very small (sizes typically follow a power law; most strings will be smaller than the pointer to their first character; most collections will only have a few elements), and arrays can win here due to less indirection and better locality.

Compare this to many functional languages (or lisps), where the most convenient data structure to hand is the singly linked list. It was already bad for performance when those languages were invented, and on modern hardware it is relatively much worse than before.

Sometimes I think that the performance of C programs comes more from the fact that it is such a massive pain to do anything in C that you usually only do the simplest thing, and this tends to be fast and reliable. The problem is that if you can't do a simple thing, it will either take a lot of refactoring or you'll do a complicated thing and your program will randomly segfault.

This is also my theory as to why languages like C have better libraries than lisp: it is such a monumental pain to do anything in C that people go to the small extra effort to package up their achievement into a library, either for others or in case they need to solve the trivial problem again themselves. Improvements can then come later as needed, but get shared. Compare this to, for example, lisp, where the attitude is usually that libraries aren't flexible enough and generally not that hard to implement yourself (so long as you aren't too worried about data structures), and I think this is the reason there don't tend to be so many libraries, especially for trivial things.


Yeah I agree with this (apart from the lisp commentary, which I have no insight into).


This is where I always hope a JVM will learn to step in: use escape analysis to keep things off the heap (they can do this), swap out data structures for more efficient ones for small amounts of data, or tune data-structure initialization params (they can't do this).


For that reason, when I discourage people from using Python, I generally do it on grounds of correctness rather than performance per se. I can't be bothered to explain how compiler optimizations work, so I usually just resort to "it's magic" when it comes to performance.

Going from a statically typed language to using Python to do number crunching genuinely makes me want to vomit. I don't understand how people convince themselves that it's productive to write so many tests and (say) check types at runtime, etc.

I actually love Python's syntax but the language design seems like it was thrown together over a weekend.


Take a look at Nim if you haven't already -

It has a Pythonesque syntax, but strong static typing with extensive user control of the semantics you seem to care about; e.g. floats and ints do not convert automatically unless you explicitly "import lenientops"[0]. You can define 'operational transform' optimizations (such as: c = c + a*b converts to multiply_accumulate(c, a, b), which is a big performance difference for e.g. matrices) that will be applied by the compiler, so that your code shows what you mean (c = c + a*b) and yet the compiler gets to compile the efficient version (using the relevant BLAS routine).
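
For readers unfamiliar with the BLAS routine being alluded to: for matrices, c = c + a*b is exactly one GEMM call with beta = 1. Nim's term-rewriting feature lets the compiler spot the pattern and emit that call; below, the target is spelled out by hand in C++ (the thread's common language) as a sketch, assuming a CBLAS implementation such as OpenBLAS:

    // GEMM computes C <- alpha*A*B + beta*C, so beta = 1 gives the fused
    // multiply-accumulate c += a*b in a single call (row-major n x n matrices).
    #include <cblas.h>

    void multiply_accumulate(double* c, const double* a, const double* b, int n) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, /*alpha=*/1.0, a, n, b, n, /*beta=*/1.0, c, n);
    }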

It's young, but on the one hand it has the best FFI for C, C++, Objective-C or JS you'll find anywhere, and on the other it already has a good deal of native implementations, including e.g. a pure Nim BLAS that is within 5-10% of the best out there (including those with carefully hand-optimized ASM kernels).

[0] https://nim-lang.org/docs/lenientops.html


I’d second that. I’ve been doing a bit of algorithm/NN stuff in Nim this year, since it simplifies the “deployment” phase, and it’s been surprisingly handy despite the limited libraries, as there’s a C library for everything. It has a few rough edges, but they’re manageable.

It has the spirit of Python 2 but written by a compiler geek (in the good sense). If I were to write an HFT system, it’d be tempting to use. The new default GC is reference counting, but without atomic ref counts.

P.S. thanks for the lenientops tip


I am involved with the foundation that runs the D programming language so I've found my home.

Other than syntax obviously.

FFI: Can Nim use a C++ class and vtables? D does it all the time; nearly every language proclaims to have the best C++ interop, but only D seems to actually be able to do it. Templates, classes, structs, and more all work.

We also have Mir, which I haven't benchmarked for a while, but it was faster than OpenBLAS and Eigen a few years ago and was recently shown to be faster than numpy (the C bits) on the forum.


> FFI: Can Nim use a C++ class and vtables? D does it all the time; nearly every language proclaims to have the best C++ interop, but only D seems to actually be able to do it. Templates, classes, structs, and more all work.

Well, it depends on how you define "best". Beyond an ABI, FFI is obviously a function of the implementation rather than the language per se.

The main Nim implementation can use C++ as a backend language, and when it does, it can use any C++ construct very easily by way of the .emit and .importcpp directives. For sure, classes, structs, and exceptions all work, and IIRC templates do too (although you might need to instantiate a header yourself for each concrete type or something .... haven't done that myself). This also means it can use any C++17 or C++20 construct, including lambdas and friends. Does D's C++ interop support C++17? C++20? Can you guarantee it will support C++27? Nim's implementation already does, on every single platform you'll be able to use C++27 on (as long as C++27 can still compile modern C++ code; there have been backward-incompatible changes over C++'s history).

You can't just #include a C or C++ header and call it a day; you need a Nim-compatible definition for every symbol (variable, macro, function, class, ...). There are tools that help you and make it almost as easy as #include, such as nimterop[0] and nimline[1], and "c2nim", which is included with the Nim compiler, is enough to generate the Nim definitions from the .h definitions (though it can't do crazy metaprogramming; if D can do that, then D essentially includes a C++ compiler - which is a fine way to get perfect C++ compatibility, and is in fact what Nim does by compiling through your C++ compiler).

But Nim can also do the same for JS when treating JS as a backend.

And it can basically do the same for Python, with nimpy[2] and nimporter, generating a single executable that works with your installed Python DLL (2.7, 3.5, 3.6, 3.7) - which is something not even Python itself can do. There was a similar Lua bridge, but I think that one is no longer maintained.

> We also have Mir, which I haven't benchmarked for a while but was faster than OpenBLAS and Eigen

There's quite a bit of a scientific stack built natively in Nim. It is far from self-sufficient, but the ease with which you can use a C library makes up for it. I haven't used it, but Laser[3] is on par with or exceeds OpenBLAS speed-wise, and generalizes to e.g. int32 and int64 matrix multiplication; Arraymancer[4] does not have all of numpy's functionality, but it does have quite a few nice bits from scikit-learn, supports CUDA and OpenCL, and you can use numpy through nimpy if all else fails. Also notable is NimTorch[5]. Laser and Arraymancer are mostly developed by mratsim, who occasionally hangs out here on HN.

D is a fine language. I used it a little in the D1 days, and it was indeed a "better C++", but it did not deliver enough value to be worth it for me, so I stopped. I know D2 is much better, but I've already found my better C++ (and better Python at the same time!) in Nim, so I haven't looked at it seriously.

[0] https://github.com/nimterop/nimterop

[1] https://github.com/sinkingsugar/nimline

[2] https://github.com/yglukhov/nimpy

[3] https://github.com/numforge/laser

[4] https://github.com/mratsim/Arraymancer

[5] https://github.com/sinkingsugar/nimtorch


> Going from a statically typed language to using Python to do number crunching genuinely makes me want to vomit. I don't understand how people convince themselves that it's productive to write so many tests and (say) check types at runtime, etc.

Totally agree, although I think strong typing is also important, not just static typing. C/C++ has a weak type system because of the automatic conversions it allows.

The problem with this discussion, however, is that the people who push for dynamic languages usually do it because it's "so fast" to get something done in Python, and they won't test their software anyway.


Python is great for small programs that barely need any tests, because you can tell it's correct by reading the code or by running it until you get the right answer. When I have to maintain a large Python program, I agree: it's too easy to miss some error condition that doesn't work at all (anymore).


It made more sense before the era of IDEs and code completion. I used to greatly prefer Python to C++ or Java back when I actually had to type full names and look up interfaces like a pleb.

Now I can't stand untyped Python. Having to actually look at documentation to see what methods a class has - who has time for that?

Writing golang with TabNine is otherworldly. It's so regular - not just the syntax, but the variable naming conventions, and even the coarse structure - that it feels like loosely nudging the autocomplete in the direction of the goal.


An iron man is a triathlon; you are looking for "steelman"!

https://en.wiktionary.org/wiki/steelman


There is (was?) a Steelman triathlon too (actually, more than one of them, in different parts of the US).



