Honey, I shrunk {fmt}: bringing binary size to 14k and ditching the C++ runtime

magnio · on Sept 1, 2024

> All the formatting in {fmt} is locale-independent by default (which breaks with the C++’s tradition of having wrong defaults)

Chuckles

tialaramex · on Sept 1, 2024

It's really more of a committee thing - so we wouldn't necessarily expect fmt, a third party library, to have wrong defaults.

Astoundingly, when this was standardised (as std::format for C++ 20) the committee didn't add back this mistake (which is present in numerous other parts of the standard). Which does give small hope for the proposers who plead with the committee to not make things unnecessarily worse in order to make C++ "consistent".

ape4 · on Sept 1, 2024

You can pass in a locale as a parameter. (Of course this doesn't fix the default)

formerly_proven · on Sept 1, 2024

I'm filing a Defect Report about std::format disrespecting locale as we speak.

arunc · on Sept 1, 2024

How/where do you do that?

johannes1234321 · on Sept 1, 2024

See https://isocpp.org/std/submit-issue

tialaramex · on Sept 1, 2024

Of course, just because a defect is reported doesn't mean it'll get fixed, or that the fix will be of any use.

The most famous (technically a C defect) is probably DR#260: https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

vitaut · on Sept 1, 2024

You send an email to the Library Working Group chair.

h4ck_th3_pl4n3t · on Sept 1, 2024

It's kind of mindblowing to see how much code floating point formatting needs.

The linked dragonbox [1] project is also worth a read. Pretty optimized for the least used branches.

[1] https://github.com/jk-jeon/dragonbox

ziml77 · on Sept 1, 2024

I learned how much floating point formatting needs when I was doing work with Zig recently.

Usually the Zig compiler can generate binaries smaller than MSVC because it doesn't link in a bunch of useless junk from the C Runtime (on Windows, Zig has no dependency on the C runtime). But this time the binary seemed to be much larger than I've seen Zig generate before and it didn't make sense based on how little the tool was actually doing. Dropping it into Binary Ninja revealed that the majority of the code was there to support floating point formatting. So I changed the code to cast the floating point number to an integer before printing it out. That change resulted in a binary that was down at the size I had been expecting.

delta_p_delta_x · on Sept 2, 2024

> Usually the Zig compiler can generate binaries smaller than MSVC because it doesn't link in a bunch of useless junk from the C Runtime (on Windows, Zig has no dependency on the C runtime)

MSVC defaults to linking against the UCRT, just like how Clang and GCC on Linux default to linking against the system libc. This is to provide a reasonably useful C environment as a sane default.

If you don't want UCRT under MSVC, supply `/MT /NODEFAULTLIB /ENTRY:<function-name>` in the command-line invocation (or in the Visual Studio MSBuild options).

It is perfectly possible to build a Win32-only binary that is fully self-contained and only around 1 KiB.

ziml77 · on Sept 2, 2024

Yep I've done that before, it's how I know linking to the C runtime is what bloats the binary. For most real projects I wouldn't disable linking it, but it's fun to see the output get so small.

pjmlp · on Sept 2, 2024

Also UCRT is kind of recent, Windows 10 timeframe.

account42 · on Sept 2, 2024

It's also irrelevant since you could write a native Win32 binary without and crt dependency before it.

account42 · on Sept 2, 2024

> It is perfectly possible to build a Win32-only binary that is fully self-contained and only around 1 KiB.

Good luck actually distributing that binary to users without all the various kinds of scareware in the way yelling DANGER.

delta_p_delta_x · on Sept 2, 2024

That's a matter of code signing and SmartScreen, both of which are completely orthogonal to how the binary is built.

account42 · on Sept 3, 2024

You may or may not be able to pay the protection money to get around the warnings but it is not at all orthogonal to how the binary is build - the scareware industry (both Microsoft as well as third parties) absolutely despises executables that deviate from the default MSVC output.

jk-jeon · on Sept 1, 2024

https://github.com/jk-jeon/dragonbox/discussions/57#discussi...

We have been doing some experiment on optimizing for size, and currently it can be reduced to ~3k on 8-bit AVR. It only contains impl/table for single-precision binary32, and double-precision requires quite more, but at the same time much of the bloat is due to how limited AVR is. On platforms like x64 it should be much smaller.

You can certainly say 3k is still huge though.

mananaysiempre · on Sept 1, 2024

> It's kind of mindblowing to see how much code floating point formatting needs.

If you want it to be fast. The baseline implementation isn’t terrible[1,2] even if it is still ultimately an implementation of arbitrary-precision arithmetic.

[1] https://research.swtch.com/ftoa

[2] https://go.dev/src/strconv/ftoa.go

vitaut · on Sept 2, 2024

If I interpret the numbers correctly it is of the order of ~1000 times slower than modern algorithms such as Dragonbox.

mananaysiempre · on Sept 2, 2024

Something like that.

The Dragonbox author reports[1] about 25 ns/conversion, Cox reports 1e5 conversions/s, so that’s a factor of 400. We can probably knock off half an order of magnitude for CPU differences if we’re generous (midrange performance-oriented Kaby Lake laptop CPU from 2017 vs Cox’s unspecified laptop CPU ca. 2010), but that’s still a factor of 100. Still a performance chasm.

You can likely get some of the performance back by picking the low-hanging fruit, e.g. switching from dumb one-byte bigint limbs in [0,10) to somewhat less dumb 32-bit limbs in [0,1e9). But generally, yes, this looks like a teaching- and microcontroller-class algorithm more than anything I’d want to use on a modern machine.

[1] https://github.com/jk-jeon/dragonbox/blob/master/README.md#p...

vitaut · on Sept 1, 2024

{fmt} has an optional implementation of the old Dragon4 algorithm that is smaller in terms of code size but not as fast.

franga2000 · on Sept 2, 2024

I'm guessing the majority of use-cases limit the number of decimal points that are printed, I wonder if it would be more efficient to multiply by the number of decimals, convert to int, itoa() and insert the decimal point where it belongs...

jk-jeon · on Sept 2, 2024

Not sure what you mean by decimal points. Did you mean the number of decimal digits to be printed in total, or the number of digits after the decimal dot, or something else?

In any case, what Dragonbox and other modern floating-point formatting algorithms do is already roughly what you describe: they compute the integer consisting of digits to be printed, and then print those digits, except:

- Dragonbox and some of other algorithms have totally different requirements than `printf`. The user does not request the precision, rather the algorithm determines the number of digits to print. So `1.2` is printed as `1.2` and `1.199999999999` is printed as `1.199999999999`. You can read about the exact requirements in the Readme page of Dragonbox.

- The core of modern floating-point formatting algorithms is on how to compute the needed multiplication by a power of 10 without needing to do it by the plain bignum arithmetic (which is incredibly slow). Note that a `float` (assuming it's IEEE-754 binary32) instance can be as large as 2^100 or as small as 2^-100. It's nontrivial to deal with these numbers without incorporating bignum arithmetic, and even if you just give up avoiding it, bignum arithmetic itself is quite nontrivial in terms of the code size it requires.

pzmarzly · on Sept 1, 2024

> However, since it may be used elsewhere, a better solution is to replace the default allocator with one that uses malloc and free instead of new and delete.

C++ noob here, but is libc++'s default allocator (I mean, the default implementation of new and delete) actually doing something different than calling libc's malloc and free under the hood? If so, why?

1000100_1000101 · on Sept 1, 2024

Not the strongest on C++ myself, but the new[] will attempt to run constructors on each element after calling the new operator to get the RAM. The delete[] will attempt to run destructors for each element before calling operator delete[] to free the RAM.

In order for delete[] to work, C++ must track the allocation size somewhere. This could be co-located with the allocation (at ptr - sizeof(size_t) for example), or it could be in some other structure. Using another structure lowers the odds of it getting trampled if/when something writes to memory beyond an object, but comes with a lookup cost, and code to handle this new structure.

I'm sure proper C++ libraries are doing even more, but you already get the idea, new and delete are not the same as malloc and free.

OskarS · on Sept 1, 2024

> In order for delete[] to work, C++ must track the allocation size somewhere.

That is super-interesting, I had never considered this, but you're absolutely right. I am now incredibly curious how the standard library implementations do this. I've heard normal malloc() sometimes colocates data in similar ways, I wonder if C++ then "doubles up" on that metadata. Or maybe the standard library has it's own entirely custom allocator that doesn't use malloc() at all? I can't imagine that's true, because you'd want to be able to swap system allocators with e.g. LD_PRELOAD (especially for Valgrind and stuff). They could also just be tracking it "to the side" in some hash table or something, but that seems bad for performance.

tom_ · on Sept 1, 2024

new[] and delete[] both know the type of the object. Therefore both know whether a destructor needs to be called.

When a destructor doesn't - e.g., new int[] - operator new[] is called upon to allocate N*sizeof(T) bytes. The code stores off no metadata. The result of operator new[] is the array address.

When a destructor does - e.g., new std::string[] - operator new[] is called upon to allocate sizeof(size_t)+N*sizeof(T) bytes. The code stores off the item count in the size_t, adds sizeof(size_t) to the value returned by operator new[], uses that as the address for the array, and calls T() on each item. And delete[] performs the opposite: fishes out the size_t, calls ~T() on each item, subtracts sizeof(size_t) from the array address, and passes that to operator delete[] to free the buffer.

(There are also some additional things to cater for: null checks, alignment, and so on. Just details.)

Note that operator new[] is not given any information about whether a destructor needs to run, or whether there is any metadata being stored off. It just gets called with a byte count. Exercise caution when using placement operator new[], because a preallocated buffer of N*sizeof(T) may not be large enough.

jeffbee · on Sept 1, 2024

jemalloc and tcmalloc use size classes, so if you allocate 23 bytes the allocator reserves 32 bytes of space on your behalf. Both of them can find the size class of a pointer with simple manipulation of the pointer itself, not with some global hash table. E.g. in tcmalloc the pointer belongs to a "page" and every pointer on that page has the same size.

Someone · on Sept 1, 2024

That doesn’t help for C++ if you allocated an array of objects with destructors. It has to know that you allocated 23 objects, so that it can call 23 destructors, not 32 ones, 9 of which on uninitialized memory.

jeffbee · on Sept 1, 2024

I believe the question was more around how the program knows how much memory to deallocate. The compiler generates the destructor calls the same way the compiler generates everything else in the program.

progmetaldev · on Sept 1, 2024

Isn't it also possible for other logic to run in a destructor, such as freeing pointers to external resources? Doesn't this cause (at the very least) the possibility for more advanced logic to be run beyond freeing the object's own memory?

c0balt · on Sept 2, 2024

Yes, it usually is. See, e.g., smart pointers.

bangaladore · on Sept 1, 2024

realloc is the same, as the old memory needs to be copied to the new memory.

pjmlp · on Sept 1, 2024

ISO C++ doesn't require new and delete default implementations to call down into malloc()/free().

Many implementations do it, only because it is already there and thus it is easy just to reach for them.

murderfs · on Sept 1, 2024

No, modulo the aligned allocation overloads, but applications are allowed to override the default standard library operator new with their own, even on platforms that don't have an equivalent to ELF symbol interposition.

masklinn · on Sept 1, 2024

That doesn't really explain where the dependency on the C++ runtime come from tho, as far as I know the dependency chain is std::allocator -> operator new -> malloc, but from the post the replacement only strips out the `operator new`.

Notably I thought the issue would be the throwing of `std::bad_alloc`, but the new version still implements std::allocator, and throws bad_alloc.

And so I assume the issue is that the global `operator new` is concrete (it just takes the size of the allocation), thus you need to link to the C++ runtime just to get that function? In which case you might be able to get the same gains by redefining the global `operator new` and `operator delete`, without touching the allocator.

Alternatively, you might be able to statically link the C++ runtime and have DCE take care of the rest.

gobblegobble2 · on Sept 1, 2024

> Notably I thought the issue would be the throwing of `std::bad_alloc`, but the new version still implements std::allocator, and throws bad_alloc.

The new version uses `FMT_THROW` macro instead of a bare throw. The article says "One obvious problem is exceptions and those can be disabled via FMT_THROW, e.g. by defining it to abort". If you check the `g++` invocation, that's exactly what the author does.

masklinn · on Sept 2, 2024

The author also compiles with `-fno-exceptions` which should already have the same behaviour.

kllrnohj · on Sept 1, 2024

Yes they could have just defined their own global operator new/delete to have a micro-runtime. Same as you'd do if you were doing a kernel in C++. Super easy, barely an inconvenience

vitaut · on Sept 2, 2024

Changing global new/delete is a non-starter in a reusable library. Allocator is a much more localized change and roughly the same amount of work.

janos95 · on Sept 1, 2024

The main point of replacing it with malloc is that new will throw std::bal_alloc so using it requires linking against the c++ runtime.

pjmlp · on Sept 2, 2024

Only if not using nothrow placement new syntax.

londons_explore · on Sept 1, 2024

I kinda hoped a formatting library designed to be small and able to print strings, and ints ought to be ~50 bytes...

strings are ~4 instructions (test for null terminator, output character, branch back two).

Ints are ~20 instructions. Check if negative and if so output '-' and invert. Put 1000000000 into R1. divide input by R1, saving remainder. add ASCII '0' to result. Output character. Divide R1 by 10. put remainder into input. Loop unless R1=0.

Floats aren't used by many programs so shouldn't be compiled unless needed. Same with hex and pointers and leading zeros etc.

I can assure you that when writing code for microcontrollers with 2 kilobytes of code space, we don't include a 14 kilobyte string formatting library...

vient · on Sept 1, 2024

It is a featureful formatting library, not simply a library for slow printing of ints and strings without any modifiers. You can't create a library which is full of features, fast, and small simultaneously.

jstimpfle · on Sept 1, 2024

You'd hope the unused stuff gets stripped out but I don't know much about this topic so not going to argue.

vlovich123 · on Sept 1, 2024

Ffunction-sections and fdata-sections would need at a minimum to be used to strip dead code. But even with LTO it’s highly unlikely this could be trimmed unless all format strings are parsed at compile time because the compiler wouldn’t know that the code wouldn’t be asked to format a floating point number at some point. There could be other subtle things that hide it from the compiler as dead code.

The surest bet would be a compile time feature flag to disable floating point formatting support which it does have.

Still, that’s 8kib of string formatting library code without floating point and a bunch of other optimizations which is really heavy in a microcontroller context

CoastalCoder · on Sept 1, 2024

I think this is one scenario where C++ type-templated string formatters could shine.

Especially if you extended them to indicate assumptions about the values at compile time. E.g., possible ranges for integers, whether or not a floating point value can have certain special values, etc.

vlovich123 · on Sept 1, 2024

You’d be surprised. I’m pretty sure std::format is templated. That doesn’t mean that it’s still easy to convince the compiler to delete that code.

josephg · on Sept 1, 2024

> it’s highly unlikely this could be trimmed unless all format strings are parsed at compile time

They probably should be passed at compile time, like how zig does it. It seems so weird to me that in C & C++ something as simple as format strings are handled dynamically.

Clang even parses format strings anyway, to look for mismatched arguments. It just - I suppose - doesn’t do anything with that.

vlovich123 · on Sept 2, 2024

That’s passed at compile time via template arguments and/or constexpr/consteval. Even still, there can be all sorts of reasons a compiler isn’t able to omit something as deeply integrated as floating point formatting from a generic floating point library. Rust handles this more elegantly with cargo features so that you could explicitly guarantee you’ve disabled floating point altogether in a generically reusable and intentional way (and whatever other features might take up space).

It’s also important to note that the floating point code only contributed ~44kib out of 75kib but they stopped once the library got down to ~23kib and then removed the c++ runtime completely to shave off another ~10kib.

However, it’s also equally important to remember that these shavings are interesting and completely useless:

1. In a typical codebase this would contribute 0% of overall size and not be important at all

2. A codebase where this would be important and you care about it (ie embedded) is not served well by this library eating up at least 10kib even after significant optimization as that 10kib that is intractible is still too large for this space when you’re working with a max ~128-256kib binary size (or even less sometimes).

jcelerier · on Sept 2, 2024

fmt support full compile-time processing of strings with FMT_COMPILE though: https://fmt.dev/latest/api/#format-string-compilation

jstimpfle · on Sept 2, 2024

The usual case is that libc is linked dynamically so it's not a problem spending a few KB for the library.

And run time format strings are a concise encoding for calling this functionality. I would assume that compile time alternatives take more space.

vitaut · on Sept 1, 2024

It is indeed possible to remove unused code with techniques like format string compilation but that's a topic for another post.

aseipp · on Sept 1, 2024

The design of any library for a microcontroller and an "equivalent" for general end user application is going to be different in pretty much every major design point. I'm not sure how this is any more relevant to fmt than it is just general complaining out in the open.

The code for an algorithm like Dragonbox or Dragon4 alone is already blowing your size budget, so the "optional" stuff doesn't really matter. And that's 1 of like 20 features people want.

fsckboy · on Sept 1, 2024

>I can assure you that when writing code for microcontrollers with 2 kilobytes of code space, we don't include a 14 kilobyte string formatting library...

then the thing to do is publish the libraries you do use, right, then document what formatting features they support? then other people might discover more and clever ways to pack more features in than you thought of

otherwise, I don't get your point.

jeroenhd · on Sept 1, 2024

I don't think the requirements for your specific programming niche should influence the language like that. Your requirements are valid, but they should be served by a bottom of the barrel microcontroller compiler rather than the language spec.

MobiusHorizons · on Sept 1, 2024

It’s relevant because the author mentions microcontrollers as the reason for focusing on binary size.

Karliss · on Sept 1, 2024

There are many orders of magnitude difference between smallest and higher end microcontrollers. You can have an 8bit micro with <1k of ram and <8k of flash memory, and you can have something with >8MB of flash memory or more running a RTOS possibly even capable of running Linux with moderate effort. In the later case 14k of formatting library is probably fine.

jeroenhd · on Sept 1, 2024

All of the optimisation work in this article is done for Linux aarch64 ELF binaries.

Besides that, microcontrollers have megabytes of storage these days. To be restricted by a dozen or two kilobytes of code storage, you need to be _very_ storage restricted.

I have run into code size issues myself when trying to run Rust on an ESP32 with 2MiB of storage (thought I bought 16MB, but MB stood for megabits, oops). Through tweaking the default Rust options, I managed to save half a megabyte or more to make the code work again. The article also links to a project where the fmt library is still much bigger (over 300KiB rather than the current 57KiB).

There are microcontrollers where you need to watch out for your dependencies and compiler options, and then there are _tiny_ microcontrollers where every bit matters. For those specific constraints, it doesn't make a lot of sense to assume you can touch every language feature and load every standard library to just work. Much older language features (such as template classes) will also add hundreds of kilobytes of code to your program already, you have to work around that stuff if you're in an extremely constrained environment.

The important thing with language features that includes targets like these is that you can disable the entire feature and enable your own. Sharing design goals between x64 supercomputers and RISC-V chips with literal dozens of bytes of RAM makes for an unreasonably restricted language for anything but the minimal spec. Floats are just expensive on minimum cost chips.

MobiusHorizons · on Sept 2, 2024

Pretty much anything in the Arm M0+ will be in the 4kb / 8kb / 16kb flash range and anything in the M3 class will typically be in the 64kb / 128kb flash range. These are small microcontrollers, yes, but they the typical microcontroller I would reach for when doing something that is not algorithmically intensive. Microcontrollers such as this are more than capable of responding to user input, running an LCD screen, or interfacing with peripherals. With 128kb maybe I could burn 14kb (~11%) on a string formatting library, but that seems pretty excessive to me.

IshKebab · on Sept 1, 2024

It isn't designed to be small; it's designed to be a fully featured string formatting library with size as an important secondary goal.

If you want something that has to be microscopic at the cost of not supporting basic features there are definitely better options.

> I can assure you that when writing code for microcontrollers with 2 kilobytes of code space, we don't include a 14 kilobyte string formatting library...

No shit. If you only have 2kB (unlikely these days) don't use this. Fortunately the vast majority of modern microcontrollers have way more than that. E.g. esp32 starts at 1MB. Perfectly reasonable to use a 14kB formatting library there.

londons_explore · on Sept 1, 2024

When you're designing something that sells for a dollar to retailers, eg. a birthday card that sings, your boss won't let you spend more than about 5 cents on the microcontroller, and probably wants you to spend 1-2 cents if you can.

edflsafoiewq · on Sept 1, 2024

Perhaps a singing birthday card doesn't need to format strings.

nikbackm · on Sept 1, 2024

How else would you get nice looking logs for debugging it?

a1o · on Sept 1, 2024

Using log4c

swagonomixxx · on Sept 1, 2024

I kind of get where you're coming from but at what point do we admit that such use cases are the fringe and not the main?

tialaramex · on Sept 1, 2024

> When you're designing something that sells for a dollar to retailers

Then you shouldn't prioritize compatibility with 1980s Unix code, which is what C++ is for.

IshKebab · on Sept 1, 2024

Sure, but such extreme use cases are rare and don't need to be constantly brought up.

cozzyd · on Sept 1, 2024

Even on larger microcontrollers you often have to write a bootloader...

consteval · on Sept 3, 2024

Ok but most of these use cases don't link to the standard libs anyway, even if you're writing a C program.

IshKebab · on Sept 1, 2024

Very occasionally I guess. They're almost always bare metal.

cozzyd · on Sept 1, 2024

You still want a bootloader to support firmware updates, typically in the first 8 kB of flash or something like that.

IshKebab · on Sept 1, 2024

Good point. I guess don't use `fmt` for that...

Narishma · on Sept 1, 2024

> esp32 starts at 1MB

Which models? The most I've ever seen on an ESP32 is 512KB of SRAM.

ta988 · on Sept 1, 2024

I think they are talking about the flash. The code is by default run from flash (a mechanism called XIP execute in place). But you can annotate functions (with a macro called IRAM_ATTR) that you want to have in ram if you need performance (you have to also be careful about the data types you use inside as they are not guaranteed to be put in RAM).

ska · on Sept 1, 2024

Sure, but there are also microcontrollers with a lot more space. This probably won’t ever usefully target the small ones, but that doesn’t mean it isn’t useful in the space at all.

progmetaldev · on Sept 1, 2024

As someone coming into this programming space, is it common to be running a debugger that can read the float, as opposed to someone brand new to the space printing out a value for debugging. Not specifically comments like yours, but in general the assumption that others should know their tools as strongly as those profess in limited environment spaces, makes it seem that this is a space only for those with a high level of knowledge. That is absolutely not a knock on your knowledge or ability if that is just how things are, I'm sure there are specific areas of computing that require a vast knowledge of how things work without having a library to debug your issues into the way a simple print statement works. I have worked in limited environments, so I am curious specifically about a microcontroller environment like londons_explore specified.

I learned to program in Atari 8-bit systems, and know there are limitations on what you can output. londons_explore had a completely valid comment. I was just looking for a perspective for developers getting involved in microcontroller environments, and how they could best debug their code. We all know that a debugger is better than a "print" statement, but not always the fastest, especially with logic. If my answer is just "debugging", and there isn't another, that satisfies me. I am always looking for unique ways developers solve problems in ANY environment, because I would enjoy being able to work in any environment that's best for the solution being provided. I guess I have been blessed with environments where the client is willing to pay for something higher-level than what is actually required.

I enjoy all aspects of development across devices, so I hope nobody took my comment as a challenge, it absolutely was not.

sixfiveotwo · on Sept 1, 2024

> I can assure you that when writing code for microcontrollers with 2 kilobytes of code space, we don't include a 14 kilobyte string formatting library...

I'm pretty sure you wouldn't use C++ in that situation anyway, so I don't really see your point.

usrnm · on Sept 1, 2024

If you get rid of the runtime, which most compilers allow you to do, C++ is just as suitable for this task as C. Not as good as hand-rolled assembly, but usable

sixfiveotwo · on Sept 1, 2024

Okay, vtables in 2kb code space?

CyberDildonics · on Sept 1, 2024

Where are the vtables coming from if you use no inheritance and no memory allocation?

sixfiveotwo · on Sept 2, 2024

Why program in C++ if you don't use any of its features? You're just writing C in disguise.

pjmlp · on Sept 2, 2024

A C in disguise with better strong typing, compile time programming that beats hands down the preprocessor, while having the preprocessor available if one insists in using it, being able to design abstractions with bounds checking, namespaces instead of 1960's prefixes,....

sixfiveotwo · on Sept 2, 2024

Template specialization means more generated code, which, again, must fit in 2kb.

On very old archs (16bits, 8bits), there was no OO, because the code could not be complex enough to warrant it. It could not be complex because there wasn't room enough.

That being said, there are templated libraries (like fmt...) which may result in zero overhead in code size, so if the thread OP is using C++, then surely he could also use that library...

ender341341 · on Sept 2, 2024

> Template specialization means more generated code, which, again, must fit in 2kb.

modern compilers are way better about de-duplicating "different types but same instruction" template specializations so it's less of an issue than you may expect , especially if you're coming with template specialization generation from the mid/late 2000's.

pjmlp · on Sept 2, 2024

No different from copy-paste hard to debug preprocessor code.

sixfiveotwo · on Sept 2, 2024

This is not my claim.

pjmlp · on Sept 2, 2024

It actually is, as it is voiced in a way as if preprocessor was better than using templates.

sixfiveotwo · on Sept 2, 2024

You'll have to quote me, I don't see where I am implying that.

But anyway, the point that I am refuting is the use of C++ to write programs for such an extremely constrained runtime environnement, and at the same time refuse to use this library (which is template based afaik).

pjmlp · on Sept 2, 2024

"Template specialization means more generated code, which, again, must fit in 2kb."

Have fun,

"Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17"

https://youtu.be/zBkNBP00wJE?si=im0ga3rW08mR8g8f

sixfiveotwo · on Sept 3, 2024

That quote does not imply anything about C macro being better in any way, but I am nevertheless delighted that you could prove me wrong somehow. Thank you for this!

CyberDildonics · on Sept 2, 2024

I don't write anything new and use inheritance anyway.

Even in an embedded context you have classes and destructors, operator overloading and templates. You can still make data structures that exist in flat constrained memory.

sixfiveotwo · on Sept 2, 2024

Sure, but 2kb code space? What sort of class hierarchy can you possibly use in so little space?

consteval · on Sept 3, 2024

C++ has namespaces, C does not. That one singular feature makes C++ worthy of being used over C.

CyberDildonics · on Sept 2, 2024

What class hierarchy?

Even demo scene people use C++ and windows binaries can start at 1KB. Classes, operator overloading and destructors are all still useful. There is no reason there has to be more overhead than C.

sixfiveotwo · on Sept 3, 2024

Right, they also tend to use special tools to crunch their binaries down to acceptable size, afaik.

That said, that's a very good point. Maybe they'd even use that fmt library?

CyberDildonics · on Sept 3, 2024

What point are you even trying to make now? Every time you've been wrong about something and corrected you don't acknowledge it and just move on to something barely related.

https://en.wikipedia.org/wiki/Gish_gallop

You said this:

I'm pretty sure you wouldn't use C++ in that situation anyway, so I don't really see your point.

People have pointed out to you why you both can and would use C++, especially in place of C.

sixfiveotwo · on Sept 6, 2024

Thank you for bringing this to my attention. I will try to remember that

Let me coin a new term to help me in that task: the "swarm gallop", which would be that technique applied by a group of people, instead of just one person (somewhat like a DDOS).

Thank you.

ta988 · on Sept 1, 2024

You can use c++ yes and a lot of people do. You just keep the exceptions, stdlib and runtime in general at the door.

formerly_proven · on Sept 1, 2024

avrlibc's small variant of printf (which still has a ton of features) is like 600 bytes.

maccard · on Sept 1, 2024

What do you use instead?

Iostream is… far bigger than this, for example.

londons_explore · on Sept 1, 2024

most platforms come with their own libraries for this, which are usually a mix of hand coded assembly and C. You #include the whole library/sdk, but the linker strips out all bits you don't use.

Even then, if you read the disassembled code, you can usually find within a few minutes looking some stupid/unused/inefficient code - so you could totally do a better job if you wrote the assembly by hand, but it would take much more time (especially since most of these architectures tend to have very irregular instruction sets)

maccard · on Sept 1, 2024

If you’re just going to use the platform built in, then the size of a third party library doesn’t matter to you.

criddell · on Sept 1, 2024

If you only have 2 kB of code space, you would likely be doing custom routines in assembly that do exactly what you need and nothing more.

maccard · on Sept 1, 2024

Right - so no matter how small libfmt gets Op isn’t going to use it

Sharlin · on Sept 1, 2024

I presume the sort of custom routines that GP described?

jstimpfle · on Sept 1, 2024

Curious what space space you work in? What kind of devices, what are they used for?

londons_explore · on Sept 1, 2024

Not me but a friend. Things like making electronics for singing birthday cards and toys that make noise.

But there are plenty of other similar things - like making the code that determines the flashing pattern of a bicycle light or flashlight. Or the code that does the countdown timer on a microwave. Or the code that makes the 'ding' sound on a non-smart doorbell. Or the code that makes a hotel safe open when the right combination is entered. Or the code that measures the battery voltage on a USB battery bank and puts 1-4 indicator LED's on so you know how full it is.

You don't tend to hear about it because the design of most of this stuff doesn't happen in the USA anymore - the software devs are now in China for all except high-end stuff.

furyofantares · on Sept 1, 2024

Do any of those need a string formatting library?

toast0 · on Sept 1, 2024

Hotel safe might, if it logs somewhere (serial port?).

The others may have a serial port setup during development, too. If you have a truly small formatter, you can just disable it for final builds (or leave it on, asssuming output is non blocking, if someone finds the serial pins, great for them), rather than having larger rom for development and smaller for production.

londons_explore · on Sept 1, 2024

mostly used for debugging with "printf debugging" - either on the developers desk, or in the field ("we've got a dead one. Can you hook up this pin to a USB-serial converter and tell me what it's saying?")

jcelerier · on Sept 2, 2024

> strings are ~4 instructions (test for null terminator, output character, branch back two).

that's C strings. You also need to handle (size, data) strings like std::string_view

For the record, here's what fmt allows (from the docs):

    fmt::print(fmt::emphasis::bold | fg(fmt::color::red)
             , "Elapsed time: {0:.2f} seconds", 1.23);
    fmt::print("Elapsed time: {0:.2f} seconds"
             , fmt::styled(1.23, fmt::fg(fmt::color::green)  fmt::bg(fmt::color::blue)));
 
    fmt::print("{}", fmt::join(std::vector<int>{1, 2, 3}, ", "));

    fmt::print("strftime-like format: {:%H:%M:%S}\n", 3h + 15min + 30s);

you really think you can replicate that in 50 bytes?

secondcoming · on Sept 1, 2024

Would someone writing code for a 2Kb microcontroller even be using full-fledged C++, or just C With Classes?

londons_explore · on Sept 1, 2024

It's still full fledged C++, you just don't use many of the features, and the compiler leaves out all of the associated code.

Pretty easy to accidentally use some iostream and accidentally pull in loads of code you didn't want though.

astrobe_ · on Sept 1, 2024

To me the only reason to use C++ over C in that case is a the slightly stronger type-checking and maybe some extra syntactic sugar.

dinkumthinkum · on Sept 1, 2024

What? I mean, you realize that fmtlib is much more complicated than that, right? What you are describing is something very basic, primitive by comparison. I’m also puzzled why you think floats are not used by many programs, that’s kind of mind-boggling. I get that you wouldn’t load it on a microcontroller but you wouldn’t do that with the standard library either.

ptspts · on Sept 1, 2024

Shameless plug: printf(Hello, World!\n"); is possible with an executable size of 1008 bytes, including libc with output buffering: https://github.com/pts/minilibc686

Please note that a direct comparison would be apples-to-oranges though.

jart · on Sept 1, 2024

That's because the compiler turns it into fputs

a1o · on Sept 1, 2024

> Considering that a C program with an empty main function is 6kB on this system, {fmt} now adds less than 10kB to the binary.

Interesting, I've never done this test!

JonChesterfield · on Sept 1, 2024

It varies widely with whether the C library is dynamically or statically linked and with how the application (and C library) were built. And on which C library it is. Also a little on whether you're using elf or some other container.

neonsunset · on Sept 1, 2024

It's always fmt. Incredibly funny that this exact problem now happens in .NET. If you touch enough numeric (esp. fp and decimal) formatting/parsing bits, linker ends up rooting a lot of floating point and BigInt related code, bloating binary size.

pjmlp · on Sept 1, 2024

Still looking forward for the Delphi like experience with Native AOT, thankfully getting better.

msephton · on Sept 1, 2024

Very enjoyable. I love these sort of thinking outside the box optimisations.

rty32 · on Sept 1, 2024

Maybe I am slow, it took me a while to realize the "14k" in the title refers to "14kB"

hrydgard · on Sept 1, 2024

What else would it possibly mean?

k is very common shorthand for kB, at least historically.

Rygian · on Sept 1, 2024

14000 lines of assembler?

rty32 · on Sept 3, 2024

Something along that line (oops, pun not intended). I was thinking 14000 of something, not sure exactly what that is, until I realized it was kind of obvious.

Indeed a decade ago I would not have any doubts. Using "k" to refer to "kB" was much more common.