Zero or Sign Extend

jchw · on Oct 24, 2024

Of course doing the undefined thing works on almost any platform except DS9k, but that last formulation is quite elegant. It's a bit like byteswapping in that it's fairly simple to do but it's even simpler to not do by just never relying on the machine endianness.

Sesse__ · on Oct 24, 2024

Also shifts, especially variable-length shifts, are frequently slower than xor and add/sub (e.g., on x86, shl only works with cl and shlx has high latency), so that's another score for the xor variant.

CalChris · on Oct 24, 2024

Maybe for variable variable-length shifts but for constant variable-length shifts, SHL reg, imm8 is single cycle on recent x86_64 microarchitectures.

Sesse__ · on Oct 24, 2024

But xor and sub can go in way more ports, giving you higher throughput.

Findecanor · on Oct 24, 2024

When compiling for e.g. ARM, however, the two shifts tends to be compiled into a single bitfield-extract instruction.

RISC-V doesn't have such an instruction but there are cores that do macro-op fusion of just that sequence.

dzaima · on Oct 24, 2024

"way more" is 2 vs 4 ports (→5 for ≥alderlake); 1/cycle via shifts is probably good enough for most use-cases (though perhaps the more focused port pressure could be an issue with larger context).

And with hard-coded immediates xor+sub also ends up at twice the code size as shl+shr, so there's some trade-off. (but yeah if code size isn't a concern, xor+sub wins out)

Sesse__ · on Oct 26, 2024

Twice or more is pretty significant, wouldn't you say? :-) Code size is a valid concern, though.

eqvinox · on Oct 24, 2024

> ... this explicitly relies on shifting something into the sign bit, which depending on the exact flavor of language standard you’re using is either not allowed or at best fairly recently ..

An unsigned has no sign bit, so the left shift just needs to be unsigned to make it "technically correct".

(Remember to not use smaller than int types though, due to integer promotion issues)

fluoridation · on Oct 24, 2024

Yup. When you're twiddling bits you're better off using unsigned types in general anyway, and leaving converting to a signed type at the very end.

notepad0x90 · on Oct 25, 2024

There are explicit instructions on x64 and aarch64 for sign and zero extensions. There are also common patterns like 'xor r64, r64' to clear out a reg (and the zero-register arm variant). Why are there no higher level language abstractions for these types of patterns? Or maybe there are and I'm just unaware.

I would really like to see single operand operators, similar to 'i++'. '!!i' to do 'i~=i', '<<i' to do 'i=i<<1'.

The 'rep' instructions are also nice: https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz

Imagine doing something like this in C or Rust: 'int i=10;rep i printf("called %u times", i);' where rep would store the value of i in rcx, sets it to zero and stores in rax and jmp's to whatever function or codeblock you specified (could be inline code, or lambda expression), rcx (i) times, passing 'i''s current value optionally to the target code block. It would essentially be a shorthand form of 'for(int i=0;i<10;i++){printf("called %u times",i);}' except it's easier to use for simpler constructs like 'rep 8 <<i;' (just an example, you can just do 'i = i << 8;') if you combine it with my earlier proposed left shift operator.

toast0 · on Oct 25, 2024

> There are explicit instructions on x64 and aarch64 for sign and zero extensions. There are also common patterns like 'xor r64, r64' to clear out a reg (and the zero-register arm variant). Why are there no higher level language abstractions for these types of patterns? Or maybe there are and I'm just unaware.

The high level abstraction for xor r64, r64 is foo = 0. High level abstraction for sign/zero extension is casting to a larger type.

notepad0x90 · on Oct 25, 2024

I guess I was just being lazy and wanting something like '~foo'. If you won't want to change the type and are fine with losing bits, it's slightly more verbose, but I admit to being lazy again with that as well :(.

adrian_b · on Oct 25, 2024

If you define in C/C++ a structure with bit fields, and you declare the bit fields as being either signed or unsigned, assigning the bit fields to a bigger integer, which is appropriately signed or unsigned, will perform an optimal sign extension or zero extension, as necessary, without having to write any expression.

When you need sign extension or zero extension for conversion between standard integer types, you have to write only the type casting operator.

Having to use any of the inefficient tricks presented in the parent article is necessary only when you do not declare the correct types for your variables.

notepad0x90 · on Oct 25, 2024

I was think more about convenience than necessity.

Neywiny · on Oct 24, 2024

This is the perfect spot to use a bitfield. You can tell it signed or unsigned, and the compiler will deal with it all and optimize. No bit ops to get wrong or maintain. Very readable and scalable.

monocasa · on Oct 24, 2024

Endianness of bit fields changes with arch. Ie. Is the first bit field member the most or least significant bit range of the associated word.

cryptonector · on Oct 24, 2024

Yes, first you have to swab, if you have to swab.

monocasa · on Oct 24, 2024

Swab doesn't help in this case. It's the ordering of the bit fields within a word.

So like is

   struct test_struct {
     uint8_t a: 1;
     uint8_t b: 1;
     uint8_t x: 6;
   };

xxxxxxba or abxxxxxx?

ie. to get to member 'a' do you mask 0x80 or 0x01?

adrian_b · on Oct 26, 2024

You should never do yourself masking or shifting.

The compiler will always choose the appropriate machine instructions for the target ISA, which may have dedicated bit field extraction and bit field insertion instructions that are more efficient than the equivalent sequences of masking, shifting and merging instructions.

Moreover, the compiler will handle any endianness correctly.

In networking applications, where structures created on a computer may need to be used on another computer, the communication protocol will always serialize any data to a known format and the protocol implementation will provide conversion procedures to and from the native data formats.

So the only place where one may be concerned about endianness is when writing a conversion function between a native data format and some format specified for communication or storage. In this case it is known precisely which are the endiannesses for the native CPU and for the standard storage or communication data format.

For the standard format, its specification, e.g. an Internet RFC, will specify exactly the layout of the bits. For the native data format, you do not need to know the order of the bit fields. You just assign data to the structure members and or you assign the structure members to other variables and the compiler will take care of the layout.

monocasa · on Oct 26, 2024

Y'all are missing the point.

The difference between xxxxxxba and abxxxxxx is implementation defined, and changes with different architectures even with the same compiler.

That means in the "an Internet RFC [...] exactly [specified] the bits" case, you actually need multiple implementations for different architectures to be portable.

So waht you end up having is

   struct test_struct {
   #if ARCH_BIT_ENDIANNESS == BIG
     uint8_t a: 1;
     uint8_t b: 1;
     uint8_t x: 6;
   #else
     uint8_t x: 6;
     uint8_t b: 1;
     uint8_t a: 1;
   #endif
   };

for every struct for portable code that has to use bitfields with an externally defined bit pattern.

And it's not that goofy of archs that do it the opposite way of x86. PowerPC is a good example.

a_e_k · on Oct 24, 2024

That was the first alternative that I thought of. Experimenting with this:

    int sign_extend(int val_11b) {
        struct { int v : 11; } t = { val_11b };
        return t.v;
    }

in Compiler Explorer produces pretty much the same x86-64 assembly as the first function in the post (the shift left, then shift right version) under GCC, Clang, and MSVC when optimizations are turned on.

https://godbolt.org/z/d3Kf9fsE6

(But I do love that xor variant; that's really clever and clean.)

gpderetta · on Oct 24, 2024

At least GCC was very conservative in dealing with bitfields and, last time I bothered to check, generated suboptimal code.

epcoa · on Oct 24, 2024

Not in C or C++, at least, the bit and byte order is not defined.

edflsafoiewq · on Oct 24, 2024

But the width and signedness of a bitfield are defined at compile-time, while in this example they need to come from a format read at runtime.

cryptonector · on Oct 24, 2024

So? The author knows a priori the size of the int on the wire.

almostgotcaught · on Oct 24, 2024

> the compiler

I love when people say this as if there's exactly one compiler with a fixed implementation for whatever opt pass.

AlotOfReading · on Oct 24, 2024

That's not how this phrase is used. It usually encompasses any reasonably advanced compiler like clang, GCC, and sometimes MSVC.

Joker_vD · on Oct 24, 2024

But not including any of the slightly broken C compilers that the embedded hardware manufacturers provide (also, ICC neither)?

AlotOfReading · on Oct 24, 2024

I'm just providing examples, not excluding everything unmentioned.

adgjlsfhk1 · on Oct 24, 2024

as of a few years ago, ICC is just LLVM with some tweaked settings

IshKebab · on Oct 24, 2024

Eh the author's suggestions only seem better because C++ is insane.

The last one is definitely nice though!

vlovich123 · on Oct 24, 2024

Can you post examples in other languages where this would be easier?

samatman · on Oct 24, 2024

Zig:

    fn signExtend(raw: u11) i32 {
        return @as(i11, @bitCast(raw));
    }

    test "signExtend" {
        try expectEqual(1023, signExtend(1023));
        try expectEqual(-1, signExtend(2047));
    }

IshKebab · on Oct 24, 2024

Sure, in Rust:

  fn sign_extend_u11(x: u32) -> u32 {
    (((x as i32) << (32-11)) >> (32-11)) as u32
  }

Doesn't have any of the C++ issues he mentions. And it will be faster than the alternative since it's just two instructions. (Ok this is never going to matter in practice but still...)

TinkersW · on Oct 25, 2024

Shifts run on less ports than xor/sub, so should be avoided if possible when performance is important, especially when using SIMD where the shifts are often quite subpar.

dan-robertson · on Oct 25, 2024

Isn’t the alternative also two instructions:

  x ^ 0x400 - 0x400

vlovich123 · on Oct 25, 2024

But the C++20 code doesn't have this issue. That's my point. C++20 is 4 years old by this point.

nayuki · on Oct 30, 2024

The first piece of code is perfect in Java:

    int t = (val_11b << (32 - 11)) >> (32 - 11).