codewiz's comments | Hacker News

The very last line: "Putting this in words was as hard as writing the code."


I use KDE on Fedora every day, and it's great.

I didn't install any spins, just the Plasma desktop packages and the few KDE apps I use. GNOME still works, but I rarely log into it.


Same here. I'd forgotten that it wasn't an Edition until I saw this.


As long as the environmental consequences fall entirely within the state borders, states should be allowed to decide independently.

However, when it comes to polluting rivers, seas, and the air, the consequences of pollution are often planet-wide. Thus, a global approach is required.

That said, the sooner Starship achieves full reusability, the sooner we'll stop burning up rocket stages in the atmosphere and letting the incombustible parts fall into the ocean.


"This was made possible by carefully rethinking the aberration correction theory of optics."

Can someone explain how they carefully rethought the theory to reduce the number of mirrors from more than 6 to just 2 (or 4)?


The new design uses on-axis mirrors to image the photomask onto the wafer, and on-axis mirror systems are far easier to design and fabricate than zig-zag (off-axis) systems. I've never designed an EUV system, but I'd guess that Shintake's team had to solve some materials or optical-coating issues that allowed them to consider the simpler on-axis design. Having worked on zig-zag and on-axis designs in the IR and VIS range, I can say that Shintake's design will be much (orders of magnitude?) easier to align and assemble.


Near the conclusion of this excellent blogpost:

"We live in a semi-barbaric age where science is probing the finest details of matter, space and time—but many of the discoveries, paid for by taxes levied on the hard-working poor, are snatched, hidden, and sold by profiteers."


I agree with his sentiment, but his wording is rather offensive to barbarians. Profiteering is enabled by the fine civilisational invention called "intellectual property".


I agree that there are still a few issues, except that VRR works just fine on my Wayland session with an amdgpu card. HDR, on the other hand...


I love how Andrej Karpathy explains things. His code implementing the transformer block (attention plus feed-forward) looks like this:

   def forward(self, x):
     x = x + self.attn(self.ln_1(x))
     x = x + self.mlp(self.ln_2(x))
     return x

This is how it's described (starting at 19:00 into the video):

"This is the pre-normalization version, where you see that x first goes through the layer normalization [ln_1] and then the attention (attn), and then goes back out to go to the layer normalization number two and the multilayer perceptron [MLP], sometimes also referred to as feed-forward network, FFN, and then that goes into the residual stream again."

"And the one more thing that's kind of interesting to note is: recall that attention is a communication operation, it is where all the tokens - and there's 1024 tokens lined up in a sequence - this is where the tokens communicate, where they exchange information... so, attention is an aggregation function, it's a pooling function, it's a weighted sum function, it is a reduce operation, whereas this MLP [multilayer perceptron] happens every single token individually - there's no information being collected or exchanged between the tokens. So the attention is the reduce, and the MLP is the map."

"And the transformer ends up just being repeated application of map-reduce, if you wanna think about it that way."



Hidden fees remove any consumer-side pressure on credit cards to lower their costs.

This also creates perverse incentives for card issuers to pass part of the merchant fees back to the consumer as rewards or even cash. Here in the US, 2-3% cash back is typical, driving consumers to prefer credit over other payment methods.

Meanwhile, merchants are forced to bake the fees into the retail price, causing the paradox that those who pay upfront end up spending more for the same goods.
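The arithmetic behind that paradox, with hypothetical numbers (a 3% merchant fee and 2% cash back are my assumptions, not figures from any source):

```python
# A merchant facing a 3% card fee raises the sticker price so that
# card sales still net the original amount. Everyone pays the higher
# sticker price, but only card users get a rebate.
base_price = 100.00    # what the merchant wants to net
fee_rate = 0.03        # hypothetical card processing fee
cashback_rate = 0.02   # hypothetical cash-back reward

sticker = base_price / (1 - fee_rate)      # price charged to everyone
card_cost = sticker * (1 - cashback_rate)  # card payer, after rebate
cash_cost = sticker                        # cash payer, no rebate

print(round(sticker, 2), round(card_cost, 2), round(cash_cost, 2))
# → 103.09 101.03 103.09
```

Under these assumptions the cash payer spends about $2 more than the card payer for the same goods, which is the cross-subsidy the comment describes.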


Thanks, I have updated the title.

