
That sounds like a nit / premature optimization.

Electricity is cheap. If this is actually important for your org, you should measure it yourself. There are too many variables that depend on your org's hardware.



The hardware requirements of a massively parallel algorithm can't possibly be "a nit" in any universe inhabited by rational beings.


Totally disagree. Most end users are on laptops and mobile devices these days, not desktop towers. Thus power efficiency is important for battery life. Performance per watt would be an interesting comparison.


What end users are working with arbitrary files that they don’t know the identification of?

This entire use case seems to be one suited for servers handling user media.


File managers that render preview images. Even detecting which software to open the file with when you click it.

Of course on Windows the convention is to use the file extension, but on other platforms the convention is to look at the file contents
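
A minimal sketch of what content-based detection can look like, as a generic Python example checking a few well-known magic-byte signatures (real tools like libmagic carry a much larger database):

    # Map a few well-known file signatures ("magic bytes") to MIME types.
    MAGIC_SIGNATURES = {
        b"\x89PNG\r\n\x1a\n": "image/png",   # PNG header
        b"\xff\xd8\xff": "image/jpeg",       # JPEG/JFIF header
        b"GIF87a": "image/gif",
        b"GIF89a": "image/gif",
        b"%PDF-": "application/pdf",
    }

    def sniff_type(path):
        # Only the first few bytes are needed for these signatures.
        with open(path, "rb") as f:
            header = f.read(16)
        for signature, mime in MAGIC_SIGNATURES.items():
            if header.startswith(signature):
                return mime
        return "application/octet-stream"  # unknown fallback

A JPEG renamed to Photo.PNG still comes back as image/jpeg, because the detection never looks at the name.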


> on other platforms the convention is to look at the file contents

MacOS (that is, Finder) also looks at the extension. That has also been the case with any file manager I've used on Linux distros that I can recall.


You might be surprised. Rename your Photo.JPG as Photo.PNG and you'll still get a perfectly fine thumbnail. The extension is a hint, but it isn't definitive, especially when you start downloading from the web.


Browsers often need to guess a file type


Theoretically? Anyone running a virus scanner.

Of course, it's arguably unlikely a virus scanner would opt for an ML-based approach, as they specifically need to be robust against adversarial inputs.


> it's arguably unlikely a virus scanner would opt for an ML-based approach

Several major players such as Norton, McAfee, and Symantec all at least claim to use AI/ML in their antivirus products.


You'd be surprised what an AV scanner would do.

https://twitter.com/taviso/status/732365178872856577


I mean if you care about that you shouldn't be running anything that isn't highly optimized. Don't open webpages that might be CPU or GPU intensive. Don't run Electron apps, or really anything that isn't built in a compiled language.

Certainly you should do an audit of all the Android and iOS apps as well, to make sure they've been made in an efficient manner.

Block ads as well, they waste power.

This file identification is SUCH a small aspect of everything that is burning power in your laptop or phone as to be laughable.


Whilst energy usage is indeed a small aspect this early on with bespoke models, we do have to consider that this is a whole model just for identifying a file type.

What happens when we introduce more bespoke models for manipulating the data in that file?

This feels like it could slowly boil to the point of programs using orders of magnitude more power, at which point it'll be hard to claw it back.


That's a slippery slope argument, which is a common logical fallacy[0]. This model being inefficient compared to the best possible implementation does not mean that future additions will also be inefficient.

It's equivalent to saying that many people programming in Ruby makes all future programs less efficient. Which is not true. In fact, widespread use of Ruby has made Ruby itself more efficient, because it gets optimised the more it gets used (the same goes for Python).

Ruby is still not as energy efficient as C, but its popularity hasn't caused it to get worse and worse and spiral out of control.

Likewise smart contracts are incredibly inefficient mechanisms of computation. The result is mostly that people don't use them for any meaningful amounts of computation, that all gets done "Off Chain".

Generative AI is definitely less efficient, but it's likely to improve over time. Indeed, techniques like quantization have allowed models that would normally require much more substantial (and therefore more energy-intensive) hardware to run on smaller systems.
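
As a rough illustration of why quantization shrinks the hardware footprint, here's a generic numpy sketch of symmetric int8 weight quantization (not any particular framework's API): weights are stored as int8 plus a single float scale, roughly a quarter of the float32 memory.

    import numpy as np

    def quantize_int8(weights):
        # Map the largest-magnitude weight to 127; everything else scales with it.
        scale = np.abs(weights).max() / 127.0
        q = np.round(weights / scale).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())  # small rounding error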

[0]: https://en.wikipedia.org/wiki/Slippery_slope


That is the fallacy fallacy. Just because some slopes are not slippery does not mean none of them are.


The slippery slope fallacy is: "this is a slope. you will slip down it." and is always fallacious. Always. The valid form of such an argument is: "this is a slope, and it is a slippery one, therefore, you will slip down it."


No, it isn't.


Yeah. Yeah, it is.


>This feels like it could slowly boil to the point of programs using magnitudes higher power, at which point it'll be hard to claw it back.

We're already there. Modern software is, by and large, profoundly inefficient.


In general you're right, but I can't think of a single local use case for a human identifying file types on a laptop - at least, not one at a scale where this matters. It's all going to be SaaS services where people upload stuff.


We are building a data analysis tool with great UX, where users select data files, which are parsed and uploaded directly to S3 from their client machines. The server only takes over after this step.

Since the data files can be large, this approach avoids transferring the file twice: first to the server, and then to S3 after parsing.
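
For illustration, one common pattern for this kind of client-direct upload is a server-issued presigned URL; here's a rough boto3 sketch (bucket and key names are placeholders, not our actual setup):

    import boto3

    s3 = boto3.client("s3")

    def presigned_upload_url(bucket, key, expires=900):
        # The client PUTs the file body straight to this URL;
        # the bytes never pass through our server.
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires,
        )

    url = presigned_upload_url("example-upload-bucket", "datasets/run-42.csv")
    # Client side: HTTP PUT the raw file bytes to `url`.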


This doesn't sound like a very common scenario.



