The official challenge is in Java, but there are unofficial solutions in the "Show and tell" category, using other languages, databases, and so on (there's even a Fortran solution that takes 3 seconds without going too far into optimization):
I like this discussion because while the original is about "what can you do in 2024 stock java", I'm more interested in "what can you do with naive, obvious solutions". AWK does this in 6 minutes but, as pointed out in this comment:
> It's either 6 minutes with 20 lines of AWK written in five minutes. Or 6 seconds in 300 lines of java written in five days. Classic "XKCD: Is It Worth the Time?".
You can have both the five minutes of work and some extra JVM speed juice over awk. The following Kotlin script runs in ~2m46s on my M1 Mac. That's faster than the baseline, and it's a bit more complex than it otherwise could be in order to avoid using too much RAM.
I think it's also more readable than the awk for people who don't already know the language, while being of a similar length, but that might be just me. More usefully, you get type checking, autocompletion, the ability to use any libraries you want, and so on.
import kotlin.io.path.Path
import kotlin.io.path.useLines
import kotlin.math.max
import kotlin.math.min

// Running min/max/sum/count per station; the mean is derived on demand.
// NB: hi starts at NEGATIVE_INFINITY, not Double.MIN_VALUE, which is the
// smallest *positive* double and would give a wrong max for all-negative stations.
class Result(var lo: Double = Double.POSITIVE_INFINITY, var hi: Double = Double.NEGATIVE_INFINITY, var total: Double = 0.0, var samples: Int = 0) {
    val mean get() = total / samples
}

val results = HashMap<String, Result>()
Path("measurements.txt").useLines { lines ->
    lines
        .map { it.substringBefore(";") to it.substringAfter(";").toDouble() }
        .forEach { (name, measurement) ->
            with(results.getOrPut(name) { Result() }) {
                lo = min(lo, measurement)
                hi = max(hi, measurement)
                total += measurement
                samples++
            }
        }
}
results.toSortedMap().mapValues { (_, m) -> "%.1f/%.1f/%.1f".format(m.lo, m.mean, m.hi) }.also(::println)
You can do this sort of thing with jbang, although for some reason Gunnar's baseline is slower. Maybe I'm doing something wrong.
> although for some reason Gunnar's baseline is slower. Maybe I'm doing something wrong.
I suspect it's because the baseline code is adding each line to a TreeMap before processing, effectively sorting 1B rows and then processing them. Your code reduces to the 413 unique stations and then sorts those.
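To make the difference concrete, here's an illustrative sketch (not Gunnar's actual baseline code; the station names and values are made up) contrasting sort-while-inserting against reduce-then-sort:

```kotlin
import java.util.TreeMap

fun main() {
    val rows = listOf("Oslo;1.0", "Lima;2.0", "Oslo;3.0")

    // Sort-as-you-go: every insert pays an O(log n) tree walk,
    // a billion times over on the full input.
    val treeSorted = TreeMap<String, Double>()
    for (r in rows) treeSorted.merge(r.substringBefore(";"), r.substringAfter(";").toDouble(), Double::plus)

    // Reduce-then-sort: cheap O(1) hash inserts, then one sort
    // over the ~413 unique station keys at the very end.
    val reduced = HashMap<String, Double>()
    for (r in rows) reduced.merge(r.substringBefore(";"), r.substringAfter(";").toDouble(), Double::plus)

    println(treeSorted == reduced.toSortedMap())  // true: same result, different cost
}
```

Same output either way; only the second variant keeps the comparison cost off the billion-row hot path.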
I feel like Gunnar purposefully wrote the baseline with lots of room for improvement to encourage participation.
Yes, I think you're right. The way he defines two classes for holding the results is another hint that it was built to be easily optimized :) And, well, you gotta hand it to him. That really went viral!
That's really interesting, and I like seeing what can be done in more "evolved" languages than awk. Awk has the advantage of being everywhere with no installation step, but installation is a price you only pay once.
Your Kotlin code feels more imperative to me, even though I'm used to that kind of thing; I like that awk still feels more like describing what you want, but maybe that's just me.
I didn't do it that way because, with one billion rows, the naive functional version runs out of memory. The imperative version that mutates in place doesn't.
You can probably write a functional version that also doesn't run out of memory using .groupingBy {}, since that works on lazy sequences too; I just didn't bother working out how. The Java streams framework can do the same things, with the added advantage that it sometimes makes parallelizing easy.
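For what it's worth, a sketch of that groupingBy idea, with a tiny in-memory sequence standing in for measurements.txt (the station names here are made up). Grouping.fold keeps one small accumulator per station, so memory scales with the number of stations rather than the number of rows:

```kotlin
import kotlin.math.max
import kotlin.math.min

// One immutable accumulator per station, folded as the sequence streams past.
data class Agg(val lo: Double, val hi: Double, val total: Double, val samples: Int)

fun main() {
    val lines = sequenceOf("Oslo;-3.0", "Lima;21.5", "Oslo;1.0")  // lazy, like useLines
    val results = lines
        .map { it.substringBefore(";") to it.substringAfter(";").toDouble() }
        .groupingBy { (name, _) -> name }
        .fold(Agg(Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 0.0, 0)) { acc, (_, m) ->
            Agg(min(acc.lo, m), max(acc.hi, m), acc.total + m, acc.samples + 1)
        }
        .toSortedMap()
        .mapValues { (_, a) -> "%.1f/%.1f/%.1f".format(a.lo, a.total / a.samples, a.hi) }
    println(results)  // {Lima=21.5/21.5/21.5, Oslo=-3.0/-1.0/1.0} in a dot-decimal locale
}
```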
> It's either 6 minutes with 20 lines of AWK written in five minutes. Or 6 seconds in 300 lines of java written in five days. Classic "XKCD: Is It Worth the Time?".
I'd bet that a good part of the baseline's 2 min 50 s is taken by the byte -> String conversion. Java is pretty bad at this, especially when the default charset is UTF-8. Even specifying ASCII in the Files.lines() call may improve performance dramatically.
All the leaders work directly with bytes; the other tricks are just for sub-second improvements.
I solved a similar problem with log files.
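A hedged sketch of that Files.lines() charset point, using a temp file as a stand-in for measurements.txt. Files.lines(path) decodes UTF-8 by default; passing a single-byte charset explicitly selects a cheaper decoder. One caveat: the real 1BRC data contains non-ASCII station names, so a lossless single-byte charset like ISO-8859-1 is a safer variant of this experiment than US-ASCII:

```kotlin
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Path
import kotlin.io.path.createTempFile
import kotlin.io.path.writeText

// Count lines using an explicit single-byte charset instead of the UTF-8 default.
fun countLines(path: Path): Long =
    Files.lines(path, StandardCharsets.ISO_8859_1).use { lines -> lines.count() }

fun main() {
    // Tiny stand-in for measurements.txt so the sketch is self-contained.
    val sample = createTempFile(suffix = ".txt")
    sample.writeText("Lima;21.5\nOslo;-3.0\n")
    println(countLines(sample))  // 2
}
```

Whether the cheaper decoder actually shaves minutes off the baseline would need measuring; the big wins reported in the thread came from skipping String creation entirely.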
Yeah, strings are very expensive; I saw huge gains in my submission after removing them. Parallelization gets the runtime down to 30s, and after removing strings you get to around 15s on the eval machine. The rest of the tricks are still very relevant for getting to sub-10s.
> It's either 6 minutes with 20 lines of AWK written in five minutes. Or 6 seconds in 300 lines of java written in five days. Classic "XKCD: Is It Worth the Time?".
It's exactly this. Is it going to be run many times or just once or a few times? Is it going to be run in production where it impacts customers or runs up the AWS bill? If it's a throwaway script, the Linux terminal and some Awk is hard to beat. There are always tradeoffs.
https://github.com/gunnarmorling/1brc/discussions/categories...
https://github.com/gunnarmorling/1brc/discussions/171#discus...