It allows you to create many more language-level threads (goroutines) than kernel threads and, unlike other languages, properly hides this abstraction from you with a growable "rope" stack (so you don't need to create coroutines or use async/await syntax).
I'm not up to speed with the latest and greatest in Haskell or Elixir, so they may very well have something similar. Rust has great runtimes, like tokio, that do similar things, but without rope stacks and with painful async/await syntax.
An M:N scheduler was neither revolutionary nor rare even when Go launched. Even a mainstream language like C# already had one (albeit based on explicit continuations until C# 5.0 introduced async/await). At the same time, other mainstream languages either had similar third-party M:N stackful schedulers when Go came out (gevent in Python) or got them a little while later (Quasar in Java).
Go's scheduler was only somewhat unique in the combination of features it pursued:
1. M:N
2. Stackful (i.e. unoptimized memory usage for each task/goroutine)
3. But using very small stacks[1] so it's easier to create a very large number of goroutines.
4. Integrated with the GC
5. Colorless functions
6. Built-in
7. No access to native threads
8. Not configurable or customizable
9. Runs as native code, without a virtual machine
Colored functions (async/await, or Kotlin's suspend) are a matter of taste. They're heavily criticized for the burden they add, but advocates prefer the extra type safety they provide. If you want to be able to statically analyze code for data races (or prevent them completely, as Rust does), I don't think you can avoid them.
Speaking of Rust, Rust did start with an M:N, work-stealing scheduler based on stackful coroutines. This scheduler was eventually removed[2] from the standard library, since it was deemed a bad match for a systems language.
Go was originally marketed as a systems language, but it was really a language optimized for writing concurrent servers by large teams of programmers with varying experience[3]. Specifically, it was designed for server software at Google[4], to replace C++ in its main niche there: I/O-bound server software. That's why Go made very radical choices:
- Compile to native code (we still need good performance, just not C++-level)
- Maximize concurrency
- Make concurrency easy
- Use GC (we need the language to be much easier than C++)
- Minimize GC pause times (we need reasonable performance under server workloads)
This meant that, for a while, the Go M:N scheduler was usually the best-performing stackful scheduler for server loads. Interpreted languages like Python, JavaScript (Node) and Lua were slower and either single-threaded or behind a GIL. Erlang and Java used a VM and weren't optimized for low-latency GC. C and C++ had coroutine libraries, but since those languages are not garbage-collected, it was harder to optimize the coroutine stack size.
I think this created the wrong impression that the Go scheduler was revolutionary or best-in-class. It never was. The groundbreaking thing Go did was to optimize the entire language for highly concurrent I/O-bound workloads. That, along with a sane (i.e. non-callback-based) asynchronous model and the great PR of being a Google language, helped Go popularize asynchronous programming.
But I wouldn't say it is unmatched by any mainstream language nowadays. Java now has low-latency GC and is working on its own Go-like coroutine implementation (Project Loom), and all the mainstream JVM languages (Kotlin, Scala and Clojure) already have their own M:N schedulers.
Rust, in the meantime, is strictly more performant than Go: it uses state-machine-based stackless coroutines, which emulate the way manual asynchronous implementations (like nginx's) are written in C. You can't get more efficient than that. Not to mention that Rust doesn't have a GC and features more aggressive compiler optimizations.
[1] IIRC, the initial stack size has varied between 2 KB and 8 KB over the language's lifetime, and the stack-resizing mechanism has changed as well.