As someone else noted, it's lock contention that doesn't scale, not mutable shared state. Lock-free data structures, patterns like RCU ... in many cases these scale perfectly well for the case at hand. A lot of situations that require high-scale mutable shared state have an inherent asymmetry in how the data is used (e.g. one consumer, many writers; or many consumers, one writer) that nearly always allows a better pattern than "wrap it in a mutex".
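To make the "many consumers, one writer" case concrete, here is a rough RCU-flavored sketch: the writer builds a fresh copy and publishes it with a single reference swap, and readers just grab whatever snapshot is current without taking any lock. The names (`SnapshotStore`, `read`, `update`, `demo`) are made up for illustration, and it assumes CPython, where rebinding an attribute is atomic. It shows the shape of the pattern, not real RCU with grace periods and not a tuned implementation.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Hypothetical copy-on-write store: many readers, few writers."""

    def __init__(self, initial=None):
        # Readers only ever see a complete, read-only snapshot.
        self._snapshot = MappingProxyType(dict(initial or {}))
        # Serializes writers against each other; readers never take it.
        self._write_lock = threading.Lock()

    def read(self):
        # One attribute load, no lock on the read path.
        return self._snapshot

    def update(self, **changes):
        with self._write_lock:
            new = dict(self._snapshot)   # copy the current state
            new.update(changes)          # mutate the private copy only
            # Publish with one reference swap (assumed atomic in CPython).
            self._snapshot = MappingProxyType(new)


def demo(rounds=1000, readers=4):
    """Run readers against one writer and collect any inconsistent snapshot."""
    store = SnapshotStore({"a": 0, "b": 0})
    torn = []
    done = threading.Event()

    def reader():
        while not done.is_set():
            snap = store.read()
            # "a" and "b" are always written together, so they must match.
            if snap["a"] != snap["b"]:
                torn.append(dict(snap))

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    for n in range(1, rounds + 1):
        store.update(a=n, b=n)
    done.set()
    for t in threads:
        t.join()
    return store.read(), torn
```

The tradeoff is visible right in `update`: every write pays for a full copy, so this only makes sense when reads vastly outnumber writes and the data is small enough to copy. Old snapshots stay valid for any reader still holding one and are reclaimed by the garbage collector, which is the part real RCU has to do by hand.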
Mutable shared state is literally the nature of contention. It's true that locking is the mediocre default, but "avoid locks" is not a silver bullet. Alternatives have their own tradeoffs. If you "carefully design" a solution, it's probably because you're not just using an alternative but actually taking care to optimize, and because you have a specific use case (which you described).