Scalability means adapting your instances to an increase or decrease in demand. You can change the number of instances (horizontal scaling) or change the CPU, RAM, and I/O capacity of each individual instance (vertical scaling).
A basic rule of thumb I try to follow: once you have at least two instances, don't immediately plan for hundreds or thousands of instances to absorb spikes that will never happen. Scaling from 1 to 2 (to remove the single point of failure, or SPOF) is much harder than scaling from 2 to 3, or to 20, because in getting to 2 you have already removed some implicit "localhost" assumptions.
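To make the "localhost" assumption concrete, here is a minimal sketch of one common case: state kept in process memory. Everything in it is hypothetical and for illustration only: the class names, the `round_robin` helper standing in for a load balancer, and the plain dict standing in for a shared store such as a database or cache.

```python
from itertools import cycle


class LocalCounterInstance:
    """Keeps its state in process memory: an implicit 'localhost' assumption."""

    def __init__(self):
        self.hits = 0

    def handle(self):
        self.hits += 1
        return self.hits


class SharedCounterInstance:
    """Keeps its state in a store that every instance can reach.

    A plain dict stands in for that store here (assumption for the sketch).
    """

    def __init__(self, store):
        self.store = store

    def handle(self):
        self.store["hits"] = self.store.get("hits", 0) + 1
        return self.store["hits"]


def round_robin(instances, requests):
    """Hypothetical load balancer: send each request to the next instance in turn."""
    pool = cycle(instances)
    return [next(pool).handle() for _ in range(requests)]


# Two instances with local state: each one only sees half of the traffic.
local_results = round_robin([LocalCounterInstance(), LocalCounterInstance()], 4)

# Two instances sharing one store: the count is consistent across instances.
shared_store = {}
shared_results = round_robin(
    [SharedCounterInstance(shared_store), SharedCounterInstance(shared_store)], 4
)

print(local_results)   # [1, 1, 2, 2]
print(shared_results)  # [1, 2, 3, 4]
```

With a single instance both versions behave the same, which is why the problem only shows up at the 1-to-2 step. Once the state lives outside the process, adding a third or twentieth instance doesn't change the picture.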
Of course, the two-instance rule is for stateless components. For stateful systems you want at least three. But even then, you'd be surprised by how much stuff you can run on 2-3 decently sized virtual machines and a load balancer.
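One likely reason for "at least three" is majority quorum. That is an assumption on my part about which stateful systems are meant: it applies to systems that need more than half of the nodes to agree before accepting a write, as many consensus-based stores do. Under that assumption, the arithmetic is short. The function names below are made up for this sketch.

```python
def majority(nodes):
    """Smallest number of nodes that is more than half of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still reachable."""
    return nodes - majority(nodes)


for nodes in (1, 2, 3, 5):
    print(nodes, majority(nodes), tolerated_failures(nodes))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

Under this model a two-node cluster tolerates no failures at all, so it is no better than one node for availability. Three is the smallest size that survives losing a node.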