I get into this a bit later on, but I think the exact same model applies to pointers: in most cases you're much better off thinking of pointers as pointing at the zero-width points between elements rather than at the elements themselves.
I'm also having trouble meshing this way of thinking about indexes with the idea of a fixed bit-width, discretely addressable RAM, which would suggest that there is nothing "between" two storage elements.
I find it very useful, however, for imagining what the insertion-point index returned by a binary search means when the item you are looking for cannot be found.
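A minimal sketch of that reading, assuming Python and its standard `bisect` module; the helper name `find_gap` and the sample list are made up for illustration. The number returned for a missing item is treated as the gap where the item would go, not as the position of any existing element.

```python
from bisect import bisect_left


def find_gap(sorted_items, target):
    """Return (found, gap) for target in a sorted list.

    gap is the index of the boundary between elements where target
    sits (if present) or would have to be inserted (if absent).
    find_gap is a hypothetical helper, not a library function.
    """
    gap = bisect_left(sorted_items, target)
    found = gap < len(sorted_items) and sorted_items[gap] == target
    return found, gap


items = [10, 20, 30, 40]

# 25 is not in the list; the search still returns a meaningful gap.
found, gap = find_gap(items, 25)

# Slicing at the gap splits the list exactly at that boundary:
# everything before the gap is smaller than 25, everything after is not.
left, right = items[:gap], items[gap:]

# Inserting at the gap keeps the list sorted.
items.insert(gap, 25)
```

Here the gap for 25 is 2, the boundary between 20 and 30. A gap of 0 would mean "before everything" and a gap equal to the list length would mean "after everything", neither of which makes sense if the number has to name an element.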
Isn't it more obvious that the indices are the addresses of the locations if the boxes are arranged vertically? Besides, the boxes should contain their values, not their addresses: an address is attached to a box, and to one box only. Better off indeed thinking of pointers as existing in completely different boxes, with their values pointing at the zero-width sides of other boxes :)