I hadn't heard of this paper; very interesting read! Thank you for bringing it up here. It resonates well with the (little) experience I have from playing around with CNN-based surrogate models years ago.
The library looks great, thank you for building this!
Am I understanding correctly that you use it to detect hidden relationships in your data so you can build better ML models? If so, have you tried it on non-NLP use cases?
Yes. As I mentioned in some of my other replies here, I am using this approach successfully in another project for robust near-duplicate image detection. It also relies on high-dimensional vector embeddings, but they are embeddings of images produced by ResNet-style vision models rather than by text-based LLMs.
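For anyone curious what near-duplicate detection over embeddings can look like, here is a minimal sketch. It assumes the image embeddings have already been extracted by some ResNet-style model (that step is omitted). The function names `cosine_similarity` and `find_near_duplicates`, the 0.95 threshold, and the toy four-dimensional vectors are hypothetical illustrations, not details from the project described above, which may use a different similarity measure or an approximate nearest-neighbor index instead of brute-force comparison.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.95
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above `threshold`.

    Hypothetical helper: brute-force O(n^2) pairwise comparison.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    # Most similar pairs first.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Toy stand-ins for real ResNet embeddings (which typically have hundreds
# or thousands of dimensions).
embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0, 0.4],
    "cat_resized.jpg": [0.88, 0.12, 0.01, 0.41],
    "dog.jpg": [0.1, 0.9, 0.3, 0.0],
}

result = find_near_duplicates(embeddings)
# Only the cat.jpg / cat_resized.jpg pair clears the threshold.
for id_a, id_b, sim in result:
    print(f"{id_a} ~ {id_b} ({sim:.4f})")
```

Because every pair is compared, this scales quadratically; for large image collections an approximate nearest-neighbor index would usually replace the pairwise loop.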