This sounds a lot like how we used to do research, by reading books and writing any interesting quotes on index cards, along with where they came from. I wonder if prompting for that would result in better chunks? It might make it easier to review if you wanted to do it manually.
The fundamental problem with both keyword-based and embedding-based retrieval is that they only access surface-level features. If your document contains 5+5 and you search "where is the result 10", you won't find the answer. That is why all texts need to be "digested" with an LLM before indexing, to draw out implicit information and make it explicit. It's also what Anthropic proposes we do to improve RAG.
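
To make the 5+5 example concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `keyword_search` is a deliberately naive token-overlap matcher, not a real search engine, and `digest` is a hypothetical stand-in for the LLM call that only knows how to spell out simple sums. It only covers the keyword side of the argument, not embeddings.

```python
import re

def tokenize(text):
    # Lowercase word/number tokens; "5+5" yields the single token "5".
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(query, index):
    # Naive matcher: a document hits if it shares any non-stopword token with the query.
    stop = {"where", "is", "the", "a", "of"}
    wanted = tokenize(query) - stop
    return [doc_id for doc_id, text in index.items() if wanted & tokenize(text)]

def digest(text):
    # Hypothetical stand-in for an LLM "digest" step:
    # it makes the result of each simple "a+b" sum explicit in the indexed text.
    notes = [
        f"{a}+{b} has the result {int(a) + int(b)}"
        for a, b in re.findall(r"(\d+)\s*\+\s*(\d+)", text)
    ]
    return text + (" [digest: " + "; ".join(notes) + "]" if notes else "")

docs = {"d1": "The worksheet asks for 5+5.", "d2": "Lunch is at noon."}
query = "where is the result 10"

# Index the raw text, then index the digested text, and compare.
raw_hits = keyword_search(query, docs)
digested_hits = keyword_search(query, {k: digest(v) for k, v in docs.items()})

print(raw_hits)
print(digested_hits)
```

On the raw index the query finds nothing, because neither "result" nor "10" appears anywhere in the text. After the digest step the first document matches, since the implicit answer is now written out in what gets indexed. A real pipeline would swap `digest` for an actual model call and would need to check that the generated notes are correct.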