jsimian's comments

Yeah, I just tried this out (props to the devs, it's super easy to set up), and my main gripe is that the chunking algorithms aren't great. It could be a lot more useful with a context option that returns the surrounding text along with each search result. The sliding-window chunking method also always cuts off the start of sentences.
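For illustration, here's a rough sketch of both ideas: a sliding window that moves over whole sentences (so no chunk starts mid-sentence) and a "context" lookup that returns a hit's neighbouring chunks. The function names and the naive regex sentence splitter are my own assumptions, not anything the tool actually provides.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., !, or ? followed by whitespace.
    # A real pipeline would want a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_windows(text, size=3, overlap=1):
    """Sliding window over whole sentences, so no chunk starts mid-sentence."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    sentences = split_sentences(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reaches the end
    return chunks

def with_context(chunks, hit, radius=1):
    """Return the matched chunk plus `radius` neighbouring chunks on each side."""
    lo = max(0, hit - radius)
    hi = min(len(chunks), hit + radius + 1)
    return chunks[lo:hi]
```

If you use the context lookup, overlap=0 avoids repeating the overlapping sentences when neighbouring chunks are stitched back together.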


I've found it works better to chunk by logical sections in the document, e.g. headings (h2, h3, h4, etc.) or numbered sections (1.1, 1.1.1, ...), and to be able to ignore some content (headers and footers), plus other customizations.

At least for use cases with clusters of many similarly formatted documents, it would be cool to have an easy way to customize chunking.

