I spent a good amount of time last year working on a system to analyse patent documents.
Patents are difficult because they can include anything from abstract diagrams and chemical formulas to mathematical equations, so it tends to be really tricky to prepare the data in a way that can later be used by an LLM.
The simplest approach I found was to “take a picture” of each page of the document and ask an LLM to generate a JSON explaining the content (plus some other metadata such as page number, number of visual elements, and so on).
If a complicated image is present, simply ask the model to describe it. Once that is done, you have a JSON file that can be embedded into your vector store of choice.
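Roughly, the loop looks like the sketch below. This is just one way to wire it up: pdf2image for rasterizing pages and an OpenAI-style vision endpoint are assumptions on my part, and the model name and prompt are placeholders to adapt to your own stack.

    # Sketch of the page-screenshot -> JSON approach described above.
    # Assumes pdf2image for rasterizing and an OpenAI-style vision API;
    # the model name and prompt are placeholders.
    import base64, io, json
    from pdf2image import convert_from_path
    from openai import OpenAI

    client = OpenAI()

    PROMPT = (
        "Return a JSON object describing this patent page: the main text, "
        "a plain-language description of every diagram or formula, and a "
        "count of the visual elements on the page."
    )

    def page_to_json(image, page_number):
        # Encode the rendered page as a base64 PNG for the vision call.
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any vision-capable model works
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        record = json.loads(resp.choices[0].message.content)
        record["page_number"] = page_number  # metadata added alongside the model output
        return record

    pages = convert_from_path("patent.pdf", dpi=200)
    records = [page_to_json(img, i + 1) for i, img in enumerate(pages)]
    # Each record can now be embedded and pushed to your vector store of choice.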
I can’t speak to the price-to-performance ratio, but this approach seems easier and more efficient than what the author is proposing.
You can ask the model to describe the image, but that is inherently lossy. What if it is a chart and the model captures most of the (x, y) pairs, but the user asks about a value it missed? Presenting the image at inference time is effective because you're guaranteeing the LLM can answer exactly the user's question. The only blocker then becomes how good retrieval is, and that's a smaller problem to solve. This approach lets us solve only for passing in relevant context; the rest is taken care of by the LLM. Otherwise the problem space expands to correct OCR, parsing, and getting every possible description of the images out of the model up front.
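Concretely, the only difference is what you hand the model at inference time. A rough sketch under some assumptions: retrieve() stands in for whatever vector search you already have (returning records that point back to the stored page images), and the model name is a placeholder for any vision-capable model.

    # Sketch of "retrieve the page, show the original image at inference".
    # retrieve() is a placeholder for your existing vector search; each hit
    # is assumed to carry a path back to the stored page image.
    import base64
    from openai import OpenAI

    client = OpenAI()

    def answer(question, retrieve):
        hits = retrieve(question, top_k=3)  # e.g. [{"page_image_path": ...}, ...]
        content = [{"type": "text", "text": question}]
        for hit in hits:
            # Attach the original page image so nothing is lost to a text summary.
            with open(hit["page_image_path"], "rb") as f:
                b64 = base64.b64encode(f.read()).decode()
            content.append({"type": "image_url",
                            "image_url": {"url": f"data:image/png;base64,{b64}"}})
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder vision-capable model
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content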
This is a great example of how to use LLMs, thanks.
But it also illustrates to me that the opportunities with LLMs right now are primarily about reclassifying or reprocessing existing sources of value, like patent documents. In the '90s and '00s, many successful software businesses were building databases to replace traditional filing.
Creating fundamentally new collections of value that require upfront investment still seems to be challenging for our economy.