
I’ll join some other commenters and add my favorite difficult PDF problem, one I haven’t found a ready-to-use (even paid) solution for: extracting key-value pairs from a filled form, such as this medical claims form:

https://imgur.com/a/EJDi7L7

There are two levels of difficulty. The most difficult scenario is when the starting file is an image (PDF, PNG, or JPG). The slightly easier one is a text-based PDF, where no OCR is needed.

I threw this as an image file at Google Form Parser, but it did poorly, i.e., it missed quite a few fields.



Dev here for the above Stirling PDF app. Please raise features like this as a feature request in a GitHub issue, and we can try to address it in the future!


I would do exactly what you have done here if I were the dev of said app. But I have the luxury of being an outsider: a user has expressed an inconvenience, and it seems to make sense. If I were the dev of the app, wouldn't I create the ticket myself in whatever system, with a link to this post, instead of asking the user to follow the red tape? I know there are places where this is not incentivised, so this is a question for your org and not for you.


I see what you're saying, and for simple features I agree. However, without the OP creating the ticket there can be no feedback loop on the feature. I'd want it tested for their use case, with their input and confirmation on whether it's what they wanted, improvements for the workflow, etc. If I base the whole feature on this comment, it could end up only doing half a job. I'd rather have that communication loop open!


I tend to agree. As an open source dev myself, I avoid asking folks to create issues, as it puts a burden on the user. I’ve seen some highly respected open source leads do this, and I’m not faulting them, as I think they’re coming from a good place; it may be a difference of opinion on what’s best practice.


Not OP. My take is that if the requester can’t be bothered to create a GH issue, it’s likely that this isn’t really a problem for them. An annoyance, possibly, but it has not risen to “pain” levels.


This is open source software, sir. It needs multiple steps to ensure users actually need these features and are willing to use them.



Their scummy website doesn’t list their prices in any way I can see. Hard pass.


Have you tried Azure AI Document Intelligence?

In theory it's exactly this...


I second this. That, or have you tried GPT-4 Vision or Donut?


Still waiting for GPT-4V, but I doubt it will do this. Yes, I’ve tried Donut and other options, but this is a very gnarly problem.

One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another pkg, because it’s basically a container for many pkgs). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
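
To make the matching step concrete, here is a rough Python sketch. It assumes the text blocks have already been extracted into (text, x, y) records by whatever tool you use; the `Block` type, the `match_values_to_keys` function, and the "nearest label above and to the left" rule are all my own illustrative choices, not anything a particular library provides. It also assumes each label sits at the top-left of its cell, which will not hold for every form.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Block:
    """A text block with its top-left corner (y grows downward). Hypothetical type."""
    text: str
    x: float
    y: float


def match_values_to_keys(template, filled, tol=2.0):
    """Map each label in the blank template to the filled-in text nearest to it.

    Heuristic (an assumption, not a general solution): a value belongs to the
    closest label whose top-left corner is above and to the left of the value.
    """
    def is_label(block):
        # A block in the filled form is a label if the blank template has the
        # same text at (almost) the same position.
        return any(
            block.text == t.text
            and abs(block.x - t.x) <= tol
            and abs(block.y - t.y) <= tol
            for t in template
        )

    # Whatever is left after removing the labels is filled-in content.
    values = sorted((b for b in filled if not is_label(b)), key=lambda b: (b.y, b.x))

    matched = {}
    for v in values:
        candidates = [t for t in template if t.x <= v.x + tol and t.y <= v.y + tol]
        if not candidates:
            continue  # no label above/left of this value; leave it unmatched
        key = min(candidates, key=lambda t: math.hypot(v.x - t.x, v.y - t.y))
        matched.setdefault(key.text, []).append(v.text)

    # Join multiple fragments that landed in the same cell, in reading order.
    return {k: " ".join(parts) for k, parts in matched.items()}
```

In practice the hard parts are outside this sketch: aligning a scanned page to the template (skew, scale), checkboxes, and values that spill over cell borders.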


I'm fairly confident GPT-4V will do this just fine, tbh.

You just need to extract each of the elements into a structured JSON or something, right?

I'll try with your example later today.


Exactly, the form has filled values in named cells, so we need a JSON of cellName -> filledValue mappings.
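
For concreteness, a minimal Python sketch of the output shape I have in mind. The `to_form_json` helper and the cell names in the usage note are made up for illustration; the one design choice is that every cell from the template appears in the output, with `null` for cells left blank, so a missed field is visible rather than silently absent.

```python
import json


def to_form_json(cell_names, extracted):
    """Serialize cellName -> filledValue, keeping blank cells as null.

    cell_names: all cell names known from the template (hypothetical input).
    extracted:  dict of the values that were actually found.
    """
    record = {name: extracted.get(name) for name in cell_names}
    return json.dumps(record, indent=2, ensure_ascii=False)
```

Calling `to_form_json(["Patient name", "Insured ID"], {"Patient name": "Jane Doe"})` would give an object with both keys, where "Insured ID" is `null`.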

Let me know how GPT-4V does!


I second trying GPT-4 Vision, though they have dumbed it down a bit since launch.



