I’ll join some other commenters to add my favorite difficult PDF problem, one I haven’t found a ready-to-use (even paid) solution for: extracting key-value pairs from a filled form such as this medical claims form:
There are two levels of difficulty: the starting file could be an image (PDF, PNG or JPG), which is the most difficult scenario. The slightly easier one is where it’s a text-based PDF, so no OCR is needed.
I threw this as an image file at Google’s Form Parser but it did poorly, i.e. it missed quite a few fields.
I would do exactly what you have done here if I were the dev of said app. But with the luxury of being an outsider: a user has described an inconvenience, and the request seems to make sense. If I were the dev, wouldn't I just create the ticket in whatever system we use, with a link to this post, instead of asking the user to go through the red tape? I know there are places where this is not incentivised, so this is a question for your org and not for you.
I see what you're saying, and for simple features I agree.
However, without the OP creating the ticket there can be no feedback loop on the feature.
If I wanted it tested for their use case, I'd need their input and confirmation on whether it's what they wanted, improvements for the workflow, etc.
If I base the whole feature on this comment, it could end up only doing half a job. I'd rather have that communication loop open!
I tend to agree. As an open source dev myself, I avoid asking folks to create issues, as it puts a burden on the user. I’ve seen some highly respected open source leads do this, and I’m not faulting them, as I think they’re coming from a good place; it may be a difference of opinion on what’s best practice.
Not OP. My take is that if the requester can’t be bothered to create a GH issue, it’s likely that this isn’t really a problem for them. An annoyance possibly but has not risen to “pain” levels.
Still waiting for GPT-4V, but I doubt it will do this. Yes, I’ve tried Donut and other options, but this is a very gnarly problem.
One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another pkg, because it’s basically a container for many pkgs). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
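To make the matching step concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the `Block` type, the `match_values` function, the tolerance, and the sample coordinates are made up, and none of it is unstructured.io's API. It assumes the origin is at the top-left with y growing downward, that the blank and filled pages are already aligned, and that a value sits to the right of or below its label. Real forms will break that last assumption often.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: text plus bounding box (origin top-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def is_label(block, template, tol=5.0):
    """True if the block also appears at (roughly) the same spot in the blank template."""
    return any(
        t.text == block.text
        and abs(t.x0 - block.x0) <= tol
        and abs(t.y0 - block.y0) <= tol
        for t in template
    )


def match_values(template, filled, tol=5.0):
    """Assign each filled-in block to the nearest label above or to the left of it."""
    # Whatever is on the filled page but not on the blank one is a candidate value.
    values = [b for b in filled if not is_label(b, template, tol)]
    grouped = {}
    for v in values:
        # Heuristic: prefer labels whose top-left corner is above/left of the value.
        candidates = [
            t for t in template if t.x0 <= v.x0 + tol and t.y0 <= v.y0 + tol
        ] or list(template)
        key = min(candidates, key=lambda t: hypot(v.x0 - t.x0, v.y0 - t.y0))
        grouped.setdefault(key.text, []).append(v)
    # Join multi-block values in reading order (top to bottom, left to right).
    return {
        k: " ".join(b.text for b in sorted(vs, key=lambda b: (b.y0, b.x0)))
        for k, vs in grouped.items()
    }


# Made-up sample data: labels from the blank form, then the filled-in page.
template = [
    Block("Patient name", 10, 10, 80, 20),
    Block("Date of birth", 200, 10, 270, 20),
    Block("Diagnosis", 10, 50, 60, 60),
]
filled = template + [
    Block("Jane Doe", 12, 22, 60, 32),
    Block("01/02/1980", 202, 22, 260, 32),
    Block("Sprained", 70, 50, 110, 60),
    Block("ankle", 115, 50, 140, 60),
]

result = match_values(template, filled)
print(result)
```

Nearest-label matching like this is the naive version; checkboxes, multi-column grids, and values that overflow their field would need something smarter (e.g. field regions drawn on the template rather than label positions).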
https://imgur.com/a/EJDi7L7