Yeah, I am going to add an experiment that runs every day, and the cost of that will be a column in the table. It will be something like "summarize this article in 200 words", and every model gets the same prompt + article.
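A minimal sketch of how that daily cost column could be computed, assuming the usual per-million-token pricing; the model names and rates below are made-up placeholders, not real prices:

```python
# Illustrative only: every model gets the same prompt + article,
# and the cost of each run becomes a column in the results table.
PRICES_PER_MTOK = {  # (input, output) USD per million tokens; placeholder numbers
    "model-a": (3.00, 15.00),
    "model-b": (0.10, 0.40),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one summarization run."""
    inp, out = PRICES_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

def cost_column(input_tokens: int, output_tokens: int) -> dict:
    """Same prompt + article for every model; return model -> cost."""
    return {m: round(run_cost(m, input_tokens, output_tokens), 6)
            for m in PRICES_PER_MTOK}
```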
For me, and I suspect a lot of other HN readers, a comparison/benchmark on a coding task would be more useful. Something small enough that you can affordably run it every day across a reasonable range of coding-focused models, but non-trivial enough to be representative of day-to-day AI-assisted coding.
One other idea - for people spending $20 or $200/month for AI coding tools, a monitoring service that tracks and alerts on detected pricing changes could be something worth paying for. I'd definitely subscribe at $5/month for something like that, and I'd consider paying more, possibly even talking work into paying $20 or $30 per month.
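A toy sketch of the core of such a pricing watcher: keep a snapshot of each vendor's advertised plans and alert on any diff. The function name and data shape are invented for illustration; a real service would also need scraping and notification plumbing around it:

```python
def detect_price_changes(previous: dict, current: dict) -> list:
    """Return human-readable alerts for any added, removed, or changed plan price."""
    alerts = []
    for plan in sorted(set(previous) | set(current)):
        old, new = previous.get(plan), current.get(plan)
        if old is None:
            alerts.append(f"new plan {plan}: ${new}/mo")
        elif new is None:
            alerts.append(f"plan {plan} removed (was ${old}/mo)")
        elif old != new:
            alerts.append(f"plan {plan}: ${old}/mo -> ${new}/mo")
    return alerts
```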
LlamaIndex is building a platform for AI agents that can find information, synthesize insights, generate reports, and take actions over the most complex enterprise data.
We are seeking an exceptional engineer to join our growing LlamaParse team. You will work at the intersection of document processing, machine learning, and software engineering to push the boundaries of what's possible in document understanding. As a key member of a focused team, you will have significant impact on our product's direction and technical architecture.
We are also hiring for a range of other roles; see our career page:
Hi Pierre, I see that the Platform Engineer position (which probably matches me most) says it's hybrid. I'm very interested, but I live in Ohio. I understand sometimes things get clicked by accident, and I just wanted to know if there might be an issue with this listing, or if it's truly hybrid and the one you posted is remote, etc. Don't want to gum up the works :)
If you want to try agentic parsing we added support for sonnet-3.7 agentic parse and gemini 2.0 in llamaParse. cloud.llamaindex.ai/parse (select advanced options / parse with agent then a model)
However, this comes at a high cost in tokens and latency, but results in much better parse quality. Hopefully with newer models this can be improved.
This is a nice UI for end users; however, it seems to be a thin wrapper on top of mutool, which is distributed as AGPL. If you want to process PDFs locally, legally, and safely, you should use their CLI instead.
We’ve been doing exactly this by doubling down on VLMs (https://vlm.run)
- VLMs are way better at handling layout and context where OCR systems fail miserably
- VLMs read documents like humans do, which makes dealing with special layouts like bullets, tables, charts, and footnotes much more tractable with a single approach, rather than having to special-case a whole bunch of OCR + post-processing
- VLMs are definitely more expensive, but can be specialized and distilled for accurate and cost effective inference
In general, I think vision + LLMs can be explicitly trained to “extract” information and avoid reasoning about/hallucinating the text. The reasoning can be another module altogether.
I did a ton of Googling before writing this code, but I couldn't find you guys anywhere. If I had, I'd have definitely used your stuff. You might want to think about running some small-scale Google Ads campaigns; they could be especially effective if you target people searching for both LLM and OCR together. Great product, congrats!
What about combining old school OCR with GPT visual OCR?
If your old school OCR output contains text that is not present in the visual one, but is coherent (e.g. English sentences), you could take it and slot it into the missing place in the visual output.
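The splice-back idea above could be sketched like this. The coherence check is a crude, purely illustrative word-ratio heuristic, and a real system would insert recovered lines by position rather than appending them:

```python
def looks_coherent(line: str, min_alpha_ratio: float = 0.7) -> bool:
    """Crude check: mostly alphabetic words, i.e. plausibly real text."""
    words = line.split()
    if not words:
        return False
    alpha = sum(1 for w in words if w.strip(".,;:!?").isalpha())
    return alpha / len(words) >= min_alpha_ratio

def merge_ocr(classic_lines: list, visual_lines: list) -> list:
    """Recover coherent classic-OCR lines that the visual output missed."""
    seen = {l.strip().lower() for l in visual_lines}
    merged = list(visual_lines)
    for line in classic_lines:
        if line.strip().lower() not in seen and looks_coherent(line):
            merged.append(line)  # a real system would slot by position, not append
    return merged
```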
You're absolutely right. I use PDFTron (through CloudConvert) for full-document OCR, but for pages with fewer than 100 characters, I switch to this API. It's a great combo – I get the solid OCR performance of SolidDocument for most content, but I can also handle tricky stuff like stats, old-fashioned text, or handwriting that regular OCR struggles with. That's why I added page numbers upfront.
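The per-page routing described above might look roughly like this; the 100-character threshold is from the comment, while the function and bucket names are assumptions for illustration:

```python
VLM_THRESHOLD = 100  # characters of classic-OCR text per page

def route_pages(ocr_text_per_page: list) -> dict:
    """Split 1-based page numbers into classic-OCR vs vision-model buckets.

    Pages where classic OCR extracts little text (likely scans, handwriting,
    or charts) get sent to the vision-model API instead.
    """
    routes = {"classic_ocr": [], "vision_model": []}
    for page_no, text in enumerate(ocr_text_per_page, start=1):
        bucket = "classic_ocr" if len(text.strip()) >= VLM_THRESHOLD else "vision_model"
        routes[bucket].append(page_no)
    return routes
```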
Good article, but what is the alternative? What can you build today as a software engineer that can have impact? Nothing seems to come close to AI / AI infra, even if it's hard / risky / a moving landscape.
I would almost invert that statement. Sorry if this comes off ranty, but what exactly are people doing in the "AI space" currently that isn't "undifferentiated spam/chatbot" being sold to non-techies who heard about AI on NPR? What are real people using "AI" for that is so insanely valuable today? How much "company Y: same product with a chat window, sparks emoji" do we all need before this thing levels out and we all take a breather on the hype?
- writing and refactoring code. probably 50 times a day now
- improving documentation across the company
- summarizing meetings automatically with follow ups
- drafting most legal work before a lawyer edits (saved 70% on legal bills)
- entity extraction and data cleanup for my users
Put a number on it. How much value of this will they capture from you personally (we'll assume, very very charitably by the sound of it, that you represent an "average" user of AI products) when this market matures? Exactly how much will your employer pay for a meeting summarizer? $10/mo a seat, $20/mo a seat, $50/mo a seat? Could the product sustain a 5x, 10x, 50x price hike that is going to have to happen to recoup the investment being made today?
Agreed. Even if right now this seems like stuff companies want to throw money at for novelty/FOMO related reasons, I think eventually reality ought to catch up.
Probably an unpopular opinion, but I think the most efficient companies of the future will tackle the ironies of automation effectively: Carefully designing semi automation that keeps humans in the loop in a way that maximises their value - as opposed to just being bored rubber stamping the automation without really paying attention.
I'd say if you're not using a meeting summarizer, you're wasting someone's time by having them write up notes; if you're not writing up notes, you're wasting someone else's time recapping the meeting for them. Meeting notes are a 1 (meeting):many relationship for conveying what was discussed. How else do you go back and see what the one person on the storage team discussed with the person on your team who left last week, so you can go into the next meeting with them prepared?
If your meeting produces "notes", and those are relevant for people that were not in it, you are doing it wrong.
If your meeting is aimed at producing "general understanding", it's already a dangerous one, and the understanding should go to the correct documentation (what is best done during the meeting). Otherwise, it should produce "focused understanding" between a few people and with immediate application.
If all you take from it is notes, well, I'm really sure that your team won't go digging through meeting notes every time they need to learn about some new context. Meeting notes are useful for CYA only, and if people feel safe they'll be filed directly to /dev/null.
Going to be vague, but I'm using it to scale out human processes in ways I couldn't using humans (because they cost too much) or regular code (because it's unstructured). Early results are promising; we've found a bunch of stuff which has been buried... and is potentially worth millions. Not a chat wrapper, just breathing new life into our regular old business.
What do you consider "AI"? Because machine learning models have been deployed in enterprise systems for years. Video processing, security, data labeling, sentiment analysis. The sexiest one I can think of in recent memory is Nvidia DLSS.
Broadly, what marketing is saying is “AI.” There is huge value being created with deep learning today on internal systems. Recommenders, machine translation, computational photography… it is huge, improves people's lives, drives revenue.
None of that is marketed as "AI." It's just a thing the computer does. The single most valuable application of deep learning so far (content recommenders) is a cultural phenomenon, but it’s not referred to as “AI” but rather “the algorithm.”
Not sure why this is downvoted; that is the key question. Impact means different things to different people. It could be:
1. Building a sustainable business and making decent money
2. Building a market leader and making ludicrous amounts of money
3. Advancing the state of the art in technology
4. Helping people with their little daily struggles
5. Solving pressing problems humanity is facing
Or many other things I suppose. Now if you believe that AI is eventually going to make anything humans can build now redundant, that'd be a reason to believe nothing else matters in the end I suppose. But even if we get there, there's a lot of road leading to that destination. Any step provides value. Software built today can provide value even if nobody is going to need it ten years from now. And it's not like you could even predict that.
The motive is to get acquired in most cases. It’s obvious and starts to make sense when you see a startup that has no feasible monetisation strategy on the horizon, yet it exists and gets funding. They’re betting on building infra to hopefully be used in a large corp, and this is their demo/PoC.
Anything SaaS that solves pain points for established industries: those that have had billions in turnover for decades already, are not good at building tech themselves, and buy solutions/services to run their business. Bonus for low barriers to entry. Agriculture, logistics, real estate, energy, etc.
I have a theory that the days of established businesses that don't know tech are dwindling. A lot of companies that have adopted tech have started building a small foundation of talent internally. I think you're seeing this trend accelerate with the large tech companies laying people off. I have heard about top-grade data science talent landing at some small-sized health plan.
My company's fastest-growing competitor is the "internally sourced departments" of the services we provide.
Yes, computer savviness is on the rise, and has been for decades, and this will continue. But there are many levels: the ability to be a competent user, to be a competent buyer, and to build it themselves. Then there are big differences in buy-vs-build culture, and preferences for the types of solutions that are default-buy vs default-build. And finally, smaller and medium-sized organizations are less likely to have internal teams. All this should be analyzed for the specific market, product, and customer segment one targets.
Slightly different take than some of the siblings: you can still just build this stuff. If your goal is impact, maybe the best place to do it will be at a cloud vendor or other big corp. If your goal is actually just a big VC exit, then maybe not.
If your product is something that can be ripped off in 3 months, then it probably wasn’t going to have a long term impact anyway.
Define ‘impact’. Does ‘impact’ here mean ‘tickles the fancy of a 2024-era VC’? If so, you may be right. If used in its common meaning, absolutely not; most of this stuff is ~useless.
All the same stuff, to be honest. If AI is set to replace human work, well we have had a cheap human labour market for decades and yet we still need software. An LLM can't replace a business itself, which is made up of niche processes, direction and purpose, which we sometimes codify into a SaaS. We'll still need to do all that even if AI replaces some of the human parts of the business.
> What can you build today as a software engineer that can have impact?
Quite a bit, if you don’t follow the standard tech hype. Find an industry that isn’t tech-first and you’ll notice that there’s a lot of room for improvement.
Link to open source repo: https://github.com/run-llama/liteparse