Yeah, I am going to add an experiment that runs every day, and the cost of that will be a column in the table. It will be something like "summarize this article in 200 words", and every model gets the same prompt + article.
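A minimal sketch of how that daily cost column could be computed, assuming the usual per-million-token pricing; the model names and rates below are made-up placeholders, not real prices:

```python
# Illustrative only: every model gets the same prompt + article,
# and the cost of each run becomes a column in the results table.
PRICES_PER_MTOK = {  # (input, output) USD per million tokens; placeholder numbers
    "model-a": (3.00, 15.00),
    "model-b": (0.10, 0.40),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one summarization run."""
    inp, out = PRICES_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

def cost_column(input_tokens: int, output_tokens: int) -> dict:
    """Same prompt + article for every model; return model -> cost."""
    return {m: round(run_cost(m, input_tokens, output_tokens), 6)
            for m in PRICES_PER_MTOK}
```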
For me, and I suspect a lot of other HN readers, a comparison/benchmark on a coding task would be more useful. Something small enough that you can affordably run it every day across a reasonable range of coding-focused models, but non-trivial enough to be representative of day-to-day AI-assisted coding.
One other idea - for people spending $20 or $200/month for AI coding tools, a monitoring service that tracks and alerts on detected pricing changes could be something worth paying for. I'd definitely subscribe at $5/month for something like that, and I'd consider paying more, possibly even talking work into paying $20 or $30 per month.
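A toy sketch of the core of such a pricing watcher: keep a snapshot of each vendor's advertised plans and alert on any diff. The function name and data shape are invented for illustration; a real service would also need scraping and notification plumbing around it:

```python
def detect_price_changes(previous: dict, current: dict) -> list:
    """Return human-readable alerts for any added, removed, or changed plan price."""
    alerts = []
    for plan in sorted(set(previous) | set(current)):
        old, new = previous.get(plan), current.get(plan)
        if old is None:
            alerts.append(f"new plan {plan}: ${new}/mo")
        elif new is None:
            alerts.append(f"plan {plan} removed (was ${old}/mo)")
        elif old != new:
            alerts.append(f"plan {plan}: ${old}/mo -> ${new}/mo")
    return alerts
```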
LlamaIndex is building a platform for AI agents that can find information, synthesize insights, generate reports, and take actions over the most complex enterprise data.
We are seeking an exceptional engineer to join our growing LlamaParse team. You will work at the intersection of document processing, machine learning, and software engineering to push the boundaries of what's possible in document understanding. As a key member of a focused team, you will have significant impact on our product's direction and technical architecture.
We are also hiring for a range of other roles; see our career page:
Hi Pierre, I see that the Platform Engineer position (which probably matches me most) says it's hybrid. I'm very interested, but I live in Ohio. I understand sometimes things get clicked by accident, and I just wanted to know if there might be an issue with this listing, or if it's truly hybrid and the one you posted is remote, etc. Don't want to gum up the works :)
If you want to try agentic parsing we added support for sonnet-3.7 agentic parse and gemini 2.0 in llamaParse. cloud.llamaindex.ai/parse (select advanced options / parse with agent then a model)
However, this comes at a high cost in tokens and latency, but results in much better parse quality. Hopefully with newer models this can be improved.
This is a nice UI for end users; however, it seems to be a thin wrapper on top of mutool, which is distributed as AGPL. If you want to process PDFs locally, legally, and safely, you should use their CLI instead.
We’ve been doing exactly this by doubling down on VLMs (https://vlm.run)
- VLMs are way better at handling layout and context where OCR systems fail miserably
- VLMs read documents like humans do, which makes dealing with special layouts like bullets, tables, charts, and footnotes much more tractable with a single approach, rather than having to special-case a whole bunch of OCR + post-processing
- VLMs are definitely more expensive, but can be specialized and distilled for accurate and cost effective inference
In general, I think vision + LLMs can be explicitly trained to “extract” information and avoid reasoning about/hallucinating the text. The reasoning can be another module altogether.
I did a ton of Googling before writing this code, but I couldn't find you guys anywhere. If I had, I'd have definitely used your stuff. You might want to think about running some small-scale Google Ads campaigns; they could be especially effective if you target people searching for both LLM and OCR together. Great product, congrats!
What about combining old school OCR with GPT visual OCR?
If your old school OCR output contains text that is not present in the visual one, but is coherent (e.g. English sentences), you could take it and slot it into the missing place in the visual output.
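The splice-back idea above could be sketched like this. The coherence check is a crude, purely illustrative word-ratio heuristic, and a real system would insert recovered lines by position rather than appending them:

```python
def looks_coherent(line: str, min_alpha_ratio: float = 0.7) -> bool:
    """Crude check: mostly alphabetic words, i.e. plausibly real text."""
    words = line.split()
    if not words:
        return False
    alpha = sum(1 for w in words if w.strip(".,;:!?").isalpha())
    return alpha / len(words) >= min_alpha_ratio

def merge_ocr(classic_lines: list, visual_lines: list) -> list:
    """Recover coherent classic-OCR lines that the visual output missed."""
    seen = {l.strip().lower() for l in visual_lines}
    merged = list(visual_lines)
    for line in classic_lines:
        if line.strip().lower() not in seen and looks_coherent(line):
            merged.append(line)  # a real system would slot by position, not append
    return merged
```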
You're absolutely right. I use PDFTron (through CloudConvert) for full-document OCR, but for pages with fewer than 100 characters, I switch to this API. It's a great combo – I get the solid OCR performance of SolidDocument for most content, but I can also handle tricky stuff like stats, old-fashioned text, or handwriting that regular OCR struggles with. That's why I added page numbers upfront.
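The per-page routing described above might look roughly like this; the 100-character threshold is from the comment, while the function and bucket names are assumptions for illustration:

```python
VLM_THRESHOLD = 100  # characters of classic-OCR text per page

def route_pages(ocr_text_per_page: list) -> dict:
    """Split 1-based page numbers into classic-OCR vs vision-model buckets.

    Pages where classic OCR extracts little text (likely scans, handwriting,
    or charts) get sent to the vision-model API instead.
    """
    routes = {"classic_ocr": [], "vision_model": []}
    for page_no, text in enumerate(ocr_text_per_page, start=1):
        bucket = "classic_ocr" if len(text.strip()) >= VLM_THRESHOLD else "vision_model"
        routes[bucket].append(page_no)
    return routes
```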
Good article, but what is the alternative? What can you build today as a software engineer that can have impact? Nothing seems to come close to AI / AI infra, even if it's hard / risky / a moving landscape.
I would almost invert that statement. Sorry if this comes off ranty, but what exactly are people doing in the "AI space" currently that isn't "undifferentiated spam/chatbot" being sold to non-techies who heard about AI on NPR? What are real people using "AI" for that is so insanely valuable today? How much "company Y: same product with a chat window, sparks emoji" do we all need before this thing levels out and we all take a breather on the hype?
- writing and refactoring code. probably 50 times a day now
- improving documentation across the company
- summarizing meetings automatically with follow ups
- drafting most legal work before a lawyer edits (saved 70% on legal bills)
- entity extraction and data cleanup for my users
Put a number on it. How much value of this will they capture from you personally (we'll assume, very very charitably by the sound of it, that you represent an "average" user of AI products) when this market matures? Exactly how much will your employer pay for a meeting summarizer? $10/mo a seat, $20/mo a seat, $50/mo a seat? Could the product sustain a 5x, 10x, 50x price hike that is going to have to happen to recoup the investment being made today?
Agreed. Even if right now this seems like stuff companies want to throw money at for novelty/FOMO related reasons, I think eventually reality ought to catch up.
Probably an unpopular opinion, but I think the most efficient companies of the future will tackle the ironies of automation effectively: Carefully designing semi automation that keeps humans in the loop in a way that maximises their value - as opposed to just being bored rubber stamping the automation without really paying attention.
I'd say if you're not using a meeting summarizer, you're wasting someone's time by having them write up notes; if you're not writing up notes, you're wasting someone else's time recapping the meeting for them. Meeting notes are a 1 (meeting):many relationship for conveying what was discussed. How else do you go back and see what the one person on the storage team discussed with the person on your team who left last week, so you can go into the next meeting with them prepared?
If your meeting produces "notes", and those are relevant for people that were not in it, you are doing it wrong.
If your meeting is aimed at producing "general understanding", it's already a dangerous one, and the understanding should go to the correct documentation (what is best done during the meeting). Otherwise, it should produce "focused understanding" between a few people and with immediate application.
If all you take from it is notes, well, I'm really sure that your team won't go digging through meeting notes every time they need to learn about some new context. Meeting notes are useful for CYA only, and if people feel safe they'll be filed directly to /dev/null.
Going to be vague, but I'm using it to scale out human processes in ways I couldn't using humans (because they cost too much) or regular code (because it's unstructured). Early results are promising; we've found a bunch of stuff which has been buried... and is potentially worth millions. Not a chat wrapper, just breathing new life into our regular old business.
What do you consider "AI"? Because machine learning models have been deployed in enterprise systems for years. Video processing, security, data labeling, sentiment analysis. The sexiest one I can think of in recent memory is Nvidia DLSS.
Broadly, what marketing is saying is “AI.” There is huge value being created with deep learning today on internal systems. Recommenders, machine translation, computational photography… it is huge, improves people's lives, drives revenue.
None of that is marketed as "AI." It's just a thing the computer does. The single most valuable application of deep learning so far (content recommenders) is a cultural phenomenon, but it’s not referred to as “AI” but rather “the algorithm.”
Not sure why this is downvoted; that is the key question. Impact means different things to different people. It could be:
1. Building a sustainable business and making decent money
2. Building a market leader and making ludicrous amounts of money
3. Advancing the state of the art in technology
4. Helping people with their little daily struggles
5. Solving pressing problems humanity is facing
Or many other things I suppose. Now if you believe that AI is eventually going to make anything humans can build now redundant, that'd be a reason to believe nothing else matters in the end I suppose. But even if we get there, there's a lot of road leading to that destination. Any step provides value. Software built today can provide value even if nobody is going to need it ten years from now. And it's not like you could even predict that.
The motive is to get acquired in most cases. It’s obvious and starts to make sense when you see a startup that has no feasible monetisation strategy on the horizon, yet it exists and gets funding. They’re betting on building infra to hopefully be used in a large corp, and this is their demo/PoC.
Anything SaaS that solves pain points for established industries: those that have had billions in turnover for decades already, are not good at building tech themselves, and buy solutions/services to run their business. Bonus for low barriers to entry. Agriculture, logistics, real estate, energy, etc.
I have a theory that the days of established businesses that don't know tech are dwindling. A lot of companies that have adopted tech have started building a small foundation of talent internally. I think you're seeing this trend accelerate with the large tech companies laying people off. I have heard about top-grade data science talent landing at some small-sized health plan.
My company's fastest-growing competitor is the "internally sourced departments" of the services we provide.
Yes, computer savviness is on the rise, and has been for decades, and this will continue. But there are many levels: the ability to be a competent user, to be a competent buyer, and to build it themselves. Then there are big differences in buy-vs-build culture, and preferences for the types of solutions that are default-buy vs default-build. And finally, smaller and medium-sized organizations are less likely to have internal teams. All this should be analyzed for the specific market, product, and customer segment one targets.
Slightly different take than some of the siblings: you can still just build this stuff. If your goal is impact, maybe the best place to do it will be at a cloud vendor or other big corp. If your goal is actually just a big VC exit, then maybe not.
If your product is something that can be ripped off in 3 months, then it probably wasn’t going to have a long term impact anyway.
Define ‘impact’. Does ‘impact’ here mean ‘tickles the fancy of a 2024-era VC’? If so, you may be right. If used in its common meaning, absolutely not; most of this stuff is ~useless.
All the same stuff, to be honest. If AI is set to replace human work, well we have had a cheap human labour market for decades and yet we still need software. An LLM can't replace a business itself, which is made up of niche processes, direction and purpose, which we sometimes codify into a SaaS. We'll still need to do all that even if AI replaces some of the human parts of the business.
> What can you build today as a software engineer that can have impact?
Quite a bit, if you don’t follow the standard tech hype. Find an industry that isn’t tech-first and you’ll notice that there’s a lot of room for improvement.
Link to open source repo: https://github.com/run-llama/liteparse