Nice writeup. I've run into similar problems with my contact lens price comparison website https://lenspricer.com/, which I run in ~30 countries. I have found, like you, that websites changing their HTML is a pain.
One of my biggest hurdles initially was matching products across 100+ websites. Even though you think a product has a unique name, everyone puts their own twist on it. Most can be handled with regexes, but I had to manually map many of these (I used AI for some of it, but had to manually verify all of it).
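To make the regex part concrete, here is a minimal sketch of how name matching could look. It is not the actual Lenspricer code; the pattern table and the `canonical_name` helper are hypothetical, and a real catalogue would need far more patterns.

```python
import re

# Hypothetical pattern table: (canonical name, pattern).
# Order matters: the more specific 1-Day pattern is tried before the generic one.
CANONICAL_PATTERNS = [
    ("Acuvue Oasys 1-Day", re.compile(r"acuvue\s+oasys\s+(1|one)[\s-]*day", re.I)),
    ("Acuvue Oasys", re.compile(r"acuvue\s+oasys", re.I)),
]


def canonical_name(raw_title):
    """Return the canonical product name, or None if no pattern matches."""
    # Collapse runs of whitespace so the patterns stay simple.
    title = " ".join(raw_title.split())
    for name, pattern in CANONICAL_PATTERNS:
        if pattern.search(title):
            return name
    # Unmatched titles are left for manual mapping.
    return None
```

Anything that comes back as `None` would go into the manual mapping pile rather than being guessed at.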
I've found that building the scrapers and infrastructure is somewhat the easy part. The hard part is maintaining all of the scrapers and figuring out why a product disappeared from a site: does my scraper have an error, is my scraper being blocked, did the site make a change, was the site randomly down for maintenance when I scraped it, etc.
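A rough sketch of how those cases might be told apart is below. Everything in it is an assumption for illustration: the `ScrapeResult` fields, the status codes treated as blocking, and the three-miss threshold are not taken from any real setup.

```python
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    status_code: int      # HTTP status of the page fetch
    products_parsed: int  # how many products the parser extracted
    target_found: bool    # whether the tracked product was among them


def diagnose(result, consecutive_misses):
    """Guess why a tracked product is missing. All thresholds are illustrative."""
    if result.target_found:
        return "ok"
    if result.status_code in (403, 429):
        return "possibly blocked"
    if result.status_code >= 500:
        return "site possibly down, retry later"
    if result.status_code == 200 and result.products_parsed == 0:
        return "parser found nothing, site HTML may have changed"
    if consecutive_misses >= 3:
        return "product probably removed from site"
    return "unclear, recheck on next run"
```

The output is only a hint for whoever looks at the alert; it does not replace checking the site by hand.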
A fun project, but challenging at times, and annoying problems to fix.
Doing the work we need. Every year I get fucked by my insurance company when buying a basic thing - contacts. Pricing is all over the place and coverage is usually 30% done by mail-in reimbursement.
Thanks!
I'm curious, can you wear contact lenses while working? I notice my eyes get tired when I look at a monitor for too long. Have you found any solutions for that?
I use contact lenses basically every day, and I have had no problems working in front of screens. There's a huge difference between the different brands. Mine is one of the more expensive ones (Acuvue Oasys 1-Day), so that might be part of it, but each eye is compatible with different lenses.
If I were you I would go to an optometrist and talk about this. They can also often give you free trials for different contacts and you can find one that works for you.
FWIW, that is the same brand that I use and was specifically recommended for dry-eyes by my optometrist. I still wear glasses most of the time because my eyes also get strained from looking at a monitor with contacts in.
I'd recommend a trial of the lenses to see how they work for you before committing to a bigger purchase.
Hah, I'm already there in my 40s! I'm seriously considering getting a strap for my glasses - right now I just hook them into my shirt, but they'll occasionally fall out when I bend over for something, and it's only a matter of time before they break or go into a sewer.
My eye doctor recommended wearing “screen glasses”. They are a small prescription (maybe 0.25 or 0.5) with blue blocking. It’s small but it does help; I work on normal glasses at night (so my eyes can rest) and contacts + screen glasses during the day and they are really close.
For Germany, below the prices it says "some links may be sponsored", but it does not mark which ones. Is that even legal? Also there seem to be very few shops, are maybe all the links sponsored? Also idealo.de finds lower prices.
When I decided to put the text like that, I had looked at maybe 10-20 of the biggest price comparison websites across different countries because I of course want to make sure I respect all regulations that there are. I found that many of them don't even write anywhere that the links may be sponsored, and you have to go to the "about" page or similar to find this. I think that I actually go further than most of them when it comes to making it known that some links may be sponsored.
Now that you mention idealo, there seems to be no mention at all on a product page that they are paid by the stores, you have to click the "rank" link in the footer to be brought to a page https://www.idealo.de/aktion/ranking where they write this.
> One of my biggest hurdles initially was matching products across 100+ websites. Even though you think a product has a unique name, everyone puts their own twist on it. Most can be handled with regexes, but I had to manually map many of these (I used AI for some of it, but had to manually verify all of it)
In the U.S. at least, big retailers will have product suppliers build slightly different SKUs for them to make price comparisons tricky. Costco is somewhat notorious for this: almost all electronics (and many other products) sold in their stores have a custom SKU -- often with a slightly different product configuration.
Costco does this for sure, but Costco also creates their own products. For instance there are some variations of a package set that can only be bought at Costco, so you aren't getting the exact same box and items as anywhere else.
Yeah, it is to some degree. I tried to use it as much as possible, but there are always those annoying edge cases that make me not trust the results, so I have to check everything. It ended up being faster to just build a simple UI where I can easily classify the name myself.
Part of the problem is simply due to bad data from the websites. Just as an example - there's a 2-week contact lens called "Acuvue Oasys". And there's a completely different 1-day contact lens called "Acuvue Oasys 1-Day". Some sites have been bad at writing this properly, so both variants may be called "Acuvue Oasys" (or close to it), and the way to distinguish them is to look at the image to see which actual lens they mean, look at the price etc.
It's true that this could probably also be handled by AI, but in the end, classifying the lenses takes like 1-2% of the time it takes to make a scraper for a website so I found it was not worth trying to build a very good LLM classifier for this.
> It's true that this could probably also be handled by AI, but in the end, classifying the lenses takes like 1-2% of the time it takes to make a scraper for a website so I found it was not worth trying to build a very good LLM classifier for this.
This is true for technology in general (in addition to specifically for LLMs).
In my experience, the 80/20 rule comes into play in that MOST of the edge cases can be handled by a couple of lines of code or a regex. There is then this asymptotic curve where each additional line of code handles a rarer and rarer edge case.
And, of course, I always seem to end up on projects where even a small, rare edge case has some huge negative impact if it gets hit, so you have to keep adding defensive code and/or build a catch-all bucket that alerts you to the issue without crashing the entire system, etc.
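For what it's worth, a catch-all bucket can be as small as the sketch below. The `process_all` name and the logger name are made up for the example; the idea is just that an unexpected item gets logged and set aside instead of taking the whole run down.

```python
import logging

logger = logging.getLogger("scraper")  # hypothetical logger name


def process_all(items, handler):
    """Run handler on every item; collect failures instead of crashing."""
    processed, unhandled = [], []
    for item in items:
        try:
            processed.append(handler(item))
        except Exception as exc:  # deliberately broad: this is the catch-all bucket
            logger.warning("unhandled edge case for %r: %s", item, exc)
            unhandled.append((item, repr(exc)))
    return processed, unhandled
```

For example, `process_all(["1.5", "n/a", "2"], float)` parses the two valid prices and puts `"n/a"` in the unhandled list, which can then feed an alert or a review queue.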
Tangentially related: for those who are considering LASIK in order to save money on contacts: I created https://lenspricer.com/, which compares contact lens prices across almost all online stores.
Buying them online at the cheapest price can lower your cost significantly (like up to 80%+ - the price you pay at nation-wide stores is often very overpriced compared to online prices).
An article+calculator on how long it would take for you to break even from LASIK based on your selected contacts and rough LASIK price would be a great story for your site.
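The core of such a calculator would be a single division. A minimal sketch, assuming a one-off LASIK price and a flat monthly contact lens cost, and ignoring everything else (follow-up visits, lens solution, glasses later in life); the function name and the inputs in the usage note are hypothetical:

```python
import math


def break_even_months(lasik_price, monthly_contacts_cost):
    """Months of contact lens spending needed to equal a one-off LASIK price."""
    if monthly_contacts_cost <= 0:
        raise ValueError("monthly_contacts_cost must be positive")
    # Round up: the break-even point is reached during the last partial month.
    return math.ceil(lasik_price / monthly_contacts_cost)
```

With a made-up LASIK price of 4000 and contacts at 32 per month, `break_even_months(4000, 32)` gives 125 months, a bit over ten years.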
I built a website to help people save money on contact lenses: lenspricer.com
Most people buy contact lenses from their optician. The opticians often mark them up by 50%+. You can buy the exact same contact lenses you are wearing online, at a much lower cost.
Lenspricer helps you identify which online store is currently the cheapest for your exact contact lenses, compared across different package sizes, divided into a monthly cost for a super easy comparison.
In EU, many of the big optician chains sell their contact lenses as private labels, but in reality the contact lenses are well-known brands. On Lenspricer, I've created a big database showing which contact lenses correspond to which, making it easy to compare prices, even for private labels. These private labels are often sold at an even higher mark-up, and I've seen many cases where the savings are 80%+ by buying the original brand online instead.
I started Lenspricer because I discovered that the contact lenses I had been paying $68 per month for, for many years, were being sold online for $32 per month. The exact same contact lenses. When talking to my network, I realized many are paying this and didn't know buying them online was an option. So I set out to make buying them online much easier.
Lenspricer exists for 30 countries, and collects prices from 5-10+ stores in each country. You can select your country in the top right corner.
The monetization plan is to get affiliate deals with the contact lens stores. In the US I don't currently have deals with the majority of stores, but in the EU I'm getting more and more deals.
Let me know what you think! Any and all feedback is super helpful. Thanks :-).
FYI, the organization by "monthly" price, which assumes like contacts for two eyes worn daily, was initially quite confusing for me. I wear a contact in only one eye and I don't wear it in that eye every day, so the monthly price is totally irrelevant to me - I purchase contacts on a per-unit basis. You do list that, but it's smaller and deprioritized.
Thanks for the feedback! I've actually been a bit on the fence about how to show the prices. I show the monthly price because I thought it would be the easiest way to compare across package sizes, and for people to compare against their current subscription price.
I actually had a very first version where I showed the price "per lens". Would that have made it less confusing for you at first glance?
I think price per lens would have been fine, yes. That's ultimately what I care about. But I also think that cost-per-quantity-lens-pack, which you list but deprioritized, is more understandable. Sorting by cost per lens but presenting the cost per pack (30, 90, etc.) is likely the sweet spot.
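That suggestion (sort by cost per lens, show cost per pack) could be sketched roughly like this. The `Offer` type, its fields, and the store names are made up for illustration and are not how the site actually models prices.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    store: str         # hypothetical store name
    pack_size: int     # lenses per pack
    pack_price: float  # price of one pack

    @property
    def price_per_lens(self):
        return self.pack_price / self.pack_size


def rank_offers(offers):
    """Sort offers by price per lens, cheapest first."""
    return sorted(offers, key=lambda o: o.price_per_lens)


def format_offer(offer):
    """Lead with the pack price, keep the per-lens price as a secondary detail."""
    return (f"{offer.store}: {offer.pack_size}-pack for {offer.pack_price:.2f} "
            f"({offer.price_per_lens:.2f} per lens)")
```

With a 30-pack at 27.00 and a 90-pack at 72.00, the 90-pack ranks first because it works out cheaper per lens, while the line shown to the user still leads with the pack price.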