More

dogline · 2026-02-17T16:31:16 1771345876

You're right, and it's a good idea. The summary started out small, as a header to the actual daily pages, but then I realized I could have AI do a lot more work here, including silly things like collect weather references and assemble them together. My prompt kept getting bigger to find trends in the data. But, it takes away from the view-ability of the site, which is not good.

LLM's ability to take 7400+ handwritten entries and try to make a narrative out them is amazing. With all of the AI experiments on HN lately, we're figuring out the power of LLMs, but it most cases, it still needs a human refining touch, and we need to remember that. Or else it just looks like AI slop.

anonymous908213 · 2026-02-17T16:53:13 1771347193

I certainly don't think it's a bad thing to try to refine the information into a more digestible form. I think, for example, the dedicated "People", "Places", "Events", and "Map" sections are well-organized and interesting[1]. I would simply prefer if the presentation of this information did not detract from the ability to read the diary itself, as it does on the month pages. I am rather fond of reading historical diaries as part of a general curiosity about the past, and reading the experiences as they were written is as interesting to me if not more so than the aggregate information, personally.

[1] Although, of course, there is the question of reliability. For example, the "Boy Scouts" page says Boy Scouts have 2 mentions, but has references to 3 diary entries! Also, on further examination, Sep 1931 has broken dates (meaning my previous theory about it breaking only after Jul 1941 was wrong), and some pages appear to be out of order.

dogline · 2026-02-17T01:26:04 1771291564

Hadn't thought about it, but will take a look. Also, the two Forestry type links look very interesting. I figure there must be interest in this sort of thing - this is one resource, and the Stirling City Historical Society (Lassen NF) has a bunch of other documents I'd love to digitize soon.

dogline · 2026-02-17T01:23:15 1771291395

I did a quick search, mules are mentioned 75 different times. Like this one at random from Sept 1942: https://forestrydiary.com/page/019bd90a-f176-713f-9999-b14b6...

"Fix up my packs. Load the 2 mules with 225# each. Take the 2 loads to trail camp at Lake Everett, Unload. Have lunch with the Trail cook. Haze mules & ride to 7 1/2 PM."

Horses are mentioned 2586 times. That'd be a whole study on how they're used in the back country. (Edit: horse number is inflated since part of the diary form at one point asks for "Horse Mileage". Will have to refine search).

dogline · 2026-02-16T23:56:39 1771286199

Also, just to clarify, I scanned all 7488 pages in personally (Fujitsu ScanSnap ix500). With Claude's help, I found some undocumented SANE features to auto crop and fix the scans, then had a Python script in Linux auto scan them and put them into a Postgres database as I went. Other scripts would add transcription, summaries, and auto index everything.

"mistral-ocr-latest" did really good handwriting transcription, considering how tight and small some of the handwriting is. Then back to Claude API calls to summarize by month and collect people and places from all of the entires.

Claude then created static html pages from what started as a Flask app. Published on Dreamhost.

dogline · 2026-02-17T00:51:42 1771289502

Oh boy. #3 on front page, 19k page hits in the first hour. 8243 static html pages, 15728 webp images (10k-50k each).

I've never had one of my sites with this much traffic. With everything as static files, website is still holding. Thank you all.

NoiseBert69 · 2026-02-17T16:56:57 1771347417

Before your server catches a fire and burns down the originals: please also send them to archive.org

pstuart · 2026-02-17T05:05:24 1771304724

A fresh training dataset ;-)

dogline · 2026-02-17T06:26:17 1771309577

Yeah. If there are groups that want the high resolution images, talk to me.

HanClinto · 2026-02-17T15:04:34 1771340674

Could consider putting it up as a dataset on Kaggle, perhaps? I would think they'd provide hosting for such things?

Archive.org would be another option as a repository for the high-res scans in an accessible / discoverable location.

zzleeper · 2026-02-17T06:41:43 1771310503

That's amazing!

I'm working on a kinda similar project (documenting bank runs from historical newspapers) and also opted for Claude to build a static website. Crazy that the two sites have a very similar look and feel: https://www.finhist.com/bank-runs/index.html . The only big difference is that mine lacks a map, which I should hopefully fix soon (I already have lat and lon and am linking to google maps).

PS: Do you know if mistral works better at OCRing handwritten text than gemini 3? Was planning on going the gemini3 for another project

dogline · 2026-02-17T16:19:31 1771345171

That's cool! I've noticed when asking for Claude for a website, it does have a certain look, like our two sites, if you don't give it any more guidance. I'm not sure if that's a good thing or not.

Digitizing history in different ways, with different resources that are unique or only known to small groups, might be a new development area, and that's exciting. As I've shown, and how other people have shared, using AI tools to digitize things which haven't previously been done before is now possible. Are there ways to make this easier for everybody? New techniques to discuss? I don't know, and I'd love to talk about it.

Concerning OCR: I used Mistral because of a posting here describing advancements with handwriting recognition a month or so ago. I didn't actually compare them. And I've got my setup that I can rerun everything again later if there are advancements in the area. Again, another area to keep track of and discuss.

zzleeper · 2026-02-17T20:53:25 1771361605

Thanks for the insights! I'll try Mistral as well.. Gemini worked well for me so far but which model is SOTA is changing quite frequently these days

beej71 · 2026-02-17T05:13:08 1771305188

This is great! I love it when people take bits of history that works be forgotten and put them out in the world (to be further vacuumed up by Internet Archive). Thank you for doing it.

dogline · 2026-02-17T07:11:00 1771312260

Beej! Thank you very much! Your networking guides have long been a great contribution to everybody, and collectively improves what we know.

These diary pages come largely from Stirling City, just north of Chico, and later from the Hat Creek district, on Hwy 89 north of Mt. Lassen. Nearby, many historical records were lost in the Paradise Camp Fire, and digitizing some of the records in some of the local museums is something this is a test run for.

  —Lance (CSU Chico ‘93 Computer Engineering)

beej71 · 2026-02-18T16:44:42 1771433082

Very cool! I'm struggling to recall if I ran into you at CSUC, though--close timing. My memory isn't what it used to be. :)

dogline · 2026-02-16T18:19:35 1771265975

I like this. It's rather creative.

dogline · 2026-01-07T00:40:44 1767746444

I have custom scripts I use at home to keep track of various personal data, assisted by an LLM. The idea of using Telegram as a way to have a global, quick, and personal interface from my phone or tablet, is perfect and easy to set up.

Claude is making it easier to have bespoke data and dashboards for anything. We're going to make a lot of them, for all reasons. I've also made apps with Django interfaces, but quick, global interfaces are going to become in demand.

darkwater · 2026-01-07T11:22:38 1767784958

I concur, but I also think that Home Assistant could be used as a rock bed to build many of those dashboards easily. They just need to revert the "go all in on UI first configuration" and keep YAML declarations as first-class citizen to let LLMs easily compose dashboards based on user's desires.

dogline · 2025-12-26T03:00:34 1766718034

Isn’t that how a lot of us learned — buy typing the code out of back of a magazine? Then spending hours trying to debug a typo somewhere.

I didn’t realize how close LLMs are to the old magazines. Let it give you a seed, then use that springboard to learn everything else.

dogline · 2025-11-29T03:42:06 1764387726

Until just now, I've been trying to figure out why people think that JSON is necessary in the database? Yes, lots of data is hierarchical, and you just normalize it into tables and move on. The fact that some people don't work this way, and would like to put this data as it stands into a JSON tree hadn't occurred to me.

What problem does normalization solve? You don't have to parse and run through a tree every time you're looking for data. You would, however, need to rebuild the tree through self joins or other references in other cases, I suppose. It depends how far you break down your data. I understand that we all see data structures a bit differently, however.

dogline · 2025-11-22T00:36:33 1763771793

We'll have to get the old (webrings)[https://en.wikipedia.org/wiki/Webring] back in fashion.

VP2262 · 2025-11-22T00:56:57 1763773017

There's some here https://indieweb.org/webring

smetj · 2025-11-22T15:25:00 1763825100

Exactly! Linkportals and webrings ...

dogline · 2025-08-11T05:21:29 1754889689

You gave an IOCCC snippet as an example of a C99 coding trick you know? I mean, the code looks visually cool, but it's funny to explain a code concept using code shaped like an anime character. (At least that's what I think it is.)

omoikane · 2025-08-11T06:06:59 1754892419

I don't know how to link to just a specific part of the line, but the interesting bit is at the end of line 16 and the start of line 17:

    x*=02//* */2
          -1;

With C89, this is evaluated as "x *= 02 / 2 - 1", or "x *= 0".

With C99, this is evaluated as "x *= 02 / -1", or "x *= -2".