pmaze's comments | Hacker News

The connections are meaningful to me in so far as they get me thinking about the topics, another lens to look at these books through. It's a fine balance between being trivial and being so out there that it seems arbitrary.

A trail that hits that balance well IMO is https://trails.pieterma.es/trail/pacemaker-principle/. I find the system theory topics the most interesting. In this one, I like how it pulled in a section from Kitchen Confidential in between oil trade bottlenecks and software team constraints to illustrate the general principle.


Can you walk me through some of the insights you gained? I've read several of those books, including Kitchen Confidential and Confessions of an Economic Hit Man, and I don't see the connection that the LLM (or you) is trying to draw. What is the deeper insight into these works that I am missing?

I'm not familiar with the term "Pacemaker Principle", and a Google search was unhelpful. What does it mean in this context? What else does this general principle apply to?

I'm perfectly willing to believe that I am missing something here. But reading through many of the supportive comments, it seems more likely that this is an LLM Rorschach test, where we are given random connections and asked to do the mental work of inventing meaning in them.

I love reading. These are great books. I would be excited if this tool actually helps point out connections that have been overlooked. However, it does not seem to do so.


> Can you walk me through some of the insights you gained?

This made me realize that so many influential figures have either absent fathers, or fathers who berated them or didn't give them their full trust/love. I think there's something to the idea that this commonality is more than coincidence. (That's the only topic on the site I've read through so far, and I ignored the highlighted word connections.)


> we are given random connections and asked to do the mental work of inventing meaning in them

How is that different from having an insight yourself and later doing the work to see if it holds on closer inspection?


Don't ask me to elaborate on this, because it's kinda nebulous in my mind. I think there's a difference between arriving at an insight and interrogating it on your own initiative, and being handed the same insight ready-made.


I don't doubt there is a difference in the mechanism of arriving at a given connection. What I don't think is possible is to distinguish the connection that someone made intuitively after reading many sources from the one that the AI makes, because both will have to undergo scrutiny before being accepted as relevant. We can argue there could be a difference in quality, depth and search space, maybe, but I don't think there is an ontological difference.


The one that you thought of in the shower has a much greater chance of being right, and also of being relevant to you.


Has it? Why?


Because humans aren't morons tasked with coming up with 100 connections.


That doesn't explain why a connection made in the shower has inherently more merit than a connection an LLM was instructed to come up with.


Not sure how to make it clearer. Look at the quality of this post, and compare it to your shower thoughts. I imagine you're not as stupid as the machine was.


I ended up judging where to draw the line. Its initial suggestions were genuinely useful and focused on making the basic tool use more efficient: complaining about a missing CLI parameter that I'd neglected to add for a specific command, requesting to navigate the topic tree in ways I hadn't considered, or new definitions for related topics. After a couple of iterations the low-hanging fruit was exhausted, and its suggestions started spiralling out beyond what I thought would pay off (like training custom embeddings). As long as I kept asking it for new ideas, it would come up with something, but with rapidly diminishing returns.


The names & descriptions definitely have that distinct LLM flavour to them, regardless of which model I used. I decided to keep them, but as short as possible. In general, I find the recombination of human-written text to be the main interest.

There are two stages to the linking: first juxtaposing the excerpts, then finding and linking key phrases within them. I find the excerpts themselves often have interesting connections between them, but the key phrases can be a bit out there. The "fictions" to "internal motives" one does gel for me, given the theme of deceiving ourselves about our own motivations.


https://hnbooks.pieterma.es

I scraped HN's 1000 most mentioned books and visualised them. This month I used a new embedding model (Nomic), switched out UMAP for PaCMAP, and added automatic cluster labelling.

The clustering and dimensionality reduction aren't quite as stable as I'd like, but most seeds give decent results now.


Love it! Thanks for building this, was looking for book recos.


This is awesome! Thank you for this project


I did; there was a first round of UMAP down to 50 dimensions. Running HDBSCAN on the full embeddings gave bad results: lots of singleton clusters.


Interesting, I got the opposite result: the full embeddings gave two or three clusters. How did you work with the hyperparameters of HDBSCAN?


The crash was indeed not intended - my mistake! Should be fixed now.

You've got the cluster semantics spot on, to be honest. Broad genres are grouped together, with a tendency for sub-genres to be grouped locally within those.

There is no interpretation of the overall shapes or the global structure, those are more a result of a particular UMAP run than inherent in the data.

Would love to provide different views on it and go more in depth next, thanks for the suggestion.


IMO, evolution over time is a great place to start.


Hey, thanks for reporting - this is fixed now. I messed up the static build and some browsers freaked out. By the law of showing things publicly, I of course only tested in a browser that didn't. Hope you can give it another chance!


My apologies for that! First time deploying SvelteKit to Cloudflare Pages, and I messed up the static build. Should be fixed now, hope you can give it another shot.


Thanks!

The cluster memberships that come out of the first round are distributions over the different clusters, e.g. a given book is weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger distance is well-suited to quantify the difference between two distributions like that. Cosine similarity and Euclidean distance worked as well, but Hellinger gave subjectively nicer results.
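For the curious, the Hellinger distance between two discrete distributions is H(p, q) = ||sqrt(p) - sqrt(q)||₂ / sqrt(2), bounded in [0, 1]. A minimal sketch, using the 0.8/0.2 membership example above (the second distribution is made up for contrast):

```python
# Hellinger distance between two discrete probability distributions,
# here soft cluster memberships summing to 1.
import numpy as np

def hellinger(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Book weighted 0.8/0.2 over clusters A and B vs. one weighted 0.6/0.4:
print(hellinger([0.8, 0.2], [0.6, 0.4]))
print(hellinger([1.0, 0.0], [1.0, 0.0]))  # identical memberships → 0.0
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # disjoint memberships → 1.0
```

Unlike cosine similarity, Hellinger is a proper metric on distributions and respects their bounded, sum-to-one geometry, which may be why it gave subjectively nicer results here.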

Very interesting question, I'm not sure! While developing, I noticed that the systems thinking books were spread over different genres, which I found quite pleasing. However, I'm not sure if other books were even more diffuse. I'll have to dig back in and find out :)

