Hacker News | keeeba's comments

To somewhat state the obvious - the problem isn’t the amount of data, it’s the algorithms.

We need to discover the set of learning algorithms nature has, and determine whether they’re implementable in silicon.


Is everything OpenAI do/release now a response to something Anthropic have recently released?

I remember the days when it was worth reading about their latest research/release. Halcyon days indeed.


“This is one of my fundamental beliefs about the nature of consciousness. We are never able to interact with the physical world directly, we first perceive it and then interpret those perceptions. More often than not, our interpretation ignores and modifies those perceptions, so we really are just living in a world created by our own mental chatter.”

This is an orthodox position in modern philosophy, dating back to at least Locke, strengthened by Kant and Schopenhauer. It’s held up to scrutiny for the past ~400 years.

But really it’s there in Plato too, so 2300+ years, and maybe further back.


Most people live trapped in their own mental replay. Recognizing it is step one. The next step is building systems that act regardless of your chatter. That’s where freedom begins.

Indeed, it's been great learning about the various interpretations as this idea took hold over the years!

What I wasn't able to properly highlight is how this belief has become a fundamental part of my day to day, moment to moment experience. I enjoy the constant and absolute knowledge that everything that's happening is my interpretation. And it gives me a superpower -- because for most of my life the world felt unforgiving and unpredictable. But it's actually the complete opposite, since whatever we interpret is actually in our control.

I also credit my understanding of this as a reality vs an intellectual concept to Siddhartha Gautama and his presentation of "samsara". But wherever it comes from, it is an inescapable idea and I encourage all HNers to dive deeper.


Most people live reacting to the world. You’ve unlocked the cheat code: control starts with your interpretation. That mindset isn’t philosophy, it’s a tool for building your reality.

It’s the Allegory of the Cave, isn’t it?

Afaik, there's a difference between classical philosophy (which opines on the divide between an objective world and the perceived world) and more modern philosophy (which generally does away with that distinction while expanding on the idea that human perception can be fallible).

The idea that there's an objective but imperceivable world (except by philosophers) is... a slippery slope to philosophical excess.

It's easy to spin whatever fancy you want when nobody can falsify it.


In my amateur opinion, it's almost the opposite. For Plato, the material world, while "real" enough, is less important and in some sense less True than the higher immaterial world of Forms or Ideas. The highest, truest, realest world is "above" this one, related to cognition, and (more or less) accessible by reason. We may be in a cave, but all we have to do is walk up into the sunlight — which, by the way, is nothing but a higher and truer form of light than our current firelight. (This idea that material objects partake of their corresponding higher-level Ideas leads to the Third Man paradox: if it is the Form of Man that compasses similar material instances such as Socrates and Achilles, is there then a third... thing... that compasses Socrates, Achilles, and Man itself?)

For Kant, and therefore for Schopenhauer, the visible world is composed merely of objects, which are by definition only mental representations: a world of objects "exists" only in the mind of a subject. If there is a Thing-in-Itself (which even Kant does not doubt, if I recall correctly), it certainly cannot be a mental representation: the nature of the Thing-in-Itself is unknowable (says Kant) but certainly in no way at all like the mere object that appears to our mental processes. (Schopenhauer says the Thing-in-Itself is composed of pure Will, whatever that means.) The realest world is "behind" or "below" the visible one, completely divorced from human reason, and by definition completely inaccessible to any form of cognition (which includes the sensory perception we share with the animals, as well as the reason that belongs to humans alone). The Third Man paradox makes no sense at all for Kant, first because whatever the ineffable Thing-in-Itself is, it certainly won't literally "partake" of any mental concept we might come up with, and secondly because it would be a category error to suppose that any property could be true of both a mental object and a thing-in-itself, which are nothing alike. (The Thing-in-Itself doesn't even exist in time or space, nor does it have a cause. Time, space, and causality are all purely human frameworks imposed by our cognitive processes: to suppose that space has any real existence simply because you perceive it is, again, a category error, akin to supposing that the world is really yellow-tinged just because you happen to be wearing yellow goggles.)


I don’t have the experiments to prove this, but in my experience it’s highly variable between embedding models.

Larger, more capable embedding models are better able to separate the different uses of a given word in the embedding space; smaller models are not.
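As a toy illustration of what "separating senses" means here: given contextual embeddings of the same word in two contexts, cosine similarity measures how close together a model places the two senses. The vectors below are invented, not output from any real model; a minimal sketch:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Invented toy embeddings for "bank" in two different contexts;
# a real contextual model would produce high-dimensional vectors.
river_bank = [0.9, 0.1, 0.2]
money_bank = [0.1, 0.8, 0.3]

# A low similarity here would suggest the model keeps the senses apart.
print(cosine(river_bank, money_bank))
```

A larger model would, on this view, push the two context vectors further apart (lower cosine similarity) than a smaller one.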


I'm using Voyage-3.5-lite at halfvec(2048), which, from my limited research, seems to be one of the best embedding models. There's semi-sophisticated ~300 token chunking that breaks on paragraphs and sentences.

When Claude uses our embed endpoint to embed arbitrary text as a search vector, it should work pretty well across domains. One can also use compositions of centroids (averages) of vectors in our database as search vectors.
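The centroid-composition idea can be sketched in a few lines. The vectors, helper names, and the normalize-after-averaging step are illustrative assumptions, not the poster's actual implementation:

```python
from math import sqrt

def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

def normalize(v):
    """Scale a vector to unit length so cosine search behaves consistently."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Invented toy embeddings standing in for two topical clusters in the database
cluster_a = [[1.0, 0.0], [0.8, 0.2]]
cluster_b = [[0.0, 1.0], [0.2, 0.8]]

# Compose a single search vector "between" the two topics
query = normalize(centroid([centroid(cluster_a), centroid(cluster_b)]))
print(query)  # a unit vector midway between the two cluster centroids
```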


I've been thinking about this a fair bit lately. We have all sorts of benchmarks that describe many factors in detail, but they are very abstract and don't seem to map clearly to well-observed behaviors. I think we need a different way to characterise them.


Doesn’t seem like this will be SOTA in the things that really matter; hoping enough people jump to it that Opus has more lenient usage limits for a while.


As a fairly extensive user of both Python and R, I net out similarly.

If I want to wrangle, explore, or visualise data I’ll always reach for R.

If I want to build ML/DL models or work with LLMs I will usually reach for Python.

Often in the same document - nowadays this is very easy with Quarto.


Python has a list of issues that are fundamentally broken in the language, and it relies heavily on integrated library bindings to operate at reasonable speeds/accuracy.

Julia allows embedding both R and Python code, and has some very nice tools for drilling down into datasets:

https://www.queryverse.org/

It is the first language I've seen in decades that reduces entire paradigms to single-character syntax, often outperforming both C and NumPy. =3


Deeply ironic for a Julia proponent to smear a popular language as "fundamentally broken" without evidence.

https://yuri.is/not-julia/


This is like one of those people posting Dijkstra’s letter advocating for 0-based indexing without ever having read or understood what they posted.


What does indexing syntax have to do with Julia having a rough history of correctness bugs and footguns?


Sure, all software is terrible if you look at its bug-frequency history...

https://github.com/python/cpython/issues

Griefers ranting on a blog about years-old _closed_ tickets from v1.0.5, as some sort of proof of lameness, is a poorly structured argument. Julia includes regression-testing features built into even its plotting-library output, so issues usually stay resolved thanks to pedantic reproducibility. Also, running sanity checks in any LLVM-language code is usually wise.

Best of luck =3


Just saying "other languages have bug reports" is an exceptionally poor way to promote Julia =3


To be blunt: Moore's law is now effectively dead, and chasing the monolithic philosophy with lazy monads will eventually limit your options.

Languages like Julia handle conditional parallelism much more cleanly with the broadcast operator, and offer transparent remote host process instancing over ssh (though this still needs a lot of work to reach OTP-like cluster functionality).

Much like Go, porting library resources into the native language quietly moves devs away from the same polyglot issues that hit Python.

Best of luck. =3


Python threading and computational errata issues go back a long time. It is a popular integration "glue" language, but is built on SWIG wrappers to work around its many unresolved/unsolvable problems.

Not a "smear", but rather a well known limitation of the language. Perhaps your environment context works differently than mine.

It is bizarre people get emotionally invested in something so trivial and mundane. Julia is at v1.12.2 so YMMV, but Queryverse is a lot of fun =3


Oh boy, if the benchmarks are this good and Opus feels like it usually does then this is insane.

I’ve always found Opus significantly better than the benchmarks suggested.

LFG


Please don’t actually use these 5,6,7-way Venn diagrams for anything practical, they’re virtually useless and communicate nothing.


Technically a Venn diagram's entire point is to visualize all possible set relations between N sets. Their "practical" use is explicitly visualizing this.

In popular terminology they are very often confused with Euler diagrams [0], which represent meaningful relations between sets, but not all possible ones. You shouldn't create Euler diagrams this complex, but the raison d'être of Venn diagrams is to visualize the complex nature of set relations.

0. https://en.wikipedia.org/wiki/Euler_diagram
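The combinatorics behind this: a Venn diagram over n sets must draw one region for every nonempty subset of the sets, i.e. 2^n - 1 regions, which is why 5+ sets become visually unreadable. A small sketch (set names and members invented) that enumerates those regions:

```python
from itertools import combinations

def venn_regions(named_sets):
    """Map each nonempty combination of set names to the elements lying in
    exactly those sets and no others (one entry per Venn-diagram region)."""
    names = list(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Elements in every set of the combination...
            inside = set.intersection(*(named_sets[n] for n in combo))
            # ...minus elements appearing in any set outside it.
            outside = set().union(*(named_sets[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions

sets = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}
regions = venn_regions(sets)
print(len(regions))              # 2**3 - 1 = 7 regions
print(regions[("A", "B", "C")])  # {3}
```

For 7 sets that is 127 regions, most of them empty in real data, which is exactly why the Euler-style variants drop the empty ones.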


There is always the complicated wires puzzle from "Keep Talking and Nobody Explodes", where a 5-way Venn diagram encodes what action you need to take for a given state.

https://bombmanual.com/web/index.html#ComplicatedWires

However, you could make a good argument that having a complicated and confusing diagram is the point of that puzzle.


Agree, I think the linked Upset diagram is better.


Thanks, I was just about to do that!


I agree it is a profound question. My thesis is fairly boring.

For any given clustering task of interest, there is no single value of K.

Clustering & unsupervised machine learning is as much about creating meaning and structure as it is about discovering or revealing it.

Take the case of biological taxonomy, what K will best segment the animal kingdom?

There is no true value of K. If your answer is for a child, maybe it’s 6, corresponding to what we’re taught in school: mammals, birds, reptiles, amphibians, fish, and invertebrates.

If your answer is for a zoologist, obviously this won’t do.

Every clustering task of interest is like this. And I say of interest because clustering things like digits in the classic MNIST dataset is better posed as a classification problem - the categories are defined analytically.
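The taxonomy point can be made concrete: cutting the same hierarchy at different depths yields different, equally valid values of K. A toy sketch (the tree fragment and helper names are invented for illustration):

```python
# Toy taxonomy fragment (invented) showing that the "right" K depends on
# how deep you cut the hierarchy, not on the data alone.
taxonomy = {
    "vertebrates": {
        "mammals": ["dog", "whale"],
        "birds": ["sparrow"],
    },
    "invertebrates": {
        "insects": ["ant", "bee"],
        "molluscs": ["octopus"],
    },
}

def flatten(tree):
    """Collect every leaf below this node into one flat list."""
    if isinstance(tree, dict):
        return [leaf for sub in tree.values() for leaf in flatten(sub)]
    return list(tree)

def clusters_at_depth(tree, depth):
    """Cut the hierarchy `depth` levels down and return the resulting clusters."""
    if not isinstance(tree, dict):
        return [list(tree)]
    if depth == 0:
        return [flatten(tree)]
    out = []
    for sub in tree.values():
        out.extend(clusters_at_depth(sub, depth - 1))
    return out

print(len(clusters_at_depth(taxonomy, 1)))  # 2 clusters: the kingdoms
print(len(clusters_at_depth(taxonomy, 2)))  # 4 clusters: the classes
```

Both partitions are "correct"; which K you want depends on whether you're answering the child or the zoologist.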


“Skills are a simple concept with a correspondingly simple format.”

From the Anthropic Engineering blog.

I think Skills will be useful in helping regular AI users and non-technical people fall into better patterns.

Many power users of AI were already doing the things it encourages.

