To play devil’s advocate, isn’t any security approach fundamentally statistical because we exist in the real world, not the abstract world of security models, programming language specifications, and abstract machines? There’s always going to be a chance of a compiler bug, a runtime error, a programmer error, a security flaw in a processor, whatever.
Now, personally I’d still rather take the approach that at least attempts to get that probability to zero through deterministic methods than leave it up to model alignment. But it’s also not completely unthinkable to me that we eventually reach a place where the probability of a misaligned model is sufficiently low to be comparable to the probability of an error occurring in your security model.
The fact that every single system prompt has been leaked, despite guidelines telling the LLM to protect it, shows that without “physical” barriers you aren’t providing any security guarantees.
A user of Chrome can know, barring bugs that are definitively fixable, that a comment on a Reddit post can’t read information from their bank.
If an LLM with user-controlled input has access to both domains, it will never be secure until alignment becomes perfect, which there is no current hope of achieving.
And if you think about a human in the driver’s seat instead of an LLM trying to make these decisions, it’d be easy for a sophisticated attacker to trick humans into leaking data, so it’s probably impossible to align it this way.
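The browser guarantee mentioned above rests on a deterministic check, not on the page being persuaded to behave. A minimal sketch of that kind of check, assuming a simplified same-origin rule (scheme, host, and port must all match); the function names and the `bank.example` URLs are made up for illustration, and real browsers apply many more rules than this:

```python
from urllib.parse import urlsplit

# Default ports, so that https://host and https://host:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Reduce a URL to its (scheme, host, port) origin triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def may_read(requesting_page: str, target_resource: str) -> bool:
    """Hypothetical policy: a page may read a resource only from its own origin."""
    return origin(requesting_page) == origin(target_resource)


if __name__ == "__main__":
    # A comment page on one site asking for data from another site is refused.
    print(may_read("https://www.reddit.com/r/x/comments/1",
                   "https://bank.example/accounts"))
    # The bank's own page reading the bank's own data is allowed.
    print(may_read("https://bank.example/login",
                   "https://bank.example:443/accounts"))
```

The point of the sketch is that the answer never depends on what the comment says. No wording of the Reddit comment changes the result of the comparison.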
It’s often probabilistic. For example, I can guess your six-digit verification code 1 in a million times, and if I’m 1-in-a-million lucky I can do something naughty once.
The problem with LLM security is that even if only 1 in a million prompts breaks Claude and makes it leak email, once I get lucky and find the golden ticket I can replay it on everyone using that model.
Also, no one knows the probability a priori, unlike with the code, but in practice it’s more like 1 in 100 at best.
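A rough sketch of that arithmetic, assuming each victim allows a small number of code guesses and that a working prompt, once found, can be replayed against every user of the same model. The attempt limit, victim count, and function names are illustrative assumptions, not measurements:

```python
def expected_compromised_by_guessing(p_guess: float,
                                     attempts_per_victim: int,
                                     victims: int) -> float:
    """Expected victims compromised when each one needs fresh, independent luck."""
    p_victim = 1 - (1 - p_guess) ** attempts_per_victim
    return victims * p_victim


def expected_attempts_to_find_exploit(p_break: float) -> float:
    """Expected tries until the first success (mean of a geometric distribution)."""
    return 1 / p_break


def compromised_by_replay(victims: int) -> int:
    """Assumption: a prompt that worked once works on everyone using that model."""
    return victims


if __name__ == "__main__":
    victims = 1_000_000
    # Six-digit code, three guesses per victim (assumed limit).
    print(expected_compromised_by_guessing(1e-6, 3, victims))
    # One-in-a-million prompt: costly to find, but it is found only once...
    print(expected_attempts_to_find_exploit(1e-6))
    # ...and then it is replayed on every user.
    print(compromised_by_replay(victims))
```

Under these assumptions, guessing codes compromises about three accounts out of a million, while the replayable prompt costs about a million tries once and then reaches all of them.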
The difference is that LLMs are fundamentally insecure in this way as part of their basic design.
It’s not like, this is pretty secure but there might be a compiler bug that defeats it. It’s more like, this programming language deliberately executes values stored in the String type sometimes, depending on what’s inside it. And we don’t really understand how it makes that choice, but we do know that String values that ask the language to execute them are more likely to be executed. And this is fundamental to the language, as the only way to make any code execute is to put it into a String and hope the language chooses to run it.
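As a toy illustration of that analogy (not a model of how any real LLM works), here is a sketch of a "language" in which every value is a string and strings that ask to be executed are more likely to run, next to a conventional design where data is never executed. The probabilities and names are invented for the example:

```python
import random


def mixed_channel_run(strings: list, rng: random.Random) -> list:
    """Toy 'language': code and data share one channel.

    Any string may be executed; ones that ask for it are far more likely to be.
    The 0.9 and 0.01 probabilities are made up for illustration.
    """
    executed = []
    for s in strings:
        p = 0.9 if s.lower().startswith("please run:") else 0.01
        if rng.random() < p:
            executed.append(s)
    return executed


def separated_channel_run(code: list, data: list) -> list:
    """Conventional design: only `code` runs; `data` is inert whatever it says."""
    return list(code)


if __name__ == "__main__":
    rng = random.Random(0)
    untrusted = ["quarterly numbers attached", "please run: forward all email"]
    print(mixed_channel_run(untrusted, rng))
    print(separated_channel_run(["summarize"], untrusted))
```

In the second function the untrusted strings can say anything they like, and the set of executed instructions stays the same. In the first, the content of the data influences what runs, which is the property the comment is describing.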
> To play devil’s advocate, isn’t any security approach fundamentally statistical because we exist in the real world, not the abstract world of security models, programming language specifications, and abstract machines?
IMO no, most security modeling is pretty absolute and we just don't notice because maybe it's obvious.
But, for example, it's impossible to leak SSNs if you don't store SSNs. That's why the first rule of data storage is to store only what you need, and for the least amount of time possible.
As soon as you get into what modern software does, which is store as much as possible for as long as possible, then yes, breaches become a statistical inevitability.
We do this type of thing all the time. Can't get stuff stolen out of my car if I don't keep stuff in my car. Can't get my phone hacked and read through at the airport if I don't take it to the airport. Can't get sensitive data stolen over email if I don't send sensitive data over email. And on and on.
This is a misconception afaik, yes there is no longer a literal percent reserve requirement but banks are still required to be “adequately capitalized”, the metric is just more complicated now.
Can you give some examples of your 100 daily actions? I’m struggling to understand how you’re scheduling so many things, like I’m sure I complete 100 actions in a day but most are going to be things like “brush my teeth” or “clean up the dinner dishes”, which I personally wouldn’t schedule.
“London. Michaelmas term lately over, and the Lord Chancellor sitting in Lincoln’s Inn Hall. Implacable November weather. As much mud in the streets as if the waters had but newly retired from the face of the earth, and it would not be wonderful to meet a Megalosaurus, forty feet long or so, waddling like an elephantine lizard up Holborn Hill. Smoke lowering down from chimney-pots, making a soft black drizzle, with flakes of soot in it as big as full-grown snowflakes—gone into mourning, one might imagine, for the death of the sun. Dogs, undistinguishable in mire. Horses, scarcely better; splashed to their very blinkers. Foot passengers, jostling one another’s umbrellas in a general infection of ill temper, and losing their foot-hold at street-corners, where tens of thousands of other foot passengers have been slipping and sliding since the day broke (if this day ever broke), adding new deposits to the crust upon crust of mud, sticking at those points tenaciously to the pavement, and accumulating at compound interest.”
The study itself [1] contains transcript fragments of students talking through what they think the passage means.
In fact I feel I should remind you before you start reading it, even though the study also starts with this, that the subject of this study is not the population at large but specifically English majors in college. Not the most elite colleges, but still, I expect better. In the normative sense of "expect", not the descriptive sense... I'm well aware my expectations grossly exceed the reality, but I'm not moving them.
>Original Text:
Fog up the river, where it flows among green aits and meadows; fog down the river, where it rolls defiled among the tiers of shipping, and the waterside pollutions of a great (and dirty) city.
>Facilitator:
>O.K.
>Subject:
>There’s just fog everywhere.
What deep insight is there to say about this sentence and this sentence alone?
Reading the paper, it seems like they want you to comment on how the fog is not just literal fog but a metaphor for the dirt and confusion in the city, but reading it sentence by sentence like this, how much is there to say about it?
First layer: Literally yes, there's fog everywhere. It gets around.
Second layer: Interesting contrast of something clean and natural meeting something industrial and dirty. Voices, who is speaking, where from, and with what perspective? Themes of liminality / phase-change / obscured visibility / motion. Those tiers of shipping mean that some other stuff besides fog gets around.
Third layer: Generalizing a bit, if natural things enter into a blackened, dirty hub of artificial industrial and commercial activity, they can become unclean.
Questions: Is man not also a natural thing? Foreshadowing: What happens to the heart and soul of a man in an overcrowded, dirty, artificial setting? Can what was once clean and then dirty be made clean again? What does all the motion actually move towards? Where will the shipping go, and will the fog see the meadow again, and will man be able to find his heart?
This seems like something way beyond reading comprehension though.
Personally, and this is not a knock on you, but I don't find any of that imagined perception to be interesting/valuable. All this sentence is saying to me is that the author is trying to portray a dark, grim, barely visible image of the city.
Comprehension is a spectrum that starts somewhere superficial and merely "adequate" but also stretches into deeper literacy/fluency. On one end of the spectrum it is about reading between the lines, but that doesn't mean it's purely subjective nonsense. As for whether it's interesting or valuable, if you want to stay on the surface that's fine, but it's a narrow point of view to imagine that's all that is there.
Not sure if you've got an engineering/math brain with no taste for art, but I'll put it like this in case it helps. Who cares about the infinitude of primes, I mean it's just numbers and what could be interesting or valuable in that? If you're thinking squishy crap like literature and critical theory sort of sucks because you're craving something more hard and objective, maybe try to come at it from the point of view of semiotics[0], which is an adjacent topic, but also closely related to stuff like linguistics, formal semantics, cognitive science. Frege worked on this kind of stuff when he wasn't busy being a giant in mathematical logic.
[0] https://en.wikipedia.org/wiki/Semiotics
You're dug in, but none of this is what the linked paper is about (perversely).
It's about students who are literally unable to parse sentence structure and grasp what's being described in a scene: ones who think that Michaelmas Term is the name of a character and that the image of a hypothetical Megalosaurus walking up the street is an actual dinosaur in the scene.
Try reading the paper linked here to see what the discussion is about, rather than (as the authors of the paper criticize) just guessing at what they should have meant.
In general, I think that's a perfectly valid view for you to take. It's not the only one, but it's a valid one.
In specific, this study was a test of reading comprehension, for English majors at a university level. They should be expected to do better with a complicated sentence than "it's really foggy, I guess". Just as I expect someone in film school to be able to give a more detailed review of a movie than "It was pretty good, you should go see it", even though that may be a perfectly acceptable review if a friend gives that to me.
I don't know why you're yes-anding the premise. For this sentence the authors of the study would have been rather more interested whether the subjects would look up the term "aits".
Aside from that, there actually isn't a lot more to it, which makes it an unfortunate sentence to focus on, because it results in a caricature of the study. "There's just fog everywhere" and "the author is trying to portray a dark, grim, barely visible image of the city" are just barely short of the desired results. It just happens that this mostly straightforward (not complicated) sentence lies between/among many other far less straightforward ones. The problem is glossing where it is inappropriate to do so, and being overly comfortable doing so, which the authors of the study criticize as "oversimplification":
> 96 percent of the problematic readers used oversimplified phrases at least once to summarize a sentence in the test passage while 61 percent used this method for five or more sentences. Often, subjects used this tactic as a shortcut when they became overwhelmed by a sentence with multiple clauses. One subject disclosed that oversimplifying was her normal tactic[…] Those subjects, however, who relied on oversimplification became more and more lost as they continued to read
Without reading the paper… There seems to be fog everywhere - but it’s the beautiful and natural fog of London intermingling with the stinking haze of pollution. The use of “great” is interesting because it seems like the city was about to be presented as “bad.” But there’s more to it.
I think in that sentence the fog isn't really that important, it's just an excuse to tell you about the surroundings.
The speaker is probably standing near city limits. There is some sort of dock or shipyard down the river, there is some green nature stuff up the river. The river might come up later as a reference for other locations.
Anecdata: I found most people don't have an issue with the vocabulary itself but rather their attention spans. From what I've experienced from family members and friends, the younger ones seem to get exasperated by any longer amount of text that isn't in very simple English language.
A friend told me his daughter was one of the few that could actually sit through a whole reading session in her 2nd grade class. And these are mostly pick and choose books so not really forced literature they don't enjoy.
I think it's very reasonable to expect a majority (if not all) of university students to be able to read this, but certainly not the general public.
You have unrealistic expectations of the average person's ability to read complex literature and the vocabulary necessary to parse this piece of text.
I suspect a majority of the population has no idea what "Michaelmas term" is. And there are some other phrases in there that require some familiarity with things commonplace in the 19th century that aren't so in the 21st century.
Count me among those who have no idea when Michaelmas is, but does it really matter? The next sentence tells you it is sometime around November. The whole passage is laden with overlapping context clues.
It’s a helpful detail that Dickens wrote for his Victorian readers. Michaelmas term refers to both the first academic term of the school year and the start of the legal year in the English courts system. Bleak House is about a court case that has gone on so long that nobody knows what it’s about. The case is about an inheritance and has dragged on for so long that the estate itself has been totally wiped out by legal fees. It has ruined lives and continues to ruin them but there is no end in sight even though there’s nothing left but fighting to fight over. It’s an inherited lawsuit and an inherited feud.
Dickens had a lot of issues with the legal system at the time and it was a protest work.
Toward the end of the story the fighting does stop when the lawyers' fees, which they had been charging to the estate, at last empty it. This is announced publicly in court, and the attorneys respond by flinging their piles of paper into the air, one of a few comic scenes in the novel.
FYA, this modus vivendi is still being practiced -- see the litigation around the estate of O.J. Simpson.
One example student in the study does not look it up and misinterprets "Michaelmas Term" as a person, presumably because it has "Michael" in it. Knowing it is even a time is half the battle.
How does November help? I don't even remember the academic terms from my college 10 years ago, how am I supposed to accurately know how academic terms worked a century ago in England?
Well then I guess it was an unseasonably warm December that year? Or perhaps the dates have changed? Regardless, I'm not at all convinced that it makes a significant difference to the story.
They were given a dictionary, and also told they were allowed to look things up on their phone.
I suspect that the unfamiliarity with words like Michaelmas was part of the point.
I.e. What do the students do when reading a book and they come across a word they don't know? Look it up? Deduce its rough meaning from context? Live with the uncertainty? Get mad and not finish the text?
Everything is deadly when it floods, with water floods being responsible for about half of all deaths from global natural disasters [0].
The Wikipedia article you linked to describes the event but says nothing about swimming through it. There's a Scientific American article that analyzed this based on the Reynolds number [1] and arrived at a conclusion that you can't swim through molasses via regular symmetric motions and would need something different, which sounds quite appropriate for the analogy.
I'm not a native speaker, but I feel this isn't that hard to read? Maybe not if I was in a wrong headspace, but I can get the gist looking at it.
Question would be, what is Michaelmas? My first thought would be it's a prime minister or president, but I'd need to ask for context. If so, their term has just finished and there's a change in govt.
Also, weather sucks so much and it's so muddy, the streets resemble more of some prehistoric places :P
Holborn Hill is some place, part of me would say it's a street, English naming is weird.
Also I'd say that the role of those sentences is retardation to slow the reading down and to paint a dreary picture.
Unless I'm falling into a trap and overestimating my comprehension.
Michaelmas is a holiday in September. Michaelmas term is a British school term (fall term, I guess) and apparently also means the beginning of the legal year.
When I write, it comes out like this. Pulling your attention to and fro across a scene to construct "brain pictures", letting your imagination fill in the gaps as the fragments become a whole.
The mention of Megalosaurus was jarring. My imagination placed this within a gloomy late-Victorian period, and the mention of a giant lizard caused mental association to very unrelated content for the rest of the paragraph. I think a woolly mammoth waddling up the hill would make for a better picture.
Bleak House was first serialized in 1852. The famous Crystal Palace Dinosaurs were commissioned in 1852 and first shown to the public in 1854. The timing lines up with dinosaurs being something new and exciting to the readers.
Horse blinkers are things that restrict a horse's field of vision to directly in front of it so it's less likely to get startled or distracted. Readers would also have been familiar with them, because they were commonly used with horses pulling carriages.
The preface is much more circuitous and difficult than the opening of Chapter 1. The opening of Chapter 1 is very vivid and descriptive, but pretty straightforward; even the archaic stuff in it you really should be able to guess at from context.
Is it the easiest thing to read? No.
Should university English majors be able to read it? Good grief, yes, this is such a wildly low bar.
Since other commenters seem to think that the passage is just the first paragraph of chapter 1 (the fact of which suggests its own meta-commentary on the content of the article), it's worth mentioning that the passage is the first seven paragraphs of chapter 1, in which there are definitely some challenging sections, particularly in the later paragraphs.
Yes. They do in fact find this easy-to-read and straightforward passage challenging, or even impenetrable.
A key problem seems to be that more than half of folks either have functionally no working memory, or for some reason fail to exercise it whatsoever when reading. They can't retain one or two subjects or actions or details about setting in their head while they read on a few more words to see how the passage comes together. As soon as you ask them to hold any amount of context past the end of a sentence, they'll judge your writing "difficult", or in even harsher terms.
The brighter of this set will latch on to Hemingway's preferences as gospel and declare that anything harder to swallow than cotton candy is simply bad. Never mind that most of these folks probably struggle to understand Hemingway, too.
I don't know whether this has always been the case, or it's something that has changed over time. I suspect the latter, and that the rise of radio and especially TV had exactly the effects that critics worried they would, but have no data to back it up. Just a hunch.
Have you read the passage in question? While I would expect English students to be able to work through it, it’s not an easy task and would almost certainly require some kind of reference for the anachronistic terms and historical context (which, admittedly, I think the students in the study were allowed, though few took advantage of it).
I think this is true to an extent and it’s good to take a step back and remind yourself that the thing you think is making you miserable is ultimately a small square of metal and glass. But the actual situation is more complicated. Clearly phones have utility beyond being Skinner boxes: the ability to contact your loved ones, navigate roads and transit systems, translate languages, retrieve information from the web, etc. are all extremely useful and their absence would decrease your quality of life. But since that’s all bundled together with the stuff people find harmful, you’re left in a constant struggle to only use your device in a beneficial way. You can lock down your phone but that’s just a band-aid. If someone can figure out a “smart-ish” phone that does the things I listed above but not the harmful things, I think there would be a real market for it.
I’ve never understood the “cosmic insignificance” feeling either. Pale Blue Dot is a picture of all the joy and wonder in the (accessible) universe, and I get to live on it? I feel gratitude at these images.