Trump's previous admin was friendlier towards Poland than Obama's or Biden's. For example, they removed the need to apply for a visa to visit; the process of getting one was time-consuming and frankly humiliating. Another thing is that Trump's insistence on increased military spending by EU countries is also in line with Polish foreign policy, so overall there is a chance, I guess.
I mean, I don't expect Poland to be anywhere high on the priority list, but we already got slapped by Biden's admin, so it can only get better.
So, if I understand correctly, they're using Hodgkin-Huxley LIF neurons but trained end-to-end in a graph neural network. By training the network to reproduce the neural data, it learns the underlying connectivity of the neural system?
This seems very cool, but I'm surprised this kind of thing attracts VC money! I'm also skeptical about how well this would scale given the inherently underdetermined nature of neural recordings, but I've only skimmed the PDF, so I may be missing their goals and approach.
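For what it's worth, the mental model I have is something like the toy version below. This is entirely my guess at the shape of the approach, not their actual architecture: a differentiable leaky-integrate network with a learnable connectivity matrix, fit to recorded activity traces.

    import torch
    import torch.nn as nn

    # Toy leaky-integrate network with a learnable connectivity matrix W.
    # A tanh rate nonlinearity stands in for spiking so everything stays differentiable.
    class FittableLIFNet(nn.Module):
        def __init__(self, n_neurons, tau=10.0):
            super().__init__()
            self.W = nn.Parameter(torch.zeros(n_neurons, n_neurons))   # inferred wiring
            self.decay = nn.Parameter(torch.tensor(1.0 - 1.0 / tau))   # leak per time step

        def forward(self, stim):                   # stim: (time, n_neurons)
            v = torch.zeros(stim.shape[1])
            trace = []
            for x_t in stim:
                v = self.decay * v + x_t + torch.tanh(v) @ self.W      # leak + input + recurrence
                trace.append(v)
            return torch.stack(trace)

    # Fit the recorded activity, then read the inferred connectivity off net.W.
    def fit(net, stim, recorded, steps=1000, lr=1e-2):
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.mse_loss(net(stim), recorded)
            loss.backward()
            opt.step()

The underdetermination worry is exactly that many different W matrices can reproduce the same recorded traces equally well.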
Because the old guard wanted it to remain a cliquey non-profit filled to the brim with EA, AI Alignment, and OpenPhilanthropy types, but the current OpenAI is now an enterprise company.
This is just Sam Altman cleaning house after the attempted corporate coup a year ago.
So let's say I claim that the sun goes in a circle in the sky in the morning. The null hypothesis is that it doesn't do that. Perform experiment. Null hypothesis wins. Write up paper! This is a negative result.
The point is that for every result where the alternative hypothesis wins, there are a massive, if not infinite, number of results where the null hypothesis will win. Are these publishable?
The idea is that some null hypotheses being true is actually interesting because it challenges an assumed belief. From the first paragraph of the article, the immediate feedback from the postdoc's supervisor was 'you did it wrong [because everyone knows that fish do like warmer water]'.
> It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.
> The idea is that some null hypotheses being true is actually interesting because it challenges an assumed belief.
??? As I had said originally, that's one of the primary situations where a negative result should be published.
But the huge, huge, huge majority of negative results are trivial and uninteresting. Thus the fundamental issue with negative results is that you have to provide rather more compelling justification for why such results should be published.
Yeah, I agree with your first point, but maybe misunderstood your reply? If there's nothing "surprising" about the result, it's not interesting, so not publishable. The article's first example, however, did seem to be surprising to the researcher's community, so it should have been published.
Sure. What I said, or had meant to say, was in reply to people complaining that there was some kind of cartel against negative results. Rather, what we're seeing is just the natural, if unfortunate, response to the basic problem with negative results as a whole. You can't treat them the same as positive results: because of their sheer number, they require unusually strong justification for publication.
I don't think that's entirely fair. There does seem to be something distinct about mental disorders - namely, they are defined by the symptoms.
With physical ailments, symptoms are surface manifestations of an underlying physical cause. For example, fever and fatigue in flu are the result of the influenza virus. Crucially, you can have the physical cause without symptoms (such as in the presymptomatic period), so the two are dissociable. Even where the physical cause is unknown, there's still this symptom-cause distinction.
In the case of mental disorders, they're essentially defined by the symptoms. To have depression is to be depressed. To have anxiety is to be anxious. What would it even mean to have depression without being depressed?
While on some level these are physical in a sense (insofar as they result from brain activity), I don't necessarily think we should think about their cause in the same way as a physical ailment.
More to the point, there's a much more obvious sociocultural element to mental disorders. There's no 'objective' line between being sad and being depressed, so while we clearly can and should treat depression, it seems to be very different from other diseases.
None of this is to take away from the real suffering people undergo with these disorders. I just don't think that treating them through a strict biological-pathological lens is as useful as people think it is.
This is what we see historically, in the 19th and early 20th centuries, in "body medicine". We're just at an earlier stage of understanding and treating psychiatric disorders. I assume in a couple hundred years we'll look back at the current state of the art as hopelessly outmoded.
> In the case of mental disorders, they're essentially defined by the symptoms. To have depression is to be depressed. To have anxiety is to be anxious. What would it even mean to have depression without being depressed?
What would it mean to have typhoid without an actual fever? The disease is something else, some causative factor that is not the actual symptom itself, but is no less real for that.
> I just don't think that treating them through a strict biological-pathological lens is as useful as people think it is.
Agreed, but that's with the current state of the art. I would be surprised indeed if by 2080 we didn't have a different lens to view these things through.
I think the mind and body are fundamentally different. With diabetes, you might have a genetic deficiency that interferes with producing insulin, so you have diabetes. You inject insulin and now you don't, more or less.
The mind is much more of a complex system than that.
Well, there are also slower/faster acting insulins so that system is also more complex than that, but yeah you are generally right.
I like to think of it with an analogy: you have a very intricate 3D maze and have to get a tiny ball into a hole, but you can only ever move the maze in one direction, and it resets between tries. Something like an SSRI just pushes everything one way in a big swipe; sure, it might solve a particular puzzle, but there are endless others where such a low-precision method has no way of succeeding. Of course, that's why psychotherapy is a must-have alongside any sort of medication.
> There does seem to be something distinct about mental disorders - namely, they are defined by the symptoms.
There's a pretty long list of idiopathic physical ailments, and there are some mental disorders with specific mechanisms. I get the generalization, but the reality is more jumbled.
But it may very well be slower than just recomputing it, at least for ordinary MHA and even GQA.
So it's either some model-arch voodoo that significantly reduces KV cache size (while keeping roughly the same compute cost), or some really careful implementation that moves the KV cache of upcoming requests to devices in the background [0].
[0] My back-of-envelope calc shows that even then it still doesn't make sense for, say, Llama 3 70B on H100s. Time to stare harder at the TPU spec and try to make sense of it, I guess.
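For reference, the rough shape of that calculation, with every constant below being a ballpark assumption of mine rather than a measured number:

    # All constants are ballpark assumptions, not measurements.
    LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128   # Llama 3 70B (GQA: 8 KV heads)
    BYTES_PER_ELEM = 2                        # fp16/bf16 KV cache
    CTX = 4096                                # tokens of previous context

    kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM   # K and V
    kv_total = kv_per_token * CTX
    print(f"KV cache: {kv_per_token / 1e3:.0f} kB/token, {kv_total / 1e9:.2f} GB total")

    # Option A: ship the cache over the host link (~50 GB/s effective, assumed).
    print(f"transfer: {kv_total / 50e9 * 1e3:.1f} ms")

    # Option B: recompute the prefill, ~2 * params * tokens FLOPs,
    # assuming 8 GPUs at ~400 TFLOPS effective each.
    prefill_flops = 2 * 70e9 * CTX
    print(f"recompute: {prefill_flops / (8 * 400e12) * 1e3:.1f} ms")

Which side wins is very sensitive to the assumed link bandwidth, the achieved MFU, and how much of the transfer actually overlaps with other work.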
It depends on how large the input prompt (previous context) is. Also, if you can keep the cache on the GPU with an LRU mechanism, it's very efficient for certain workloads.
You can also design an API optimized for batch workloads (say, the same core prompt with different data for instruct-style reasoning); that can result in large savings in those scenarios.
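As a toy sketch of the kind of prefix cache I mean, keyed on the shared prompt prefix (names and the eviction policy are illustrative, not from any particular serving stack):

    from collections import OrderedDict
    import hashlib

    # Toy LRU cache mapping a shared prompt prefix to its precomputed KV state.
    class PrefixKVCache:
        def __init__(self, capacity=32):
            self.capacity = capacity
            self._entries = OrderedDict()          # prefix hash -> KV state handle

        def _key(self, prefix):
            return hashlib.sha256(prefix.encode()).hexdigest()

        def get(self, prefix):
            k = self._key(prefix)
            if k not in self._entries:
                return None                        # miss: caller prefills, then calls put()
            self._entries.move_to_end(k)           # mark as most recently used
            return self._entries[k]

        def put(self, prefix, kv_state):
            k = self._key(prefix)
            self._entries[k] = kv_state
            self._entries.move_to_end(k)
            if len(self._entries) > self.capacity:
                self._entries.popitem(last=False)  # evict least recently used

On a hit you skip the prefill for the shared prefix entirely and only run attention over the new suffix, which is where the savings come from.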
If you can pipeline upcoming requests and tie state to a specific request, doesn't that allow you to change how you design physical memory? (at least for inference)
Stupid question, but why wouldn't {extremely large slow-write, fast-read memory} + {smaller, very fast-write memory} be a feasible hardware architecture?
If you know many, many cycles ahead what you'll need to have loaded at a specific time.
Or hell, maybe it's time to go back to memory bank switching.
The throughput of the PCIe link between the CPU and GPU is far less than the aggregate throughput of the internal interconnects between neighbouring tensor cores.
Matrix operations might flow a lot of data around — but that data flow is akin to a bunch of individual people travelling along the individual residential streets they live on. There's a lot of movement there, but also a lot of capacity for movement, because there's no bottleneck of everyone needing to go to the same place or come from the same place.
Persisting the data out of the GPU and then loading it back in, is more like all those people commuting to work and then going back home. Big fan-in onto the PCIe "highway" over to the CPU and into RAM; then big fan-out back. Traffic jams for miles.
In the time it takes to restore a 1GB state snapshot from RAM into VRAM, you can probably chew through the equivalent of 1TB or more of intermediate matrix states.
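Some rough arithmetic behind that, with every figure being a ballpark assumption of mine rather than a spec-sheet number:

    # All figures below are ballpark assumptions, not measurements.
    pcie = 32e9       # ~effective PCIe Gen4 x16, bytes/s
    hbm = 3e12        # ~HBM3 on a recent datacenter GPU, bytes/s
    on_chip = 20e12   # ~aggregate SRAM/register bandwidth, very rough

    restore_s = 1e9 / pcie                                                # 1 GB snapshot over PCIe
    print(f"restore over PCIe: {restore_s * 1e3:.0f} ms")                 # ~31 ms
    print(f"HBM traffic in that window: {restore_s * hbm / 1e9:.0f} GB")  # ~94 GB
    print(f"on-chip traffic in that window: {restore_s * on_chip / 1e12:.2f} TB")  # ~0.6 TB

The exact numbers depend on the link generation and the chip, but the two to three orders of magnitude between the host link and on-chip bandwidth is the point.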
I don't know of any public details on how they implement Context Caching, but that is presumably exactly what they are doing. Just caching the text would save very little.
Normally, the LLM is composed of multiple transformer blocks, where each block consists of (multi-head) attention and fully-connected feedforward components. These are then stacked on top of each other to give the final output of the network.
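In simplified pseudo-PyTorch, that stacking looks roughly like this (a sketch only; real models add token embeddings, positional information, causal masking, KV caching, and an output head):

    import torch.nn as nn

    # One transformer block: multi-head attention + feed-forward, each with a residual.
    class Block(nn.Module):
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))
            self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

        def forward(self, x):
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]   # self-attention + residual
            x = x + self.ff(self.ln2(x))                        # feed-forward + residual
            return x

    # The full model is essentially these blocks stacked.
    model = nn.Sequential(*[Block() for _ in range(12)])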
If you're interested in reading something similar, your description reminds me of The Black Cloud by the astrophysicist Fred Hoyle. I have to say I found the writing quite clumsy, but given his background there was quite a lot of attention to detail in making the plot scientifically realistic (within certain bounds).
I have to say I can't think of a worse outcome than Wikipedia becoming advertising- or subscription-funded.
At best, it will be less usable and more vulnerable to influence once its funding depends on advertisers. And with a subscription model, it would presumably become pay-to-play, which is antithetical to the idea of Wikipedia in the first place.
I also don't agree that Wikipedia has to a significant degree departed from "impartiality, openness and academic freedom", or at least I'd need some sources/examples.