Yes, it feels like we have squeezed most of the performance out of current algorithms and architectures. OpenAI and DeepMind have thrown tremendous compute at the problem with little overall progress (AlphaGo being a special case). There was a big improvement in performance from bringing in function approximators in the form of deep networks, which, as you said, scale up nicely with more data and compute. My view, as an academic in deep RL, is that we are missing some fundamental pieces before the next leap forward. I am uncertain what exactly the solution is, but any improvement in areas like sample efficiency, stability, or task transfer could be quite significant. Personally I'm quite excited about the learning-to-learn line of work.
There is a sense in which it was: out of all the games that have ever been designed, or that it would be logically possible to design, humans selected Go as one of the relatively few to receive sustained attention, in part because it is particularly well suited to the deep neural network that is the visual cortex. So it is not a coincidence that it is also well suited to artificial deep neural networks.
In a nutshell it’s too wasteful in energy spent and it doesn’t even try to mimic natural cognition. As physicists say about theories hopelessly detached from reality - “it’s not even wrong”.
The achievements of RL are so dramatically oversold that it can probably be called the new snake oil.
I'm going to need you to unpack that a bit. Isn't interacting with an environment and observing the result exactly what natural cognition does? What area of machine learning do you feel is closer to how natural cognition works?
Adding to the other comment, it's quite clear that animals, and especially humans, act and learn from many orders of magnitude fewer experiences than pure RL needs, especially for higher-order behaviors. We obviously have systems for inductive and deductive reasoning, heuristics, simple physical intuitions, agent modeling and other such mechanisms that do not resemble ML at all.
Intuitively, it seems likely that these systems were trained by something that looks much like RL over the millions of years of evolution. But that process is obviously not repeated in each individual organism, which is born largely pre-trained.
And for any remaining doubt, the poverty-of-the-stimulus argument should put it to rest, especially for organisms simpler than vertebrates, which can go from egg to functional sensing, moving, eating and predator avoidance in a matter of minutes or hours.
> What area of machine learning do you feel is closer to how natural cognition works?
None. The prevalent ideas in ML are a) "training" a model via supervised learning b) optimizing model parameters via function minimization/backpropagation/delta rule.
There is no evidence for trial-and-error iterative optimization in natural cognition. If you tried to map it onto cognition research, the closest thing would be the behaviorist theories of B.F. Skinner from the 1930s. Those theories of 'reward and punishment' as the primary mechanism of learning have long been discredited in cognitive psychology. It's a black-box, backwards-looking view that disregards the complexity of the problem (the most thorough and influential critique of this approach was Chomsky's, back in the 1950s).
The ANN model that goes back to the McCulloch & Pitts paper is based on neurophysiological evidence available in 1943. The ML community largely ignores the fundamental neuroscience findings discovered since (for a good overview see https://www.amazon.com/Brain-Computations-Edmund-T-Rolls/dp/... )
I don't know if it has to do with arrogance or ignorance (or both), but the way "AI" is currently developed is by inventing arbitrary model contraptions with complete disregard for the constraints and inner workings of living intelligent systems, basically throwing things at the wall until something sticks, instead of learning from nature the way, say, physics does. Saying "but we don't know much about the brain" is just being lazy.
If you're really interested in intelligence, I'd suggest starting with the representation of time and space in the hippocampus via place cells, grid cells and time cells, which form a sort of coordinate system for navigation in both real and abstract/conceptual spaces. This will likely have the same importance for actual AI as the Cartesian coordinate system has in other hard sciences. See https://www.biorxiv.org/content/10.1101/2021.02.25.432776v1
and https://www.sciencedirect.com/science/article/abs/pii/S00068...
And generally, look into memory research in cogsci and neuro; learning and memory are highly intertwined in natural cognition, and you can't really talk about learning before understanding lower-level memory organization, formation and representational "data structures". Here are a few good memory labs to seed your firehose
The place/grid/etc. cells fall generally under the topic of cognitive mapping. And people have certainly tried to use it in AI over the decades, including recently, when the underlying neuroscience won the Nobel Prize. But in the niches where it's an obvious thing to try, if you can't even beat ancient ideas like Kalman and particle filters, people give up and move on. Jobs where you build models that don't do better at anything except show interesting behavior are computational-neuroscience jobs, not machine learning, and they are probably just as rare as any other theoretical science research position.
There is a niche of people trying to combine cognitive mapping with RL, or indeed arguing that old RL methods are actually implemented in the brain. But it looks like they don't have much benefit to show for it in applications. They seem to have no shortage of labor or collaborators at their disposal to build and test models; it certainly must be immensely simpler than rat experiments.
Having said that, yes, I do believe progress can come from considering how nature accomplishes the solution and which major components we are still missing. But common-sense-driven attempts to tack them on have certainly been tried.
For what it’s worth, I agree with this take. But I think RL isn’t completely orthogonal to the ideas here.
The missing component is memory. Once models have memory at runtime, once we get rid of the training/inference separation, they'll be much more useful.
Mosh is great, but it does not support port forwarding. For some people, including myself, this is a deal breaker. It has been an open issue/feature request since 2012[1] and even has a ~600 USD bounty on it[2].
I would also love this feature, but I understand the argument that it's major feature-creep for this project - as I understand, with mosh as implemented it would be difficult to integrate. IIRC they would have to roll some kind of TCP over UDP? I'm not sure, it's been a while since I looked at it. I'm willing to accept a good tool missing features over a poorly maintained tool that at some point had all the features I care about.
And realistically $600 is a pittance compared to the long-term cost of maintaining a feature that has the potential to dramatically increase the size of the code base.
EternalTerminal[1] was mentioned in a comment below. It supports port forwarding, and other goodies like native scrollback, at the cost of latency on laggy connections (because it doesn't do full terminal emulation). If that tradeoff sounds good to you, try it!
Working on research for my thesis, which will be wrapped up by year's end. The current project focuses on improving sample efficiency in deep reinforcement learning; I am researching how best to merge the options framework with the adaptability of meta-reinforcement learning.
In my spare time I write and research algorithmic trading strategies. I've been sticking to traditional techniques, with a toe dipped into statistics for modeling.
With whatever time is left, I've been learning Rust and have enjoyed it quite a bit so far.
From my viewpoint, as both a researcher and someone who has built frameworks around environments/games:
- Each step within the game has to be extremely fast, i.e. the game should be able to run as fast as the machine allows while keeping physics etc. consistent.
- Runnable via library import, with no drawing to the screen.
- Should be easy to reset the environment to an initial state.
- RNG state should be seedable.
- I highly recommend supporting an interface identical to OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.
- Configurable screen resolution would be great (e.g. output 120x100).
- The environment is "hackable", e.g. the maps or levels can be modified or loaded via some ASCII map.
- Should support multiple copies of the game running at once.
- A nice-to-have would be if the current environment state could be exported and loaded later.
- Expose enough information/signals that a reward signal can be created. Or better yet, define one yourself as the game creator.
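To make the checklist concrete, here is a minimal sketch of what the classic gym-style interface (`seed`/`reset`/`step`) looks like. `MyGameEnv` and its toy dynamics are made up for illustration; a real game would plug its own state, physics and reward in:

```python
import numpy as np

class MyGameEnv:
    """Minimal sketch of a gym-style environment. The dynamics and
    reward here are invented placeholders, not a real game."""

    def __init__(self):
        self.rng = np.random.RandomState()
        self.state = None

    def seed(self, seed=None):
        # Make the environment's RNG reproducible (seedable RNG state).
        self.rng = np.random.RandomState(seed)
        return [seed]

    def reset(self):
        # Return the environment to a fixed initial state.
        self.state = np.zeros(4, dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        # Advance one tick; no rendering, so this runs as fast as the CPU allows.
        self.state += self.rng.randn(4).astype(np.float32) * 0.01
        self.state[0] += float(action)
        reward = -abs(float(self.state[0]) - 1.0)  # toy reward signal
        done = bool(abs(self.state[0]) > 10)
        return self.state.copy(), reward, done, {}
```

Because the RNG is seeded explicitly, two environments seeded with the same value produce identical rollouts, and running multiple copies at once is just instantiating the class twice.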
> - Should be easy to reset the environment to an initial state.
Adding on to that, the ability to rewind the game state is a pretty big deal.
The biggest deal for AI researchers, though, is implementing a replay function and format, and publishing plenty of tooling around them (to read and parse them, etc.; at least in Python).
Also, if it's an online game, save the replays serverside and publish them somewhere. Kaggle will be happy to take it I'm sure.
> - I highly recommend supporting an interface identical to OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.
> - Configurable screen resolution would be great (e.g. output 120x100)
I think both of these assume you'll be doing RL from pixels. To support a wider variety of RL/control research, you should be able to get the game state in a structured form, not just a flat vector the way gym does it.
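A quick sketch of what "structured form" could mean: a nested dict of game objects instead of a pixel array or flat vector. The `Game`/`Player`/`Enemy` classes and fields here are hypothetical, just to show the shape:

```python
from dataclasses import dataclass

# Hypothetical game objects, invented for illustration.
@dataclass
class Player:
    x: float
    y: float
    hp: int

@dataclass
class Enemy:
    x: float
    y: float

@dataclass
class Game:
    player: Player
    enemies: list

def get_state(game):
    """Structured observation: nested dict of named fields,
    instead of a flat vector or a rendered frame."""
    return {
        "player": {"x": game.player.x, "y": game.player.y, "hp": game.player.hp},
        "enemies": [{"x": e.x, "y": e.y} for e in game.enemies],
    }
```

Researchers can then flatten or featurize this however their method needs, rather than being stuck reverse-engineering pixels.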
But even then, that's still just one branch of AI research. I've seen people tune how games behave to maximize engagement, and in that setting controlling the player alone is not enough. The work I saw controlled level progression to increase engagement, but you could imagine controlling other parts of the game; this is particularly relevant if your game is not symmetric and the metric you care about is not just building the best AI.
Maybe not AI, but people also do research on how to replace components of games with ML components and the results can be pretty cool, e.g. https://www.youtube.com/watch?v=Ul0Gilv5wvY
Which is just to say that there is no one-size-fits-all approach here.
If the game depends on random events (e.g. an attack does random damage between 3 and 8), it would be useful to be able to make that randomness reproducible, at least optionally.
In addition to the other explanation, check out today's NYT article on how one guy cracked the lottery because of pseudo-random behaviour in the lottery code.
Most random sources are PRNG rather than 'true' random sources, and sometimes it's useful (for debugging, for analysis or just for interest) to be able to use a predictable pattern of otherwise random numbers.
One way is to allow 'seeding' the PRNG, so that the order of the numbers it produces is the same each time, returning the random function to a known state.
Or, by example: I make 5 calls to the PRNG with seed value '0' and see [5, 2, 9, 18, 4, ...], and that sequence causes the agent I'm testing to do something utterly weird. I want to re-run the agent and observe the effect in detail to debug it, and for that I need the same [5, 2, 9, 18, 4, ...] sequence; otherwise I'd be forced to run repeatedly until I happen to observe the same glitch. By re-seeding the PRNG to '0', it will predictably return that sequence rather than a new, random one.
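The same debugging pattern in a few lines of Python, using the standard library's `random` module (the particular numbers drawn depend on the PRNG, so none are hard-coded here):

```python
import random

# Seed the PRNG with '0' and record the run that triggered the glitch.
rng = random.Random(0)
first_run = [rng.randint(1, 20) for _ in range(5)]

# Re-seed: return the PRNG to the known state and replay.
rng.seed(0)
second_run = [rng.randint(1, 20) for _ in range(5)]

# Identical sequence, so the glitch reproduces deterministically.
assert first_run == second_run
```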
It's because most of the randomness used by software is actually pseudorandom, which means you are really drawing from a defined sequence. The sequence's behaviour is close enough to picking random samples from a distribution for the desired application.
The key difference is that it's reproducible, and that if you have insight into the parameters of the sequence (e.g. the seed and the current position in the sequence), you can predict the results. That's why people often get upset when these pseudorandom number generators are used for security purposes.
The seed is a value that is used to generate the sequence. If you use the same seed, you get the same sequence.
Typically when you init a random generator, it'll let you pass a number in if you want to. That will set the sequence of "random" output from the generator; different seeds will be random with respect to each other. If you re-use the same seed you'll get the same sequence of "random" numbers as before. This is useful to test or re-try sequences involving "random" in a reproducible way.
No, they all use the same general principle of backpropagation for training. Different flavours of optimizers exist, with different tweaks and additions to speed training up.
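A toy illustration of "same gradient, different optimizer flavour": here the gradient is computed analytically for a one-dimensional quadratic (in a real deep net, backprop would supply it), and the same gradient is fed to plain SGD and to SGD with momentum. The loss, learning rate and momentum coefficient are all invented for the example:

```python
# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

# Flavour 1: plain SGD. One update rule: w -= lr * grad.
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)

# Flavour 2: SGD with momentum. Same gradient, plus a velocity
# term that accumulates past updates to speed training up.
w_m, v = 0.0, 0.0
for _ in range(100):
    v = 0.9 * v - 0.1 * grad(w_m)
    w_m += v

# Both flavours converge toward the same minimum at w = 3.
```

Adam, RMSProp, etc. follow the same pattern: backprop provides the gradient, and the optimizer differs only in how it turns that gradient into a parameter update.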
So it's not common to use a layer-by-layer training approach for deep nets? I thought that was one of the main things that made a huge difference and enabled the "deep" revolution. Anyways, isn't vanishing gradients still a problem? If so, how do people use these frameworks for deep nets? Otherwise, how is the problem resolved? I thought vanishing gradients was an issue for anything with more than 2 or 3 layers.
Thought I'd share a library I've been using in my personal work and projects. It's an interface around PyGame which makes it painless to start doing RL-based work. I'm sharing it now because I'd like feedback from others on how to adjust the library and make it more useful, and because I was afraid of developing "just-one-more-thing" syndrome and causing further delays.
If you want to start using it right away General Deep Q RL[1] currently supports PLE out of the box.