Yes, it feels like we have squeezed most of the performance out of current algorithms and architectures. OpenAI and DeepMind have thrown tremendous compute at the problem with little overall progress (AlphaGo being a special case). There was a big improvement in performance from bringing in function approximators in the form of deep networks, which, as you said, scale up nicely with more data and compute. My view, as an academic in deep RL, is that we are missing some fundamental pieces before the next leap forward. I am uncertain what exactly the solution is, but any improvement in areas like sample efficiency, stability, or task transfer could be quite significant. Personally I'm quite excited about the learning-to-learn line of work.
There is a sense in which it was: out of all the games that have ever been designed, or that it would be logically possible to design, humans selected Go as one of the relatively few to receive sustained attention, in part because it is particularly well suited to the deep neural network that is the visual cortex. So it is not a coincidence that it is also well suited to artificial deep neural networks.
In a nutshell it’s too wasteful in energy spent and it doesn’t even try to mimic natural cognition. As physicists say about theories hopelessly detached from reality - “it’s not even wrong”.
The achievements of RL are so dramatically oversold that it can probably be called the new snake oil.
I'm going to need you to unpack that a bit. Isn't interacting with an environment and observing the result exactly what natural cognition does? What area of machine learning do you feel is closer to how natural cognition works?
Adding to the other comment, it's quite clear that animals, and especially humans, act and learn from many orders of magnitude fewer experiences than pure RL needs, especially for higher-order behaviors. We obviously have systems for inductive and deductive reasoning, heuristics, simple physical intuitions, agent modeling and other such mechanisms that do not resemble ML at all.
Intuitively, it seems likely that these systems were trained by something that looks much like RL over the millions of years of evolution. But that process is obviously not repeated in each individual organism, which is born largely pre-trained.
And for any remaining doubt, the poverty-of-the-stimulus argument should put it to rest, especially for organisms simpler than vertebrates, which can go from egg to functional sensing, moving, eating and predator avoidance in a matter of minutes or hours.
> What area of machine learning do you feel is closer to how natural cognition works?
None. The prevalent ideas in ML are a) "training" a model via supervised learning b) optimizing model parameters via function minimization/backpropagation/delta rule.
There is no evidence for trial-and-error iterative optimization in natural cognition. If you tried to map it onto cognition research, the closest thing would be the behaviorist theories of B.F. Skinner from the 1930s. Those theories of 'reward and punishment' as the primary mechanism of learning have long been discredited in cognitive psychology. It's a black-box, backwards-looking view that disregards the complexity of the problem (the most thorough and influential critique of this approach was Chomsky's, back in the 1950s).
The ANN model that goes back to the McCulloch & Pitts paper is based on neurophysiological evidence available in 1943. The ML community largely ignores the fundamental neuroscience findings discovered since (for a good overview see https://www.amazon.com/Brain-Computations-Edmund-T-Rolls/dp/... )
I don't know if it has to do with arrogance or ignorance (or both), but the way "AI" is currently developed is by inventing arbitrary model contraptions with complete disregard for the constraints and inner workings of living intelligent systems, basically throwing things at the wall until something sticks, instead of learning from nature the way, say, physics does. Saying "but we don't know much about the brain" is just being lazy.
If you're really interested in intelligence, I'd suggest starting with the representation of time and space in the hippocampus via place cells, grid cells and time cells, which form a sort of coordinate system for navigation in both real and abstract/conceptual spaces. This will likely have the same importance for actual AI as the Cartesian coordinate system has in other hard sciences. See https://www.biorxiv.org/content/10.1101/2021.02.25.432776v1
and https://www.sciencedirect.com/science/article/abs/pii/S00068...
And generally, look into memory research in cogsci and neuro; learning and memory are highly intertwined in natural cognition, and you can't really talk about learning before understanding lower-level memory organization, formation and representational "data structures". Here are a few good memory labs to seed your firehose
The place/grid/etc. cells fall generally under the topic of cognitive mapping. And people have certainly tried to use it in AI over the decades, including recently, when the underlying neuroscience won the Nobel Prize. But in the niches where it's an obvious thing to try, if you can't even beat ancient ideas like Kalman and particle filters, people give up and move on. Jobs where you build models that don't do better at anything except show interesting behavior are computational-neuroscience jobs, not machine learning, and they are probably just as rare as any other theoretical science research position.
There is a niche of people trying to combine cognitive mapping with RL, or indeed arguing that old RL methods are actually implemented in the brain. But it looks like they don't have much benefit to show for it in applications. They seem to have no shortage of labor or collaborators at their disposal to build and test models; it certainly must be immensely simpler than rat experiments.
Having said that, yes, I do believe progress can come from considering how nature accomplishes the solution and which major components we are still missing. But common-sense-driven attempts to tack them on have certainly been tried.
For what it’s worth, I agree with this take. But I think RL isn’t completely orthogonal to the ideas here.
The missing component is memory. Once models have memory at runtime, once we get rid of the training/inference separation, they'll be much more useful.
Mosh is great, but it does not support port forwarding. For some people, including myself, this is a deal breaker. It has been an open issue/feature request since 2012[1] and even has a ~600 USD bounty on it[2].
I would also love this feature, but I understand the argument that it's major feature-creep for this project - as I understand, with mosh as implemented it would be difficult to integrate. IIRC they would have to roll some kind of TCP over UDP? I'm not sure, it's been a while since I looked at it. I'm willing to accept a good tool missing features over a poorly maintained tool that at some point had all the features I care about.
And realistically $600 is a pittance compared to the long-term cost of maintaining a feature that has the potential to dramatically increase the size of the code base.
EternalTerminal[1] was mentioned in a comment below. It supports port forwarding, and other goodies like native scrollback, at the cost of latency on laggy connections (because it doesn't do full terminal emulation). If that tradeoff sounds good to you, try it!
Working on research for my thesis, which will be wrapped up by year's end. The current project focuses on improving sample efficiency in deep reinforcement learning; I am researching how best to merge the options framework with the adaptability of meta-reinforcement learning.
In my spare time I write and research algorithmic trading strategies. I've been sticking to traditional techniques, with a toe dipped into statistics for modeling.
With whatever time is left, I've been learning Rust and have enjoyed it quite a bit so far.
From my viewpoint, as both a researcher and someone who has built frameworks around environments/games:
- Each step within the game has to be extremely fast, i.e. the game should be able to run as fast as the machine allows while keeping physics etc. consistent.
- Runnable via library import, with no drawing to the screen.
- Should be easy to reset the environment to an initial state.
- RNG state should be seedable.
- I highly recommend supporting an interface identical to OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.
- Configurable screen resolution would be great (e.g. output 120x100).
- The environment is "hackable", e.g. the maps or levels can be modified or loaded via some ASCII map.
- Should support multiple copies of the game running at once.
- A nice-to-have would be if the current environment state could be exported and loaded later.
- Expose enough information/signals that a reward signal can be created. Or better yet, define one yourself as the game creator.
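To make the checklist concrete, here is a minimal sketch of what the classic gym-style interface (`seed`/`reset`/`step`) looks like. `MyGameEnv` and its toy dynamics are made up for illustration; a real game would plug its own state, physics and reward in:

```python
import numpy as np

class MyGameEnv:
    """Minimal sketch of a gym-style environment. The dynamics and
    reward here are invented placeholders, not a real game."""

    def __init__(self):
        self.rng = np.random.RandomState()
        self.state = None

    def seed(self, seed=None):
        # Make the environment's RNG reproducible (seedable RNG state).
        self.rng = np.random.RandomState(seed)
        return [seed]

    def reset(self):
        # Return the environment to a fixed initial state.
        self.state = np.zeros(4, dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        # Advance one tick; no rendering, so this runs as fast as the CPU allows.
        self.state += self.rng.randn(4).astype(np.float32) * 0.01
        self.state[0] += float(action)
        reward = -abs(float(self.state[0]) - 1.0)  # toy reward signal
        done = bool(abs(self.state[0]) > 10)
        return self.state.copy(), reward, done, {}
```

Because the RNG is seeded explicitly, two environments seeded with the same value produce identical rollouts, and running multiple copies at once is just instantiating the class twice.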
> - Should be easy to reset the environment to an initial state.
Adding on to that, the ability to rewind the game state is a pretty big deal.
The biggest deal for AI researchers, though, is implementing a replay function and format, and publishing plenty of tooling around them (to read and parse them, etc.; at least in Python).
Also, if it's an online game, save the replays serverside and publish them somewhere. Kaggle will be happy to take it I'm sure.
> - I highly recommend supporting an interface identical to OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.
> - Configurable screen resolution would be great (e.g. output 120x100)
I think both of these assume you'll be doing RL from pixels. To support a wider variety of RL/control research, you should be able to get the game state in a structured form, not just a flat vector the way gym does it.
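A quick sketch of what "structured form" could mean: a nested dict of game objects instead of a pixel array or flat vector. The `Game`/`Player`/`Enemy` classes and fields here are hypothetical, just to show the shape:

```python
from dataclasses import dataclass

# Hypothetical game objects, invented for illustration.
@dataclass
class Player:
    x: float
    y: float
    hp: int

@dataclass
class Enemy:
    x: float
    y: float

@dataclass
class Game:
    player: Player
    enemies: list

def get_state(game):
    """Structured observation: nested dict of named fields,
    instead of a flat vector or a rendered frame."""
    return {
        "player": {"x": game.player.x, "y": game.player.y, "hp": game.player.hp},
        "enemies": [{"x": e.x, "y": e.y} for e in game.enemies],
    }
```

Researchers can then flatten or featurize this however their method needs, rather than being stuck reverse-engineering pixels.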
But even then, that's still just one branch of AI research. I've seen people tune how games behave to maximize engagement, and in that setting controlling the player alone is not enough. The work I saw controlled level progression to increase engagement, but you could imagine controlling other parts of the game; this is particularly relevant if your game is not symmetric and the metric you care about is not just building the best AI.
Maybe not AI, but people also do research on how to replace components of games with ML components and the results can be pretty cool, e.g. https://www.youtube.com/watch?v=Ul0Gilv5wvY
Which is just to say that there is no one-size-fits-all approach here.
If the game depends on random events (e.g. an attack does random damage between 3 and 8), it would be useful to be able to make that randomness reproducible, at least optionally.
In addition to the other explanation, check out today's NYT article on how one guy cracked the lottery because of pseudo-random behaviour in the lottery code.
Most random sources are PRNG rather than 'true' random sources, and sometimes it's useful (for debugging, for analysis or just for interest) to be able to use a predictable pattern of otherwise random numbers.
One way is to allow 'seeding' the PRNG, so that the order of the numbers it produces is the same each time, returning the random function to a known state.
Or, by example: I make 5 calls to the PRNG with seed value '0' and see [5, 2, 9, 18, 4, ...], and that sequence causes the agent I'm testing to do something utterly weird. I want to re-run the agent and observe the effect in detail to debug it, and for that I need the same [5, 2, 9, 18, 4, ...] sequence; otherwise I'd be forced to run repeatedly until I happen to observe the same glitch. By re-seeding the PRNG to '0', it will predictably return that sequence rather than a new, random one.
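The same debugging pattern in a few lines of Python, using the standard library's `random` module (the particular numbers drawn depend on the PRNG, so none are hard-coded here):

```python
import random

# Seed the PRNG with '0' and record the run that triggered the glitch.
rng = random.Random(0)
first_run = [rng.randint(1, 20) for _ in range(5)]

# Re-seed: return the PRNG to the known state and replay.
rng.seed(0)
second_run = [rng.randint(1, 20) for _ in range(5)]

# Identical sequence, so the glitch reproduces deterministically.
assert first_run == second_run
```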
It's because most of the randomness used by software is actually pseudorandom, which means you are really drawing from a defined sequence. The sequence's behaviour is close enough to picking random samples from a distribution for the desired application.
The key difference is that it's reproducible, and that if you have insight into the parameters of the sequence (e.g. the seed and the current position in the sequence), you can predict the results. That's why people often get upset when these pseudorandom number generators are used for security purposes.
The seed is a value that is used to generate the sequence. If you use the same seed, you get the same sequence.
Typically when you init a random generator, it'll let you pass a number in if you want to. That will set the sequence of "random" output from the generator; different seeds will be random with respect to each other. If you re-use the same seed you'll get the same sequence of "random" numbers as before. This is useful to test or re-try sequences involving "random" in a reproducible way.
No, they all use the same general principle of backpropagation for training. Different flavours of optimizers exist, with different tweaks and additions to speed training up.
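A toy illustration of "same gradient, different optimizer flavour": here the gradient is computed analytically for a one-dimensional quadratic (in a real deep net, backprop would supply it), and the same gradient is fed to plain SGD and to SGD with momentum. The loss, learning rate and momentum coefficient are all invented for the example:

```python
# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

# Flavour 1: plain SGD. One update rule: w -= lr * grad.
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)

# Flavour 2: SGD with momentum. Same gradient, plus a velocity
# term that accumulates past updates to speed training up.
w_m, v = 0.0, 0.0
for _ in range(100):
    v = 0.9 * v - 0.1 * grad(w_m)
    w_m += v

# Both flavours converge toward the same minimum at w = 3.
```

Adam, RMSProp, etc. follow the same pattern: backprop provides the gradient, and the optimizer differs only in how it turns that gradient into a parameter update.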
So it's not common to use a layer-by-layer training approach for deep nets? I thought that was one of the main things that made a huge difference and enabled the "deep" revolution. Anyways, isn't vanishing gradients still a problem? If so, how do people use these frameworks for deep nets? Otherwise, how is the problem resolved? I thought vanishing gradients was an issue for anything with more than 2 or 3 layers.
Thought I'd share a library I've been using in my personal work and projects. It's an interface around PyGame which makes it painless to start doing RL-based work. I'm sharing it now because I'd like feedback from others on how to adjust the library and make it more useful, and because I was afraid of developing "just-one-more-thing" syndrome and causing further delays.
If you want to start using it right away General Deep Q RL[1] currently supports PLE out of the box.