Deep Reinforcement Learning to Play StarCraft (arxiv.org)
116 points by wbthomason on Sept 14, 2016 | hide | past | favorite | 50 comments


Generalized reasoning in strategy-game AI is an especially difficult problem for machine learning. IRL, top StarCraft players routinely model their opponents' mental states and psychology to create an edge.

So it's perhaps worth pointing out that this paper specifically addresses a sub-problem of StarCraft play: micromanagement ('micro') [1]

The game engine runs at 24 frames per second. (As an aside, 'frames' in this context likely does not map to physical FPS of the display).

>We ran all the following experiments with a skip_frames of 9 (meaning that we take about 2.6 actions per unit per second).

The research team found that attempting to act at a superhuman pace (e.g. one action every frame) resulted in subpar performance, and hyperparameter search indicated ~2.6 actions per second to be ideal.

In context, this translates to an APM of 156. Or, roughly half that of professional Korean e-athletes. [2]
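The arithmetic behind those figures, as a back-of-the-envelope sketch using the thread's numbers (not code from the paper):

```python
# The paper reports ~2.6 actions per unit per second (24 engine fps with
# a skip_frames of 9). Converting to the familiar APM figure:
actions_per_unit_per_sec = 2.6
apm_per_unit = actions_per_unit_per_sec * 60   # ~156 APM for a single unit

# The figure is per *unit*, so total APM scales with the number of units
# being controlled independently:
apm_5_units = apm_per_unit * 5                 # ~780 APM for five units

print(apm_per_unit, apm_5_units)
```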

[1] https://en.wikipedia.org/wiki/Micromanagement_(gameplay)

[2] https://en.wikipedia.org/wiki/Actions_per_minute


"The researchers found that attempting to move at a superhuman pace (eg one action every frame), resulted in a subpar performance."

Moving at extremely fine-grained timesteps can make learning much more difficult, because now a reward arrives millions of timesteps delayed rather than hundreds or thousands. It's like trying to teach a NN to compose piano music by starting down at the 1ms raw audio level. This is part of why audio synthesis was so difficult up until recently with DeepMind's WaveNet. In theory, being able to move every frame should enable extremely superhuman performance, but in practice, you can't learn your way there. So often people will chunk data to make it easier to learn the higher-level concepts: operate on words, rather than characters, for example.
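The chunking idea can be sketched as a generic action-repeat (frame-skip) wrapper. This is my illustration, not the paper's code; `DummyEnv` and the `step` signature are made up for the demo:

```python
class FrameSkip:
    """Repeat each chosen action for k engine frames, summing the reward.

    The agent then decides once per k frames, so a delayed reward is only
    hundreds of *decision* steps away instead of thousands of raw frames.
    """
    def __init__(self, env, k=9):
        self.env, self.k = env, k

    def step(self, action):
        total_reward, obs, done = 0.0, None, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done


class DummyEnv:
    """Stand-in environment: 1 reward per engine frame, never terminates."""
    def step(self, action):
        return None, 1.0, False


env = FrameSkip(DummyEnv(), k=9)
obs, r, done = env.step("attack")
print(r)  # 9.0 -- one decision collects nine frames' worth of reward
```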


Why not go the other way and decrease the actions per minute so it learns the overall point of the game, and with each game the actions per minute increases?


Or maybe extend the traditional categories of macro and micro with another one, call it 'nano'... the micro agent indicates where each unit ought to be in 9 frames, and the nano agent figures out how to take them there. Since the timescale is so short, the agent could brute-force enumerate possible moves to some extent and figure out which is optimal, like chess AI. Or use a separate network.

I guess that's inelegant when a deep network already has its own concept of fine-grained versus coarse-grained layers, and should be able to do this on its own with the right training method.
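The brute-force side of that 'nano' idea can be sketched as a toy (all names and the grid setup here are my own illustration): enumerate every short move sequence and keep the one that ends nearest the micro layer's waypoint.

```python
from itertools import product

# Possible per-frame moves: stay put or step one cell in a cardinal direction.
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def nano_plan(pos, target, horizon=3):
    """Brute-force every move sequence of length `horizon` (5**3 = 125 here)
    and return the one whose endpoint lands closest to the target."""
    best_seq, best_d = None, float("inf")
    for seq in product(MOVES, repeat=horizon):
        x, y = pos
        for dx, dy in seq:
            x, y = x + dx, y + dy
        d = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d < best_d:
            best_d, best_seq = d, seq
    return best_seq

# The 'micro' layer says: be at (2, 1) in three frames; 'nano' finds a path.
plan = nano_plan((0, 0), (2, 1))
print(plan)
```

Real unit movement would need collision and pathing checks, but the exhaustive search stays feasible exactly because the horizon is so short.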


That sounds like an interesting research angle. The thing about AI research is there are so many open ends there are essentially unlimited research options. If you can pose it as a problem and identify a reasonable programming approach then you have an avenue for AI research. Deep Learning isn't the end of AI research. It is the beginning.


APM numbers are actions-per-minute, not the actions-per-unit-per-minute measurement in which the AI's performance is reported. It's not clear how the two map to each other -- the AI may well be exceeding human performance when multiple unit groups are considered.


That's a good point; it's not really correct to conflate the two.

It would then seem to suggest that forces with more than, say, 5 units (780 average sustained APM) would likely be getting into superhuman territory.


I am not a gamer... but I have to ask. 780 APM is possible with human players? 13 actions per second? I can't come close to clicking my mouse that fast. I don't think I can blink that fast.


There are hotkeys. There's a big map you can scroll around, but you can bind a specific view of the map to F1-F4. You can get a lot of actions just rolling through those, looking for specific events. Is the attack coming from over here? Is the base still collecting resources? Stuff like that.

You can also hotkey individuals or groups, 1-8. Again, just rolling through those will get you a fair amount of APM.

That said, 780 is peak for world class players. You probably can't do that, ever, because by the time you put in the years of effort, your reflexes will have decayed.

On the other hand, Brood War was a subtle game. APM is not the be-all and end-all of winning. Getting proficient with a few aspects of the game can make you pretty good pretty quickly. Heck, just looking around the map like I mentioned above will help. Some people are amazing at controlling a few units like a surgeon. Those units are practically unstoppable. But they miss the big picture. Scanning the map lets you distract, delay, and ultimately minimize the value of those unstoppable units.


I found this video demonstrating ~300 APM, which is quite insane already. I think >500 APM is probably burst speeds, not sustained.

https://www.youtube.com/watch?v=zmYhX8fjmo8


Yes, it's possible. Bear in mind that not just clicks count, though. July managed a peak APM of 818 in StarCraft: Brood War in an official game. The average APM of a match is usually between 300 and 500, also depending on the race that is played.


A lot of that is 'spamming', where they keep their hands moving and doing things even if they're not particularly meaningful, just so they can keep the pace of the eye/brain/hands feedback loop going.


Maybe early on it is spamming, but later on in the game most actions become useful actions. Having said that, some APM measurements only tracked effective APM (EAPM): clicks that actually achieved some result, rather than merely selecting a unit.


I think that may be high for an average... and to give an idea of how much APM is spam, there's that Stork vs. Idra replay from the World Cyber Games where Stork demolishes Idra despite using 70 APM to Idra's 300+. Of course, Protoss takes the least APM but Stork may have been the least spammy player.


After losing on the Go territory, it seems like FB is trying to challenge Alphabet on StarCraft. DeepMind already declared they will go for StarCraft as their next challenge; does this mean they accept the challenge? I'm actually happy to see what the result could be! The only weak point for the StarCraft community is that it would be on SC1 and not SC2.


SC1 is not a weak point; it's generally considered to be better balanced, and is the "gold standard" for RTSes. SC1 also has far more training data than SC2.


In addition to the training data, there is also the BWAPI project, which lets bots play the game against other bots or humans.

http://bwapi.github.io/

There isn't something similar available for SC2 due to a mix of technical and nontechnical issues:

https://github.com/bwapi/bwapi/wiki/FAQ#will-there-be-an-api...


Yep! But as someone who built something on top of BWAPI, I'd wager anyone going after SC1 AI would probably write their own thing. It's still an amazing API for general heuristics and modeling, but there are a few issues with it that stand in the way of making it a scalable foundation.


What is wrong with it?


Gabriel Synnaeve was already working on this long before joining FB.

http://emotion.inrialpes.fr/people/synnaeve/


Actually, SC1 is the de facto choice when talking about StarCraft esports, and the best choice, since the best SC players have been honing their skills on SC1 for a long time (see South Korea's SC1 esports scene).


There are actual professional players still active in SC2, though. I don't think you can make that claim for SC1, because even though the competitive scene has been growing again, nobody is getting a paycheck for playing competitive games.


You've got to be kidding. There's one going on right now with a prize pool of $31k! http://wiki.teamliquid.net/starcraft/Main_Page


Are you referring to the just completed Afreeca Starleague with $21k prize pool? Only the top two got more than a typical month's paycheck (winner did get over $10k) and it lasted a couple of months. There also haven't been any other events even close to that size this year for BW. Pro players don't typically live off tournament winnings.


That's the one! The winner pulled in $13,500 USD. I must have misinterpreted "paycheck" in your comment "nobody is getting a paycheck for playing competitive games."

[edit] Ah, $21k, not $31k. Thanks. [edit] And yes, it was recently completed and isn't actually ongoing. Not sure what I was smoking when I made so many false statements.

[edit] Maybe I was thinking the tournament starting October 28th, http://wiki.teamliquid.net/starcraft/VANT36.5_National_Starl... for $33k total prize pool.


Yeah you could've well meant the VANT event, I forgot about Afreeca just announcing another large one. Who knows, with how things are going we might see professional play again in Brood War and of course some of the current players were at the top when it existed.


While there are currently no professional players, as amateurs, the top guys definitely earn more than what you would get as a professional SC2 player. In July 2016, the 10th highest earned $7,699.18 USD, the highest $36,811.18 USD. And those numbers don't include ad revenue, tournament winnings or sponsorship deals.


Earning through streaming can be detrimental to the level of play, though. At least one of those top 10 doesn't even bother playing competitive matches, and I'm pretty sure he's not the only one who isn't really a competitor, even if he's making a living as an entertainer playing StarCraft.


It's a misconception that StarCraft is a strategy game. If you look at how it's actually played by human pros, it looks closer to a fighting game; very reflex-driven & heavy on micro-interactions. You would expect an un-gated AI with effectively infinite actions per second to do very well.


It seems like most people in this thread are claiming that high APM and the ability to have perfect control over everything happening on the map will give the AI an advantage that will force a win.

Here is a video of one of the best current StarCraft bots losing to a D-rank (low skill) human player. The bot's APM is ~5500 while the human's is ~200. https://www.youtube.com/watch?v=ztNYOnx_YQo

The fact is, no AI has ever beaten even an amateur player in a tournament. Even with great micro, if your play is too predictable then the human will learn it and exploit it.

I, for one, am very excited to see the development of new StarCraft AIs -- especially SC2 AIs, so that they can challenge the current world champions.


Well, it literally is a real time strategy game. It's just that once the meta stabilizes, macro advantages become fewer and fewer and micro skills become the deciders. Perhaps at this point it might be better to refer to it as a real time tactics game. The AI would presumably not fail at micro so it would dominate at this point in the game's lifecycle, but it might be able to be taken advantage of at a macro level.


Are you not just generalizing from only watching pros vs. pros, where the skill gap between them is probably very small, thus making it seem like the only difference is in mechanical ability?


It works on several different levels. You're talking about what competitive players call 'micro', which could be considered the implementation of the strategy.

I haven't played for a while, but usually there are a few basic 'builds', which are essentially memorized openings -- and there are 3 types of openings: macro builds, where you focus on building an economy while sacrificing military resources for a long game; 'all-in' builds, which sacrifice your economy to build an early military advantage and win within the first few minutes; and various mid-range builds that try to do a little bit of both.

An all-in is largely just down to micro and execution, and it either wins or it doesn't, but the other two types of builds have a large strategic element -- for example, you need to scout to check if your opponent is all-in-ing, you can harass to distract your opponent from implementing his strategy by interrupting his economy, and then there's planning for the end game, building defenses, and the whole question of what you do if your original plan fails for one reason or another. There's a lot of thinking involved on multiple levels simultaneously, both spatial and temporal.


After the most recent top StarCraft tournament (GSL Code S), they asked the loser why he lost. He said, "I worked hard to prepare for this tournament, but my opponent prepared better."

What kind of things did he prepare? It wasn't reflexes, it was strategy. What kind of strategies was he preparing? He watched his opponent's past games, and came up with some build orders of course, but in this case, the primary strategy he came up with was an army composition hoping to counter what his opponent had been doing recently. When the opponent had the proper counter to that strategy, he won the rest of the games easily.


I think you are confusing the micro of the game (micromanagement, which requires high APM) with the macro, such as getting enough bases to keep creating units to fight and choosing troop composition.


It's a misconception that fighting games aren't strategy games. The term that conflates both is tactics, if I have my vernacular right. Of course, strength and reactivity are a tactical advantage.


This is pretty cool, although I think MOBAs (Dota, LoL) would be an even better test of AI skills than StarCraft. They also have imperfect information, but place more importance on strategy and less on micro than StarCraft; require some game theory and bluffing in the draft, and would need multiple agents to cooperate (assuming you set it up so that you had 5 AIs play the game, with well-defined communication channels, rather than one controlling the five players, which I think is the right way to go).

Seems like there's more potential for useful AGI techniques in that direction.


Increased AI performance in RTS is always exciting, but part of me is disappointed by the fact that the AI doesn't "see" or interact with the game the way humans do. That is, humans don't play the game by querying the state/status of each unit and then issuing commands via some API. It would be fun (though it would complicate things significantly) to produce an AI that at least has some notion of a mouse/keyboard, so that you could see it in action from a first-person perspective.


Screw Go. Beating the Terran Emperor in a game of Starcraft is when the computers will finally take over the world.

It would be interesting, though. How would a program that has PERFECT micro fare against a professional StarCraft player? Would it reliably figure out how to kill 10 banelings with a few marines and a medivac, using the fact that you can micro them to do it without taking losses? Even if it could, would it know WHEN it's even worth doing so?


There are already incredibly good micro StarCraft AIs, able to operate at thousands of APM -- this isn't the limiting aspect of current StarCraft AI.

The "hard" part of StarCraft is that it is a huge rock-paper-scissors game with only the information you fight for. You have to piece together a picture of your opponent's actions and forces from small cues.


Wow, I hadn't seen this before.

Here is "Automaton 2000" controlling 20 marines vs 40 banelings, without losing a single unit.

https://youtu.be/DXUOWXidcY0?t=52

Pretty cool.


And as cool as that is, this is even more terrifying: a hundred zerglings dodge siege tank cannons and destroy them.

https://youtu.be/IKVFZ28ybQs

It's enough to make you scared for the future of humanity.


What... the... hell..!? That's basically what they show in movies where the action stars have superhuman movement. Except it's zerglings perfectly coordinating the demolition of siege tanks. Awesome demo of AI micro.


That's incredible. The speed advantage here (https://youtu.be/IKVFZ28ybQs) brings to mind the power of high-frequency trading.


They should constrain the AI to effectively only "click" on things at a rate, and at a distance from the last click, such that a human could do it with a mouse.
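A sketch of such a constraint (my illustration; the rate and speed limits are made-up numbers): accept a click only if the interval since the last accepted click and the implied cursor speed are humanly plausible.

```python
import math

class HumanClickGate:
    """Reject clicks a human hand could not physically produce."""
    def __init__(self, max_clicks_per_sec=5.0, max_px_per_sec=3000.0):
        self.min_interval = 1.0 / max_clicks_per_sec
        self.max_speed = max_px_per_sec
        self.last = None  # (time, x, y) of the last accepted click

    def allow(self, t, x, y):
        if self.last is None:
            self.last = (t, x, y)
            return True
        lt, lx, ly = self.last
        dt = t - lt
        if dt < self.min_interval:
            return False  # clicking faster than a human hand could
        if math.hypot(x - lx, y - ly) / dt > self.max_speed:
            return False  # the cursor would have to teleport
        self.last = (t, x, y)
        return True

gate = HumanClickGate()
ok1 = gate.allow(0.0, 100, 100)    # True: first click is always accepted
ok2 = gate.allow(0.05, 500, 500)   # False: only 50 ms since the last click
ok3 = gate.allow(0.5, 500, 500)    # True: plausible interval and cursor speed
print(ok1, ok2, ok3)
```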


In the past, Q-learning has led to models that perform at a superhuman level, so I wouldn't be surprised to see something similar.
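For reference, the core of tabular Q-learning -- a generic sketch on a toy 5-state chain, nothing to do with StarCraft or this paper's (deep, function-approximated) variant:

```python
import random

random.seed(0)

# Toy MDP: a 5-state chain; only reaching the right end (state 4) pays 1.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = step left, 1 = step right
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def env_step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(500):                   # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = env_step(s, a)
        # Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, 'right' should be the greedy action in every
# non-terminal state.
policy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
print(policy)
```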


I think Starcraft is a very interesting challenge for AI because it involves planning in an environment that is only partially observable: you must scout in order to see what your opponent is up to, and even then, you don't see everything. If DeepMind works on this, I really hope that they constrain the AI (APM-wise) so that its only chance of winning is by good planning and strategy, not super-fast micro.


According to this paper, Facebook is only working on micro -- attempting to win a few simple (one to two unit types) battles that humans can win 100% of the time against AI.

As you said, the glory of StarCraft is its strategic-level information game. It will be interesting to see what comes out of attempting to learn that.


Prior work, and why I love StarCraft as a testbed for AI, is described here:

http://webdocs.cs.ualberta.ca/~cdavid/starcraftaicomp/report...

The two papers in the RTS techniques section are a must-read for an idea of what problems it poses, along with results of prior attempts. The ability of human pros to detect AI patterns and defeat them with bluffs is pretty consistent. StarCraft, like poker, involves lots of psychological analysis and ploys.

Even if Google or Facebook make one, I'll still think of humans as superior until it can learn to beat them with mere dozens to hundreds of games, rather than what was fed into AlphaGo. That wasn't human-equivalent or superior so much as approximating the results of nearly all human activity in the space, then focusing it against one human. You could call it superhuman, but it required tons of activity by brilliant humans. Brilliant humans require little, with the champions needing a lot less than the automated techniques: lots of self-discovery with limited data. I want to see the AIs pull that off, plus keep it going when encountering humans with innovative, never-before-seen strategies. That's when I'll give them credit as useful on barely-defined problems with curveballs, like humans.


If anyone is interested in deep learning around Blizzard games, there is an active AI community around Hearthstone in the `#hearthsim` and `#hearthsim-ai` channels on Freenode. cf https://hearthsim.info. Starcraft AI discussions welcome!

We're also discussing support for such projects using game replays from HSReplay.net :)



