DeepMind uses the Daily Mail as a huge training corpus for text comprehension

paulsutter · on June 23, 2015

The original DeepMind paper [1] is based on a really smart idea. Algorithmic development relies on measuring the performance of any proposed algorithm. For reading comprehension, performance is evaluated using Q&A about the corpus. It's difficult to find a large corpus with a comprehensive set of questions about the content.

DeepMind is cleverly converting the Daily Mail article summaries into questions by replacing a proper noun with a placeholder (X in the example). For example:

Question: Producer X will not press charges against Jeremy Clarkson, his lawyer says.

Answer: Oisin Tymon
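To make the trick concrete, here is a minimal Python sketch of turning one summary sentence into a question/answer pair. It is only an illustration, not the paper's actual pipeline: the function name `make_cloze`, the placeholder `X`, and the idea of passing the entity in by hand are all my assumptions, and the unmasked summary sentence is reconstructed from the example above.

```python
def make_cloze(summary: str, entity: str, placeholder: str = "X") -> tuple[str, str]:
    """Mask one known entity in a summary sentence.

    Hypothetical helper for illustration only: the entity is supplied by
    the caller rather than detected automatically.
    """
    if entity not in summary:
        raise ValueError(f"entity {entity!r} does not occur in the summary")
    # Replace every occurrence so the answer cannot be read off the question.
    question = summary.replace(entity, placeholder)
    return question, entity


summary = ("Producer Oisin Tymon will not press charges against "
           "Jeremy Clarkson, his lawyer says.")
question, answer = make_cloze(summary, "Oisin Tymon")
print("Question:", question)
print("Answer:", answer)
```

A system is then scored on whether it can recover the masked name from the full article, which is what gives you a large supply of questions without anyone writing them by hand.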

They are using the Daily Mail corpus to develop their algorithm, and that's smart. They aren't relying on it as an important source of information. Maybe all you guys with the dismissive comments have a better idea?

[1] http://arxiv.org/pdf/1506.03340.pdf

EDIT: Thanks Otik, reworded the opening sentence

Otik · on June 23, 2015

For a moment I thought you were calling the Mail "a great paper", but actually the simple language used by it is probably quite good for this purpose.

Of course, it does mean that if you ask DeepMind about the cause of anything negative it will probably tell you "immigrants".

collyw · on June 23, 2015

or "benefits scroungers"

PlzSnow · on June 23, 2015

edgy stuff

mcintyre1994 · on June 23, 2015

I agree with you - it's a neat idea, though I haven't read the paper yet.

I wonder if it has explicit knowledge that they'll always remove a proper noun. Eventually the Daily Mail will write some article about a place causing or curing cancer, and the poor thing will break down trying to find X in "X causes cancer" from that corpus :)

On another note, I wonder if they could train it on Buzzfeed-esque content without changing the headlines. "You won't believe who Jeremy Clarkson punched!" -> the answer

tormeh · on June 23, 2015

The Daily Mail should not be a source of information at all, though. Any greater weighting than zero is too much. If it's only used as a test for the algorithm then I guess it can be OK, assuming the Mail's broken logic doesn't get in the way. It's the step above dream logic, really. DeepMind might have problems abstracting too much over the text because of this. Trying to combine information from several different articles would be basically impossible.

Anyway, how about using Wikipedia instead? I can only guess they're using the Mail because they think it's funny.

erroneousfunk · on June 23, 2015

"I can only guess they're using the Mail because they think it's funny."

Ironically, given your self-professed superior tastes in reading material, Deepmind seems to have a better ability to read, comprehend, and answer questions about articles than you do.

peteretep · on June 23, 2015

So they're using a corpus as a determining factor for quality assessments, but you don't think that makes it an important source of information?

wodenokoto · on June 23, 2015

Thanks for the summary. I really didn't understand what the article was trying to say they were doing.

johntaitorg · on June 23, 2015

Another use for the Daily Mail: https://www.youtube.com/watch?v=xPlEIryW8zA

Animats · on June 23, 2015

The Daily Mail? Not the Times?

andyjohnson0 · on June 23, 2015

Apart from any other problems it may have as a knowledge source, the (London) Times is paywalled.

peteretep · on June 23, 2015

If you thought institutionalised racism, sexism and chauvinism were bad now, the singularity is going to suck.

SixSigma · on June 23, 2015

This is how Judge Death starts.

The crime is life, the sentence death.

KaiserPro · on June 23, 2015

I know why they did it: because it's the lowest common denominator for celeb gossip.

However it's shit for science, overtly paedophilic, and the only western news site that seems to have a special section devoted to curating and promoting ISIS propaganda.

Why is it bad? Because if you are looking for facts, the Daily Mail is a bad source.

If you are looking for natural language, it's a good source, however it's full of nuanced racism, sexism, classism & basically everything else that's wrong with Britain.

It'll be good at describing house prices though.

mattnewport · on June 23, 2015

Overtly paedophilic? It's been a while since I lived in the UK and I never read the Mail when I did but this particular accusation is new to me. I've heard most of the others levelled at it before but what's the story behind that one? If anything I'd have associated them with Brass Eye style "paedophiles under the bed" paranoia (though in light of revelations of the last few years maybe that wasn't so unjustified).

EliRivers · on June 23, 2015

The Daily Mail uses a set of code phrases to indicate to its readers when underage girls can be viewed as something to have sex with.

Here are a few links.

http://www.themediablog.co.uk/the-media-blog/2013/01/daily-m...

http://www.vice.com/en_uk/read/all-grown-up-sexing-up-the-in...

http://www.newstatesman.com/martin-robbins/2012/06/sex-child...

afandian · on June 23, 2015

EDIT: this was the Daily Star not the Daily Mail.

This is the Daily Mail though: https://www.youtube.com/watch?v=NKzgDBumCr0

That was one of the great ironies of the Brass Eye special. On one page of the Daily Star a criticism of the episode. On the other page a sexualised picture of a child.

http://www.anorak.co.uk/303258/news/leveson-inquiry-charlott...

There's tonnes of stuff like this. They're quite open about it.

longwave · on June 23, 2015

That particular incident was in the Daily Star rather than the Mail.

afandian · on June 23, 2015

Thanks, my mistake.

raverbashing · on June 23, 2015

You're missing a /s at the end, hopefully

afandian · on June 23, 2015

No, that's a fairly accurate description of the paper. I won't repeat the list, but if you take a look at it the hatred of women, minorities etc really shines through. And a bizarre fanatical interest in house prices and which social group is affecting them.

raverbashing · on June 23, 2015

I'm not disagreeing with your description of the Daily Mail.

But it is irrelevant to the work DeepMind is doing.

afandian · on June 23, 2015

I was just responding to your suggestion that the GP might have been sarcastic.

Anyway, this is the frontiers of large-scale natural language processing. Who knows what latent features it might pick up? I don't know about the techniques in use, but it's conceivable for a present or future system to encode the biases represented in its corpus.

KaiserPro · on June 23, 2015

Not really, it's using this corpus to infer the meaning of phrases (at the very least).

So the bias it exhibits in language towards women, immigrants and the poor, will be baked into the way it understands language.

talideon · on June 23, 2015

I'm afraid not. The Daily Mail is an awful rag. One of its nicknames is "The Daily Hate".

niklasni1 · on June 23, 2015

Daily Heil.

talideon · on June 23, 2015

Yup, and those are its nice nicknames.

latenightcoding · on June 23, 2015

This is great. Are they modelling a neural network to detect bullshit?

nbevans · on June 23, 2015

Somewhat worrying that they are feeding DeepMind a diet of DailyMail articles!

Maybe this is why Skynet turned rogue. Reading daily trash about celebrities and body-dysmorphia-inducing trash is enough to make anyone go mad.

jacknews · on June 23, 2015

Haha, absolutely classic! What will it end up comprehending? David Beckham's love life? How brown skinned foreigners are taking all the jobs? Etc.

carlob · on June 23, 2015

https://www.youtube.com/watch?v=5eBT6OSr1TI

Xophmeister · on June 23, 2015

Hey DeepMind, what causes cancer?

> EVERYTHING!

dghf · on June 23, 2015

> EXCEPT FOR THE STUFF THAT CURES IT!

"The Daily Mail Oncological Ontology Project: a blog following the daily mail’s ongoing mission to divide all the inanimate objects in the world into those that cause or cure cancer."

https://thedailymailoncologicalontologyproject.wordpress.com...

http://dailymailoncology.tumblr.com/

mcintyre1994 · on June 23, 2015

More worryingly: Hey DeepMind, go cure cancer for us. Would it work to kill all men, women, childless people, parents and babies first or would it focus on destroying the sun?

jen729w · on June 23, 2015

Indeed, all I could think is that the resultant AI will end up being an intolerant racist bigot!

kw71 · on June 23, 2015

REVEALED: You won't believe the SHOCKING markov chains used by HACKERS to train artificial intelligence READING ROBOTS!

FranOntanaya · on June 23, 2015

I am sorry to hear you won't believe the SHOCKING markov chains used by HACKERS to train artificial intelligence READING ROBOTS. Tell me more about it.

Fede_V · on June 23, 2015

Great, so we will have a racist and reactionary AI that thinks celebrity gossip and immigrant bashing are the most important things in the world.

For the non-AIs reading this thread, I highly recommend https://addons.mozilla.org/en-US/firefox/addon/kitten-block/ a plugin that replaces every Daily Mail link with a random picture from kittens & tea, in case you mistakenly click on a Daily Mail link.

Snark aside, the paper is really cool though :)

vixen99 · on June 23, 2015

You read trash but I read light amusing gossip. The great thing about dissing the Mail is that you don't have to present any argument; you just toss in a few words and phrases like 'racist', 'trash' and 'immigrant-bashing' (defined as 'the slightest hint of criticism at all regarding immigration levels'; add an argued case study and you really are in the gutter), add a dash of stock kitten-smear and the whackjob is done. You're on the high ground and the pathetic masses are on the low.

Fede_V · on June 23, 2015

If you'd like, I can dig up several instances of the Daily Mail being terrible (specifically, their reporting about 'soandso causes cancer', their use of single outrageous instances to infer non-existent trends, the horrendous way they treat their employees, their casual sexism, etc.) but that seems out of place in a thread about an article which is mostly concerned with machine learning and NLP.

I fully stand by the statement that the Daily Mail is a reactionary piece of trash that appeals to the lowest and basest sentiments of its readers though.

batou · on June 23, 2015

It's far more worrying than that. I worked for a pollster and it was obvious the DM was able to set an agenda and influence people based on the statistics collected.

It's a heavily weaponised tabloid capable of swinging a political victory here and there.

The parent poster is merely concerned about the surface area, not the consequences. If it is just words then it's laughable but it represents influence and that is beyond dangerous.

dghf · on June 23, 2015

> This year, the Mail reported that disabled people are exempt from the bedroom tax; that asylum-seekers had “targeted” Scotland; that disabled babies were being euthanised under the Liverpool Care Pathway; that a Kenyan asylum-seeker had committed murders in his home country; that 878,000 recipients of Employment Support Allowance had stopped claiming “rather than face a fresh medical”; that a Portsmouth primary school had denied pupils water on the hottest day of the year because it was Ramadan; that wolves would soon return to Britain; that nearly half the electricity produced by windfarms was discarded. All these reports were false.

http://www.newstatesman.com/media/2013/12/man-who-hates-libe...

Otik · on June 23, 2015

I would love to see work as described in the article leading to the ability to analyse and quantify the output of journalistic outlets in terms of political bias, accuracy and depth of reporting.

If we had a tool like this, maybe we'd be able to move away from some of the emotive language that we use when talking about outlets like the Mail.

EliRivers · on June 23, 2015

Do you read it primarily online?

The DM people are quite smart. Their target demographics for the print version and the online version are very, very different. The online pitch is trashy gossip for women in the 20-40 age group. The print pitch is different.