The original DeepMind paper [1] is based on a really smart idea. Algorithmic development relies on measuring the performance of any proposed algorithm. For reading comprehension, performance is evaluated using Q&A about the corpus. It's difficult to find a large corpus with a comprehensive set of questions about the content.
DeepMind cleverly converts the Daily Mail article summaries into questions by removing a proper noun. For example:
Question: Producer X will not press charges against Jeremy Clarkson, his lawyer says.
Answer: Oisin Tymon
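A minimal sketch of that summary-to-question trick, assuming the candidate proper nouns are already known (here they are passed in by hand; real entity detection would need a separate tagger). This is an illustration of cloze-style question generation in general, not the paper's actual pipeline, and `make_cloze_questions` is a hypothetical name.

```python
from typing import List, Tuple


def make_cloze_questions(summary: str, entities: List[str],
                         placeholder: str = "X") -> List[Tuple[str, str]]:
    """Turn one summary sentence into cloze-style (question, answer) pairs.

    Assumption: `entities` is supplied by the caller (e.g. from a
    hypothetical named-entity tagger); this sketch does no entity detection.
    """
    pairs = []
    for entity in entities:
        if entity in summary:
            # Blank out every occurrence of this entity to form the question;
            # the removed entity becomes the expected answer.
            question = summary.replace(entity, placeholder)
            pairs.append((question, entity))
    return pairs


if __name__ == "__main__":
    summary = ("Producer Oisin Tymon will not press charges against "
               "Jeremy Clarkson, his lawyer says.")
    for question, answer in make_cloze_questions(
            summary, ["Oisin Tymon", "Jeremy Clarkson"]):
        print("Question:", question)
        print("Answer:", answer)
```

Each entity in the summary yields its own question, so one sentence can produce several training examples. A literal "X" placeholder could collide with text that already contains "X", so a more distinctive token would be a safer choice in practice.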
They are using the Daily Mail corpus to develop their algorithm, and that's smart. They aren't relying on it as an important source of information. Maybe all you guys with the dismissive comments have a better idea?
I agree with you - it's a neat idea, though I haven't read the paper yet.
I wonder if it has explicit knowledge that they'll always remove a proper noun. Eventually the Daily Mail will write some article about a place causing or curing cancer, and the poor thing will break down trying to find X in "X causes cancer" from that corpus :)
On another note, I wonder if they could train it on Buzzfeed-esque content without changing the headlines. "You won't believe who Jeremy Clarkson punched!" -> the answer
The Daily Mail should not be a source of information at all, though. Any greater weighting than zero is too much. If it's only used as a test for the algorithm then I guess it can be OK, assuming the Mail's broken logic doesn't get in the way. It's the step above dream logic, really. DeepMind might have problems abstracting too much over the text because of this. Trying to combine information from several different articles would be basically impossible.
Anyway, how about using Wikipedia instead? I can only guess they're using the Mail because they think it's funny.
"I can only guess they're using the Mail because they think it's funny."
Ironically, given your self-professed superior tastes in reading material, Deepmind seems to have a better ability to read, comprehend, and answer questions about articles than you do.
I know why they did it: because it's the lowest common denominator for celeb gossip.
However, it's shit for science, overtly paedophilic, and the only Western news site that seems to have a special section devoted to curating and promoting ISIS propaganda.
Why is it bad? Because if you are looking for facts, the Daily Mail is a bad source.
If you are looking for natural language, it's a good source; however, it's full of nuanced racism, sexism, classism and basically everything else that's wrong with Britain.
Overtly paedophilic? It's been a while since I lived in the UK and I never read the Mail when I did but this particular accusation is new to me. I've heard most of the others levelled at it before but what's the story behind that one? If anything I'd have associated them with Brass Eye style "paedophiles under the bed" paranoia (though in light of revelations of the last few years maybe that wasn't so unjustified).
That was one of the great ironies of the Brass Eye special. On one page of the Daily Star, a criticism of the episode. On the other page, a sexualised picture of a child.
No, that's a fairly accurate description of the paper. I won't repeat the list, but if you take a look at it the hatred of women, minorities etc really shines through. And a bizarre fanatical interest in house prices and which social group is affecting them.
I was just responding to your suggestion that the GP might have been sarcastic.
Anyway, this is the frontier of large-scale natural language processing. Who knows what latent features it might pick up? I don't know about the techniques in use, but it's conceivable for a present or future system to encode the biases represented in its corpus.
Somewhat worrying that they are feeding DeepMind a diet of Daily Mail articles!
Maybe this is why Skynet turned rogue. Reading daily trash about celebrities and body-image-dysmorphia-inducing trash is enough to make anyone go mad.
"The Daily Mail Oncological Ontology Project: a blog following the daily mail’s ongoing mission to divide all the inanimate objects in the world into those that cause or cure cancer."
More worryingly: Hey DeepMind, go cure cancer for us. Would it work to kill all men, women, childless people, parents and babies first or would it focus on destroying the sun?
I am sorry to hear you won't believe the SHOCKING markov chains used by HACKERS to train artificial intelligence READING ROBOTS. Tell me more about it.
Great, so we will have a racist and reactionary AI that thinks celebrity gossip and immigrant bashing are the most important things in the world.
For the non-AIs reading this thread, I highly recommend https://addons.mozilla.org/en-US/firefox/addon/kitten-block/ a plugin that replaces every Daily Mail link with a random picture from kittens & tea, in case you mistakenly click on a Daily Mail link.
You read trash but I read light amusing gossip. The great thing about dissing the Mail is that you don't have to present any argument; you just toss in a few words and phrases like 'racist', 'trash' and 'immigrant-bashing' (defined as 'the slightest hint of criticism at all regarding immigration levels'; add an argued case study and you really are in the gutter), add a dash of stock kitten-smear and the whackjob is done. You're on the high ground and the pathetic masses are on the low.
If you'd like, I can dig up several instances of the Daily Mail being terrible (specifically, their reporting about 'so-and-so causes cancer', their use of single outrageous instances to infer non-existent trends, the horrendous way they treat their employees, their casual sexism, etc.), but that seems out of place in a thread about an article which is mostly concerned with machine learning and NLP.
I fully stand by the statement that the Daily Mail is a reactionary piece of trash that appeals to the lowest and basest sentiments of its readers though.
It's far more worrying than that. I worked for a pollster and it was obvious the DM was able to set an agenda and influence people based on the statistics collected.
It's a heavily weaponised tabloid capable of swinging a political victory here and there.
The parent poster is merely concerned about the surface area, not the consequences. If it is just words then it's laughable, but it represents influence and that is beyond dangerous.
> This year, the Mail reported that disabled people are exempt from the bedroom tax; that asylum-seekers had “targeted” Scotland; that disabled babies were being euthanised under the Liverpool Care Pathway; that a Kenyan asylum-seeker had committed murders in his home country; that 878,000 recipients of Employment Support Allowance had stopped claiming “rather than face a fresh medical”; that a Portsmouth primary school had denied pupils water on the hottest day of the year because it was Ramadan; that wolves would soon return to Britain; that nearly half the electricity produced by windfarms was discarded. All these reports were false.
I would love to see work as described in the article leading to the ability to analyse and quantify the output of journalistic outlets in terms of political bias, accuracy and depth of reporting.
If we had a tool like this, maybe we'd be able to move away from some of the emotive language that we use when talking about outlets like the Mail.
The DM people are quite smart. Their target demographics for the print version and the online version are very, very different. The online pitch is trashy gossip for women in the 20-40 age group. The print pitch is different.
[1] http://arxiv.org/pdf/1506.03340.pdf
EDIT: Thanks Otik, reworded the opening sentence