Spammers started to hit GitHub? (github.com/quartzjer)
44 points by bencevans on Nov 17, 2012 | 48 comments


The amount of time and energy and cpu cycles wasted on spammers is truly a crime against humanity.

I still do not understand how it makes them money. I think it's just an endless chain of people wrongly believing others are successful with it, so they try it too and the cycle continues.


>I still do not understand how it makes them money

When I saw this post, the page was already returning a 404. However, it was still available in the Google cache[1].

It is not just Github being targeted; a lot of other large websites with user-generated content are also being spammed with this content[2][3].

The spammers are linking the content to blogspot blogs so that they can: 1. hide the referrer from their affiliate program (to prevent getting banned for spamming), 2. utilize the temp increase in search engine rankings, and 3. save money on domains for one-off usage.

The "live stream" is for a one-off game that will only be popular/trending for 1-3 days, and during that period the spammers use a lot of macro scripts/bots to spam these websites. Believe it or not, even a "nofollow" link can give you an advantage in search rankings. They spam these sites to death during the "popular period" and then bank money from the affiliate program[4].

[1]http://webcache.googleusercontent.com/search?q=cache%3Ahttps...

[2]http://shine.yahoo.com/author-blog-posts/watch-38-enjoy-flor...

[3]http://webcache.googleusercontent.com/search?q=cache%3Awww.f...

[4]http://www.officialtvstream.com.es/passport/signup.php?price...


> The spammers are linking the content to blogspot blogs so that they can: 1. hide the referrer from their affiliate program (to prevent getting banned for spamming) 2. to utilize the temp increase in search engine rankings 3. save money on domains for one-off usage

Actually, in most cases, these reasons have little to do with it. For the most part, the pages are created on public sites because the links are spammed on social networks, and those networks quickly block postings that contain the same link over and over again. So if you can create hundreds of different URLs that lead to the same landing page, hundreds of times as many spam posts get through without being blocked. Most of these guys don't bother with private domains because social networks will quickly issue a blanket block on all posts containing links to untrusted domains that have been reported to them for spam, but they will never issue a blanket block for a site like Github or Facebook. So the links last longer.

As to the specifics:

1) "hide the referrer from their affiliate program (to prevent getting banned for spamming)": There are much easier and more reliable ways to not only hide, but entirely change, the referer. See http://www.contentgeneration.org/cpa-redirector-2/

2) "to utilize the temp increase in search engine rankings": These sites aren't meant for that. Whether they link from the landing page straight to the affiliate link or go through a lander that is used to change the referer, they don't care about the search rankings of the links.
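To make the repeated-link point concrete, here is a deliberately naive, hypothetical filter of the kind described: it blocks a URL once it has been posted more than a few times and blanket-blocks reported domains, but never a big host like GitHub. Real social networks' filters are not public, so the class, threshold, and domain names are all made up for illustration; the point is only that spreading one campaign across many distinct URLs on a trusted host slips past a per-URL counter.

```python
from collections import Counter
from urllib.parse import urlsplit


class NaiveLinkFilter:
    """Hypothetical per-URL repeat filter plus a blocklist of reported domains.

    Models only the two behaviors described above: identical links posted
    too often get blocked, and reported spam domains get a blanket block,
    while large trusted hosts are never blanket-blocked.
    """

    def __init__(self, max_repeats, blocked_domains):
        self.max_repeats = max_repeats
        self.blocked_domains = set(blocked_domains)
        self.seen = Counter()  # how many times each exact URL has been posted

    def allow(self, url):
        host = urlsplit(url).hostname
        if host in self.blocked_domains:
            return False  # blanket block on a reported domain
        self.seen[url] += 1
        return self.seen[url] <= self.max_repeats


f = NaiveLinkFilter(max_repeats=3, blocked_domains={"spam-landing.example"})

# The same link posted 10 times: only the first 3 get through.
same_url = [f.allow("https://gist.github.com/one-page") for _ in range(10)]
# 10 different URLs on a trusted host: every one gets through.
many_urls = [f.allow(f"https://gist.github.com/page-{i}") for i in range(10)]
# A link straight to the spammer's own reported domain is blocked outright.
own_domain = f.allow("http://spam-landing.example/offer")

print(sum(same_url), sum(many_urls), own_domain)  # prints: 3 10 False
```

Keying the counter on where a link finally lands, rather than on the exact URL, would close this particular gap, at the cost of resolving every posted link.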


> They spam them to death in the "popular period" and then bank money from the affiliate program

If you had to speculate: How much money?

(I know, it all depends, etc. But I was just hoping to get a rough idea of the order of magnitude here.)


Full disclosure: I frequent private "blackhat" forums, and although I do not participate in these activities, I blog about them anonymously. Furthermore, I know a few people (online) who do stuff similar to this with CPA content lockers, so I'll try to give you an educated guess. In general, the average conversion rate for TV show/live content is around 0.5-3%, depending on the quality of traffic.

Now, as you can see from Google Trends[1], some of these get very popular and trend about once every year. In the second comparison, I've compared the search terms with a term I know is definitely popular to get the relative popularity of the terms[2]. Comparing these two terms gives me an idea of the amount of traffic these websites would get in that time period. I would estimate that the "big spammers" would easily get around 250k-500k uniques from multiple sources (spamming, mass advertising, social media, botnets, etc.).

Assuming they get paid $2 per signup and have a conversion rate of 1.5% on 300k uniques, that would bank them around $9k (4,500 signups at $2 each). Of course, I'm only talking about the big guys here who have been doing this for a long time.
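The estimate above is simple multiplication (uniques x conversion rate x payout per signup). A small Python sketch makes it easy to plug in the other numbers from the comment, such as the 0.5-3% conversion range. The function name is hypothetical, and the inputs are the commenter's guesses, not measured data.

```python
def estimated_revenue(uniques, conversion_rate, payout_per_signup):
    """Back-of-the-envelope affiliate revenue: visitors x conversion x payout."""
    signups = uniques * conversion_rate
    return signups * payout_per_signup


# The figures from the comment: 300k uniques, 1.5% conversion, $2 per signup.
print(f"${estimated_revenue(300_000, 0.015, 2.00):,.0f}")

# The quoted 0.5-3% conversion range at the same traffic and payout.
for rate in (0.005, 0.03):
    print(f"{rate:.1%}: ${estimated_revenue(300_000, rate, 2.00):,.0f}")
```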

In general, I would say that the following is correct:

Upper quartile average: $1-3k per day for a few days

Lower quartile average: $50-60 per day for a few days

Quick edit: It should also be noted that for these live games over 95% of the traffic will be from the USA hence the large profits.

---------

[1]http://www.google.com/trends/explore#q=Iowa%20vs%20Michigan%...

[2]http://www.google.com/trends/explore#q=Iowa%20vs%20Michigan%...


Since when was putting the selfish desires of a for-profit company ahead of your own well-being virtuous?

This behavior is inherent to human nature, and presumably to everyone here who calls themselves a hacker. That Google doesn't "like" what I do has no moral or legal weight. The same goes for sites that accept user-generated content.

Obviously not everybody who's a spammer is good at it or a true hacker, but I would venture a guess that most hackers who do SEO are blackhat, simply because they see it as a system that can be exploited with controlled heuristic testing, as opposed to some superfluous tools and a whack community a la seoMoz.org.

And it's not a zero-sum game. Google is forced to improve their natural language processing capabilities, their ability to execute more computationally expensive processing on massive amounts of content, so on and so forth, to try and keep up and provide a legitimate product.

So, back to the spamming: any site that allows you to put a link on it is a target for SEO, of course. Maximizing C-class IP diversity among links is important. But more importantly, these pages are usually Tier-1 to Tier-2 in link schemes, as it's effective to point the low-level forum spam at user-generated-content sites with strong domain authority and funnel the juice to your primary site. They're basically just buffers.
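"Funnel the juice" refers to link-based ranking such as PageRank, where each page passes its score on to the pages it links to. As a rough sketch, assuming the textbook PageRank formula with damping 0.85 (Google's real ranking uses many more signals, and this toy graph ignores nofollow, domain authority and IP diversity entirely), a buffer page that collects links from throwaway pages passes most of that score on to the one site it links to. The page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to; every link target must
    also be a key. Pages with no outlinks spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src in pages:
            for dst in links[src]:
                # Each page splits its rank evenly among its outlinks.
                new_rank[dst] += damping * rank[src] / len(links[src])
        rank = new_rank
    return rank


# Hypothetical tiered scheme: throwaway forum spam (tier 2) points at a
# gist on a strong domain (tier 1), which points at the primary site.
graph = {
    "forum_spam_1": ["gist"],
    "forum_spam_2": ["gist"],
    "forum_spam_3": ["gist"],
    "gist": ["primary_site"],
    "primary_site": [],
}

for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page:14} {score:.3f}")
```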

How long does it last? It varies greatly depending on how you're promoting it, the niche it's in, etc. It doesn't always have to be "pump and dump": I have multiple blogs over a year old that are alive and producing $1,200-$2,000 a month each with almost no effort. When you can create one of those blogs with about 10 hours of work (in total), there's obviously money to be made. And if you can write all of your own software to automate it, you can pretty much just "print money."


I make a hundred bucks a month from amazon affiliate clicks, and I'm lazy.


Some of my friends make over $200,000 per month with spamming.


Maybe not if you're living in an apartment in San Francisco. However, if you're in an Eastern Bloc country or India, a few hundred USD per month can radically change your life. So even abysmal returns to me would be success to someone else. When you're poor and looking at the prosperity of the West, you don't care if you break the Internet.


They do make money out of it; not that much, but they do. Most of the time it's so little that we have no idea why they bother. There must be some sort of gold-rush effect inspiring people to take it up, but they wouldn't do it if it weren't profitable in some way.

The key question is where you live. If I'm living in a poor country where average salaries range from $50 to $400 per month, imagine how much more I can make by sending viagra spam emails than by doing hard physical labor. For us it's not even worth doing as a hobby, but for some people it's damn profitable. Just take the case of the Nigerian scammer convincing someone to send him $500: he's doing that every day, and if he can get one "wealthy" person to do it once per month, it's damn profitable. Yeah, this isn't exactly a shiny way to make a profit, but you can understand the motivation behind it.


> I still do not understand how it makes them money,

It makes money through pure economies of scale. When your marginal cost per "ad" is essentially zero, you can send out so many ads that even the tiniest response rate among the targets results in thousands or millions of return inquiries/clicks/etc. For example, if it costs $10 for 1 billion views and only 0.0001% respond, that is 1,000 responses, or just one cent per response. If you make ten cents per response, you take in $100 on your $10 investment, a $90 profit.

Couple that with the fact that a computer did the "work" and it is an easy money machine.
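The arithmetic in that comment, as a quick Python check. The function is illustrative and the numbers are the commenter's hypothetical ones; note that 0.0001% is a rate of 1e-6.

```python
def campaign(cost, views, response_rate, revenue_per_response):
    """Economics of a bulk spam run where the marginal cost per ad is ~zero."""
    responses = views * response_rate
    revenue = responses * revenue_per_response
    return {
        "responses": responses,
        "cost_per_response": cost / responses,
        "revenue": revenue,
        "profit": revenue - cost,
    }


# $10 buys 1 billion views, 0.0001% (a rate of 1e-6) respond,
# and each response earns 10 cents.
result = campaign(cost=10.00, views=1_000_000_000,
                  response_rate=0.0001 / 100, revenue_per_response=0.10)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```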


We're aware of the problem. Just like any service like ours, we see a fair amount of spam repos, issues/comments, and, of course, Gists. We already expend a good bit of energy on handling it as it is, but we're always working on new ways to handle it. :)


It'd be nice if gists had a link to report them as spam. Also, links in gists should be nofollow.
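As a sketch of what the nofollow suggestion could look like on the rendering side (a hypothetical helper, not how GitHub actually renders gists): escape user-supplied text, accept only http(s) links, and add rel="nofollow" to every anchor.

```python
from html import escape
from urllib.parse import urlsplit


def render_user_link(url, text):
    """Render a link from user-submitted content (hypothetical helper).

    Only http(s) URLs are accepted, the URL and link text are HTML-escaped,
    and rel="nofollow" hints to search engines not to pass ranking credit.
    """
    if urlsplit(url).scheme.lower() not in ("http", "https"):
        raise ValueError(f"refusing non-http(s) link: {url!r}")
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


print(render_user_link("http://example.com/watch?game=1&live=yes", "Watch live <free>"))
```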


This is nothing new. If you watch the [new gist feed][1], for instance, you'll see plenty of it roll by.

[1]: https://gist.github.com/gists


What's especially interesting is that gists are allowed by GitHub's robots.txt and none of the links are nofollow. That means every gist with links that spammers create helps their ranking.
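Python's standard `urllib.robotparser` can show what "allowed by robots.txt" means in practice. GitHub's actual robots.txt is not reproduced here; the file below is a made-up example that blocks a search path and allows everything else, and the gist URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT GitHub's real file.
# Python's parser applies the first matching rule, so Disallow comes first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A gist page is crawlable; the search path is not.
print(parser.can_fetch("Googlebot", "https://gist.github.com/someuser/abc123"))
print(parser.can_fetch("Googlebot", "https://gist.github.com/search?q=x"))
```

Being crawlable is only half of what the comment describes; whether the links on a crawled page pass ranking credit is controlled separately, by attributes such as rel="nofollow".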


All large sites get hammered by automatic/semi-automatic spam, and occasionally some leaks through. Is this surprising?


All sites get hammered by automatic spam. If it gets indexed by Google, or if anybody links to it, the spam bots will find it eventually.

It is not practical to operate any kind of website that allows users to post things without some form of spam protection. For small sites, email verification or text classification will do the job by itself. Traditional captchas are fairly ineffective, but written questions like "what color rhymes with true?" seem to work pretty well for smaller sites.
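A written-question check like the one described takes only a few lines. This is a minimal sketch with a hypothetical question bank; a real site would rotate many more questions and tie the question shown to the user's session, since here the caller passes the question in and a bot could simply name an easy one.

```python
import random

# Hypothetical question bank: each question maps to its accepted answers.
QUESTIONS = {
    "What color rhymes with true?": {"blue"},
    "How many legs does a cat have?": {"4", "four"},
}


def pick_question(rng):
    """Pick a question to show on the signup/post form."""
    return rng.choice(sorted(QUESTIONS))


def check_answer(question, answer):
    """Accept the post only if the normalized answer is on the accepted list."""
    return answer.strip().lower() in QUESTIONS.get(question, set())


print(pick_question(random.Random()))
```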

Bigger sites dealing with a larger volume of traffic almost always require regular human intervention, curated IP block lists, stealth banning and the like.
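Two of those measures, sketched in Python with hypothetical data: an IP block list checked with the standard `ipaddress` module, and a stealth ban, where a banned user's posts are still shown to that user but hidden from everyone else. The blocked ranges below are reserved documentation networks (RFC 5737) used as placeholders, and the account name is made up.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical curated block list. These are reserved documentation ranges
# (RFC 5737), used here as placeholders rather than real spam sources.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical set of stealth-banned accounts.
STEALTH_BANNED = {"spammer42"}


@dataclass
class Post:
    author: str
    body: str


def is_blocked(ip):
    """True if the client IP falls inside any curated blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def visible_posts(posts, viewer):
    """Stealth ban: a banned user's posts are shown only to that user,
    so from their side nothing seems to have changed."""
    return [p for p in posts if p.author not in STEALTH_BANNED or p.author == viewer]


posts = [Post("alice", "Fixed the build."), Post("spammer42", "Watch live stream free!!!")]
print(is_blocked("203.0.113.7"), is_blocked("192.0.2.1"))
print([p.author for p in visible_posts(posts, viewer="bob")])
print([p.author for p in visible_posts(posts, viewer="spammer42")])
```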


Reported the blogspot link as a spam blog. Not sure about Google's response time on these things.


I don't understand spammers that target tech sites, especially the ones that make links nofollow. Techies know spam and don't click the shit. It's a waste of time, effort and money for the spamlords, to be honest.


As others have said, nofollows do give you help in search results, although not as much as otherwise. They're looking to rise in search results more than for legitimate clicks.


404 error.


Every time I mention this on HN, at least 15 people upvote it - which means they didn't know about it. So I feel obliged to repeat it again and again (as I usually use this method a few times every week and it's tremendously useful for me):

If you want to get Google's cached version of a webpage, just type

    cache:[url]
    e.g.: cache:https://github.com/quartzjer/TeleHash/issues/5
in the search bar and press return.
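The cache links cited at the top of the thread follow the pattern `http://webcache.googleusercontent.com/search?q=cache%3A...`, so the same lookup can also be built as a URL. A minimal sketch assuming that pattern (inferred from those links, not from any documented API; Google may have changed or retired the endpoint since):

```python
from urllib.parse import urlencode

# Endpoint inferred from the cache links earlier in this thread (an assumption).
CACHE_ENDPOINT = "http://webcache.googleusercontent.com/search"


def google_cache_url(page_url):
    """Build a webcache link for the query `cache:[url]`."""
    # urlencode percent-encodes ':' and '/' (e.g. 'cache:' becomes 'cache%3A').
    return CACHE_ENDPOINT + "?" + urlencode({"q": "cache:" + page_url})


print(google_cache_url("https://github.com/quartzjer/TeleHash/issues/5"))
```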


*in Chrome's search bar.

I know, it's a petty technicality. Safari says it can't open the specified address, and Firefox doesn't understand the URL. I'm even more certain IE would explode if I were to try, but I can't at the moment. From what I remember, IE can't even parse a raw IP address without explicitly putting http:// in front of it.


You're right. But you can do a `cache:http://...` in google.com's search bar, which is how I do it. And unless I'm mistaken, Chrome's built-in search bar is called the Omnibox, so my wording was (incidentally) correct! ;-)



This must have been a problem for some time, considering how popular GitHub is. There is no way around spam when running a popular service where the users can create their own content.


What about good old CAPTCHA? There are many kinds, but all of them are annoying for normal users.


I used to quite like the idea of CAPTCHA, but in the past year (or couple of years), reCAPTCHA (easily the most popular CAPTCHA generator) has become sooooo bad that I involuntarily shudder even thinking about CAPTCHAs...

Thank God for http://bugmenot.com!


Even the best captcha methods can be circumvented by farming them out to poorer countries for only pennies.


There's an even cheaper way - host a porn gallery, and require a captcha to access it. Present the captcha you're trying to solve to the horny user.


Hell, you could farm it out to minimum wage workers here in the US. I've seen it done, although I think it would only really work on a large scale if you have a constant stream of captchas.


You can purchase 1000 captcha solves for $1.39 last time I checked.

CAPTCHA-only security is fantastic for spammers.


Yup; $1.39 per 1000 on Deathbycaptcha.com. It's cheaper if you buy ~1,000,000+ in bulk.

Or you can buy OCR software that plugs in to your bots; e.g. captchasniper.com

Captcha'd targets are usually higher-quality and more valuable as there is some economic cost of posting to it.
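Using the $1.39 per 1,000 figure quoted above, the "economic cost of posting" to a captcha'd site is easy to put a number on. A sketch with hypothetical function names and no bulk discount applied:

```python
def captcha_cost(posts, price_per_thousand=1.39):
    """Cost of buying one solved captcha per spam post at a per-1000 price."""
    return posts * price_per_thousand / 1000


def worth_posting(expected_revenue_per_post, price_per_thousand=1.39):
    """A captcha'd target only pays off if each post earns more than one solve costs."""
    return expected_revenue_per_post > price_per_thousand / 1000


print(f"${captcha_cost(10_000):.2f} for 10,000 posts")
print(worth_posting(0.01), worth_posting(0.001))
```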


Captchas just advertise that a service has a spam problem.


Using CAPTCHAs assumes the majority of people are spammers rather than real users. It's an alienating user experience and one that I hope goes away.


I noticed some spammers a couple of weeks ago, they occasionally create repositories too.

I'm not sure if there is a way to report them; there is nothing on the GH contact form.


No, they have a "contact" page and they are very quick and responsive.

https://github.com/contact


I'm quite confident a story on top of HN is a good way to reach them :D


I suppose it is... it's not like it's some security hole that everyone is going to exploit. I'm sure they're already working on spam filtering and reporting.


Proper etiquette would be to give the GitHub people a few days' notice before you alert the whole world.

Have you notified them before posting this?


This isn't a security flaw, it's just mundane crap that every web app goes through. They'll delete this today and tweak their filtering, and tomorrow the spammer will try again (or in twenty minutes, who am I kidding?).

I'm not sure why you say prior notification is proper etiquette in this circumstance. It's equivalent to pointing out a grammar mistake in a blog post. Not particularly interesting, but not malicious.


Thanks for all the downvotes, guys!

I'm not talking about "responsible disclosure" and I'm not sure why people assumed I was.

If I see that a person has their fly open, I'd go and discreetly alert them to it. Similarly, if someone has a note saying "I am stupid" taped to their back (which I think is a good analogy for the spam on github), I would do the same.

I wouldn't go shouting "hey, github has an 'I am stupid' sign taped on their back" in town square, which is what posting it on HN amounts to.


Why? It's not like they're disclosing something sensitive, like a security vulnerability.


I don't see any reason to, I'm afraid. If it were a security vulnerability I would have, as I have done before: https://help.github.com/articles/responsible-disclosure-of-s... (Ben Evans - @bencevans)


Looks like git needs the equivalent of a downvote: Something like % git nuke https://github.com/quartzjer


Even if your command made sense, quartzjer is not responsible; some spammer just added a spam issue to a legit project.


git != github


git = communication with github, but more importantly, many of the github clones are going to have the same problem.



