
The database is being reverse engineered and published anyway, as per the article.


I think the Archive is just rehydrating shortened links in webpages that have already been archived. I doubt they're discovering previously unknown URLs.


No, they really are trying to enumerate all 230 billion possible shortlinks; that's why they need so many people to help crawl everything.


Got a source? I don't see details one way or the other.


From the article:

> there are about 230 billion* links that need visiting

> * Thanks to arkiver on the Archive Team IRC for correcting this number.

Also, when running the Warrior project you could see it iterating through the range. I don't have any logs handy since the project is finished, but they looked a bit like:

  https://goo.gl/gEdpoS: 404 Not Found
  https://goo.gl/gEdpoT: 404 Not Found
  https://goo.gl/gEdpoU: 302 Found -> https://...
  https://goo.gl/gEdpoV: 404 Not Found
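
For a sense of what that iteration amounts to, here is a rough sketch in Python. It's purely illustrative and not the actual Warrior code; the alphabet order, the six-character code length, and the request handling are all assumptions on my part.

  # Rough sketch of iterating a goo.gl code range and recording the result.
  # Purely illustrative: alphabet order, code length, and request handling
  # are assumptions, not Archive Team's actual Warrior code.
  import itertools
  import urllib.error
  import urllib.request

  ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

  class NoRedirect(urllib.request.HTTPRedirectHandler):
      # Treat 3xx responses as final so we can record where they point
      # instead of following them.
      def redirect_request(self, req, fp, code, msg, headers, newurl):
          return None

  opener = urllib.request.build_opener(NoRedirect())

  def codes(length=6):
      # Yield every possible code of the given length in lexicographic order.
      for chars in itertools.product(ALPHABET, repeat=length):
          yield "".join(chars)

  def check(code):
      # Return (status, redirect target or None) for one shortlink.
      req = urllib.request.Request(f"https://goo.gl/{code}", method="HEAD")
      try:
          resp = opener.open(req, timeout=10)
          return resp.status, resp.headers.get("Location")
      except urllib.error.HTTPError as e:
          # 404s and (because of NoRedirect) 302s both land here.
          return e.code, e.headers.get("Location")

  if __name__ == "__main__":
      for code in itertools.islice(codes(), 4):  # just a few, to mimic the log above
          status, target = check(code)
          print(f"https://goo.gl/{code}: {status}" + (f" -> {target}" if target else ""))

A single machine grinding through 230 billion codes like this would take far too long, which is presumably why the work was split into ranges and handed out to many volunteers running the Warrior.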



