Google+ cannot be used with customer or brand accounts anymore (plus.google.com)
92 points by anticensor on April 2, 2019 | 60 comments


ArchiveTeam did manage to scrape ~98.6% of the user profiles before it went down. It was around 16 hours from completion when Google pulled the plug.

It was done with the distributed scraper 'Warrior', using a large number of small cloud instances to spread the load across 1000s of IP addresses. The dataset is ~1.45 PB.

http://tracker.archiveteam.org/googleplus/


I really wonder what is so valuable about saving everything on the Internet. For all of human history until now, every little human interaction was fleeting and only meant something to the people involved. Things that were significant were preserved when the people involved decided they were important. Now we are trying to keep every one of those insignificant things, for what purpose? Training an AI is all I can think of.


> Now we are trying to keep every one of those insignificant things, for what purpose?

What seems unimportant and insignificant to us now might become very interesting in the future. People in the future might consider the arrival of the internet as the beginning of a new age. Furthermore, some things may seem obvious to a person born in our times, but might not be obvious to people born in the future.

So many works, including even movies from the 20th century are lost now. E.g. the original version of Metropolis used during its premiere screening. Diogenes wasn't just a guy living in the town square, he also created some written works. They are lost now.


I had roomfuls of boxes of things that I thought might one day be important, or significant to someone. Then I realized, when I die, all those things are just going to be thrown away or donated to a thrift store, and it's highly unlikely that anyone who buys any of that will find the same value in it that I thought it had. If I have created anything truly timeless and worthwhile, then it had better be something I created for a client and got paid for making, otherwise it's probably just going to end up in a garbage heap. So I made a decision to get rid of all of it. Time is too short to be held down by the (truly) endless possibilities of "what if" this or that thing ends up being useful to someone in the future.


Yeah, keeping physical things around only makes sense if they are actually valuable to someone. Archives only accept objects actually worth keeping. But it's different for digital content, as it's so much easier to store. At least for now.

Also, archaeologists love the garbage dumps and cesspits of old towns, literally the places where people put their least valuable things, because they weren't raided by earlier visitors and so much info can be derived from them. And it's nicely stratified, so it gives some rough chronology.


Another way to look at this is that the more you store, the more difficult it is for people to actually find the valuable parts of what you've stored. One of the fascinating aspects of the internet is that our internet lives diverge so thoroughly from our offline lives - so the data we're leaving for the future is arguably horribly unrepresentative.


> difficult it is for people to actually find the valuable parts

That's why you need to catalog stuff.

And if you have stored something, you can always get rid of it if you deem it to be unimportant, but if you haven't stored it, most times you can't get it back. Erring on the side of storing unimportant things is an important strategy to cope with that.


The thing with internet content is that it's indexable and it's possible for every person interested to check it out. If you have boxes stored in some shed - it's much less so.


I'm less worried about the curiosity of people decades or centuries in the future than I am in the privacy interests of former Google+ users right now. Many of them chose that social network specifically because it was supposed to offer them better privacy protections. Which isn't a litmus test for whether or not they'd be OK with having their profile scraped and archived, but it is at least suggestive.

It would be polite if ArchiveTeam were to now contact everyone whose profile they have scraped, and ask for permission to retain that data. And then delete all the profiles for which they didn't get affirmative consent.


The content IA are archiving was public and publicly accessible.


Well, for one, ArchiveTeam is unaffiliated with Internet Archive. Judging by the website, it's more closely associated with 4chan or Encyclopedia Dramatica or something.

For two, being public and publicly accessible doesn't mean it isn't gauche to scrape it. It's kind of like how nobody sticks a sign that says "take one only" over the bowl of mints at a restaurant; it's assumed you just know that it's not cool to stick a whole handful of them into your pocket.


AT are not affiliated with IA, but work closely with them, and the data AT collects is transferred to IA. Therefore, my characterisation of the data as "archived by the Internet Archive" is substantively accurate.

IA aren't scraping the data themselves, but they're the customer.

As for your second point: there's merit to that argument, and I've discussed same previously -- I'm very much of mixed minds on this. A few considerations weigh strongly, however.

0. The data were already public, as noted.

1. The system shutdown was not a known factor when most of the data were created. The expectation at that time was that the data would continue to exist.

2. The shutdown itself has occurred in a context in which individuals, and far more importantly groups, quite literally could not archive the relevant data themselves. Google's own Data Takeout, whilst fairly remarkable (in a positive sense) within the industry, makes many things difficult or impossible. Ordinary users cannot archive Community content, and even Owner and Moderator roles within communities could only archive posts from public communities -- neither comments nor private communities were archivable. (Third-party tools could provide these capabilities.) Moreover, technically-, cost-, bandwidth-, or storage-constrained users or communities largely had no viable options for saving their own legacies.

3. The contents sitting on Google+, indexed and searchable by content on both the site and via the public Web, were far more visible than they will be at the Internet Archive, which does not support full-text search of its archives (at least not yet), and which is not as effectively indexed publicly as Google+ was.

4. The Internet Archive does provide for content removal under the DMCA, as well as other mechanisms. For a G+ user, given how content URLs are constructed (they all include the user's G+ UUID as a common element), requesting removal of an entire tree is trivial.

On balance, this favours the Archive.


I can't feel sorry for someone who goes to Google for better privacy protections. I mean, really?


It's ok to lose stuff. If we save everything, there will be too much stuff, only a fraction will ever be used. I'm not convinced of the value. Perhaps people should do their filtering, saving.


It's always surprising to see how much relatively recent stuff is lost on https://www.lostmediawiki.com/


> What seems unimportant and insignificant to us now might become very interesting in the future.

Often this argument is used to try to justify things as important as mass-surveillance and as small as logging in a software project. “Just collect and retain everything, who knows what we’ll actually need!” With GDPR and increasing focus on massive & intrusive data collection, I think this mindset is going to have to change. Before deciding to preserve or collect information (particularly information of a personal nature, like social media accounts), you should be prepared to justify the activity. “It might be useful one day, maybe” shouldn’t be good enough.


Content that is created is valuable to someone, somewhere.

This is probably my favorite Google Plus post:

https://webcache.googleusercontent.com/search?q=cache:9FEWJo...


Great post. Thanks!


More often than not, what today sounds insignificant can become significant tomorrow - if not for everyone, then at least for some.

Personally, I have dug through a lot of old websites stored at the web archive trying to find information and files (especially patches, older software and amateur games that few knew about) seemingly lost. For me the Archive (not just the web archive but all the archive.org projects) is as important as Wikipedia (and I donate to both).


Archaeologically, I think people tend to be interested in a lot of things about society that don't get well-preserved. Social media offers potentially a time capsule to the lives of ordinary citizens today in the distant future.

That being said, I intend to ask the Internet Archive to remove my personal G+ profile.


For a lot of people and groups, the Internet Archive is their only hope at preserving at least some of their Google+ content. I've been working with many of them over the past six months.

Google+ had millions of active users of a wide range of technical skills, but by far the majority at the lower end of the scale. Even the ones with technical chops were often limited by budget, network bandwidth, costs, or reliability, or other factors, in what they could do.

Google+'s promise was to host text, image, and video content with Google's deserved reputation for high reliability. The company itself is not, as many other failed social and online media services were, going out of business. It's simply decided to exit this particular activity.

The first realisation I had of the problem Google+ Communities faced was when someone commented on the subreddit I'd created about the G+ shutdown that they were wondering how they were going to move 400,000 users and content to a new home.

I didn't even know how many communities G+ had. The online information I'd found said about 5 million, but on checking it was actually 7.9 million as of late November, 2018, and over 8.1 million by January 2019. Through Loysoft (Friends+Me), I got a summary dump of all 8.1 million communities and some overview characteristics (members, posts and posting dates), and could finally start getting a handle on how many significant communities there were. That resulted in an extract of about 100,000 communities of 100+ members and posting activity within the previous 30 days, which was used for outreach and migrations. I made that freely available to anyone looking to assist in migrations.

But a lot of communities were missed. Google itself shut down the G+ Communities serving Community owners and moderators, with no notice, making outreach all but impossible.

Is there a lot of crap data out there? Yes, there is.

But there are also some gems, and the task of sorting between the two in advance of archiving it is more effort than simply archiving everything and making it available. And the Internet Archive has set itself the mission of total archival, where possible. So there's that.

The questions are ones we're discussing though at the PlexodusReddit: https://old.reddit.com/r/plexodus


The other side of this is that for the past centuries, things that were on _paper_ (letters, photographs, newsletters, zines, personal notes/journals, whatever) could be just left sitting around, and _some_ portion of them would still be there and legible decades+ later for historical purposes.

When it's digital... if you don't keep feeding it, it's gone forever.

There has been some coverage of historians and archivists concerned about this, here's just the first random thing i found googling:

https://abcnews.go.com/Technology/digital-era-end-history/st...

But yes, professional archivists increasingly recognize you can't save _everything_ digital, we couldn't afford it. Figuring out what to save is a challenge. But if we save nothing, we're not gonna have much history to look at.


Three additional reasons:

We don't necessarily know what's "important" in advance.

Archiving this content can be a nice public service for people who put content on the internet, don't do a good enough job preserving it themselves, and then want to access it later. I've heard "check the wayback machine" as a solution on HN for people whose blog got taken down unexpectedly from some provider and they didn't have a back up solution.

Perhaps the same as the first reason, but some of this content can turn out to be useful as evidence. It's not uncommon for public figures to take down unflattering messages they posted on social media platforms and then deny having ever made them.


Before, people knew those were fleeting, and so they knew they had to record them if they wished to keep them.

Current platforms give an idea of permanence, so people stopped worrying about preserving what they care about, and when one closes both are lost.


If you look at the way most people use social media, they absolutely have no concept that they provide permanence. It’s exactly the opposite.

People saving images solely to Flickr or videos on YouTube is a different usage pattern than the ephemeral likes and comments they make on the social platforms.


Historically, information was more likely to be saved by accident. If something was published in print, that means that copies of it were distributed nationally or globally, found in hundreds of homes or libraries. If the publisher went out of business, people would still have physical publications lying around. And if something came to be of historical interest 50 years later, it could be preserved forever. If a web publication or community goes out of business, where's that data going to be 50 years later?


I wonder too. I was berated on here for mentioning that I deleted my reddit comments when I quit reddit, as if I was somehow stealing (my own content) from the users. Just weird IMO.


Well, as a Reddit user I dislike these "delete old comments" scripts, since I very often read old discussions about topics I find interesting, and when I hit the occasional "deleted by script foo, do it yourself too because reddit evil" I get irritated, since they make following that old discussion hard.

It is your account of course, but I still dislike the practice since it goes counter to the entire purpose of a discussion site.


Several sites ingest and store Reddit comments forever. Scripts that overwrite old comments are ceremony for the user, nothing more.


> are ceremony for the user, nothing more.

Only a small subset of users would be interested in those sites, and they likely don't come up in searches nearly as much as the original reddit post, so there's much more value to the deletion scripts than you are giving credit for.


Ironic considering you comment here and HN does not allow editing or deleting comments after a two hour window to protect discussion integrity.


What do you find ironic about it? It's two entirely different types of discourse, so it's an apples-to-oranges comparison.


A difference in the content of the two platforms doesn't mean that the comments on the two sites are so dissimilar that they can't be compared.

In particular, they remove the forum requirement of quoting the comment you're replying to, by allowing you to respond to someone's comment directly below it in the middle of an existing thread. It would be exactly as inconvenient to readers of old Reddit and HN threads if that context was deleted automatically 6 months after the fact, HN just doesn't let you do it.


> doesn't mean that the comments on the two sites are so dissimilar that they can't be compared.

Actually, it does. I discuss things of significance here, where the information may be useful to someone in the future, but I only commented on reddit for entertainment purposes (jokes, witty remarks, etc.) Also, creepy people don't comb through past HN comments looking to call you out in current conversations like they frequently do on reddit, and almost always out of context and spun to fit their agenda. While entertaining, it started getting annoying after a while, so in my opinion, that's why they can't have nice things, so to speak.


When I was younger I always used to love reading the silly things people archived.

It gave me a little glimpse of what college was going to be like and helped me deal with being so isolated.


A lot of what we know about previous societies comes from things like midden piles, literal trash heaps of discarded crap nobody wanted.


>Things that were significant were preserved when the people involved decided they were important

Sounds like you answered your own question.


Also linking events and destroying plausible deniability.


It's a kind of "dragnet conservation".


Crazy person trying to save everything on the Internet here (at least, a few sections of it.)

It was actually when the Trump administration basically gutted and removed the whole of the EPA's website that caused me to take notice. I realized that while my written notes from a decade ago were still around and usable, every digital note I'd made more than a year or two ago was gone.

Our knowledge culture has changed from memorizing facts to knowing how to get to facts. Information also no longer flows down from a "chosen few" who have the means to publish, but instead from everyone at just about all times.

Archiving is also no longer a horrifically complicated or expensive hobby. There's no reprinting books on archive paper or storing them in helium. Instead you can simply buy a stack of hard drives and start bulk storing data.

Since getting involved, I've come to realize that data goes offline frequently, with no warning - extensively due to bad copyright claims. I can't really fix the copyright process (at least, not very quickly), but I can help save the data and help make it available to others in meaningful formats.


I've been surprised that there hasn't been much pushback/questioning (from what I've seen) on this archive effort. Of course people voluntarily published the information here. But they also had at least a little control over how and what was presented to the world. Now they don't.

Is archive team going to respond if someone finds something in the archive they want to take down? Is there a listing somewhere of exactly what is archived (does it include pictures from google plus?)? I couldn't find it, and I don't want to go rooting around in the actual archive itself (if that is even possible).


I find it curious and saddening that this is becoming an expectation. Someone voluntarily publishes something for the world to see - they are certainly within their rights to issue a retraction, but they have long since relinquished control over others’ actions with the data they published. This is pretty fundamental to how the internet works. The fact that there is now an expectation of continued control over said data just shows how far companies like Facebook have gone toward fundamentally changing the nature of the internet.


The internet was created in a different age. The kinds of abuses people are worrying about right now were barely even possible when the Internet was created. Now they're quite feasible using relatively inexpensive and well-known technology, and people who are eager to share the knowledge of how to do it have created all sorts of MOOCs and bootcamps and even accredited degree programs on the subject.


Under many data privacy laws, data subjects have a legal "right to correct" any inaccurate personal data concerning them.

GDPR: https://gdpr-info.eu/art-16-gdpr/


Does the ArchiveTeam coordinate with services like Google+ for archival? Are the techniques used (such as spreading the load across 1000s of separate IPs) used for anything other than getting around Google's integrity services e.g. ratelimiting?

I'm really curious how such a large operation is legally pulled off, especially when services like Google+ intentionally try to make scraping difficult (and presumably for large offenders, will attempt a C&D).


Sometimes the operators of a service will contact the ArchiveTeam and work with them to speed up the archival process. The file hosting service pomf.se did this when they were shutting down.

https://www.archiveteam.org/index.php?title=Pomf.se#Archivin...


They use the ArchiveTeam Warrior VM, so they have access to many public IPs: https://www.archiveteam.org/index.php?title=ArchiveTeam_Warr...


This is rather creepy if I'm honest - why are you scraping user's profile data? Is there a way I can request that you delete any of my data that may be in there?


Only what was public was scraped. You can make a request to the Internet Archive to remove your content when they have ingested it and made it available.


I don't know if ArchiveTeam's bots can be logged in as Google Accounts, but I can still (for the time being at least!) browse around on a GSuite account, and an old grandfathered-in "Google Apps for Business" account


Archive Team are absolute heroes. The Google+ Mass Migration community learned of them and their "googleminus" project in January, and worked to help give information on the crawl, amount of data, and particulars of G+.

arkiver and Fusl in particular have been absolutely amazing in what they've accomplished.

They also managed to pull in 94.5% of all Google+ Communities, which should provide the ability to view posts by Community (they're otherwise scattered among user posts). We're still assessing how much of that was login-page redirects in the last hour or two of the crawl, but it's amazing work.

I'd managed to send off about 80k larger, recently-active communities (100+ members, <30 day activity) over the past few weeks, with a final grab about 18 hours ago, using the Internet Archive's "save" URL.

If you ever need to use that it's:

    https://web.archive.org/save/<URL>

Where you replace "<URL>" with whatever it is you're trying to save, including the protocol string, say, this HN post:

    https://web.archive.org/save/https://news.ycombinator.com/item?id=19556665

That can be scripted, and my submissions used a bog-simple Bash script and xargs to plow through 100k submissions (20k appear to have been dead) in about 90 minutes, on very modest hardware.
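
For anyone who'd rather not do it in Bash, here's a rough Python sketch of the same idea. To be clear, this is not the script I used: the function names, the User-Agent string, and the one-second delay are all made up for illustration, and I'm not making any claims about what request rate the Archive will tolerate. It just prepends the save prefix to each URL and requests the result.

```python
import time
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(url):
    """Build the save URL for `url`, which must include its protocol string."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("URL must include http:// or https://: %r" % url)
    return SAVE_PREFIX + url


def submit(url, opener=urlopen, timeout=60):
    """Request the save URL once and return the HTTP status code."""
    # Hypothetical User-Agent, chosen only for this sketch.
    req = Request(save_url(url), headers={"User-Agent": "save-sketch/0.1"})
    with opener(req, timeout=timeout) as resp:
        return resp.status


def submit_all(urls, opener=urlopen, delay=1.0):
    """Submit each URL in turn; map it to a status code or the error raised."""
    results = {}
    for url in urls:
        try:
            results[url] = submit(url, opener=opener)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[url] = exc
        time.sleep(delay)  # assumed politeness delay, not a documented limit
    return results
```

Calling `submit_all(["https://news.ycombinator.com/item?id=19556665"])` would make one request to the save URL shown above. Dead pages should show up in the results as error objects rather than stopping the run.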

Also: the Internet Archive (and Archive Team) run off volunteers and donations. You can help, and please do.

https://archive.org/donate/

https://www.archiveteam.org/index.php?title=Donate

(Not affiliated, but very grateful to them.)


And the shutdown seems to be fuzzy, because I can still post to Google+, +1 things, and presumably do other usual G+ things despite it telling me it's shut down. Now I'm only occasionally reloading to watch my follower count drop as things get deleted.

I still feel the early variant of + with the original implementation of Circles (back in 2012) was one of the most useful takes on social sharing I've come across. And it worked especially well for photographers. Once they moved everybody to Collections I stopped using it with any regularity, and stuck mostly to groups/Communities. Hitching YouTube to it was basically the last straw. When I made a farewell post last month it had been about four years since my previous completely-Public post.

The platform had flaws, and I'm a bit sad to see it go, but I won't particularly miss it.

Cheers,


Though I suppose I've gotten accustomed to using double quotes to specify mandatory search terms, perhaps we can get back the `+foo` search syntax that was so unceremoniously taken from us and reappropriated for Google+?


My Google+ notifications were more active than they have been in months as everyone was checking in to see how much longer was left. Google didn't specify the time of the shutdown, just "sometime on April 2nd", so none of us were sure when the cutoff would be. Middle of the day, as it turns out.


Yes, the last day was probably the most active period of Google+.


The title should say: Google plus cannot be used.


G Suite and Google internal accounts still work.


I'm going to miss it. A lot.


"Looks like you've reached the end"


When Google first announced its plans to shut down G+, originally slated for August 2019, a few of us started looking at the question of helping people and communities (in both the technical "G+ Community" and the social "community of people" senses) stay intact.

For all the ribbing G+ gets, the problem is a big one. And it's not one that's specific to Google+. As the regular parade of shut-down announcements of services and firms on HN attests, online mediated services can be cancelled, often quite abruptly. And there's often very little notice.

The world of social media sites is likely to go through more shakeups, for various reasons (and there are a number of sites presently looking pretty shaky). Alternative provisioning of similar services, either personally or at a more local scale, is a possibility (though through what institutions isn't entirely clear), and there's the open question of whether what we now call "social media" really is a net positive or something people want, need, or even should use. Projects such as IndieWeb, the POSSE initiative ("publish (on your) own site, syndicate elsewhere"), federated protocols, IPFS, DAT, Beaker browser, and more (I've been discovering a lot in the past six months) may break us out of the current proprietary silo model.

Or not. The technical landscape is confusing, technical skills are limited, and the risks of DiY hosting can be large. It's a difficult trade-off. Though it's one I'd like to explore.

There are huge changes that have been and will be happening on the regulatory front, from privacy to copyright to liability to propaganda and disinformation, and far more. Some of these laws and regulations seem written with self-service in mind, many do not. That's a whole 'nother field.

(I've got a To-Do item to get ahold of the EFF on these questions, as well as other groups.)

And then there's the whole fact that the tech world is in the midst of a (very well deserved IMO) backlash for its cavalier attitudes, abuses, and outright harm inflicted on both individuals and society as a whole. The promise of the 1990s has not been delivered.

Back to the group: we looked at the problem of migrating, realised there were many different users and groups, with different interests, and a wide range of technical abilities, from top-tier Linux kernel hackers (Alan Cox) to none at all. Some are best served by commercial solutions, for now, but many can look at federated or self-service options. We put together FAQs and Wikis and discussion forums and gathered a lot of data (we seem to have the best information outside Google on the actual size and scope of G+ users, data, and communities), and more. All inside six months.

It's been a group effort, and a lot of people contributed. I need to dig through my G+ archives to find the thank yous I'd posted earlier today, but it's substantial, and that was only a partial list.

What I hope is that others can use and be helped by what we've done.

The wiki is https://social.antefriguserat.de and there's a subreddit at https://old.reddit.com/r/plexodus. Both will continue to be active over coming months; we're only part-way through the process, and still need to establish ourselves in our new spaces.



