A large chunk of the Fediverse was scraped; your posts are being “released”
Matteo Zignani, Christian Quadri, Alessia Galdeman, Sabrina Gaito and Gian Paolo Rossi from the University of #Milan scraped all English-speaking instances (363) listed on instances.social, wrote a paper about it, and are distributing the dataset.
“In this context, we release a dataset gathered from Mastodon, […]”
“These data have been collected by implementing an ad-hoc tool for downloading the public timelines of the servers, namely instances, that form the Mastodon platform, along with the meta-data associated to them.”
“The spider exploits the instance list obtained from the previous step and makes a pool of requests to the instance endpoint 4 which returns the latest toots of the local timeline. Since the time-lines implement a pagination mechanism, the spider extracts the URL for the next request and repeat this procedure till it reaches the end of the timeline.”
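For anyone wondering what that pagination loop amounts to, here is a rough sketch. It is not the authors' tool, which isn't shown in the quotes above. It assumes the "instance endpoint" is Mastodon's public timeline API (`/api/v1/timelines/public?local=true`) and that the URL of the next page arrives in the HTTP `Link` header with `rel="next"`. The names `next_url`, `http_fetch` and `crawl_local_timeline` are made up for illustration.

```python
import json
import re
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# A fetcher takes a URL and returns (list of toots, raw Link header or None).
Fetch = Callable[[str], Tuple[list, Optional[str]]]

# Matches the <url> part of a Link header entry tagged rel="next".
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link header, if there is one."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def http_fetch(url: str) -> Tuple[list, Optional[str]]:
    """Default fetcher: one plain, unauthenticated GET per page."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp), resp.headers.get("Link")


def crawl_local_timeline(
    instance: str,
    fetch: Fetch = http_fetch,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield toots from an instance's local timeline, page by page.

    Assumption: the timeline lives at /api/v1/timelines/public?local=true.
    The loop stops when a page is empty, when no "next" link is offered,
    or when max_pages pages have been read.
    """
    url: Optional[str] = f"https://{instance}/api/v1/timelines/public?local=true"
    pages = 0
    while url and (max_pages is None or pages < max_pages):
        toots, link_header = fetch(url)
        if not toots:
            break
        yield from toots
        url = next_url(link_header)
        pages += 1
```

Something like `for toot in crawl_local_timeline("example.social", max_pages=2): ...` would walk the first two pages of a hypothetical instance. The point is that no login and nothing beyond ordinary HTTP requests is involved, which is why "public" here effectively means "readable by any script".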
@tastytea This is a contentious issue.
EFF to Court: Accessing Publicly Available Information on the Internet Is Not a Crime
Ninth Circuit Doubles Down: Violating a Website’s Terms of Service Is Not a Crime
Dear Canada: Accessing Publicly Available Information on the Internet Is Not a Crime
‘Scraping’ Is Just Automated Access, and Everyone Does It
@tastytea Fact remains that what's publicly accessible will all but certainly be accessed. And much of what's posted on Mastodon *is* public.
I'm not addressing the moral, ethical, or research dimensions. And I honestly don't know where GDPR falls on this. But access to public (and private) content can be highly socially beneficial, see #PanamaPapers and #Whistleblowing generally.
It's also been a huge part of the work *against* resurgent fascism and media manipulation; see #KateStarbird and others.
@xenophora An honestly good question.
"Intent" would be a large part of that. Posting publicly (or unlisted) *provides access to anyone to read*. Posting "followers only" (a pretty bad design option, honestly, given that *anyone* can follow an account, unless locked), or DM, would be a *strong signal* of intent that "this is not public".
A huge problem is that people really don't get this or think like this, at least in many cases. Which argues against the rule, not the people.
@dredmorbius @xenophora @tastytea IMO the default should be self-destructing posts unless explicitly marked as permanent. i feel this would more closely match how people seem to expect social posting to work. as close as you can get, anyway, since people’s mental models of how this works are highly self-contradictory.
@zensaiyuki And *really smart people* have been *absolutely wrong* about the potential negative use of infotech. Herbert Simon very pointedly, claiming the Nazis operated without "mechanized data processing":
But they didn't: https://ibmandtheholocaust.com/