@tastytea This is a contentious issue.
EFF to Court: Accessing Publicly Available Information on the Internet Is Not a Crime
Ninth Circuit Doubles Down: Violating a Website’s Terms of Service Is Not a Crime
Dear Canada: Accessing Publicly Available Information on the Internet Is Not a Crime
‘Scraping’ Is Just Automated Access, and Everyone Does It
@tastytea Fact remains that what's publicly accessible will all but certainly be accessed. And much of what's posted on Mastodon *is* public.
I'm not addressing the moral, ethical, or research dimensions. And I honestly don't know where GDPR falls on this. But access to public (and private) content can be highly socially beneficial, see #PanamaPapers and #Whistleblowing generally.
It's also been a huge part of the work *against* resurgent fascism and media manipulation, #KateStarbird and others.
@xenophora An honestly good question.
"Intent" would be a large part of that. Posting publicly (or unlisted) *provides access to anyone to read*. Posting "followers only" (a pretty bad design option, honestly, given that *anyone* can follow an account, unless locked), or DM, would be a *strong signal* of intent that "this is not public".
A huge problem is that people really don't get this or think like this, at least in many cases. Which argues against the rule, not the people.
@xenophora FWIW, I don't think that the argument "you've got no expectation of privacy for what happens in public" is in the least bit valid. Evidenced in large part by *published documentation* by intelligence agencies to *seek* privacy *by going to public spaces* (parks, restaurants, museums, etc.).
I also see *disaggregated public information* as hugely different from *aggregated public information*, due to the vastly lower search costs.
Which means specific details matter.
@xenophora Simple hashing of user identifiers is *not* sufficient for privacy guarantees, in virtually all instances, but especially where the search space is relatively small. Across a few hundred Mastodon instances, the active public profiles number in the 1,000s - 10,000s, I suspect, which is a *hugely* tractable search space, no matter how fine your hash.
Rehashing over time, say, every day or week, to allow tracking threads, but not personal activity over time ... maybe.
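A minimal sketch of both points, under stated assumptions: the handles, the `example.social` domain, the period label, and the function names are all hypothetical, and SHA-256 / HMAC-SHA-256 stand in for whatever hash a researcher might actually use. The first half shows why a plain hash falls to simple enumeration of known handles; the second shows one way a per-period pseudonym might be derived.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def plain_hash(handle: str) -> str:
    """Unsalted SHA-256 of a handle: deterministic and guessable."""
    return hashlib.sha256(handle.encode("utf-8")).hexdigest()


def reverse_by_enumeration(target_digest: str,
                           candidate_handles: Iterable[str]) -> Optional[str]:
    """Recover a handle by hashing every known public handle and comparing."""
    for handle in candidate_handles:
        if plain_hash(handle) == target_digest:
            return handle
    return None


def rotating_pseudonym(handle: str, period: str, secret_key: bytes) -> str:
    """Keyed pseudonym that changes whenever the period label changes.

    Assumption: secret_key is held privately and ideally destroyed after
    the period ends; anyone holding it can still enumerate handles.
    """
    message = f"{period}|{handle}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical population of 10,000 public handles.
candidates = [f"user{i}@example.social" for i in range(10_000)]
leaked = plain_hash("user4242@example.social")
print(reverse_by_enumeration(leaked, candidates))
```

Within one period the same handle maps to the same pseudonym, so threads can still be followed; across periods the pseudonyms differ, so activity is harder to link over time. Whether that is enough is the open "maybe" above.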
@xenophora Another option might be to disaggregate content into small-n tuples, at the word level, so phrases of 2-4 words (analysis rapidly explodes above, or even at, this level), and come up with a *per conversation* or *per thread* identifier, rather than persistent individual identifiers over time.
This preserves much ability for meaningful analysis *and* a lot of privacy. I'm not sure that it's sufficient.
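Roughly what that disaggregation could look like, as a sketch only: the function names, the salted-hash thread identifier, the example URL, and the naive whitespace tokenisation are all assumptions made for illustration. Each output row carries a thread identifier and a 2-4 word phrase, with no author field at all.

```python
import hashlib
from typing import Iterator, List, Tuple


def thread_identifier(thread_url: str, salt: bytes) -> str:
    """Opaque per-thread identifier (hypothetical scheme: salted SHA-256)."""
    return hashlib.sha256(salt + thread_url.encode("utf-8")).hexdigest()[:16]


def word_ngrams(text: str, min_n: int = 2,
                max_n: int = 4) -> Iterator[Tuple[str, ...]]:
    """Yield every run of min_n..max_n consecutive words in the text."""
    words = text.lower().split()
    for n in range(min_n, max_n + 1):
        for start in range(len(words) - n + 1):
            yield tuple(words[start:start + n])


def disaggregate(thread_url: str, posts: List[str],
                 salt: bytes) -> List[Tuple[str, Tuple[str, ...]]]:
    """Turn a thread into (thread_id, phrase) rows with no author attached."""
    tid = thread_identifier(thread_url, salt)
    return [(tid, gram) for post in posts for gram in word_ngrams(post)]


rows = disaggregate("https://example.social/thread/1",
                    ["specific details matter here"],
                    salt=b"hypothetical-salt")
for row in rows:
    print(row)
```

A four-word post yields six phrases (three pairs, two triples, one four-word run), which hints at how quickly the row count grows with longer phrases. Distinctive phrases can still identify an author, so this is a reduction in exposure rather than a guarantee.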
@xenophora Keep in mind too that resistance to scraping tends to come from two specific constituencies:
1. The powerful or anti-social, who seek the shield of being able to comment or post content with impunity and protection from copying. I include the publishing industry, but also most establishment power.
2. The weak and oppressed, with genuine, or at least credible, fears of being attacked either now or in future based on comments.
Very different groups, similar concerns.
@xenophora Your instance expires posts?
I believe it's images that expire after a year or so, though the posts they're attached to stay on.
@dredmorbius @xenophora @tastytea IMO the default should be self destructing posts unless explicitly marked as permanent. i feel this would closer match how people seem to expect social posting to work. as close as you can get, anyway, since people’s mental models of how this works are highly self contradictory.
@zensaiyuki Yes, this.
I think we're going to have a Come to #Savior moment on this eventually, based on ... a whole mess of stuff. But basically the non-tenability of saving Everything, All the Time.
Digital Information Archival reminds me a hell of a lot of Borges' Map. At some point, 1:1 correspondence (or worse, n:1) of record-to-reality loses utility.
My #BOTI concept might be an approach: nuke most things, save the best + some sample ("best" is time-contextual).
@zensaiyuki I've toyed with the idea of phasing in and out identities over time.
Might be a "lives for a year, sleeps for a year, is destroyed", or a longer or shorter interval.
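As a rough sketch of that lifecycle, assuming fixed 365-day phases and a hypothetical `identity_state` helper (neither is an existing Mastodon feature):

```python
from datetime import date


def identity_state(created: date, today: date,
                   active_days: int = 365, dormant_days: int = 365) -> str:
    """Classify an identity as active, dormant, or destroyed by its age.

    The phase lengths are assumptions; the interval could be longer or shorter.
    """
    age = (today - created).days
    if age < 0:
        raise ValueError("identity cannot be created in the future")
    if age < active_days:
        return "active"
    if age < active_days + dormant_days:
        return "dormant"
    return "destroyed"


born = date(2019, 1, 1)
for check in (date(2019, 6, 1), date(2020, 6, 1), date(2021, 6, 1)):
    print(check, identity_state(born, check))
```

The intervals are parameters precisely because the right length is a policy question, not a technical one.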
Retention periods and policies have Been A Thing in business for decades, mostly paralleling the adoption of computer tech, and the notion of (legal) document discovery:
@dredmorbius @xenophora @tastytea as valuable as old posts are, and how annoying it is to see an awesome old post or thread disappear, that should be weighed against the potential harm of old postings being discovered from years ago and pulled out of context to ruin people’s lives. it’s a nice weapon against those in power… sometimes. but it’s much more potent against vulnerable people.
@zensaiyuki And *really smart people* have been *absolutely wrong* about the potential negative use of infotech. Herbert Simon very pointedly, claiming the Nazis operated without "mechanized data processing":
But they didn't: https://ibmandtheholocaust.com/
@zensaiyuki @dredmorbius @xenophora @tastytea This isn't a bad idea. I'd have them decay from public to private between convo participants. In a way social media has made it so that someone can listen to a conversation you had in a bar 3 years ago and either jump in out of nowhere or pull things out to discredit you in some way. Like at the bar, I don't necessarily mind if you jump in there at the bar.... Granted, I understand that what I'm saying online is permanent. But as a paradigm, decay 👍
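The decay paradigm could be sketched as a visibility check along these lines. The one-year window, the `can_view` name, and the set-of-participants model are assumptions for illustration, not how any existing server behaves, and the check only governs what a cooperating server shows, not copies already made elsewhere.

```python
from datetime import datetime, timedelta
from typing import Set


def can_view(viewer: str, participants: Set[str],
             posted_at: datetime, now: datetime,
             public_window: timedelta = timedelta(days=365)) -> bool:
    """Public at first; after the window, only conversation participants."""
    if viewer in participants:
        return True
    return now - posted_at < public_window


convo = {"alice", "bob"}
posted = datetime(2019, 1, 1)
print(can_view("stranger", convo, posted, datetime(2019, 3, 1)))
print(can_view("stranger", convo, posted, datetime(2022, 3, 1)))
print(can_view("alice", convo, posted, datetime(2022, 3, 1)))
```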
@ejnunya If you're interested in the notions of the effects and capabilities of different media, I highly recommend Marshall Poe's "History of Communications":
Forgetting is really a thing endemic to human memory; it can only work on computers by a mild convention and requires a lot of trust.
The closest thing to forgetting for computers is drowning data in oceans of other data in hopes that indexing bots have limited cataloguing capacity.
@zensaiyuki @dredmorbius @xenophora @tastytea that's a bit absolutist :-) Why not just accept the fact that all your public posts could potentially be linked back to you? For most people it's not a bad thing. What we do need in society is to get rid of stigma associated with it. If someone said a horrible phrase 10 years ago or posted a racy nude, it shouldn't be a reason to ostracize a person and disqualify them from life.
@isagalaev @dredmorbius @xenophora @tastytea i will concede that it isn’t a perfect solution. but “won’t work” requires a very narrow definition of “work”. I think though, it *would* work, in the sense that it would be an improvement over how things work now, and reduce opportunities for abuse. that it doesn’t completely eliminate abuse doesn’t strike me as a strong argument against it.
@zensaiyuki I'd shift the notion of privileged group to _whatever_ group enjoys protection, latitude, impunity, and immunity. Often but not always cis white males.
More generally: the privileged oligarchy, whatever it may be, as well as various populist supporters, though those may find themselves discarded on little notice.
concerns ethnic prejudice
@zensaiyuki Germans, Austrians, Papists, Irish, Italians, Jews, Greeks, Spanish, Poles, Bosnians, Croats, etc., have all been included among discriminated populations.
Today there are shades of that among the "privileged" and non-privileged Asian populations -- generally Japan & Korea now being considered "model", Chinese approaching or attaining that. All were previously hugely discriminated against in the West. ...
concerns ethnic prejudice
@isagalaev "Stigma" is a canard. There is a very real, and unquantifiable, risk.
One element of the #MeToo movement has been a tremendous shift in the actionability of records (and memories) thought old and dead.
Eric Schmidt's famous "Google's policy is to get right up to the creepy line" quip fails to recognise that *the creepy line is not fixed*. What's considered acceptable or unacceptable changes dramatically with time. We're in the midst of a shift.
@isagalaev What makes #MeToo unusual is that instead of disempowered and minority populations or cultures being targeted, it's *specifically* power and its abusers. Not unheard of in history, but also not the usual dynamic.
Skin colour, politics, sexual orientation, past relationships, thoughts, writings, etc., have all proved fatal for some.
Antoine Lavoisier comes to mind.
Witch hunts, purges, pogroms, genocides, reeducation, and other realignments.
@dredmorbius @zensaiyuki @xenophora @tastytea "What's considered acceptable or unacceptable changes dramatically with time." This is tremendously important. Whenever something "creepy" floats up from times long past, it's worth considering that it might have been downright normal back then. I think this shift you mention should include acceptance of this fact as well. (I'm still not sure I'm communicating my mind well.)
Triggers: pretty much all of them.
@isagalaev This cuts both ways.
Things NOW considered creepy (or worse) were once (sometimes still) entirely normal, including but not limited to: human sacrifice, pederasty, child marriage, miscegenation, drugs, arranged marriage, rape, piracy, genital mutilation, bigamy, consanguinity, infanticide, genocide, slavery, debt peonage, serfdom, dueling, bloodletting, trial-by-combat, forced confessions / torture, etc., etc., etc.
@isagalaev It's possible to mark records for deletion-on-restore, and offline-backup is different from online/nearline access. Though it's fungible with that.
A backups-retention policy where media *are* aged out over time is another option, though that Requires Procedures and Adherence To Them.
Transmission, retention, and destruction of data all rely strongly on trust.