Inverting the Web 


@freakazoid What methods *other* than URLs are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of the Web's lack of any intrinsic indexing-and-forwarding capability; such a capability would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.

... might disrupt things a tad.
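
A minimal sketch of what points 1 and 2 might look like in practice, as a self-published, shareable index file. The field names and layout are purely hypothetical, not an existing standard:

```python
import hashlib
import json

# Hypothetical self-published index for one site (point 1).
# Field names are illustrative only.
site_index = {
    "site": "https://example.com",
    "entries": [
        {
            "url": "https://example.com/essays/inverting-the-web",
            "title": "Inverting the Web",
            "terms": ["search", "indexing", "distributed"],
            "first_published": "2019-11-01",
        },
    ],
}

# A digest over a canonical serialization lets aggregators that
# share and forward indexes (point 2) detect tampering en route,
# a precondition for the auditing in point 4.
canonical = json.dumps(site_index, sort_keys=True).encode()
print(hashlib.sha256(canonical).hexdigest())
```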

Somewhat more:
news.ycombinator.com/item?id=2

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@dredmorbius This is not a fully fleshed out idea yet, but the "L" was the important bit. People generally don't care about the location of the content. They care about the content of the content, and other stuff about the content like the author, etc.

Just think about how people generally navigate the web these days. They don't type a URL into their address bar or click a bookmark. They type a search query into their address bar, which will generally bring up Google results.

@dredmorbius So the question I want to answer is: how do we enable that kind of navigation, or something similarly easy to understand, without giving a whole bunch of power to a single entity? How do we leverage people's existing trust networks, or existing reputable (generally topic-specific) databases, to provide results at least as good as Google's?

@freakazoid @dredmorbius
The idea we had with Xanadu is that, because links are part of an overlay instead of embedded, people would send each other packs of links between different documents, and you might subscribe to a themed feed of links the way you subscribe to an RSS feed or follow an account. It was supposed to be p2p but could be federated -- but it doesn't work if overlay links don't work (i.e., if content is mutable or addresses are).
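
For illustration only, a guess at the shape such a "pack of links" might take; the field names are invented here, not drawn from any actual Xanadu format:

```python
# Invented shape for a subscribable overlay-link feed. Note the
# dependence flagged above: the "doc" addresses must be stable
# (e.g. content hashes), or the overlay links break.
link_pack = {
    "feed": "hypertext-history-links",
    "maintainer": "enkiv2",
    "links": [
        {
            "source": {"doc": "sha256:<hash-of-doc-A>", "span": [120, 180]},
            "target": {"doc": "sha256:<hash-of-doc-B>", "span": [0, 45]},
            "relation": "cites",
        },
    ],
}
```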

@enkiv2 Do you know / have you worked with Andrew Pam / Xanadu Australia?

I discovered his work w/ Xanadu in the closing months of G+.

@dredmorbius @enkiv2
Yeah. When I was working on xanadu he was maintaining the repos & shell access. I never worked closely with him, but I'm friendly enough with him & his wife.

@freakazoid @dredmorbius on the darknet, people are generally advised to use a TOFU model. Use the first link you find from a reputable source such as dark.fail, bookmark it, and use a different one only if it is cryptographically signed by whatever entity controls the resource you are using.
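
A minimal sketch of the TOFU pattern described above, in the spirit of SSH's known_hosts; the file name and function are illustrative:

```python
import json
from pathlib import Path

PIN_FILE = Path("pinned_keys.json")  # illustrative storage location

def check_tofu(service: str, fingerprint: str) -> bool:
    """Trust the first fingerprint seen for a service; flag changes."""
    pins = json.loads(PIN_FILE.read_text()) if PIN_FILE.exists() else {}
    if service not in pins:
        pins[service] = fingerprint              # first use: pin it
        PIN_FILE.write_text(json.dumps(pins))
        return True
    return pins[service] == fingerprint          # later uses: must match
```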

@zardoz Right. TOFU's also long been used in PGP/GPG, and is arguably more widespread than the Web of Trust.

A widely practised mis-assertion of a key is likely to result in a public disavowal ... eventually.

For someone with a particularly high threat function / risk calculus, that's not attractive. And for most casuals, it's yet another idea that can lead to bad practices / poor decisions which might later be regretted.

@freakazoid

@zardoz TOFU's prevalence shows though that even with strong crypto and good tools, validation mechanisms are largely informal.

@freakazoid

@freakazoid Sorry, what "L"?

I'm not seeing reference to this and am confused.

@dredmorbius Sorry, I mean the "L" in "URL". It's a uniform resource *locator*.

Google is trying to build the thing I'm talking about, only it will be designed to give them even more power than they already have by hiding URLs entirely, making it so that there's no chance at all to navigate the web successfully without them.

@freakazoid OK, yes.

And, old hat to you, but the idea was to "locate on the Internet, by server and path": espace.cern.ch/webservices-hel

... in a system literally designed by nuclear particle physicists.

Alternatively, L = I, "identifier".

Location == Identity.

Part of that remains valid. Part of it ... may not.

I've been kicking around the idea of a (local) document-oriented "filesystem" in which specifiers are effectively metadata descriptors or content-based keys. old.reddit.com/r/dredmorbius/c

@freakazoid Some more exploration of that (and more specific to the docs-oriented FS) here, in comments:

joindiaspora.com/posts/1702010
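
A toy sketch of the document-oriented "filesystem" idea, assuming content-based keys plus searchable metadata descriptors; all names here are invented for illustration:

```python
import hashlib

store: dict[str, bytes] = {}              # content hash -> bytes
metadata: dict[str, dict[str, str]] = {}  # content hash -> descriptors

def put(content: bytes, **attrs: str) -> str:
    """Store a document under its content hash, with metadata."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    metadata[key] = attrs
    return key

def find(**attrs: str) -> list[str]:
    """Return keys of all documents matching every given descriptor."""
    return [k for k, m in metadata.items()
            if all(m.get(a) == v for a, v in attrs.items())]

doc = put(b"sky: blue", author="dredmorbius", topic="sky")
assert find(author="dredmorbius", topic="sky") == [doc]
```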

@dredmorbius Yeah, I've thought about similar approaches. Directories aren't required to be listable. Unordered bags of KV pairs don't map super well to hierarchical paths, but it's not like that matters very much. For most people a filesystem interface wouldn't matter anyway; they want a browser-ish application.

@freakazoid It ... depends.

There are times you want a _very specific_ resource.

It's not just _content_ that matters, but ownership, provenance, who can / did change / modify it, etc., etc.

There are times when "what colour is the sky?" can be answered by any of thousands of references.

The fact that _approximate, content-described results_ are _sometimes_ or even _often_ appropriate doesn't mean _always_.

@dredmorbius Indeed, but I'm not talking about getting rid of URLs, and for such things search engines end up just acting as a URL directory, since you will look until you see the URL you want.

@freakazoid A directory-path-based specification is saying "find this precise linked-list chain of directory specifications, with the implied properties of ownership, access permissions, modification history, provenance, etc., etc."

People looking for docs may allow slack. Software looking for libraries, somewhat less so.

And even humans looking for specific documentary authority may want a specific result.

@freakazoid The key for me is that _search is identity_, or at least _an identifier_, if _a search query_ returns _precisely one match_.

(Other options being "null" or "list".)
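
The "null / identifier / list" trichotomy above, as a sketch:

```python
# A query acts as an identifier only in the singleton case; the
# other two outcomes are the "null" and "list" options above.
def resolve(matches: list[str]) -> str | list[str] | None:
    if not matches:
        return None            # null: nothing answers the query
    if len(matches) == 1:
        return matches[0]      # singleton: the query IS an identifier
    return matches             # list: the query is ambiguous
```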

@dredmorbius I'm not sure I understand. It's possible for searches to return singleton results by accident. It seems like what you want is to distinguish between searchable metadata fields that uniquely identify resources and those that don't.

@freakazoid Right, that IS a problem, and a BIG one.

Possibly THE problem.

Q: Can documents be reasonably self-describing or self-identifying?

@dredmorbius I assume you mean *securely* self-describing?

Most distributed storage systems that try to defend against malicious nodes use exactly two types of keys, each self-certifying: content hash for immutable values and public key hash for mutable ones.

Beyond that you're into the realm of the subjective. My thinking here was to have signed triples à la RDF and use some kind of reputation system, i.e. a web of trust, to decide which to trust.

@dredmorbius You could also provide a way to express *negative* triples as a way to try to correct errors or deliberate spam injected by others.
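
A hedged sketch of signed (and negative) triples, using Ed25519 via the PyNaCl library; the claim encoding is invented here, not an RDF standard:

```python
import json
from nacl.signing import SigningKey  # pip install pynacl

sk = SigningKey.generate()

def sign_triple(subject: str, predicate: str, obj: str,
                negative: bool = False) -> dict:
    """Sign a (subject, predicate, object) claim; negative=True
    asserts the triple is FALSE, to counter spam or errors."""
    claim = {"s": subject, "p": predicate, "o": obj, "neg": negative}
    msg = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim,
            "sig": sk.sign(msg).signature.hex(),
            "signer": sk.verify_key.encode().hex()}

assertion = sign_triple("sha256:<doc>", "author", "dredmorbius")
retraction = sign_triple("sha256:<doc>", "author", "dredmorbius",
                         negative=True)
```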

@freakazoid I'm not familiar with signed triples / negative triples. Sec....

@freakazoid I'd settle for "functionally" or "sufficiently".

I realise that any document under active attack (to change/misrepresent) would require more stringent methods.

But something that "usually works" would be a huge step forward.

@freakazoid Re: navigation.

1. Google are trying hard to kill off the URL.

2. There may be user-pattern based reasons to do just that.

3. URLs and DNS map ... poorly ... to meatspace notions of locality and identity. In large part due to the actions of websites, search engines, browser devs, SEO, and domain registrars.

4. A namespace with at _least_ a half-million entities and little sensible structure ... is far beyond human scale.

5. It's mostly reputation.

@dredmorbius I agree that killing off the URL is a worthy goal, which makes it a perfect weapon for Google to deal its final killing blow to the open Web.

As for scale, IIRC you can serve 90+% of web search requests with coverage of only about 5% of the space. Something like 99% of Google results are served entirely from RAM. They don't even expect to serve useful results from their largest index; it exists primarily to give the impression of completeness.

@enkiv2 I know SEARX is: en.wikipedia.org/wiki/Searx

Also YaCy as sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.
@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system doesn't entirely hold: the DNS is annoying, but it isn't needed for accessing content on the WWW. You can navigate directly to public IP addresses, and it works just as well, which lets you skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.
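
The DNS-skipping point is directly checkable; Python's TLS stack validates IP-address SANs, and 1.1.1.1 (for example) serves a certificate covering its own IP at the time of writing:

```python
import urllib.request

# Fetch over HTTPS by raw IP, no DNS lookup involved. Certificate
# verification still succeeds because the cert carries an IP SAN.
with urllib.request.urlopen("https://1.1.1.1/") as resp:
    print(resp.status)
```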

@kick @enkiv2 @dredmorbius Not true; there are several decentralized routing systems out there. UIP, 6/4, Yggdrasil, Cjdns, I2P, and Tor hidden services to name just a few. Once you're no longer using names that are human-memorizable you can move to addresses that are public key hashes and thus self-certifying.

A system designed for content retrieval doesn't really need a way to refer to location at all. IPFS, for example, only needs content-based keys and signature-based keys.
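
The two self-certifying key types named above, sketched with stdlib hashing and (as before) PyNaCl for the signature part:

```python
import hashlib
from nacl.signing import SigningKey

# Immutable value: the key IS the content hash, so any node can
# check the key/value mapping without trusting whoever served it.
content = b"an immutable document"
immutable_key = hashlib.sha256(content).hexdigest()

# Mutable value: the key is a hash of a public key; whatever value
# is current must carry a signature verifying against that key.
sk = SigningKey.generate()
mutable_key = hashlib.sha256(bytes(sk.verify_key)).hexdigest()
signed_value = sk.sign(b"revision 2 of the resource")
```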

@freakazoid @enkiv2 @dredmorbius I said _really_. None of those are human-readable (unlike IP). Non-human-readable systems miss the point of the WWW, web of trust stuff is awful and doesn't scale. Human readability in decentralized addressing is a solved problem (more or less) for addressing systems, but there's nothing good implementing the solution yet, so little point.

@kick I'm with you in advocating for human-readable systems. IPv4 is only very barely human-readable, almost entirely by techies. IPv6 simply isn't, nor are most other options.

Arguably DNS is reaching non-human-readable status through TLD proliferation.

Borrowing from some ideas I've been kicking around of search-as-identity (with ... possible additional elements to avoid spoof attacks), and the fact that HTTP's URL is *NOT* bound to DNS, there may be ways around this.

@enkiv2 @freakazoid

@kick I'll disagree with you that WoT doesn't scale, again, at least in part.

We rely on a mostly-localised WoT all the time in meatspace. Infotech networks' spatial-insensitivity makes this ... hard to replicate, but I'm not prepared to say it's _entirely_ impossible.

With addressing based on underlying identifiers, tied to more than just content (I'm pretty sure content alone _isn't_ ultimately sufficient), we might end up with _something_ useful.

@enkiv2 @freakazoid

@kick Nodes of authority / trust, perhaps -- not centralised, but not fully distributed either. More hub-and-spoke than full-mesh, but a quite _extensive_ H&S system.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid WoT doesn't scale for average users. Technical users it does. WoT doesn't work over the phone, for example, or on e-mail, because people are easily convinced that malicious actors are within their WoT in targeted attacks. This is going to get worse esp. with recent FastSpeech & Tacotron publications/code releases.

@kick @enkiv2 @dredmorbius @freakazoid This body remembers when the definition of "geek" was someone who used a computer to exchange text chat messages with people. At least, that's what it meant at UCSC. Going back further, was it Augustine who was mightily impressed that Ambrose could read without moving his lips?

@kick To be clear, I'm trying to distinguish WoT-as-concept as opposed to WoT-as-implementation.

In the sense of people relying on a trust-based network in ordinary social and commerce interactions in real life, not in a PGP or other PKI sense, that's effectively simply _how we operate_.

Technically-mediated interactions introduce complications -- limited information, selective disclosure, distance, access-at-a-distance.

But the principles of meatspace trust can apply.

@enkiv2 @freakazoid

@kick That is: direct vs. indirect knowledge. Referrals. TOFU. Repeated encounters. Tokenised or transactional-proof validations.

Those are the _principles_.

The specific _mechanics_ of trust on a technical network are harder, but ... probably tractable. The hurdle for now seems to be arriving at data and hardware standards. We've gone through several iterations which Scale Very Poorly or Are Hard To Use.

We can do better at both.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid Do you have a proposed mechanical solution to get around the social problems that arrive with WoT? e.g.:

https://news.ycombinator.com/item?id=21528887

@kick A roundabout response, though I think it gets somewhere close to an answer.

"Trust" itself is not _perfect knowledge_, but _an extension of belief beyond the limits of direct experience._ The etymology's interesting: etymonline.com/word/trust

Trust is probabilistic.

Outside of direct experience, you're always trusting in _something_. And ultimately there's no direct experience -- even our sight, optic nerve, visual perception, sensation, memory, etc., are fallible.

@enkiv2 @freakazoid

@kick Building off the notion that "reality is what, when you stop believing in it, refuses to go away", we validate trust in received assertions of reality through multiple measures.

Some by the same channel, some by independent ones.

Getting slightly more concrete:

Simulator sickness is a problem commercial and military pilots experience with flight simulators. The problem is the simulator lies, and visual and vestibular inputs disagree. Sims are good, not perfect.

@enkiv2 @freakazoid

@kick I don't know if you've ever dealt with a habitual liar, or someone whose mental processes are so disrupted that they can't recall, or recall incorrectly, or misrepresent past events (or present ones). It's tremendously disorienting.

Our own memories are glitchy enough that you start doubting yourself. Having a record (journal, diary, receipts, independent witnesses) helps hugely.

Getting to theories of truth, consistency and correspondence seem to work best.

@enkiv2 @freakazoid

@kick Is a given narrative or representation *internally* consistent, or at least mostly so? And does it correspond to observable external realities (or again, mostly so)?

Mechanisms of trust generally try to achieve consistency or correspondence, sometimes both. In information systems, we tend to use one-way hashes, because those support the computational needs, but the hashes themselves are used to create a consistency or correspondence.

@enkiv2 @freakazoid
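
The hash-as-correspondence point in miniature: the hash doesn't make a document true, it creates a correspondence you can re-check against a digest obtained through a second channel:

```python
import hashlib

def corresponds(document: bytes, published_digest: str) -> bool:
    """Check a document against a digest published out-of-band."""
    return hashlib.sha256(document).hexdigest() == published_digest
```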

@kick So, in the "we have your dad hostage" situation, the scammer's failure was one of correspondence: dad was already dead.

But how you'd check this, *if you had the presence of mind to do so*, would be to attempt independent verification through other channels.

Call his number directly, or your mother's (assuming both are still alive and together), or current partner's. Ask to speak to him. Call the police, etc.

Falsehoods are common to any comms regime.

@enkiv2 @freakazoid

@kick If the channel (or medium) is a narrow one, and _not_ given to interrogation or ready validation, then you've got a harder problem.

You may need to call on experts. And we _have_ those for extant document classes -- people who validate books, or paintings, or recordings, or photos, or videos. They look for signs of both authenticity and deception.

See Captain Disillusion. Or art provenance.

Not perfect. But pretty good.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid The "call directly" is a good technical solution, but I know someone personally who didn't think to do that when a _company_ called them, so I'm not sure how well that'd work assuming a _person_ (they were in perfect state of mind, just unaware that companies generally don't call you first and ask for PII).

Educating users is the most difficult social problem, especially educating them on things that they generally don't recognize as _aspects_ of the problem (like you pointed to when you mentioned the elderly calling things they don't understand "nonsense," for example).

As an example of technical users failing basic "trust but verify": you can find a bunch of examples on HN of people saying things akin to "I use ProtonMail because they encrypt all of my e-mails!", which is easily disproved (in the sense they intend, as opposed to in-transit encryption, which basically every modern provider has) just by sending a message to a non-ProtonMail box that has no key on the keyservers and finding it completely readable.

@dredmorbius @enkiv2 @freakazoid Cheater!

But yeah, a decent answer.

I do kind of worry about how fallible most WoT implementations are^1, but there definitely might be a way to do it, I’ll cede.

^1 Given that I, a random finance dork, managed to reimplement the recent FastSpeech papers in ten days and get results decent enough to fool my SO when used over a phone call (modern carriers started compressing call audio poorly when they internally moved to VoIP, and the quality is pretty poor as a result), my confidence in what was previously seen as a relatively decent way to verify (audio) has lessened slightly.

@kick I have been warning close friends and family members (some elderly and prone to dismiss technological threats and concerns as "nonsense" or "nothing I would want to use" or "beyond my understanding" or "but why would someone do that", v. frustrating) about DeepFakes and FastSpeech technologies.

I know that at least one has had faked-voice scam phone calls, though they realised this eventually. I'm predicting based in part on this, BTW.

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid in infosec "trust" means "reliance" and isn't probabilistic. It's just a choice to give an entity the power to attack you. What's probabilistic and fallible is the possible benefits of that choice.

@kragen As with most words, there's a range of meanings. I'll admit to having pulled "extension of belief beyond the limits of experience" out of my hat, so it's not entirely standard. And that's "trust as a state of knowledge".

There's also the notion of "to put one's trust in (someone|something)", which can mean a binary rather than probabilistic commitment. We also have provisional or total trust.

Trust me, it's complicated.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid
Of course, one look at the state of computer security shows that for most cases (even very important ones) the social countermeasures are weaker than the technical ones. It's a lot easier to social engineer or rubber hose than to crack even a pretty weak password.

@enkiv2 Which is another way of saying that social engineering and rubber-hoses are low-cost search / goal-attainment paths.

@kick @freakazoid

@dredmorbius @enkiv2 @freakazoid IPv4 is completely human-readable if treated like phone numbers (though alternatively you could map the available range of numbers to words and autotranslate on the human end; humans can remember three words pretty easily). Kind of pushing it for English-speaking populations, though (English active-memory limit is about 7 items), I'll admit, but it should be fine for speakers of languages that can store more in active memory (e.g. Cantonese at 10).
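
A sketch of the words-for-numbers idea: three 11-bit words cover 33 bits, enough for a 32-bit IPv4 address (the same arithmetic BIP-39 wordlists use). The wordlist here is a stand-in:

```python
import ipaddress

WORDS = [f"word{i:04d}" for i in range(2048)]  # stand-in 2048-word list

def ip_to_words(ip: str) -> list[str]:
    n = int(ipaddress.IPv4Address(ip))
    return [WORDS[(n >> shift) & 0x7FF] for shift in (22, 11, 0)]

def words_to_ip(words: list[str]) -> str:
    n = 0
    for w in words:
        n = (n << 11) | WORDS.index(w)
    return str(ipaddress.IPv4Address(n))

assert words_to_ip(ip_to_words("93.184.216.34")) == "93.184.216.34"
```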