Inverting the Web 

@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Universal Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known

... might disrupt things a tad.
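As a rough illustration of the list above, here is a minimal Python sketch of a site publishing its own index, an aggregator merging such indexes, and an auditor checking a claimed entry against the page actually served. The function names, the whitespace tokenisation, and the index layout are all assumptions made for the example, not an existing protocol.

```python
import hashlib
from collections import defaultdict


def build_index(pages):
    """Self-index one site. `pages` maps path -> page text (hypothetical layout)."""
    terms = defaultdict(set)
    for path, text in pages.items():
        for word in text.lower().split():
            terms[word].add(path)
    return {
        "terms": {term: sorted(paths) for term, paths in terms.items()},
        # Digests let a third party check that an entry matches what is served.
        "digests": {p: hashlib.sha256(t.encode()).hexdigest() for p, t in pages.items()},
    }


def merge(indexes):
    """Aggregate several site indexes into term -> [(site, path), ...]."""
    merged = defaultdict(list)
    for site in sorted(indexes):
        for term, paths in indexes[site]["terms"].items():
            merged[term].extend((site, path) for path in paths)
    return dict(merged)


def audit(index, path, term, fetched_text):
    """True only if the fetched page matches its digest and really contains the term."""
    digest = hashlib.sha256(fetched_text.encode()).hexdigest()
    return index["digests"].get(path) == digest and term in fetched_text.lower().split()
```

An auditor that fetches a page and gets `False` from `audit` has evidence of either a stale index or a misleading one; deciding which, and what to do about it, is the reputation problem rather than a coding one.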

Somewhat more:

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.


@enkiv2 I know SEARX is:

Also YaCy as sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.
@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely there: the DNS is annoying, but isn't needed for accessing content on the WWW. You can directly navigate to public IP addresses and it works just as well, which allows you to skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick @enkiv2 @dredmorbius Not true; there are several decentralized routing systems out there. UIP, 6/4, Yggdrasil, Cjdns, I2P, and Tor hidden services to name just a few. Once you're no longer using names that are human-memorizable you can move to addresses that are public key hashes and thus self-certifying.

A system designed for content retrieval doesn't really need a way to refer to location at all. IPFS, for example, only needs content-based keys and signature-based keys.
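A minimal Python sketch of the two ideas in this post: a content-based key (the name is a hash of the bytes) and a self-certifying address (the name is a hash of a public key). The hex encoding and the 32-character truncation are assumptions for illustration; real systems such as IPFS or Tor use their own encodings, and this is not those formats.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Name a blob by the SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_content(key: str, data: bytes) -> bool:
    """Anyone holding the key can check retrieved bytes, whoever served them."""
    return content_key(data) == key


def address_from_pubkey(pubkey: bytes) -> str:
    """Hypothetical address format: truncated hash of the public key bytes."""
    return hashlib.sha256(pubkey).hexdigest()[:32]


def address_matches(address: str, pubkey: bytes) -> bool:
    """A peer proves it owns an address by presenting the key that hashes to it."""
    return address_from_pubkey(pubkey) == address
```

In both cases the name itself carries the check, so no location and no naming authority is needed to validate what comes back.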

@freakazoid @enkiv2 @dredmorbius I said _really_. None of those are human-readable (unlike IP). Non-human-readable systems miss the point of the WWW, web of trust stuff is awful and doesn't scale. Human readability in decentralized addressing is a solved problem (more or less) for addressing systems, but there's nothing good implementing the solution yet, so little point.

@kick I'm with you in advocating for human-readable systems. IPv4 is only very barely human-readable, almost entirely by techies. IPv6 simply isn't, nor are most other options.

Arguably DNS is reaching a non-human-readable status through TLD propagation.

Borrowing from some ideas I've been kicking around of search-as-identity (with ... possible additional elements to avoid spoof attacks), and the fact that HTTP's URL is *NOT* bound to DNS, there may be ways around this.

@enkiv2 @freakazoid

@kick I'll disagree with you that WoT doesn't scale, again, at least in part.

We rely on a mostly-localised WoT all the time in meatspace. Infotech networks' spatial-insensitivity makes this ... hard to replicate, but I'm not prepared to say it's _entirely_ impossible.

Addressing based on underlying identifiers, tied to more than just content (I'm pretty sure that _isn't_ ultimately sufficient), we might end up with _something_ useful.

@enkiv2 @freakazoid

@kick Nodes of authority / trust, perhaps -- not centralised, but not fully distributed either. More hub-and-spoke than full-mesh, but a quite _extensive_ H&S system.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid WoT doesn't scale for average users. Technical users it does. WoT doesn't work over the phone, for example, or on e-mail, because people are easily convinced that malicious actors are within their WoT in targeted attacks. This is going to get worse esp. with recent FastSpeech & Tacotron publications/code releases.

@kick @enkiv2 @dredmorbius @freakazoid This body remembers when the definition of "geek" was someone who used a computer to exchange text chat messages to people. At least, that's what it meant at UCSC. Going back further, was it Augustine who was mightily impressed that Anselm could read without moving his lips?

@kick To be clear, I'm trying to distinguish WoT-as-concept as opposed to WoT-as-implementation.

In the sense of people relying on a trust-based network in ordinary social and commerce interactions in real life, not in a PGP or other PKI sense, that's effectively simply _how we operate_.

Technically-mediated interactions introduce complications -- limited information, selective disclosure, distance, access-at-a-distance.

But the principles of meatspace trust can apply.

@enkiv2 @freakazoid

@kick That is: direct vs. indirect knowledge. Referrals. TOFU. Repeated encounters. Tokenised or transactional-proof validations.

Those are the _principles_.
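Of those principles, TOFU (trust on first use) is the easiest to show in code. The Python sketch below pins whatever fingerprint a peer presents first and flags any later change; the class name and return strings are made up for the example.

```python
class TofuStore:
    """Remember the first fingerprint seen for each peer (hypothetical helper)."""

    def __init__(self):
        self._pins = {}

    def check(self, peer: str, fingerprint: str) -> str:
        pinned = self._pins.get(peer)
        if pinned is None:
            # First encounter: nothing to compare against, so pin and proceed.
            self._pins[peer] = fingerprint
            return "first-use"
        # Repeated encounter: consistency with the pin is the whole test.
        return "match" if pinned == fingerprint else "mismatch"
```

A "mismatch" does not say who is lying, only that the repeated encounter was inconsistent with the first one and deserves verification through another channel.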

The specific _mechanics_ of trust on a technical network are harder, but ... probably tractable. The hurdle for now seems to be arriving at data and hardware standards. We've gone through several iterations which Scale Very Poorly or Are Hard To Use.

We can do better at both.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid Do you have a proposed mechanical solution to get around the social problems that arrive with WoT? e.g.:

@kick A roundabout response, though I think it gets somewhere close to an answer.

"Trust" itself is not _perfect knowledge_, but _an extension of belief beyond the limits of direct experience._ The etymology's interesting:

Trust is probabilistic.

Outside of direct experience, you're always trusting in _something_. And ultimately there's no direct experience -- even our sight, optic nerve, visual perception, sensation, memory, etc., are fallible.

@enkiv2 @freakazoid

@kick Building off the notion that "reality is what, when you stop believing in it, refuses to go away", we validate trust in received assertions of reality through multiple measures.

Some by the same channel, some by independent ones.

Getting slightly more concrete:

Simulator sickness is a problem commercial and military pilots experience with flight simulators. The problem is the simulator lies, and visual and vestibular inputs disagree. Sims are good, not perfect.

@enkiv2 @freakazoid

@kick I don't know if you've ever dealt with a habitual liar, or someone whose mental processes are so disrupted that they can't recall, or recall incorrectly, or misrepresent past events (or present ones). It's tremendously disorienting.

Our own memories are glitchy enough that you start doubting yourself. Having a record (journal, diary, receipts, independent witnesses) helps hugely.

Getting to theories of truth, consistency and correspondence seem to work best.

@enkiv2 @freakazoid

@kick Is a given narrative or representation *internally* consistent, or at least mostly so? And does it correspond to observable external realities (or again, mostly so)?

Mechanisms of trust generally try to achieve consistency or correspondence, sometimes both. In information systems, we tend to use one-way hashes, because those support the computational needs, but the hashes themselves are used to create a consistency or correspondence.
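One way hashes get used for consistency is a hash chain: each record's link depends on everything before it, so a journal cannot be quietly rewritten. A minimal Python sketch follows; the record format and function names are assumptions for illustration.

```python
import hashlib


def _link(prev: str, entry: str) -> str:
    # Each link covers the previous link plus the new entry.
    return hashlib.sha256((prev + "\n" + entry).encode()).hexdigest()


def chain(entries):
    """Return [(entry, link), ...] with every link covering the history so far."""
    prev, out = "", []
    for entry in entries:
        prev = _link(prev, entry)
        out.append((entry, prev))
    return out


def consistent(log) -> bool:
    """Recompute every link; any edited entry breaks the chain from that point on."""
    prev = ""
    for entry, link in log:
        if _link(prev, entry) != link:
            return False
        prev = link
    return True
```

This only establishes internal consistency. Correspondence still needs something outside the log, such as a copy of the latest link held by an independent witness.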

@enkiv2 @freakazoid

@kick So, in the "we have your dad hostage" situation, the scammer's failure was one of correspondence: dad was already dead.

But how you'd check this, *if you had the presence of mind to do so*, would be to attempt independent verification through other channels.

Call his number directly, or your mother's (assuming both are still alive and together), or current partner's. Ask to speak to him. Call the police, etc.

Falsehoods are common to any comms regime.

@enkiv2 @freakazoid

@kick If the channel (or medium) is a narrow one, and _not_ given to interrogation or ready validation, then you've got a harder problem.

You may need to call on experts. And we _have_ those for extant documentation classes -- people who validate books, or paintings, or recordings, or photos, or videos. They look for signs of both authenticity and deception.

See Captain Disillusion. Or art provenance.

Not perfect. But pretty good.

@enkiv2 @freakazoid

@kick So back to "how would you prove..."

If you're operating in an edge case outside the ideals of the planned system, especially where the attacker prevents (or claims unavailable) reliable means of verification -- and controlling the flow of information is one of the oldest hacks in the book, see Sun Tzu "On the Use of Spies" -- then you're somewhat limited.

But you can try bypassing the suspect channel, or side-channel leaks through that, or testing for consistency.

@enkiv2 @freakazoid

@kick All of which would help you establish the truth of a claimed world-state.

Having to be constantly vigilant for such cases is _extremely_ tiring, based on my own experience.

We prefer operating in high-trust environments. Which itself is a likely adaptation -- if certain systems / experiences prove consistently low-trust, those with the option to do so will abandon them.

(Not all have that option.)

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid The "call directly" is a good technical solution, but I know someone personally who didn't think to do that when a _company_ called them, so I'm not sure how well that'd work assuming a _person_ (they were in perfect state of mind, just unaware that companies generally don't call you first and ask for PII).

Educating users is the most difficult social problem, especially educating them on things that they generally don't recognize as _aspects_ of the problem (like you pointed to when you mentioned the elderly calling things they don't understand "nonsense," for example).

As an example of technical users failing the basic "trust but verify," you can find a bunch of examples on HN of people saying things akin to "I use ProtonMail because they encrypt all of my e-mails!" which is easily disprovable (in the sense that they're intending, not in-transit encryption, which basically every modern provider has) just by sending a message to a non-ProtonMail box that doesn't have a key on keyservers and finding it completely readable.

@kick People are stupid, yes.

I knew someone, years ago, who spent a week mad at her boyfriend because she'd mis-dialed his number, got a woman on the other end, and jumped to the conclusion that he was cheating on her.

That's ... a difficult problem to engineer around.

But we might be able to avoid some larger-scale consequences. The Podesta Test comes to mind.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid Cheater!

But yeah, a decent answer.

I do kind of worry about how fallible most WoT implementations are^1, but there definitely might be a way to do it, I’ll cede.

^1 Given that I as a random finance dork managed to reimplement the recent FastSpeech papers in ten days and get results decent enough to fool my SO when using it over a phone call (modern carriers started compressing call audio poorly when they internally moved to VOIP and the quality is pretty poor as a result), my confidence in what has previously been seen as a relatively decent way to verify (audio) has lessened slightly.

@kick I have been warning close friends and family members (some elderly and prone to dismiss technological threats and concerns as "nonsense" or "nothing I would want to use" or "beyond my understanding" or "but why would someone do that", v. frustrating) about DeepFakes and FastSpeech technologies.

I know that at least one has had faked-voice scam phone calls, though they realised this eventually. I'm predicting based in part on this, BTW.

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid in infosec "trust" means "reliance" and isn't probabilistic. It's just a choice to give an entity the power to attack you. What's probabilistic and fallible is the possible benefits of that choice.

@kragen As with most words, there's a range of meanings. I'll admit to having pulled "extension of belief beyond the limits of experience" out of my hat, so it's not entirely standard. And that's "trust as a state of knowledge".

There's also the notion of "to put one's trust in (someone|something)", which can mean a binary rather than probabilistic commitment. We also have provisional or total trust.

Trust me, it's complicated.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid
Of course, one look at the state of computer security shows that for most cases (even very important ones) the social countermeasures are weaker than the technical ones. It's a lot easier to social engineer or rubber hose than to crack even a pretty weak password.

@enkiv2 Which is another way of saying that social engineering and rubber-hoses are low-cost search / goal-attainment paths.

@kick @freakazoid

@dredmorbius @enkiv2 @freakazoid IPv4 is completely human-readable if treated like phone numbers (though another way would be to map the available range of numbers to words, and autotranslate on the human end; humans can remember three words pretty easily). Kind of pushing it for English-speaking populations, though (English active-memory limit 7 things), I'll admit, but should be fine for the larger branches of the world that speak languages that can store more in active memory (e.g. Cantonese at 10).
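The numbers-to-words mapping is easy to sketch. Three words cover all of IPv4 once the word list has at least 1626 entries, since 1626^3 exceeds 2^32. The Python below generates a stand-in list of pronounceable pseudo-words; the list itself, and the function names, are assumptions for the example (a real scheme would want a curated dictionary).

```python
import ipaddress
from itertools import product

CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aeiou"
SYLLABLES = [c + v for c, v in product(CONSONANTS, VOWELS)]      # 80 syllables
# Two-syllable pseudo-words, trimmed to the smallest list that covers IPv4.
WORDS = [a + b for a, b in product(SYLLABLES, repeat=2)][:1626]
BASE = len(WORDS)
assert BASE ** 3 >= 2 ** 32


def ip_to_words(ip: str) -> str:
    """Write the 32-bit address as three base-1626 digits, one word per digit."""
    n = int(ipaddress.IPv4Address(ip))
    digits = []
    for _ in range(3):
        n, r = divmod(n, BASE)
        digits.append(r)
    return " ".join(WORDS[d] for d in reversed(digits))


def words_to_ip(phrase: str) -> str:
    """Inverse of ip_to_words for phrases that function produced."""
    n = 0
    for word in phrase.split():
        n = n * BASE + WORDS.index(word)
    return str(ipaddress.IPv4Address(n))
```

The translation is purely mechanical on both ends, so it changes memorability but not who controls the address.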

@dredmorbius @kick @enkiv2 @freakazoid
Search-as-identity is one solution, but I prefer petnames -- a decentralized identity system for decentralized networks. If somebody wants to find something globally it's fine to rely upon something strict but unmemorable, but finding stuff that's already resident on your box or that your direct connections are sharing ought to be a personal or community affair.

@enkiv2 SAI and petnames are two points in a space (not sure if 1D or n-dimensional).

Search utilises characteristics which may be internally-specified (content, transforms) or external (metadata, assigned identifiers).

Petnames are locally-assigned non-global identifiers. They may be _shared_ among some group, but they're localised, folksonomic, nonauthoritative.

(Though local names can become global with time/use/convention.)
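A minimal Python sketch of a petname table as described here: names are assigned locally, point at some global but unmemorable key, and can be shared without becoming authoritative. The class, the `via/name` namespacing, and the placeholder key strings are all assumptions for illustration.

```python
class PetnameBook:
    """Local, non-global names for global keys (hypothetical helper)."""

    def __init__(self):
        self._names = {}

    def assign(self, petname: str, key: str) -> None:
        # Refuse silent rebinding; changing what a name means should be deliberate.
        if self._names.get(petname, key) != key:
            raise ValueError(f"{petname!r} is already bound to a different key")
        self._names[petname] = key

    def resolve(self, petname: str):
        return self._names.get(petname)

    def import_from(self, other: "PetnameBook", via: str) -> None:
        # Shared names arrive under the sharer's petname, never as bare local names.
        for name, key in other._names.items():
            self._names.setdefault(f"{via}/{name}", key)
```

Importing under a prefix keeps the folksonomic character: "alice/bakery" is Alice's opinion of what "bakery" means, not a global claim.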

@kick @freakazoid

@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
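To make the virtual-host point concrete, here is a small Python sketch with no networking: it builds the bytes of an HTTP/1.1 request and shows a toy server-side lookup keyed on the Host header. The site names and the `select_site` helper are hypothetical, and the sketch assumes Host values without a port.

```python
def build_request(path: str, host: str) -> bytes:
    """Raw HTTP/1.1 request; the Host header names the site, whatever IP we dial."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")


def select_site(raw_request: bytes, vhosts: dict, default: str) -> str:
    """Toy name-based virtual hosting: pick a site from the Host header."""
    for line in raw_request.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return vhosts.get(value.strip().lower(), default)
    return default


vhosts = {"blog.example": "blog", "shop.example": "shop"}
by_name = select_site(build_request("/", "shop.example"), vhosts, default="blog")
by_ip = select_site(build_request("/", "203.0.113.7"), vhosts, default="blog")
```

Here `by_name` selects the shop site while `by_ip` falls through to the default, which is the behaviour described above. Nothing stops a client from connecting to the IP and sending the hostname in the header itself, which is why the name need not come from DNS.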

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.
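A short Python sketch of that idea, using the standard library's `urllib.parse.urlsplit` to separate the optional userinfo from the host and then choosing a lookup mechanism by suffix. The `.onion` entry mirrors what Tor does; the `.selfcert` suffix and the resolver labels are hypothetical, invented only for this example.

```python
from urllib.parse import urlsplit

# Suffix -> lookup mechanism. Only ".onion" is real; ".selfcert" is made up here.
RESOLVERS = {
    ".onion": "tor",
    ".selfcert": "self-certifying",
}


def pick_resolver(url: str) -> str:
    """Decide locally how the host part of a URL should be resolved."""
    host = urlsplit(url).hostname or ""
    for suffix, mechanism in RESOLVERS.items():
        if host.endswith(suffix):
            return mechanism
    return "dns"


parts = urlsplit("http://alice:secret@example.org/path")
# parts.username and parts.password hold what precedes "@"; parts.hostname is the name.
```

The URL syntax stays untouched; only the local policy for interpreting the name changes, which is the latitude the RFC leaves open.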

@freakazoid Hrm....


There are alternate URLs, e.g., irc://host/channel

I'm wondering if a standard for an:

http://<address-proto><delim><address> might be specifiable.

Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....

@kick @enkiv2

@dredmorbius @kick @enkiv2 @ is already reserved for the optional username[:password] portion before the hostname.

@freakazoid @dredmorbius @enkiv2 Is ! still reserved (! may be a DNS thing actually, thinking about it further)?

@kick As of RFC 2396, "!" was unreserved. That RFC is now obsolete. Not sure if status is changed.

@enkiv2 @freakazoid

@dredmorbius @enkiv2 @freakazoid Entirely unrelated because I just remembered this based on @kragen's activity in this thread:

Vaguely shocked that I'm interacting with both of you because I'm pretty sure you two are the people I've (at least kept in memory for long enough) read the words of online consistently for longest. (Since I was like, eight, maybe, on Kragen's part. Not entirely sure about you but less than I've checked for by a decent margin at least.)

@kick Clue seeks clue.

You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).

You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.

(Kragen may have this all figured out, he's certainly far ahead of me on virtually all of this, and has been for decades.)

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid while I appreciate the vote of confidence, and I did spend a long time figuring out how to build a scalable distributed index, I am at as much of a loss as anyone when it comes to figuring out the social aspect of the problem (SEO spam, ranking, funding).

@dredmorbius @kick @enkiv2 @freakazoid building a non-distributed index has gotten a lot easier though. when I published the Nutch paper it was still not practical for a regular person to crawl most of the public textual web, from a cost perspective. (not sure if it's practical now, though, due to cloudflare)

@kragen @dredmorbius @enkiv2 @freakazoid I think it would be? Given the people working at Cloudflare, it seems like they'd whitelist whatever you're crawling with if you asked the right person assuming it didn't become something everyone and their cat was requesting to do.

@kragen I see a lot of this coming down to:

- What is the incremental value of additional information sources? At some point, net of validation costs, this falls below zero.

- Google's PageRank relied on inter-document and -domain relations. Author-based trust hasn't carried as much weight. I believe it needs to.

- Randomisation around ranking should help avoid systemic bias lock-ins.

- Penalties for fraud, with increasing severity and duration for repeats.
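Two of the points above are mechanical enough to sketch in Python: jittering scores so the ordering is not frozen, and a penalty schedule that escalates with repeat offences. The 10% jitter, the seven-day base, and the doubling rule are arbitrary assumptions, not a tested design.

```python
import random


def penalty_days(offences: int, base_days: int = 7) -> int:
    """Hypothetical schedule: each repeat offence doubles the exclusion period."""
    return 0 if offences <= 0 else base_days * 2 ** (offences - 1)


def rank(scores: dict, jitter: float = 0.1, rng=None) -> list:
    """Order documents by score after perturbing each score by up to +/- jitter."""
    rng = rng or random.Random()
    noisy = {doc: s * (1 + rng.uniform(-jitter, jitter)) for doc, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)
```

With jitter at zero this is plain sorting; with a small jitter, near-ties swap places from query to query while clear leaders stay put, which is roughly the lock-in avoidance described.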

@kick @enkiv2 @freakazoid

@kragen - Some way of vetting new arrivals / entities, such that legitimate newcomers aren't entirely locked out of the system. Effectively letters of recommendation or reference.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid I've thought that it might be reasonable to bootstrap a friendnet by assigning newcomers (randomly or by payment) to "foster families" or "undergraduate faculties" to allow them to gain enough whuffie to become emancipated. ideally, gradually, rather than through an emancipation cliff analogous to legal majority or a B.S.

@kragen Challenge on any such scheme is scaling quickly enough, relative to other systems.

Though if the founding cohort is sufficiently interesting, you'll have the reverse problem: too many people wanting in.

An inspiration I've long had for this is Lawrence Lessig's "signed by" convention at the ... Yale Wall, I think, described in "Code and Other Laws of Cyberspace".

That applied to anonymous messages, but for new users might also work.

@kick @enkiv2 @freakazoid

@kragen It's effectively a socialisation problem -- how do you introduce new members to a society?

But doing that *without* creating an inculcated old-boys/girls/nbs network, or any of the usual ethnic or socioeconomic cliques. Something that most systems have generally failed at.

Random assignments should help but aren't of themselves sufficient.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid human societies have hierarchies of prestige; we can't hope to eliminate those through incentive design. We can hope to prevent things like despotism, witch-burning, the Inquisition, the Holocaust, and the burning of the Library of Alexandria. But there's going to be an old-enbies network, unavoidably.

@dredmorbius @kragen @kick @enkiv2 @freakazoid
Stafford Beer had some ideas about ways to rotate people through groups in such a way that ideas echo through a network. Based on graph theory & permutation. I've forgotten the name. Worth looking into as a way to grow/integrate folks into a large group by making connection in a smaller one & getting mirroring/feedback.

@dredmorbius @kragen @enkiv2 @freakazoid How much privacy are you willing to sacrifice with this?

Taking a single possibility (I listed a few) from a thing I wrote to a couple of posts up-thread but didn’t send because I want to hear someone’s opinion on a sub-problem of one of the guesses listed:

Seed with trusted users (i.e. people submitting sites to crawl), rank preferentially by age (time-limited; would eventually wear off), then rank on access-by-unique-users. Given that centralized link aggregators wouldn’t disappear, someone throws HN in, for example, the links on HN get added into the pool, whichever get clicked on most rise up, eventually get their own ranking, etc.

This works especially well if using what I sent the e-mail to inquire a little more about: cluster sorting rather than just barebacking text (this is what Yippy does, for example, and what Blekko used to do), because it promotes niche results better than Google’s model with smaller datasets, and when users have more seamless access to better niches, more sites can get rep easier. Example: try vs. throwing your username into Google. The clustering allows for much more informative/interesting results, I think, especially if doing inquisitive searching.

Kragen mentioned randomly introducing newcomers (adding noise), but I think it might work better still if noise was added to the searches for at least the beginning of it. A single previously-unclicked link on the first five pages of search results?
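The noise-in-results idea can be sketched in a few lines of Python: take one result that nobody has clicked from outside the visible window and drop it at a random slot inside the window. The window of 50 assumes five pages of ten results, and the function name is made up for the example.

```python
import random


def inject_unclicked(ranked: list, clicked: set, window: int = 50, rng=None) -> list:
    """Promote one never-clicked result from beyond `window` into the first `window` slots."""
    rng = rng or random.Random()
    candidates = [r for r in ranked[window:] if r not in clicked]
    if window < 1 or not candidates:
        return list(ranked)          # nothing to promote; leave the ordering alone
    pick = rng.choice(candidates)
    rest = [r for r in ranked if r != pick]
    pos = rng.randrange(window)      # random slot within the visible window
    return rest[:pos] + [pick] + rest[pos:]
```

Everything else keeps its relative order, so the cost to the searcher is one displaced result per query in exchange for giving unknown sites a chance to earn clicks.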

@kick As little as possible.

I've not participated online under my real name (or even vague approximations of it) for a decade or more. That was seeming increasingly unattractive to me already then. And I'd been online for at least two decades by that point.

Of the various dimensions of trust, anti-sock-puppetry is one axis. It's not the only one. It matters a lot in some contexts. Less in others.

Doxxing may be occasionally warranted.

Unmasking is a risk.

@enkiv2 @kragen @freakazoid

@dredmorbius @enkiv2 @kragen @freakazoid Privacy isn't just deanonymizing! You can also track pseudonyms.

@kick Right. My comments were aimed more at qualifying my interest in / preferences for privacy.

I'm finding contemporary society to be very nearly intolerable. And probably ultimately quite dangerous.

@enkiv2 @kragen @freakazoid

