Inverting the Web
We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.
Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as URI).
Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.
My read is that search engines are a necessity born of the Web's lack of any intrinsic indexing-and-forwarding capability -- a capability which would render them unnecessary. THAT still has further issues (mostly around trust)...
@freakazoid ... and reputation.
But a mechanism in which:
1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.
... might disrupt things a tad. (Rough sketch below.)
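A very rough sketch of what a self-published, signed index record might look like (field names are invented, and the HMAC stands in for a real public-key signature, which is what third-party verification would actually need):

```python
# Sketch: a site publishes a signed index record for each document it hosts.
# Field names and the HMAC "signature" are placeholders; a real system would
# use public-key signatures so anyone can verify authorship.
import hashlib
import hmac
import json
import time

SITE_KEY = b"site-secret-key"  # placeholder for a real signing key

def make_index_record(url: str, title: str, terms: list[str], body: bytes) -> dict:
    record = {
        "url": url,                                        # locator (could be a content address instead)
        "title": title,
        "terms": sorted(set(terms)),                       # the site's own index terms
        "content_hash": hashlib.sha256(body).hexdigest(),  # lets auditors check the indexing claim
        "published": int(time.time()),                     # first-publication claim
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SITE_KEY, payload, hashlib.sha256).hexdigest()
    return record

# Aggregators could merge records from many sites and re-share the merged set,
# with each record staying independently verifiable against its signature.
print(json.dumps(make_index_record(
    "https://example.com/post/1", "Inverting the Web",
    ["search", "index", "distributed"], b"full text of the post"), indent=2))
```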
Somewhat more:
https://news.ycombinator.com/item?id=22093403
NB: the reputation bits might build off social / netgraph models.
But yes, I've been thinking on this.
@dredmorbius @freakazoid
Isn't Yandex a federated search engine? Maybe @drwho has input?
@enkiv2 I know Searx is: https://en.wikipedia.org/wiki/Searx
Also YaCy, as Sean mentioned.
There's also something that is/was used for Firefox keyword search -- OpenSearch, I think -- a standard used by multiple sites, pioneered by Amazon.
It's being dropped by Firefox, BTW.
That provides a query API only, not a distributed index, though.
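(For context, the OpenSearch model is basically a published URL template that the client fills in with the query -- roughly this, with a made-up endpoint:)

```python
# Sketch of the OpenSearch idea: the site publishes a URL template and the
# client substitutes the user's query into it. The endpoint below is invented.
from urllib.parse import quote

TEMPLATE = "https://search.example.com/search?q={searchTerms}&format=rss"

def opensearch_url(template: str, query: str) -> str:
    return template.replace("{searchTerms}", quote(query))

print(opensearch_url(TEMPLATE, "distributed inverted index"))
```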
@kick @enkiv2 @dredmorbius Not true; there are several decentralized routing systems out there. UIP, 6/4, Yggdrasil, Cjdns, I2P, and Tor hidden services to name just a few. Once you're no longer using names that are human-memorizable you can move to addresses that are public key hashes and thus self-certifying.
A system designed for content retrieval doesn't really need a way to refer to location at all. IPFS, for example, only needs content-based keys and signature-based keys.
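A minimal sketch of what "content-based" and "self-certifying" addressing mean in practice (the address formats here are simplified; real IPFS CIDs use multihash/multibase encodings):

```python
# Sketch: addresses derived from the data or the key itself, so anyone holding
# the bytes can verify the address without trusting whoever served them.
import hashlib

def content_address(data: bytes) -> str:
    # Content-based key: the address *is* a hash of the content.
    return "content:" + hashlib.sha256(data).hexdigest()

def node_address(public_key: bytes) -> str:
    # Self-certifying node ID: a hash of the node's public key, so signed
    # responses can be checked against the address they came from.
    return "node:" + hashlib.sha256(public_key).hexdigest()

data = b"hello, distributed web"
addr = content_address(data)
assert addr == content_address(data)   # anyone can recompute and verify
print(addr)
```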
@kick I'm with you in advocating for human-readable systems. IPv4 is only very barely human-readable, almost entirely by techies. IPv6 simply isn't, nor are most other options.
Arguably DNS is reaching non-human-readable status through TLD proliferation.
Borrowing from some ideas I've been kicking around of search-as-identity (with ... possible additional elements to avoid spoof attacks), and the fact that HTTP's URL is *NOT* bound to DNS, there may be ways around this.
@kick I'll disagree with you that WoT doesn't scale, again, at least in part.
We rely on a mostly-localised WoT all the time in meatspace. Infotech networks' spatial-insensitivity makes this ... hard to replicate, but I'm not prepared to say it's _entirely_ impossible.
With addressing based on underlying identifiers, tied to more than just content (I'm pretty sure content alone _isn't_ ultimately sufficient), we might end up with _something_ useful.
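One way to picture a mostly-localised WoT: trust propagates only a few hops out from you and attenuates with each hop. A toy sketch, with an invented graph and decay factor:

```python
# Sketch: localized web-of-trust. Trust decays per hop and is only followed a
# few hops out, mirroring how meatspace reputation stays local.
TRUST = {  # direct trust edges: who -> {whom: score in [0, 1]}
    "me":    {"alice": 0.9, "bob": 0.6},
    "alice": {"carol": 0.8},
    "bob":   {"carol": 0.3, "dave": 0.7},
}

def local_trust(source: str, target: str, max_hops: int = 3, decay: float = 0.5) -> float:
    best = 0.0
    frontier = [(source, 1.0, 0)]  # (node, accumulated score, hops so far)
    while frontier:
        node, score, hops = frontier.pop()
        if hops >= max_hops:
            continue
        for neighbour, edge in TRUST.get(node, {}).items():
            s = score * edge * (decay ** hops)
            if neighbour == target:
                best = max(best, s)
            else:
                frontier.append((neighbour, s, hops + 1))
    return best

print(local_trust("me", "carol"))  # indirect trust, discounted per hop
```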
@kick To be clear, I'm trying to distinguish WoT-as-concept as opposed to WoT-as-implementation.
In the sense of people relying on a trust-based network in ordinary social and commerce interactions in real life, not in a PGP or other PKI sense, that's effectively simply _how we operate_.
Technically-mediated interactions introduce complications -- limited information, selective disclosure, distance, access-at-a-distance.
But the principles of meatspace trust can apply.
@kick That is: direct vs. indirect knowledge. Referrals. TOFU. Repeated encounters. Tokenised or transactional-proof validations.
Those are the _principles_.
The specific _mechanics_ of trust on a technical network are harder, but ... probably tractable. The hurdle for now seems to be arriving at data and hardware standards. We've gone through several iterations which Scale Very Poorly or Are Hard To Use.
We can do better at both.
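TOFU, at least, is easy to sketch: pin whatever key you saw first and alarm if it later changes -- the known_hosts pattern. A toy, in-memory version:

```python
# Sketch: trust-on-first-use (TOFU). On first contact we pin the peer's key
# fingerprint; every later contact must match the pin.
import hashlib

pins: dict[str, str] = {}  # peer id -> pinned key fingerprint

def check_peer(peer_id: str, public_key: bytes) -> bool:
    fingerprint = hashlib.sha256(public_key).hexdigest()
    if peer_id not in pins:
        pins[peer_id] = fingerprint      # first use: trust and remember
        return True
    return pins[peer_id] == fingerprint  # repeated encounter: must match

assert check_peer("example.org", b"key-A")       # first contact, pinned
assert check_peer("example.org", b"key-A")       # same key, fine
assert not check_peer("example.org", b"key-B")   # key changed, raise the alarm
```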
@kick A roundabout response, though I think it gets somewhere close to an answer.
"Trust" itself is not _perfect knowledge_, but _an extension of belief beyond the limits of direct experience._ The etymology's interesting: https://www.etymonline.com/word/trust
Trust is probabilistic.
Outside of direct experience, you're always trusting in _something_. And ultimately there's no direct experience -- even our sight, optic nerve, visual perception, sensation, memory, etc., are fallible.
@kick Building off the notion that "reality is what, when you stop believing in it, refuses to go away", we validate trust in received assertions of reality through multiple measures.
Some by the same channel, some by independent ones.
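To make "trust is probabilistic" a bit more concrete: agreement across independent channels, each with some reliability, moves belief far more than repetition over one channel. A toy calculation with invented numbers:

```python
# Sketch: updating belief in a claim from independent channels that all report
# "true". Reliability r = P(report true | claim true); for simplicity we assume
# P(report true | claim false) = 1 - r. Numbers are invented.
def posterior(prior: float, channel_reliabilities: list[float]) -> float:
    p_true, p_false = prior, 1.0 - prior
    for r in channel_reliabilities:
        p_true *= r
        p_false *= (1.0 - r)
    return p_true / (p_true + p_false)

print(posterior(0.5, [0.8]))            # one so-so channel: 0.8
print(posterior(0.5, [0.8, 0.8, 0.8]))  # three independent ones: ~0.98
```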
Getting slightly more concrete:
Simulator sickness is a problem commercial and military pilots experience with flight simulators. The problem is the simulator lies, and visual and vestibular inputs disagree. Sims are good, not perfect.
@kick I don't know if you've ever dealt with a habitual liar, or someone whose mental processes are so disrupted that they can't recall, or recall incorrectly, or misrepresent past events (or present ones). It's tremendously disorienting.
Our own memories are glitchy enough that you start doubting yourself. Having a record (journal, diary, receipts, independent witnesses) helps hugely.
Getting to theories of truth, consistency and correspondence seem to work best.
@kick Is a given narrative or representation *internally* consistent, or at least mostly so? And does it correspond to observable external realities (or again, mostly so)?
Mechanisms of trust generally try to achieve consistency or correspondence, sometimes both. In information systems we tend to use one-way hashes, because those suit the computational needs, but the hashes themselves are there to establish consistency or correspondence.
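A minimal sketch of those two uses of hashes -- correspondence (the blob matches a separately published digest) and consistency (each record commits to the one before it):

```python
import hashlib

def corresponds(blob: bytes, published_digest: str) -> bool:
    # Correspondence: the bytes you fetched hash to the digest published elsewhere.
    return hashlib.sha256(blob).hexdigest() == published_digest

def chain(records: list[bytes]) -> list[str]:
    # Consistency: each digest commits to the previous one, so the history
    # can't be silently edited without breaking the chain.
    digests, prev = [], ""
    for rec in records:
        prev = hashlib.sha256(prev.encode() + rec).hexdigest()
        digests.append(prev)
    return digests

blob = b"the document as served"
print(corresponds(blob, hashlib.sha256(blob).hexdigest()))  # True
print(chain([b"first entry", b"second entry"])[-1])         # tamper-evident tail
```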
@kick If the channel (or medium) is a narrow one, and _not_ given to interrogation or ready validation, then you've got a harder problem.
You may need to call on experts. And we _have_ those for extant documentation classes -- people who validate books, or paintings, or recordings, or photos, or videos. They look for signs of both authenticity and deception.
See Captain Disillusion. Or art provenance.
Not perfect. But pretty good.
@kick All of which would help you establish the truth of a claimed world-state.
Having to be constantly vigilant for such cases is _extremely_ tiring, based on my own experience.
We prefer operating in high-trust environments. Which itself is a likely adaptation -- if certain systems / experiences prove consistently low-trust, those with the option to do so will abandon them.
(Not all have that option.)
@kick So back to "how would you prove..."
If you're operating in an edge case outside the ideals of the planned system, especially where the attacker blocks reliable means of verification (or claims they're unavailable) -- and controlling the flow of information is one of the oldest hacks in the book, see Sun Tzu "On the Use of Spies" -- then you're somewhat limited.
But you can try bypassing the suspect channel, looking for side-channel leaks through it, or testing for consistency.