Inverting the Web
We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.
Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as URI).
Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.
My read is that search engines are a necessity born of the Web having no intrinsic indexing-and-forwarding capability, which would render them unnecessary. THAT still has further issues (mostly around trust)...
@freakazoid ... and reputation.
But a mechanism in which:
1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.
... might disrupt things a tad.
NB: the reputation bits might build off social / netgraph models.
But yes, I've been thinking on this.
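As a rough illustration of the self-index, share/aggregate, and distributed-search points in the list above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `a.example` / `b.example` hosts, and the naive tokenizer. It ignores the auditing, reputation, and authorship points entirely.

```python
import re
from collections import defaultdict

def build_index(site, pages):
    """Self-index one site: map each term to the set of that site's URLs containing it."""
    index = defaultdict(set)
    for path, text in pages.items():
        url = f"https://{site}{path}"
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].add(url)
    return dict(index)

def merge_indexes(*indexes):
    """Aggregate several published indexes into one by taking the union per term."""
    merged = defaultdict(set)
    for index in indexes:
        for term, urls in index.items():
            merged[term] |= urls
    return dict(merged)

def search(index, query):
    """Return the URLs that contain every term of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

# Hypothetical sites, each publishing its own index.
site_a = build_index("a.example", {"/sky": "The sky is blue"})
site_b = build_index("b.example", {"/sea": "The sea is blue",
                                   "/sky": "Why the sky is blue"})

# Anyone holding both indexes can aggregate and search them locally.
merged = merge_indexes(site_a, site_b)
hits = search(merged, "sky blue")
```

Nothing here stops a site from publishing a misleading index, which is exactly why the auditing and reputation points matter.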
@dredmorbius This is not a fully fleshed out idea yet, but the "L" was the important bit. People generally don't care about the location of the content. They care about the content of the content, and other stuff about the content like the author, etc.
Just think about how people generally navigate the web these days. They don't type a URL into their addressbar or click a bookmark. They type a search query into their address bar, which will generally bring up Google results.
@freakazoid OK, yes.
And, old hat to you, but the idea was to "locate on the Internet, by server and path": https://espace.cern.ch/webservices-help/GeneralUserInformation/GeneralinformationaboutWWW/Pages/Howthewebworks.aspx
... in a system literally designed by nuclear particle physicists.
Alternatively, L = I, "identifier".
Location == Identity.
Part of that remains valid. Part of it ... may not.
I've been kicking around the idea of a (local) document-oriented "filesystem" in which specifiers are effectively metadata descriptors or content-based keys. https://old.reddit.com/r/dredmorbius/comments/6bgowu/what_if_the_web_was_filesystemaccessible/
@dredmorbius Yeah, I've thought about similar approaches. Directories aren't required to be listable. Unordered bags of KV pairs don't map super well to hierarchical paths, but it's not like that matters very much. For most people a filesystem interface wouldn't matter anyway; they want a browser-ish application.
@freakazoid It ... depends.
There are times you want a _very specific_ resource.
It's not just _content_ that matters, but ownership, provenance, who can / did change / modify it, etc., etc.
There are times when "what colour is the sky?" can be answered by any of thousands of references.
The fact that _approximate, content-described results_ are _sometimes_ or even _often_ appropriate doesn't mean _always_.
@dredmorbius Indeed, but I'm not talking about getting rid of URLs, and for such things search engines end up just acting as a URL directory, since you will look until you see the URL you want.
@freakazoid A directory-path-based specification is saying "find this precise linked-list chain of directory specifications, with the implied properties of ownership, access permissions, modification history, provenance, etc., etc."
People looking for docs may allow slack. Software looking for libraries, somewhat less so.
And even humans looking for specific documentary authority may want a specific result.
@freakazoid The key for me is that _search is identity_, or at least _an identifier_, if _a search query_ returns _precisely one match_.
(Other options being "null" or "list".)
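The "null / one match / list" distinction can be sketched in a few lines of Python. The record fields (`author`, `title`, `year`) and the `resolve` function are hypothetical, chosen only to show how a query behaves as an identifier when exactly one record matches.

```python
def resolve(records, **criteria):
    """Classify a metadata query as 'null', 'identifier' (one match), or 'list'."""
    matches = [r for r in records
               if all(r.get(k) == v for k, v in criteria.items())]
    if not matches:
        return ("null", None)
    if len(matches) == 1:
        return ("identifier", matches[0])
    return ("list", matches)

# Hypothetical document metadata.
records = [
    {"author": "A. Writer", "title": "On Indexes", "year": 2017},
    {"author": "A. Writer", "title": "On Search", "year": 2018},
    {"author": "B. Author", "title": "On Search", "year": 2018},
]

one = resolve(records, author="A. Writer", title="On Search")
many = resolve(records, title="On Search")
none = resolve(records, author="C. Nobody")
```

Note that a single match only shows the query is unique against the records currently known; adding a record can turn an "identifier" into a "list".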
@dredmorbius I'm not sure I understand. It's possible for searches to return singleton results by accident. It seems like what you want is to distinguish between searchable metadata fields that uniquely identify resources and those that don't.
@freakazoid Right, that IS a problem, and a BIG one.
Possibly THE problem.
Q: Can documents be reasonably self-describing or self-identifying?
@dredmorbius I assume you mean *securely* self-describing?
Most distributed storage systems that try to defend against malicious nodes use exactly two types of keys, each self-certifying: content hash for immutable values and public key hash for mutable ones.
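A minimal sketch of the two self-certifying key types, using only Python's standard `hashlib`. The key prefixes and function names are assumptions for illustration and do not follow any particular system's format. For the mutable case, a real system would also verify a signature over the value with the public key; that step needs an asymmetric crypto library and is left out here.

```python
import hashlib

def content_key(data: bytes) -> str:
    """Key for an immutable value: the SHA-256 hash of the content itself."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

def verify_content(key: str, data: bytes) -> bool:
    """Anyone can check that returned data matches the key it was requested under."""
    return key == content_key(data)

def pubkey_key(public_key_bytes: bytes) -> str:
    """Key for a mutable value: the hash of the owner's public key.
    Signature verification of the stored value is NOT shown here."""
    return "pk-sha256:" + hashlib.sha256(public_key_bytes).hexdigest()

doc = b"What colour is the sky?"
key = content_key(doc)
ok = verify_content(key, doc)                              # unmodified content
tampered = verify_content(key, b"What color is the sky?")  # altered content
```

The point of both key types is that a malicious node cannot substitute different data without the mismatch being detectable by the requester.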
Beyond that you're into the realm of the subjective. My thinking here was to have signed triples à la RDF and use some kind of reputation system, i.e. web of trust, to decide which to trust.
@dredmorbius You could also provide a way to express *negative* triples as a way to try to correct errors or deliberate spam injected by others.
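One way to picture signed triples with negative assertions is the Python sketch below. It is not RDF and not any existing reputation system: the `Assertion` class, the trust weights, and the signer names are all hypothetical, and the code assumes each assertion's signature has already been verified elsewhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    signer: str             # identity whose signature is assumed already verified
    subject: str
    predicate: str
    obj: str
    negative: bool = False  # True means "I assert this triple is false"

def score(assertions, trust, subject, predicate, obj):
    """Sum the signers' trust weights for a triple; negative assertions subtract."""
    total = 0.0
    for a in assertions:
        if (a.subject, a.predicate, a.obj) == (subject, predicate, obj):
            weight = trust.get(a.signer, 0.0)  # unknown signers carry no weight
            total += -weight if a.negative else weight
    return total

# Hypothetical trust weights, e.g. derived from a web of trust.
trust = {"alice": 0.9, "bob": 0.6, "spammer": 0.0}

assertions = [
    Assertion("spammer", "doc:1", "author", "Famous Person"),
    Assertion("alice", "doc:1", "author", "Famous Person", negative=True),
    Assertion("bob", "doc:1", "author", "dredmorbius"),
]

spam_score = score(assertions, trust, "doc:1", "author", "Famous Person")
real_score = score(assertions, trust, "doc:1", "author", "dredmorbius")
```

Here the spam claim ends up with a negative score because a trusted signer contradicts it, while the other claim is positive; how trust weights are computed is the hard part and is not addressed.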
@freakazoid I'd settle for "functionally" or "sufficiently".
I realise that any document under active attack (to change/misrepresent) would require more stringent methods.
But something that "usually works" would be a huge step forward.