Inverting the Web
We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.
Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as URI).
Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.
My read is that search engines are a necessity born of the Web's lack of any intrinsic indexing-and-forwarding capability, which would render them unnecessary. THAT still has further issues (mostly around trust)...
@freakazoid ... and reputation.
But a mechanism in which:
1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.
... might disrupt things a tad.
NB: the reputation bits might build off social / netgraph models.
But yes, I've been thinking on this.
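To make the self-index / share / audit idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names (`build_index`, `merge_indexes`, `search`, `audit`), the whitespace tokenizer, and the example.com-style URLs are all hypothetical, and it ignores the trust and reputation questions entirely.

```python
from collections import defaultdict

def build_index(pages):
    """Self-index one site: map each term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def merge_indexes(*indexes):
    """Aggregate several per-site indexes into one shared index."""
    merged = defaultdict(set)
    for index in indexes:
        for term, urls in index.items():
            merged[term] |= urls
    return merged

def search(index, query):
    """Return URLs that contain every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

def audit(claimed_index, pages):
    """Return (term, url) claims that the page's actual text does not back up."""
    false_claims = set()
    for term, urls in claimed_index.items():
        for url in urls:
            if term not in pages.get(url, "").lower().split():
                false_claims.add((term, url))
    return false_claims

# Hypothetical sites and content, for illustration only.
pages_a = {"https://a.example/1": "distributed search index"}
pages_b = {"https://b.example/1": "distributed trust and reputation"}
site_a = build_index(pages_a)
site_b = build_index(pages_b)
shared = merge_indexes(site_a, site_b)
```

Here `search(shared, "distributed")` finds both pages, while `audit` flags any site that claims a term its page does not contain, which is the crude version of checking for misleading indexing. A real design would need signed indexes and some reputation weighting on top.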
@freakazoid Re: navigation.
1. Google are trying hard to kill off the URL.
2. There may be user-pattern based reasons to do just that.
3. URLs and DNS map ... poorly ... to meatspace notions of locality and identity. In large part due to the actions of websites, search engines, browser devs, SEO, and domain registrars.
4. A namespace with at _least_ a half-million entities and little sensible structure ... is far beyond human scale.
5. It's mostly reputation.
@dredmorbius I agree that killing off the URL is a worthy goal, which makes it a perfect weapon for Google to deal its final killing blow to the open Web.
As for scale, IIRC you can serve 90+% of web search requests with coverage of only about 5% of the space. Something like 99% of Google results are served entirely from RAM. They don't even expect to serve useful results from their largest index; it exists primarily to give the impression of completeness.
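The scale point can be sketched as a two-tier lookup: a small "hot" index answers most queries, and a much larger one is consulted only on a miss. This is a toy illustration, not a description of how Google actually works; `tiered_lookup`, `hot_hit_rate`, and the sample data are hypothetical names and values.

```python
def tiered_lookup(term, hot_index, full_index):
    """Try the small hot index first; fall back to the full index on a miss."""
    if term in hot_index:
        return hot_index[term], "hot"
    return full_index.get(term, set()), "full"

def hot_hit_rate(queries, hot_index, full_index):
    """Fraction of queries answered by the hot tier alone."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if tiered_lookup(q, hot_index, full_index)[1] == "hot")
    return hits / len(queries)

# Hypothetical data: the hot tier covers a small slice of the full index.
full_index = {
    "weather": {"https://w.example/"},
    "news": {"https://n.example/"},
    "obscure": {"https://o.example/"},
    "rare": {"https://r.example/"},
}
hot_index = {"weather": full_index["weather"], "news": full_index["news"]}
queries = ["weather", "news", "weather", "obscure"]
rate = hot_hit_rate(queries, hot_index, full_index)
```

Because query popularity is heavily skewed, a hot tier covering a small fraction of terms can still answer most requests, which is what would make a modest distributed index plausible.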