Inverting the Web 

We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.

Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.

@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Uniform Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of the Web having no intrinsic indexing-and-forwarding capability, which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known

... might disrupt things a tad.
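
To make the list above a bit more concrete, here is a rough Python sketch of three of its points: a site building its own inverted index, several such indexes being aggregated, and a simple audit that checks an index's claims against the text actually served. Every name here (`build_index`, `merge_indexes`, `audit`, the `example` URLs) is hypothetical and made up for illustration, and whitespace tokenisation stands in for real text processing. It says nothing about reputation or authorship, which are the hard parts.

```python
from collections import defaultdict

def build_index(url_to_text):
    """Map each lowercase term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in url_to_text.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def merge_indexes(*indexes):
    """Aggregate several self-published indexes into one."""
    merged = defaultdict(set)
    for index in indexes:
        for term, urls in index.items():
            merged[term] |= urls
    return merged

def audit(claimed_index, url_to_text):
    """Return the (term, url) claims that the fetched text does not support."""
    actual = build_index(url_to_text)
    return sorted(
        (term, url)
        for term, urls in claimed_index.items()
        for url in urls
        if url not in actual.get(term, set())
    )

# Hypothetical sites, each indexing its own pages.
site_a = {"https://a.example/1": "distributed search index"}
site_b = {"https://b.example/1": "search engines and trust"}
index_a = build_index(site_a)
index_b = build_index(site_b)
merged = merge_indexes(index_a, index_b)
```

An aggregator that refetches a page and finds terms in the published index that are absent from the page would have grounds to distrust that index, which is one possible hook for the reputation layer.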

Somewhat more:

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@enkiv2 I know SEARX is:

Also YaCy as sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho
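
For a sense of what "query API only" means in practice: an OpenSearch description publishes a URL template containing a `{searchTerms}` placeholder, and the client fills it in. The sketch below assumes that template style; the `search.example` URL and the `fill_template` helper are invented for illustration, and optional `{name?}` parameters are simply blanked out.

```python
import re
from urllib.parse import quote

def fill_template(template, search_terms):
    """Substitute {searchTerms} and blank out optional {name?} parameters."""
    url = template.replace("{searchTerms}", quote(search_terms, safe=""))
    return re.sub(r"\{[A-Za-z:]+\?\}", "", url)

# Hypothetical template of the kind a site might publish.
template = "https://search.example/find?q={searchTerms}&page={startPage?}"
print(fill_template(template, "inverted index"))
```

The client learns how to ask a site a question, but the index that answers it stays on the site's own servers.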

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.
@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely there: the DNS is annoying, but isn't needed for accessing content on the WWW. You can directly navigate to public IP addresses and it works just as well, which allows you to skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.

@enkiv2 @freakazoid
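
A toy illustration of the virtual-host point, in Python: the server picks content from the `Host` header, so a request addressed by bare IP falls through to the default site. The hostnames, the IP address, and `select_site` are all hypothetical, and this is not how any particular webserver implements it.

```python
def select_site(host_header, vhosts, default_site):
    """Pick the site for a request from its Host header, ignoring any port.

    IPv6 literals such as "[::1]:80" are not handled in this sketch.
    """
    hostname = host_header.split(":")[0].lower()
    return vhosts.get(hostname, default_site)

# Two hypothetical sites sharing one IP address.
vhosts = {"blog.example": "blog content", "shop.example": "shop content"}

print(select_site("shop.example", vhosts, "blog content"))
print(select_site("192.0.2.10", vhosts, "blog content"))
```

Nothing in this lookup consults DNS; the name is just a key the client sends along.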

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.
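
A minimal Python sketch of why content-addressed systems sidestep the problem: the address is derived from the bytes themselves, so it stays valid whichever host ends up serving them. This is only the general idea, assuming a plain SHA-256 digest as the address; it is not the actual identifier format of IPFS, Dat, or any other system named above, and `Host` and `fetch` are hypothetical.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself (plain SHA-256 here)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

class Host:
    """A hypothetical storage node keyed by content address."""
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self.blobs[address] = data
        return address

    def get(self, address: str):
        return self.blobs.get(address)

def fetch(address, hosts):
    """Ask each host in turn; accept only data that hashes to the address."""
    for host in hosts:
        data = host.get(address)
        if data is not None and content_address(data) == address:
            return data
    return None

old_host, new_host = Host(), Host()
address = old_host.put(b"hello web")
new_host.put(b"hello web")         # content moves to a new host
print(fetch(address, [new_host]))  # the same address still resolves
```

The check inside `fetch` also means a host cannot substitute different content without the mismatch being detectable.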

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@freakazoid Answering my own question: no, there's not:

"As far as HTTP is concerned, Uniform Resource Identifiers are simply formatted strings which identify--via name, location, or any other characteristic--a resource."

@kick @enkiv2
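
The quoted point is easy to see with Python's standard `urllib.parse`: a URI splits into a scheme and the rest, and only some schemes carry a host part at all. The specific URIs below are made-up examples, and `describe` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return the (scheme, authority, path) parts of a URI string."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

for uri in (
    "http://blog.example/post",  # authority is a DNS name
    "http://192.0.2.10/post",    # authority is an IP address
    "urn:isbn:0451450523",       # no authority at all
    "doi:10.1000/182",           # no authority at all
):
    print(describe(uri))
```

Whatever resolves the authority part, or the whole string when there is no authority, is a matter for the scheme and the client, not for the URI syntax.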

@dredmorbius @freakazoid @kick @enkiv2
Earlier RFCs had defined meanings for the parts of HTTP URLs, but vendors ignored the standards so now URL paths are just an arbitrary string which could mean anything.

@mathew I think this discussion hinges more on the host part, and what it might reference other than DNS as an HTTP (or HTTPS) protocol reference, so as to break from the DNS oligarchy.

An alternative is to define other protocol references, as with, say, doi://, which address specific content.

There's the PURL concept of Internet Archive.

And how to create a self-sustaining decentralised namespace is challenging.

@freakazoid @kick @enkiv2

@dredmorbius @freakazoid @kick @enkiv2 Back even further, the plan was that the web would eventually use URIs, which would be dereferenced to fragile URLs. But the host-independent transport layer never happened because one-way links that break were "good enough". URIs only really survived in the DTDs.

@mathew More on "why" would be interesting.

Insufficient motivation?
Sufficient resistance?
Excess complexity?

@freakazoid @kick @enkiv2

@dredmorbius @mathew @freakazoid @enkiv2 Lack of competence! (at least partly.)

I think it's startling how much of technical history is due to people with better ideas being entirely incompetent.

@kick And it's not merely competence. Much of it is mastery across a range of skills, including marketing, organisational leadership, fundraising, fighting off (or neutralising) legal and business threats, etc.

"Capitalism as the engine of innovation" suffers massively from the Texas Sharpshooter fallacy, and ignores many souls it destroyed or ignored. Aaron Swartz, Ian Murdock, Ted Nelson, Doug Engelbart, Paul Otlet, Rudolf Diesel, Nikola Tesla, Philo Farnsworth...

@enkiv2 @mathew @freakazoid

@dredmorbius @enkiv2 @mathew @freakazoid Nelson was who I was thinking of when I said "incompetence," actually.

Your statement makes me want to ask, though: how was capitalism responsible for the death of Murdock? That seemed to be strictly a police violence problem; he was making millions.

And Swartz's case, while indirectly caused by capitalism, seemed to be more caused by the state. (JSTOR pulled out quickly while MIT and the Fed insisted on pursuing.) One could argue I guess that his ideas were kind of neglected, but interestingly he seemed to have a lot of success with them as he got later in life.

@kick The cases of Murdock and Swartz are slightly different, but in general: people with a demonstrated enormous talent *and* a goal of direct social benefit were attacked and/or abandoned by the instruments of their own society.

Carmen Ortiz, Stephen Heymann, Michael Pickett, M.I.T., JSTOR, M.I.T. President L. Rafael Reif, and others in the prosecution chain of command are complicit in Swartz's murder. They drove him to it in all deliberation.

@enkiv2 @mathew @freakazoid

@kick And the proprietary academic publishing industry must be destroyed, in Swartz's name.

It will be.

@enkiv2 @mathew @freakazoid

@dredmorbius @enkiv2 @mathew @freakazoid Progress has definitely been made! There's only been a single paper that I've had trouble accessing in the last few months, despite having no legal access to papers. Can't wait until the system collapses further.

@kick There's a huge back-archive that's still hard to find.

Though the situation's getting vastly better.

Eventually the (surviving) publishers will turn to a public-goods model, tax-supported, because it's the only way they can exist. And I'm talking _all_ publishing substantially.

Academic: revert copyrights to authors, publish through Universities, as it was previously.

@enkiv2 @mathew @freakazoid

@kick Murdock also suffered mental health issues. He'd done well, but as with many technological pioneers, saw hugely uneven success.

At a time when he was in crisis, and quite evidently and obviously so, the system entirely failed him.

As it does so very, very, very, very many.

Sucks out all they've got to give, then spits them out.

@enkiv2 @mathew @freakazoid

@dredmorbius @enkiv2 @mathew @freakazoid Fully agree with the second paragraph, my disagreement in the initial comment was that I'm under the belief that JSTOR did _less_ harm (they still did a lot of harm) than the other parties (they dropped their case pretty much immediately). But overall I agree, yeah.

@kick I'm (trying to) rereading the MIT report on the incident.

That's also rage-inducing.

@enkiv2 @mathew @freakazoid
