Inverting the Web
We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.
Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO". intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
@freakazoid What methods *other* than URL are you suggesting? Because it is imply a Universal Resource Locator (or Identifier, as URI).
Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.
My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...
@freakazoid ... and reputation.
But a mechanism in which:
1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known
... might disrupt things a tad.
NB: the reputation bits might build off social / netgraph models.
But yes, I've been thinking on this.
Also YaCy as sean mentioned.
There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.
Being dropped by Firefox BTW.
That provides a query API only, not a distributed index, though.
@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.
If you request by IP, you'll get only the default / primary host on that IP address.
That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are new invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.
@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.
There are alternate URLs, e.g., irc://host/channel
I'm wondering if a standard for an:
http://<address-proto><delim>address> might be specifiable.
Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....
@kick Clue seeks clue.
You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).
You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.
(Kragen may have this all figured out, he's far certainly ahead of me on virtually all of this, and has been for decades.)
@kragen I see a lot of this coming down to:
- What is the incremental value of additional information sources? At some point, net of validation costs, this falls below zero.
- Google's PageRank relied on inter-document and -domain relations. Author-based trust hasn't carried as much weight. I believe it needs to.
- Randomisation around ranking should help avoid systemib bias lock-ins.
- Penalties for fraud, with increasing severity and duration for repeats.
Taking a single possibility (I listed a few) from a thing I wrote to a couple of posts up-thread but didn’t send because I want to hear someone’s opinion on a sub-problem of one of the guesses listed:
Seed with trusted users (i.e. people submitting sites to crawl), rank preferentially by age (time-limited; would eventually wear off), then rank on access-by-unique-users. Given that centralized link aggregators wouldn’t disappear, someone throws HN in, for example, the links on HN get added into the pool, whichever get clicked on most rise up, eventually get their own ranking, etc.
This works especially well if using what I sent the e-mail to inquire a little more about: cluster sorting rather than just barebacking text (this is what Yippy does, for example, and what Blekko used to do), because it promotes niche results better than Google’s model with smaller datasets, and when users have more seamless access to better niches, more sites can get rep easier. Example: try https://yippy.com/search?query=dredmorbius vs. throwing your username into Google. The clustering allows for much more informative/interesting results, I think, especially if doing inquisitive searching.
Kragen mentioned randomly introducing newcomers (adding noise), but I think it might work better still if noise was added to the searches for at least the beginning of it. A single previously-unclicked link on the first five pages of search results?
@kick As little as possible.
I've not participated online under my real name (or even vague approximations of it) for a decade or more. That was seeming increasingly unattractive to me already then. And I'd been online for at least two decades by that point.
Of the various dimensions of trust, anti-sock-puppetry is one axis. It's not the only one. It matters a lot in some contexts. Less in others.
Doxxing may be occasionally warranted.
Umasking is a risk.
@dredmorbius @kick @enkiv2 @freakazoid yeah, although in many ways it's an improvement over Golden Horde society, Ivan the Terrible society, Third Crusade society, Diocletian society, Qin Er Shi society, Battle of the Bulge society, Khmer Rouge society, Holodomor society, People's Temple society, the society that launched the Amistad, etc. We didn't start the fire.
[5000 well thought out lines of a single mail response on how Linux wipes the floor with Solaris performance-wise >quoted] Have you ever kissed a girl? - Bryan
So the problem was at least prevalent by ‘96.
@kick That danger / risk is an interesting one.
Some people focus on strictly one element -- the State, or Corporations, or Terrorists, or Narcocriminals, or the Criminally Insane, or Griefers, or Stalkers / Exes.
It's kind of all of the above.
In some cases I'm not fully sure that it's simply having civic systems and rule of law which matter more.
But mostly it' the data, the ability to use and misuse it, or simply presuming data exist, that enables evil.
@kick That's simply how information works.
So with pervasive recorded fungible, manipulable, queryable, records, on tremendous numbers of people, you don't know what future motives, contexts, norms, values, power structures, etc., will be.
The problem with Google's policy of getting right up to the creepy line, is that that creepy line moves.
So does the Surveillance Data Risk Line.
And we don't know what parts will move which way for what people and data.
@dredmorbius @kick @enkiv2 @freakazoid Right. Today recreational marijuana is legal in California; 30 years ago it could end your career in many jobs, and even today it can get you executed in much of Asia. Who's to say what its legality or public perception will be in another 30 years? Similarly for abortion, divorce, adultery, job-hopping, capitalism, or opposition to global pervasive surveillance.
@kragen I think my periodic observations that numerous states within the US *still* don't have a legal minimum age for marriage annoys a fair portion of the Fediverse.
Moral values are profoundly fungible, over time. Sometimes in as little as a few years, but staggeringly so over decades and centuries.
I've reasons for believing we may be entering a period of higher flux in values serving as social identifiers, adopted as moral codes.
@kragen Quite possibly in different direction in different locales, and not necessarily in a consistent direction over time even within given jurisdictions.
Drug (or sex, marriage, possibly business or technical) laws may swing wildly.
Where there's an overload of information, clearly evident, durable signifiers take on signalling significance, especially for group identity and loyalty.
@kick Google's watching that line move in all kinds of ways.
Ways that impose $5 billion fines in Europe.
Ways that may be turning public sentiment against it in the US. Quite possibly harder and fiercer than happened to Microsoft in the 2000s.
Google's been openly mocked on Hacker News for at least five years, if not longer. Dang just commented that their A/B practices have him switching his habits. I'd changed mine in 2013, and don't regret it at all.
Generalistic and moderated instance.