Inverting the Web
We use search engines because the Web does not support accessing documents by anything other than URL. This puts a huge amount of control in the hands of the search engine company and those who control the DNS hierarchy.
Given that search engine companies can barely keep up with the constant barrage of attacks, commonly known as "SEO", intended to lower the quality of their results, a distributed inverted index seems like it would be impossible to build.
@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Universal Resource Locator (or Identifier, as URI).
Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.
My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...
@freakazoid ... and reputation.
But a mechanism in which:
1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
4. Search could be distributed.
5. Auditing against false/misleading indexing was supported.
6. Original authorship / first-publication was known
... might disrupt things a tad.
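A rough sketch of the self-index / aggregate / search steps in the list above, assuming each site publishes a plain term-to-URL inverted index. The function names and the example.com-style URLs are made up for illustration, and nothing here addresses the auditing, authorship, or trust points:

```python
import re

def build_index(pages):
    """Self-index one site: map each term to the set of URLs containing it."""
    index = {}
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(term, set()).add(url)
    return index

def merge_indexes(*indexes):
    """Aggregate several published indexes into one shared index."""
    merged = {}
    for index in indexes:
        for term, urls in index.items():
            merged.setdefault(term, set()).update(urls)
    return merged

def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))

# Two hypothetical sites index themselves, then an aggregator merges them.
site_a = build_index({
    "https://a.example/1": "Distributed search engines",
    "https://a.example/2": "Cooking with search terms",
})
site_b = build_index({"https://b.example/x": "A distributed inverted index"})
shared = merge_indexes(site_a, site_b)

print(search(shared, "distributed"))
print(search(shared, "distributed search"))
```

Because the merge is just a union of sets, indexes could be forwarded and re-aggregated in any order. A site that lies about its own contents would poison the result, which is where the auditing and reputation items come in.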
NB: the reputation bits might build off social / netgraph models.
But yes, I've been thinking on this.
Also YaCy as sean mentioned.
There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.
Being dropped by Firefox BTW.
That provides a query API only, not a distributed index, though.
@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.
If you request by IP, you'll get only the default / primary host on that IP address.
That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
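A toy illustration of that hostname-awareness, assuming a server that keeps a simple table of virtual hosts. The table, paths, and function name are hypothetical, and real servers handle more cases (IPv6 literals, wildcards, TLS SNI):

```python
# Hypothetical virtual-host table for one IP address.
VHOSTS = {
    "blog.example": "/srv/blog",
    "wiki.example": "/srv/wiki",
}
DEFAULT_SITE = "/srv/default"

def select_site(headers, vhosts=VHOSTS, default=DEFAULT_SITE):
    """Pick a document root from the Host header, ignoring any port."""
    host = headers.get("Host", "")
    # Simplification: splitting on ":" does not handle bracketed IPv6 literals.
    name = host.split(":", 1)[0].lower()
    return vhosts.get(name, default)

print(select_site({"Host": "wiki.example"}))     # named virtual host
print(select_site({"Host": "192.0.2.10:8080"}))  # request by IP: default site
```

No DNS lookup happens on the server side here; the name only has to match what the client put in the request.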
@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.
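A minimal sketch of why hash-based names survive a change of host, assuming the address is simply the SHA-256 of the content. This is not the actual identifier format of IPFS or any other system named above, and several of them derive names from public keys rather than content hashes:

```python
import hashlib

class ContentStore:
    """Toy store where the address is derived from the bytes themselves."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Any reader can verify the content against the address it asked for.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

# Two unrelated hosts end up with the same address for the same document,
# so links keep working when the content moves from one to the other.
old_host, new_host = ContentStore(), ContentStore()
address = old_host.put(b"hello, web")
assert new_host.put(b"hello, web") == address
print(address[:16], new_host.get(address))
```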
@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.
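As a sketch of "the local system decides how to resolve it": the snippet below uses Python's standard `urllib.parse.urlsplit` to pull the host out of the authority and then picks a resolver by TLD. The resolver functions and the `RESOLVERS` table are placeholders that only report which mechanism would be used:

```python
from urllib.parse import urlsplit

def dns_resolver(name):
    """Placeholder: a real client would do a DNS lookup here."""
    return ("dns", name)

def onion_resolver(name):
    """Placeholder: a real client would hand the name to Tor here."""
    return ("tor", name)

# Hypothetical local policy: special-use TLDs map to non-DNS resolvers.
RESOLVERS = {"onion": onion_resolver}

def resolve(url):
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return RESOLVERS.get(tld, dns_resolver)(host)

print(resolve("http://example.org/page"))
print(resolve("http://user@exampleaddress.onion/page"))
```

The URL syntax is identical in both cases; only the local lookup table differs.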
There are alternate URLs, e.g., irc://host/channel
I'm wondering if a standard for an:
http://<address-proto><delim><address> might be specifiable.
Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....
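A sketch of what parsing such a URL might look like, with '@' as the delimiter. The syntax and the `KNOWN_PROTOS` set are purely hypothetical. One catch worth noting: in RFC 3986 the text before '@' in the authority is already the userinfo field, so this reading would collide with existing usernames:

```python
from urllib.parse import urlsplit

# Hypothetical set of addressing protocols a client might recognise.
KNOWN_PROTOS = {"dns", "ipfs", "onion"}

def split_address(url):
    """Return (address_protocol, address) under the hypothetical '@' syntax."""
    netloc = urlsplit(url).netloc  # netloc keeps the original letter case
    userinfo, delim, address = netloc.rpartition("@")
    if delim and userinfo in KNOWN_PROTOS:
        return (userinfo, address)
    # No recognised prefix: treat it as an ordinary DNS name.
    # Ambiguity: a real user literally named "ipfs" would be misread above.
    return ("dns", address)

print(split_address("http://ipfs@QmExampleHash/page"))
print(split_address("http://example.org/page"))
print(split_address("http://alice@example.org/page"))
```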
@kick Clue seeks clue.
You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).
You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.
(Kragen may have this all figured out, he's certainly far ahead of me on virtually all of this, and has been for decades.)
@zardoz @dredmorbius @kick @enkiv2 @freakazoid I guess the other alternatives along those lines are the Git model (fork at will, and choose whose fork you link to) and the Debian model (maintainers exist, and vote on governance, but NMUs are available to limit the worst failures of the maintainer model, despite the avconv/ffmpeg problem etc.)
@kragen On the Git / fork model, there's a problem I've been trying to articulate for years and think I may finally have:
The threat of the low-cost / high-capability developer.
That is, even outside the proprietary world, it's possible to shape the direction of software (or protocol or data standards) development by being the most able / capable / low-cost developer.
That's been an issue in several notable projects, and seems more so now.
@kragen So whilst it's possible to fork, it can be hard to fork *and sustain a competitive level of development and support* especially against a particularly complicated alternative.
Say: browser rendering engines. Or init suite replacements. Or integrated desktops. Or office suites. Or tax or accounting software.
A vastly funded adversary *even if operating wholly within Free Software*, can code circles around other parties.
@kragen This goes back to the days of "worse is better" -- because "worse" is also (near-term) cheaper, and faster to develop, so it iterates and improves much faster than "better".
You may end up stuck in a local optimum as a result. But you'll at least get there quickly, while "better" is still trying to get their 0.01 out the door.
Otherwise: I tend to agree re: Wikipedia and Debian: social and organisational structures help tremendously.
It's the power of free, or at least low-cost.
Software development itself closely resembles network structures (and is a network of interactions between functions or processes). Water seeks the largest channel, electricity the lowest resistance, and buyers the lowest cost; likewise, software development favours capable development.
It's impossible to compete against a lower price:
@kragen You'd likely have to undermine their business model.
On the positive side, this is a dynamic which can be used to play megacorps (and possibly other interests) off one another.
That notion goes back to IBM's Earthquake Memo, ~1998.
I'm not sure if you were at the LinuxWorld Expo where copies of that were being shown around, probably 1999, NYC.
Tim O'Reilly wrote on that in Open Sources.
@dredmorbius @zardoz @kick @enkiv2 @freakazoid I think it goes back longer than that; IIRC Gumby commented on the fsb list in the mid-1990s that he wasn't worried about other companies contributing code to GCC and GDB because Cygnus could then turn around and sell the improved versions to Cygnus's customers. Of course those customers could get the software without paying, but they found Cygnus's offering valuable enough to pay for, and competitors' contributions just increased that value.
@dredmorbius @zardoz @kick @enkiv2 @freakazoid the big insight Tim had, which took the rest of us a while to appreciate, was how this gave new market power to companies that own piles of data, like Google or the ACM or Knight Capital. And now we have AWS and Azure and Samsung capturing a big part of the value from free software instead.
@kragen As I mentioned earlier: Virtually any monopoly I can think of can be described as a network.
The Usual Suspects are transport and communications. Markets are networks (nodes: buyers/sellers, links: transactions/contracts/relationships), politics (power brokers and relationships), information (knowledge as web, multiple contexts).
Most networks have some more-central nodes; those nodes become power centres, as they amplify small applied effort.
@kragen The 1990s power nexuses were:
- Microsoft's per-CPU OEM licenses.
- Office market- and mind-share.
- ISV network and mindshare.
And at the server level, proprietary Unix.
Free software disrupted these, at least on the server, and eventually in the emerging mobile/handheld space. But new networks and centres emerged. Data, and ads, search, retail, and social networks (Google, Amazon, Facebook).
Swapping monopolies isn't a win.
@kragen Defining "network" in this context may help:
A collection of nodes and links, between which _something_ flows: material, energy, information, forces, people, relationships, money.
Characteristics are size (nodes, links: 0, 1, 2, ... many), topology (unary, peer, chain, ring, star, tree, mesh, compound), throughput, permanence, directionality (directed, nondirected), protocols & formats, governance.
Common & distinctive properties emerge.
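To make the definition concrete, here is a small sketch that represents a network as an adjacency map and compares two of the topologies named above. Counting links per node (degree) is used as a crude stand-in for "centrality"; that choice, and the node names, are assumptions for illustration only:

```python
def make_network(links, directed=False):
    """Build an adjacency map {node: set(neighbours)} from (a, b) links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        back = adjacency.setdefault(b, set())  # make sure b exists as a node
        if not directed:
            back.add(a)
    return adjacency

def degree(adjacency):
    """Number of outgoing links per node."""
    return {node: len(peers) for node, peers in adjacency.items()}

star = make_network([("hub", leaf) for leaf in "abcd"])
ring = make_network([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

print(degree(star))  # one node holds every link
print(degree(ring))  # links are spread evenly
```

In the star, every flow between leaves passes through the hub, which is the sense in which a central node becomes a control point. In the ring, no single node has that position.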
@kragen Weinsteinomics 101: Monopoly is fundamentally a control dynamic, not a marketshare proposition
...Harvey Weinstein and the Economics of Consent by Brit Marling is one of the more significant economics articles of the past decade, though I'm not sure Ms. Marling recognises this. In it, she clearly articulates the dynamics of power, and re-establishes the element of control so critical to understanding monopoly...
That's a point I find from a few writers.
Robert W. McChesney, now in media studies but trained in economics, specifically makes that point in his books (Communication Revolution particularly: https://www.worldcat.org/title/communication-revolution-critical-junctures-and-the-future-of-media/oclc/260208807).
Philip Mirowski's "More Heat Than Light".
W. Brian Arthur who notes that virtually all economics is policy rather than theory driven. There's little actual theory, much of it questionable.
@dredmorbius @kick @zardoz @enkiv2 @kragen @freakazoid This is a very interesting thread you had, but reading it rapidly, none of you has envisioned that radically changing the cyberspace architecture was the solution. From what I saw, all your reasoning is still imprisoned by the current norms and standards imposed by the Empire for the current cyberspace architecture.
According to my crypto-anarchist studies on the genesis of cyber-powers, the architecture of all known technological layers and of a cyberspace characterizes what I call the cyber-power model, which in turn characterizes the economic model.
The current status quo is definitely pushing for the neoliberal surveillance capitalism model we have today.
But different cyberspace architectures can have cyber-power models that lead to
@kick @zardoz @enkiv2 @dredmorbius @kragen Many economists (especially Russ Roberts) agree that it is not a science. But like science it is a branch of philosophy, not religion. There are certainly plenty of folks who are quite dogmatic, but also many who are intensely curious and interested in finding better ways to describe and predict how people interact and make decisions.
@freakazoid The term is ... slightly ... exaggerated.
But you have elements of:
- A Received (or Revealed) Knowledge.
- An Anointed Priesthood.
- Sacred Texts.
- An exceedingly close relationship with Power.
- Ideological Purity Tests.
- A large Propaganda Arm.
- A strong resistance to actual empirical knowledge, most especially from the sciences.
- Routine rubbishing of dissident thought.
- Numerous True Believers.
The description's not far off.
@kragen Fair enough. "At least" to the Earthquake Doc.
Though that *specifically* laid out the policy of adopting an Open Source orientation for IBM specifically to compete more effectively against Microsoft and Sun.
Similarly: Netscape's assault against Microsoft, with browsers (and trying to break the desktop stranglehold), Sun's release of StarOffice, Google turning Microsoft's AJAX against MSFT via Gmail, etc., etc.
@dredmorbius @kragen @zardoz @kick @enkiv2 One reason companies are able to out-develop non-commercial organizations is that they're more able to make it people's full time job. So the problem to solve here is funding. A UBI would probably do it, but I think there are other ways, mostly involving collectivization. Coding communes: pool resources and minimize people's cost of living.
@freakazoid Absolutely. Commercialism's capacity to mobilise resources is phenomenal.
Early work on Free Software as an organisational model (see Coleman's and O'Mahony's works, among others) suggested FS/OS was an organisational model which could displace traditional proprietary SW dev. And in some cases it has.
Others not so much.
And it can be *adopted* by commercial enterprises (or govs, edus, orgs) as well, combining capital + FS/OS.
@kragen @zardoz @dredmorbius @kick @enkiv2 Unfortunately Wikipedia suffers from issues like that person who's been tirelessly editing the pages of media organizations and journalists in order to discredit them. At the end of the day there's no substitute for reputation and "editorial voice". I'd prefer known bias to unknown.
I still don't know how powerful this technique can be, though; once it's known maybe it's defused.
@kragen @zardoz @dredmorbius @kick @enkiv2 @freakazoid
SSB is something worth looking at re: combining social & technical concerns. The network is not fully connected (even less so than fedi) & you have a kind of automatic/passive filtering through this disconnection (especially through, like, transitive blocking). Spammers have to actively be followed by trusted peers in order to broadcast.
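A simplified model of that passive filtering, assuming a reader only sees feeds within a couple of follow hops and also honours blocks set by the people they follow directly. This is an illustration of the idea, not SSB's actual replication rules, and all names are invented:

```python
from collections import deque

def visible_feeds(me, follows, blocks, max_hops=2):
    """Feeds reachable through follows within max_hops, minus blocked peers."""
    # Transitive blocking (simplified): my blocks plus my direct follows' blocks.
    blocked = set(blocks.get(me, ()))
    for friend in follows.get(me, ()):
        blocked |= set(blocks.get(friend, ()))

    hops = {me: 0}
    queue = deque([me])
    while queue:
        peer = queue.popleft()
        if hops[peer] == max_hops:
            continue  # do not look beyond the hop limit
        for nxt in follows.get(peer, ()):
            if nxt in blocked or nxt in hops:
                continue
            hops[nxt] = hops[peer] + 1
            queue.append(nxt)
    return set(hops) - {me}

follows = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["spammer"],
    "carol": ["dave"],
    "spammer": ["bot"],
}
blocks = {"alice": ["spammer"]}

print(sorted(visible_feeds("me", follows, blocks)))
```

Here the spammer is followed by bob but blocked by alice, so it never reaches "me"; dave is simply too far away. Anyone nobody follows is invisible by default.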
@dredmorbius @enkiv2 @kragen @zardoz @kick @freakazoid
Yeah, SSB = scuttlebutt. It's an incredibly interesting protocol and community with really vital discussion about norms and community management with a kind of vaguely left-libertarian flavor, hobbled by a couple specific technical problems that make onboarding & setup hard & make it tough to implement clients that aren't electron apps.
@kragen @enkiv2 @dredmorbius @zardoz @kick @freakazoid
SSB uses progressively signed JSON, where the text of the JSON gets hashed and the hash is added to the end. It also uses keys. Key order isn't defined in JSON so all implementations, for compatibility reasons, must use the order that happened to be produced by nodejs when the first SSB message was composed. This has been a barrier to non-v8-based clients (though a rust one exists now).
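A small demonstration of the key-order problem, using a plain SHA-256 over the serialised text as a stand-in for signing. The message fields are made up and this is not SSB's message format. Sorting keys, shown at the end, is one common way other systems avoid the issue; per the description above it is not what SSB does:

```python
import hashlib
import json

def digest(text):
    """Hash the exact serialised text, as a stand-in for signing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The same logical message, built with two different key orders.
msg_a = {"author": "@alice", "sequence": 1, "content": "hello"}
msg_b = {"content": "hello", "sequence": 1, "author": "@alice"}
assert msg_a == msg_b  # equal as data

# Serialising in insertion order gives different text, so different hashes.
naive_a = digest(json.dumps(msg_a))
naive_b = digest(json.dumps(msg_b))
print(naive_a == naive_b)

def canonical(message):
    """One possible canonical form: sorted keys, no optional whitespace."""
    return json.dumps(message, sort_keys=True, separators=(",", ":"))

print(digest(canonical(msg_a)) == digest(canonical(msg_b)))
```

Without an agreed canonical form, every implementation has to reproduce the first implementation's serialisation byte for byte, which is the compatibility burden described above.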
@dredmorbius @kick @enkiv2 @freakazoid building a non-distributed index has gotten a lot easier though. when I published the Nutch paper it was still not practical for a regular person to crawl most of the public textual web, from a cost perspective. (not sure if it's practical now, though, due to cloudflare)