Inverting the Web 

@freakazoid What methods *other* than URL are you suggesting? Because it is simply a Universal Resource Locator (or Identifier, as URI).

Not all online content is social / personal. I'm not understanding your suggestion well enough to criticise it, but it seems to have some ... capacious holes.

My read is that search engines are a necessity born of no intrinsic indexing-and-forwarding capability which would render them unnecessary. THAT still has further issues (mostly around trust)...

@freakazoid ... and reputation.

But a mechanism in which:

1. Websites could self-index.
2. Indexes could be shared, aggregated, and forwarded.
3. Search could be distributed.
4. Auditing against false/misleading indexing was supported.
5. Original authorship / first-publication was known.

... might disrupt things a tad.
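A minimal sketch of the first three points (self-indexing, aggregating shared indexes, and searching the result), assuming a plain word-to-URL inverted index. The function names and the example URLs are hypothetical, and nothing here addresses auditing or authorship:

```python
import re
from collections import defaultdict


def build_site_index(pages):
    """Self-index one site: map each lowercase word to the URLs containing it.

    `pages` is a dict of URL -> page text (a hypothetical input shape).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


def merge_indexes(*indexes):
    """Aggregate several site indexes into one combined index."""
    merged = defaultdict(set)
    for index in indexes:
        for word, urls in index.items():
            merged[word] |= urls
    return dict(merged)


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

Because merging is just a set union per word, any node holding a merged index can answer queries or forward it on. The hard parts the thread goes on to discuss (trust, spam, ranking) are untouched by this sketch.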

Somewhat more:
news.ycombinator.com/item?id=2

NB: the reputation bits might build off social / netgraph models.

But yes, I've been thinking on this.

@enkiv2 I know SEARX is: en.wikipedia.org/wiki/Searx

Also YaCy as Sean mentioned.

There's also something that is/was used for Firefox keyword search, I think OpenSearch, a standard used by multiple sites, pioneered by Amazon.

Being dropped by Firefox BTW.

That provides a query API only, not a distributed index, though.

@freakazoid @drwho

@dredmorbius @enkiv2 @freakazoid YaCy isn't federated, but Searx is, yeah. YaCy is p2p.

@dredmorbius @enkiv2 @freakazoid Also, the initial criticism of the URL system isn't entirely there: the DNS is annoying, but isn't needed for accessing content on the WWW. You can directly navigate to public IP addresses and it works just as well, which allows you to skip the DNS. (You can even get HTTPS certs for IP addresses.)

Still centralized, which is bad, but centralized in a way that you can't really get around in internetworked communications.

@kick HTTP isn't fully DNS-independent. For virtualhosts on the same IP, the webserver distinguishes between content based on the host portion of the HTTP request.

If you request by IP, you'll get only the default / primary host on that IP address.

That's not _necessarily_ operating through DNS, but HTTP remains hostname-aware.
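A minimal sketch of name-based virtual hosting as described above, assuming a toy server that only inspects the Host header. The site names, the 203.0.113.7 documentation address, and both helper functions are hypothetical; ports and IPv6 literals in the Host value are not handled:

```python
def build_request(host, path="/"):
    """Build a bare HTTP/1.1 GET request carrying the given Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def pick_virtual_host(request_bytes, sites, default):
    """Choose which site answers, based only on the Host header.

    `sites` maps hostnames to site identifiers; anything unrecognised
    (including a bare IP address) falls through to `default`.
    """
    header_block = request_bytes.decode("ascii").split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return sites.get(value.strip().lower(), default)
    return default
```

Requesting by IP puts the IP in the Host header, so a server sharing one address among several names can only hand back its default site. No DNS lookup happens on the server side, but the hostname still does the selecting.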

@enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 IP is also worse in many ways than using DNS. If you have to change where you host the content, you can generally at least update your DNS to point at the new IP. But if you use IP and your ISP kicks you off or whatever, you're screwed; all your URLs are now invalid. Dat, IPFS, FreeNet, Tor hidden sites, etc, don't have this issue. I suppose it's still technically a URL in some of these cases, but that's not my point.

@freakazoid Question: is there any inherent reason for a URL to be based on DNS hostnames (or IP addresses)?

Or could an alternate resolution protocol be specified?

If not, what changes would be required?

(I need to read the HTTP spec.)

@kick @enkiv2

@dredmorbius @kick @enkiv2 HTTP URLs don't have any way to specify the lookup mechanism. RFC3986 says the part after the // and optional authentication info followed by @ is a "registered name" or an address. It doesn't say the name has to be resolved via DNS but does say it is up to the local system to decide how to resolve it. So if you just wanted self-certifying names or whatever you can use otherwise unused TLDs the way Tor does with .onion.
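A minimal sketch of the point about the authority component, using Python's standard `urllib.parse.urlsplit`. The `choose_resolver` helper and its TLD-to-resolver table are hypothetical, purely to illustrate a local system deciding how to resolve a name:

```python
from urllib.parse import urlsplit


def authority_parts(url):
    """Split out the pieces of the part after '//': userinfo, host, port."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "userinfo": parts.username,  # the optional part before '@'
        "host": parts.hostname,      # registered name or address
        "port": parts.port,
    }


def choose_resolver(url, resolvers, default="dns"):
    """Pick a lookup mechanism from the host's last label (hypothetical).

    `resolvers` maps a top-level label such as "onion" to a resolver name.
    The URL itself says nothing about lookup; this is a local policy choice.
    """
    host = urlsplit(url).hostname or ""
    last_label = host.rpartition(".")[2]
    return resolvers.get(last_label, default)
```

For example, `choose_resolver("http://abc.onion/", {"onion": "tor"})` selects the Tor entry while an ordinary hostname falls back to DNS. The URL syntax is identical in both cases; only the local resolution policy differs.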

@freakazoid Hrm....

So:

There are alternate URLs, e.g., irc://host/channel
news://newsgroup/

I'm wondering if a standard for an:

http://<address-proto><delim><address> might be specifiable.

Onion achieves this through the onion TLD. But using a reserved character ('@' comes to mind) might allow for an addressing protocol _within_ the HTTP URL itself, to be used....

@kick @enkiv2

@dredmorbius @kick @enkiv2 @ is already reserved for the optional username[:password] portion before the hostname.

@freakazoid @dredmorbius @enkiv2 Is ! still reserved (! may be a DNS thing actually, thinking about it further)?

@kick As of RFC 2396, "!" was unreserved. That RFC is now obsolete. Not sure if status is changed.

tools.ietf.org/html/rfc2396

@enkiv2 @freakazoid
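On the open question about "!": a small sketch that transcribes the character classes from the RFC 2396 and RFC 3986 grammars and reports a character's status under each. The set names and the `status` helper are my own labels, not anything from the thread:

```python
# Character classes transcribed from the two URI specifications.
RFC2396_RESERVED = set(";/?:@&=+$,")
RFC2396_MARK = set("-_.!~*'()")          # unreserved punctuation in RFC 2396
RFC3986_GEN_DELIMS = set(":/?#[]@")
RFC3986_SUB_DELIMS = set("!$&'()*+,;=")
RFC3986_UNRESERVED_PUNCT = set("-._~")


def _classify(ch, reserved, unreserved_punct):
    """Return 'reserved', 'unreserved', or 'other' for one character."""
    if ch.isascii() and ch.isalnum():
        return "unreserved"
    if ch in reserved:
        return "reserved"
    if ch in unreserved_punct:
        return "unreserved"
    return "other"


def status(ch):
    """Return (RFC 2396 status, RFC 3986 status) for a single character."""
    old = _classify(ch, RFC2396_RESERVED, RFC2396_MARK)
    new = _classify(
        ch, RFC3986_GEN_DELIMS | RFC3986_SUB_DELIMS, RFC3986_UNRESERVED_PUNCT
    )
    return old, new
```

Going by these sets, "!" moved from unreserved (a "mark" in RFC 2396) to reserved (a sub-delimiter in RFC 3986), while "@" is reserved under both.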

@dredmorbius @enkiv2 @freakazoid Entirely unrelated because I just remembered this based on @kragen's activity in this thread:

Vaguely shocked that I'm interacting with both of you because I'm pretty sure you two are the people I've (at least kept in memory for long enough) read the words of online consistently for longest. (Since I was like, eight, maybe, on Kragen's part. Not entirely sure about you but less than I've checked canonical.org/~kragen for by a decent margin at least.)

@kick Clue seeks clue.

You're asking good questions and making good suggestions, even where wrong / confused (and I do plenty of both, that's not a criticism).

You're helping me (and I suspect Sean) think through areas I've long been bothered about concerning the Web / Internet. Which I appreciate.

(Kragen may have this all figured out, he's certainly far ahead of me on virtually all of this, and has been for decades.)

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid while I appreciate the vote of confidence, and I did spend a long time figuring out how to build a scalable distributed index, I am at as much of a loss as anyone when it comes to figuring out the social aspect of the problem (SEO spam, ranking, funding).

@zardoz @dredmorbius @kick @enkiv2 @freakazoid the best attack on the SEO problem I've seen so far is Wikipedia: Wikipedia's messy social processes are very good at not getting captured by SEOs and the like. Not perfect, but enormously better than Google SERPs

@zardoz @dredmorbius @kick @enkiv2 @freakazoid I guess the other alternatives along those lines are the Git model (fork at will, and choose whose fork you link to) and the Debian model (maintainers exist, and vote on governance, but NMUs are available to limit the worst failures of the maintainer model, despite the avconv/ffmpeg problem etc.)

@kragen On the Git / fork model, there's a problem I've been trying to articulate for years and think I may finally have:

The threat of the low-cost / high-capability developer.

That is, even outside the proprietary world, it's possible to shape the direction of software (or protocol or data standards) development by being the most able / capable / low-cost developer.

That's been an issue in several notable projects, and seems more so now.

@zardoz @kick @enkiv2 @freakazoid

@kragen So whilst it's possible to fork, it can be hard to fork *and sustain a competitive level of development and support* especially against a particularly complicated alternative.

Say: browser rendering engines. Or init suite replacements. Or integrated desktops. Or office suites. Or tax or accounting software.

A vastly funded adversary *even if operating wholly within Free Software*, can code circles around other parties.

@zardoz @kick @enkiv2 @freakazoid

@kragen This goes back to the days of "worse is better" -- because "worse" is also (near-term) cheaper, and faster to develop, so it iterates and improves much faster than "better".

You may end up stuck in a local optimum as a result. But you'll at least get there quickly, while "better" is still trying to get their 0.01 out the door.

Otherwise: I tend to agree re: Wikipedia and Debian: social and organisational structures help tremendously.

@zardoz @kick @enkiv2 @freakazoid

@dredmorbius @zardoz @kick @enkiv2 @freakazoid it sounds like you're saying that free software tends to be meritocratic and some people don't like that? or is it more that it's much easier to add complexity to a problem (e.g., HTML5) than to remove it?

@kragen @dredmorbius @kick @enkiv2 @freakazoid nah I think he means that an agency with a lot of funding(like for instance google) could just become the arbiter of all information by pouring labor into it.

@zardoz @kragen Yes, close to this.

It's the power of free, or at least low-cost.

Software development itself closely resembles network structures (and is a network of interactions between functions or processes). As water seeks the largest channel, electricity the lowest resistance, and buyers the lowest cost, so software development favours capable development.

It's impossible to compete against a lower price:

- Features
- Momentum
- Mindshare
- Security
- Etc

@kick @enkiv2 @freakazoid

@zardoz @dredmorbius @kick @enkiv2 @freakazoid yeah, they kind of already did. the question from my point of view is how to change the rules of the game to keep them from creating barriers to entry that allow them to dollar-auction their way into net-negative social value

@kragen You'd likely have to undermine their business model.

On the positive side, this is a dynamic which can be used to play megacorps (and possibly other interests) off one another.

That notion goes back to IBM's Earthquake Memo, ~1998.

I'm not sure if you were at the LinuxWorld Expo where copies of that were being shown around, probably 1999, NYC.

Tim O'Reilly wrote on that in Open Sources.

@zardoz @kick @enkiv2 @freakazoid

@dredmorbius @zardoz @kick @enkiv2 @freakazoid I think it goes back longer than that; IIRC Gumby commented on the fsb list in the mid-1990s that he wasn't worried about other companies contributing code to GCC and GDB because Cygnus could then turn around and sell the improved versions to Cygnus's customers. Of course those customers could get the software without paying, but they found Cygnus's offering valuable enough to pay for, and competitors' contributions just increased that value.

@dredmorbius @zardoz @kick @enkiv2 @freakazoid the big insight Tim had, which took the rest of us a while to appreciate, was how this gave new market power to companies that own piles of data, like Google or the ACM or Knight Capital. And now we have AWS and Azure and Samsung capturing a big part of the value from free software instead.

@kragen As I mentioned earlier: Virtually any monopoly I can think of can be described as a network.

The Usual Suspects are transport and communications. Markets are networks (nodes: buyers/sellers, links: transactions/contracts/relationships), politics (power brokers and relationships), information (knowledge as web, multiple contexts).

Most networks have more central nodes, those nodes become power centres as they amplify small applied effort.

@zardoz @kick @enkiv2 @freakazoid

@kragen Incidentally, the Harvey Weinstein and Jeffrey Epstein stories have made me aware just how much wealth, power, and corruption are also fundamentally network phenomena. Something I've touched on in a couple of Reddit posts IIRC.

@zardoz @kick @enkiv2 @freakazoid

@kragen Fair enough. "At least" to the Earthquake Doc.

Though that *specifically* laid out the policy of adopting an Open Source orientation for IBM specifically to compete more effectively against Microsoft and Sun.

Similarly: Netscape's assault against Microsoft, with browsers (and trying to break the desktop stranglehold), Sun's release of StarOffice, Google turning Microsoft's AJAX against MSFT via Gmail, etc., etc.

@zardoz @kick @enkiv2 @freakazoid

@dredmorbius @kragen @zardoz @kick @enkiv2 One reason companies are able to out-develop non-commercial organizations is that they're more able to make it people's full time job. So the problem to solve here is funding. A UBI would probably do it, but I think there are other ways, mostly involving collectivization. Coding communes: pool resources and minimize people's cost of living.

@kragen @zardoz @dredmorbius @kick @enkiv2 Unfortunately Wikipedia suffers from issues like that person who's been tirelessly editing the pages of media organizations and journalists in order to discredit them. At the end of the day there's no substitute for reputation and "editorial voice". I'd prefer known bias to unknown.

I still don't know how powerful this technique can be, though; once it's known maybe it's defused.

@kragen @zardoz @dredmorbius @kick @enkiv2 @freakazoid
SSB is something worth looking at re: combining social & technical concerns. The network is not fully connected (even less so than fedi) & you have a kind of automatic/passive filtering through this disconnection (especially through, like, transitive blocking). Spammers have to actively be followed by trusted peers in order to broadcast.
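A rough sketch of the filtering idea described above, assuming a follow graph walked outward a fixed number of hops, with blocks issued by anyone already in view honoured from then on. This is an illustration only, not SSB's actual replication logic; the function, its parameters, and the hop limit are hypothetical, and the result depends on traversal order:

```python
from collections import deque


def visible_feeds(me, follows, blocks, max_hops=2):
    """Return the feeds reachable from `me` through follows.

    `follows` and `blocks` map a peer to the peers it follows / blocks.
    A feed is skipped if someone already in view has blocked it.
    """
    visible = {me}
    blocked = set(blocks.get(me, ()))
    queue = deque([(me, 0)])
    while queue:
        peer, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in follows.get(peer, ()):
            if nxt in visible or nxt in blocked:
                continue
            visible.add(nxt)
            blocked |= set(blocks.get(nxt, ()))  # transitive blocking
            queue.append((nxt, hops + 1))
    return visible
```

An account that nobody in view follows never appears at all, however much it posts, which is the passive filtering the post points to.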

@dredmorbius @enkiv2 @kragen @zardoz @kick @freakazoid
Yeah, SSB = scuttlebutt. It's an incredibly interesting protocol and community with really vital discussion about norms and community management with a kind of vaguely left-libertarian flavor, hobbled by a couple specific technical problems that make onboarding & setup hard & make it tough to implement clients that aren't electron apps.

@enkiv2 @dredmorbius @zardoz @kick @freakazoid what are the technical problems with SSB? I've been trying to figure out where to find a straightforward explanation of the protocol at, like, the level of RFC 821.

@kragen @enkiv2 @dredmorbius @zardoz @kick @freakazoid
SSB uses progressively signed JSON, where the text of the JSON gets hashed and the hash is added to the end. It also uses keys. Key order isn't defined in JSON so all implementations, for compatibility reasons, must use the order that happened to be produced by nodejs when the first SSB message was composed. This has been a barrier to non-v8-based clients (though a rust one exists now).
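A small sketch of why key order matters once you hash the serialized text, using only Python's `json` and `hashlib`. This is not SSB's actual wire format or ID scheme; `message_id` and `canonical` are hypothetical helpers, and sorting keys is shown as one common alternative, not what SSB does:

```python
import hashlib
import json


def message_id(serialized_text):
    """Hash the exact serialized text, as a stand-in for a message ID."""
    return hashlib.sha256(serialized_text.encode("utf-8")).hexdigest()


def canonical(obj):
    """Serialize with sorted keys and fixed separators (an assumed convention)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


# The same logical message, written with two different key orders.
first = json.dumps({"author": "@alice", "sequence": 1})
second = json.dumps({"sequence": 1, "author": "@alice"})

same_data = json.loads(first) == json.loads(second)        # equal as data
same_hash = message_id(first) == message_id(second)        # differ as text
same_canonical_hash = message_id(canonical(json.loads(first))) == message_id(
    canonical(json.loads(second))
)
```

Two implementations that agree on the data but serialize keys in a different order compute different hashes, so signatures stop verifying. A protocol has to pin the byte-level form somehow, whether by an explicit canonical rule or, as described above, by matching whatever one implementation happened to emit.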

@zardoz @kragen @dredmorbius @enkiv2 @freakazoid I typed a long reply to this (and the message above it) but decided to send someone an e-mail first to ask about something they're familiar with that's tangentially related to this; depending on what/if they reply I might respond with a few guesses.

@dredmorbius @kick @enkiv2 @freakazoid building a non-distributed index has gotten a lot easier though. when I published the Nutch paper it was still not practical for a regular person to crawl most of the public textual web, from a cost perspective. (not sure if it's practical now, though, due to cloudflare)

@kragen @dredmorbius @enkiv2 @freakazoid I think it would be? Given the people working at Cloudflare, it seems like they'd whitelist whatever you're crawling with if you asked the right person assuming it didn't become something everyone and their cat was requesting to do.

@kragen I see a lot of this coming down to:

- What is the incremental value of additional information sources? At some point, net of validation costs, this falls below zero.

- Google's PageRank relied on inter-document and -domain relations. Author-based trust hasn't carried as much weight. I believe it needs to.

- Randomisation around ranking should help avoid systemic bias lock-ins.

- Penalties for fraud, with increasing severity and duration for repeats.

@kick @enkiv2 @freakazoid
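A minimal sketch of three of the points above: weighting by author trust, randomising the ranking slightly, and escalating fraud penalties. All names, the multiplicative trust weighting, the 10% jitter, and the doubling schedule are assumptions chosen for illustration, not anything specified in the thread:

```python
import random


def combined_score(link_score, author_trust):
    """Weight a link-based score by author trust in [0, 1] (assumed form)."""
    return link_score * author_trust


def jittered_ranking(scores, rng, jitter=0.1):
    """Rank keys by score after multiplying each by a random factor.

    Each score is scaled by a factor in [1 - jitter, 1 + jitter], so close
    competitors swap places now and then while large gaps stay intact.
    """
    noisy = {k: s * (1 + rng.uniform(-jitter, jitter)) for k, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)


def penalty_days(prior_offences, base_days=7):
    """Penalty length that doubles with each earlier offence."""
    return base_days * 2 ** prior_offences
```

With a 10% jitter a result scoring 100 can never fall below one scoring 1, but two results at 100 and 95 trade places fairly often, which is the kind of lock-in avoidance the list suggests.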

@kragen - Some way of vetting new arrivals / entities, such that legitimate newcomers aren't entirely locked out of the system. Effectively letters of recommendation or reference.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid I've thought that it might be reasonable to bootstrap a friendnet by assigning newcomers (randomly or by payment) to "foster families" or "undergraduate faculties" to allow them to gain enough whuffie to become emancipated. ideally, gradually, rather than through an emancipation cliff analogous to legal majority or a B.S.

@kragen Challenge on any such scheme is scaling quickly enough, relative to other systems.

Though if the founding cohort is sufficiently interesting, you'll have the reverse problem: too many people wanting in.

An inspiration I've long had for this is Lawrence Lessig's "signed by" convention at the ... Yale Wall, I think, described in "Code and Other Laws of Cyberspace".

That applied to anonymous messages, but for new users might also work.

@kick @enkiv2 @freakazoid

@kragen It's effectively a socialisation problem -- how do you introduce new members to a society?

But doing that *without* creating an inculcated old-boys/girls/nbs network, or any of the usual ethnic or socioeconomic cliques. Something that most systems have generally failed at.

Random assignments should help but aren't of themselves sufficient.

@kick @enkiv2 @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid human societies have hierarchies of prestige; we can't hope to eliminate those through incentive design. We can hope to prevent things like despotism, witch-burning, the Inquisition, the Holocaust, and the burning of the Library of Alexandria. But there's going to be an old-enbies network, unavoidably.

@kragen That's the Iron Rule of Oligarchy, so, yeah.

But we don't have to help them along any. And if we can figure out negative-feedback mechanisms to retard the process, so much the better.

@kick @enkiv2 @freakazoid

@dredmorbius @kragen @kick @enkiv2 @freakazoid
Stafford Beer had some ideas about ways to rotate people through groups in such a way that ideas echo through a network. Based on graph theory & permutation. I've forgotten the name. Worth looking into as a way to grow/integrate folks into a large group by making connection in a smaller one & getting mirroring/feedback.

@dredmorbius @kragen @enkiv2 @freakazoid How much privacy are you willing to sacrifice with this?

Taking a single possibility (I listed a few) from a thing I wrote to a couple of posts up-thread but didn’t send because I want to hear someone’s opinion on a sub-problem of one of the guesses listed:

Seed with trusted users (i.e. people submitting sites to crawl), rank preferentially by age (time-limited; would eventually wear off), then rank on access-by-unique-users. Given that centralized link aggregators wouldn’t disappear, someone throws HN in, for example, the links on HN get added into the pool, whichever get clicked on most rise up, eventually get their own ranking, etc.

This works especially well if using what I sent the e-mail to inquire a little more about: cluster sorting rather than just barebacking text (this is what Yippy does, for example, and what Blekko used to do), because it promotes niche results better than Google’s model with smaller datasets, and when users have more seamless access to better niches, more sites can get rep easier. Example: try https://yippy.com/search?query=dredmorbius vs. throwing your username into Google. The clustering allows for much more informative/interesting results, I think, especially if doing inquisitive searching.

Kragen mentioned randomly introducing newcomers (adding noise), but I think it might work better still if noise was added to the searches for at least the beginning of it. A single previously-unclicked link on the first five pages of search results?
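One way to read the scheme above as code, assuming a score made of unique-visitor counts plus a newcomer boost that decays to zero, and a helper that slips one previously-unclicked link into a result list. The field names, the 90-day window, and the boost size are all hypothetical, and reading "rank preferentially by age" as a boost for recently seeded sites is my interpretation:

```python
import random

SECONDS_PER_DAY = 86400


def score(site, now, boost_days=90, boost_size=100.0):
    """Score a site from unique visitors plus a time-limited seeding boost.

    `site` is a dict with 'unique_users', 'added' (epoch seconds), and
    'seeded_by_trusted' (bool). The boost falls linearly to zero.
    """
    age_days = (now - site["added"]) / SECONDS_PER_DAY
    remaining = max(0.0, 1.0 - age_days / boost_days)
    boost = boost_size * remaining if site["seeded_by_trusted"] else 0.0
    return site["unique_users"] + boost


def inject_unclicked(ranked, unclicked, position, rng):
    """Return a copy of `ranked` with one random unclicked link inserted."""
    if not unclicked:
        return list(ranked)
    pick = rng.choice(sorted(unclicked))
    result = [r for r in ranked if r != pick]
    result.insert(min(position, len(result)), pick)
    return result
```

A freshly seeded site starts with the full boost and has to earn real visitors before it wears off, while the injected link gives unseen pages a chance to collect their first clicks.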

@kick As little as possible.

I've not participated online under my real name (or even vague approximations of it) for a decade or more. That was seeming increasingly unattractive to me already then. And I'd been online for at least two decades by that point.

Of the various dimensions of trust, anti-sock-puppetry is one axis. It's not the only one. It matters a lot in some contexts. Less in others.

Doxxing may be occasionally warranted.

Unmasking is a risk.

@enkiv2 @kragen @freakazoid

@dredmorbius @enkiv2 @kragen @freakazoid Privacy isn't just deanonymizing! You can also track pseudonyms.

@kick Right. My comments were aimed more at qualifying my interest in / preferences for privacy.

I'm finding contemporary society to be very nearly intolerable. And probably ultimately quite dangerous.

@enkiv2 @kragen @freakazoid

@dredmorbius @kick @enkiv2 @freakazoid yeah, although in many ways it's an improvement over Golden Horde society, Ivan the Terrible society, Third Crusade society, Diocletian society, Qin Er Shi society, Battle of the Bulge society, Khmer Rouge society, Holodomor society, People's Temple society, the society that launched the Amistad, etc. We didn't start the fire.

@kragen I'm referencing specifically the surveillance aspects, and the accelerating pace of that especially over the past two decades or so. Though you can trace the trends back to the 1970s, generally.

Paul Baran was writing of the risks ~1966-1968, which is 52-54 years ago now.

IBM were actively demonstrating the risks 1939-1945.

Herbert Simon was conveniently ignorant of this in 1978, when Zuboff discovered surveillance capitalism in her research.

@kick @enkiv2 @freakazoid

mastodon.cloud