Inverting the Web 

@freakazoid Shifting ground (and jumping back up this stack -- we've sorted the URL/URI bit):

What you suggest that's interesting to me is the notion of _self-description_ or _self-identity_ as an inherent document characteristic.

(Where a "document" is any fixed bag'o'bits: text, audio, image, video, data, code, binary, etc.)

Not metadata (name, path, URI).

*Maybe* a hash, though that's fragile.

What is _constant_ across formats?

@freakazoid So, for example:

I find a scanned-in book at the Internet Archive, I re-type the document myself (probably with typos) to create a Markdown source, and then generate PDF, ePub, and HTML formats.

What's the constant across these?

How could I, preferably programmatically, identify these as being the same, or at least highly related, documents?

MD5 / SHA-512 checksums will identify _files_, but not _relations between them_.
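To make the checksum point concrete, here is a minimal Python sketch (the sample strings and the `normalise` helper are made up for illustration). Two renderings of the same sentence that differ only in line breaks and spacing get unrelated SHA-512 digests; normalising whitespace and case first makes them match, though that would not survive typos or translation.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-512 hex digest of the UTF-8 bytes of text."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def normalise(text: str) -> str:
    """Collapse whitespace runs and lowercase (an illustrative choice only)."""
    return " ".join(text.split()).lower()


# Hypothetical sample: same words, different whitespace.
original = "The archive holds a scanned copy.  It is an old book."
reflowed = "The archive holds a scanned copy.\nIt is an old book."

raw_match = digest(original) == digest(reflowed)
normalised_match = digest(normalise(original)) == digest(normalise(reflowed))
print(raw_match, normalised_match)
```

The digest says only "same bytes" or "different bytes"; it carries no notion of "nearly the same".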

Can those relations be internalised intrinsically?

@freakazoid Or do you always have to maintain some external correspondence index which tells you that SOURCE.PDF was the basis for RETYPED.MD, which then generated RETYPED.MD.ePub and RETYPED.MD.html, etc.?
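For comparison, the external-index option could be as small as this Python sketch. The `DERIVED_FROM` table, the "how" labels, and both helper functions are hypothetical; the filenames are the ones from the example above, and the sketch assumes the index contains no cycles.

```python
from typing import Dict, List, Tuple

# Hypothetical external index: derived file -> (source file, how it was derived).
DERIVED_FROM: Dict[str, Tuple[str, str]] = {
    "RETYPED.MD": ("SOURCE.PDF", "re-typed by hand"),
    "RETYPED.MD.ePub": ("RETYPED.MD", "generated"),
    "RETYPED.MD.html": ("RETYPED.MD", "generated"),
}


def lineage(name: str) -> List[str]:
    """Walk the index from a file back to its earliest known ancestor."""
    chain = [name]
    while chain[-1] in DERIVED_FROM:
        chain.append(DERIVED_FROM[chain[-1]][0])
    return chain


def related(a: str, b: str) -> bool:
    """Treat two files as related if their lineages share any entry."""
    return bool(set(lineage(a)) & set(lineage(b)))


print(lineage("RETYPED.MD.html"))
print(related("RETYPED.MD.ePub", "RETYPED.MD.html"))
```

The weakness is the one raised in the question: the relation lives entirely in the index, so a file that gets separated from it has no way to say what it is.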

Something that will work across printed, re-typed, error/noise, whitespace variants. Maybe translations or worse.

Word vectors? A Makefile audit? Merkle trees, somehow?
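As a much cruder stand-in for word vectors, here is a Python sketch of word-shingle overlap (Jaccard similarity over runs of three words). This is a swapped-in technique, not anything proposed in the thread; the sample sentences, the shingle size `k=3`, and the function names are all assumptions. It tolerates case, whitespace, and the occasional typo, and would do nothing for translations.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word runs in text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical samples: a source line, a re-typed copy with a typo and
# different whitespace, and an unrelated line.
source = "the archive holds a scanned copy of an old book about river navigation"
retyped = "The archive holds a scaned copy of  an old\nbook about river navigation"
unrelated = "completely different words appear in this other sentence here today"

print(jaccard(source, retyped), jaccard(source, unrelated))
```

A score like this gives "highly related" as a matter of degree, which a checksum cannot, but it still has to be computed by comparing pairs of documents rather than read off either one.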

@dredmorbius We have real world solutions for these problems in the form of notaries, court clerks, etc. I.e. (registered) witnesses. Trusted third parties, but they don't have to be a single party.

@dredmorbius In the RDF world I guess one doesn't sign the individual triple but the entire graph.

And it might make more sense to call these 4-tuples, because it's really "this person says that this object is related in this way to this other object".
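A rough Python sketch of the 4-tuple idea, with a digest taken over the whole set of statements rather than each one. Everything named here is hypothetical: the `Statement` fields, the `derivedFrom` predicate, and the sort-then-hash scheme, which is a simplification and not real RDF graph canonicalisation. The digest is the thing a witness would sign; the signing step itself is left out.

```python
import hashlib
from typing import Iterable, NamedTuple


class Statement(NamedTuple):
    """'This person says this object is related in this way to that object.'"""
    asserter: str
    subject: str
    predicate: str
    obj: str


def graph_digest(statements: Iterable[Statement]) -> str:
    """SHA-256 over the sorted, de-duplicated statements (order-independent)."""
    lines = sorted("\t".join(s) for s in set(statements))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Hypothetical assertions about the files from the earlier example.
graph = [
    Statement("dredmorbius", "RETYPED.MD", "derivedFrom", "SOURCE.PDF"),
    Statement("dredmorbius", "RETYPED.MD.html", "derivedFrom", "RETYPED.MD"),
]

print(graph_digest(graph) == graph_digest(list(reversed(graph))))
```

Because the asserter is part of each statement, two witnesses making the same claim produce two distinct statements, which fits the point about not needing a single trusted party.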

@freakazoid Sorry, what's a triple in this context?

I've run across ... N-triples in an RDF / metadata context (via Worldcat -- it's one of their record schemas).

@dredmorbius Sorry, I thought you had used the term triple, but you actually used the term relation. I'm talking about triples in the RDF sense, which are relations.

