The Czech Central Social Institution in Prague has 18 blocks of file cabinets, each 25 high and 20 across, with every cabinet 3 metres deep: 9,000 cabinets totalling 27 km of linear storage.

What's that in bytes?

A typewritten page, double-spaced, at 10 cpi, has about 2 KB of characters. You can fit about 150 pages/in (60 pages/cm). At a 2/3 fill factor, the entire room is about 210 GB of storage. At capacity, nearer 320 GB.
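A quick sketch of that arithmetic, using only the round figures quoted in this thread (the 2 KB per page and 60 pages/cm are estimates, not measurements). With these inputs it comes out at roughly 216 GB at a 2/3 fill and roughly 324 GB at capacity.

```python
# Back-of-envelope estimate from the figures quoted in the thread.
BLOCKS = 18
CABINETS_PER_BLOCK = 25 * 20   # 25 high, 20 across
DEPTH_M = 3                    # each cabinet is 3 metres deep
PAGES_PER_CM = 60              # about 150 pages per inch
BYTES_PER_PAGE = 2_000         # ~2 KB per double-spaced typewritten page
FILL_FACTOR = 2 / 3            # assumed share of shelf space actually in use

cabinets = BLOCKS * CABINETS_PER_BLOCK
linear_cm = cabinets * DEPTH_M * 100
pages_at_capacity = linear_cm * PAGES_PER_CM

capacity_gb = pages_at_capacity * BYTES_PER_PAGE / 1e9
filled_gb = capacity_gb * FILL_FACTOR

print(f"{cabinets:,} cabinets, {linear_cm / 100_000:.0f} km of files")
print(f"~{filled_gb:.0f} GB at 2/3 full, ~{capacity_gb:.0f} GB at capacity")
```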

There are 1 TB microSD cards.

You can hide this building under your pinkie nail.

It's also worth mentioning that the *effective* storage capacity of most paper-based bureaucratic systems is vastly lower than the "characters on a typewritten page" metric, as the records tend to consist of *forms*, where much of the space is occupied by *redundant field descriptions and instructions*.

A given form might typically capture only 20-40 fields, and at least some of these (name, date of birth, ID number, date) are repeated across multiple forms.

Again: digital storage is EFFICIENT.

@dredmorbius Unfortunately, just because paper is used inefficiently it doesn't mean that digital storage is efficient.

One would likely store PDFs containing these texts, probably scanned from the paper originals and therefore actually stored as pictures. That's not efficient at all considering that one could theoretically store only the compressed text and use the space much more efficiently.

@phel So ... I'm aware of this.

I've spent quite some time doing data conversions (it pays ... relatively well, and remains a stubbornly persistent need). I've also worked with numerous aspects and processes of converting printed / textual information to digital formats. There is a range of conversions possible, of varying types, and they offer differing sets of capabilities.

Straight visual scans DO take up more space, but not exceptionally much. A well-compressed 600 dpi jpg ...

@phel ... stores about a book's worth of content (~250 pages or so) in ~50 MB. A fully rendered PDF (starting from text rather than graphical input) about 1/10 that. And even several TB of data would fit within a fraction of a single cabinet of the original data store.
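As a rough check on the scanned-image case, here is a sketch that scales the ~50 MB per ~250-page book figure up to the whole archive. The page count assumes the 27 km of shelving, 60 pages/cm, and 2/3 fill factor quoted at the top of the thread; none of these are measured values.

```python
import math

# Pages in the archive at a 2/3 fill: 27 km of shelving at 60 pages/cm.
LINEAR_CM = 2_700_000
PAGES = LINEAR_CM * 60 * 2 // 3

# Thread figures: ~50 MB for a ~250-page book of compressed 600 dpi scans.
SCAN_BYTES_PER_PAGE = 50_000_000 // 250
# A text-based PDF is quoted as about 1/10 of the scanned size.
PDF_BYTES_PER_PAGE = SCAN_BYTES_PER_PAGE // 10

scanned_tb = PAGES * SCAN_BYTES_PER_PAGE / 1e12
pdf_tb = PAGES * PDF_BYTES_PER_PAGE / 1e12
cards_needed = math.ceil(scanned_tb)  # 1 TB microSD cards

print(f"{PAGES:,} pages")
print(f"scanned images: ~{scanned_tb:.1f} TB ({cards_needed} x 1 TB cards)")
print(f"text-based PDF: ~{pdf_tb:.2f} TB")
```

Under those assumptions even the all-scans case stays in the low tens of terabytes, which is a handful of cards rather than a building.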

The process of converting from paper to digital formats is an interesting one, and again, is one I've spent considerable time (professionally) working with.

@phel Much of the apparent irrationality of IBM mainframe hierarchical data formats, with a format prefix followed by a data record, makes sense when you realise that this is the *digital* consequence of converting *a paper-based file* of data. Each record is effectively one form of the original file.

Given that the original paper records were somewhat ad hoc, the resulting digital record is similar.
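To make the "format prefix followed by a data record" idea concrete, here is a minimal sketch. The two-character type codes, field names, widths, and sample lines are all invented for illustration; this is not any actual IBM format, just the general shape of one file holding several "forms".

```python
# Hypothetical fixed-width layouts, keyed by a two-character record-type
# prefix. Each layout plays the role of one paper form in the original file.
LAYOUTS = {
    "01": [("person_id", 10), ("name", 20), ("birth_date", 8)],
    "02": [("person_id", 10), ("employer", 20), ("start_date", 8)],
}

def parse_record(line):
    """Split one line into named fields, using its type prefix to pick a layout."""
    record_type, body = line[:2], line[2:]
    fields = {"record_type": record_type}
    offset = 0
    for name, width in LAYOUTS[record_type]:
        fields[name] = body[offset:offset + width].strip()
        offset += width
    return fields

# Made-up sample data: two different "forms" filed for the same person.
lines = [
    "01" + "0000012345" + "EXAMPLE PERSON".ljust(20) + "19230514",
    "02" + "0000012345" + "EXAMPLE EMPLOYER".ljust(20) + "19480901",
]
records = [parse_record(line) for line in lines]
for record in records:
    print(record)
```

Note how `person_id` has to be repeated on every record, much as name and ID number are repeated on every paper form.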

That's typical of business processes digitised in the 1960s and 1970s.

Today is different.

@phel And I'd argue in many ways worse: you end up with straight-to-RDBMS structures (or other digital equivalents: XML, JSON, tagged-data formats, key-value data stores, NoSQL, etc.) created entirely by programming teams, often with little or no underlying business knowledge, and evolving across multiple iterations, expansions, and adaptations of projects.

The old stuff was kludgy but generally consistent over time. The newer stuff is kludgy and inconsistent.

@phel But still:

- Digitised formats *can* be far more rational than printed.
- They occupy phenomenally less space.
- They've got vastly lower access, update, and change times.
- They've got vastly higher transaction and transfer rates.
- They can be updated, rewritten, and/or deleted quickly and completely (at least within a given store).

Whether these are good or bad features depends on who's using them and to what ends. "The medium is the message" applies.

@dredmorbius I suspect that some of the contents might include photographs attached, so that would push it back up

@penguin42 Possibly.

Though photographs are surprisingly _not_ useful in most bureaucratic paper-based archives.

Their nature is interesting: similar size to paper records, but vastly higher bit density, relative to area.

Though there are also compact descriptive methods -- railroad ticket punches encode descriptions of passengers to discourage ticket re-use.

@dredmorbius Impressive! But in contrast to MicroSD cards, you do not need any complex tools to access these files.

@phel Other than the cavernous, climate-controlled warehouse, trained staff, and gantry-based desks, no ;-)

(Though yes, I'd made a similar point myself a couple of days ago re: digital vs. print data: news.ycombinator.com/item?id=2)

@dredmorbius @phel Why not use the building to store tapes or some other suitable persistent store? Much of the infrastructure could be re-used for tape robots. :)

@Steinar Actually, not so much.

Tape libraries tend to resemble server cabinets / aisles, and require dedicated datacentres, power, cooling, AC, and (possibly most crucial here): dust filtration.

An 80-year-old, Communist-era concrete block construction would almost certainly be a _horrible_ siting for such a library.

Not to mention that the scale is completely excessive. I don't know what current digital requirements are, but it's probably a few DC aisles *at most*. Not this cave.

@phel

@Steinar Here's a (2013 -- so seven years old) 27 PB tape library installation.

That's 100,000 times the storage of the paper-file system pictured above in this thread.

invidio.us/watch?v=1yUZ81dCqBg
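The ratio is easy to check against the ~210 GB estimate for the paper archive from earlier in the thread (itself a rough figure):

```python
TAPE_LIBRARY_BYTES = 27e15    # 27 PB tape library
PAPER_ARCHIVE_BYTES = 210e9   # ~210 GB estimate for the paper files

ratio = TAPE_LIBRARY_BYTES / PAPER_ARCHIVE_BYTES
print(f"about {ratio:,.0f} times the paper archive")
```

That lands a little above 100,000, so the order of magnitude holds.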

@phel
