Saturday, August 15, 2009

Digital Archeology

Digital Archeology is the study of archaic data found in memory devices of earlier digital computers.  It is related to the fields of cryptography and computer forensics.

Consider this problem: you run a company and want to change the vendor of the software that holds your business accounting information. Inevitably, there will be some old data that you do not convert to the new system. Now suppose that, some years after the conversion, you need information from before the cutoff point, and that the company that supplied the original software has gone out of business. What do you do?

In case you don't think this is a realistic possibility, consider the IRS requirement that you not only maintain your accounting records in tangible paper form but also, if you keep electronic records, maintain the data in those records in machine-readable form for a considerable period into the past. The rationale for this is obvious.

Every day, computer storage devices fail. Sometimes the information on them is very valuable, and sometimes it has not been backed up properly. Or a backup was kept but, for one reason or another, has been destroyed. There are specialists who will extract information from a disc or other memory device and then reconstruct the original information in a meaningful way.
I think you could easily call this digital archeology.

Now imagine 100 years from now. Will we be able to "read" the data that we are producing today? Will we be able to see the images and video we are currently producing if they haven't been "converted" to newer and newer formats along the way?
What about .pdf files? Will those be easily readable?

Today, if we discover archival material that was produced hundreds or even a thousand years ago, we most likely have the skill to decipher it. Will that be the case in the future?
Alternatively, is there data out there that was produced 50 years ago and is, for all practical purposes, undecipherable today? Look at the Sony "Beta" format for video. I'm sure there are still functioning units around, but for how long will they be around?

Should we, as a society, be taking steps now that will ensure that digital data produced today and into the future will remain intelligible?

I'd like to hear from anyone who has something to say about this.
Please comment or contact me.


  1. I would imagine that the most common formats will be around for a very long time: jpg, pdf, doc, mp3, html, zip... It's the more obscure formats that will be lost. Something like your Palm Desktop contacts file will be unrecoverable without quite a bit of reverse engineering. You can still find a place to load a COBOL program.

  2. Yes, but what about earlier versions of some of these popular formats? For example, early .doc formats are not supported by many current products.