Digital Preservation
Most University employees create
or receive digital documents such as email messages, PDF reports, Excel
spreadsheets, PowerPoint presentations, jpeg images, and Word files. These
digital documents are used to conduct the University's work; this makes them
University records. Many digital records do not need to be stored after the
purpose for which they were created has been accomplished. However, some
digital records need to be kept longer for program reporting, as precedents for
future use, or to satisfy policy or regulatory requirements. Some digital
records need to be kept permanently because they provide important information
about University activities. Permanent digital records should be transferred to
the University Archives.
Whether trustworthy information about Case Western Reserve University's
development will be available in five years or one hundred years depends on
how digital records are managed today, before they are transferred to the
Archives.
This list of Frequently Asked Questions is intended for
those in the Case Western Reserve University community who want to prolong the
life of digital information at work and at home.
What is the problem?
What can I do to minimize the risks of information loss?
What is a preservation strategy?
What are storage media?
What do I need to keep in mind when choosing a storage medium?
Where can I store files so that they will last indefinitely?
If nothing lasts indefinitely, where do I put files?
Why do you recommend hard drives?
Why do you recommend CD-R?
What about CD-RW?
What about magnetic media?
What about DVDs or flash drives?
How long do CD-Rs last?
How do I choose a brand of CD-R?
What can I do to make my CDs last longer?
How should I store my CDs?
How should I handle my CDs?
How should I label CDs?
What about adhesive labels?
Oops, I already put adhesive labels on CDs. How should I take them off?
How can I clean CDs?
I always thought the bottom of a CD was the side to be careful of. Why worry about the top?
Any other storage media tips?
As long as the storage media are not deteriorating, I'll be able to get files off them, right?
Ok. So as long as I still have old hardware around, I can use the files?
What are file formats?
How do I identify a file format?
What do I need to keep in mind when choosing a file format?
What file formats should I use for the long term?
Should I compress digital files that I want to keep long term?
Should I encrypt digital files that I want to keep long term?
What is safe computing?
I know backing up my computer is important, but I have no idea where to start. What should I do?
What can I do to protect information from hackers and viruses?
Any other safe computing tips?
What will happen if I don't do anything?
Overview
What is the problem?
Digital records are powerful
because of the ease with which they can be created and modified, distributed
and copied, stored and retrieved. But digital records are fragile because they
depend on many layers of technology to be rendered in ways that humans
can see or hear. Encoding formats, application and operating system software,
and storage media all change rapidly and on different cycles. You may have software
that can open the file, but the file is on a disc for which your computer
doesn't have a drive. You may have upgraded your operating system but the
application publisher has stopped supporting your platform and the old version
of the application won't run on the newer operating system. The farther in the
future you need to keep digital records, the greater the chance of these
incompatibilities making those records inaccessible.
If you box up your paper records and put the box in the back of the
closet, then barring a disaster like fire or flood, those records will still
be readable when you retire in twenty years and your successor pulls the box
out of the closet. We call this benign neglect. Benign neglect isn't the
best approach to paper preservation, but it isn't a big threat.
Applying the benign neglect approach to digital preservation
is almost a guarantee that your digital information won't be readable in the
future. Until inexpensive and easy-to-implement technical solutions to digital
obsolescence are developed, digital preservation will require a continuous
program of monitoring and migration -- which means transferring digital records
to each new generation of technology.
What can I do to minimize the risks of information loss?
Develop a preservation strategy.
What is a preservation strategy?
A preservation strategy is a plan for keeping records, and
the information contained in those records, usable for as long as they are
needed. A good preservation strategy includes smart selection of storage media
and file formats, migration of files to new formats, and following safe
computing practices.
Storage Media
What are storage media?
Storage media are the physical objects, such as hard drives,
CDs, or floppy disks, that hold information. Storage media are where information is stored, and are not to be confused
with file formats, such as jpeg or PDF, which have to do with how information is stored.
What do I need to keep in mind when choosing a storage
medium?
> Convenience: how much information can each piece hold? It's easier to
store and take care of fewer items.
> Efficiency: how easy and fast is it to copy to and from the medium?
> How widespread is it? The more people who are using it, the more likely it
will be to stay around for a while and the easier it will be to get
equipment to read it.
> How much does it cost? What is the cost of the equipment needed to read it?
> How long will the information on the medium be in good enough shape to be
read?
Where can I store files so that they will last indefinitely?
There is currently no digital storage medium that can be
expected to last indefinitely. How long different storage media will last
depends on many factors such as how they are made, what they are made of, and
how they are stored. Manufacturers' claims for the life spans of their products
are not independently verified, and manufacturing processes for storage media
are not standardized. In addition, many of these media have not been around
long enough for people to really know what may happen to them after ten,
fifteen, or fifty years.
If nothing lasts indefinitely, where do I put files?
In an office or a home situation, where professionally
managed file servers are not used, we recommend two things. Your safest bet,
provided safe computing practices are followed, is a hard drive. For removable
media, we recommend Recordable Compact Discs (CD-R).
Why do you recommend hard drives?
Items on hard drives are less likely than removable media to
be forgotten. Hard drives are very reliable, as long as safe computing
practices are followed. On the downside, large volumes (hundreds of gigabytes)
of material magnify the risk of something going wrong, and, of course, it's
essential that you back up your data.
Why do you recommend CD-R?
For removable media, we recommend
Recordable Compact Discs, also known as CD-R (R stands for Recordable). These
are the discs that you purchase blank and then burn your own information to,
and once it's burned, it can't be changed. CD-Rs from reputable manufacturers,
if handled properly, will probably outlive the hardware and software necessary
to read them. This means that the CDs themselves will physically still be
intact when your CD drive has been replaced by whatever is going to replace CD
drives.
What about CD-RW?
Another type of CD is the rewritable disc, or CD-RW (RW stands for
ReWritable). These can be written, erased, and rewritten, like the older
floppy disks. The technology that makes these discs rewritable also makes
them susceptible to damage from the environment, particularly exposure to
light. In addition, information on CD-RWs is less secure because it can be
changed or rewritten. Therefore, CD-RWs are not recommended for long-term
storage.
What about magnetic media?
Flexible magnetic disks, which include 3.5-inch diskettes
and Zip disks, are considered to have a lifespan of five years or less and
should not be used for long-term storage.
What about DVDs or flash drives?
DVDs and solid-state media, such as flash drives, haven't
been around long enough to develop a track record, and cannot be recommended
for long-term storage.
How long do CD-Rs last?
This depends on many factors such as how they are made, what
they are made of, and how they are stored. Manufacturers' claims for the life
spans of their products are not independently verified, and manufacturing
processes for CDs are not standardized. CD-Rs from reputable manufacturers, if
handled properly, will probably outlive the hardware and software necessary to
read them. This means that the CDs themselves will physically still be intact
when your CD drive has been replaced by whatever is going to replace CD drives.
How do I choose a brand of CD-R?
If you can, use a disc brand recommended by the manufacturer
of your recorder to decrease the likelihood of errors during burning. According
to research done in 2003, CD-Rs that use
a gold metal reflective layer and phthalocyanine (THAL-o-CY-a-neen) -based dyes
(so-called gold/gold discs) have the greatest life span. The gold/gold discs
are more expensive than others, but they seem to be more stable. Be careful:
a label that says "gold" or "silver", or a disc that looks gold or silver,
does not guarantee that the product's metal layer is actually gold or silver.
What can I do to make my CDs last longer?
Store and handle them properly.
How should I store my CDs?
CDs should be stored in a stable environment with
temperatures between 40° F and 68° F (about 4° C to 20° C) and relative
humidity between 20% and 50%. In an office environment, store them away from
water lines and out of direct sunlight. In your home, the main things to
remember are not to store them in your hot attic or your damp basement, in
direct sunlight, or over a radiator. A dust-free and smoke-free environment
is helpful, and you want to keep your CDs away from food and liquids. It is
best to store the discs in rigid jewel cases because they give greater
physical protection than paper sleeves. The jewel cases should be stored
vertically, like a book.
How should I handle my CDs?
> Handle them as little as possible.
> Put them back in their jewel cases when not in use.
> Handle them by the edges or the center hole.
> Don't touch the top or the bottom.
> Don't bend the discs: remove them from their jewel cases by pressing down
on the hub of the case while holding the outer edge of the disc and lifting.
How should I label CDs?
Don't use sharp or pointed writing implements, because these
can scratch the thin lacquer and metal layers on top of CDs. Similarly, the
chemicals in some markers can migrate into the protective layer and damage it.
What you should do is mark the center part of the label side (the part you can
see through around the hole) using a soft-tip marker with water-based
permanent ink. And no, Sharpies are not water-based. Anything that has a
strong odor probably isn't water-based.
What about adhesive labels?
You definitely don't want to use adhesive labels. The weight
of the label can upset the balance of the disc during use, and the adhesives
can damage the top layers of the disc.
Oops, I already put adhesive labels on CDs. How should I
take them off?
Don't try to remove labels that are already on CDs! Pulling
off the label can damage the top layer of the disc, and if you can't get the
whole label off, you could end up with even more balance problems than you had
with the label on.
How can I clean CDs?
You should only clean CDs when absolutely necessary. Clean
only the non-label side of CDs, and wipe from center to edge, not in a spiral,
with a lint-free cloth. If you absolutely have to use something stronger to
clean them, use a little isopropyl alcohol (rubbing alcohol).
I always thought the bottom of a CD was the side to be careful of. Why worry
about the top?
Recordable CDs are made up of several layers. They have a clear plastic base
on the bottom. The laser has to read through this layer, so scratches, dirt,
and smudges on the bottom of the disc can all interfere with retrieving data.
The other side, often called the label side, has a metal layer covered by a
thin coating of lacquer. In between the clear plastic layer on the bottom and
the reflective metal layer on the top is the layer where the information is
actually recorded.
The laser shines up through the clear bottom layer and is reflected back, and
that's how the information is read. If the metal layer is scratched or there
are holes in it, the laser passes through instead of being reflected, and the
information can't be read.
Any other storage media tips?
> You want your CDs to be as fresh as possible, so don't stockpile them. Buy
them as you need them, open them just before use, and check the disc
surface before recording to make sure it looks ok. Then spot-check the
data after recording.
> Once a year, look at the discs to check for visible signs of damage or
deterioration, and at the same time, you should also check a sample for
readability.
> Copy the files onto newer, fresher storage media. Storage media deteriorate
as they age.
> Don't record at the maximum speed. Slower recording may take a few minutes
longer, but it will reduce the chance of introducing errors into your data.
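One way to spot-check data after recording, or to check a sample for readability each year, is to compare a checksum of the copy on the disc with a checksum of the original file. The following is only a minimal sketch in Python: the function names are made up for this illustration, and SHA-256 is our choice of checksum here, not something the tips above require.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in chunks so that large files do not have to fit
    in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches(original, copy):
    """Return True if the two files have identical contents (same checksum)."""
    return sha256_of(original) == sha256_of(copy)
```

If you write down the checksum when you burn the disc, you can recompute it during a yearly check even if the original file is no longer on your hard drive. A file that can no longer be read at all will raise an error instead of returning a checksum, which is also a sign that it is time to copy the disc.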
As long as the storage media are not deteriorating, I'll
be able to get files off them, right?
Not necessarily. You can't read floppy disks with a CD
drive. Monitor the marketplace. As newer storage media and drives replace
current technology, copy information to the newer media. Don't wait until
current technology has become obsolete. But don't rush to embrace each new
technology as soon as it appears. Wait until the new media have an established
presence.
Ok. So as long as I still have old hardware around, I can
use the files?
Not necessarily. File formats are also subject to rapid obsolescence
and evolution. If files are in an
old file format and you no longer have the software that created them, you may
not be able to open the files. If you can open them, they might not display as
intended. It is important to choose file formats with care to prevent problems
in the future.
File Formats
What are file formats?
File formats are how information
is stored, and should not be confused with storage media, such as hard drives,
CDs, or floppy disks, where
information is stored.
How do I identify a file format?
The extension at the end of the file name is a clue to the
format. Examples include .pdf, .jpg, .xls.
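Because an extension is only a clue (a file can be renamed), it can help to look at the first few bytes of the file as well, since many formats begin with a recognizable signature. The sketch below is illustrative only: the function names and the short signature table are our own, and the table covers just a handful of common formats.

```python
from pathlib import Path

# Opening bytes ("signatures") of a few common formats.
# This short table is only an illustration, not a complete registry.
SIGNATURES = {
    b"%PDF": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def extension_of(path):
    """Return the lower-cased extension, such as '.pdf' ('' if there is none)."""
    return Path(path).suffix.lower()


def sniff_format(path):
    """Guess the format from the file's first bytes; return None if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    for signature, name in SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None
```

If the extension and the signature disagree, treat the file with some suspicion: it may have been renamed or damaged.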
What do I need to keep in mind when choosing a file
format?
> Is it proprietary or open? Proprietary file formats, such as Word and
Excel, are developed by software companies, such as Microsoft, to encode
data produced - and read - by their software. In contrast, open file
formats, such as ASCII or RTF, can be supported by multiple software
applications on different platforms. Chances for loss are increased if the
information is locked into a proprietary format. What if the software to
read the format is no longer available, does not have backward
compatibility, or is simply not supported? Open formats provide a better
chance of being supported since multiple software applications can read them.
> Is it well documented? If it's proprietary and the company that owns it
goes out of business, will there be documentation around to help you get at
your information?
> How long has the format been around? You might not want to pick a file
format that is brand new.
> Has it been widely adopted? If you do use a proprietary file format, choose
one that is very popular, such as PDF. This will increase the likelihood
that it will be around for a while.
> Is it usable on different hardware and software platforms?
> Does it have a migration path and backward compatibility, such as Word's
ability to open version 5.1 files and save them in Word 2004 format?
What file formats should I use for the long term?
If you need to remove digital files from active use, here
are some suggested formats:
> Store text as ASCII (American Standard Code for Information Interchange),
RTF (Rich Text Format: Microsoft Word can save files as RTF) or PDF
(Portable Document Format).
> Store databases and spreadsheets as comma-delimited or tab-delimited ASCII
text or XML (many common proprietary database and spreadsheet applications
can export data into these formats).
> Store PowerPoint presentations as GIF or PDF.
> Store images as uncompressed TIFF.
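To show what comma-delimited text looks like in practice, here is a small sketch that writes rows of data to a comma-delimited file and reads them back using Python's standard csv module. The function names and the choice of UTF-8 encoding are assumptions made for this example (plain ASCII text is also valid UTF-8).

```python
import csv


def save_as_csv(rows, path):
    """Write rows (lists of values) to a comma-delimited text file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_csv(path):
    """Read a comma-delimited text file back into a list of rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

Note that a delimited text file keeps only the values. Formulas, formatting, and charts from the original spreadsheet are not carried over, and every value is read back as text.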
Should I compress digital files that I want to keep long
term?
Compression adds complexity to long-term preservation. Some
compression techniques are "lossy": they shrink files by discarding
information considered "redundant". As an example, JPEG removes
information to reduce file size. The image might look fine on your
current monitor, but as monitors improve, the lower quality of the
image will become more obvious.
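The difference between compression that keeps everything and compression that discards information can be seen in a few lines of Python. This is a toy illustration only: zlib stands in for lossless compression, and the rounding function is a made-up stand-in for a lossy method. It is not how JPEG actually works.

```python
import zlib


def lossless_round_trip(data):
    """Compress and decompress with zlib; the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(data, step=16):
    """Toy lossy scheme: round every byte down to a multiple of `step`.

    The fine detail that is thrown away cannot be recovered afterwards.
    """
    return bytes((value // step) * step for value in data)
```

With lossless compression you get back exactly what you put in, provided software that understands the compression method is still available. With a lossy method the discarded detail is gone for good, no matter what software you use later.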
Should I encrypt digital files that I want to keep long
term?
Encryption is used to make sure that information is read
and/or used only by authorized users. Like compression, encryption adds
complexity to long-term preservation. Encryption algorithms change over time
and backwards compatibility should not be assumed.
Safe Computing
What is safe computing?
There are a number of regular practices you should adopt to minimize the likelihood of short-term
information loss.
I know backing up my computer is important, but I have no
idea where to start. What should I do?
> Write data to different types of storage media. Not all storage media are
manufactured the same, even within the same brand (such as CDs). If you
can, use two different types of media, such as a hard drive and a CD, or
two different manufacturers or batches of CDs.
> Write copies with different software. This will protect against corruption
from malfunctions, viruses, or bugs.
> Store backed-up files off-site. What's the point of having back-ups if a
disaster strikes your home or office, and your back-ups are destroyed
along with your computer?
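If you are comfortable with a little scripting, the idea of keeping copies in more than one place can be sketched as follows. This is an illustration under our own assumptions, not a complete backup system: the function name is hypothetical, and the destination folders stand for wherever your second drive or other storage is mounted.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source, destinations):
    """Copy one file into each destination folder and verify every copy.

    Returns the list of copies made. Raises OSError if a copy does not
    match the original byte for byte.
    """
    source = Path(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps the file's timestamps
        if not filecmp.cmp(source, target, shallow=False):
            raise OSError(f"Backup copy does not match the original: {target}")
        copies.append(target)
    return copies
```

A script like this only helps if the destinations really are separate devices, and at least one copy still needs to end up off-site.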
What can I do to protect information from hackers and
viruses?
> Install firewalls. We are protected at the University; however, when you
use a high-speed connection such as cable modem or DSL, your computer is
connected to the Internet as long as it is on, not just when it is being
used.
> Use "safe" passwords. Use combinations of numbers and letters, as well as
capitalization. Avoid using dictionary words, the name of the computer, or
your account name. Change passwords frequently.
> Keep security patches current.
> Never open any unrequested or unidentifiable files you receive as email
attachments until you know what they contain, even if the message appears
to have been sent by someone you know and trust. Turn off features in email
software that automatically open attachments.
> Computer viruses can destroy or corrupt data on computers they infect.
Always use virus detection and removal software. Update virus definitions
and scan your computer frequently. Better yet, set preferences to
automatically update.
Any other safe computing tips?
> How many times have you accidentally altered a file? Or thrown it away and
then emptied the trash? Lock those important files to prevent accidental
alteration or destruction.
> Use surge protectors to guard against electrical spikes, lightning, and
other power surges. A surge protector will not keep your computer running
during a power outage; an uninterruptible power supply (UPS) is needed for
that.
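Most operating systems let you lock a file from its Properties or Get Info window. The same thing can be done from a script by removing the file's write permission. The sketch below is one possible way to do it in Python; the function names are ours, and on Windows this only toggles the read-only flag.

```python
import os
import stat

# Permission bits that allow the owner, group, or others to write to a file.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_file(path):
    """Make a file read-only by clearing all of its write-permission bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~WRITE_BITS)


def unlock_file(path):
    """Give the owner write permission again."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWUSR)


def is_locked(path):
    """Return True if nobody has write permission on the file."""
    return (os.stat(path).st_mode & WRITE_BITS) == 0
```

Locking guards against accidents, not against deliberate changes: anyone who owns the file can unlock it again, and an administrator account can usually override the lock.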
What will happen if I don't do anything?
If basic steps are not taken, information will be lost and
the cost of recovering it could be very
high.
Additional Resources
ITS help desk's website <http://help.case.edu/safe/maintain/>
Ohio State University's website on Safe Computing at <http://safecomputing.osu.edu/safecomputing>
Wilhelm Imaging Research <http://www.wilhelm-research.com/> - a
good source for specific information about printers and dyes.
PADI (Preserving Access to Digital Information) <http://www.nla.gov.au/padi/index.html>
- a gateway to international digital preservation resources. Following one
of the trails is a good way to get started.
Image Permanence Institute, Rochester Institute of Technology
Consumer Guides <http://www.imagepermanenceinstitute.org/sub_pages/consguides.htm>
- a good source for traditional and digital photographic preservation.
Fred R. Byers, Care and Handling of CDs and DVDs--A Guide for Librarians and
Archivists <http://www.itl.nist.gov/div895/carefordisc/>
- technical and detailed, but an excellent source of information about CDs
and DVDs.
The
DVD Association and the National Institute of Standards and Technology are
working to establish a standard industry test to determine the archival quality
of recordable CDs and DVDs. The number of years used in the test standard will
be determined by responses to a 2-question survey available at <http://www.dvda.org/html/nist_survey.php>. Responses will be collected until May 31, 2005.