File Corruption

 I am thinking about archiving all my files using zip and/or gzip. So, I read 
the three parts of the FAQ.

 The way I understand it, most data corruption occurs during the 
transfer/copying of data (e.g. binary<->ASCII conversion), and not so much 
because of physical damage to the media or other causes.

 Still there are certain points that aren't totally clear to me:

 1.) What are the (unintended) causes of data corruption, and how can they be 
avoided? Which ones are physical and which ones are "logical", and how do the 
two types relate?
 There is something I don't quite understand here: why is data corruption 
during transfers so common if modems themselves encode, decode and validate 
the data chunk by chunk while transferring it?

 2.) Are there ways/strategies to maximize the recovery of corrupted zipped data 
and/or minimize the corruption based on the zip file format?

 3.) Does keeping the data in a DB help?

 4.) How often do the different types of data corruption happen?

 5.) Is it safe to let people upload a file via the browser's <input type="file" ... />?
 Or do you have to compute signatures before and after the transfer and compare 
them using some extension of this protocol? (I could imagine something like this 
should exist already) 

 6.) And in general about file corruption: I personally know of cases in which, 
even when people bought certain software from the company (say Microsoft), the copy 
they got (through the mail ...) was "different". Why don't Microsoft and the other 
big companies out there publish the signatures (using different cipher algorithms) of 
all the files in their suites on their pages?

 7.) Where are "best practices" to be found? Any good links/white papers/books on 
the subject? 

 Thanks for your input

lbrtchx (180)
5/31/2003 3:54:01 PM
comp.compression


On Tue, 03 Jun 2003 01:10:25 +0200, Juergen Fenn <juergen.fenn@gmx.de> wrote:

>I have never had problems with file corruption using gzip. My free
>version of Power Archiver 6.11.0, however, does not deal properly with
>large files / directories (I do not know whether this bug has been
>fixed by now as I do not update PA any more since it has become

 I used to use Power Archiver too (because it was free and good), but
now use 7-zip (http://www.7-zip.org/) - it's free (though you can help
out the author with a donation if you choose). The interface is quite
different, but it's a lot faster (it opens in seconds what Power
Archiver would take a minute to open), and it doesn't barf on big
files/dirs either. Doesn't handle spanning yet (minor complaint), but
I don't remember if Power Archiver did that anyway...

>So if you want to prevent your archives from corrupting you should
>choose and test your archiving software properly. BTW, I do not
>compress data I _really_ rely on.

 Nor do I keep my backups in the same building as the computer..

 The thing with compressed vs. uncompressed data is that if the middle of an
uncompressed file gets corrupted, you can usually still use the
data around the corruption (text is the obvious example).
 But if the middle of a compressed file is corrupted, you either a)
lose the part of the file that is corrupted and all the data
afterwards, or b) lose the whole file.
 I don't know of general-purpose compression algorithms that cope well
with a small amount of corruption, though I have seen a few JPEGs with
some corruption that seem to "recover" afterwards.. (maybe it's a
coincidence that the Huffman codes sync up).
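This is easy to demonstrate: flip a single byte in the middle of a DEFLATE stream and decompression of everything from that point on is lost. A small sketch using Python's zlib (the sample text is arbitrary):

```python
import zlib

# Some highly compressible sample data.
original = b"The quick brown fox jumps over the lazy dog. " * 200
compressed = bytearray(zlib.compress(original))

# Corrupt a single byte in the middle of the compressed stream.
compressed[len(compressed) // 2] ^= 0xFF

try:
    result = zlib.decompress(bytes(compressed))
    # Even if decoding happens to continue, the output no longer
    # matches the original data.
    print("decoded, but intact:", result == original)
except zlib.error as e:
    # The usual outcome: an invalid Huffman code or a failed
    # Adler-32 check aborts decompression entirely.
    print("decompression failed:", e)
```

With an uncompressed text file the same one-byte flip would garble a single character; here it typically destroys the rest of the archive.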

Errol Smith
errol <at> ros (dot) com [period] au
6/4/2003 3:26:57 AM