I am thinking about archiving all my files using zip and/or gzip, so I read the three parts of the FAQ. The way I understand it, most data corruption occurs during the transfer/copying of data (binary<->ASCII conversion), not so much because of physical corruption of the media or other causes. Still, there are certain points that aren't totally clear to me:

1.) What are the (unintended) causes of data corruption, and how can they be avoided? Which ones are physical and which ones are "logical", and how do the two types relate? One thing I don't quite understand here: why is data corruption during transfer so common if modems themselves checksum and validate the data chunk by chunk while transferring it?

2.) Are there ways/strategies to maximize the recovery of corrupted zipped data and/or minimize the corruption, based on the zip file format?

3.) Does keeping the data in a DB help?

4.) How often do the different types of data corruption happen?

5.) Is it safe to let people upload a file via the browser's <input type="file" ... /> feature? Or do you have to take signatures beforehand, then again afterwards, and compare them using an extension of this protocol? (I could imagine something like this already exists - see the sketch at the end of this post.)

6.) And in general about file corruption: I personally know of cases in which, even when people bought certain software from the company (say Microsoft), the copy they got (through the mail ...) was "different". Why don't Microsoft and other big companies out there publish the signatures (using different hash algorithms) of all the files in their suites on their pages?

7.) Where are "best practices" to be found? Any good links/white papers/books on the subject?

Thanks for your input
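Regarding 5.) and 6.), what I have in mind is roughly the following - a minimal sketch in Python, assuming plain digests rather than real cryptographic signatures (a real scheme would also sign the digest with a private key so it can't be forged along with the file; "before.zip" and "after.zip" are made-up names for the example):

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Hash a file in chunks so large archives never have to fit in RAM."""
    h = hashlib.sha256()  # MD5/SHA-1 are common too; SHA-256 is the safer pick
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

# Take the digest before the transfer and again after it; any mismatch
# means the file was corrupted (or tampered with) somewhere in between.
if file_digest("before.zip") == file_digest("after.zip"):
    print("digests match - transfer looks clean")
else:
    print("digest mismatch - corruption somewhere in transit")
```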
On Tue, 03 Jun 2003 01:10:25 +0200, Juergen Fenn <juergen.fenn@gmx.de> wrote:

>I have never had problems with file corruption using gzip. My free
>version of Power Archiver 6.11.0, however, does not deal properly with
>large files / directories (I do not know whether this bug has been
>fixed by now as I do not update PA any more since it has become
>shareware).

I used to use Power Archiver too (because it was free and good), but now use 7-Zip (http://www.7-zip.org/) - it's free (though you can help out the author with a donation if you choose). The interface is quite different, but it's a lot faster (it opens in seconds what Power Archiver would take a minute to open), and it doesn't barf on big files/dirs either. It doesn't handle spanning yet (a minor complaint), but I don't remember whether Power Archiver did that anyway...

>So if you want to prevent your archives from corrupting you should
>choose and test your archiving software properly. BTW, I do not
>compress data I _really_ rely on.

Nor do I keep my backups in the same building as the computer...

The thing with compressed vs. uncompressed is that if the middle of an uncompressed file gets corrupted, you can usually still use the data around the corruption (text is the obvious example). But if the middle of a compressed file is corrupted, you either a) lose the corrupted part of the file and all the data after it, or b) lose the whole file. I don't know of any general-purpose compression algorithms that cope well with a small amount of corruption, though I have seen a few JPEGs with some corruption that seem to "recover" afterwards... (maybe it's coincidence that the Huffman codes sync up again).

Errol Smith
errol <at> ros (dot) com [period] au
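To make the "lose everything after the corruption" point concrete, here's a small Python demonstration (my own illustration, not taken from any archiver): flip a single bit in the middle of a gzip stream and decompression fails from that point on, because deflate output depends on back-references to earlier bytes, and the CRC32 trailer catches the rare case where the damaged stream still happens to parse.

```python
import gzip
import zlib

# Build a compressible payload and gzip it.
original = b"The quick brown fox jumps over the lazy dog.\n" * 1000
packed = bytearray(gzip.compress(original))

# Flip one bit roughly in the middle of the compressed stream.
packed[len(packed) // 2] ^= 0x01

try:
    gzip.decompress(bytes(packed))
    print("astonishing: the damaged stream still decoded cleanly")
except (OSError, zlib.error, EOFError) as exc:
    # Typical outcome: either the deflate decoder gives up at the damaged
    # spot, or the CRC32 check at the end reports a mismatch - either way,
    # everything after the flipped bit is effectively lost.
    print("decompression failed:", exc)
```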