gzrt-0.8/Makefile

all: gzrecover

gzrecover: gzrecover.c
	cc -Wall -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 gzrecover.c -lz -o gzrecover

clean:
	rm -f gzrecover

gzrt-0.8/ChangeLog

New for release 0.8 (2013-10-03)
  o Eliminate call to fsync(), resulting in 99% speed improvement
  o Add ability to read from the standard input stream and write to standard
    output for pipeline support.
  o Thanks to Alexey Yurchenko (ayurchen@gmail.com) for the above suggestions.
  o Fix incorrect errpos tracker (probably caused some of the core dumps people
    had reported)
  o Fix incorrect stream positions in verbose logging (they had not been
    updated after the program moved from mmap to a read buffer)
  o Also, move verbose logging from stdout to stderr
  o Misc error reporting updates

New for release 0.7 (2013-02-02)
  o Fix =/== confusion in read_internal error check (via Shawn Cokus
    (cokus@ucla.edu))

New for release 0.6 (2012-02-09)
  o Patches from Paul Wise (pabs@debian.org) for stability and memory leaks

New for release 0.5 (2006-08-29)
Including public domain contributions from Paul Wise
  o Modify Makefile to append CFLAGS and LDFLAGS
  o Modify error handling to suppress gcc warnings
  o Include man page
  o Minor typo/documentation changes

New for release 0.4 (2005-11-12)
  o Discontinue tar patch (replaced by out of the box GNU cpio)
  o Update instructions

New for release 0.3 (2005-03-13)
  o Convert from mmap to traditional buffered file reads in gzrecover
  o Convert gzrecover to GPL licensing

New for release 0.2 (2003-04-24)
  o Compile with flags for large file support
  o No longer crash if the gzip file is simply truncated (i.e.
    error on last byte)
  o Enable split file support (-s)
  o Fix typo in tar patch help text

gzrt-0.8/README

gzrecover - Recover data from a corrupted gzip file

gzrecover is a program that will attempt to extract any readable data out of
a gzip file that has been corrupted.

*****************************************************************************
ATTENTION!!!!  99% of "corrupted" gzip archives are caused by transferring
the file via FTP in ASCII mode instead of binary mode.  Please re-transfer
the file in the correct mode first before attempting to recover from a file
you believe is corrupted.
*****************************************************************************

It is highly likely that not all data in the file will be successfully
retrieved.  In the event that the compressed file was a tar archive, the
standard tar program will probably not be able to extract all of the files
in the recovered file, so you will need to use GNU cpio instead.

For compilation and installation instructions see README.build

USAGE:

gzrecover [ -hpsVv ] [-o <outfile>] [filename]

If no input filename is specified, gzrecover reads from the standard input.

By default, gzrecover writes its output to <filename>.recovered.  If the
original filename ended in .gz, that extension is removed.  Any directory
part of the input path is also dropped, so the output file is created in the
current directory.  The default output filename when reading from the
standard input is "stdin.recovered".

Options include:

-o <outfile> - Sets the output file name
-p           - Write output to standard output for pipeline support
-s           - Splits each recovered segment into its own file, with numeric
               suffixes (.1, .2, etc) (UNTESTED)
-h           - Print the help message
-v           - Verbose logging on
-V           - Print version number

-o and -p cannot be specified at the same time.

Note that gzrecover will run slower than regular gunzip does, but it has
been significantly improved in speed since the last release.  The more
corruption in the file, the more slowly it runs.
Running gzrecover on an uncorrupted gzip file should simply uncompress it.
However, substituting gzrecover for gunzip on a regular basis is not
recommended.  Any recovered data should be manually verified for validity.
There's no guarantee anything will be recovered.

RECOVERING TAR FILES

If your .gz file is a tar archive, it is likely the recovered file cannot be
processed by the tar program because tar will choke on any errors in the
file format.  Fortunately, GNU cpio will extract tar files and will skip any
corrupted bytes.

If you don't have GNU cpio on your system, you can download it from:

ftp://ftp.gnu.org/pub/gnu/cpio/cpio-2.6.tar.gz

Note that I have only tested with version 2.5 or higher.

To extract files, use the following cpio options:

cpio -F <filename> -i -v

Note that cpio may spew large amounts of error messages to the terminal, and
may also take a very long time to run on a file that had lots of corruption.

Note: I previously had patched the GNU tar sources to enable it to skip
corrupted bytes, but that patch has been discontinued because it is not
needed and was only marginally successful at best.

PUTTING IT ALL TOGETHER

Suppose your file foo.tar.gz is on a tape with bad data.  To recover, copy
the tape file to foo.tar.gz and run:

gzrecover foo.tar.gz
cpio -F foo.tar.recovered -i -v

No guarantees, but I hope this helps you as much as it helped me!

KNOWN ISSUES

gzrecover sometimes segfaults on certain files.  Neither I nor anyone else
has been able to track down the source of this.  I am looking for files of
reasonable size for which I can replicate this bug on Linux, so if you
encounter it with a file that isn't huge, let me know.

COPYRIGHT NOTICE

gzrecover written by Aaron M. Renn (arenn@urbanophile.com)

Copyright (c) 2002-2013 Aaron M. Renn.

This code is licensed under the GNU General Public License v2 (or, at your
option, any later version).
See http://www.gnu.org/licenses/gpl.html

gzrt-0.8/gzrecover.c

/*************************************************************************
 * gzrecover - A program to recover data from corrupted gzip files
 *
 * Copyright (c) 2002-2013 Aaron M. Renn (arenn@urbanophile.com)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published
 * by the Free Software Foundation, either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software Foundation
 * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307 USA
 ************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <zlib.h>

#define VERSION "0.8"

/* Global constants */
#define DEFAULT_INBUF_SIZE (1024*1024)
#define DEFAULT_OUTBUF_SIZE (64*1024)

static const char *optstring = "ho:psVv";
static const char *usage = "Usage: gzrecover [-hpsVv] [-o <outfile>] [infile]";

/* Global Variables */
static int split_mode = 0;
static int verbose_mode = 0;
static int outfile_specified = 0;
static int stdout_specified = 0;
static char *user_outname;
static size_t inbuf_size = DEFAULT_INBUF_SIZE;
static size_t outbuf_size = DEFAULT_OUTBUF_SIZE;

/* Display usage string and exit */
void show_usage_and_exit(int exit_status)
{
    fprintf(stderr, "%s\n", usage);
    exit(exit_status);
}

/* Print the error for callname and exit.  This expands to two statements,
 * so it must only be used inside braces. */
#define throw_error(callname) perror(callname); exit(1);

/* Read bytes from a file - restart on EINTR or EAGAIN */
ssize_t read_internal(int fd, void *buf, size_t count)
{
    ssize_t rc = 0;

    for (;;) {
        rc = read(fd, buf, count);
        if ((rc == -1) && ((errno == EINTR) || (errno == EAGAIN)))
            continue;
        return(rc);
    }
}

/* Open output file for writing */
int open_outfile(char *infile)
{
    int ofd;
    char *outfile, *ptr;
    static int suffix = 1;

    /* Just return standard output if that is specified */
    if (stdout_specified)
        return STDOUT_FILENO;

    /* Build the output file name (leave room for a ".N" split suffix) */
    if (outfile_specified)
        outfile = (char *)malloc(strlen(user_outname) + 16);
    else
        outfile = (char *)malloc(strlen(infile) + 25);
    if (outfile == 0) { throw_error("malloc") }

    if (!outfile_specified) /* Strip off .gz unless user specified name */
    {
        ptr = strstr(infile, ".gz");
        if (ptr)
            *ptr = '\0'; /* Bad form to directly edit command line */

        ptr = strrchr(infile, '/'); /* Kill pathname */
        if (ptr)
            infile = ptr+1;
    }

    if (outfile_specified && split_mode)
        sprintf(outfile, "%s.%d", user_outname, suffix++);
    else if (outfile_specified)
        strcpy(outfile, user_outname);
    else if (split_mode)
        sprintf(outfile, "%s.recovered.%d", infile, suffix++);
    else
        sprintf(outfile, "%s.recovered", infile);

    /* Open it up, truncating so stale data from an earlier run cannot linger */
    ofd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, S_IWUSR | S_IRUSR);
    if (ofd == -1) { throw_error("open") }

    if (verbose_mode)
        fprintf(stderr, "Opened output file for writing: %s\n", outfile);

    free(outfile);
    return(ofd);
}

/* Initialize the zlib decompression engine */
void init_zlib(z_stream *d_stream, unsigned char *buffer, size_t bufsize)
{
    int rc;

    memset(d_stream, 0, sizeof(z_stream));
    d_stream->next_in = buffer;
    d_stream->avail_in = bufsize;

    /* A negative windowBits selects raw deflate: zlib looks for no zlib or
     * gzip header/trailer (the gzip header is skipped by hand below), which
     * also lets decoding restart in the middle of the stream.  The magnitude
     * 15 selects the maximum 32K window. */
    rc = inflateInit2(d_stream, -15);
    if (rc != Z_OK) {
        /* zlib does not set errno, so report its return code instead */
        fprintf(stderr, "gzrecover: inflateInit2 failed (%d)\n", rc);
        exit(1);
    }
}

/* Skip gzip header stuff we don't care about (layout per RFC 1952) */
void skip_gzip_header(z_stream *d_stream)
{
    unsigned char flags;
    unsigned int len;

    /* Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS */
    flags = d_stream->next_in[3];
    d_stream->next_in += 10;
    d_stream->avail_in -= 10;

    if ((flags & 0x04) != 0) /* Extra field */
    {
        /* XLEN is a 2-byte little-endian length: low byte, then high byte */
        len = (unsigned int)d_stream->next_in[0];
        len += ((unsigned int)d_stream->next_in[1]) << 8;
        d_stream->next_in += (2 + len);
        d_stream->avail_in -= (2 + len);
    }

    if ((flags & 0x08) != 0) /* Orig Name (zero-terminated) */
    {
        while (*d_stream->next_in != 0) {
            ++d_stream->next_in;
            --d_stream->avail_in;
        }
        ++d_stream->next_in;   /* skip the terminating zero byte */
        --d_stream->avail_in;
    }

    if ((flags & 0x10) != 0) /* Comment (zero-terminated) */
    {
        while (*d_stream->next_in != 0) {
            ++d_stream->next_in;
            --d_stream->avail_in;
        }
        ++d_stream->next_in;   /* skip the terminating zero byte */
        --d_stream->avail_in;
    }

    if ((flags & 0x02) != 0) /* Header CRC16 */
    {
        d_stream->next_in += 2;
        d_stream->avail_in -= 2;
    }
}

/* Main program driver */
int main(int argc, char **argv)
{
    int opt, rc, rc2, ifd, ofd, founderr=0, foundgood=0;
    ssize_t bytes_read=0;
    off_t tot_written=0;
    off_t errpos=0, errinc=0, readpos=0;
    char *infile;
    unsigned char *inbuf, *outbuf;
    z_stream d_stream;

    /* Parse options */
    while ((opt = getopt(argc, argv, optstring)) != -1) {
        switch (opt) {
        case 'h':
            show_usage_and_exit(0);
            break;
        case 'o':
            user_outname = optarg;
            outfile_specified = 1;
            break;
        case 'p':
            stdout_specified = 1;
            break;
        case 's':
            split_mode = 1;
            break;
        case 'v':
            verbose_mode = 1;
            break;
        case 'V':
            fprintf(stderr, "gzrecover %s\n", VERSION);
            exit(0);
        default:
            show_usage_and_exit(1);
        }
    }

    /* Either output to stdout (-p) or specify filename (-o) but not both */
    if (outfile_specified && stdout_specified) {
        fprintf(stderr, "gzrecover: Cannot specify output filename (-o) and stdout (-p) simultaneously.\n");
        show_usage_and_exit(1);
    }

    /* Allocate our read buffer */
    inbuf = (unsigned char *)malloc(inbuf_size);
    if (inbuf == 0) { throw_error("malloc") }

    /* Open input file using name or set to standard input if no file specified */
    if (optind == argc) {
        infile = "stdin";
        ifd = STDIN_FILENO;
    } else {
        infile = argv[optind];
        ifd = open(infile, O_RDONLY);
    }
    if (ifd == -1) { free(inbuf); throw_error("open") }

    if (verbose_mode)
        fprintf(stderr, "Opened input file for reading: %s\n", infile);

    /* Open output file & initialize output buffer */
    ofd = open_outfile(infile);
    outbuf = (unsigned char *)malloc(outbuf_size);
    if (outbuf == 0) { throw_error("malloc") }

    /* Initialize zlib */
    bytes_read =
        read_internal(ifd, inbuf, inbuf_size);
    if (bytes_read == -1) { throw_error("read") }
    if (bytes_read == 0) {
        if (verbose_mode)
            fprintf(stderr, "File is empty\n");
        close(ifd);
        close(ofd);
        free(inbuf);
        free(outbuf);
        return(0);
    }
    readpos = bytes_read;
    init_zlib(&d_stream, inbuf, bytes_read);

    /* Assume there's a valid gzip header at the beginning of the file */
    skip_gzip_header(&d_stream);

    /* Finally - decompress this bad boy */
    for (;;) {
        d_stream.next_out = outbuf;
        d_stream.avail_out = outbuf_size;

        rc = inflate(&d_stream, Z_NO_FLUSH);

        /* Here is the strategy. If we bomb, we reset zlib to one byte past the
         * error location and keep doing it until such time as we are able
         * to start decompressing something. Alas, this seems to result in
         * a number of false starts. */
        if ((rc != Z_OK) && (rc != Z_STREAM_END)) {
            foundgood = 0;

            /* If the founderr flag is not yet set, this is our first error.
             * So set the error flag, record where the error happened, and
             * read more data from the stream if the buffer is used up */
            if (!founderr) {
                founderr = 1;
                errpos = bytes_read - d_stream.avail_in;
                if (verbose_mode)
                    fprintf(stderr, "Found error at byte %lld in input stream\n",
                            (long long)(readpos - (bytes_read - errpos)));

                if (d_stream.avail_in == 0) {
                    bytes_read = read_internal(ifd, inbuf, inbuf_size);
                    if (bytes_read == -1) { throw_error("read") }
                    if (bytes_read == 0)
                        break;
                    readpos += bytes_read;
                    /* New buffer: resume scanning from its first byte */
                    errpos = 0;
                    errinc = 0;
                    inflateEnd(&d_stream);
                    init_zlib(&d_stream, inbuf, bytes_read);
                    continue;
                }
            }

            /* Note that we fall through to here from above unless we
             * had to do a re-read of the stream. Increment the error
             * increment counter, then re-initialize zlib from the point
             * of the original error + the value of the increment counter
             * (which starts out at 1). Each time through we keep
             * incrementing one more byte through the buffer until we
             * either find a good byte, or exhaust it and have to re-read.
             */
            inflateEnd(&d_stream);
            ++errinc;

            /* More left to try in our buffer */
            if ((off_t)bytes_read > errpos + errinc) {
                init_zlib(&d_stream, inbuf + errpos + errinc,
                          bytes_read - (errpos + errinc));
            }
            /* Nothing left in our buffer - read again */
            else {
                bytes_read = read_internal(ifd, inbuf, inbuf_size);
                if (bytes_read == -1) { throw_error("read") }
                if (bytes_read == 0)
                    break;
                readpos += bytes_read;
                init_zlib(&d_stream, inbuf, bytes_read);

                /* Reset errpos and errinc to zero, but leave the founderr flag as true */
                errpos = 0;
                errinc = 0;
            }
            continue;
        }

        /* If we make it here, we were able to decompress data. If the
         * founderr flag says we were previously in an error state, that means
         * we are starting to decode again after bypassing a region of
         * corruption. Reset the various flags and counters. If we are in
         * split mode, open the next increment of output files. */
        if (founderr && !foundgood) {
            foundgood = 1;
            founderr = 0;
            errinc = 0;
            if (verbose_mode)
                fprintf(stderr, "Found good data at byte %lld in input stream\n",
                        (long long)(readpos - (bytes_read - d_stream.avail_in)));
            if (split_mode) {
                close(ofd);
                ofd = open_outfile(infile);
            }
        }

        /* Write decompressed output, looping to handle short write counts */
        {
            unsigned char *wp = outbuf;
            size_t left = outbuf_size - d_stream.avail_out;

            while (left > 0) {
                rc2 = write(ofd, wp, left);
                if (rc2 == -1) {
                    if (errno == EINTR)
                        continue;
                    throw_error("write")
                }
                wp += rc2;
                left -= rc2;
                tot_written += rc2;
            }
        }

        /* We've exhausted our input buffer, read some more */
        if (d_stream.avail_in == 0) {
            bytes_read = read_internal(ifd, inbuf, inbuf_size);
            if (bytes_read == -1) { throw_error("read"); }
            if (bytes_read == 0)
                break;
            readpos += bytes_read;
            errinc = 0;
            d_stream.next_in = inbuf;
            d_stream.avail_in = bytes_read;
        }

        /* If we get a false alarm on end of stream, we need to handle that too.
         * Reset to one byte past where it occurs.
         * This seems to happen quite a bit */
        if (rc == Z_STREAM_END) {
            off_t tmppos = d_stream.avail_in;

            inflateEnd(&d_stream);
            if ((unsigned char *)d_stream.next_in == inbuf) {
                init_zlib(&d_stream, inbuf, bytes_read);
            } else {
                /* Restart one byte past next_in; only tmppos - 1 bytes
                 * remain before the end of the buffer */
                init_zlib(&d_stream, inbuf + (bytes_read - tmppos) + 1,
                          tmppos - 1);
            }
            continue;
        }
    }

    inflateEnd(&d_stream);

    /* Close up files */
    close(ofd);
    close(ifd);

    if (verbose_mode)
        fprintf(stderr, "Total decompressed output = %lld bytes\n",
                (long long)tot_written);

    free(inbuf);
    free(outbuf);
    return(0);
}

gzrt-0.8/gzrecover.1

.TH GZRT 1 "July 13, 2006"
.SH NAME
gzrecover \- gzip recovery toolkit
.SH SYNOPSIS
.B gzrecover
.RI [ options "] [" file ]
.SH DESCRIPTION
\fBgzrecover\fP will attempt to skip over corrupted data in a gzip
archive, allowing the remaining data to be recovered.  If no file is
given, it reads from standard input.
.SH OPTIONS
.TP
.B \-h
.br
Show usage statement.
.TP
.B \-V
.br
Display version number.
.TP
.B \-v
.br
Turn on verbose mode.
.TP
.B \-s
.br
Turn on split mode.
.TP
.B \-o
.I file
.br
Set the output file.
.TP
.B \-p
.br
Write recovered data to stdout instead of file.
.SH SEE ALSO
.BR gzip (1),
.BR cpio (1).
.SH AUTHOR
gzrecover was written by Aaron M. Renn <arenn@urbanophile.com>.

gzrt-0.8/README.build

INSTALLATION:

To build gzrecover, type "make" at the command line.  This will build the
gzrecover executable.

gzrecover relies on the zlib compression library, which is not included in
the distribution.  You can download it from http://www.gzip.org/zlib/ if you
need it.  It needs to be installed before building gzrecover, including its
header file zlib.h, which some systems package separately from the library.
Most GNU/Linux systems should already have it.

To install the executable, copy it into the directory of your choice.  For
example:

cp gzrecover /usr/local/bin

Or just run it out of the build directory.

Certain Linux distributions, notably Debian-based ones, have a package you
can install for gzrecover.  The package name is 'gzrt'.
COPYRIGHT NOTICE

gzrecover written by Aaron M. Renn (arenn@urbanophile.com)

Copyright (c) 2002-2013 Aaron M. Renn.

This code is licensed under the same GNU General Public License v2 (or, at
your option, any later version) as GNU tar.  See
http://www.gnu.org/licenses/gpl.html