python - Differentiating between compressed .gz files and archived tar.gz files properly?
What is the proper way to differentiate between a plain file compressed in gzip or bzip2 format (e.g. .gz) and a tarball compressed with gzip or bzip2 (e.g. .tar.gz)? Identification by suffix is not a reliable option, since files may end up being renamed.
On the command line I am able to do this:
bzip2 -dc test.tar.bz2 |head|file -
So I attempted something similar in Python with the following function:
import magic

def get_magic(self, store_file, buffer=False, look_deeper=False):
    # see if we're indexing: only then ask libmagic to look inside compressed data
    if look_deeper:
        m = magic.Magic(mime=True, uncompress=True)
    else:
        m = magic.Magic(mime=True)
    # store_file is a path unless buffer=True, in which case it holds raw bytes
    if buffer:
        file_type = m.from_buffer(store_file)
    else:
        file_type = m.from_file(store_file)
    return file_type
Then, when trying to read a compressed tarball, I pass in a buffer from elsewhere via:
file_buffer = open(file_name, "rb").read(8096)
archive_check = self.get_magic(file_buffer, True, True)
Unfortunately, this becomes problematic with the uncompress flag in python-magic, because python-magic appears to expect the entire file even though I only want it to read the buffer. I end up with this exception:
bzip2 error: compressed file ends unexpectedly
Since the files I'm looking at can be anywhere from 2 MB to 20 GB in size, this becomes rather problematic. I don't want to read the entire file.
Can this be hacked around by chopping the end off the compressed file and appending it to the buffer? Or is it better to drop the idea of uncompressing the file with python-magic and instead decompress the data myself before passing in the buffer to identify, via:
file_buffer = open(file_name, "r:bz2").read(8096)
Is there a better way?
It is a tar file if the uncompressed data at offset 257 is "ustar", or if the uncompressed data in its entirety is 1024 zero bytes (an empty tar file).
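As a minimal sketch of that rule, the check below works on a bytes object holding the start of the uncompressed data. The name looks_like_tar and its constants are hypothetical, chosen only for illustration, and the empty-archive test assumes the caller passed the whole stream when it is that short.

```python
TAR_MAGIC_OFFSET = 257   # position of the magic field in a tar header block
TAR_MAGIC = b"ustar"


def looks_like_tar(head: bytes) -> bool:
    """Return True if `head`, the start of the uncompressed data, looks like tar."""
    end = TAR_MAGIC_OFFSET + len(TAR_MAGIC)
    if head[TAR_MAGIC_OFFSET:end] == TAR_MAGIC:
        return True
    # Empty archive case: assumes `head` is the entire uncompressed stream
    # and that it consists of exactly 1024 zero bytes.
    return head == bytes(1024)
```

Only the first 1024 bytes of uncompressed data are needed for either test, so the size of the original file does not matter.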
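You can read just the first 1024 bytes of the uncompressed data using z = zlib.decompressobj() or z = bz2.BZ2Decompressor(), and then z.decompress(). (For gzip files, pass a wbits value of 16 + zlib.MAX_WBITS to zlib.decompressobj() so that it accepts the gzip header.)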
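A rough sketch of that incremental approach follows. The helper name read_uncompressed_head, its parameters, and the choice to detect the format from the leading magic bytes (1f 8b for gzip, "BZh" for bzip2) are assumptions made for illustration, not part of the original answer.

```python
import bz2
import zlib


def read_uncompressed_head(path, want=1024, chunk_size=8192):
    """Return up to `want` bytes of uncompressed data from a gzip or bzip2 file."""
    with open(path, "rb") as f:
        chunk = f.read(chunk_size)
        if chunk[:2] == b"\x1f\x8b":
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
            z = zlib.decompressobj(16 + zlib.MAX_WBITS)
        elif chunk[:3] == b"BZh":
            z = bz2.BZ2Decompressor()
        else:
            raise ValueError("not a gzip or bzip2 stream")

        out = b""
        # Feed compressed chunks until enough output has been produced.
        # bzip2 may need several chunks before it emits anything.
        while chunk and len(out) < want:
            out += z.decompress(chunk)
            if z.eof:
                break
            chunk = f.read(chunk_size)
    return out[:want]
```

Because the decompressor is fed piece by piece and abandoned once enough output exists, the truncated input never triggers an "ends unexpectedly" error. The returned bytes can then be tested for "ustar" at offset 257.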