python - Differentiating between compressed .gz files and archived tar.gz files properly? -

- March 15, 2015

What is the proper way to differentiate between a plain compressed file in gzip or bzip2 format (e.g. .gz) and a tarball compressed with gzip or bzip2 (e.g. .tar.gz)? Identification using suffix extensions is not a reliable option, as it's possible files may end up renamed.

Now, on the command line I am able to do this:

bzip2 -dc test.tar.bz2 |head|file -

So I attempted something similar in Python with the following function:

    import magic

    def get_magic(self, store_file, buffer=False, look_deeper=False):
        # See if we're indexing; if so, let libmagic look inside compressed data.
        if look_deeper:
            m = magic.Magic(mime=True, uncompress=True)
        else:
            m = magic.Magic(mime=True)

        # store_file is a path, or the raw bytes themselves when buffer is True.
        if buffer:
            return m.from_buffer(store_file)
        return m.from_file(store_file)

Then, when trying to read a compressed tarball, I'll pass in a buffer from elsewhere via:

    # Read only the first few KB, in binary mode, rather than the whole file.
    file_buffer = open(file_name, "rb").read(8096)
    archive_check = self.get_magic(file_buffer, True, True)

Unfortunately this becomes problematic when using the uncompress flag in python-magic, because it appears that python-magic is expecting me to pass in the entire file even though I only want it to read the buffer. I end up with the exception:

bzip2 error: compressed file ends unexpectedly

Seeing as the files I'm looking at can end up being anywhere from 2 MB to 20 GB in size, this becomes rather problematic. I don't want to read the entire file.

Can it be hacked to chop the end of the compressed file off and append it to the buffer? Or is it better to ignore the idea of uncompressing the file using python-magic, and instead decompress before I pass in a buffer to identify, via:

    file_buffer = open(file_name, "r:bz2").read(8096)

Is there a better way?

It is a tar file if the uncompressed data at offset 257 is "ustar", or if the uncompressed data in its entirety is 1024 zero bytes (an empty tar file).
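A minimal sketch of that test, assuming the caller has already obtained the start of the uncompressed data as a bytes object. The name looks_like_tar and the constants are hypothetical, chosen only for illustration. Note that with just the first 1024 bytes in hand, the function cannot confirm that an all-zero block is the entire stream; the caller would have to check that nothing follows.

```python
TAR_MAGIC_OFFSET = 257       # position of the magic field in a tar header
TAR_MAGIC = b"ustar"         # value expected at that position
EMPTY_TAR = b"\0" * 1024     # an empty tar file, per the rule above


def looks_like_tar(head):
    """Return True if `head` (the start of the uncompressed data) looks like tar.

    `head` should hold at least the first 262 bytes for the magic check,
    and exactly the whole stream for the empty-archive check.
    """
    magic_end = TAR_MAGIC_OFFSET + len(TAR_MAGIC)
    if head[TAR_MAGIC_OFFSET:magic_end] == TAR_MAGIC:
        return True
    # Assumption: the caller verified that no data follows these 1024 bytes.
    return head == EMPTY_TAR
```

A short buffer (fewer than 262 bytes) simply fails the magic comparison, so plain compressed text files fall through to False.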

You can read just the first 1024 bytes of the uncompressed data using z = zlib.decompressobj() or z = bz2.BZ2Decompressor(), and z.decompress().
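One way this incremental approach might look is sketched below. The helper name read_uncompressed_head and the chunk size are hypothetical. Two details are assumptions layered on top of the answer: the compression format is picked from the leading magic bytes (1f 8b for gzip, "BZh" for bzip2), and the zlib object is created with wbits set to MAX_WBITS | 16 so that it accepts the gzip header rather than a raw zlib stream.

```python
import bz2
import zlib

CHUNK = 8192    # compressed bytes fed to the decompressor per step (arbitrary)
WANTED = 1024   # uncompressed bytes needed for the tar check


def read_uncompressed_head(path, wanted=WANTED):
    """Return up to `wanted` bytes of uncompressed data from a gzip/bzip2 file."""
    with open(path, "rb") as f:
        chunk = f.read(CHUNK)
        if chunk[:2] == b"\x1f\x8b":
            # wbits = MAX_WBITS | 16 tells zlib to expect a gzip wrapper.
            z = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        elif chunk[:3] == b"BZh":
            z = bz2.BZ2Decompressor()
        else:
            raise ValueError("not gzip or bzip2 data")

        out = b""
        # Feed compressed chunks only until enough output has accumulated
        # or the compressed stream ends; the rest of the file is never read.
        while chunk and len(out) < wanted and not z.eof:
            out += z.decompress(chunk)
            chunk = f.read(CHUNK)
    return out[:wanted]
```

The "ustar" comparison at offset 257 can then be applied to the returned bytes. Only as much of the file is read as is needed to produce the first 1024 bytes of output, although a single chunk of highly compressible input can still expand to considerably more than that in memory.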
