Python gzip compression
The Python gzip module provides a very simple way to compress and decompress files, and it works in a manner similar to the GNU programs gzip and gunzip.
In this lesson, we will study the classes in this module that let us perform these operations, along with the additional functions it provides.
This module provides the GzipFile class, along with convenience functions such as open(), compress() and decompress().
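As a minimal sketch of the two one-shot convenience functions, assuming the whole payload fits in memory: gzip.compress() turns a bytes object into gzip-formatted bytes, and gzip.decompress() reverses it. The sample data is made up for illustration.

```python
import gzip

# Made-up sample payload; repetitive data compresses well
data = b'Python gzip module example. ' * 50

compressed = gzip.compress(data)        # bytes in gzip format
restored = gzip.decompress(compressed)  # the original bytes again

print(len(data), '->', len(compressed), 'bytes')
print(compressed[:2])  # b'\x1f\x8b', the gzip magic number
```

Both functions work on bytes only; for text, encode it first (for example with str.encode('utf-8')).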
The advantage of the GzipFile class is that it reads and writes gzip files and automatically compresses and decompresses the data, so that in the program they look just like normal file objects.
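To illustrate the file-object behaviour, the sketch below writes a small gzip file and reads it back with readline() and plain iteration, exactly as one would with a regular file. The file name lines_demo.gz is hypothetical and is created in a temporary directory.

```python
import gzip
import os
import tempfile

# Hypothetical file name in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.gz')

with gzip.open(path, 'wb') as f:
    f.write(b'alpha\nbeta\ngamma\n')  # compressed transparently on write

# The bytes actually stored on disk are gzip data, not plain text
with open(path, 'rb') as raw:
    header = raw.read(2)

with gzip.open(path, 'rb') as f:
    first = f.readline()  # same call as on a regular file
    rest = list(f)        # iterating yields the remaining lines

print(first)  # b'alpha\n'
print(rest)   # [b'beta\n', b'gamma\n']
```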
It is important to remember that the additional formats the gzip and gunzip programs can decompress, such as files produced by compress and pack, are not supported by this module.
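One way to see this limitation: gzip data always starts with the two magic bytes 1f 8b, and the module rejects anything else. The sketch below feeds it bytes beginning with 1f 9d, the magic number of files produced by the Unix compress program; the rest of the payload is fabricated and is not real LZW data.

```python
import gzip

# Fabricated input: the .Z magic number followed by filler bytes
fake_z_data = b'\x1f\x9d\x90' + b'not really LZW data'

try:
    gzip.decompress(fake_z_data)
    rejected = False
except OSError as exc:  # gzip.BadGzipFile (Python 3.8+) is a subclass of OSError
    rejected = True
    print('Rejected:', exc)
```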
We will now start using the functions we mentioned to perform compression and decompression operations.
We will start with the open() function, which creates an instance of GzipFile, and open the file in wb mode to write to a compressed file:
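```python
import gzip
import io
import os

output_file_name = 'jd_example.txt.gz'
file_mode = 'wb'

# gzip.open() in binary write mode returns a GzipFile object
with gzip.open(output_file_name, file_mode) as output:
    # TextIOWrapper encodes str to UTF-8 bytes before they are compressed
    with io.TextIOWrapper(output, encoding='utf-8') as encode:
        encode.write('We can write anything in the file here.\n')

print(output_file_name,
      'contains', os.stat(output_file_name).st_size, 'bytes')
# Ask the Unix file utility for the MIME type of the result
os.system('file -b --mime {}'.format(output_file_name))
```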
Running this program prints the size of jd_example.txt.gz in bytes, followed by the gzip MIME type that the file command reports for it.
To write to the compressed file, we first opened it in wb mode and wrapped the GzipFile instance in a TextIOWrapper from the io module, which encodes Unicode text into bytes suitable for compression.
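For comparison, gzip.open() can also do this wrapping itself. The sketch below, using hypothetical file names in a temporary directory, shows that a binary mode such as 'wb' yields a GzipFile, while a text mode such as 'wt' yields an io.TextIOWrapper around it that takes care of the encoding.

```python
import gzip
import io
import os
import tempfile

# Hypothetical file names in a throwaway directory
tmp = tempfile.mkdtemp()
bin_path = os.path.join(tmp, 'binary_demo.gz')
text_path = os.path.join(tmp, 'text_demo.txt.gz')

# Binary mode: a GzipFile that accepts bytes
with gzip.open(bin_path, 'wb') as out:
    binary_is_gzipfile = isinstance(out, gzip.GzipFile)
    out.write(b'raw bytes\n')

# Text mode: gzip.open() adds the TextIOWrapper for us
with gzip.open(text_path, 'wt', encoding='utf-8') as out:
    text_is_wrapper = isinstance(out, io.TextIOWrapper)
    out.write('We can write anything in the file here.\n')

with gzip.open(text_path, 'rt', encoding='utf-8') as inp:
    text = inp.read()

print(binary_is_gzipfile, text_is_wrapper)  # True True
print(text, end='')
```

The explicit TextIOWrapper used in the article and the 'wt'/'rt' shortcut produce the same kind of object; the shortcut simply saves one level of nesting.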
This time, we will use almost the same script as above, but write multiple lines to the file. Let's look at how this can be done:
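```python
import gzip
import io
import os
import itertools

output_file_name = 'jd_example.txt.gz'
file_mode = 'wb'

with gzip.open(output_file_name, file_mode) as output:
    with io.TextIOWrapper(output, encoding='utf-8') as enc:
        # writelines() accepts any iterable of strings;
        # itertools.repeat() yields the same line 10 times
        enc.writelines(
            itertools.repeat('JournalDev, same line again and again!.\n', 10)
        )

# gzcat prints the decompressed contents (macOS; on most Linux systems use zcat)
os.system('gzcat jd_example.txt.gz')
```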
Running this program prints the same line ten times, as decompressed by gzcat.
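To see why repetitive text is a good fit for gzip, a sketch can repeat the same writelines() call into an in-memory io.BytesIO buffer instead of a file and compare sizes. Exact byte counts depend on the zlib version, so none are quoted here.

```python
import gzip
import io
import itertools

line = 'JournalDev, same line again and again!.\n'
buf = io.BytesIO()

# Same writing pattern as the example, but the target is an in-memory buffer.
# GzipFile does not close a fileobj it was given, so buf stays usable afterwards.
with gzip.GzipFile(mode='wb', fileobj=buf) as output:
    with io.TextIOWrapper(output, encoding='utf-8') as enc:
        enc.writelines(itertools.repeat(line, 10))

raw_size = len(line.encode('utf-8')) * 10
compressed_size = len(buf.getvalue())
print(raw_size, 'bytes of text ->', compressed_size, 'bytes compressed')
```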
Now that we are done writing, we can read data from the compressed file as well. For this we use another file mode, rb (read binary):
```python
import gzip
import io

read_file_name = 'jd_example.txt.gz'
file_mode = 'rb'

with gzip.open(read_file_name, file_mode) as input_file:
    # Decode the decompressed UTF-8 bytes back into str
    with io.TextIOWrapper(input_file, encoding='utf-8') as dec:
        print(dec.read())
```
Running this program prints the decompressed text stored in jd_example.txt.gz.
Notice that we did nothing special with gzip here apart from passing it a different file mode. The actual reading is done by the TextIOWrapper, which uses the file object provided by the gzip module.
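Because the object returned by gzip.open() behaves like a file, it also works with generic file utilities. A minimal sketch of gunzip-style decompression to a plain file, using shutil.copyfileobj() to copy in chunks so a large archive never has to fit in memory at once; the file names are hypothetical.

```python
import gzip
import os
import shutil
import tempfile

# Hypothetical file names in a throwaway directory
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'jd_copy.txt.gz')
dst = os.path.join(tmp, 'jd_copy.txt')

with gzip.open(src, 'wb') as f:
    f.write(b'line one\nline two\n')

# Stream the decompressed bytes into a regular file, chunk by chunk
with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

with open(dst, 'rb') as plain:
    restored = plain.read()
print(restored)  # b'line one\nline two\n'
```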
Another big advantage of the gzip module is that GzipFile can wrap other kinds of streams through its fileobj argument, so they can make use of compression too. This is useful, for example, when you want to compress a large amount of data before transmitting it over a network.
Let’s see how we can compress and decompress stream data:
```python
import gzip
from io import BytesIO
import binascii

write_mode = 'wb'
read_mode = 'rb'

uncompressed = b'Reiterated line n times.\n' * 8
print('Uncompressed Data:', len(uncompressed))
print(uncompressed)

# Compress into an in-memory stream instead of a file on disk
buf = BytesIO()
with gzip.GzipFile(mode=write_mode, fileobj=buf) as file:
    file.write(uncompressed)

compressed = buf.getvalue()
print('Compressed Data:', len(compressed))
print(binascii.hexlify(compressed))

# Wrap the compressed bytes in a new stream and decompress them again
inbuffer = BytesIO(compressed)
with gzip.GzipFile(mode=read_mode, fileobj=inbuffer) as file:
    read_data = file.read(len(uncompressed))

print('\nReading it again:', len(read_data))
print(read_data)
```
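Running this program prints the length and contents of the uncompressed data, the length and hexadecimal dump of the compressed data, and finally the length and contents of the data read back, which match the original.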
Notice that while writing, we didn't have to provide any length. When reading the data back, the example passes the expected length to read() explicitly, but this is optional: calling read() with no argument (or with -1) returns all of the remaining decompressed data.
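A sketch showing both ways of reading, under the same in-memory setup as the example: read() with no argument, and a loop of read(n) calls that stops when an empty bytes object signals the end of the stream. The chunk size of 32 is arbitrary.

```python
import gzip
from io import BytesIO

original = b'Reiterated line n times.\n' * 8
compressed = gzip.compress(original)

# read() with no size argument returns all remaining decompressed bytes
with gzip.GzipFile(mode='rb', fileobj=BytesIO(compressed)) as f:
    everything = f.read()

# read(n) returns at most n bytes; b'' signals the end of the stream
chunks = []
with gzip.GzipFile(mode='rb', fileobj=BytesIO(compressed)) as f:
    while True:
        chunk = f.read(32)
        if not chunk:
            break
        chunks.append(chunk)

print(len(everything), len(b''.join(chunks)))
```

The chunked loop is the pattern to prefer when the decompressed data may be too large to hold in memory at once.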
In this lesson, we studied the Python gzip module, which can be used to read and write compressed files, with the big advantage that it makes a compressed file look just like a normal file object.
Reference: API Doc
Translated from: https://www.journaldev.com/19827/python-gzip-compress-decompress