Working With Files in Python

Python has several built-in modules and functions for handling files. These functions are spread out over several modules such as os, os.path, shutil, and pathlib, to name a few. This article gathers in one place many of the functions you need to know in order to perform the most common operations on files in Python.


In this tutorial, you’ll learn how to:


  • Retrieve file properties
  • Create directories
  • Match patterns in filenames
  • Traverse directory trees
  • Make temporary files and directories
  • Delete files and directories
  • Copy, move, or rename files and directories
  • Create and extract ZIP and TAR archives
  • Open multiple files using the fileinput module

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you’ll need to take your Python skills to the next level.


Reading and Writing Data to Files in Python

Reading and writing data to files using Python is pretty straightforward. To do this, you must first open files in the appropriate mode. Here’s an example of how to open a text file and read its contents:


with open('data.txt', 'r') as f:
    data = f.read()

open() takes a filename and a mode as its arguments. r opens the file in read only mode. To write data to a file, pass in w as an argument instead:

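For example, a minimal sketch of the write version (the string written here is only a placeholder):

with open('data.txt', 'w') as f:
    f.write('Some data to store')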

In the examples above, open() opens files for reading or writing and returns a file handle (f in this case) that provides methods that can be used to read or write data to the file. Read Working With File I/O in Python for more information on how to read and write to files.


Getting a Directory Listing

Suppose your current working directory has a subdirectory called my_directory that has the following contents:


.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt

The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.


Directory Listing in Legacy Python Versions

In versions of Python prior to Python 3, os.listdir() is the method to use to get a directory listing:


>>> import os
>>> entries = os.listdir('my_directory/')

os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument:


>>> os.listdir('my_directory/')
['sub_dir_c', 'file1.py', 'sub_dir_b', 'file3.txt', 'file2.csv', 'sub_dir']

A directory listing like that isn’t easy to read. Printing out the output of a call to os.listdir() using a loop helps clean things up:


>>> entries = os.listdir('my_directory/')
>>> for entry in entries:
...     print(entry)
...
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Directory Listing in Modern Python Versions

In modern versions of Python, an alternative to os.listdir() is to use os.scandir() and pathlib.Path().


os.scandir() was introduced in Python 3.5 and is documented in PEP 471. os.scandir() returns an iterator as opposed to a list when called:


>>> import os
>>> entries = os.scandir('my_directory/')
>>> entries
<posix.ScandirIterator object at 0x...>

The ScandirIterator points to all the entries in the current directory. You can loop over the contents of the iterator and print out the filenames:

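A minimal sketch of that loop, printing each entry's name:

import os

with os.scandir('my_directory/') as entries:
    for entry in entries:
        print(entry.name)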

Here, os.scandir() is used in conjunction with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees up acquired resources automatically after the iterator has been exhausted. The result is a print out of the filenames in my_directory/ just like you saw in the os.listdir() example:


sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Another way to get a directory listing is to use the pathlib module:

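One way to do that, sketched here with the same my_directory/ path:

from pathlib import Path

entries = Path('my_directory/')
for entry in entries.iterdir():
    print(entry.name)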

The objects returned by Path are either PosixPath or WindowsPath objects depending on the OS.


pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory. Each entry yielded by .iterdir() contains information about the file or directory such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a great addition to Python that provides an object oriented interface to the filesystem.


In the example above, you call pathlib.Path() and pass a path argument to it. Next is the call to .iterdir() to get a list of all files and directories in my_directory.


pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way. Using pathlib is as efficient as, if not more efficient than, using the functions in os. Another benefit of using pathlib over os is that it reduces the number of imports you need to make to manipulate filesystem paths. For more information, read Python 3’s pathlib Module: Taming the File System.


Running the code above produces the following:


sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way of getting a directory listing, especially when you’re working with code that needs the file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its methods are more efficient than some found in these modules. We will discuss how to get file properties shortly.


Here are the directory-listing functions again:


Function                  Description
os.listdir()              Returns a list of all files and folders in a directory
os.scandir()              Returns an iterator of all the objects in a directory including file attribute information
pathlib.Path.iterdir()    Returns an iterator of all the objects in a directory including file attribute information

These functions return a list of everything in the directory, including subdirectories. This might not always be the behavior you want. The next section will show you how to filter the results from a directory listing.


Listing All Files in a Directory

This section will show you how to print out the names of files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and only list files from a directory listing produced by os.listdir(), use os.path:

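A sketch of that filtering, using the my_directory/ path from earlier:

import os

# List all files in a directory using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)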

Here, the call to os.listdir() returns a list of everything in the specified path, and then that list is filtered by os.path.isfile() to only print out files and not directories. This produces the following output:


file1.py
file3.txt
file2.csv

An easier way to list files in a directory is to use os.scandir() or pathlib.Path():

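Here is a sketch of the os.scandir() version:

import os

# List all files in a directory using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)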

Using os.scandir() has the advantage of looking cleaner and being easier to understand than using os.listdir(), even though it is one line of code longer. Calling entry.is_file() on each item in the ScandirIterator returns True if the object is a file. Printing out the names of all files in the directory gives you the following output:


file1.py
file3.txt
file2.csv

Here’s how to list files in a directory using pathlib.Path():

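Something along these lines:

from pathlib import Path

basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)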

Here, you call .is_file() on each entry yielded by .iterdir(). The output produced is the same:


file1.py
file3.txt
file2.csv

The code above can be made more concise if you combine the for loop and the if statement into a single generator expression. Dan Bader has an excellent article on generator expressions and list comprehensions.


The modified version looks like this:

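A sketch of that combined version:

from pathlib import Path

basepath = Path('my_directory/')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)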

This produces exactly the same output as the example before it. This section showed that filtering files or directories using os.scandir() and pathlib.Path() feels more intuitive and looks cleaner than using os.listdir() in conjunction with os.path.


Listing Subdirectories

To list subdirectories instead of files, use one of the methods below. Here’s how to use os.listdir() and os.path:


import os

# List all subdirectories using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

Manipulating filesystem paths this way can quickly become cumbersome when you have multiple calls to os.path.join(). Running this on my computer produces the following output:

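Given the example tree above, the output contains the three subdirectory names (the order may vary on your machine):

sub_dir_c
sub_dir_b
sub_dir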

Here’s how to use os.scandir():


import os

# List all subdirectories using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

As in the file listing example, here you call .is_dir() on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True, and the directory’s name is printed out. The output is the same as above:


Here’s how to use pathlib.Path():


from pathlib import Path

# List all subdirectories using pathlib
basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)

Calling .is_dir() on each entry of the basepath iterator checks if an entry is a file or a directory. If the entry is a directory, its name is printed out to the screen, and the output produced is the same as the one from the previous example:


Getting File Attributes

Python makes retrieving file attributes such as file size and modified times easy. This is done through os.stat(), os.scandir(), or pathlib.Path().


os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined. This can be potentially more efficient than using os.listdir() to list files and then getting file attribute information for each file.

os.scandir()pathlib.Path()检索结合了文件属性的目录列表。 这可能比使用os.listdir()列出文件然后获取每个文件的文件属性信息更有效。

The examples below show how to get the time the files in my_directory/ were last modified. The output is in seconds:


>>> import os
>>> with os.scandir('my_directory/') as dir_contents:
...     for entry in dir_contents:
...         info = entry.stat()
...         print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134


os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints out the st_mtime attribute, which is the time the content of the file was last modified.


The pathlib module has corresponding methods for retrieving file information that give the same results:


>>> from pathlib import Path
>>> current_dir = Path('my_directory')
>>> for path in current_dir.iterdir():
...     info = path.stat()
...     print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134


In the example above, the code loops through the object returned by .iterdir() and retrieves file attributes through a .stat() call for each file in the directory list. The st_mtime attribute returns a float value that represents seconds since the epoch. To convert the values returned by st_mtime for display purposes, you could write a helper function to convert the seconds into a datetime object:


from datetime import datetime
from os import scandir

def convert_date(timestamp):
    d = datetime.utcfromtimestamp(timestamp)
    formated_date = d.strftime('%d %b %Y')
    return formated_date

def get_files():
    dir_entries = scandir('my_directory/')
    for entry in dir_entries:
        if entry.is_file():
            info = entry.stat()
            print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

This will first get a list of files in my_directory and their attributes and then call convert_date() to convert each file’s last modified time into a human readable form. convert_date() makes use of .strftime() to convert the time in seconds into a string.


The arguments passed to .strftime() are the following:


  • %d: the day of the month
  • %b: the month, in abbreviated form
  • %Y: the year

Together, these directives produce output that looks like this:


>>> get_files()
file1.py        Last Modified: 04 Oct 2018
file3.txt       Last Modified: 17 Sep 2018
file2.csv       Last Modified: 17 Sep 2018

The syntax for converting dates and times into strings can be quite confusing. To read more about it, check out the official documentation on it. Another handy reference that is easy to remember is http://strftime.org/ .


Making Directories

Sooner or later, the programs you write will have to create directories in order to store data in them. os and pathlib include functions for creating directories. We’ll consider these:


Function                 Description
os.mkdir()               Creates a single subdirectory
pathlib.Path.mkdir()     Creates single or multiple directories
os.makedirs()            Creates multiple directories, including intermediate directories

Creating a Single Directory

To create a single directory, pass a path to the directory as a parameter to os.mkdir():

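A minimal sketch, using the example_directory/ path that appears in the later examples:

import os

os.mkdir('example_directory/')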

If a directory already exists, os.mkdir() raises FileExistsError. Alternatively, you can create a directory using pathlib:


from pathlib import Path

p = Path('example_directory/')
p.mkdir()

If the path already exists, mkdir() raises a FileExistsError:


>>> p.mkdir()
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '/usr/lib/python3.5/pathlib.py', line 1214, in mkdir
    self._accessor.mkdir(self, mode)
  File '/usr/lib/python3.5/pathlib.py', line 371, in wrapped
    return strfunc(str(pathobj), *args)
FileExistsError: [Errno 17] File exists: '.'

To avoid errors like this, catch the error when it happens and let your user know:

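One way to do that, sketched with pathlib:

from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as exc:
    print(exc)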

Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to .mkdir():


from pathlib import Path

p = Path('example_directory')
p.mkdir(exist_ok=True)

This will not raise an error if the directory already exists.


Creating Multiple Directories

os.makedirs() is similar to os.mkdir(). The difference between the two is that not only can os.makedirs() create individual directories, it can also be used to create directory trees. In other words, it can create any necessary intermediate folders in order to ensure a full path exists.

os.makedirs()os.mkdir()类似。 两者的区别在于os.makedirs()不仅可以创建单个目录,还可以用于创建目录树。 换句话说,它可以创建任何必要的中间文件夹,以确保存在完整路径。

os.makedirs() is similar to running mkdir -p in Bash. For example, to create a group of directories like 2018/10/05, all you have to do is the following:

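A sketch of that call:

import os

os.makedirs('2018/10/05')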

This will create a nested directory structure that contains the folders 2018, 10, and 05:


.
└── 2018
    └── 10
        └── 05

.makedirs() creates directories with default permissions. If you need to create directories with different permissions call .makedirs() and pass in the mode you would like the directories to be created in:

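For example, a mode of 0o770 (assumed here to match the permissions shown in the tree output below) can be passed like this:

import os

os.makedirs('2018/10/05', mode=0o770)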

This creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, and the file permission bits of existing parent directories are not changed. For more details on file permissions, and how the mode is applied, see the docs.


Run tree to confirm that the right permissions were applied:


$ tree -p -i .
.
[drwxrwx---]  2018
[drwxrwx---]  10
[drwxrwx---]  05

This prints out a directory tree of the current directory. tree is normally used to list contents of directories in a tree-like format. Passing the -p and -i arguments to it prints out the directory names and their file permission information in a vertical list. -p prints out the file permissions, and -i makes tree produce a vertical list without indentation lines.


As you can see, all of the directories have 770 permissions. An alternative way to create directories is to use .mkdir() from pathlib.Path:

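A sketch of the pathlib version:

from pathlib import Path

p = Path('2018/10/05')
p.mkdir(parents=True)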

Passing parents=True to Path.mkdir() makes it create the directory 05 and any parent directories necessary to make the path valid.


By default, os.makedirs() and Path.mkdir() raise an OSError if the target directory already exists. This behavior can be overridden (as of Python 3.2) by passing exist_ok=True as a keyword argument when calling each function.


Running the code above produces a directory structure like the one below in one go:


.
└── 2018
    └── 10
        └── 05

I prefer using pathlib when creating directories because I can use the same function to create single or nested directories.


Filename Pattern Matching

After getting a list of files in a directory using one of the methods above, you will most probably want to search for files that match a particular pattern.


These are the methods and functions available to you:


  • endswith() and startswith() string methods
  • fnmatch.fnmatch()
  • glob.glob()
  • pathlib.Path.glob()
  • endswith()startswith()字符串方法
  • fnmatch.fnmatch()
  • glob.glob()
  • pathlib.Path.glob()

Each of these is discussed below. The examples in this section will be performed on a directory called some_directory that has the following structure:

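The original tree isn't reproduced here, but based on the shell commands below it would look roughly like this:

.
├── admin.py
├── data_01.txt
├── data_01_backup.txt
├── data_02.txt
├── data_02_backup.txt
├── data_03.txt
├── data_03_backup.txt
├── sub_dir
│   ├── file1.py
│   └── file2.py
└── tests.py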

If you’re following along using a Bash shell, you can create the above directory structure using the following commands:


$ mkdir some_directory
$ cd some_directory/
$ mkdir sub_dir
$ touch sub_dir/file1.py sub_dir/file2.py
$ touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py

This will create the some_directory/ directory, change into it, and then create sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line creates all the other files using expansion. To learn more about shell expansion, visit this site.


Using String Methods

Python has several built-in methods for modifying and manipulating strings. Two of these methods, .startswith() and .endswith(), are useful when you’re searching for patterns in filenames. To do this, first get a directory listing and then iterate over it:


>>> import os

>>> # Get .txt files
>>> for f_name in os.listdir('some_directory'):
...     if f_name.endswith('.txt'):
...         print(f_name)

The code above finds all the files in some_directory/, iterates over them and uses .endswith() to print out the filenames that have the .txt file extension. Running this on my computer produces the following output:

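With the files created by the shell commands above, the six .txt files are printed (order may vary):

data_01.txt
data_03.txt
data_03_backup.txt
data_02_backup.txt
data_02.txt
data_01_backup.txt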

Simple Filename Pattern Matching Using fnmatch

String methods are limited in their matching abilities. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), a function that supports the use of wildcards such as * and ? to match filenames. For example, in order to find all .txt files in a directory using fnmatch, you would do the following:


>>> import os
>>> import fnmatch

>>> for file_name in os.listdir('some_directory/'):
...     if fnmatch.fnmatch(file_name, '*.txt'):
...         print(file_name)

This iterates over the list of files in some_directory and uses .fnmatch() to perform a wildcard search for files that have the .txt extension.


More Advanced Pattern Matching

Let’s suppose you want to find .txt files that meet certain criteria. For example, you could be only interested in finding .txt files that contain the word data, a number between a set of underscores, and the word backup in their filename. Something similar to data_01_backup, data_02_backup, or data_03_backup.


Using fnmatch.fnmatch(), you could do it this way:


>>> for filename in os.listdir('.'):
...     if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
...         print(filename)

Here, you print only the names of files that match the data_*_backup.txt pattern. The asterisk in the pattern will match any sequence of characters, so running this will find all text files whose filenames start with the word data and end in backup.txt, as you can see from the output below:


data_03_backup.txt
data_02_backup.txt
data_01_backup.txt

Filename Pattern Matching Using glob

Another useful module for pattern matching is glob.


.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special.

.glob()glob模块的工作原理就像fnmatch.fnmatch()但与fnmatch.fnmatch()它把一个周期(开头的文件. )特殊。

UNIX and related systems translate name patterns with wildcards like ? and * into a list of files. This is called globbing.


For example, typing mv *.py python_files/ in a UNIX shell moves (mv) all files with the .py extension from the current directory to the directory python_files. The * character is a wildcard that means “any number of characters,” and *.py is the glob pattern. This shell capability is not available in the Windows Operating System. The glob module adds this capability in Python, which enables Windows programs to use this feature.


Here’s an example of how to use glob to search for all Python (.py) source files in the current directory:


>>> import glob
>>> glob.glob('*.py')
['admin.py', 'tests.py']

glob.glob('*.py') searches for all files that have the .py extension in the current directory and returns them as a list. glob also supports shell-style wildcards to match patterns:


>>> import glob
>>> for name in glob.glob('*[0-9]*.txt'):
...     print(name)

This finds all text (.txt) files that contain digits in the filename:


glob makes it easy to search for files recursively in subdirectories too:


>>> import glob
>>> for file in glob.iglob('**/*.py', recursive=True):
...     print(file)

This example makes use of glob.iglob() to search for .py files in the current directory and subdirectories. Passing recursive=True as an argument to .iglob() makes it search for .py files in the current directory and any subdirectories. The difference between glob.iglob() and glob.glob() is that .iglob() returns an iterator instead of a list.


Running the program above produces the following:


admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py

pathlib contains similar methods for making flexible file listings. The example below shows how you can use .Path.glob() to list file types that start with the letter p:


>>> from pathlib import Path
>>> p = Path('.')
>>> for name in p.glob('*.p*'):
...     print(name)

admin.py
scraper.py
docs.pdf

Calling p.glob('*.p*') returns a generator object that points to all files in the current directory that start with the letter p in their file extension.


Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into one single module, which makes it a joy to use.


To recap, here is a table of the functions we have covered in this section:


Function                              Description
startswith()                          Tests if a string starts with a specified pattern and returns True or False
endswith()                            Tests if a string ends with a specified pattern and returns True or False
fnmatch.fnmatch(filename, pattern)    Tests whether the filename matches the pattern and returns True or False
glob.glob()                           Returns a list of filenames that match a pattern
pathlib.Path.glob()                   Finds patterns in path names and returns a generator object

Traversing Directories and Processing Files

A common programming task is walking a directory tree and processing files in the tree. Let’s explore how the built-in Python function os.walk() can be used to do this. os.walk() is used to generate filenames in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we’ll be manipulating the following directory tree:


The following is an example that shows you how to list all files and directories in a directory tree using os.walk().


os.walk() defaults to traversing directories in a top-down manner:


import os

# Walking a directory tree and printing the names of the directories and files
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

os.walk() returns three values on each iteration of the loop:


  1. The name of the current folder

  2. A list of folders in the current folder

  3. A list of files in the current folder


On each iteration, it prints out the names of the subdirectories and files it finds:


To traverse the directory tree in a bottom-up manner, pass in a topdown=False keyword argument to os.walk():


for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Passing the topdown=False argument will make os.walk() print out the files it finds in the subdirectories first:


As you can see, the program started by listing the contents of the subdirectories before listing the contents of the root directory. This is very useful in situations where you want to recursively delete files and directories. You will learn how to do this in the sections below. By default, os.walk does not walk down into symbolic links that resolve to directories. This behavior can be overridden by calling it with a followlinks=True argument.


Making Temporary Files and Directories

Python provides a handy module for creating temporary files and directories called tempfile.


tempfile can be used to open and store data temporarily in a file or directory while your program is running. tempfile handles the deletion of the temporary files when your program is done with them.


Here’s how to create a temporary file:


from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello universe!')

# Go back to the beginning and read data from file
fp.seek(0)
data = fp.read()

# Close the file, after which it will be removed
fp.close()

The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object by calling TemporaryFile() and passing the mode you want to open the file in. This will create and open a file that can be used as a temporary storage area.


In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a filename since it will be destroyed after the script is done running.


After writing to the file, you can read from it and close it when you’re done processing it. Once the file is closed, it will be deleted from the filesystem. If you need to name the temporary files produced using tempfile, use tempfile.NamedTemporaryFile().


The temporary files and directories created using tempfile are stored in a special system directory for storing temporary files. Python searches a standard list of directories to find one that the user can create files in.


On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. As a last resort, tempfile will save temporary files and directories in the current directory.


.TemporaryFile() is also a context manager so it can be used in conjunction with the with statement. Using a context manager takes care of closing and deleting the file automatically after it has been read:

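A sketch of the context-manager version:

from tempfile import TemporaryFile

# The file is closed and removed automatically when the block exits
with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    data = fp.read()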

This creates a temporary file and reads data from it. As soon as the file’s contents are read, the temporary file is closed and deleted from the file system.


tempfile can also be used to create temporary directories. Let’s look at how you can do this using tempfile.TemporaryDirectory():


>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     print('Created temporary directory ', tmpdir)
...     os.path.exists(tmpdir)
...
Created temporary directory  /tmp/tmpoxbkrm6c
True

>>> # Directory contents have been removed
...
>>> tmpdir
'/tmp/tmpoxbkrm6c'
>>> os.path.exists(tmpdir)
False


Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing this directory. In the example above, the directory is created using a context manager, and the name of the directory is stored in tmpdir. The third line prints out the name of the temporary directory, and os.path.exists(tmpdir) confirms if the directory was actually created in the file system.


After the context manager goes out of context, the temporary directory is deleted and a call to os.path.exists(tmpdir) returns False, which means that the directory was successfully deleted.


Deleting Files and Directories

You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.


Deleting Files in Python

To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().


os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:

os.remove()os.unlink()在语义上是相同的。 要使用os.remove()删除文件,请执行以下操作:

import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)

Deleting a file using os.unlink() is similar to how you do it using os.remove():

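For example, a sketch using the same home/data.txt path as the snippet below:

import os

data_file = 'home/data.txt'
os.unlink(data_file)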

Calling .unlink() or .remove() on a file deletes the file from the filesystem. These two functions will throw an OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you’re trying to delete is actually a file and only delete it if it is, or you can use exception handling to handle the OSError:


import os

data_file = 'home/data.txt'

# If the file exists, delete it
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')

os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.


The following example shows how to use exception handling to handle errors when deleting files:

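A sketch of that approach:

import os

data_file = 'home/data.txt'

try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')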

The code above attempts to delete the file without checking its type first. If data_file isn’t actually a file, the OSError that is thrown is handled in the except clause, and an error message is printed to the console. The error message that gets printed out is formatted using Python f-strings.


Finally, you can also use pathlib.Path.unlink() to delete files:


from pathlib import Path

data_file = Path('home/data.txt')

try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')

This creates a Path object called data_file that points to a file. Calling .unlink() on data_file will delete home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it. If the user does not have permission to delete the file, a PermissionError is raised.


Deleting Directories

The standard library offers the following functions for deleting directories:


  • os.rmdir()
  • pathlib.Path.rmdir()
  • shutil.rmtree()

To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work if the directory you’re trying to delete is empty. If the directory isn’t empty, an OSError is raised. Here is how to delete a folder:

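A sketch, using the my_documents/bad_dir path that appears in the error output below:

import os

trash_dir = 'my_documents/bad_dir'
os.rmdir(trash_dir)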

Here, the trash_dir directory is deleted by passing its path to os.rmdir(). If the directory isn’t empty, an error message is printed to the screen:


Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'

Alternatively, you can use pathlib to delete directories:


from pathlib import Path

trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Here, you create a Path object that points to the directory to be deleted. Calling .rmdir() on the Path object will delete it if it is empty.


Deleting Entire Directory Trees

To delete non-empty directories and entire directory trees, Python offers shutil.rmtree():

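A sketch, reusing the same trash_dir path:

import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')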

Everything in trash_dir is deleted when shutil.rmtree() is called on it. There may be cases where you want to delete empty folders recursively. You can do this using one of the methods discussed above in conjunction with os.walk():


import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass

This walks down the directory tree and tries to delete each directory it finds. If the directory isn’t empty, an OSError is raised and that directory is skipped. The table below lists the functions covered in this section:


Function                  Description
os.remove()               Deletes a file and does not delete directories
os.unlink()               Is identical to os.remove() and deletes a single file
pathlib.Path.unlink()     Deletes a file and cannot delete directories
os.rmdir()                Deletes an empty directory
pathlib.Path.rmdir()      Deletes an empty directory
shutil.rmtree()           Deletes entire directory tree and can be used to delete non-empty directories

Copying, Moving, and Renaming Files and Directories

Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removal of files and directories. In this section, you’ll learn how to move and copy files and directories.


Copying Files in Python

shutil offers a couple of functions for copying files. The most commonly used functions are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:

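A sketch, using the same placeholder paths as the copy2() example below:

import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)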

shutil.copy() is comparable to the cp command in UNIX based systems. shutil.copy(src, dst) will copy the file src to the location specified in dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, then src will be copied into that directory. shutil.copy() only copies the file’s contents and the file’s permissions. Other metadata like the file’s creation and modification times are not preserved.


To preserve all file metadata when copying, use shutil.copy2():


import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)

Using .copy2() preserves details about the file such as last access time, permission bits, last modification time, and flags.


Copying Directories

While shutil.copy() only copies a single file, shutil.copytree() will copy an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.


Here’s an example of how to copy the contents of one folder to a different location:


>>> import shutil
>>> shutil.copytree('data_1', 'data1_backup')
'data1_backup'

In this example, .copytree() copies the contents of data_1 to a new location data1_backup and returns the destination directory. The destination directory must not already exist. It will be created, along with any missing parent directories. shutil.copytree() is a good way to back up your files.


Moving Files and Directories

To move a file or directory to another location, use shutil.move(src, dst).


src is the file or directory to be moved and dst is the destination:


>>> import shutil
>>> shutil.move('dir_1/', 'backup/')
'backup'

shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/ if backup/ exists. If backup/ does not exist, dir_1/ will be renamed to backup.


Renaming Files and Directories

Python includes os.rename(src, dst) for renaming files and directories:


>>> os.rename('first.zip', 'first_01.zip')

The line above will rename first.zip to first_01.zip. If the destination path points to a directory, it will raise an OSError.


Another way to rename files or directories is to use rename() from the pathlib module:


>>> from pathlib import Path
>>> data_file = Path('data_01.txt')
>>> data_file.rename('data.txt')

To rename files using pathlib, you first create a pathlib.Path() object that contains a path to the file you want to replace. The next step is to call rename() on the path object and pass a new filename for the file or directory you’re renaming.


Archiving

Archives are a convenient way to package several files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create, read, and extract data from archives. You will learn how to read and write to both archive formats in this section.


Reading ZIP Files

The zipfile module is a low level module that is part of the Python Standard Library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is to create a ZipFile object. ZipFile objects are similar to file objects created using open(). ZipFile is also a context manager and therefore supports the with statement:

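A minimal sketch of opening an archive this way (data.zip is the archive described below; zipobj is just an illustrative name):

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass  # the archive's contents can be inspected here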

Here, you create a ZipFile object, passing in the name of the ZIP file to open in read mode. After opening a ZIP file, information about the archive can be accessed through functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data that contains a total of 5 files and 1 subdirectory:


.
├── file1.py
├── file2.py
├── file3.py
└── sub_dir
    ├── bar.py
    └── foo.py

1 directory, 5 files

To get a list of files in the archive, call namelist() on the ZipFile object:

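A sketch of that call, reopening data.zip as above:

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    print(zipobj.namelist())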

This produces a list:


['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

.namelist() returns a list of names of the files and directories in the archive. To retrieve information about the files in the archive, use .getinfo():

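A sketch that looks up sub_dir/bar.py, the member examined later in this section, and prints its .file_size:

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.file_size)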

Here’s the output:


15277

.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, you pass its path as an argument to .getinfo(). Using getinfo(), you’re able to retrieve information about archive members such as the date the files were last modified, their compressed sizes, and their full filenames. Accessing .file_size retrieves the file’s original size in bytes.


The following example shows how to retrieve more details about archived files in a Python REPL. Assume that the zipfile module has been imported and bar_info is the same object you created in previous examples:


>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'

bar_info contains details about bar.py such as its size when compressed and its full path.


The first line shows how to retrieve a file’s last modified date. The next line shows how to get the size of the file after compression. The last line shows the full path of bar.py in the archive.


ZipFile supports the context manager protocol, which is why you’re able to use it with the with statement. Doing this automatically closes the ZipFile object after you’re done with it. Trying to open or extract files from a closed ZipFile object will result in an error.


Extracting ZIP Archives

The zipfile module allows you to extract one or more files from ZIP archives through .extract() and .extractall().


These methods extract files to the current directory by default. They both take an optional path parameter that allows you to specify a different directory to extract files to. If the directory does not exist, it is automatically created. To extract files from the archive, do the following:


>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> # Extract a single file to current directory
...
>>> data_zip.extract('file1.py')
'/home/terra/test/dir1/zip_extract/file1.py'
>>> os.listdir('.')
['file1.py', 'data.zip']
>>> # Extract all files into a different directory
...
>>> data_zip.extractall(path='extract_dir/')
>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']
>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']
>>> data_zip.close()

The third line of code is a call to os.listdir(), which shows that the current directory has only one file, data.zip.


Next, you open data.zip in read mode and call .extract() to extract file1.py from it. .extract() returns the full file path of the extracted file. Since there’s no path specified, .extract() extracts file1.py to the current directory.


The next line prints a directory listing showing that the current directory now includes the extracted file in addition to the original archive. The line after that shows how to extract the entire archive into the extract_dir directory. .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.

Extracting Data From Password-Protected Archives

zipfile supports extracting password-protected ZIPs. To extract password-protected ZIP files, pass the password to the .extract() or .extractall() method as an argument:

>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract from a password-protected archive (the password must be bytes)
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')

This opens the secret.zip archive in read mode. A password is supplied to .extractall(), and the archive contents are extracted to extract_dir. The archive is closed automatically after the extraction is complete thanks to the with statement.


Creating New ZIP Archives

To create a new ZIP archive, you open a ZipFile object in write mode (w) and add the files you want to archive:


>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)

In the example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement suite is finished, new_zip is closed. Opening a ZIP file in write mode erases the contents of the archive and creates a new archive.


To add files to an existing archive, open a ZipFile object in append mode and then add the files:


>>> # Open a ZipFile object in append mode
...
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')

Here, you open the new.zip archive you created in the previous example in append mode. Opening the ZipFile object in append mode allows you to add new files to the ZIP file without deleting its current contents. After adding files to the ZIP file, the with statement goes out of context and closes the ZIP file.


Opening TAR Archives

TAR files are uncompressed file archives, similar in purpose to ZIP files. They can be compressed using gzip, bzip2, and lzma compression methods. The TarFile class allows reading and writing of TAR archives.

Do this to read from an archive:

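A minimal sketch, assuming the example.tar archive used throughout this section:

import tarfile

# Open the archive for reading; the with block closes it automatically
with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())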

You open tarfile objects much like other file-like objects: the tarfile module has an open() function that takes a mode argument determining how the archive is opened.

Use the 'r', 'w' or 'a' modes to open an uncompressed TAR file for reading, writing, and appending, respectively. To open compressed TAR files, pass in a mode argument to tarfile.open() that is in the form filemode[:compression]. The table below lists the possible modes TAR files can be opened in:


Mode    Action
r       Opens archive for reading with transparent compression
r:gz    Opens archive for reading with gzip compression
r:bz2   Opens archive for reading with bzip2 compression
r:xz    Opens archive for reading with lzma compression
w       Opens archive for uncompressed writing
w:gz    Opens archive for gzip compressed writing
w:xz    Opens archive for lzma compressed writing
a       Opens archive for appending with no compression

.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files in it, use .getnames():


>>> import tarfile

>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']

This returns a list with the names of the archive contents.


Note: For the purposes of showing you how to use different tarfile object methods, the TAR file in the examples is opened and closed manually in an interactive REPL session.


Interacting with the TAR file this way allows you to see the output of running each command. Normally, you would want to use a context manager to open file-like objects.


The metadata of each entry in the archive can be accessed using special attributes:


>>> import time

>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size    :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov  1 09:09:51 2018
 Size    : 402 bytes

README.md
 Modified: Sat Nov  3 07:29:40 2018
 Size    : 5426 bytes

app.py
 Modified: Sat Nov  3 07:29:13 2018
 Size    : 6218 bytes

In this example, you loop through the list of files returned by .getmembers() and print out each file’s attributes. The objects returned by .getmembers() have attributes that can be accessed programmatically such as the name, size, and last modified time of each of the files in the archive. After reading or writing to the archive, it must be closed to free up system resources.


Extracting Files From a TAR Archive

In this section, you’ll learn how to extract files from TAR archives using the following methods:


  • .extract()
  • .extractfile()
  • .extractall()

To extract a single file from a TAR archive, use .extract(), passing in the filename:

>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']

The README.md file is extracted from the archive to the filesystem. Calling os.listdir() confirms that the README.md file was successfully extracted into the current directory. To unpack or extract everything from the archive, use .extractall():

>>> tar.extractall(path="extracted/")

.extractall() has an optional path argument to specify where extracted files should go. Here, the archive is unpacked into the extracted directory. The following commands show that the archive was successfully extracted:


$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
│   ├── app.py
│   ├── CONTRIBUTING.rst
│   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md

To extract a file object for reading or writing, use .extractfile(), which takes a filename or TarInfo object to extract as an argument. .extractfile() returns a file-like object that can be read and used:


>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()

Opened archives should always be closed after they have been read or written to. To close an archive, call .close() on the archive file handle or use the with statement when creating tarfile objects to automatically close the archive when you’re done. This frees up system resources and writes any changes you made to the archive to the filesystem.


Creating New TAR Archives

Here’s how to create a new TAR archive:

>>> import tarfile

>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py

First, you make a list of files to be added to the archive so that you don’t have to add each file manually.


The next line uses the with context manager to open a new archive called packages.tar in write mode. Opening an archive in write mode ('w') enables you to write new files to the archive. Any existing files in the archive are deleted and a new archive is created.

After the archive is created and populated, the with context manager automatically closes it and saves it to the filesystem. The last three lines open the archive you just created and print out the names of the files contained in it.


To add new files to an existing archive, open the archive in append mode ('a'):


>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar

Opening an archive in append mode allows you to add new files to it without deleting the ones already in it.


Working With Compressed Archives

tarfile can also read and write TAR archives compressed using gzip, bzip2, and lzma compression. To read or write to a compressed archive, use tarfile.open(), passing in the appropriate mode for the compression type.


For example, to read or write data to a TAR archive compressed using gzip, use the 'r:gz' or 'w:gz' modes respectively:


>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     for file in files:
...         tar.add(file)

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py

The 'w:gz' mode opens the archive for gzip compressed writing and 'r:gz' opens the archive for gzip compressed reading. Opening compressed archives in append mode is not possible. To add files to a compressed archive, you have to create a new archive.

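If you need to add a file to a compressed archive, one workaround is to rebuild it: copy every member of the old archive into a new one and then add the extra file. The file names below are just an illustration:

import tarfile

with tarfile.open('packages.tar.gz', 'r:gz') as old_tar:
    with tarfile.open('packages_new.tar.gz', 'w:gz') as new_tar:
        # Copy each existing member, streaming its contents from the old archive
        for member in old_tar.getmembers():
            new_tar.addfile(member, old_tar.extractfile(member))
        # Add the new file (hypothetical name)
        new_tar.add('extra.py')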

An Easier Way of Creating Archives

The Python Standard Library also supports creating TAR and ZIP archives using the high-level methods in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower level tarfile and zipfile modules.


Working With Archives Using shutil.make_archive()


shutil.make_archive() takes at least two arguments: the name of the archive and an archive format.


By default, it archives all the files in the current directory into the archive format specified in the format argument. You can pass in an optional root_dir argument to archive files from a different directory. .make_archive() supports the zip, tar, bztar, and gztar archive formats.

This is how to create a TAR archive using shutil:

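A minimal sketch of that call, assuming the files to archive live in a data/ subdirectory; make_archive() appends the extension, so 'backup' becomes backup.tar:

import shutil

# (base_name, format, root_dir)
shutil.make_archive('backup', 'tar', root_dir='data/')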

This archives everything in data/, creates a file called backup.tar in the filesystem, and returns the archive’s name. To extract the archive, call .unpack_archive():

shutil.unpack_archive('backup.tar', 'extract_dir/')

Calling .unpack_archive() and passing in an archive name and destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.

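As a quick sketch, passing 'zip' as the format produces a ZIP archive instead, and .unpack_archive() guesses the format from the file extension:

import shutil

shutil.make_archive('backup', 'zip', root_dir='data/')
shutil.unpack_archive('backup.zip', 'extract_dir/')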

Reading Multiple Files

Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module allows you to loop over the contents of one or more text files quickly and easily. Here’s the typical way fileinput is used:

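Here’s a minimal sketch of that typical pattern; it simply echoes every line it reads back to standard output:

import fileinput

for line in fileinput.input():
    # Lines come from the files named on the command line, or from stdin
    print(line, end='')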

fileinput gets its input from command line arguments passed to sys.argv by default.


Using fileinput to Loop Over Multiple Files


Let’s use fileinput to build a crude version of the common UNIX utility cat. The cat utility reads files sequentially, writing them to standard output. When given more than one file in its command line arguments, cat will concatenate the text files and display the result in the terminal:


# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()

Running this on two text files in my current directory produces the following output:

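The exact output depends on the files you pass in; with two hypothetical files named notes.txt and todo.txt, it would look something like this:

$ python3 fileinput-example.py notes.txt todo.txt

--- Reading notes.txt ---
 -> first line of notes.txt
 -> second line of notes.txt

--- Reading todo.txt ---
 -> first line of todo.txt
 -> second line of todo.txt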

fileinput allows you to retrieve more information about each line such as whether or not it is the first line (.isfirstline()), the line number (.lineno()), and the filename (.filename()). You can read more about these in the fileinput module documentation.

Conclusion

You now know how to use Python to perform the most common operations on files and groups of files. You’ve learned about the different built-in modules used to read, find, and manipulate them.


You’re now equipped to use Python to:


  • Get directory contents and file properties
  • Create directories and directory trees
  • Find patterns in filenames
  • Create temporary files and directories
  • Move, rename, copy, and delete files or directories
  • Read and extract data from different types of archives
  • Read multiple files simultaneously using fileinput

Translated from: https://www.pybloggers.com/2019/01/working-with-files-in-python/
