使用python读写文件_使用Python读写文件(指南)

使用python读写文件

One of the most common tasks that you can do with Python is reading and writing files. Whether it’s writing to a simple text file, reading a complicated server log, or even analyzing raw byte data, all of these situations require reading or writing a file.

使用Python可以执行的最常见任务之一是读写文件。 无论是写入简单的文本文件,读取复杂的服务器日志,还是分析原始字节数据,所有这些情况都需要读取或写入文件。

In this tutorial, you’ll learn:

在本教程中,您将学习:

  • What makes up a file and why that’s important in Python
  • The basics of reading and writing files in Python
  • Some basic scenarios of reading and writing files
  • 组成文件的原因以及为什么这在Python中很重要
  • 使用Python读写文件的基础
  • 读写文件的一些基本方案

This tutorial is mainly for beginner to intermediate Pythonistas, but there are some tips in here that more advanced programmers may appreciate as well.

本教程主要针对初学者和中级Pythonista,但是这里有一些技巧,更高级的程序员也可能会喜欢。

Free Bonus: Click here to get our free Python Cheat Sheet that shows you the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

免费红利: 单击此处可获得我们的免费Python备忘单 ,其中显示了Python 3的基础知识,例如使用数据类型,字典,列表和Python函数。

Take the Quiz: Test your knowledge with our interactive “Reading and Writing Files in Python” quiz. Upon completion you will receive a score so you can track your learning progress over time:

参加测验:通过我们的交互式“用Python读写文件”测验来测试您的知识。 完成后,您将获得一个分数,因此您可以跟踪一段时间内的学习进度:

Take the Quiz »

参加测验»

什么是文件? (What Is a File?)

Before we can go into how to work with files in Python, it’s important to understand what exactly a file is and how modern operating systems handle some of their aspects.

在开始使用Python处理文件之前,了解文件的确切含义以及现代操作系统如何处理其某些方面非常重要。

At its core, a file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything as simple as a text file or as complicated as a program executable. In the end, these byte files are then translated into binary 1 and 0 for easier processing by the computer.

文件的核心是用于存储数据的一组连续字节。 此数据以特定格式组织,并且可以像文本文件一样简单,也可以像程序可执行文件一样复杂。 最后,然后将这些字节文件转换为二进制10以方便计算机处理。

Files on most modern file systems are composed of three main parts:

大多数现代文件系统上的文件都由三个主要部分组成:

  1. Header: metadata about the contents of the file (file name, size, type, and so on)
  2. Data: contents of the file as written by the creator or editor
  3. End of file (EOF): special character that indicates the end of the file
  1. 标头:有关文件内容的元数据(文件名,大小,类型等)
  2. 数据:由创建者或编辑者编写的文件内容
  3. 文件结尾(EOF):指示文件结尾的特殊字符

使用python读写文件_使用Python读写文件(指南)_第1张图片

What this data represents depends on the format specification used, which is typically represented by an extension. For example, a file that has an extension of .gif most likely conforms to the Graphics Interchange Format specification. There are hundreds, if not thousands, of file extensions out there. For this tutorial, you’ll only deal with .txt or .csv file extensions.

该数据表示什么取决于所使用的格式规范,该规范通常由扩展名表示。 例如,扩展名为.gif的文件很可能符合“ 图形交换格式”规范。 有数百个(如果不是数千个) 文件扩展名 。 在本教程中,您将仅处理.txt.csv文件扩展名。

文件路径 (File Paths)

When you access a file on an operating system, a file path is required. The file path is a string that represents the location of a file. It’s broken up into three major parts:

当您在操作系统上访问文件时,需要文件路径。 文件路径是代表文件位置的字符串。 它分为三个主要部分:

  1. Folder Path: the file folder location on the file system where subsequent folders are separated by a forward slash / (Unix) or backslash (Windows)
  2. File Name: the actual name of the file
  3. Extension: the end of the file path pre-pended with a period (.) used to indicate the file type
  1. 文件夹路径:文件系统上的文件夹位置,后续文件夹之间用正斜杠/ (Unix)或反斜杠分隔 (视窗)
  2. 文件名:文件的实际名称
  3. 扩展名:文件路径的末尾加句号( . ),用于指示文件类型

Here’s a quick example. Let’s say you have a file located within a file structure like this:

这是一个简单的例子。 假设您有一个位于以下文件结构中的文件:

/
│
├── path/
|   │
│   ├── to/
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt
|
└── animals.csv
/
│
├── path/
|   │
│   ├── to/
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt
|
└── animals.csv

Let’s say you wanted to access the cats.gif file, and your current location was in the same folder as path. In order to access the file, you need to go through the path folder and then the to folder, finally arriving at the cats.gif file. The Folder Path is path/to/. The File Name is cats. The File Extension is .gif. So the full path is path/to/cats.gif.

假设您要访问cats.gif文件,并且您当前的位置与path在同一文件夹中。 为了访问该文件,您需要依次浏览path文件夹和to文件夹,最后到达cats.gif文件。 文件夹路径是path/to/ 。 文件名是cats 。 文件扩展名是.gif 。 因此,完整路径为path/to/cats.gif

Now let’s say that your current location or current working directory (cwd) is in the to folder of our example folder structure. Instead of referring to the cats.gif by the full path of path/to/cats.gif, the file can be simply referenced by the file name and extension cats.gif.

现在,假设您的当前位置或当前工作目录(cwd)位于示例文件夹结构的to文件夹中。 而是指的cats.gif通过的完整路径path/to/cats.gif ,该文件可以简单地通过文件名和扩展名引用cats.gif

But what about dog_breeds.txt? How would you access that without using the full path? You can use the special characters double-dot (..) to move one directory up. This means that ../dog_breeds.txt will reference the dog_breeds.txt file from the directory of to:

但是dog_breeds.txt呢? 在不使用完整路径的情况下如何访问? 您可以使用特殊字符双点( .. )将一个目录上移。 这意味着, ../dog_breeds.txt将引用dog_breeds.txt从目录文件to

/
│
├── path/  ← Referencing this parent folder
|   │
|   ├── to/  ← Current working directory (cwd)
|   │   └── cats.gif
|   │
|   └── dog_breeds.txt  ← Accessing this file
|
└── animals.csv
/
│
├── path/  ← Referencing this parent folder
|   │
|   ├── to/  ← Current working directory (cwd)
|   │   └── cats.gif
|   │
|   └── dog_breeds.txt  ← Accessing this file
|
└── animals.csv

The double-dot (..) can be chained together to traverse multiple directories above the current directory. For example, to access animals.csv from the to folder, you would use ../../animals.csv.

可以将双点( .. )链接在一起,以遍历当前目录上方的多个目录。 例如,访问animals.csvto文件夹,您可以使用../../animals.csv

线尾 (Line Endings)

One problem often encountered when working with file data is the representation of a new line or line ending. The line ending has its roots from back in the Morse Code era, when a specific pro-sign was used to communicate the end of a transmission or the end of a line.

使用文件数据时经常遇到的一个问题是换行或换行的表示。 线路结尾的根源可以追溯到摩尔斯电码时代, 当时使用特定的符号来传达传输的结尾或线路的结尾 。

Later, this was standardized for teleprinters by both the International Organization for Standardization (ISO) and the American Standards Association (ASA). ASA standard states that line endings should use the sequence of the Carriage Return (CR or r) and the Line Feed (LF or n) characters (CR+LF or rn). The ISO standard however allowed for either the CR+LF characters or just the LF character.

后来,国际标准化组织(ISO)和美国标准协会(ASA) 对电传打印机进行了标准化。 ASA标准规定行尾应该使用回车(序列CRr和换行( LFn )字符( CR+LFrn )。 但是,ISO标准允许CR+LF字符或仅允许LF字符。

Windows uses the CR+LF characters to indicate a new line, while Unix and the newer Mac versions use just the LF character. This can cause some complications when you’re processing files on an operating system that is different than the file’s source. Here’s a quick example. Let’s say that we examine the file dog_breeds.txt that was created on a Windows system:

Windows使用CR+LF字符表示换行,而Unix和较新的Mac版本仅使用LF字符。 在与文件源不同的操作系统上处理文件时,这可能会导致一些复杂情况。 这是一个简单的例子。 假设我们检查了在Windows系统上创建的文件dog_breeds.txt

This same output will be interpreted on a Unix device differently:

在Unix设备上,相同的输出将被不同地解释:

Pugr
n
Jack Russel Terrierr
n
English Springer Spanielr
n
German Shepherdr
n
Staffordshire Bull Terrierr
n
Cavalier King Charles Spanielr
n
Golden Retrieverr
n
West Highland White Terrierr
n
Boxerr
n
Border Terrierr
n
Pugr
n
Jack Russel Terrierr
n
English Springer Spanielr
n
German Shepherdr
n
Staffordshire Bull Terrierr
n
Cavalier King Charles Spanielr
n
Golden Retrieverr
n
West Highland White Terrierr
n
Boxerr
n
Border Terrierr
n

This can make iterating over each line problematic, and you may need to account for situations like this.

这可能会使遍历每条线成为问题,并且您可能需要考虑这种情况。

字符编码 (Character Encodings)

Another common problem that you may face is the encoding of the byte data. An encoding is a translation from byte data to human readable characters. This is typically done by assigning a numerical value to represent a character. The two most common encodings are the ASCII and UNICODE Formats. ASCII can only store 128 characters, while Unicode can contain up to 1,114,112 characters.

您可能面临的另一个常见问题是字节数​​据的编码。 编码是从字节数据到人类可读字符的转换。 这通常是通过分配一个数值来表示一个字符来完成的。 两种最常见的编码是ASCII和UNICODE格式。 ASCII只能存储128个字符 ,而Unicode最多可以存储1,114,112个字符 。

ASCII is actually a subset of Unicode (UTF-8), meaning that ASCII and Unicode share the same numerical to character values. It’s important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of the character. For example, if a file was created using the UTF-8 encoding, and you try to parse it using the ASCII encoding, if there is a character that is outside of those 128 values, then an error will be thrown.

ASCII实际上是Unicode(UTF-8)的子集,这意味着ASCII和Unicode共享相同的数字到字符值。 重要的是要注意,使用错误的字符编码来解析文件会导致失败或字符的错误表示。 例如,如果文件是使用UTF-8编码创建的,而您尝试使用ASCII编码来分析文件,则如果这128个值之外的字符,则会引发错误。

用Python打开和关闭文件 (Opening and Closing a File in Python)

When you want to work with a file, the first thing to do is to open it. This is done by invoking the open() built-in function. open() has a single required argument that is the path to the file. open() has a single return, the file object:

当您要使用文件时,要做的第一件事就是打开它。 这是通过调用open()内置函数完成的 。 open()有一个必需的参数,它是文件的路径。 open()有一个返回值,即文件对象 :

After you open a file, the next thing to learn is how to close it.

打开文件后,接下来要学习的是如何关闭文件。

Warning: You should always make sure that an open file is properly closed.

警告:您应始终确保已正确关闭打开的文件。

It’s important to remember that it’s your responsibility to close the file. In most cases, upon termination of an application or script, a file will be closed eventually. However, there is no guarantee when exactly that will happen. This can lead to unwanted behavior including resource leaks. It’s also a best practice within Python (Pythonic) to make sure that your code behaves in a way that is well defined and reduces any unwanted behavior.

重要的是要记住,关闭文件是您的责任。 在大多数情况下,应用程序或脚本终止后,文件将最终关闭。 但是,不能保证确切的时间会发生。 这可能导致不良行为,包括资源泄漏。 这也是Python(Pythonic)中的最佳做法,以确保您的代码以定义良好的方式运行并减少任何不必要的行为。

When you’re manipulating a file, there are two ways that you can use to ensure that a file is closed properly, even when encountering an error. The first way to close a file is to use the try-finally block:

处理文件时,可以使用两种方法来确保文件正确关闭,即使遇到错误也是如此。 关闭文件的第一种方法是使用try-finally块:

 reader reader = = openopen (( 'dog_breeds.txt''dog_breeds.txt' )
)
trytry :
    :
    # Further file processing goes here
# Further file processing goes here
finallyfinally :
    :
    readerreader .. closeclose ()
()

If you’re unfamiliar with what the try-finally block is, check out Python Exceptions: An Introduction.

如果您不熟悉try-finally块是什么,请查看Python异常:简介 。

The second way to close a file is to use the with statement:

关闭文件的第二种方法是使用with语句:

The with statement automatically takes care of closing the file once it leaves the with block, even in cases of error. I highly recommend that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.

一旦文件离开with块,即使在发生错误的情况下, with语句也会自动负责关闭文件。 我强烈建议您尽可能使用with语句,因为它可以使代码更简洁,并使您更轻松地处理任何意外错误。

Most likely, you’ll also want to use the second positional argument, mode. This argument is a string that contains multiple characters to represent how you want to open the file. The default and most common is 'r', which represents opening the file in read-only mode as a text file:

最有可能的是,您还想使用第二个位置参数mode 。 此参数是一个包含多个字符的字符串,代表您要如何打开文件。 默认和最常见的是'r' ,它表示以只读模式将文件打开为文本文件:

 with with openopen (( 'dog_breeds.txt''dog_breeds.txt' , , 'r''r' ) ) as as readerreader :
    :
    # Further file processing goes here
# Further file processing goes here

Other options for modes are fully documented online, but the most commonly used ones are the following:

有关模式的其他选项已在网上全面记录 ,但最常用的选项如下:

Character 字符 Meaning 含义
'r''r' Open for reading (default) 打开阅读(默认)
'w''w' Open for writing, truncating (overwriting) the file first 打开进行写入,首先截断(覆盖)文件
'rb' or 'rb''wb''wb' Open in binary mode (read/write using byte data) 以二进制模式打开(使用字节数据进行读取/写入)

Let’s go back and talk a little about file objects. A file object is:

让我们回过头来谈谈文件对象。 文件对象是:

“an object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource.” (Source)

“一个将面向文件的API(使用诸如read()write() )公开给基础资源的对象。” ( 来源 )

There are three different categories of file objects:

共有三种不同的文件对象类别:

  • Text files
  • Buffered binary files
  • Raw binary files
  • 文字档
  • 缓冲的二进制文件
  • 原始二进制文件

Each of these file types are defined in the io module. Here’s a quick rundown of how everything lines up.

这些文件类型均在io模块中定义。 这是所有内容排列方式的简要概述。

文字档案类型 (Text File Types)

A text file is the most common file that you’ll encounter. Here are some examples of how these files are opened:

文本文件是您将遇到的最常见的文件。 下面是一些有关如何打开这些文件的示例:

With these types of files, open() will return a TextIOWrapper file object:

对于这些类型的文件, open()将返回TextIOWrapper文件对象:

>>>
>>> file = open('dog_breeds.txt')
>>> type(file)


>>>

This is the default file object returned by open().

这是open()返回的默认文件对象。

缓冲的二进制文件类型 (Buffered Binary File Types)

A buffered binary file type is used for reading and writing binary files. Here are some examples of how these files are opened:

缓冲的二进制文件类型用于读取和写入二进制文件。 下面是一些有关如何打开这些文件的示例:

 openopen (( 'abc.txt''abc.txt' , , 'rb''rb' )

)

openopen (( 'abc.txt''abc.txt' , , 'wb''wb' )
)

With these types of files, open() will return either a BufferedReader or BufferedWriter file object:

对于这些类型的文件, open()将返回BufferedReaderBufferedWriter文件对象:

>>>
>>>
 >>>  file = open ( 'dog_breeds.txt' , 'rb' )
>>>  type ( file )

>>>  file = open ( 'dog_breeds.txt' , 'wb' )
>>>  type ( file )


原始文件类型 (Raw File Types)

A raw file type is:

原始文件类型为:

“generally used as a low-level building-block for binary and text streams.” (Source)

“通常用作二进制和文本流的低级构建块。” ( 来源 )

It is therefore not typically used.

因此,通常不使用它。

Here’s an example of how these files are opened:

以下是如何打开这些文件的示例:

With these types of files, open() will return a FileIO file object:

对于这些类型的文件, open()将返回FileIO文件对象:

>>>
>>> file = open('dog_breeds.txt', 'rb', buffering=0)
>>> type(file)


>>>

读取和写入打开的文件 (Reading and Writing Opened Files)

Once you’ve opened up a file, you’ll want to read or write to the file. First off, let’s cover reading a file. There are multiple methods that can be called on a file object to help you out:

打开文件后,您需要读取或写入文件。 首先,让我们开始阅读文件。 可以在文件对象上调用多种方法来帮助您:

Method 方法 What It Does 它能做什么
.read(size=-1).read(size=-1) size bytes. If no argument is passed or size字节数从文件中读取。 如果未传递任何参数,或者传递了None or None-1 is passed, then the entire file is read.-1 ,则将读取整个文件。
.readline(size=-1).readline(size=-1) size number of characters from the line. This continues to the end of the line and then wraps back around. If no argument is passed or size的字符数。 这继续到行尾,然后回绕。 如果未传递任何参数,或者传递了None or None-1 is passed, then the entire line (or rest of the line) is read.-1 ,则将读取整行(或该行的其余部分)。
.readlines().readlines() This reads the remaining lines from the file object and returns them as a list. 这将从文件对象中读取剩余的行,并将它们作为列表返回。

Using the same dog_breeds.txt file you used above, let’s go through some examples of how to use these methods. Here’s an example of how to open and read the entire file using .read():

使用与上面使用的相同的dog_breeds.txt文件,让我们看一下如何使用这些方法的一些示例。 这是一个如何使用.read()打开和读取整个文件的.read()

>>>
>>> with open('dog_breeds.txt', 'r') as reader:
>>>     # Read & print the entire file
>>>     print(reader.read())
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier

>>>

Here’s an example of how to read 5 bytes of a line each time using .readline():

这是一个如何使用.readline()每次读取5个字节的示例:

>>>
>>> with open('dog_breeds.txt', 'r') as reader:
>>>     # Read & print the first 5 characters of the line 5 times
>>>     print(reader.readline(5))
>>>     # Notice that line is greater than the 5 chars and continues
>>>     # down the line, reading 5 chars each time until the end of the
>>>     # line and then "wraps" around
>>>     print(reader.readline(5))
>>>     print(reader.readline(5))
>>>     print(reader.readline(5))
>>>     print(reader.readline(5))
Pug

Jack
Russe
ll Te
rrier

>>>

Here’s an example of how to read the entire file as a list using .readlines():

这是一个如何使用.readlines()以列表形式读取整个文件的示例:

>>>
>>> f = open('dog_breeds.txt')
>>> f.readlines()  # Returns a list object
['Pugn', 'Jack Russel Terriern', 'English Springer Spanieln', 'German Shepherdn', 'Staffordshire Bull Terriern', 'Cavalier King Charles Spanieln', 'Golden Retrievern', 'West Highland White Terriern', 'Boxern', 'Border Terriern']

>>>

The above example can also be done by using list() to create a list out of the file object:

上面的示例也可以通过使用list()在文件对象之外创建列表来完成:

>>>
>>> f = open('dog_breeds.txt')
>>> list(f)
['Pugn', 'Jack Russel Terriern', 'English Springer Spanieln', 'German Shepherdn', 'Staffordshire Bull Terriern', 'Cavalier King Charles Spanieln', 'Golden Retrievern', 'West Highland White Terriern', 'Boxern', 'Border Terriern']

>>>

遍历文件中的每一行 (Iterating Over Each Line in the File)

A common thing to do while reading a file is to iterate over each line. Here’s an example of how to use .readline() to perform that iteration:

读取文件时的常见操作是遍历每一行。 这是一个如何使用.readline()进行迭代的示例:

>>>
>>> with open('dog_breeds.txt', 'r') as reader:
>>>     # Read and print the entire file line by line
>>>     line = reader.readline()
>>>     while line != '':  # The EOF char is an empty string
>>>         print(line, end='')
>>>         line = reader.readline()
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier

>>>

Another way you could iterate over each line in the file is to use the .readlines() of the file object. Remember, .readlines() returns a list where each element in the list represents a line in the file:

您可以遍历文件中每一行的另一种方法是使用文件对象的.readlines() 。 请记住, .readlines()返回一个列表,其中列表中的每个元素代表文件中的一行:

>>>
>>> with open('dog_breeds.txt', 'r') as reader:
>>>     for line in reader.readlines():
>>>         print(line, end='')
Pug
Jack Russell Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier

>>>

However, the above examples can be further simplified by iterating over the file object itself:

但是,可以通过遍历文件对象本身来进一步简化上述示例:

>>>
>>> with open('dog_breeds.txt', 'r') as reader:
>>>     # Read and print the entire file line by line
>>>     for line in reader:
>>>         print(line, end='')
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier

>>>

This final approach is more Pythonic and can be quicker and more memory efficient. Therefore, it is suggested you use this instead.

最后一种方法更像Python一样,可以更快,更高效地使用内存。 因此,建议您改用它。

Note: Some of the above examples contain print('some text', end=''). The end='' is to prevent Python from adding an additional newline to the text that is being printed and only print what is being read from the file.

注意:上面的一些示例包含print('some text', end='')end=''是为了防止Python向正在打印的文本添加额外的换行符,并且仅打印从文件中读取的内容。

Now let’s dive into writing files. As with reading files, file objects have multiple methods that are useful for writing to a file:

现在让我们开始编写文件。 与读取文件一样,文件对象具有多种方法,可用于写入文件:

Method 方法 What It Does 它能做什么
.write(string).write(string) This writes the string to the file. 这会将字符串写入文件。
.writelines(seq).writelines(seq) This writes the sequence to the file. No line endings are appended to each sequence item. It’s up to you to add the appropriate line ending(s). 这会将序列写入文件。 没有行尾添加到每个序列项。 您可以自行添加适当的行尾。

Here’s a quick example of using .write() and .writelines():

这是使用.write().writelines()的简单示例:

 with with openopen (( 'dog_breeds.txt''dog_breeds.txt' , , 'r''r' ) ) as as readerreader :
    :
    # Note: readlines doesn't trim the line endings
    # Note: readlines doesn't trim the line endings
    dog_breeds dog_breeds = = readerreader .. readlinesreadlines ()

()

with with openopen (( 'dog_breeds_reversed.txt''dog_breeds_reversed.txt' , , 'w''w' ) ) as as writerwriter :
    :
    # Alternatively you could use
    # Alternatively you could use
    # writer.writelines(reversed(dog_breeds))

    # writer.writelines(reversed(dog_breeds))

    # Write the dog breeds to the file in reversed order
    # Write the dog breeds to the file in reversed order
    for for breed breed in in reversedreversed (( dog_breedsdog_breeds ):
        ):
        writerwriter .. writewrite (( breedbreed )
)

使用字节 (Working With Bytes)

Sometimes, you may need to work with files using byte strings. This is done by adding the 'b' character to the mode argument. All of the same methods for the file object apply. However, each of the methods expect and return a bytes object instead:

有时,您可能需要使用字节字符串处理文件。 这是通过在mode参数中添加'b'字符来完成的。 适用于文件对象的所有相同方法。 但是,每个方法都期望并返回一个bytes对象:

>>>
>>>
 >>>  with open ( ` dog_breeds . txt ` , 'rb' ) as reader :
>>>     print ( reader . readline ())
b'Pugn'

Opening a text file using the b flag isn’t that interesting. Let’s say we have this cute picture of a Jack Russell Terrier (jack_russell.png):

使用b标志打开文本文件并不是很有趣。 假设我们有一张可爱的杰克罗素梗犬( jack_russell.png )的图片:

使用python读写文件_使用Python读写文件(指南)_第2张图片
CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons CC Media 3.0的CC BY 3.0(https://creativecommons.org/licenses/by/3.0)]

You can actually open that file in Python and examine the contents! Since the .png file format is well defined, the header of the file is 8 bytes broken up like this:

您实际上可以在Python中打开该文件并检查其内容! 由于.png文件格式定义良好,因此文件头为8个字节,如下所示:

Value Interpretation 解释
0x890x89 PNGPNG的开始
0x50 0x4E 0x470x50 0x4E 0x47 PNG in ASCIIPNG
0x0D 0x0A0x0D 0x0A rnrn结尾
0x1A0x1A A DOS style EOF character DOS风格的EOF字符
0x0A0x0A nn结尾的Unix样式行

Sure enough, when you open the file and read these bytes individually, you can see that this is indeed a .png header file:

果然,当您打开文件并逐个读取这些字节时,您可以看到这确实是一个.png头文件:

>>>
>>>
 >>>  with open ( 'jack_russell.png' , 'rb' ) as byte_reader :
>>>     print ( byte_reader . read ( 1 ))
>>>     print ( byte_reader . read ( 3 ))
>>>     print ( byte_reader . read ( 2 ))
>>>     print ( byte_reader . read ( 1 ))
>>>     print ( byte_reader . read ( 1 ))
b'x89'
b'PNG'
b'rn'
b'x1a'
b'n'

完整示例: dos2unix.py (A Full Example: dos2unix.py)

Let’s bring this whole thing home and look at a full example of how to read and write to a file. The following is a dos2unix like tool that will convert a file that contains line endings of rn to n.

让我们将整个过程带回家,看看如何读写文件的完整示例。 以下是类似dos2unix工具,它将包含rn尾行的文件转换为n

This tool is broken up into three major sections. The first is str2unix(), which converts a string from rn line endings to n. The second is dos2unix(), which converts a string that contains rn characters into n. dos2unix() calls str2unix(). Finally, there’s the __main__ block, which is called only when the file is executed as a script. Think of it as the main function found in other programming languages.

该工具分为三个主要部分。 第一个是str2unix() ,它将字符串从rn行尾转换为n 。 第二个是dos2unix() ,它将包含rn字符的字符串转换为ndos2unix()调用str2unix() 。 最后,还有__main__块,仅当文件作为脚本执行时才调用。 可以将其视为其他编程语言中的main功能。

技巧和窍门 (Tips and Tricks)

Now that you’ve mastered the basics of reading and writing files, here are some tips and tricks to help you grow your skills.

既然您已经掌握了读写文件的基础知识,那么以下是一些技巧和窍门,可以帮助您提高技能。

__file__ (__file__)

The __file__ attribute is a special attribute of modules, similar to __name__. It is:

__file__属性是模块的特殊属性 ,类似于__name__ 。 它是:

“the pathname of the file from which the module was loaded, if it was loaded from a file.” (Source

“如果从文件中加载了模块,则从中加载模块的文件的路径名。” ( 资料来源

Note: To re-iterate, __file__ returns the path relative to where the initial Python script was called. If you need the full system path, you can use os.getcwd() to get the current working directory of your executing code.

注意:要重申, __file__返回相对于调用初始Python脚本的位置的路径。 如果需要完整的系统路径,则可以使用os.getcwd()获取执行代码的当前工作目录。

Here’s a real world example. In one of my past jobs, I did multiple tests for a hardware device. Each test was written using a Python script with the test script file name used as a title. These scripts would then be executed and could print their status using the __file__ special attribute. Here’s an example folder structure:

这是一个真实的例子。 在过去的工作之一中,我对硬件设备进行了多次测试。 每个测试都是使用Python脚本编写的,并且以测试脚本文件名作为标题。 然后将执行这些脚本,并可以使用__file__特殊属性打印它们的状态。 这是一个示例文件夹结构:

project/
|
├── tests/
|   ├── test_commanding.py
|   ├── test_power.py
|   ├── test_wireHousing.py
|   └── test_leds.py
|
└── main.py
project/
|
├── tests/
|   ├── test_commanding.py
|   ├── test_power.py
|   ├── test_wireHousing.py
|   └── test_leds.py
|
└── main.py

Running main.py produces the following:

运行main.py会产生以下结果:

I was able to run and get the status of all my tests dynamically through use of the __file__ special attribute.

通过使用__file__特殊属性,我能够动态运行并获得所有测试的状态。

附加到文件 (Appending to a File)

Sometimes, you may want to append to a file or start writing at the end of an already populated file. This is easily done by using the 'a' character for the mode argument:

有时,您可能想要附加到文件或在已填充文件的末尾开始写入。 通过在mode参数中使用'a'字符可以轻松完成此mode

 with with openopen (( 'dog_breeds.txt''dog_breeds.txt' , , 'a''a' ) ) as as a_writera_writer :
    :
    a_writera_writer .. writewrite (( '' nn Beagle'Beagle' )
)

When you examine dog_breeds.txt again, you’ll see that the beginning of the file is unchanged and Beagle is now added to the end of the file:

再次检查dog_breeds.txt时,您会看到文件的开头未更改,并且Beagle现在已添加到文件的结尾:

>>>
>>>
 >>>  with open ( 'dog_breeds.txt' , 'r' ) as reader :
>>>     print ( reader . read ())
Pug
Jack Russel Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
Beagle

同时处理两个文件 (Working With Two Files at the Same Time)

There are times when you may want to read a file and write to another file at the same time. If you use the example that was shown when you were learning how to write to a file, it can actually be combined into the following:

有时您可能想同时读取文件和写入另一个文件。 如果使用在学习如何写入文件时显示的示例,则实际上可以将其合并为以下内容:

创建自己的上下文管理器 (Creating Your Own Context Manager)

There may come a time when you’ll need finer control of the file object by placing it inside a custom class. When you do this, using the with statement can no longer be used unless you add a few magic methods: __enter__ and __exit__. By adding these, you’ll have created what’s called a context manager.

有时候,您需要通过将文件对象放在自定义类中来对其进行更好的控制。 执行此操作时,除非再添加一些魔术方法__enter____exit__否则将不再使用with语句。 通过添加这些,您将创建所谓的上下文管理器 。

__enter__() is invoked when calling the with statement. __exit__() is called upon exiting from the with statement block.

调用with语句时将调用__enter__() 。 从with语句块退出时,将调用__exit__()

Here’s a template that you can use to make your custom class:

这是可以用来创建自定义类的模板:

 class class my_file_readermy_file_reader ():
    ():
    def def __init____init__ (( selfself , , file_pathfile_path ):
        ):
        selfself .. __path __path = = file_path
        file_path
        selfself .. __file_object __file_object = = None

    None

    def def __enter____enter__ (( selfself ):
        ):
        selfself .. __file_object __file_object = = openopen (( selfself .. __path__path )
        )
        return return self

    self

    def def __exit____exit__ (( selfself , , typetype , , valval , , tbtb ):
        ):
        selfself .. __file_object__file_object .. closeclose ()

    ()

    # Additional methods implemented below
# Additional methods implemented below

Now that you’ve got your custom class that is now a context manager, you can use it similarly to the open() built-in:

现在您有了自定义类,现在它是一个上下文管理器,可以像内置的open()一样使用它:

Here’s a good example. Remember the cute Jack Russell image we had? Perhaps you want to open other .png files but don’t want to parse the header file each time. Here’s an example of how to do this. This example also uses custom iterators. If you’re not familiar with them, check out Python Iterators:

这是一个很好的例子。 还记得我们拥有的可爱的杰克·罗素(Jack Russell)形象吗? 也许您想打开其他.png文件,但不想每次都解析头文件。 这是如何执行此操作的示例。 此示例还使用自定义迭代器。 如果您不熟悉它们,请查看Python迭代器 :

 class class PngReaderPngReader ():
    ():
    # Every .png file contains this in the header.  Use it to verify
    # Every .png file contains this in the header.  Use it to verify
    # the file is indeed a .png.
    # the file is indeed a .png.
    _expected_magic _expected_magic = = bb '' x89x89 PNGPNG rnx1anrnx1an '

    '

    def def __init____init__ (( selfself , , file_pathfile_path ):
        ):
        # Ensure the file has the right extension
        # Ensure the file has the right extension
        if if not not file_pathfile_path .. endswithendswith (( '.png''.png' ):
            ):
            raise raise NameErrorNameError (( "File must be a '.png' extension""File must be a '.png' extension" )
        )
        selfself .. __path __path = = file_path
        file_path
        selfself .. __file_object __file_object = = None

    None

    def def __enter____enter__ (( selfself ):
        ):
        selfself .. __file_object __file_object = = openopen (( selfself .. __path__path , , 'rb''rb' )

        )

        magic magic = = selfself .. __file_object__file_object .. readread (( 88 )
        )
        if if magic magic != != selfself .. _expected_magic_expected_magic :
            :
            raise raise TypeErrorTypeError (( "The File is not a properly formatted .png file!""The File is not a properly formatted .png file!" )

        )

        return return self

    self

    def def __exit____exit__ (( selfself , , typetype , , valval , , tbtb ):
        ):
        selfself .. __file_object__file_object .. closeclose ()

    ()

    def def __iter____iter__ (( selfself ):
        ):
        # This and __next__() are used to create a custom iterator
        # This and __next__() are used to create a custom iterator
        # See https://dbader.org/blog/python-iterators
        # See https://dbader.org/blog/python-iterators
        return return self

    self

    def def __next____next__ (( selfself ):
        ):
        # Read the file in "Chunks"
        # Read the file in "Chunks"
        # See https://en.wikipedia.org/wiki/Portable_Network_Graphics#%22Chunks%22_within_the_file

        # See https://en.wikipedia.org/wiki/Portable_Network_Graphics#%22Chunks%22_within_the_file

        initial_data initial_data = = selfself .. __file_object__file_object .. readread (( 44 )

        )

        # The file hasn't been opened or reached EOF.  This means we
        # The file hasn't been opened or reached EOF.  This means we
        # can't go any further so stop the iteration by raising the
        # can't go any further so stop the iteration by raising the
        # StopIteration.
        # StopIteration.
        if if selfself .. __file_object __file_object is is None None or or initial_data initial_data == == bb '''' :
            :
            raise raise StopIteration
        StopIteration
        elseelse :
            :
            # Each chunk has a len, type, data (based on len) and crc
            # Each chunk has a len, type, data (based on len) and crc
            # Grab these values and return them as a tuple
            # Grab these values and return them as a tuple
            chunk_len chunk_len = = intint .. from_bytesfrom_bytes (( initial_datainitial_data , , byteorderbyteorder == 'big''big' )
            )
            chunk_type chunk_type = = selfself .. __file_object__file_object .. readread (( 44 )
            )
            chunk_data chunk_data = = selfself .. __file_object__file_object .. readread (( chunk_lenchunk_len )
            )
            chunk_crc chunk_crc = = selfself .. __file_object__file_object .. readread (( 44 )
            )
            return return chunk_lenchunk_len , , chunk_typechunk_type , , chunk_datachunk_data , , chunk_crc
chunk_crc

You can now open .png files and properly parse them using your custom context manager:

现在,您可以打开.png文件,并使用自定义上下文管理器正确解析它们:

>>>
>>>
 >>>  with PngReader ( 'jack_russell.png' ) as reader :
>>>     for l , t , d , c in reader :
>>>         print ( f " {l:05} ,  {t} ,  {c} " )
00013, b'IHDR', b'vx121k'
00001, b'sRGB', b'xaexcex1cxe9'
00009, b'pHYs', b'(<]x19'
00345, b'iTXt', b"Lxc2'Y"
16384, b'IDAT', b'ix99x0c('
16384, b'IDAT', b'xb3xfax9a$'
16384, b'IDAT', b'xffxbfxd1n'
16384, b'IDAT', b'xc3x9cxb1}'
16384, b'IDAT', b'xe3x02xbax91'
16384, b'IDAT', b'xa0xa99='
16384, b'IDAT', b'xf4x8b.x92'
16384, b'IDAT', b'x17ixfcxde'
16384, b'IDAT', b'x8fbx0exe4'
16384, b'IDAT', b')3={'
01040, b'IDAT', b'xd6xb8xc1x9f'
00000, b'IEND', b'xaeB`x82'

不要重新发明蛇 (Don’t Re-Invent the Snake)

There are common situations that you may encounter while working with files. Most of these cases can be handled using other modules. Two common file types you may need to work with are .csv and .json. Real Python has already put together some great articles on how to handle these:

在处理文件时,可能会遇到常见的情况。 这些情况大多数都可以使用其他模块来处理。 您可能需要使用的两种常见文件类型是.csv.json 。 Real Python已经就如何处理这些问题撰写了一些很棒的文章:

  • Reading and Writing CSV Files in Python
  • Working With JSON Data in Python
  • 用Python读写CSV文件
  • 在Python中使用JSON数据

Additionally, there are built-in libraries out there that you can use to help you:

此外,还有内置库可用来帮助您:

  • wave: read and write WAV files (audio)
  • aifc: read and write AIFF and AIFC files (audio)
  • sunau: read and write Sun AU files
  • tarfile: read and write tar archive files
  • zipfile: work with ZIP archives
  • configparser: easily create and parse configuration files
  • xml.etree.ElementTree: create or read XML based files
  • msilib: read and write Microsoft Installer files
  • plistlib: generate and parse Mac OS X .plist files
  • wave :读写WAV文件(音频)
  • aifc :读取和写入AIFF和AIFC文件(音频)
  • sunau :读取和写入Sun AU文件
  • tarfile :读写tar存档文件
  • zipfile :使用ZIP存档
  • configparser :轻松创建和解析配置文件
  • xml.etree.ElementTree :创建或读取基于XML的文件
  • msilib :读取和写入Microsoft Installer文件
  • plistlib :生成和解析Mac OS X .plist文件

There are plenty more out there. Additionally there are even more third party tools available on PyPI. Some popular ones are the following:

还有更多。 此外,PyPI上还有更多第三方工具可用。 以下是一些流行的:

  • PyPDF2: PDF toolkit
  • xlwings: read and write Excel files
  • Pillow: image reading and manipulation
  • PyPDF2 :PDF工具包
  • xlwings :读取和写入Excel文件
  • Pillow :图像读取和操作

你是文件向导哈利! (You’re a File Wizard Harry!)

You did it! You now know how to work with files with Python, including some advanced techniques. Working with files in Python should now be easier than ever and is a rewarding feeling when you start doing it.

你做到了! 您现在知道了如何使用Python处理文件,包括一些高级技术。 使用Python处理文件现在应该比以往任何时候都更容易,并且在您开始使用它时会感到很有收获。

In this tutorial you’ve learned:

在本教程中,您学习了:

  • What a file is
  • How to open and close files properly
  • How to read and write files
  • Some advanced techniques when working with files
  • Some libraries to work with common file types
  • 什么是文件
  • 如何正确打开和关闭文件
  • 如何读写文件
  • 处理文件时的一些高级技巧
  • 一些库可用于常见文件类型

If you have any questions, hit us up in the comments.

如果您有任何疑问,请在评论中打扰我们。

Take the Quiz: Test your knowledge with our interactive “Reading and Writing Files in Python” quiz. Upon completion you will receive a score so you can track your learning progress over time:

参加测验:通过我们的交互式“用Python读写文件”测验来测试您的知识。 完成后,您将获得一个分数,因此您可以跟踪一段时间内的学习进度:

Take the Quiz »

参加测验»

翻译自: https://www.pybloggers.com/2019/02/reading-and-writing-files-in-python-guide/

使用python读写文件

你可能感兴趣的:(python,linux,java,编程语言,大数据)