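In Python, you can open a file with the built-in open() function. It returns a file object, and calling that object's read() method reads the file's contents.
Create a file named sample.txt in the directory D:\work\20190810, set its content to Hello Python~, and save it. Then run the following Python code: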
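# Python Program to Read Text File
f = open("D:/work/20190810/sample.txt", "r")
data = f.read()
f.close()          # release the file handle once reading is done
print(type(f))     # <class '_io.TextIOWrapper'>
print(type(data))  # <class 'str'>
print(data)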
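Note that the example above uses an absolute path. A relative path is resolved against the current working directory, which is normally the directory the program is run from.
If you only want to read the first N characters of the file, pass the number N as an argument to read().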
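As a complement to the example above, here is a minimal sketch of the same read using a with statement, which closes the file automatically. The temporary directory and the sample_path variable are assumptions made so the snippet runs anywhere; they stand in for D:/work/20190810/sample.txt.

```python
import os
import tempfile

# Hypothetical stand-in for sample.txt, created in a temporary directory.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")

# The with statement closes the file when the block ends, even on error.
with open(sample_path, "r", encoding="utf-8") as f:
    data = f.read()

print(data)      # Hello Python~
print(f.closed)  # True
```

Passing encoding explicitly is a design choice here: without it, the default encoding depends on the platform.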
# Read only some characters in the Text File
f = open("D:/work/20190810/sample.txt", "r")
data = f.read(7)
print(data)
Running this prints Hello P, the first 7 characters of the file. N is therefore a count of characters, not a zero-based index.
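The sketch below illustrates how read(N) behaves across successive calls: each call continues from where the previous one stopped. The temporary file is an assumption standing in for sample.txt.

```python
import os
import tempfile

# Hypothetical stand-in for sample.txt.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")

with open(sample_path, "r", encoding="utf-8") as f:
    first = f.read(7)  # the first 7 characters
    rest = f.read()    # everything after the current position

print(repr(first))  # 'Hello P'
print(repr(rest))   # 'ython~'
```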
By default, a file is opened in text mode. The other available mode is binary mode. In the following example we combine "t" with "r" to open the file explicitly in read-only text mode.
# Read file in Text mode
f = open("D:/work/20190810/sample.txt", "rt")
data = f.read()
print(data)
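For comparison, here is a small sketch of binary mode ("rb"), which returns bytes rather than str. The temporary file and the UTF-8 encoding are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for sample.txt.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")

# Binary mode reads raw bytes; no decoding is applied.
with open(sample_path, "rb") as f:
    raw = f.read()

text = raw.decode("utf-8")  # decode manually to get a str
print(type(raw))   # <class 'bytes'>
print(text)        # Hello Python~
```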
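To write a string to a text file, open the file in write mode, call write() on the file object with the string, and then close the file.
In the following example we follow these steps to write a string constant to a text file.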
# Write String to Text File
text_file = open("D:/work/20190810/sample.txt", "w")
n = text_file.write('Python welcome you~')
text_file.close()
print(n)
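Running this example prints 19: write() returns the number of characters written to the file.
Opening the file in a text editor shows that it now contains only Python welcome you~.
So when a text file is opened in write mode and written to, any existing content is overwritten. If the file does not exist, a new file is created and the string is written to it.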
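The overwrite behaviour described above can be checked with a short sketch. The temporary file is an assumption standing in for sample.txt.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")  # hypothetical path

# First write creates the file.
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")

# Opening in "w" mode again discards the old content.
with open(sample_path, "w", encoding="utf-8") as f:
    n = f.write("Python welcome you~")

with open(sample_path, "r", encoding="utf-8") as f:
    content = f.read()

print(n)        # 19
print(content)  # Python welcome you~
```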
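As mentioned in the previous section, a file can be opened in one of two modes: text or binary. A file is opened in text mode by default, but you can also request text mode explicitly. In the following example we use "t" to open the file explicitly in text mode and write a string to it following the same steps.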
# Write String to Text File in Text Mode
text_file = open("D:/work/20190810/sample.txt", "wt")
n = text_file.write('Python, Python~')
text_file.close()
print(n)
To remove a file in Python, call the remove() function of the os module and pass it the path of the file.
In the following example we delete the file used in the sections above.
# Remove File with Python
import os
os.remove("D:/work/20190810/sample.txt")
print("The file is removed")
If the file passed to remove() does not exist, you get a FileNotFoundError.
Running the removal program above a second time raises exactly this error.
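One way to cope with a file that may already be gone is to catch the exception. The helper name remove_if_exists is hypothetical, and the temporary file is an assumption for the demo.

```python
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return True if it was removed, False if it did not exist."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")

first_try = remove_if_exists(sample_path)   # file exists, so it is deleted
second_try = remove_if_exists(sample_path)  # file is already gone
print(first_try, second_try)  # True False
```

Catching the exception avoids the race between checking for existence and deleting that an os.path.exists() test would leave open.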
To create a new directory from a Python program, call os.mkdir() and pass it the path of the directory to create.
The signature of os.mkdir() is:
os.mkdir(path, mode=0o777, *, dir_fd=None)
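Here path is the path described above, and mode is the set of permission bits assigned to the directory. See the Linux file permission model: 777 (written as the octal literal 0o777) grants read, write, and execute permission to all users. On some systems mode is ignored.
In the following example we create a new directory named sample.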
# Create a Directory or Folder with Python
import os
os.mkdir("D:/work/20190810/sample")
print('The directory is created.')
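If you try to create a directory that already exists, you get a FileExistsError.
Running the directory-creation program from the previous section again produces this error.
The error message says that the file cannot be created because it already exists: 'D:/work/20190810/sample'. It also shows which line of the program failed: File "D:/PycharmProjects/MyPythonApp/testfile.py", line 8, in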
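The sketch below reproduces the FileExistsError and shows one alternative, os.makedirs() with exist_ok=True, which tolerates an existing directory and also creates missing parents. The temporary base directory and the nested a/b path are assumptions for illustration.

```python
import os
import tempfile

base = tempfile.mkdtemp()             # hypothetical working directory
target = os.path.join(base, "sample")
os.mkdir(target)                      # first call succeeds

try:
    os.mkdir(target)                  # second call fails: directory exists
    already_existed = False
except FileExistsError:
    already_existed = True

# exist_ok=True suppresses the error when the directory is already there.
os.makedirs(target, exist_ok=True)

# makedirs also creates intermediate directories as needed.
nested = os.path.join(base, "a", "b")
os.makedirs(nested, exist_ok=True)

print(already_existed)        # True
print(os.path.isdir(nested))  # True
```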
To check whether a given path is a file or a directory, use os.path.isfile() to test whether it is a file and os.path.isdir() to test whether it is a directory.
import os
isFile = os.path.isfile(fpath)
isDirectory = os.path.isdir(fpath)
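Both functions return a Boolean indicating whether the path is a file, or whether it is a directory.
Next, first run the write example above to create sample.txt, then check whether it is a file.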
# Check if the Path is a File
import os
fpath = "D:/work/20190810/sample.txt"
isFile = os.path.isfile(fpath)
print("The file present at the path is a regular file:", isFile)
# try with a path that is a folder
fpath = "D:/work/20190810"
isFile = os.path.isfile(fpath)
print("The file present at the path is a regular file:", isFile)
In the following example we use isdir() to check whether a given path is a directory.
# Check if the Path is a Directory
import os
fpath = "D:/work/20190810"
isDirectory = os.path.isdir(fpath)
print("Path points to a Directory:", isDirectory)
fpath = "D:/work/20190810/sample.txt"
isDirectory = os.path.isdir(fpath)
print("Path points to a Directory:", isDirectory)
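The two checks can be combined into one small helper. This is only a sketch: the name describe_path is hypothetical, and the temporary paths are assumptions. Note that both functions return False for a path that does not exist.

```python
import os
import tempfile


def describe_path(path):
    """Classify a path as 'file', 'directory', or 'neither'."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "neither"  # e.g. the path does not exist


base = tempfile.mkdtemp()
file_path = os.path.join(base, "sample.txt")
with open(file_path, "w", encoding="utf-8") as f:
    f.write("Hello Python~")
missing_path = os.path.join(base, "no_such_entry")

results = [describe_path(file_path), describe_path(base), describe_path(missing_path)]
print(results)  # ['file', 'directory', 'neither']
```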
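To get a list of all files under a folder and its subfolders, use os.walk(). It yields an iteration over the starting directory, its subdirectories, and all the files below them.
In this example we start from a directory path and recursively list all files in that directory and its subdirectories.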
# Get the list of all files
import os
path = r"D:\PycharmProjects\MyPythonApp"  # raw string keeps the backslashes literal
for root, dirs, files in os.walk(path):
    for file in files:
        print(os.path.join(root, file))
Running the program produces the following output:
D:\PycharmProjects\MyPythonApp\venv\Scripts\python.exe D:/PycharmProjects/MyPythonApp/testfile.py
D:\PycharmProjects\MyPythonApp\Car.py
D:\PycharmProjects\MyPythonApp\first.py
D:\PycharmProjects\MyPythonApp\inputtest.py
D:\PycharmProjects\MyPythonApp\testfile.py
D:\PycharmProjects\MyPythonApp\testkeywords.py
D:\PycharmProjects\MyPythonApp\testlist.py
D:\PycharmProjects\MyPythonApp\testobject.py
D:\PycharmProjects\MyPythonApp\.idea\encodings.xml
D:\PycharmProjects\MyPythonApp\.idea\misc.xml
D:\PycharmProjects\MyPythonApp\.idea\modules.xml
D:\PycharmProjects\MyPythonApp\.idea\MyPythonApp.iml
D:\PycharmProjects\MyPythonApp\.idea\workspace.xml
D:\PycharmProjects\MyPythonApp\venv\pyvenv.cfg
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\easy-install.pth
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\setuptools-39.1.0-py3.7.egg
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\setuptools.pth
As you can see, the files in every subdirectory of the given directory, at any depth, are printed.
To understand how the program above works, run the following program:
# Print the (root, dirs, files) tuple yielded for each directory
import os
path = r"D:\PycharmProjects\MyPythonApp"
for root, dirs, files in os.walk(path):
    print(root)
    print(dirs)
    print(files)
The output is as follows:
D:\PycharmProjects\MyPythonApp\venv\Scripts\python.exe D:/PycharmProjects/MyPythonApp/testfile.py
D:\PycharmProjects\MyPythonApp
['.idea', 'venv']
['Car.py', 'first.py', 'inputtest.py', 'testfile.py', 'testkeywords.py', 'testlist.py', 'testobject.py']
D:\PycharmProjects\MyPythonApp\.idea
[]
['encodings.xml', 'misc.xml', 'modules.xml', 'MyPythonApp.iml', 'workspace.xml']
D:\PycharmProjects\MyPythonApp\venv
['Include', 'Lib', 'Scripts']
['pyvenv.cfg']
D:\PycharmProjects\MyPythonApp\venv\Include
[]
[]
The docstring of walk in the Python source file os.py explains it as follows:
Directory tree generator.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), yields a 3-tuple
dirpath, dirnames, filenames
dirpath is a string, the path to the directory. dirnames is a list of
the names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists are just names, with no path components.
To get a full path (which begins with top) to a file or directory in
dirpath, do os.path.join(dirpath, name).
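To see the 3-tuples from the docstring in action without depending on a particular project folder, here is a sketch that builds a tiny tree and walks it. The tree layout (a.py and sub/b.txt) is an assumption made for the demo.

```python
import os
import tempfile

# Build a small hypothetical tree: top/a.py and top/sub/b.txt
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.py", os.path.join("sub", "b.txt")):
    with open(os.path.join(top, rel), "w", encoding="utf-8") as f:
        f.write("x")

found = []
for dirpath, dirnames, filenames in os.walk(top):
    for name in filenames:
        # Names are bare; join with dirpath to get a full path.
        full_path = os.path.join(dirpath, name)
        found.append(os.path.relpath(full_path, top))

print(sorted(found))
```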
Building on the example in the previous section, we now filter the files and print only those with the .py extension.
# Get the list of all files with a specific extension
import os
path = r"D:\PycharmProjects\MyPythonApp"
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".py"):
            print(os.path.join(root, file))
The output is as follows:
D:\PycharmProjects\MyPythonApp\venv\Scripts\python.exe D:/PycharmProjects/MyPythonApp/testfile.py
D:\PycharmProjects\MyPythonApp\Car.py
D:\PycharmProjects\MyPythonApp\first.py
D:\PycharmProjects\MyPythonApp\inputtest.py
D:\PycharmProjects\MyPythonApp\testfile.py
D:\PycharmProjects\MyPythonApp\testkeywords.py
D:\PycharmProjects\MyPythonApp\testlist.py
D:\PycharmProjects\MyPythonApp\testobject.py
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\__init__.py
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\__main__.py
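The output above includes files from the virtual environment. If those are unwanted, one possible approach is to prune directories while walking: os.walk() (in its default top-down mode) honours in-place changes to the dirs list. The helper name find_files, the skip list, and the demo tree are all assumptions.

```python
import os
import tempfile


def find_files(top, extension, skip_dirs=("venv", ".idea")):
    """Return full paths of files under top ending with extension."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment edits the list in place, so walk() skips these dirs.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(extension):
                matches.append(os.path.join(dirpath, name))
    return matches


# Hypothetical tree: main.py, notes.txt, venv/x.py
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "venv"))
for rel in ("main.py", "notes.txt", os.path.join("venv", "x.py")):
    with open(os.path.join(top, rel), "w", encoding="utf-8") as f:
        f.write("x")

py_files = find_files(top, ".py")
print(py_files)
```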
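The next example walks a directory tree and sums the space taken by the files in each directory, so you can see which subdirectories use the most space. The effect is similar to the Linux du command, and it can be applied to monitoring disk space. This matters for system operations, because disk usage is an important KPI for runtime monitoring and real-time alerting.
# Get the size of the files directly inside each directory of a tree
import os
from os.path import join, getsize
path = r"D:\PycharmProjects\MyPythonApp"
for root, dirs, files in os.walk(path):
    print(root, "directory takes", sum(getsize(join(root, name)) for name in files), "bytes")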
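The output is as follows. Each figure counts only the files directly inside that directory, not those in its subdirectories: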
D:\PycharmProjects\MyPythonApp\venv\Scripts\python.exe D:/PycharmProjects/MyPythonApp/testfile.py
D:\PycharmProjects\MyPythonApp directory takes 8575 bytes
D:\PycharmProjects\MyPythonApp\.idea directory takes 21842 bytes
D:\PycharmProjects\MyPythonApp\venv directory takes 81 bytes
D:\PycharmProjects\MyPythonApp\venv\Include directory takes 0 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib directory takes 0 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages directory takes 563301 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg directory takes 0 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\EGG-INFO directory takes 15653 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip directory takes 653 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal directory takes 227304 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\commands directory takes 82597 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\models directory takes 518 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\operations directory takes 29549 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\req directory takes 82600 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\utils directory takes 77945 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_internal\vcs directory takes 44880 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor directory takes 425914 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\cachecontrol directory takes 37555 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\cachecontrol\caches directory takes 5435 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\certifi directory takes 272070 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\chardet directory takes 370901 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\chardet\cli directory takes 2861 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\colorama directory takes 26810 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\distlib directory takes 768144 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\distlib\_backport directory takes 153388 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib directory takes 358491 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib\filters directory takes 47191 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib\treeadapters directory takes 4304 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib\treebuilders directory takes 55339 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib\treewalkers directory takes 28632 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\html5lib\_trie directory takes 4334 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\idna directory takes 244485 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\lockfile directory takes 30251 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\msgpack directory takes 40249 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\packaging directory takes 59842 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\pkg_resources directory takes 107226 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\progress directory takes 12243 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\pytoml directory takes 15475 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\requests directory takes 171402 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3 directory takes 124905 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\contrib directory takes 69295 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\contrib\_securetransport directory takes 30558 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\packages directory takes 40274 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\packages\backports directory takes 1514 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\packages\ssl_match_hostname directory takes 6583 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\urllib3\util directory takes 79210 bytes
D:\PycharmProjects\MyPythonApp\venv\Lib\site-packages\pip-10.0.1-py3.7.egg\pip\_vendor\webencodings directory takes 32843 bytes
D:\PycharmProjects\MyPythonApp\venv\Scripts directory takes 1426147 bytes
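The per-directory figures above do not include subdirectories. To get a cumulative total closer to what du -s reports, one option is to sum over the whole walk. This is a rough sketch: tree_size is a hypothetical helper, it reports file sizes rather than disk blocks, and the demo tree is an assumption.

```python
import os
import tempfile


def tree_size(top):
    """Return the total size in bytes of all files under top, recursively."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


# Hypothetical tree: top/a.bin (5 bytes) and top/sub/b.bin (3 bytes)
top = tempfile.mkdtemp()
sub = os.path.join(top, "sub")
os.mkdir(sub)
with open(os.path.join(top, "a.bin"), "wb") as f:
    f.write(b"12345")
with open(os.path.join(sub, "b.bin"), "wb") as f:
    f.write(b"abc")

print(tree_size(top))  # 8
print(tree_size(sub))  # 3
```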
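To append text to a file in Python, open the file in append mode, write the text with write(), and close the file.
In the following example we start with a data.txt file that already contains some text, and follow these steps to append more text to it.
The program is: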
# Concatenate or Append Text to File
fin = open("D:/work/20190810/data.txt", "a+")
fin.write("\nThis is newly append text.")
fin.close()
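You can process a file in either text or binary mode. By default, files are handled in text mode. In the following example we put "t" after "a" to open the file explicitly in text mode.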
# Append Text to File in Text Mode
fin = open("D:/work/20190810/data.txt", "at")
fin.write("\nThis is newly append text.")
fin.close()
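A quick way to confirm that append mode keeps the existing content is to read the file back afterwards. The temporary data.txt and its initial text are assumptions for the demo.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "data.txt")  # hypothetical stand-in

# Create the file with some initial text.
with open(data_path, "w", encoding="utf-8") as f:
    f.write("Original line.")

# "a" positions every write at the end of the file.
with open(data_path, "a", encoding="utf-8") as f:
    f.write("\nThis is newly appended text.")

with open(data_path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)  # ['Original line.', 'This is newly appended text.']
```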
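To replace a string in a file with Python, open the input file for reading and an output file for writing, replace the string in each line as you copy it across, and close both files.
Create input.txt with some text in which python is misspelled as pyton.
Then run the following Python program: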
# Replace string in File
fin = open("D:/work/20190810/input.txt", "rt")
fout = open("D:/work/20190810/out.txt", "wt")
for line in fin:
    fout.write(line.replace("pyton", "python"))
fin.close()
fout.close()
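After the program runs successfully, a new file out.txt appears in that directory.
What happened? The program read input.txt line by line, replaced every occurrence of pyton with python, and wrote each resulting line to out.txt. The original input.txt was left unchanged.
In the following example we operate on input.txt directly: we replace pyton with python and rewrite input.txt with the replaced text.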
# Replace string in the same File
fin = open("D:/work/20190810/input.txt", "rt")
data = fin.read()
data = data.replace("pyton", "python")
fin.close()
fin = open("D:/work/20190810/input.txt", "wt")
fin.write(data)
fin.close()
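After running the program, input.txt contains the corrected text.
What happened? The program read the whole file into a string, replaced pyton with python in that string, closed the file, then reopened it in write mode and wrote the updated text back.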
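The same in-place replacement can be wrapped in a function that uses with blocks and reports how many occurrences it changed. This is a sketch: replace_in_file is a hypothetical helper, and the sample text is invented for the demo.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Replace old with new in the file at path; return the number of replacements."""
    with open(path, "r", encoding=encoding) as f:
        data = f.read()
    count = data.count(old)
    with open(path, "w", encoding=encoding) as f:
        f.write(data.replace(old, new))
    return count


tmp_dir = tempfile.mkdtemp()
input_path = os.path.join(tmp_dir, "input.txt")
with open(input_path, "w", encoding="utf-8") as f:
    f.write("I love pyton.\npyton is fun.\n")

replaced = replace_in_file(input_path, "pyton", "python")
with open(input_path, "r", encoding="utf-8") as f:
    updated = f.read()

print(replaced)  # 2
print(updated)
```

Because the whole file is read into memory, this approach suits small files; for large ones the line-by-line version that writes to a second file is the safer choice.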
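There are several ways to replace runs of consecutive spaces in a file with a single space, such as string splitting or regular expressions. Let's work through concrete examples of both.
First delete the input.txt and out.txt produced by the earlier examples, then create a new input.txt containing the following text, typed with several consecutive spaces between some of the words:
Welcome to www.defonds.net. Here, you will find python programs for all general use cases.
Example code: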
# Using Split String
fin = open("D:/work/20190810/input.txt", "rt")
fout = open("D:/work/20190810/out.txt", "wt")
for line in fin:
    fout.write(' '.join(line.split()))
fin.close()
fout.close()
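After running the program, out.txt has been created with the following content:
Welcome to www.defonds.net. Here, you will find python programs for all general use cases.
As you can see, every run of consecutive spaces has been replaced by a single space.
We can also use a regular expression to find runs of consecutive spaces and replace each with a single space.
A program that does this with a regular expression looks like this: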
# Using Regular Expression Replace
import re
fin = open("D:/work/20190810/input.txt", "rt")
fout = open("D:/work/20190810/out.txt", "wt")
for line in fin:
    fout.write(re.sub(r"\s+", ' ', line))  # raw string avoids an invalid "\s" escape
fin.close()
fout.close()
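One detail worth knowing for multi-line input: both approaches above also affect the newline at the end of each line. split() discards it, and \s+ turns it into a space. The sketch below compares them with a pattern that matches only spaces and tabs; the sample lines are invented for the demo.

```python
import re

# Hypothetical two-line input with runs of spaces.
lines = ["Welcome   to  Python.\n", "Second    line.\n"]

# split()/join drops the trailing newline, so the lines run together.
joined_split = "".join(" ".join(line.split()) for line in lines)

# \s+ also matches "\n", so each newline becomes a space.
joined_regex = "".join(re.sub(r"\s+", " ", line) for line in lines)

# Matching only spaces and tabs keeps the line breaks intact.
joined_keep = "".join(re.sub(r"[ \t]+", " ", line) for line in lines)

print(repr(joined_split))  # 'Welcome to Python.Second line.'
print(repr(joined_regex))  # 'Welcome to Python. Second line. '
print(repr(joined_keep))   # 'Welcome to Python.\nSecond line.\n'
```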
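To count the words in a text file, read the file's content, split it on whitespace, and take the length of the resulting list.
Create or edit data.txt so that it contains the following:
Welcome to www.defonds.net. Here, you will find python programs for all general use cases.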
# Count Number of Words
file = open("D:/work/20190810/data.txt", "rt")
data = file.read()
words = data.split()
print("Number of words in text file:", len(words))
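Now edit data.txt again and add a second line, so that it reads as follows, then run the program again to count the words across both lines:
Welcome to www.defonds.net. Here, you will find python programs for all general use cases.
This is another line with some words.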
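The word count can be packaged as a small function. In this sketch count_words is a hypothetical helper and the temporary data.txt is an assumption. Since split() with no arguments splits on any whitespace, line breaks are handled the same way as spaces.

```python
import os
import tempfile


def count_words(path, encoding="utf-8"):
    """Return the number of whitespace-separated words in the file."""
    with open(path, "r", encoding=encoding) as f:
        return len(f.read().split())


line1 = ("Welcome to www.defonds.net. Here, you will find python "
         "programs for all general use cases.")
line2 = "This is another line with some words."

tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "data.txt")

with open(data_path, "w", encoding="utf-8") as f:
    f.write(line1)
one_line_count = count_words(data_path)

with open(data_path, "w", encoding="utf-8") as f:
    f.write(line1 + "\n" + line2)
two_line_count = count_words(data_path)

print(one_line_count)  # 14
print(two_line_count)  # 21
```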
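To count the characters in a text file, read the file's content into a string and take its length with len().
Create or edit data.txt so that it contains the following:
Welcome to www.defonds.net. Here, you will find python programs for all general use cases.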
# Count Characters in a Text File
file = open("D:/work/20190810/data.txt", "r")
data = file.read()
number_of_characters = len(data)
print("Number of characters in text file:", number_of_characters)
Next, let's count the characters in the file excluding spaces.
# Count Characters in a Text File excluding spaces
file = open("D:/work/20190810/data.txt", "r")
data = file.read().replace(" ", "")
number_of_characters = len(data)
print("Number of characters in text file:", number_of_characters)
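Note that replace(" ", "") removes only the space character, so tabs and newlines are still counted. If all whitespace should be excluded, one possible variant uses str.isspace(). The sample string below is invented for the demo.

```python
# Hypothetical text containing a space, a tab, and a newline.
text = "a b\tc\nd"

total_chars = len(text)                                     # every character
without_spaces = len(text.replace(" ", ""))                 # drops " " only
without_whitespace = sum(1 for ch in text if not ch.isspace())  # drops all whitespace

print(total_chars)         # 7
print(without_spaces)      # 6
print(without_whitespace)  # 4
```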