Python Ops Learning Day 01: Basic File Operations


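  • 1. Traversing all files under a directory
    • 1.1 This mainly relies on the os.walk function
  • 2. Computing a file's MD5 value
  • 3. Combining the two functions to compute the MD5 of every file under a folder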

1. Traversing all files under a directory

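import os

def getFileName(directory):
    """Return the paths of all files under directory, including subdirectories."""
    file_list = []
    # os.walk yields (dirpath, dirnames, filenames) for every directory in the tree
    for dir_name, sub_dir, file_name_list in os.walk(directory):
        for file in file_name_list:
            # Joined with a literal '/'; os.path.join(dir_name, file) is the portable form
            file_path_abs = f'{dir_name}/{file}'
            file_list.append(file_path_abs)
    return file_list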
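
The function above joins the directory and the file name with a literal '/'. On Windows that yields mixed separators such as 北京\昌平/d.txt, which is what the run output at the end of this post shows. As a minimal sketch, a variant can build the path the way the os.walk docstring recommends. The name get_file_paths is hypothetical and not part of the original script, and the os.path.normpath call is an extra step assumed here to unify separators.

```python
import os

def get_file_paths(directory):
    """Hypothetical variant of getFileName that builds paths with os.path.join."""
    paths = []
    for dir_name, _sub_dirs, file_names in os.walk(directory):
        for name in file_names:
            # join adds the platform separator; normpath cleans up mixed separators
            paths.append(os.path.normpath(os.path.join(dir_name, name)))
    return paths
```

Calling get_file_paths on the same folder should list the same files as getFileName, just with consistent separators for the current platform.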

1.1 This mainly relies on the os.walk function

Let's look at how os.walk is used:

In [6]: os.walk??
Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
Source:
def walk(top, topdown=True, onerror=None, followlinks=False):
    """Directory tree generator.

    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple

        dirpath, dirnames, filenames

    dirpath is a string, the path to the directory.  dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).

    If optional arg 'topdown' is true or not specified, the triple for a
    directory is generated before the triples for any of its subdirectories
    (directories are generated top down).  If topdown is false, the triple
    for a directory is generated after the triples for all of its
    subdirectories (directories are generated bottom up).

    When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune the
    search, or to impose a specific order of visiting.  Modifying dirnames when
    topdown is false has no effect on the behavior of os.walk(), since the
    directories in dirnames have already been generated by the time dirnames
    itself is generated. No matter the value of topdown, the list of
    subdirectories is retrieved before the tuples for the directory and its
    subdirectories are generated.

    By default errors from the os.scandir() call are ignored.  If
    optional arg 'onerror' is specified, it should be a function; it
    will be called with one argument, an OSError instance.  It can
    report the error to continue with the walk, or raise the exception
    to abort the walk.  Note that the filename is available as the
    filename attribute of the exception object.

    By default, os.walk does not follow symbolic links to subdirectories on
    systems that support them.  In order to get this functionality, set the
    optional argument 'followlinks' to true.

    Caution:  if you pass a relative pathname for top, don't change the
    current working directory between resumptions of walk.  walk never
    changes the current directory, and assumes that the client doesn't
    either.

    Example:

    import os
    from os.path import join, getsize
    for root, dirs, files in os.walk('python/Lib/email'):
        print(root, "consumes", end="")
        print(sum(getsize(join(root, name)) for name in files), end="")
        print("bytes in", len(files), "non-directory files")
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories

    """
    sys.audit("os.walk", top, topdown, onerror, followlinks)
    return _walk(fspath(top), topdown, onerror, followlinks)
File:      c:\users\thinkpad\appdata\local\programs\python\python39\lib\os.py
Type:      function

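Signature: the function signature, that is, the function name together with its parameter list and default values. Note that Python does not overload functions by signature: defining a second function with the same name simply replaces the first, so within a namespace a function is identified by its name alone (in languages with overloading, such as Java or C++, the signature is what tells two same-named functions apart; a point that may come up in interviews).
Source: the function's source code.
The function is a directory tree generator.
For every directory in the tree rooted at top (including top itself, but excluding '.' and '..'), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath -> str: the path of the directory currently being visited.
dirnames -> list: the names of the subdirectories of dirpath (excluding "." for dirpath itself and ".." for its parent).
filenames -> list: the names of the non-directory files in dirpath. An empty list means dirpath contains no files of its own (it is empty or holds only subdirectories); a non-empty list does not mean dirpath is a leaf, because dirnames can be non-empty at the same time. The entries are bare names with no path components.
Taken together: the directory dirpath contains the subdirectories in dirnames and the files in filenames.
So to visit every file, take each name in filenames and build its path as os.path.join(dirpath, name).
Type: function.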
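
To make the 3-tuple concrete, here is a small sketch that builds a throwaway tree in a temporary directory and records what os.walk yields for it. The helper names list_tree and build_demo_tree, the skip parameter, and the file names are all made up for this illustration. The slice assignment to dirnames shows the in-place pruning the docstring describes for topdown=True.

```python
import os
import tempfile

def list_tree(top, skip=()):
    """Collect (relative dirpath, dirnames, filenames) for each visited directory."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Modify dirnames in place: sort it to fix the visiting order and
        # drop the names in skip so os.walk never descends into them.
        dirnames[:] = [d for d in sorted(dirnames) if d not in skip]
        rows.append((os.path.relpath(dirpath, top), list(dirnames), sorted(filenames)))
    return rows

def build_demo_tree(root):
    """Create root/a.txt, root/sub/b.txt and an empty directory root/empty."""
    os.makedirs(os.path.join(root, "sub"))
    os.makedirs(os.path.join(root, "empty"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fobj:
            fobj.write("demo")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        for row in list_tree(root):
            print(row)
        print(list_tree(root, skip=("sub",)))
```

The row for the directory named empty has two empty lists, while the top-level row has both a non-empty dirnames and a non-empty filenames, which is why an empty or non-empty filenames list alone says nothing about whether a directory is a leaf.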

2. Computing a file's MD5 value

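import hashlib

def fileMD5(filePathAbs):
    """Return the hexadecimal MD5 digest of the file at filePathAbs."""
    md5_tool = hashlib.md5()
    # Open in binary mode and read 4096 bytes at a time,
    # so large files are never loaded into memory in one piece
    with open(filePathAbs, mode='rb') as fobj:
        while True:
            data = fobj.read(4096)
            if data:
                md5_tool.update(data)
            else:
                # read() returns b'' at end of file
                break
    return md5_tool.hexdigest()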

Here the md5 function of the hashlib module computes the file's MD5 digest. First, let's look at the documentation for md5:

In [8]: hashlib.md5??
Signature: hashlib.md5(string=b'', *, usedforsecurity=True)
Docstring: Returns a md5 hash object; optionally initialized with a string
Type:      builtin_function_or_method

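Type: builtin_function_or_method, meaning the function is implemented in C rather than in Python. That is why only a docstring is shown instead of source code; such functions generally run fast.
Called with no argument, md5 starts from the default string=b'' (empty input) and returns a hash object. Let's look at how it is used: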

In [15]: t = hashlib.md5()

In [16]: t
Out[16]: <md5 _hashlib.HASH object @ 0x00000205960238F0>

In [17]: dir(t)
Out[17]:
['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'block_size',
 'copy',
 'digest',
 'digest_size',
 'hexdigest',
 'name',
 'update']

In [18]: t.hexdigest()
Out[18]: 'd41d8cd98f00b204e9800998ecf8427e'

In [19]:
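
As a small sketch of the public attributes listed by dir(t) above, the following exercises name, digest_size, block_size, digest, hexdigest and copy. The byte strings b"hello" and b" world" are arbitrary demo inputs, not taken from the original post.

```python
import hashlib

t = hashlib.md5()
empty_digest = t.hexdigest()           # digest of the empty input
print(t.name, t.digest_size, t.block_size)

t.update(b"hello")
snapshot = t.copy()                    # independent copy of the current state
t.update(b" world")                    # does not affect snapshot

# digest() returns raw bytes; hexdigest() is the same value as a hex string
print(t.digest().hex() == t.hexdigest())
print(snapshot.hexdigest() == hashlib.md5(b"hello").hexdigest())
print(t.hexdigest() == hashlib.md5(b"hello world").hexdigest())
```

Calling hexdigest does not close the object, so it is fine to read the digest and then keep feeding data with update.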

Now let's look at the update method:

In [19]: t.update??
Signature: t.update(obj, /)
Docstring: Update this hash object's state with the provided string.
Type:      builtin_function_or_method

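This method feeds the given data into the hash object and updates its state. Although the docstring says "string", in Python 3 the argument must be a bytes-like object. Calling update repeatedly is equivalent to a single call with the concatenation of all the arguments.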
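
The chunked reading in fileMD5 relies on update being incremental. A minimal sketch of that property, using an arbitrary sample byte string and an 8-byte chunk size chosen only for the demo:

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# Hash everything in one call
one_shot = hashlib.md5(data).hexdigest()

# Hash the same bytes piece by piece
chunked = hashlib.md5()
for start in range(0, len(data), 8):
    chunked.update(data[start:start + 8])

print(one_shot == chunked.hexdigest())

# update() accepts bytes-like objects only; a str has to be encoded first
try:
    hashlib.md5().update("text")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

Because the two digests agree, the chunk size used when reading a file affects only memory use and speed, not the result.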

3. Combining the two functions to compute the MD5 of every file under a folder

import hashlib
import os


def getFileName(directory):
    """Return the paths of all files under directory, including subdirectories."""
    file_list = []
    for dir_name, sub_dir, file_name_list in os.walk(directory):
        for file in file_name_list:
            # Joined with a literal '/'; os.path.join(dir_name, file) is the portable form
            file_path_abs = f'{dir_name}/{file}'
            file_list.append(file_path_abs)
    return file_list


def fileMD5(filePathAbs):
    """Return the hexadecimal MD5 digest of the file at filePathAbs."""
    md5_tool = hashlib.md5()
    with open(filePathAbs, mode='rb') as fobj:
        while True:
            data = fobj.read(4096)  # read in 4096-byte chunks
            if data:
                md5_tool.update(data)
            else:
                break  # read() returns b'' at end of file
    return md5_tool.hexdigest()


if __name__ == '__main__':
    # Print each file path followed by its MD5 digest
    file_list = getFileName(r"E:/Project/Support/Day01/北京")
    for file in file_list:
        md5 = fileMD5(file)
        print(file, md5)

Directory structure:
[Image 1: screenshot of the directory structure]
Output:


In [23]: run fileManager.py
E:/Project/Support/Day01/北京/a - 副本 (2).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (3).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (4).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (5).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (6).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本.txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a.txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/b - 副本 (2) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (3) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (4).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (5).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (6).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京\昌平/d - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\海淀/c.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\海淀/e.txt d41d8cd98f00b204e9800998ecf8427e
In [24]:
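
In the output above, every copy of a.txt shares one digest and every copy of b.txt shares another. The files under 昌平 and 海淀 all report d41d8cd98f00b204e9800998ecf8427e, the digest of empty input seen earlier, which suggests those files are empty. As a rough sketch of how such (path, digest) pairs could be grouped, the helper group_by_digest below is hypothetical and the sample list is a shortened excerpt of the output, not the full run:

```python
from collections import defaultdict

def group_by_digest(pairs):
    """Group paths by digest and keep only digests shared by several paths."""
    groups = defaultdict(list)
    for path, digest in pairs:
        groups[digest].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

# Shortened excerpt of the run output (paths abbreviated for the demo)
sample = [
    ("北京/a.txt", "da5d6d8941b3381fb7565c57db7a9ead"),
    ("北京/a - 副本.txt", "da5d6d8941b3381fb7565c57db7a9ead"),
    ("北京/b.txt", "84d9cfc2f395ce883a41d7ffc1bbcf4e"),
]

duplicates = group_by_digest(sample)
for digest, paths in duplicates.items():
    print(digest, paths)
```

An identical MD5 is strong evidence of identical content but not proof, since MD5 collisions can be constructed; compare the bytes directly when the distinction matters.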
