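Module: every Python file is an independent module.
Purpose of modules: real projects contain a lot of code; putting code for the same feature in one file and code for different features in separate files keeps the project easy to maintain.
Modules introduce namespaces and scopes.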
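# Import the whole module
import module
# Import a specific attribute
from module import xxx
# Import several attributes
from module import xxx, yyy
# Import under an alias
import module as alias
from module import xxx as alias1, yyy as alias2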
import os
from functools import reduce
import time as tm
from random import randint, randrange
from os.path import join as os_join
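Key points about importing modules:
Example:
Setup: in one VS Code folder, create two files, my_add.py and main_test.py; import my_add in main_test.py and observe what happens.
Result: the code in my_add.py runs once.
Question: in practice a module usually comes with test code; how can we keep that test code from running when the module is imported?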
# my_add.py
def func_add(x, y):
    return x + y
print("test func_add(1,2)=%d" % func_add(1, 2))
func_add(1, 2)
test func_add(1,2)=3
3
# main_test.py
import my_add
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[3], line 2
1 # main_test.py
----> 2 import my_add
ModuleNotFoundError: No module named 'my_add'
Lookup process:
The directories that are searched can be inspected through sys.path:
import sys
sys.path
['d:\\study\\code\\jupyter\\PythonLearning',
'D:\\software\\py3.11\\python311.zip',
'D:\\software\\py3.11\\DLLs',
'D:\\software\\py3.11\\Lib',
'D:\\software\\py3.11',
'',
'C:\\Users\\26822\\AppData\\Roaming\\Python\\Python311\\site-packages',
'D:\\software\\py3.11\\Lib\\site-packages',
'D:\\software\\py3.11\\Lib\\site-packages\\win32',
'D:\\software\\py3.11\\Lib\\site-packages\\win32\\lib',
'D:\\software\\py3.11\\Lib\\site-packages\\Pythonwin']
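__name__
Explanation:
When the file is executed directly, __name__ has the value "__main__";
when the file is imported as a module, __name__ has the value of the module name.
Requirement: run the test code when the file is executed, but not when it is imported as a module:
def func_add(x, y):
    return x + y
# Use the value of __name__ to tell whether the file was imported
if __name__ == "__main__":
    print("test func_add(1, 2)=%d" % func_add(1, 2))
    func_add(1, 2)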
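The effect of the __name__ check can be observed without creating files by hand. The sketch below is only an illustration: it writes a throwaway module (the name demo_add and the variables SEEN_NAME and RAN_TESTS are made up for this example) into a temporary folder, imports it once, and then executes the same file the way "python demo_add.py" would.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Source of a throwaway module; demo_add, SEEN_NAME and RAN_TESTS are hypothetical names.
SOURCE = '''
def func_add(x, y):
    return x + y

SEEN_NAME = __name__          # what __name__ was while this file ran
RAN_TESTS = False
if __name__ == "__main__":
    RAN_TESTS = True          # stands in for the test code
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "demo_add.py")
    path.write_text(SOURCE, encoding="utf-8")
    sys.path.insert(0, tmp)               # make the folder importable
    try:
        imported = importlib.import_module("demo_add")
        # run_path executes the file as a script when run_name is "__main__"
        executed = runpy.run_path(str(path), run_name="__main__")
    finally:
        sys.path.remove(tmp)

print(imported.SEEN_NAME, imported.RAN_TESTS)        # demo_add False
print(executed["SEEN_NAME"], executed["RAN_TESTS"])  # __main__ True
```

When imported, the guarded block is skipped; when executed, it runs.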
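Main topics:
Package: a folder that contains an __init__.py file.
Purpose: better organization of source code;
Absolute import:
import module
from module import attribute
Relative import: used for imports inside a package; basic syntax:
from .module import xxx
from ..module import xxx
from . import module
# Notes:
# Relative imports only work in the "from ... import ..." form; "import .module" is a syntax error
# .   means the current package (directory)
# ..  means the parent package
# ... means the grandparent package, and so on
Points to note:
Absolute import: a module can only import its own submodules, or modules at the same level as its top-level package and their submodules;
Relative import: the module must be part of a package and can only import modules inside its top-level package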
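A relative import can be tried out with a package built in a temporary folder. This is a sketch under assumptions: the package name demo_pkg and its files are invented for the example and are not part of these notes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "demo_pkg")           # hypothetical package name
    sub = pkg / "sub"
    sub.mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (sub / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "base.py").write_text("VALUE = 42\n", encoding="utf-8")
    # One dot: a module in the same package
    (pkg / "sibling.py").write_text("from .base import VALUE\n", encoding="utf-8")
    # Two dots: a module in the parent package
    (sub / "child.py").write_text("from ..base import VALUE\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    try:
        sibling = importlib.import_module("demo_pkg.sibling")
        child = importlib.import_module("demo_pkg.sub.child")
    finally:
        sys.path.remove(tmp)

print(sibling.VALUE, child.VALUE)  # 42 42
```

Both modules reach base.py through the package structure rather than through sys.path, which is why they must be imported as part of demo_pkg.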
Errors:
# Syntax error
1a =10
Cell In[11], line 2
1a =10
^
SyntaxError: invalid decimal literal
# Indentation error
a = 10
    b = 10
Cell In[12], line 3
b = 10
^
IndentationError: unexpected indent
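Exceptions:
Purpose of try/except: catch the specified exceptions;
Basic syntax:
try:
    try_suite
except Exception as e:
    except_suite
Exception: the exception type to catch; if the raised exception is neither this type nor one of its subclasses, it is not caught;
Catching several exception types:
try:
    try_suite
except Exception1 as e:
    except_suite1
except Exception2 as e:
    except_suite2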
try:
    print(abc)
except Exception as e:
    print('error',e)
print("abc")
error name 'abc' is not defined
abc
try:
    print(abc)
except ValueError as e:
    print('ValueError:',e)
print("abc")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[18], line 2
1 try:
----> 2 print(abc)
3 except ValueError as e:
4 print('ValueError:',e)
NameError: name 'abc' is not defined
try:
    print(abc)
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
print("abc")
NameError: name 'abc' is not defined
abc
try:
    int("abc")
    print(abc)
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
print("abc")
ValueError: invalid literal for int() with base 10: 'abc'
abc
Purpose of finally: the statements in finally run whether or not the exception is caught;
Use case: releasing resources, etc.;
Basic syntax:
try:
    try_suite
except Exception as e:
    except_suite
finally:
    pass
try:
    print('test')
    l = []
    print(l[10])
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
finally:
    print("go to finally")
test
go to finally
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[21], line 4
2 print('test')
3 l = []
----> 4 print(l[10])
5 except ValueError as e:
6 print('ValueError:',e)
IndexError: list index out of range
while True:
    msg = input('Input:')
    if msg == 'q':
        break
    try:
        num = int(msg)
        print(num)
    except Exception as e:
        print('error:',e)
Input: abc
error: invalid literal for int() with base 10: 'abc'
Input: 10
10
Input: q
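The order in which the clauses run can be made visible by recording them. The sketch below is an illustration only; the helper parse_number is a made-up name, and it also uses the optional else clause, which is not covered in the notes above and runs only when the try block raised nothing.

```python
def parse_number(text):
    """Return (value, log), where log records which clauses ran."""
    log = []
    value = None
    try:
        value = int(text)
    except ValueError as e:
        log.append(f"except: {e}")
    else:
        log.append("else")       # runs only when no exception occurred
    finally:
        log.append("finally")    # always runs
    return value, log

print(parse_number("10"))   # (10, ['else', 'finally'])
print(parse_number("abc"))
```

For "abc" the log contains the ValueError message followed by "finally", so the cleanup step runs on both paths.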
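The raise and assert statements are used to raise exceptions deliberately;
For example:
raise statement: check for an abnormal condition in the program and raise an exception explicitly;
Basic syntax:
raise Exception(args)
raise NameError('value not defined')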
raise ValueError('name error')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[29], line 1
----> 1 raise ValueError('name error')
ValueError: name error
assert statement: checks whether an expression is true; if it is not, an AssertionError is raised;
Basic syntax:
assert expression [,args]
def my_add(x,y):
    assert isinstance(x, int),"x must be int"
    assert isinstance(y, int),"y must be int"
    return x + y
my_add(1,2)
3
my_add(1, "2")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[27], line 1
----> 1 my_add(1, "2")
Cell In[26], line 3, in my_add(x, y)
1 def my_add(x,y):
2 assert isinstance(x, int),"x must be int"
----> 3 assert isinstance(y, int),"y must be int"
4 return x + y
AssertionError: y must be int
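One caveat worth knowing: assert statements are removed when Python runs with the -O option, so they suit internal sanity checks better than validation of caller input. A possible alternative is an explicit raise, sketched below; my_add_checked is a hypothetical variant of my_add, not part of the original notes.

```python
def my_add_checked(x, y):
    # Explicit checks keep working under "python -O", which strips asserts.
    for name, value in (("x", x), ("y", y)):
        if not isinstance(value, int):
            raise TypeError(f"{name} must be int, got {type(value).__name__}")
    return x + y

print(my_add_checked(1, 2))        # 3
try:
    my_add_checked(1, "2")
except TypeError as e:
    message = str(e)
    print("TypeError:", message)   # TypeError: y must be int, got str
```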
Notes on custom exception classes:
class Net404Error(Exception):
    def __init__(self):
        args = ("The requested link does not exist", "404")
        super().__init__(*args)
net_error_404 = Net404Error()
raise net_error_404
---------------------------------------------------------------------------
Net404Error Traceback (most recent call last)
Cell In[33], line 1
----> 1 raise net_error_404
Net404Error: ('The requested link does not exist', '404')
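A custom exception can also carry structured data that the handler reads back. The following is a minimal sketch; HttpStatusError, fetch, and the example URL are invented for illustration.

```python
class HttpStatusError(Exception):
    """Hypothetical exception that carries an HTTP status code and a URL."""
    def __init__(self, status, url):
        super().__init__(f"{status} for {url}")
        self.status = status
        self.url = url

def fetch(url):
    # Stand-in for a real request; it always fails for the sake of the demo.
    raise HttpStatusError(404, url)

try:
    fetch("https://example.com/missing")
except HttpStatusError as e:
    caught = e                     # keep a reference; "e" is cleared after the block
    print(e.status, e.url, "|", e)
```

Catching the specific class lets the handler use e.status directly instead of parsing the message text.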
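with/as: works with a context manager so that resources are acquired and released automatically;
Basic syntax:
with context as var:
    with_suite
Note: the context object must support the context management protocol
Use case: opening a file and forgetting to close it;
File operations:
fpath = r'D:\study\code\jupyter\DATA\csv_learn\2017_data.csv'
with open(fpath) as f:
    pass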
print("f closed:", f.closed)
f closed: True
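Understanding context managers:
A context manager supports the __enter__() and __exit__() methods
__enter__(): called on entering the context; with "as var", var receives this method's return value
__exit__(): called on leaving the context
class TestContext:
    def __enter__(self):
        print("call __enter__")
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        print("call __exit__")
with TestContext() as tc:
    print(tc)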
call __enter__
<__main__.TestContext object at 0x00000188DDEB31D0>
call __exit__
tc
<__main__.TestContext at 0x188ddeb31d0>
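The three arguments of __exit__() describe an exception raised inside the with block (all None if there was none), and a true return value tells Python the exception has been handled. The class below is a hypothetical sketch of that behavior, not part of the original example.

```python
class SwallowValueError:
    """Hypothetical manager that records calls and suppresses ValueError."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.events.append(f"exit:{exc_type.__name__ if exc_type else None}")
        # Returning True marks the exception as handled.
        return exc_type is ValueError

with SwallowValueError() as ok:
    pass
with SwallowValueError() as failing:
    int("abc")            # raises ValueError inside the with block

print(ok.events)          # ['enter', 'exit:None']
print(failing.events)     # ['enter', 'exit:ValueError']
```

The second block ends without a traceback because __exit__() returned True; any other exception type would still propagate.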
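Learning goal: master one way of processing text
Regular expression: a pattern that describes the characteristics of a set of strings and is used to match specific strings.
Use cases:
# Requirement: match strings that start with a digit, using the match method
import re
s1 = '001_sun'
s2 = 'qimiao'
# \d matches any digit
ma = re.match(r'\d',s1)
print(ma)
ma = re.match(r'\d',s2)
print(ma)
<re.Match object; span=(0, 1), match='0'>
None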
import re
pattern = r'\d+'
string = '123abc456'
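'''
re.match(pattern, string):
Tries to match the regular expression at the start of the string; returns a match object on success and None on failure.
'''
match = re.match(pattern, string)
print(match) # prints the match object
<re.Match object; span=(0, 3), match='123'>
'''
re.search(pattern, string)
Scans the whole string and returns a match object for the first match found.
'''
search = re.search(pattern, string)
print(search)
<re.Match object; span=(0, 3), match='123'>
'''
re.findall(pattern, string)
Finds all substrings that match the regular expression and returns them as a list.
'''
results = re.findall(pattern, string)
print(results)
['123', '456']
'''
re.split(pattern, string)
Splits the string at every match and returns the list of pieces.
'''
results = re.split(pattern, string)
print(results)
['', 'abc', '']
'''
re.sub(pattern, repl, string)
Replaces the parts of the string that match the regular expression with repl and returns the new string.
'''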
new_string = re.sub(pattern, '*NUMBER*', string)
print(new_string)
*NUMBER*abc*NUMBER*
import re
s1 = "001_sun"
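# \d matches any digit
ma = re.match(r'\d', s1)
print("ma:",ma)
# m.group(): the matched string
print("ma.group:", ma.group())
# m.span(): tuple with the start and end index of the match
print("ma.span:", ma.span())
# m.start()/m.end(): start and end index of the match
print("ma.start:%d, ma.end:%d"%(ma.start(), ma.end()))
ma: <re.Match object; span=(0, 1), match='0'>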
ma.group: 0
ma.span: (0, 1)
ma.start:0, ma.end:1
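re.compile compiles a regular expression string into a Pattern object, whose methods can then be used for matching, searching, and so on;
Use case: when the same pattern is used repeatedly, for example in a loop, compile it into a Pattern object first;
re_cmp = re.compile(r'\d')
ma = re_cmp.match("0123")
print(ma)
<re.Match object; span=(0, 1), match='0'>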
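As a small illustration of the loop scenario, the pattern below is compiled once and reused for every line; the sample lines are made up for the example.

```python
import re

DIGITS = re.compile(r'\d+')   # compiled once, reused for every line

lines = ["order 17", "no digits here", "room 204, floor 2"]
found = [DIGITS.findall(line) for line in lines]
print(found)   # [['17'], [], ['204', '2']]
```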
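Requirements:
The string starts with an uppercase letter;
The string starts with a digit;
The string starts with a digit or a lowercase letter;
The first character is a digit and the second is a lowercase letter;
The string starts with one of the characters ABCDE;
import re
s1 = "Python"
s2 = "15011345578"
s3 = "AB_test"
s4 = "test"
# The string starts with an uppercase letter
re.match(r'[A-Z]', s1)
# The string starts with a digit
re.match(r'\d', s2)
# The string starts with a digit or a lowercase letter
re.match(r'[0-9a-z]', s4)
s5 = "1aabc"
# The first character is a digit and the second is a lowercase letter
re.match(r'\d[a-z]', s5)
# The string starts with one of the characters ABCDE
re.match(r'[ABCDE]', s3)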
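Requirements:
The string starts with a lowercase letter followed by a digit, or directly with a digit;
Check for a valid number string below 100;
A valid QQ number, 6 to 15 digits long;
# * matches the preceding item zero or more times
s0 = 'c'
s1 = "AAAc"
print(re.match(r'A*', s1))
print(re.match(r'A*', s0))
<re.Match object; span=(0, 3), match='AAA'>
<re.Match object; span=(0, 0), match=''>
# + matches the preceding item one or more times
s2 = "AAc"
print(re.match(r'A+', s2))
print(re.match(r'A+', s0))
<re.Match object; span=(0, 2), match='AA'>
None
# ? matches the preceding item zero or one time
s3 = '1ab'
print(re.match(r'\d?', s3))
print(re.match(r'\d?', s0))
<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(0, 0), match=''>
# *? matches as few times as possible, at least 0
s4 = "AAC"
re.match(r'A*?', s4)
# +? matches as few times as possible, at least 1
s4 = "AAC"
re.match(r'A+?', s4)
# {m,n} matches the preceding item m to n times
s5 = "123456abc"
re.match(r'\d{3,5}', s5)
s6 = "my age is 10cm"
ma = re.search(r'\d+', s6)
ma.group()
'10'
# The string starts with a lowercase letter followed by a digit, or directly with a digit
s7 = 'a1abc'
re.match(r'[a-z]?\d', s7)
# Check for a valid number string below 100: 0-99
s8 = '10'
s8_1 = '0'
s8_2 = '100'
print(re.match(r'[1-9]?\d$', s8))
print(re.match(r'[1-9]?\d$', s8_1))
print(re.match(r'[1-9]?\d$', s8_2))
<re.Match object; span=(0, 2), match='10'>
<re.Match object; span=(0, 1), match='0'>
None
# A valid QQ number, 6 to 15 digits long
s9 = '123458888888'
re.match(r'\d{6,15}$', s9)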
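For whole-string validation such as the QQ number check, re.fullmatch is an alternative to adding $ by hand, because it succeeds only when the entire string fits the pattern. A short sketch follows; the helper is_qq and the sample inputs are made up, and the rule is simply "6 to 15 digits" as stated in the requirement.

```python
import re

QQ = re.compile(r'\d{6,15}')

def is_qq(text):
    # fullmatch requires the pattern to cover the whole string,
    # so no explicit ^ or $ anchors are needed.
    return QQ.fullmatch(text) is not None

samples = ["12345", "123456", "123458888888", "1234567890123456", "12345a"]
print([is_qq(s) for s in samples])
# [False, True, True, False, False]
```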
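Requirements:
Match a valid email address. Format: the name consists of digits, letters, and underscores and is 6~15 characters long; the suffix is @xxx.com;
Find the words that end with t;
Find the words that start with t;
s1 = 'AAAAc'
# $ anchors the match at the end of the string
print(re.match(r'A+',s1))
print(re.match(r'A+$',s1))
print(re.match(r'A+c$',s1))
<re.Match object; span=(0, 4), match='AAAA'>
None
<re.Match object; span=(0, 5), match='AAAAc'>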
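# Match a valid email address: the name consists of digits, letters, and underscores, 6~15 characters long; the suffix here is @qq.com
mail = 'sun_2024@qq.com'  # hypothetical sample address
re.match(r'[\da-zA-Z_]{6,15}@qq\.com$', mail)
# Find the words that end with t
s = "where what hat the this that thtot"
# \w matches letters, digits, and the underscore; for ASCII text it is equivalent to [A-Za-z0-9_]
# \b matches a word boundary
print(re.findall(r'\w+?t',s))
print(re.findall(r'\w+?t\b',s))
# Find the words that start with t
re.findall(r't\w+?\b',s)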
['what', 'hat', 'that', 'tht', 'ot']
['what', 'hat', 'that', 'thtot']
['the', 'this', 'that', 'thtot']
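Requirements:
Match valid number strings below 100 (0~99);
Given the string "apple:8, pear:20, banana:10", extract the names and the numbers;
Extract all URLs from an HTML text;
The text starts and ends with the same digit;
# Match valid number strings below 100 (0~99)
snum = '100'
snum2 = '99'
# | matches either the expression on its left or the one on its right
print(re.match(r'\d$|[1-9]\d$', snum))
print(re.match(r'\d$|[1-9]\d$', snum2))
None
<re.Match object; span=(0, 2), match='99'>
items = ["01", "100", "10", "9", "99"]
re_cmp = re.compile(r"^\d$|[1-9]\d$")
for item in items:
    ma = re_cmp.match(item)
    print(ma)
None
None
<re.Match object; span=(0, 2), match='10'>
<re.Match object; span=(0, 1), match='9'>
<re.Match object; span=(0, 2), match='99'>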
# Given the string "apple:8, pear:20, banana:10", extract the names and the numbers
s = "apple:8, pear:20, banana:10"
# () creates a group
print(re.findall(r'[a-z]+:\d+', s))
print(re.findall(r'([a-z]+):(\d+)', s))
dict(re.findall(r'([a-z]+):(\d+)', s))
['apple:8', 'pear:20', 'banana:10']
[('apple', '8'), ('pear', '20'), ('banana', '10')]
{'apple': '8', 'pear': '20', 'banana': '10'}
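# A minimal HTML fragment containing the two quoted URLs (the surrounding markup is illustrative)
html = """<a href="https://movie.douban.com/subject/6786002/"><img src="https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1454261925.webp"></a>"""
# .*? matches any number of any characters but as few as possible (non-greedy mode), so each match stops at the first closing double quote instead of running across several quoted values.
re.findall(r'"(https:.*?)"', html)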
['https://movie.douban.com/subject/6786002/',
'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1454261925.webp']
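# The text starts and ends with the same digit
text = '1021'
# \1 refers back to group 1
re.match(r'(\d).*?(\1)$', text)
# Using the group index
texts = ['101', "2223", '1omyhat', '5abc6']
for text in texts:
    print(re.match(r'(\d).*?(\1)', text))
<re.Match object; span=(0, 3), match='101'>
<re.Match object; span=(0, 2), match='22'>
None
None
# Using a named group
text = "1234541"
ma = re.match(r'(?P<start>.*).*?(?P=start)' , text)
ma.groupdict()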
{'start': '1'}
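Named groups also make extraction code easier to read than numeric indexes. The sketch below applies them to the fruit string used earlier; the group names name and count are chosen for this example.

```python
import re

ITEM = re.compile(r'(?P<name>[a-z]+):(?P<count>\d+)')
s = "apple:8, pear:20, banana:10"

# finditer yields one match object per item; groups are read by name
stock = {m.group("name"): int(m.group("count")) for m in ITEM.finditer(s)}
print(stock)  # {'apple': 8, 'pear': 20, 'banana': 10}
```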
split: splits text according to a pattern and returns a list;
Requirements:
import re
s = "When someone walk out your life, let them. They are just making more room for someone else better to walk in."
words = re.split(r'\W', s)
words = [word for word in words if word.strip()]
words
['When',
'someone',
'walk',
'out',
'your',
'life',
'let',
'them',
'They',
'are',
'just',
'making',
'more',
'room',
'for',
'someone',
'else',
'better',
'to',
'walk',
'in']
len(words)
21
s = r"python/c\C++/Java/Php/Nodejs"
# \ is the escape character, so matching a literal \ requires \\ in the pattern
re.split(r'[/\\]', s)
['python', 'c', 'C++', 'Java', 'Php', 'Nodejs']
Function prototype:
re.sub(pattern, repl, string, count=0, flags=0)
Main parameters:
pattern: the pattern to match;
repl: the replacement, a string or a function; if it is a function, each match is replaced with the string the function returns;
string: the string to process;
Requirements:
s1 = "name:sun, pwd:123456, name:zhang,pwd:667788"
re.sub(r'\d+', "****", s1)
'name:sun, pwd:****, name:zhang,pwd:****'
# Given a text with performance scores, replace scores of 6 or more with "A" and the others with "B"
def replace_ab(ma):
    value = ma.group()
    value = int(value)
    if value >= 6:
        return "A"
    return "B"
s2 = "sun:5, li:10, zhao:7, gao:8, wang:5"
re.sub(r'\d+', replace_ab, s2)
'sun:B, li:A, zhao:A, gao:A, wang:B'
# Given three scores for each of several athletes, keep only the highest one
def replace_max(ma):
    value = ma.group()
    #print('value:',value)
    values = value.split(',')
    #print('values:',values)
    values = [float(value) for value in values if value.strip()]
    max_val = max(values)
    return str(max_val)
s3 = "谷爱凌:9.8,9.7,9.6,高梨沙罗:9.88,9.6,9.7"
re.sub(r'[\d,\.]+', replace_max, s3)
'谷爱凌:9.8高梨沙罗:9.88'
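In the result above the comma between the two athletes is lost, because the character class [\d,\.]+ also consumes the trailing comma of each score list. One possible variant, shown here only as a sketch (NUM, SERIES, and keep_max are names made up for it), matches complete numbers separated by commas so that the separator survives.

```python
import re

NUM = r'\d+(?:\.\d+)?'                              # one score such as 9.8
SERIES = re.compile(NUM + r'(?:,' + NUM + r')*')    # scores joined by commas

def keep_max(ma):
    scores = ma.group().split(',')
    # Compare numerically but return the original text of the best score
    return max(scores, key=float)

s3 = "谷爱凌:9.8,9.7,9.6,高梨沙罗:9.88,9.6,9.7"
cleaned = SERIES.sub(keep_max, s3)
print(cleaned)  # 谷爱凌:9.8,高梨沙罗:9.88
```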
Matching paired tags, <tag>content</tag>:
s = '<li>tushu</li>'
re.match(r'<(.*?)>.+?</\1>', s)
ma = re.match(r'<(?P<tag>.*?)>.+?</(?P=tag)>' , s)
print(ma.groups())
print(ma.groupdict())
('li',)
{'tag': 'li'}
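s = '<li>xxx</li>'  # illustrative tag
re.match(r'<([\w]+)>.+</\1>', s)
html = '<img src="https://ss0.bdstatic.com/=0.jpg">'  # illustrative markup around the URL
re.findall(r'src="(http.*?)"', html)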
['https://ss0.bdstatic.com/=0.jpg']
s = 'that this,theme father/this teeth'
# "words" avoids shadowing the built-in list type
words = re.findall(r'\bth[a-zA-Z]*?\b', s)
print(f'{words},{len(words)}')
['that', 'this', 'theme', 'this'],4
info = 'apple:21, banana:8, pear:7'
result = re.findall(r'\d+', info)
result
['21', '8', '7']
info = 'Things turned out quite nicely after four years of hard work in college.With a GPA of 3.9,I currently rank the top 3% among 540 peers in my grade.'
len(re.split(r'\W', info))
33
scores = '90,100,66,77,33,80,27'
def replace_failed(ma):
    values = ma.group()
    v = int(values)
    if v < 60:
        return 'xx'
    return values
re.sub(r'\d+', replace_failed, scores)
'90,100,66,77,xx,80,xx'
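# Rule: the address starts with a letter, consists of underscores, digits, and letters, is 8~13 characters long, and ends with @163.com
mail = 'python_2024@163.com'         # hypothetical sample address
mail_wrong = '1python_2024@163.com'  # hypothetical address that breaks the rule (starts with a digit)
print(re.match(r'[a-zA-Z]\w{7,12}@163\.com$',mail))
print(re.match(r'[a-zA-Z]\w{7,12}@163\.com$',mail_wrong))
<re.Match object; span=(0, 19), match='python_2024@163.com'>
None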
# Count the words that start with th, ignoring case
s = 'This that the who'
print(re.findall(r'th[a-zA-Z]*', s, flags=re.I))
print(re.findall(r'th[a-zA-Z]*', s))
['This', 'that', 'the']
['that', 'the']
# Multi-line matching: count the functions in a piece of code
code = '''
def func1():
    pass
Def func2():
    pass
class t:
    def func():
        pass
'''
print(re.findall(r'^def ', code))
print(re.findall(r'^def ', code, flags=re.M))
print(re.findall(r'^def ', code, flags=re.M | re.I))
[]
['def ']
['def ', 'Def ']
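The pattern ^def only finds functions that start in the first column. If methods inside classes should be counted too, the pattern can allow leading indentation. The snippet below is a sketch with its own sample code string; the function and class names in it are made up.

```python
import re

code = '''
def func1():
    pass
class T:
    def method(self):
        pass
'''
# With re.M, ^ matches at the start of every line;
# [ \t]* allows leading indentation so methods are found as well.
names = re.findall(r'^[ \t]*def\s+(\w+)', code, flags=re.M)
print(names)  # ['func1', 'method']
```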
Workflow:
import pymysql
# Connect to the database
db = pymysql.connect(host = "localhost",user="root",password = "",database="test")
# Alternatively, keep the connection settings in a dict
config = {
    'user':'root',        # user name
    'password':'',        # password
    'host':'localhost',   # address of the MySQL server
    'port':3306,          # port, 3306 by default
    'database':'test'     # database name, test
}
db = pymysql.connect(**config)
# Get a cursor
cursor = db.cursor()
# List the tables
f = cursor.execute("show tables;")
# Fetch all rows
data = cursor.fetchall()
# Print the rows
for item in data:
    print(item)
('user_info',)
# SQL statement for inserting a row; %s placeholders are filled in by the driver
sql = 'insert into user_info (user_name, user_id, channel) values(%s,%s,%s)'
# Insert one row
cursor.execute(sql, ('何同学', "10001", "B站"))
# Insert several rows
cursor.executemany(sql, [('张同学', "10002", "抖音"),('奇猫', "10003", "抖音")])
db.commit()
sql = 'select * from user_info'
cursor.execute(sql)
3
# Fetch all rows
data = cursor.fetchall()
# Print the rows
for item in data:
    print(item)
('何同学', '10001', 'B站')
('张同学', '10002', '抖音')
('奇猫', '10003', '抖音')
# Close the cursor
cursor.close()
# Close the connection
db.close()
db.close()
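The same connect / cursor / execute / commit / fetch / close sequence can be tried without a MySQL server. The sketch below swaps in the standard-library sqlite3 module with an in-memory database; this is a substitution for illustration, not pymysql itself. The main visible differences are the ? placeholders instead of %s and the made-up sample rows.

```python
import sqlite3

# sqlite3 stands in for pymysql here only because it needs no server.
db = sqlite3.connect(":memory:")
try:
    cursor = db.cursor()
    cursor.execute(
        "create table user_info (user_name text, user_id text, channel text)"
    )
    sql = "insert into user_info (user_name, user_id, channel) values (?, ?, ?)"
    # Insert one row, then several rows
    cursor.execute(sql, ("user_a", "10001", "site_1"))
    cursor.executemany(sql, [("user_b", "10002", "site_2"),
                             ("user_c", "10003", "site_2")])
    db.commit()
    cursor.execute("select * from user_info order by user_id")
    rows = cursor.fetchall()
    cursor.close()
finally:
    db.close()     # runs even if one of the statements above fails

for row in rows:
    print(row)
```

Putting close() in a finally block ties this section back to the exception handling notes: the connection is released on both the success and the failure path.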
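Process: a running instance of a program; it is the basic unit of scheduling and resource allocation in the system;
Scenarios:
Process ID: the unique identifier of a running program;
Getting process IDs in Python:
os.getpid(): returns the ID of the current process
os.getppid(): returns the ID of the parent of the current process
Module for working with processes in Python: multiprocessing
import os
# Get the ID of this process
os.getpid()
10588
# Get the ID of the parent process
os.getppid()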
3712
# Import the modules
import multiprocessing
import os
# Function run in the child process:
def func(*args, **kwargs):
    print("subProcess pid:%d ppid:%d"%(os.getpid(), os.getppid()))
if __name__ == "__main__":
    # Create the process object
    p = multiprocessing.Process(target=func)
    # Start the process, which runs the target function
    p.start()
    # Wait for the child process to finish
    p.join()
    print("main process pid:%d"%os.getpid())
main process pid:10588
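A child process works on its own copy of the parent's data (a copy of the parent's memory under fork; a fresh import of the module under spawn, the default on Windows), so changes made in the child are not visible in the parent;
import multiprocessing
import os
import time
tmp = 10
def work():
    global tmp
    tmp = 100
    print('work pid:', os.getpid(), os.getppid())
    print("tmp in work:", tmp)
if __name__ == '__main__':
    # Create the process object
    p = multiprocessing.Process(target=work)
    # Start the process
    p.start()
    print("call main process pid:", os.getpid())
    # Wait for the child process to finish
    p.join()
    # Value of tmp in the parent
    print("tmp in main:", tmp)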
call main process pid: 10588
tmp in main: 10
Output when run as a script:
call main process pid: 15636
work pid: 5708 15636
tmp in work: 100
tmp in main: 10
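Use cases: parallel computation, a function that takes too long or blocks, etc.;
An example: a function sleeps for 1 second each time it runs and is called 6 times; compare the elapsed time of single-process and multi-process execution;
import multiprocessing
import os
import time
tmp = 10
def work():
    print("call work")
    time.sleep(1)
if __name__ == '__main__':
    n = 6
    plist = []
    ts = time.time()
    # The if branch uses multiple processes; the else branch does not
    if False:
        for i in range(n):
            p = multiprocessing.Process(target=work)
            p.start()
            plist.append(p)
        # Wait for every child process, not only the last one
        for p in plist:
            p.join()
    else:
        for i in range(n):
            work()
    print("run time:%.2f"%(time.time() - ts))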
call work
call work
call work
call work
call work
call work
run time:6.00
With multiple processes:
call work
call work
call work
call work
call work
call work
run time:1.14
Without multiple processes:
call work
call work
call work
call work
call work
call work
run time:6.01
Common methods:
import multiprocessing
import os
import time
from multiprocessing import Queue
def work(msgq):
    while True:
        msg = msgq.get()
        if msg == "Q":
            break
        else:
            print(f"pid:{os.getpid()} recv msg:{msg}")
if __name__ == '__main__':
    msgq = Queue()
    list_p = []
    for i in range(1, 10):
        p = multiprocessing.Process(target=work, args=(msgq,))
        list_p.append(p)
        p.start()
    # Send the messages
    for i in range(1, 10):
        msgq.put("Test%d"%i)
    # Send one quit command per process
    for p in list_p:
        msgq.put("Q")
    # Wait for the processes to exit
    for p in list_p:
        p.join()
Result:
pid:15464 recv msg:Test1
pid:15464 recv msg:Test2
pid:7124 recv msg:Test3
pid:7124 recv msg:Test5
pid:7124 recv msg:Test6
pid:7124 recv msg:Test7
pid:7124 recv msg:Test8
pid:7124 recv msg:Test9
pid:15464 recv msg:Test4
import multiprocessing
import os
import time
from multiprocessing import Queue
def work(msgq):
    while True:
        msg = msgq.get()
        time.sleep(0.5)
        if msg == "Q":
            break
        else:
            print(f"pid:{os.getpid()} recv msg:{msg}")
if __name__ == '__main__':
    msgq = Queue()
    list_p = []
    for i in range(1, 10):
        p = multiprocessing.Process(target=work, args=(msgq,))
        list_p.append(p)
        p.start()
    # Send the messages
    for i in range(1, 10):
        msgq.put("Test%d"%i)
    # Send one quit command per process
    for p in list_p:
        msgq.put("Q")
    # Wait for the processes to exit
    for p in list_p:
        p.join()
Output:
pid:9776 recv msg:Test1
pid:11560 recv msg:Test2
pid:4024 recv msg:Test3
pid:10828 recv msg:Test4
pid:7696 recv msg:Test5
pid:8292 recv msg:Test6
pid:2528 recv msg:Test7
pid:10152 recv msg:Test8
pid:14476 recv msg:Test9
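With the sleep added, each worker stays busy for 0.5 s after taking a message, so the nine messages are spread across the nine processes; in this run they were also printed in the order they were sent
Process pool: a fixed number of processes is created in advance and reused for the tasks that are submitted;
Process pool class:
from multiprocessing import Pool
Basic steps (func and msg stand for your own function and its argument):
from multiprocessing import Pool
# Create the pool with 3 worker processes
pool = Pool(processes = 3)
# Submit a task with its arguments
pool.apply_async(func, (msg, ))
# Stop accepting new tasks
pool.close()
# Wait for all tasks to finish
pool.join()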
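A process pool in Python is a high-level tool for parallel processing, typically used to run several functions or tasks at the same time. It manages a group of worker processes so that multi-core processors are used more effectively. The multiprocessing module in the standard library provides the Pool class, a commonly used process pool implementation.
The following describes Python process pools in more detail:
Import the module:
import multiprocessing
First, import the multiprocessing module.
Create the pool:
pool = multiprocessing.Pool(processes=4)
Use the multiprocessing.Pool class to create a process pool. Here the pool has at most 4 worker processes, so at most 4 tasks run at the same time.
Submit a task:
result = pool.apply_async(function, (arg1, arg2))
Use apply_async to submit a function to the pool. function is the function to run and (arg1, arg2) are its arguments. The method returns an AsyncResult object that can be used to obtain the function's result.
Get the result:
result.get()
Use get() to obtain the function's result. It blocks the caller until the task has finished and returned its result.
Close the pool:
pool.close()
pool.join()
Use close() to stop the pool from accepting new tasks, then join() to wait for all submitted tasks to finish.
Process pool example:
Below is a complete example that uses a process pool to run a function in parallel:
import multiprocessing
def square(x):
    return x * x
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    inputs = [1, 2, 3, 4, 5]
    results = [pool.apply_async(square, (x,)) for x in inputs]
    pool.close()
    pool.join()
    for result in results:
        print(result.get())
In this example the pool computes the squares of a list of numbers in parallel.
Process pools are useful for running many tasks on a multi-core processor: they can speed up CPU-bound work considerably and take care of process management and scheduling. Use them with care, though: creating too many processes leads to contention for resources and lower performance.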
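When every task calls the same function, Pool.map together with a with block is a more compact way to write the example. This is a sketch of that variant, not the only way to use the pool; the wrapper function main is a name chosen here.

```python
import multiprocessing

def square(x):
    return x * x

def main():
    # map blocks until every result is available and keeps the input order;
    # leaving the with block then shuts the pool down.
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(square, [1, 2, 3, 4, 5])

if __name__ == "__main__":
    print(main())   # [1, 4, 9, 16, 25]
```

The __name__ guard matters here: on platforms that start workers by re-importing the script (the default on Windows), unguarded pool creation would run again in every worker.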
Counting lines of code with a process pool:
from multiprocessing import Pool
import os
import time
# Count the lines in one file
def countLine(fpath):
    linenum = 0
    if fpath.endswith('.py'):
        with open(fpath, encoding="utf-8") as f:
            linenum = len(f.readlines())
    return linenum
def scan_dir(fpath, pools):
    result = []
    # Walk all files under the given directory
    for root, subdirs, flist in os.walk(fpath):
        for fname in flist:
            # Only .py files are counted
            if fname.endswith('.py'):
                # Build the full path
                path = os.path.join(root, fname)
                # Submit the task to the pool
                r = pools.apply_async(countLine, args=(path,))
                # Keep the AsyncResult in result
                result.append(r)
    # Add up the results
    total = sum([r.get() for r in result])
    return total
if __name__ == "__main__":
    total = 0
    nums = 20
    src_dir = r'E:\vscode_dir\part_7\process\django'
    start_time = time.time()
    pools = Pool(processes=10)
    for i in range(nums):
        total += scan_dir(src_dir, pools)
    # Stop accepting new tasks
    pools.close()
    # Wait for the worker processes to finish
    pools.join()
    end_time = time.time()
    # Print the totals
    print("run time:%.2f, code total nums:%d"%(end_time-start_time, total))
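Thread: the smallest unit the system schedules for execution; a thread always lives inside a process;
Multithreading: several threads are started within one process to run tasks concurrently; global resources are shared between the threads;
Differences between processes and threads:
Limitation of multithreading in Python
GIL (Global Interpreter Lock): a concept introduced in the implementation of CPython (the reference Python interpreter);
The GIL is essentially a mutex;
Role of the GIL: it allows only one thread at a time to execute Python bytecode, which protects the interpreter's internal state;
Problem with the GIL: on multi-core CPUs, threads running CPU-bound Python code cannot execute in parallel, so multithreading gives little or no speedup for such tasks; I/O-bound tasks still benefit because the GIL is released while a thread waits;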
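The difference is easy to see with a task that only waits. In the sketch below, time.sleep stands in for I/O such as a network request; fake_io and timed are names invented for the example, and the exact timings will vary from machine to machine.

```python
import threading
import time

def fake_io(delay):
    time.sleep(delay)      # sleeping releases the GIL, like waiting on I/O

def timed(n, delay, use_threads):
    start = time.perf_counter()
    if use_threads:
        threads = [threading.Thread(target=fake_io, args=(delay,))
                   for _ in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(n):
            fake_io(delay)
    return time.perf_counter() - start

serial = timed(5, 0.2, use_threads=False)
threaded = timed(5, 0.2, use_threads=True)
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```

The serial run takes roughly the sum of the delays, while the threaded run takes roughly one delay, because the waits overlap.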
import threading
# Thread function
def thread_func(*args, **kwargs):
    print("in thread func")
def main():
    # Create the thread object
    t = threading.Thread(target=thread_func, args=())
    # Start the thread, which runs the thread function
    t.start()
    print("in main func")
    # Wait for the thread to finish
    t.join()
if __name__ == "__main__":
    main()
in thread func
in main func
# Using several threads
import threading
import time
g_value = 1
# Thread function
def thread_func(*args, **kwargs):
    global g_value
    g_value += 1
    # Sleep for 1 second
    time.sleep(1)
    # Get the thread ID
    ident = threading.get_ident()
    # Get the current thread object
    t = threading.current_thread()
    # Print the thread name and ident (t.name replaces the deprecated getName())
    print("name:%s ident:%d"%(t.name, t.ident))
def main():
    thread_num = 5
    thread_list = []
    # Create and start the threads
    for i in range(thread_num):
        name = "thread_%d"%i
        t = threading.Thread(name=name, target=thread_func, args=())
        thread_list.append(t)
        t.start()
    # Wait for the threads to finish
    for t in thread_list:
        t.join()
if __name__ == "__main__":
    main()
    print("g_value:", g_value)
name:thread_0 ident:16832
name:thread_4 ident:16416
name:thread_2 ident:3236
name:thread_1 ident:17212
name:thread_3 ident:10840
g_value: 6
The output shows that:
Requirements:
from threading import Thread
g_value = 10000
nums = 500000
def sub_func():
    # Subtract 1 repeatedly
    global g_value
    for i in range(nums):
        g_value -= 1
def add_func():
    # Add 1 repeatedly
    global g_value
    for i in range(nums):
        g_value += 1
if __name__ == "__main__":
    # Create the thread object
    t = Thread(target=sub_func, name='test')
    # Start the thread
    t.start()
    add_func()
    # Wait for the thread to finish
    t.join()
    print(f'g_value={g_value}')
g_value=10000
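The run above happened to end with the starting value, but g_value += 1 is a read-modify-write sequence, and unsynchronized updates from two threads are not guaranteed to be safe. A common way to make the result deterministic is to guard the update with a threading.Lock. The following is a minimal sketch; counter, change, and the iteration count are chosen for this example.

```python
import threading

counter = 0
lock = threading.Lock()

def change(delta, times):
    global counter
    for _ in range(times):
        with lock:            # only one thread at a time may update counter
            counter += delta

workers = [
    threading.Thread(target=change, args=(+1, 100_000)),
    threading.Thread(target=change, args=(-1, 100_000)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter)   # 0
```

Lock is itself a context manager, so the with statement acquires it before the update and releases it afterwards, even if the update raised an exception.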