Python Multiprocess Concurrency and Synchronization, Explained in Detail
1. What Is Multiprocess Concurrency
Multiprocess concurrency means that multiple processes can execute during the same period of time. In an operating system, a process is an independent unit of execution with its own memory space and system resources, so running work across several processes can raise a program's execution efficiency and degree of concurrency. Python's multiprocessing module provides this capability.
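To make this concrete, here is a minimal sketch (the function and worker names are purely illustrative) in which each child reports its own PID via os.getpid(), showing that every process is a separate execution unit:

import multiprocessing
import os

def report(tag):
    # every process gets its own PID and its own memory space
    print(tag, 'is running in process', os.getpid())

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=report, args=('worker-%d' % i,)) for i in range(3)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()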
2. An Overview of the multiprocessing Module
- The multiprocessing module makes it easy to create and manage multiple processes in Python
- Its Process class represents a single child process and is used to create and manage child processes
- It also provides tools for communication and synchronization between processes, such as Queue, Pipe, Value, and Lock (Queue, Pipe, and Lock appear in section 3; a short Value sketch follows this list)
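Since Value is not demonstrated later, here is a minimal sketch of sharing state through it (the counter setup and worker count are just illustrative). Value carries its own lock, returned by get_lock(), which matters because counter.value += 1 is not atomic:

import multiprocessing

def increment(counter):
    for _ in range(1000):
        with counter.get_lock():  # Value creates this lock by default
            counter.value += 1

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)  # 'i': a shared signed int, initially 0
    workers = [multiprocessing.Process(target=increment, args=(counter,)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)  # 4000: the lock prevents lost updates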
3. Using the multiprocessing Module
3.1 Creating Child Processes
- Create a process instance with the Process constructor and pass it the function to run
import multiprocessing

def worker():
    print("Child process running")

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()  # wait for the child to finish
- Alternatively, subclass Process and override its run() method to define a custom process
import multiprocessing

class Worker(multiprocessing.Process):
    def run(self):
        # run() executes in the child process once start() is called
        print("Child process running")

if __name__ == '__main__':
    p = Worker()
    p.start()
    p.join()
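If the subclass needs per-process data, pass it through the constructor and call the parent's __init__ before storing it. A small sketch (Greeter and label are hypothetical names):

import multiprocessing

class Greeter(multiprocessing.Process):
    def __init__(self, label):
        super().__init__()  # let Process do its own setup first
        self.label = label

    def run(self):
        print('Hello from', self.label)

if __name__ == '__main__':
    p = Greeter('child-1')
    p.start()
    p.join()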
3.2 Inter-Process Communication
- Use a Queue to share data among multiple processes
import multiprocessing

def producer(queue):
    queue.put('a')

def consumer(queue):
    print(queue.get())

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
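A real producer usually sends many items, and the consumer needs to know when to stop. A common pattern, sketched here with None as an assumed end-of-stream marker:

import multiprocessing

def producer(queue):
    for item in ['a', 'b', 'c']:
        queue.put(item)
    queue.put(None)  # sentinel: signals that nothing more is coming

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:  # stop when the sentinel arrives
            break
        print(item)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()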
- Use a Pipe to pass data between two processes
import multiprocessing

def sender(conn):
    conn.send('a')

def receiver(conn):
    print(conn.recv())

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=sender, args=(child_conn,))
    p2 = multiprocessing.Process(target=receiver, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
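A Pipe is duplex by default, so either Connection object can both send and receive. A minimal request/reply sketch (the message contents are arbitrary):

import multiprocessing

def echo(conn):
    msg = conn.recv()        # wait for the parent's message
    conn.send(msg.upper())   # reply over the same connection

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    p = multiprocessing.Process(target=echo, args=(child_conn,))
    p.start()
    parent_conn.send('ping')
    print(parent_conn.recv())  # PING
    p.join()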
3.3 Inter-Process Synchronization
Use a Lock to synchronize multiple processes; each worker below must acquire the shared lock before entering its critical section, so only one of them runs it at a time
import multiprocessing

def worker(lock, n):
    lock.acquire()  # blocks until no other process holds the lock
    print('Child process', n, 'running')
    lock.release()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = []
    for i in range(3):
        p = multiprocessing.Process(target=worker, args=(lock, i))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
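Lock also supports the context-manager protocol, so the acquire()/release() pair above can be written as with lock:, which guarantees the release even if the protected code raises an exception.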
4. Multiprocess Concurrency Examples
4.1 Downloading Images with multiprocessing
import requests
import multiprocessing

def download(url):
    filename = url.split('/')[-1]  # use the last path segment as the file name
    response = requests.get(url)
    with open(filename, 'wb') as f:
        f.write(response.content)

if __name__ == '__main__':
    urls = [
        'https://picsum.photos/500/500',
        'https://picsum.photos/600/600',
        'https://picsum.photos/700/700',
        'https://picsum.photos/800/800',
    ]
    # start one process per URL, then wait for all of them to finish
    processes = []
    for url in urls:
        p = multiprocessing.Process(target=download, args=(url,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
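One process per URL is fine for four downloads, but for a long list it is better to cap the number of workers. A sketch of the same task with a fixed-size pool (four workers is an arbitrary choice):

import requests
import multiprocessing

def download(url):
    filename = url.split('/')[-1]  # last path segment as the file name
    response = requests.get(url)
    with open(filename, 'wb') as f:
        f.write(response.content)

if __name__ == '__main__':
    urls = ['https://picsum.photos/%d/%d' % (s, s) for s in (500, 600, 700, 800)]
    with multiprocessing.Pool(processes=4) as pool:  # at most four concurrent downloads
        pool.map(download, urls)  # blocks until every download finishes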
4.2 A Concurrent Crawler with multiprocessing
import requests
import multiprocessing
from bs4 import BeautifulSoup

def get_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    links = [link.get('href', '') for link in soup.find_all('a')]
    # keep only absolute URLs; relative hrefs cannot be fetched directly
    return [link for link in links if link.startswith('http')]

def crawl_links(links):
    for link in links:
        try:
            response = requests.get(link, timeout=10)
        except requests.RequestException:
            continue  # skip links that fail to load
        soup = BeautifulSoup(response.text, 'html.parser')
        title = soup.find('title')
        print(title.text if title else link)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.bilibili.com',
        'https://www.jd.com',
        'https://www.douban.com',
    ]
    pool = multiprocessing.Pool(processes=4)
    # fetch the link lists concurrently (a blocking apply() would run them
    # one by one), then hand each list to a worker to crawl
    results = [pool.apply_async(get_links, args=(url,)) for url in urls]
    for result in results:
        pool.apply_async(crawl_links, args=(result.get(),))
    pool.close()
    pool.join()
That concludes this detailed look at Python's multiprocess concurrency and synchronization mechanisms; I hope you find it helpful.