线程智能，进程硬核：掌握Python并行编程手册

Python经常被指责"太慢"。虽然Python在原始计算方面确实不如C或Rust快，但使用正确的技术，你可以显著加速你的Python代码——特别是当你处理I/O密集型工作负载时。

在本文中，我们将深入探讨：

何时以及如何在Python中使用threading

它与multiprocessing的区别

如何识别I/O绑定和CPU绑定工作负载

可以提升应用性能的实用示例

让我们开始穿针引线。

🧠 理解I/O绑定 vs CPU绑定

在选择线程还是多进程之前，你必须理解你要优化的任务类型：

类型

描述

示例

最佳工具

I/O绑定

大部分时间在等待外部资源

网络爬虫、文件下载

threading, asyncio

CPU绑定

大部分时间在执行重计算

图像处理、机器学习推理

multiprocessing

💡 经验法则：

如果你的程序因为等待而变慢，使用线程。

如果它因为计算而变慢，使用进程。

🧵 在Python中使用线程

Python的全局解释器锁（GIL）限制了CPU绑定线程的真正并行性，但对于I/O绑定任务，threading可以带来巨大的速度提升。

示例：I/O绑定任务的线程化

import threading

import requests

import time

urls = [

'https://example.com',

'https://httpbin.org/delay/2',

'https://httpbin.org/get'

]

def fetch(url):

print(f"正在获取 {url}")

response = requests.get(url)

print(f"完成: {url} - 状态码 {response.status_code}")

start = time.time()

threads = []

for url in urls:

t = threading.Thread(target=fetch, args=(url,))

threads.append(t)

t.start()

for t in threads:

t.join()

print(f"总时间: {time.time() - start:.2f} 秒")

🕒 没有线程，这将需要约6秒（每个请求2秒）。

使用线程，它在约2秒内运行，显示出真正的加速。

💡 线程注意事项

线程共享内存 → 可能出现竞态条件。

使用threading.Lock()避免共享资源冲突。

适合I/O，但对CPU密集型工作无效。

🧮 CPU绑定任务的多进程

对于CPU密集型工作负载，GIL成为瓶颈。这就是multiprocessing模块发挥作用的地方。它生成单独的进程，每个进程都有自己的Python解释器。

示例：使用多进程的CPU绑定任务

from multiprocessing import Process, cpu_count

import math

import time

def compute():

print(f"进程启动")

for _ in range(10**6):

math.sqrt(12345.6789)

if __name__ == "__main__":

start = time.time()

processes = []

for _ in range(cpu_count()):

p = Process(target=compute)

processes.append(p)

p.start()

for p in processes:

p.join()

print(f"总时间: {time.time() - start:.2f} 秒")

在这里，我们将工作分配到所有可用的CPU核心——对计算密集型任务来说是一个巨大的提升。

🔍 如何判断任务是CPU绑定还是I/O绑定

使用分析工具或观察：

视觉检查

等待API调用、文件读取 → I/O绑定

数学循环、数据处理 → CPU绑定

使用分析工具

pip install line_profiler

kernprof -l script.py

python -m line_profiler script.py.lprof

或使用cProfile：

python -m cProfile myscript.py

检查时间花在哪里：I/O调用还是计算。

🧰 奖励：concurrent.futures用于清洁代码

不要手动管理线程或进程，使用：

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

I/O的ThreadPool：

with ThreadPoolExecutor(max_workers=5) as executor:

executor.map(fetch, urls)

CPU的ProcessPool：

with ProcessPoolExecutor() as executor:

executor.map(compute, range(cpu_count()))

✅ 最终思考

Python本身并不慢——它只是需要正确的工具。

任务类型 | 使用这个

---------|----------

I/O绑定 | threading, asyncio, ThreadPoolExecutor

CPU绑定 | multiprocessing, ProcessPoolExecutor

从小开始，分析你的代码，选择正确的并行化策略。你的应用——和你的用户——会感谢你。

实际应用示例

让我们看一些更实际的例子来展示这些概念：

示例1：并行文件处理

import os

import threading

from concurrent.futures import ThreadPoolExecutor

import time

def process_file(filename):

"""模拟文件处理"""

print(f"处理文件: {filename}")

time.sleep(1) # 模拟I/O操作

return f"已处理 {filename}"

def process_files_sequential(files):

"""顺序处理文件"""

results = []

for file in files:

results.append(process_file(file))

return results

def process_files_parallel(files):

"""并行处理文件"""

with ThreadPoolExecutor(max_workers=4) as executor:

results = list(executor.map(process_file, files))

return results

# 测试

files = [f"file_{i}.txt" for i in range(10)]

# 顺序处理

start = time.time()

process_files_sequential(files)

sequential_time = time.time() - start

# 并行处理

start = time.time()

process_files_parallel(files)

parallel_time = time.time() - start

print(f"顺序处理时间: {sequential_time:.2f}秒")

print(f"并行处理时间: {parallel_time:.2f}秒")

print(f"加速比: {sequential_time/parallel_time:.2f}x")

示例2：CPU密集型计算

from multiprocessing import Pool, cpu_count

import time

def heavy_computation(n):

"""模拟CPU密集型计算"""

result = 0

for i in range(n):

result += i ** 2

return result

def compute_sequential(numbers):

"""顺序计算"""

results = []

for num in numbers:

results.append(heavy_computation(num))

return results

def compute_parallel(numbers):

"""并行计算"""

with Pool(processes=cpu_count()) as pool:

results = pool.map(heavy_computation, numbers)

return results

# 测试

numbers = [1000000] * 8 # 8个相同的计算任务

# 顺序计算

start = time.time()

compute_sequential(numbers)

sequential_time = time.time() - start

# 并行计算

start = time.time()

compute_parallel(numbers)

parallel_time = time.time() - start

print(f"顺序计算时间: {sequential_time:.2f}秒")

print(f"并行计算时间: {parallel_time:.2f}秒")

print(f"加速比: {sequential_time/parallel_time:.2f}x")

最佳实践和注意事项

1. 选择合适的并行化策略

import asyncio

import aiohttp

import time

# 对于I/O密集型任务，asyncio可能是更好的选择

async def fetch_url(session, url):

async with session.get(url) as response:

return await response.text()

async def fetch_all_urls(urls):

async with aiohttp.ClientSession() as session:

tasks = [fetch_url(session, url) for url in urls]

return await asyncio.gather(*tasks)

# 使用示例

urls = ['https://httpbin.org/delay/1'] * 10

start = time.time()

asyncio.run(fetch_all_urls(urls))

print(f"异步获取时间: {time.time() - start:.2f}秒")

2. 避免常见的陷阱

import threading

import time

# 错误示例：共享状态没有保护

counter = 0

def increment_bad():

global counter

for _ in range(1000):

counter += 1

# 正确示例：使用锁保护共享状态

counter = 0

lock = threading.Lock()

def increment_good():

global counter

for _ in range(1000):

with lock:

counter += 1

# 测试

threads = []

for _ in range(10):

t = threading.Thread(target=increment_good)

threads.append(t)

t.start()

for t in threads:

t.join()

print(f"最终计数: {counter}")

3. 性能监控

import cProfile

import pstats

import io

def profile_function(func, *args, **kwargs):

"""分析函数性能"""

pr = cProfile.Profile()

pr.enable()

result = func(*args, **kwargs)

pr.disable()

s = io.StringIO()

ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')

ps.print_stats()

print(s.getvalue())

return result

# 使用示例

def example_function():

time.sleep(1)

return "完成"

profile_function(example_function)

总结

Python的并行编程并不复杂，但需要理解正确的工具和时机：

I/O绑定任务：使用threading或asyncio

CPU绑定任务：使用multiprocessing

简单场景：使用concurrent.futures

复杂场景：考虑asyncio或专门的并行库

记住，过早优化是万恶之源。首先确保你的代码是正确的，然后测量性能瓶颈，最后应用适当的并行化策略。

通过掌握这些技术，你可以显著提升Python应用的性能，特别是在处理I/O密集型或CPU密集型任务时。

本文翻译自DEV Community上的原创文章，旨在帮助中文开发者掌握Python并行编程技术。