Published: 2025-03-10 09:30:26 | Editor: 123 | Views: 1270
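asyncio runs asynchronous tasks on a single-threaded event loop. Because a task gives up control whenever it waits on I/O, one thread can keep a large number of network connections in flight, which makes asyncio especially efficient for highly concurrent network workloads. It is built around Python's async/await syntax, which makes asynchronous code more intuitive and easier to maintain than the older callback style.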
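To make the event-loop behaviour concrete, here is a minimal sketch that uses only the standard library. The names `fake_request` and `run_demo` are hypothetical, and `asyncio.sleep` stands in for real network I/O, so this illustrates the idea rather than any actual request.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for network I/O; awaiting it hands control
    # back to the event loop so other tasks can run in the meantime.
    await asyncio.sleep(delay)
    return name


async def run_demo():
    start = time.perf_counter()
    # gather schedules all three coroutines on the same loop and returns
    # their results in the order the coroutines were passed in.
    results = await asyncio.gather(
        fake_request('a', 0.2),
        fake_request('b', 0.2),
        fake_request('c', 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == '__main__':
    results, elapsed = asyncio.run(run_demo())
    print(results, f'{elapsed:.2f}s')
```

Since the three waits overlap on one thread, the total run time should be close to the longest single delay rather than the sum of all three.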
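asyncio provides the core framework and tools for asynchronous programming in Python, but as requirements grew, developers needed more than that to build asynchronous services. This led to asynchronous networking libraries such as aiohttp, which is built on top of asyncio and provides a simple asynchronous HTTP client and server implementation.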
Target site: https://spa5.scrape.center/
Example code:
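import asyncio
import json

import aiohttp

# Target site: https://spa5.scrape.center/
index_url = 'https://spa5.scrape.center/api/book/?limit=18&offset={offset}'  # list page API
detail_url = 'https://spa5.scrape.center/api/book/{id}'  # book detail API
page_size = 18  # items per page
concurrency = 10  # maximum number of concurrent requests
page_number = 20  # pages to crawl (the site actually has 503)
semaphore = None  # created in main() so it belongs to the running event loop
session = None  # shared aiohttp.ClientSession, created in main()
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"}


async def scrape_api(url):
    """Fetch a URL and return the decoded JSON, or None if the request fails."""
    async with semaphore:
        try:
            async with session.get(url=url, headers=headers) as response:
                response.raise_for_status()  # treat HTTP error statuses as failures
                data = await response.json()
            # Brief pause while still holding the semaphore, to throttle requests.
            await asyncio.sleep(0.5)
            return data
        except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
            print(f'error {url}: {exc}')
            return None


async def scrape_index(page):
    """
    Fetch one page of the book list.
    :param page: 1-based page number
    :return: decoded JSON for that page, or None on failure
    """
    url = index_url.format(offset=page_size * (page - 1))
    return await scrape_api(url)


async def scrape_detail(book_id):
    """
    Fetch the detail record for one book.
    :param book_id: book id taken from the list page
    :return: decoded JSON for the book, or None on failure
    """
    url = detail_url.format(id=book_id)
    return await scrape_api(url)


async def main():
    global session, semaphore
    semaphore = asyncio.Semaphore(concurrency)
    session = aiohttp.ClientSession()
    try:
        # Step 1: fetch all list pages concurrently.
        index_tasks = [scrape_index(page) for page in range(1, page_number + 1)]
        results = await asyncio.gather(*index_tasks)
        # Step 2: collect book ids, skipping pages that failed.
        ids = []
        for index_data in results:
            if not index_data:
                continue
            for item in index_data.get('results') or []:
                ids.append(item.get('id'))
        # Step 3: fetch all detail records concurrently.
        detail_tasks = [scrape_detail(book_id) for book_id in ids]
        ids_results = await asyncio.gather(*detail_tasks)
    finally:
        # Close the session even if a request raised unexpectedly.
        await session.close()
    # Append the results as one JSON document per line.
    with open('data.json', 'a', encoding='utf-8') as file:
        file.write(json.dumps(ids_results, ensure_ascii=False))
        file.write("\n")


if __name__ == '__main__':
    asyncio.run(main())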
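The crawler relies on asyncio.Semaphore to cap the number of requests in flight. The sketch below isolates that one idea without touching the network. The names `worker` and `measure_peak` are hypothetical, and `asyncio.sleep` is only a stand-in for an HTTP request.

```python
import asyncio


async def worker(semaphore, state):
    # Only `limit` workers can be inside this block at the same time;
    # the rest wait at `async with` until a slot is released.
    async with semaphore:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)  # stand-in for an HTTP request
        state['active'] -= 1


async def measure_peak(limit, jobs):
    # Create the semaphore inside the coroutine so it is tied to the
    # event loop that is actually running.
    semaphore = asyncio.Semaphore(limit)
    state = {'active': 0, 'peak': 0}
    await asyncio.gather(*(worker(semaphore, state) for _ in range(jobs)))
    return state['peak']


if __name__ == '__main__':
    print(asyncio.run(measure_peak(limit=10, jobs=50)))
```

All 50 jobs are scheduled at once, yet the recorded peak never exceeds the limit. The same mechanism keeps the crawler from opening more simultaneous requests than the configured concurrency.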