This article shows how to scrape comic images with Python. It is meant as a practical reference; hopefully you will find something useful in it.
Development environment:
Python 3.6
PyCharm
Target URL
https://www.dmzj.com/info/yaoshenji.html
Code
Import the required libraries
```python
import requests
import os
import re
from bs4 import BeautifulSoup
from contextlib import closing
from tqdm import tqdm
import time
```
Get the chapter links and chapter names
```python
r = requests.get(url=target_url)
bs = BeautifulSoup(r.text, 'lxml')
list_con_li = bs.find('ul', class_="list_con_li")
cartoon_list = list_con_li.find_all('a')
chapter_names = []
chapter_urls = []
for cartoon in cartoon_list:
    href = cartoon.get('href')
    name = cartoon.text
    # the page lists newest chapters first; insert at index 0 to restore reading order
    chapter_names.insert(0, name)
    chapter_urls.insert(0, href)
print(chapter_urls)
```
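The `find`/`find_all` calls above can be exercised offline on a tiny inline HTML sample. The markup below is a made-up miniature of the chapter list, only to show how the links are collected and reversed:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the dmzj chapter list page
html = '''
<ul class="list_con_li">
  <li><a href="/chap2">Chapter 2</a></li>
  <li><a href="/chap1">Chapter 1</a></li>
</ul>
'''
bs = BeautifulSoup(html, 'html.parser')
links = bs.find('ul', class_='list_con_li').find_all('a')
urls = []
for a in links:
    urls.insert(0, a.get('href'))  # reverse so the oldest chapter comes first
print(urls)  # ['/chap1', '/chap2']
```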
Download the comics
```python
for i, url in enumerate(tqdm(chapter_urls)):
    print(i, url)
    download_header = {
        'Referer': url,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    name = chapter_names[i]
    # strip '.' so the chapter name is a valid directory name
    name = name.replace('.', '')
    chapter_save_dir = os.path.join(save_dir, name)
    if name not in os.listdir(save_dir):
        os.mkdir(chapter_save_dir)
    r = requests.get(url=url)
    html = BeautifulSoup(r.text, 'lxml')
    script_info = html.script
    # the image IDs are 13- or 14-digit numbers embedded in the page's script tag
    pics = re.findall(r'\d{13,14}', str(script_info))
    for j, pic in enumerate(pics):
        if len(pic) == 13:
            pics[j] = pic + '0'  # pad 13-digit IDs so numeric sorting works
    pics = sorted(pics, key=lambda x: int(x))
    # the two path segments of the image URL are also embedded in the script
    chapterpic_hou = re.findall(r'\|(\d{5})\|', str(script_info))[0]
    chapterpic_qian = re.findall(r'\|(\d{4})\|', str(script_info))[0]
    for idx, pic in enumerate(pics):
        if pic[-1] == '0':
            # a padded ID: drop the trailing '0' to recover the original name
            pic_url = 'https://images.dmzj.com/img/chapterpic/' + chapterpic_qian + '/' + chapterpic_hou + '/' + pic[:-1] + '.jpg'
        else:
            pic_url = 'https://images.dmzj.com/img/chapterpic/' + chapterpic_qian + '/' + chapterpic_hou + '/' + pic + '.jpg'
        pic_name = '%03d.jpg' % (idx + 1)
        pic_save_path = os.path.join(chapter_save_dir, pic_name)
        print(pic_url)
        response = requests.get(pic_url, headers=download_header)
        # alternatively, stream the download with closing(requests.get(..., stream=True))
        # and write it chunk by chunk via response.iter_content()
        if response.status_code == 200:
            with open(pic_save_path, "wb") as file:
                file.write(response.content)
        else:
            print('Bad link:', pic_url)
        time.sleep(2)
```
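The pad-and-sort trick above can be seen in isolation. The sample IDs below are made up; the point is only that 13-digit IDs get a trailing '0' so all IDs compare correctly as integers, and the padding is stripped again when the URL is built:

```python
# Hypothetical image IDs, like the 13/14-digit numbers found in the page script
pics = ['15934785691234', '1593478569000']

# pad 13-digit IDs with a trailing '0' so they sort numerically with 14-digit ones
pics = [p + '0' if len(p) == 13 else p for p in pics]
pics.sort(key=int)
print(pics)  # ['15934785690000', '15934785691234']

# when building the image URL, a padded ID has its trailing '0' stripped again
original = [p[:-1] if p.endswith('0') else p for p in pics]
print(original)  # ['1593478569000', '15934785691234']
```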
Create the save directory

Note that this snippet has to run before the two snippets above, since they refer to `save_dir` and `target_url`:

```python
save_dir = '妖神记'
if save_dir not in os.listdir('./'):
    os.mkdir(save_dir)
target_url = "https://www.dmzj.com/info/yaoshenji.html"
```
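As a side note, `os.makedirs` with `exist_ok=True` expresses the same check more robustly, since it does not need to list the current directory first. A small sketch with a hypothetical directory name:

```python
import os

save_dir = 'demo_save_dir'  # hypothetical name for illustration
os.makedirs(save_dir, exist_ok=True)  # no error if the directory already exists
print(os.path.isdir(save_dir))  # True
```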