Scraping a Novel_BeautifulSoup Parsing_easy

杨晓东

Published: October 2, 2021


小说_五术传人.txt

# 小说_五术传人.txt: a test/demo script written for this one novel

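import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    # Download one chapter page and parse it; the text is decoded as UTF-8
    req = requests.get(url, timeout=10)
    req.encoding = 'utf-8'
    return BeautifulSoup(req.text, "html.parser")


def get_html():
    # First chapter of the book
    url = 'https://www.yeshuyuan.com/read/53887/17033995.html'
    with open('./五术传人.txt', 'a', encoding='utf-8') as fp:
        # The full book has 710 chapters; the loop walks at most 712 pages
        for i in range(712):
            soup = fetch_soup(url)
            # Find the tag nodes that hold the chapter title and the chapter body
            table = soup.find("li", class_="active")
            body = soup.find('div', class_="readcontent")
            if table is None or body is None:
                # Not a chapter page (or the layout changed): stop instead of crashing
                break
            name = table.text
            content = body.text
            print(name)
            print(content)
            # Append the title and the body to the txt file
            fp.write('\n' + name + '\n')
            fp.write('\n')
            fp.write(content + '\n')
            print('Written: ', name)
            # Follow the "next chapter" link; stop when there is none
            link = soup.find('a', id="linkNext")
            if link is None or not link.get('href'):
                break
            url = link.get('href')


if __name__ == '__main__':
    get_html()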
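
The script above passes the `href` of the `linkNext` anchor straight to `requests.get`, which only works while the site emits absolute chapter URLs. A minimal sketch of a more defensive step follows. It assumes (this is not confirmed by the post) that chapter pages end in `.html`, stay on the same host, and that the last chapter's link points somewhere else, such as the book's index. The helper name `resolve_next_url` and the second chapter number in the usage note are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_next_url(current_url, href):
    """Return the absolute URL of the next chapter, or None to stop crawling.

    Assumptions (hypothetical, not taken from the site): chapter pages
    end in ".html" and live on the same host as the current page.
    """
    if not href:
        return None  # the anchor had no href attribute
    # Resolve relative links such as "/read/53887/x.html" or "x.html"
    absolute = urljoin(current_url, href.strip())
    cur, nxt = urlparse(current_url), urlparse(absolute)
    if nxt.scheme not in ("http", "https"):
        return None  # e.g. "javascript:;" placeholders
    if nxt.netloc != cur.netloc:
        return None  # the link leaves the site
    if not nxt.path.endswith(".html"):
        return None  # probably the index page, not a chapter
    if absolute == current_url:
        return None  # the link points back at the same page
    return absolute
```

Inside the loop, the value returned by `soup.find('a', id="linkNext").get('href')` could be passed through this helper together with the URL of the page just fetched, and the loop could `break` as soon as it returns `None`. For example, `resolve_next_url('https://www.yeshuyuan.com/read/53887/17033995.html', '17033996.html')` resolves the relative link against the current chapter's directory.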


Updated: December 9, 2021

