Personal Tech Sharing

First, I asked:

Use Python to scrape the content of https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm

This produced the following code, which runs as-is:

import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (e.g. the title and paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')

    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')

The output came out garbled, though. I told Zhipu Qingyan that UTF-8 encoding was needed, and it regenerated the code:

import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Inspect the encoding that requests inferred from the response headers
    encoding = response.encoding
    # Compare case-insensitively; encoding may be None or 'UTF-8'
    if not encoding or encoding.lower() != 'utf-8':
        # If the encoding is not UTF-8, decode the page with another encoding
        response.encoding = 'gbk'  # Assumes the page is GBK-encoded

    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (e.g. the title and paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')

    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')

That settled it completely!
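
Garbled output like this usually means the page bytes were decoded with the wrong charset, and hard-coding GBK only helps when the page really is GBK. As a minimal sketch (not what Zhipu Qingyan generated), the hypothetical helper `decode_html` below tries a strict UTF-8 decode first and falls back to GBK. The function name and the candidate list are assumptions made for illustration.

```python
def decode_html(raw: bytes, candidates=("utf-8", "gbk")) -> str:
    """Decode raw page bytes by trying each candidate encoding in order."""
    for enc in candidates:
        try:
            # Strict decoding raises on invalid byte sequences, so a
            # mismatched encoding is usually rejected instead of
            # silently producing garbled text.
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

With requests, this would be fed the raw bytes rather than the already-decoded text, for example `soup = BeautifulSoup(decode_html(response.content), 'html.parser')`.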

Next, the content needed to be stored in a .txt file, so I asked:

This produced the following code:

import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Inspect the encoding that requests inferred from the response headers
    encoding = response.encoding
    # Compare case-insensitively; encoding may be None or 'UTF-8'
    if not encoding or encoding.lower() != 'utf-8':
        # If the encoding is not UTF-8, decode the page with another encoding
        response.encoding = 'gbk'  # Assumes the page is GBK-encoded

    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (e.g. the title and paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')

    # Build the text to write
    text_content = f"{title}\n\n"
    for p in paragraphs:
        text_content += p.get_text() + '\n'

    # Save as a .txt file
    file_path = 'output.txt'
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text_content)

    print(f'Content saved to {file_path}.')
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
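
The file-writing step can also be pulled out into a small reusable function. The sketch below is one possible way to do that, not part of the generated answer: `save_text` is a hypothetical helper name, and skipping empty paragraphs is an added assumption. It builds the text with a single join instead of repeated string concatenation.

```python
from pathlib import Path
from typing import Iterable


def save_text(title: str, paragraphs: Iterable[str], file_path: str = "output.txt") -> Path:
    """Write the title and paragraphs to a UTF-8 text file and return its path."""
    # Strip surrounding whitespace and drop paragraphs that are empty.
    body = [p.strip() for p in paragraphs if p.strip()]
    # Title, a blank line, then one paragraph per line.
    text_content = "\n".join([title, "", *body]) + "\n"
    path = Path(file_path)
    path.write_text(text_content, encoding="utf-8")
    return path
```

In the script above it could be called as `save_text(title, (p.get_text() for p in paragraphs))`.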

Running the generated script gave this result:

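Then tell the model that its code meets your requirements and ask it for a detailed walkthrough. Whatever you don't understand, just ask about it!

For example: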