Using Zhipu Qingyan (智谱清言) to write Python code that fetches content from a simple ECUPL web page

2024-05-10 17:22
python, front end, html, programming languages

First, ask the model:

Use Python to scrape the content of https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm

This returned the following code, which runs as-is:

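import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests (the timeout stops the script hanging on a stalled connection)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (for example, the title and the paragraphs)
    title_tag = soup.find('title')
    title = title_tag.get_text() if title_tag else ''  # find() returns None if there is no <title>
    paragraphs = soup.find_all('p')

    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')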
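To see what the title and paragraph extraction does without touching the network, here is a minimal sketch that performs the same extraction on an inline HTML string. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs with no extra packages; the class name `TitleAndParagraphs` and the sample HTML are made up for illustration and are not part of the generated code.

```python
from html.parser import HTMLParser


class TitleAndParagraphs(HTMLParser):
    """Collect the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            # Each <p> starts a new, initially empty, paragraph.
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here,
        # so it is appended to the paragraph that is currently open.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


# Hypothetical sample page, standing in for response.text
sample_html = (
    "<html><head><title>Sample notice</title></head>"
    "<body><p>First <b>paragraph</b>.</p>"
    "<div>skipped</div><p>第二段</p></body></html>"
)

parser = TitleAndParagraphs()
parser.feed(sample_html)
parser.close()

print(parser.title)
for text in parser.paragraphs:
    print(text)
```

Only text inside `<title>` and `<p>` is kept, which is why the `<div>` content does not appear in the output. The BeautifulSoup version behaves the same way for this input.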

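However, the output contains garbled characters. After telling Zhipu Qingyan that the page needs to be read as UTF-8, it regenerated the code: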

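import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests (the timeout stops the script hanging on a stalled connection)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Inspect the encoding that requests inferred from the response headers.
    # Compare case-insensitively: servers may report 'UTF-8' or 'utf-8'.
    encoding = (response.encoding or '').lower()
    if encoding != 'utf-8':
        # If the encoding is not UTF-8, decode the page with another codec
        response.encoding = 'gbk'  # assumption: the page is GBK-encoded

    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (for example, the title and the paragraphs)
    title_tag = soup.find('title')
    title = title_tag.get_text() if title_tag else ''  # find() returns None if there is no <title>
    paragraphs = soup.find_all('p')

    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')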

That fixed it completely!
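Why was the first output garbled? When a response does not declare a charset, requests falls back to ISO-8859-1 for text pages, and decoding Chinese bytes with that codec never raises an error; it simply yields the wrong characters. The sketch below reproduces the effect with the standard library only and shows one possible fallback strategy. The sample string, the helper name `decode_body`, and the candidate codec list are assumptions for illustration; the actual encoding of the ECUPL page is not confirmed here.

```python
def decode_body(raw, candidates=("utf-8", "gbk")):
    """Return (text, codec) for the first candidate that decodes raw cleanly.

    UTF-8 is tried first because it is strict: most GBK byte sequences are
    not valid UTF-8, whereas GBK accepts many byte sequences it should not.
    """
    for codec in candidates:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but make undecodable bytes visible.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


original = "信息公开"                    # hypothetical sample text
utf8_bytes = original.encode("utf-8")
gbk_bytes = original.encode("gbk")

# Wrong codec, no exception: this is how garbled output appears.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled == original)               # False

# The same bytes decode correctly once the right codec is chosen.
print(decode_body(utf8_bytes))
print(decode_body(gbk_bytes))
```

In the scraper, a helper like this would be applied to `response.content` (the raw bytes) rather than to `response.text`. requests also exposes `response.apparent_encoding`, which guesses the codec from the body and can be assigned to `response.encoding` instead of hard-coding `'gbk'`.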

Next, the output needs to be saved to a txt file, so we ask:

This returned the following code:

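import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'

# Fetch the page with requests (the timeout stops the script hanging on a stalled connection)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Inspect the encoding that requests inferred from the response headers.
    # Compare case-insensitively: servers may report 'UTF-8' or 'utf-8'.
    encoding = (response.encoding or '').lower()
    if encoding != 'utf-8':
        # If the encoding is not UTF-8, decode the page with another codec
        response.encoding = 'gbk'  # assumption: the page is GBK-encoded

    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the content of interest (for example, the title and the paragraphs)
    title_tag = soup.find('title')
    title = title_tag.get_text() if title_tag else ''  # find() returns None if there is no <title>
    paragraphs = soup.find_all('p')

    # Build the text to write: the title, a blank line, then one paragraph per line
    text_content = f"{title}\n\n"
    for p in paragraphs:
        text_content += p.get_text() + '\n'

    # Save as a .txt file; the explicit UTF-8 encoding keeps the Chinese text intact
    file_path = 'output.txt'
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text_content)

    print(f'Content saved to {file_path}.')
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')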
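The saving step can be checked on its own, without fetching anything. The following is a minimal sketch that writes a title and paragraphs in the same layout as the script above and reads the file back to confirm that the Chinese text survives the round trip. The function name `save_text` and the sample strings are made up for illustration, and a temporary directory is used so no real `output.txt` is overwritten.

```python
import tempfile
from pathlib import Path


def save_text(title, paragraphs, file_path):
    """Write the title, a blank line, then one paragraph per line, as UTF-8."""
    lines = [title, ""] + list(paragraphs)
    text = "\n".join(lines) + "\n"
    # Passing encoding explicitly avoids depending on the platform default,
    # which on some systems is not UTF-8.
    Path(file_path).write_text(text, encoding="utf-8")
    return text


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "output.txt"
    written = save_text("示例标题", ["第一段", "第二段"], out_path)
    read_back = out_path.read_text(encoding="utf-8")

print(read_back == written)
```

Reading the file back with the same `encoding='utf-8'` matters as much as writing it that way: opening the file with a different codec would bring back the garbled text seen earlier.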

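Result:

Tell the model that its code meets your requirements and ask it for a detailed analysis. Then ask about whichever part you do not understand!

For example: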