First, ask:
Use Python to scrape the content of https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm
This produced the following code, which runs as-is:
import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'
# Fetch the page with requests
response = requests.get(url)
# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the content of interest (e.g. title, paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')
    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
However, the output was garbled. After telling Zhipu Qingyan (智谱清言) that UTF-8 encoding was needed, it regenerated the code:
import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'
# Fetch the page with requests
response = requests.get(url)
# Check whether the request succeeded
if response.status_code == 200:
    # Check the encoding attribute of the response
    encoding = response.encoding
    # Compare case-insensitively, since the header may say 'UTF-8'
    if not encoding or encoding.lower() != 'utf-8':
        # If the encoding is not UTF-8, decode the page with the correct charset
        response.encoding = 'gbk'  # assumes the page uses GBK
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the content of interest (e.g. title, paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')
    # Print the content
    print(title)
    for p in paragraphs:
        print(p.get_text())
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
That fixed it completely!
Next, the content needs to be saved to a txt file, so ask:
This produced the following code:
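A common cause of this kind of garbling is the decoding step, not the download: when the response headers name no charset for a text page, requests falls back to ISO-8859-1 for `response.text`. The sketch below illustrates that effect and one possible fallback order using only the standard library. `decode_html` and the sample string are hypothetical names introduced for illustration, and the "UTF-8 first, then GBK" order is an assumption, not something the generated code does.

```python
from typing import Optional


def decode_html(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes: declared charset first, then UTF-8, then GBK."""
    candidates = []
    # ISO-8859-1 is often just a header-less default, so it is not trusted here.
    if declared and declared.lower() not in ("iso-8859-1", "latin-1"):
        candidates.append(declared)
    candidates += ["utf-8", "gbk"]
    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # Last resort: keep going, but mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")


sample = "信息公开"  # made-up sample text, not taken from the page
utf8_bytes = sample.encode("utf-8")
gbk_bytes = sample.encode("gbk")

# Decoding UTF-8 bytes as ISO-8859-1 never raises; it silently garbles.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled == sample)

print(decode_html(utf8_bytes, "ISO-8859-1"))
print(decode_html(gbk_bytes))
```

With requests, this could be called as `decode_html(response.content, response.encoding)` and the result passed to BeautifulSoup in place of `response.text`. requests also provides `response.apparent_encoding`, a guess based on the response body, which can be assigned to `response.encoding` instead of hard-coding `'gbk'`.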
import requests
from bs4 import BeautifulSoup

# Target page URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'
# Fetch the page with requests
response = requests.get(url)
# Check whether the request succeeded
if response.status_code == 200:
    # Check the encoding attribute of the response
    encoding = response.encoding
    # Compare case-insensitively, since the header may say 'UTF-8'
    if not encoding or encoding.lower() != 'utf-8':
        # If the encoding is not UTF-8, decode the page with the correct charset
        response.encoding = 'gbk'  # assumes the page uses GBK
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the content of interest (e.g. title, paragraphs)
    title = soup.find('title').get_text()
    paragraphs = soup.find_all('p')
    # Build the text to write
    text_content = f"{title}\n\n"
    for p in paragraphs:
        text_content += p.get_text() + '\n'
    # Save it as a .txt file
    file_path = 'output.txt'
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text_content)
    print(f'Content saved to {file_path}.')
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
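The file-writing step in the code above can also be tried on its own, without any network access, to check what ends up on disk. The following is a minimal sketch under stated assumptions: `build_text`, `save_text`, and the sample strings are hypothetical, and skipping blank paragraphs is a small variation that the generated code does not do.

```python
import os
import tempfile


def build_text(title, paragraphs):
    """Join the title and the non-blank paragraphs, one paragraph per line."""
    lines = [p.strip() for p in paragraphs if p.strip()]
    return title.strip() + "\n\n" + "\n".join(lines) + "\n"


def save_text(text, file_path):
    """Write text as UTF-8 so Chinese characters survive the round trip."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(text)


# Made-up sample data standing in for the scraped title and <p> texts.
sample_title = "示例标题"
sample_paragraphs = ["第一段", "   ", "第二段"]

out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
save_text(build_text(sample_title, sample_paragraphs), out_path)

# Read the file back with the same encoding to confirm nothing was garbled.
with open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

In the scraping script, the paragraph list would come from `[p.get_text() for p in paragraphs]`. Passing `encoding='utf-8'` explicitly on both write and read matters because the default file encoding depends on the platform.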
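The result:
Then tell the model: your code meets my requirements, please explain it in detail. Ask about whatever part you don't understand!
For example: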