python 3.x Web Scraping Basics---Common Third-Party Libraries (requests, BeautifulSoup4, selenium, lxml)

Date: 2021-05-23 Category: Programmer's Life

python 3.x Web Scraping Basics

python 3.x Web Scraping Basics---HTTP Headers Explained

python 3.x Web Scraping Basics---Urllib Explained

python 3.x Web Scraping Basics---Common Third-Party Libraries

Preface

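The previous two chapters only covered the scraping tools built into Python. As everyone knows, Python has powerful third-party libraries, so today we will look at requests, BeautifulSoup4, selenium, and lxml. The regular-expression module re will also come up along the way.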

Requests

Reference documentation:

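Requests is a simple, easy-to-use HTTP library written in Python. Because it is a third-party library, remember to install it (for example with pip install requests) before importing it. You do not need to append query strings to URLs by hand or form-encode POST data yourself. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is built into Requests. Overall it is much more concise to use than urllib. The documentation above covers everything in detail, so if you want to learn it systematically, just read the docs; here I only give a brief overview.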
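To make the two conveniences above concrete, here is a minimal sketch that builds requests with requests.Request(...).prepare() instead of sending them, so it runs without network access. The URL https://example.com/search and the parameter names and values are hypothetical placeholders, not part of the original article.

```python
import requests

# Hypothetical endpoint: requests are only prepared here, never sent.
BASE_URL = "https://example.com/search"


def build_get(params):
    """Let requests append and percent-encode the query string for us."""
    return requests.Request("GET", BASE_URL, params=params).prepare()


def build_post(form):
    """Let requests form-encode the POST body (and set Content-Type) for us."""
    return requests.Request("POST", BASE_URL, data=form).prepare()


if __name__ == "__main__":
    get_req = build_get({"q": "python 爬虫", "page": 2})
    print(get_req.url)

    post_req = build_post({"user": "alice", "pwd": "secret"})
    print(post_req.headers["Content-Type"])
    print(post_req.body)
```

With params, requests builds the query string itself: a space becomes + and non-ASCII text is percent-encoded as UTF-8. With a dict passed as data, it produces a body such as user=alice&pwd=secret and sets the Content-Type header to application/x-www-form-urlencoded.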

Commonly used response attributes

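```python
import requests

# Put the URL of the page you want to fetch here.
response = requests.get('')

print('Page source as text')
print(response.text)
print('Body as raw bytes')
print(response.content)
print('Body parsed as JSON (raises an exception if it is not valid JSON)')
try:
    print(response.json())  # json() is a method, so it must be called
except ValueError:
    print('The response body is not valid JSON')
print('Status code')
print(response.status_code)
print('Final request URL')
print(response.url)
print('Response headers')
print(response.headers)
print('Cookies')
print(response.cookies)
```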
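The keep-alive and connection pooling mentioned earlier apply to requests sent through a requests.Session, which also carries headers and cookies from one request to the next. The sketch below assumes a hypothetical User-Agent string and a hypothetical cookie named token, and uses Session.prepare_request to show which headers the session would attach, again without touching the network.

```python
import requests


def make_session(user_agent):
    """Create a Session whose headers and cookies are reused by every request it sends."""
    session = requests.Session()
    # Session-level headers are merged into each request the session prepares.
    session.headers.update({"User-Agent": user_agent})
    # Hypothetical cookie stored on the session; it is attached to later requests.
    session.cookies.set("token", "abc123")
    return session


def preview(session, url):
    """Build (but do not send) the request this session would make."""
    return session.prepare_request(requests.Request("GET", url))


if __name__ == "__main__":
    s = make_session("my-spider/0.1")
    prepared = preview(s, "https://example.com/")
    print(prepared.headers["User-Agent"])
    print(prepared.headers["Cookie"])
    # In a real crawler you would call s.get(url) instead, so that the
    # underlying TCP connection is reused across requests to the same host.
```

Using one Session for a whole crawl also means that cookies set by the server (for example after a login) are kept in session.cookies and sent automatically on the following requests.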

When reposting, please credit the source: https://www.heiqu.com/wpgsdf.html

