《爬虫网络开发实战》 (Web Crawler Development in Practice)

Crawler basics

URL&&URI

Request methods: GET && POST

Responses

Using the basic libraries

urllib

urlopen (passing the data parameter)
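
A minimal sketch of what passing data does. It builds the request offline so it runs without network access. The URL http://httpbin.org/post and the form {'word': 'hello'} are only illustrative. urlopen(url, data=data) builds the same kind of Request internally. The data must be bytes, and supplying it turns the request into a POST.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form):
    # urlencode turns the dict into "key=value&..." text; urlopen needs bytes
    data = urllib.parse.urlencode(form).encode('utf-8')
    # a Request with data is sent as POST instead of GET
    return urllib.request.Request(url, data=data)


req = build_post_request('http://httpbin.org/post', {'word': 'hello'})
print(req.get_method())  # POST
print(req.data)          # b'word=hello'
# urllib.request.urlopen(req) would actually send it (requires network access)
```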

urlopen (setting a timeout)
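
A minimal sketch of handling a timeout. A local socket stands in for a slow server: it accepts connections but never replies, so the example runs offline. fetch_with_timeout is a hypothetical helper. It uses an opener from build_opener with ProxyHandler({}) so that proxy environment variables cannot redirect the local request. urlopen(url, timeout=...) accepts the same timeout argument. A timeout can arrive either wrapped in URLError or as a bare socket.timeout, depending on when it happens, so both are caught.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout):
    """Return the response body, or None if the request timed out."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # timeouts while connecting or sending are wrapped in URLError
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # a timeout while waiting for the response surfaces unwrapped
        return None


# a socket that completes connections via its listen backlog but never answers
silent = socket.socket()
silent.bind(('127.0.0.1', 0))
silent.listen(1)
port = silent.getsockname()[1]
result = fetch_with_timeout(f'http://127.0.0.1:{port}/', timeout=0.2)
print(result)  # None
silent.close()
```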

When a site requires a username and password, HTTPBasicAuthHandler can handle the authentication
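
A minimal sketch of wiring HTTPBasicAuthHandler into an opener. build_auth_opener is a hypothetical helper, and the URL http://localhost:5000/ with the credentials admin/admin is a placeholder. The sketch only builds the opener and checks that the password manager resolves the credentials, so it runs offline.

```python
import urllib.request


def build_auth_opener(url, username, password):
    # the "WithDefaultRealm" manager returns these credentials for any realm at this URL
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # an opener built with the handler answers 401 Basic challenges automatically
    return urllib.request.build_opener(auth_handler), password_mgr


opener, mgr = build_auth_opener('http://localhost:5000/', 'admin', 'admin')
print(mgr.find_user_password('any realm', 'http://localhost:5000/'))  # ('admin', 'admin')
# opener.open('http://localhost:5000/').read() would send them when challenged
```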

Proxy IPs: ProxyHandler
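
A minimal sketch of routing urllib traffic through a proxy with ProxyHandler. The proxy address 127.0.0.1:9743 is a placeholder that assumes a proxy is listening locally. The sketch only builds the opener, so it runs without a proxy.

```python
import urllib.request

# placeholder proxy address: replace with a proxy you actually run
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'http://127.0.0.1:9743',
}
proxy_handler = urllib.request.ProxyHandler(proxies)
# build_opener uses this handler instead of its default, environment-based ProxyHandler
opener = urllib.request.build_opener(proxy_handler)
# opener.open('https://www.baidu.com') would now go through the proxy
```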

Cookie
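
A minimal sketch of capturing cookies with CookieJar and HTTPCookieProcessor. A throwaway local http.server sets a hypothetical session_id cookie, so nothing leaves the machine. ProxyHandler({}) keeps the request local even if proxy environment variables are set.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SetCookieHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that answers every GET with a Set-Cookie header."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session_id=abc123; Path=/')
        self.send_header('Content-Length', '2')
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_cookies():
    server = HTTPServer(('127.0.0.1', 0), SetCookieHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.CookieJar()
        # HTTPCookieProcessor stores cookies from each response in `jar`
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),
            urllib.request.HTTPCookieProcessor(jar),
        )
        url = f'http://127.0.0.1:{server.server_port}/'
        with opener.open(url, timeout=5) as response:
            response.read()
        return {cookie.name: cookie.value for cookie in jar}
    finally:
        server.shutdown()
        server.server_close()


cookies = fetch_cookies()
print(cookies)  # {'session_id': 'abc123'}
```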

Parsing links: urlparse
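
A short example of how urlparse splits a URL into its six components. The sample URL is illustrative. The result is a named tuple, so each part is available by attribute or by index.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(result)
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
print(result.netloc, result[1])  # www.baidu.com www.baidu.com
```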

Joining URLs with urljoin
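
A short example of urljoin's resolution rules, using illustrative URLs. A relative link is resolved against the base URL. A link that carries its own scheme and host replaces the base entirely.

```python
from urllib.parse import urljoin

relative = urljoin('http://www.baidu.com', 'FAQ.html')
parent = urljoin('http://www.baidu.com/a/b.html', '../c.html')
absolute = urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html')
print(relative)  # http://www.baidu.com/FAQ.html
print(parent)    # http://www.baidu.com/c.html
print(absolute)  # https://cuiqingcai.com/FAQ.html
```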

urlencode serializes a dict of parameters into a URL query string
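
A short example of building a GET URL with urlencode. The parameter values are hypothetical. Non-string values such as 22 are converted to text, and pairs are joined with "&".

```python
from urllib.parse import urlencode

params = {'name': 'germey', 'age': 22}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)  # http://www.baidu.com?name=germey&age=22
```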

When a URL parameter contains Chinese text, convert it to URL (percent) encoding with quote
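
A short example of percent-encoding a Chinese keyword with quote and decoding it back with unquote. The search URL is illustrative. quote encodes the text as UTF-8 bytes, and each byte becomes a %XX escape.

```python
from urllib.parse import quote, unquote

keyword = '壁纸'  # "wallpaper"
url = 'https://www.baidu.com/s?wd=' + quote(keyword)
print(url)           # https://www.baidu.com/s?wd=%E5%A3%81%E7%BA%B8
print(unquote(url))  # https://www.baidu.com/s?wd=壁纸
```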

url.requests(urlopen)

Uploading files with requests.post

Getting a site's cookies

Maintaining a session (Session)

verify: whether to check the SSL certificate

HTTP proxies && SOCKS proxies

requests authentication (username and password)

Auto-correcting malformed HTML text with the etree module

Matching nodes with etree XPath

from lxml import etree: matching node attributes

lxml--etree: getting attribute values

lxml--etree: matching multi-valued attributes

lxml--etree: matching on multiple attributes

Selecting by position

Selecting along node axes

BeautifulSoup

Node selectors

CSS selectors

pyquery

MySQL

MongoDB

Redis

redis dump

Crawling Ajax data

Selenium

selenium--expected_conditions


selenium.webdriver.support.expected_conditions (module)
 
These two conditions check the page title: whether the given title is equal to driver.title (title_is) or contained in it (title_contains).
title_is
title_contains
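
A stdlib-only sketch of how a wait uses these conditions. It is not Selenium's own code. title_is and title_contains below are re-implementations that mirror the semantics described above. wait_until is a simplified stand-in for WebDriverWait.until, and FakeDriver is a hypothetical driver whose title changes as the page "loads". With real Selenium the equivalent is roughly WebDriverWait(driver, 10).until(EC.title_contains('百度')). Each condition is a callable that takes the driver and returns a truthy value once it is satisfied, and the wait simply polls it.

```python
import time


def title_is(title):
    # passes when the page title equals `title`
    def _predicate(driver):
        return driver.title == title
    return _predicate


def title_contains(title):
    # passes when `title` is a substring of the page title
    def _predicate(driver):
        return title in driver.title
    return _predicate


def wait_until(driver, condition, timeout=2.0, poll=0.05):
    """Poll condition(driver) until it is truthy; Selenium raises TimeoutException instead."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout}s')
        time.sleep(poll)


class FakeDriver:
    """Hypothetical driver: each read of .title advances until the last title."""

    def __init__(self, titles):
        self._titles = list(titles)

    @property
    def title(self):
        if len(self._titles) > 1:
            return self._titles.pop(0)
        return self._titles[0]


driver = FakeDriver(['', 'Loading', '百度一下，你就知道'])
found = wait_until(driver, title_contains('百度'))
print(found)                                   # True
print(title_is('百度一下，你就知道')(driver))  # True

try:
    wait_until(FakeDriver(['about:blank']), title_is('百度'), timeout=0.1)
    timed_out = False
except TimeoutError:
    timed_out = True
print(timed_out)  # True
```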
 
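These two conditions check whether elements are present in the DOM. Both take a locator tuple, such as (By.ID, 'kw').
presence_of_element_located passes as soon as one matching element is present and returns it. presence_of_all_elements_located returns the list of all matching elements. It also passes once at least one match is present and does not wait for every match to load.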
presence_of_element_located
presence_of_all_elements_located
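
A stdlib-only sketch that mirrors the two presence conditions, to show why the "all" variant passes as soon as one element exists. It is not Selenium's code. The conditions are re-implemented against a hypothetical FakeDriver backed by a dict. NoSuchElementException here is a local stand-in for Selenium's exception of the same name, which WebDriverWait ignores by default while polling. The locator strings 'css selector' and 'id' match the values of By.CSS_SELECTOR and By.ID.

```python
class NoSuchElementException(Exception):
    """Local stand-in for selenium.common.exceptions.NoSuchElementException."""


def presence_of_element_located(locator):
    # returns the first match; find_element raises when nothing matches
    def _predicate(driver):
        return driver.find_element(*locator)
    return _predicate


def presence_of_all_elements_located(locator):
    # returns the list of matches; any non-empty list (even one element) is truthy
    def _predicate(driver):
        return driver.find_elements(*locator)
    return _predicate


class FakeDriver:
    """Hypothetical driver whose DOM is a dict of (by, value) -> element names."""

    def __init__(self, dom):
        self.dom = dom

    def find_elements(self, by, value):
        return list(self.dom.get((by, value), []))

    def find_element(self, by, value):
        found = self.find_elements(by, value)
        if not found:
            raise NoSuchElementException((by, value))
        return found[0]


driver = FakeDriver({('css selector', '.item'): ['item-1']})
first = presence_of_element_located(('css selector', '.item'))(driver)
every = presence_of_all_elements_located(('css selector', '.item'))(driver)
none = presence_of_all_elements_located(('id', 'kw'))(driver)
print(first)  # item-1
print(every)  # ['item-1']
print(none)   # []

try:
    presence_of_element_located(('id', 'kw'))(driver)
    missing_raised = False
except NoSuchElementException:
    missing_raised = True
print(missing_raised)  # True
```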
 
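These three conditions check element visibility, where visible means displayed with a height and width greater than 0. The first two take a locator tuple and the third takes a WebElement. invisibility_of_element_located passes when the element is invisible or absent from the DOM.
The first and third perform the same check and differ only in their argument: a locator versus an already located WebElement.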
visibility_of_element_located
invisibility_of_element_located
visibility_of
 
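These two conditions check whether a piece of text appears in an element. The first checks the element's text and the second checks its value attribute.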
text_to_be_present_in_element
text_to_be_present_in_element_value
 
This condition checks whether a frame can be switched into, and switches to it if so. It accepts a locator tuple, or directly an id, name, index, or WebElement.
frame_to_be_available_and_switch_to_it
 
This condition checks whether an alert is present.
alert_is_present
 
This condition checks whether an element is clickable (visible and enabled). It takes a locator.
element_to_be_clickable
 
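These four conditions check whether an element is selected. The first takes a WebElement and the second a locator tuple.
The third takes a WebElement and an expected selection state, and returns True if they match and False otherwise.
The fourth takes a locator and an expected selection state, and returns True if they match and False otherwise.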
element_to_be_selected
element_located_to_be_selected
element_selection_state_to_be
element_located_selection_state_to_be
 
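The last condition checks whether an element is still attached to the DOM. It takes a WebElement and passes once that element is no longer attached (has gone stale), so it can be used to tell whether the page has refreshed.
staleness_of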