Python Requests Library Study Notes

Author: 蔣蜀黍, columnist at Python愛好者社區
URL: https://mp.weixin.qq.com/s/tfWsiy_LxQSJKUAvB49U0g

1. Overview

1.1 Introductory example

# Import the Requests library
import requests
# Send a GET request
response = requests.get('https://www.baidu.com/')
# Inspect the response type: requests.models.Response
print(type(response))
# Print the status code
print(response.status_code)
# Print the type of the response body: str
print(type(response.text))
# Print the response body
print(response.text)
# Print the cookies
print(response.cookies)

1.2 Request methods

import requests
# Send a POST request
requests.post('http://httpbin.org/post')
# Send a PUT request
requests.put('http://httpbin.org/put')
# Send a DELETE request
requests.delete('http://httpbin.org/delete')
# Send a HEAD request
requests.head('http://httpbin.org/get')
# Send an OPTIONS request
requests.options('http://httpbin.org/get')

2. Requests

2.1 Basic GET requests

2.1.1 Basic usage

import requests
response = requests.get('http://httpbin.org/get')
print(response.text)

2.1.2 GET request with parameters

import requests
response = requests.get('http://httpbin.org/get?name=jyx&age=18')
print(response.text)

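2.1.3 GET request with parameters (2)

import requests
# Collect the GET parameters in a dict
param = {'name': 'jyx', 'age': 19}
# Pass them through the params argument; requests builds the query string
response = requests.get('http://httpbin.org/get', params=param)
print(response.text)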
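
To see what params does to the URL without sending anything, the request can be prepared locally. This is a minimal sketch; the variable name prepared is illustrative and not part of the original notes.

```python
import requests

# Build the request locally; prepare() encodes params into the query string
# but does not send anything over the network.
param = {'name': 'jyx', 'age': 19}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=param).prepare()
print(prepared.url)
```

The printed URL carries the same query string that 2.1.2 wrote by hand, with the values URL-encoded by the library.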

2.1.4 Parsing JSON

import requests
response = requests.get('http://httpbin.org/get')
# The raw response body is a str
print(type(response.text))
# If the body is JSON, parse it into Python objects
print(response.json())
# The parsed result here is a dict
print(type(response.json()))

2.1.5 Fetching binary data

import requests
response = requests.get('http://github.com/favicon.ico')
# Types: str and bytes
print(type(response.text), type(response.content))
# Print the body decoded as text
print(response.text)
# Print the body as raw bytes
print(response.content)
# Save the binary data to a local file; the with block closes the file
with open('favicon.ico', 'wb') as f:
    f.write(response.content)

2.1.6 Adding headers

import requests
# Set a browser-like User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
# Send the request with the custom headers
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.text)

2.2 Basic POST requests

import requests
# Form fields to send in the POST body
data = {'name': 'jyx', 'age': 18}
# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
# Send the headers and the form data (data) with the POST request
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
print(response.text)

3. Responses

3.1 Response attributes

import requests
response = requests.get('http://www.jianshu.com/')
# Status code
print(type(response.status_code), response.status_code)
# Response headers
print(type(response.headers), response.headers)
# Cookies set by the response
print(type(response.cookies), response.cookies)
# URL that was requested
print(type(response.url), response.url)
# Redirect history
print(type(response.history), response.history)

3.2 Checking the status code

import requests
response = requests.get('http://www.jianshu.com/404.html')
# Compare against the named status codes built into requests
if response.status_code != requests.codes.ok:
    print('404-1')
response = requests.get('http://www.jianshu.com')
# Compare against the numeric status code
if response.status_code != 200:
    print('404-2')
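
Besides comparing status_code by hand, a Response offers the ok property and the raise_for_status() method. The sketch below builds a Response object by hand (the name fake is illustrative) so that it runs without network access; real code would get the object from requests.get.

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok, requests.codes.not_found)

# Hand-built response standing in for a real 404 reply (no network involved).
fake = requests.Response()
fake.status_code = 404

# ok is False for any 4xx or 5xx status.
print(fake.ok)

# raise_for_status() turns 4xx/5xx statuses into an HTTPError exception.
try:
    fake.raise_for_status()
    outcome = 'no error'
except requests.exceptions.HTTPError:
    outcome = 'HTTPError'
print(outcome)
```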

3.3 Status code names built into requests

100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect', 'resume_incomplete', 'resume',),  # These 2 to be removed in 3.0

# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),

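4. Advanced usage

4.1 File upload

import requests
# Attach the file to the POST request through the files argument;
# the with block closes the file once the upload is done
with open('favicon.ico', 'rb') as f:
    response = requests.post('http://httpbin.org/post', files={'file': f})
print(response.text)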

4.2 Reading cookies

import requests
response = requests.get('https://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key, '=====', value)

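4.3 Keeping a session

4.3.1 Plain requests

import requests
requests.get('http://httpbin.org/cookies/set/number/12456')
response = requests.get('http://httpbin.org/cookies')
# These are two independent requests, so the cookie set by the first
# one is not sent with the second
print(response.text)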

4.3.2 Requests within a session

import requests
# Create a session object
session = requests.session()
# Requests made through the session share the same cookies
session.get('http://httpbin.org/cookies/set/number/12456')
response = session.get('http://httpbin.org/cookies')
print(response.text)
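
The difference between the two cases comes down to the session's cookie jar. The sketch below shows this offline: a cookie is placed in the jar by hand (standing in for the Set-Cookie reply from httpbin), and the outgoing request is prepared but never sent. The names prepared and bare are illustrative.

```python
import requests

session = requests.Session()
# Put a cookie into the session's jar by hand (stand-in for a Set-Cookie reply).
session.cookies.set('number', '12456')

# prepare_request() merges session state into the outgoing request without sending it.
prepared = session.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers.get('Cookie'))

# A request prepared outside the session carries no Cookie header.
bare = requests.Request('GET', 'http://httpbin.org/cookies').prepare()
print(bare.headers.get('Cookie'))
```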

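4.4 Certificate verification

4.4.1 Default behaviour

import requests
# For https URLs, requests verifies the server certificate;
# if verification fails, an exception is raised
response = requests.get('https://www.12306.cn')
print(response.status_code)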

4.4.2 Disabling certificate verification

import requests
# Verification is disabled, but a certificate warning is still emitted
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

4.4.3 Silencing the warning after disabling verification

import urllib3
import requests
# Suppress the warning
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

4.4.4 Specifying a certificate manually

import requests
# Use a local certificate: (certificate file, private key file)
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)

4.5 Proxies

4.5.1 Plain proxy

import requests
proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
}
# Pass the proxies to the request through the proxies argument
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

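4.5.2 Proxy with user name and password

import requests
# The "https" entry is needed for the https URL below to go through the proxy
proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
    "https": "http://user:password@127.0.0.1:9743/",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)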

4.5.3 SOCKS proxy

pip3 install 'requests[socks]'

import requests
proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

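4.6 Timeouts

import requests
from requests.exceptions import ReadTimeout

try:
    # The server must respond within 500 ms, otherwise ReadTimeout is raised
    # (a connection that is too slow to establish raises ConnectTimeout instead)
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')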

4.7 Authentication

import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
# Shorthand: a (user, password) tuple is also treated as HTTP Basic auth
# r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
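
What HTTPBasicAuth adds to the request can be inspected without contacting any server. In this sketch the host example.invalid is a placeholder and the request is only prepared, never sent; the names prepared and expected are illustrative.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

# example.invalid is a placeholder host; nothing is sent over the network.
prepared = requests.Request(
    'GET', 'http://example.invalid/', auth=HTTPBasicAuth('user', '123')
).prepare()

# Basic auth is "Basic " followed by base64("user:password").
expected = 'Basic ' + base64.b64encode(b'user:123').decode('ascii')
print(prepared.headers['Authorization'])
print(prepared.headers['Authorization'] == expected)
```

Because the credentials are only base64-encoded, not encrypted, Basic auth is normally used over https.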

4.8 Exception handling
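
The notes give no code for this section. What follows is a minimal sketch of layered exception handling, not the author's example. It relies on the exception hierarchy in requests.exceptions, where ReadTimeout and ConnectionError both derive from RequestException; the helper name fetch_status is hypothetical.

```python
import requests
from requests.exceptions import ConnectionError, ReadTimeout, RequestException


def fetch_status(url, timeout=0.5):
    """Return (status_code, None) on success or (None, label) on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        return response.status_code, None
    except ReadTimeout:
        # The connection was made but the server did not answer in time.
        return None, 'timeout'
    except ConnectionError:
        # DNS failure, refused connection, connect timeout and similar.
        return None, 'connection error'
    except RequestException:
        # Base class: everything else raised by requests.
        return None, 'request error'


# A URL without a scheme is rejected before any network traffic happens.
result = fetch_status('httpbin.org/get')
print(result)
```

The more specific exceptions are listed first, since Python takes the first except clause that matches.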

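Environment setup:

Have PyCharm installed beforehand.
Open File -> Settings -> Project -> Project Interpreter

Click the plus sign (circled in red in the screenshot)

Click the button circled in red

Select the first entry, click the pencil icon, and replace the existing URL with the following (it has already been replaced here):
https://pypi.tuna.tsinghua.edu.cn/simple/
After clicking OK, type requests-html and press Enter
Select requests-html and click Install Package

Wait for the installation to finish, then close the dialog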

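By parsing the page source

Example:
Scrape the desired content from all of a blogger's articles.
Background:
From all articles of the blogger at https://me.csdn.net/weixin_44286745, get each article's title, date, and view count.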

Import the HTMLSession class from requests_html and create an instance

from requests_html import HTMLSession
session = HTMLSession()

Send a GET request to the site to be scraped and obtain the page's parsed HTML.

html = session.get("https://me.csdn.net/weixin_44286745").html

Find all articles

allBlog = html.xpath("//dl[@class='tab_page_list']")

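Open the site's home page (in this example: https://me.csdn.net/weixin_44286745)
Right-click an empty area of an article and choose Inspect to locate the tag of that article

Do the same for the other articles, then find the marker they all share (here every article has the class 'my_tab_page_con')
XPath walks the tags and attributes of the HTML to locate the information we need and extract it.
Analyse the page to get the title, view count, and date.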

for i in allBlog:
    title = i.xpath("dl/dt/h3/a")[0].text
    views = i.xpath("//div[@class='tab_page_b_l fl']")[0].text
    date = i.xpath("//div[@class='tab_page_b_r fr']")[0].text
    print(title + ' ' + views + ' ' + date)

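Page analysis:

There are several articles, so a for loop handles them one at a time. The code above already collected all articles, so i stands for one article.
The second line gets the article title. As with locating the article, right-click the title and choose Inspect. Because each article has exactly one title, an absolute path that steps down tag by tag to the title also works.

xpath returns a list. We want its first element, hence the index (the list only has one element anyway). We want the text, hence .text.
The view count and the date are obtained the same way.

Both relative and absolute paths work; relative paths are the usual choice. Follow the pattern in the code.
The fifth line prints each article's information as soon as it is extracted; once the loop finishes, everything has been printed.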
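
The per-article lookup can be tried without the live site. The sketch below is not the requests-html code from this article: it uses the standard library's xml.etree.ElementTree (which supports a subset of XPath) on a made-up, well-formed snippet that only borrows the class names used above. Titles, counts, and dates are invented sample data.

```python
import xml.etree.ElementTree as ET

# Made-up markup that mimics the class names used above.
SAMPLE = """
<div>
  <dl class="tab_page_list">
    <dt><h3><a href="#">First post</a></h3></dt>
    <dd>
      <div class="tab_page_b_l fl">Views 10</div>
      <div class="tab_page_b_r fr">2019-01-01</div>
    </dd>
  </dl>
  <dl class="tab_page_list">
    <dt><h3><a href="#">Second post</a></h3></dt>
    <dd>
      <div class="tab_page_b_l fl">Views 25</div>
      <div class="tab_page_b_r fr">2019-02-01</div>
    </dd>
  </dl>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = []
# ".//" searches below the current node only, so each lookup stays inside one article.
for blog in root.findall(".//dl[@class='tab_page_list']"):
    title = blog.find("./dt/h3/a").text
    views = blog.find(".//div[@class='tab_page_b_l fl']").text
    date = blog.find(".//div[@class='tab_page_b_r fr']").text
    rows.append((title, views, date))

for row in rows:
    print(' '.join(row))
```

Real pages are rarely well-formed XML, which is why an HTML-aware parser such as requests-html is used for the actual scraping.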

Complete code:

from requests_html import HTMLSession

session = HTMLSession()

html = session.get("https://me.csdn.net/weixin_44286745").html

allBlog = html.xpath("//dl[@class='tab_page_list']")

for i in allBlog:
    title = i.xpath("dl/dt/h3/a")[0].text
    views = i.xpath("//div[@class='tab_page_b_l fl']")[0].text
    date = i.xpath("//div[@class='tab_page_b_r fr']")[0].text
    print(title + ' ' + views + ' ' + date)

You can scrape other things yourself, such as article images. Give it a try!
To be continued

Via HTML requests

Automation


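Preface:

Often, after deploying an application, we find that opening another page forces a new login. This is usually a session problem: the system cannot keep the session across requests. This article shows how to share sessions by configuring a Tomcat cluster.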

1. Configure the ports of tomcat8080 and tomcat9090

Edit the server.xml configuration file of tomcat9090:

<Server port="9005" shutdown="SHUTDOWN">
 <Connector port="9090" protocol="HTTP/1.1"
 connectionTimeout="20000"
 redirectPort="8443" />
 <Connector port="9009" protocol="AJP/1.3" redirectPort="8443" />

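2. Set up the Tomcat cluster

Edit Tomcat's configuration: open server.xml under conf and find the following line:

<Engine name="Catalina" defaultHost="localhost">

Leave this line unchanged and add the following directly below it: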

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">

  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>

  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4"
                port="45564"
                frequency="500"
                dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="4000"
              autoBind="100"
              selectorTimeout="5000"
              maxThreads="6"/>

    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>

  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
         filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

  <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
            tempDir="/tmp/war-temp/"
            deployDir="/tmp/war-deploy/"
            watchDir="/tmp/war-listen/"
            watchEnabled="false"/>

  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>

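This is the cluster configuration that ships with Tomcat. The cluster-howto.html page of the official Tomcat documentation lists related caveats, and one of them deserves attention: Make sure your web.xml has the <distributable/> element

In other words, the web.xml of the web project must contain the <distributable/> element, so that change is made in the deployed web project.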

3. Modify the project

Under \tomcat\webapps\ create a testcluster folder, and inside it create index.jsp to display the SessionID:

<html>
<head>
<title>title</title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
</head>
<body>
  SessionID:<%=session.getId()%>
  <BR>
  SessionIP:<%=request.getServerName()%>
  <BR>
  SessionPort:<%=request.getServerPort()%>
  <%
  out.println("This is Tomcat Server 8080");
  %>
</body>
</html>

Create a WEB-INF folder under testcluster, and inside it create a web.xml that points to index.jsp and contains the <distributable/> element:

<?xml version="1.0" encoding="UTF-8"?>
<!-- PublicCMS uses Servlet 3.0, so web.xml is no longer the entry point of the whole project; config.initializer.*Initializer are the entry classes and config.*Config hold the Spring configuration -->
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://java.sun.com/xml/ns/javaee"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         id="WebApp_ID" version="3.0">
  <display-name>elearning</display-name>

  <distributable/>

  <welcome-file-list>
    <welcome-file>index.jsp</welcome-file>
  </welcome-file-list>
</web-app>

Note: \conf\context.xml refers to web.xml as WEB-INF/web.xml, so web.xml has to be placed inside WEB-INF.

D:\tomcat集群\tomcat8080\conf\context.xml

<Context>

  <!-- Default set of monitored resources -->
  <WatchedResource>WEB-INF/web.xml</WatchedResource>

  <!-- Uncomment this to disable session persistence across Tomcat restarts -->
  <!--
  <Manager pathname="" />
  -->

  <!-- Uncomment this to enable Comet connection tacking (provides events
       on session expiration as well as webapp lifecycle) -->
  <!--
  <Valve className="org.apache.catalina.valves.CometConnectionManagerValve" />
  -->

</Context>

4. Start Tomcat and test

Start both Tomcat instances:

D:\tomcat集群\tomcat8080\bin\startup.bat

D:\tomcat集群\tomcat9090\bin\startup.bat

Test URLs:

http://localhost:8080/testcluster/index.jsp

http://localhost:9090/testcluster/index.jsp

Each browser gets its own SessionID, but the same browser gets the same SessionID on both ports.
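
The same check can be scripted instead of done in a browser. The following is a rough sketch using only the Python standard library; it assumes both nodes serve the index.jsp shown earlier and that one shared cookie jar behaves like a single browser. The function names extract_session_id and session_ids are hypothetical.

```python
import re
import urllib.request
from http.cookiejar import CookieJar

# Matches the "SessionID:<id>" text printed by index.jsp.
SESSION_RE = re.compile(r"SessionID:\s*([0-9A-Za-z._-]+)")


def extract_session_id(page_text):
    """Return the session ID printed by index.jsp, or None if it is absent."""
    match = SESSION_RE.search(page_text)
    return match.group(1) if match else None


def session_ids(urls):
    """Fetch each URL with one shared cookie jar, like a single browser."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    ids = []
    for url in urls:
        with opener.open(url, timeout=5) as resp:
            # The page declares charset=gb2312 in its meta tag.
            ids.append(extract_session_id(resp.read().decode("gb2312", "replace")))
    return ids
```

With both instances running, calling session_ids with the two test URLs above should return two identical IDs if replication works; two different IDs would suggest the session is not shared.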

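Appendix: Tomcat cluster configuration parameters

For the cluster setup described above, adding the following inside the <Engine> or <Host> element is already enough to enable clustering: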

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>

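This configuration enables all-to-all session replication and uses the DeltaManager to replicate session deltas. All-to-all means that a session is replicated to every other node in the cluster. This works well for small clusters, but it is not recommended for larger ones (many Tomcat nodes, for example dozens or more). In addition, with the delta manager, sessions are replicated to all nodes, even nodes that do not have the application deployed.

To replicate sessions in a larger cluster, use the BackupManager. This manager replicates session data to only one backup node, and only to nodes that have the corresponding application deployed. Downside of the BackupManager: it is not as battle-tested as the delta manager.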

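Some important default values:

1. The default multicast address is 228.0.0.4

2. The default multicast port is 45564 (address and port together determine cluster membership; nodes using the same pair are treated as one cluster).

3. The IP that is broadcast by default is java.net.InetAddress.getLocalHost().getHostAddress() (make sure you are not broadcasting 127.0.0.1; this is a common mistake)

4. The default TCP port listening for replication messages is the first available server socket in the range 4000-4100.

5. Two listeners are configured: ClusterSessionListener and JvmRouteSessionIDBinderListener

6. Two interceptors are configured: TcpFailureDetector and MessageDispatch15Interceptor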

Many people say nobody reads technical articles. Either way, once it is shared, at least I can read it myself.
