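Background
When scraping web resources, we usually need to copy the request headers out of the browser and paste them into our script.
The usual way is to add quotes around every key and value by hand, which is tedious and slow. The function below takes the header text copied from the browser as a string and formats it into a dictionary automatically.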
Implementation
import json


def get_headers(input_headers_string):
    '''
    Format request headers copied from the browser into a dict, so the
    printed result can be pasted into a script as headers.
    :param input_headers_string: str, the request headers copied from the browser, for example: headers = """
    Host: zhan.qq.com
    Proxy-Connection: keep-alive
    Content-Length: 799432
    Pragma: no-cache
    Cache-Control: no-cache
    """
    :return: dict mapping each header name to its value
    '''
    # Strip leading/trailing whitespace and split the text into lines
    lines = str(input_headers_string).strip().split('\n')
    headers = {}
    for line in lines:
        # Skip blank or malformed lines that contain no colon
        if ':' not in line:
            continue
        # Split at the first colon only, so colons inside the value
        # (URLs, ports, timestamps) are left intact
        key, value = line.split(':', 1)
        headers[key.strip()] = value.strip()
    # Serialize the dict to JSON and print it, ready to paste
    print('headers={}'.format(json.dumps(headers, indent=1)))
    return headers
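Header values can themselves contain colons (a URL with a port, a timestamp), so where each line is split matters. The sketch below compares splitting at every colon with splitting only at the first one. The helper names `split_every_colon` and `split_first_colon` and the `localhost:8080` URL are made up for illustration and are not part of the original function.

```python
def split_every_colon(line):
    # Split at every colon, rejoin the remaining pieces without the colons,
    # then patch "//" back to "://" to repair the protocol prefix
    parts = line.split(':')
    return parts[0].strip(), "".join(parts[1:]).strip().replace('//', '://')


def split_first_colon(line):
    # Split at the first colon only; everything after it is the value
    key, value = line.split(':', 1)
    return key.strip(), value.strip()


# Hypothetical header line whose value contains a port number
line = "referer: http://localhost:8080/so/1.html"

print(split_every_colon(line))  # ('referer', 'http://localhost8080/so/1.html')
print(split_first_colon(line))  # ('referer', 'http://localhost:8080/so/1.html')
```

Splitting at every colon drops the colon before the port, while `split(':', 1)` keeps the value exactly as it was copied.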
Usage
if __name__ == '__main__':
    headers = """
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
content-length: 14
content-type: application/x-www-form-urlencoded; charset=UTF-8
origin: https://www.2ppt.com
referer: https://www.2ppt.com/so/1.html
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
x-requested-with: XMLHttpRequest
"""
    get_headers(headers)
Output
headers={
 "accept-encoding": "gzip, deflate, br",
 "accept-language": "zh-CN,zh;q=0.9",
 "content-length": "14",
 "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
 "origin": "https://www.2ppt.com",
 "referer": "https://www.2ppt.com/so/1.html",
 "sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"",
 "sec-ch-ua-mobile": "?0",
 "sec-ch-ua-platform": "\"Windows\"",
 "sec-fetch-dest": "empty",
 "sec-fetch-mode": "cors",
 "sec-fetch-site": "same-origin",
 "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
 "x-requested-with": "XMLHttpRequest"
}
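Once the dictionary has been pasted into a script, it can be handed to whichever HTTP client the script uses. Here is a minimal sketch with the standard library's `urllib.request`. Using the referer URL as the request target is an assumption made for illustration, and only a few of the printed entries are shown.

```python
import urllib.request

# A few entries taken from the printed dictionary
headers = {
    "accept-language": "zh-CN,zh;q=0.9",
    "content-length": "14",
    "referer": "https://www.2ppt.com/so/1.html",
    "x-requested-with": "XMLHttpRequest",
}

# content-length describes the body of the original browser request;
# drop it so the client can compute the value for its own body
headers.pop("content-length", None)

# Build the request object; nothing is sent until urlopen() is called
# (the target URL here is assumed for illustration)
request = urllib.request.Request("https://www.2ppt.com/so/1.html", headers=headers)
print(request.header_items())
```

Note that `urllib.request.Request` stores header names in capitalized form (for example `X-requested-with`). Header names are case-insensitive, so this does not change the request.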