Crawler tool | Paste a browser header string and automatically format it as a dictionary
Published: 2023-07-09 11:08:56

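Background


When scraping web resources, we usually need to copy the request headers out of the browser and put them into the script.

Typically we add quotes around every key and value by hand, which is tedious and time-consuming. The method below takes a header string copied from the browser and automatically formats it as a dictionary.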

 

Code implementation
 

import json


def get_headers(input_headers_string):
    '''
    Format request headers copied from the browser into a dictionary, so the
    printed result can be pasted straight into a script as headers.

    :param input_headers_string: str, the request headers copied from the browser, for example:    headers = """
    Host: zhan.qq.com
    Proxy-Connection: keep-alive
    Content-Length: 799432
    Pragma: no-cache
    Cache-Control: no-cache
    """

    :return: dict mapping each header name to its value
    '''
    # The copied request headers are typically passed in as a triple-quoted string
    raw = str(input_headers_string)
    # Strip leading/trailing whitespace, split on newlines, skip lines without a colon
    lines = [line for line in raw.strip().split('\n') if ':' in line]

    # Split each line on the first colon only, so values that themselves
    # contain colons (URLs, host:port) are kept intact
    headers = {}
    for line in lines:
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()

    # Use the json module to print the dictionary in a paste-ready form
    print('headers={}'.format(json.dumps(headers, indent=1)))

    return headers
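
Header values can themselves contain colons (a URL, a port number), so how each line is split matters. The following is a minimal sketch comparing a split on every colon with a split on the first colon only. The helper names and the sample `Host` line are made up for illustration and are not part of the original article.

```python
def split_every_colon(line):
    # Split on every colon and glue the value back together without them
    parts = line.split(':')
    return parts[0].strip(), ''.join(parts[1:]).strip()


def split_first_colon(line):
    # Split on the first colon only; the rest of the line is the value
    name, _, value = line.partition(':')
    return name.strip(), value.strip()


line = 'Host: localhost:8080'   # hypothetical header line
print(split_every_colon(line))  # ('Host', 'localhost8080')
print(split_first_colon(line))  # ('Host', 'localhost:8080')
```

Splitting on the first colon keeps the value exactly as the browser showed it, so no colon has to be patched back in afterwards.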

 

Calling the code

 

if __name__ == '__main__':
    headers = """
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
content-length: 14
content-type: application/x-www-form-urlencoded; charset=UTF-8
origin: https://www.2ppt.com
referer: https://www.2ppt.com/so/1.html
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
x-requested-with: XMLHttpRequest
    """
    get_headers(headers)

 

Output

 

headers={
 "accept-encoding": "gzip, deflate, br",
 "accept-language": "zh-CN,zh;q=0.9",
 "content-length": "14",
 "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
 "origin": "https://www.2ppt.com",
 "referer": "https://www.2ppt.com/so/1.html",
 "sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"",
 "sec-ch-ua-mobile": "?0",
 "sec-ch-ua-platform": "\"Windows\"",
 "sec-fetch-dest": "empty",
 "sec-fetch-mode": "cors",
 "sec-fetch-site": "same-origin",
 "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
 "x-requested-with": "XMLHttpRequest"
}
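
As a rough illustration of the last step, the printed dictionary can be pasted into a script and handed to an HTTP client. The sketch below assumes the standard library's `urllib.request` (the original article does not name a client) and only builds the request object without sending it. The URL is borrowed from the `referer` value purely as a placeholder.

```python
from urllib.request import Request

# A few entries copied from the printed result above
headers = {
    "origin": "https://www.2ppt.com",
    "referer": "https://www.2ppt.com/so/1.html",
    "x-requested-with": "XMLHttpRequest",
}

# Build the request object only; nothing is sent over the network here.
# The URL is a placeholder taken from the referer header.
req = Request("https://www.2ppt.com/so/1.html", headers=headers)

# urllib stores header names capitalized, e.g. "X-requested-with"
for name, value in req.header_items():
    print(name, "=", value)
```

Headers such as content-length are usually best left out of the pasted dictionary, since the HTTP client computes them from the actual request body.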

 

 

Reposted from: https://blog.csdn.net/zh6526157/article/details/121947884