Python Web Scraping: Simulating a CSDN Login

firim, published 2019-07-31 11:00 / 1105 views

Tools

The scripting language is Python. I won't claim it is the best language, but it is at least one of the best (0.0). To simulate a login we need the following modules:

requests

BeautifulSoup

Installing requests

Install from source

git clone https://github.com/psf/requests.git
cd requests
pip install .

Install with pip

pip install requests

About BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document. It can save you hours or even days of work.

Installation

easy_install beautifulsoup4

pip install beautifulsoup4

The code below uses the lxml parser, which is a separate package:

pip install lxml

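Usage

from bs4 import BeautifulSoup

# Parse an HTML file from disk, naming the parser explicitly
with open("index.html") as f:
    soup = BeautifulSoup(f, "lxml")

# Parse an HTML string
soup = BeautifulSoup("<html>data</html>", "lxml")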
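As a small illustration of the navigating, searching, and modifying mentioned above, here is a minimal sketch. The HTML snippet and its contents are made up for the example, and it uses the built-in "html.parser" so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Made-up document, used only for this illustration.
html = """
<html><head><title>Demo page</title></head>
<body>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Navigate: reach a tag through attribute access.
title = soup.title.string

# Search: collect the href of every <a> tag.
links = [a["href"] for a in soup.find_all("a")]

# Modify: replace the text inside <title>.
soup.title.string = "Renamed page"
new_title = soup.title.get_text()

print(title, links, new_title)
```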

Notes

requests is used mainly for its session mechanism: a Session object lets us persist certain parameters, such as cookies and headers, across requests. From the requests documentation:

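The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and uses urllib3's connection pooling. So if you make several requests to the same host, the underlying TCP connection is reused, which can bring a significant performance gain.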
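To see this persistence without touching the network, the sketch below sets a header and a cookie on a Session and then inspects the request the session would send. The host example.com, the cookie name, and the User-Agent string are placeholders chosen for the illustration, not values from CSDN.

```python
import requests

# Placeholder values for illustration only.
session = requests.Session()
session.headers.update({"User-Agent": "demo-client/1.0"})
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# prepare_request() merges the session-level headers and cookies into
# the request without sending anything.
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

Every later request made through the same session object picks up the same header and cookie, which is what keeps the login state alive in the script.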

BeautifulSoup, in turn, is used mainly to parse the HTML source and extract the parameters that the login request needs.
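A minimal sketch of that extraction step, run against an inline form instead of the live page. The sample HTML, its values, and the helper name build_login_payload are hypothetical; only the field names lt, execution, and _eventId are taken from the script. It uses "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical login form; the values are made up.
SAMPLE_HTML = """
<form id="fm1" action="/account/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="lt" value="LT-123">
  <input type="hidden" name="execution" value="e1s1">
  <input type="hidden" name="_eventId" value="submit">
</form>
"""


def build_login_payload(html, username, password):
    """Collect every named hidden input and add the credentials."""
    soup = BeautifulSoup(html, "html.parser")
    payload = {
        field["name"]: field.get("value", "")
        for field in soup.find_all("input", attrs={"type": "hidden"})
        if field.has_attr("name")
    }
    payload.update({"username": username, "password": password})
    return payload


payload = build_login_payload(SAMPLE_HTML, "alice", "secret")
print(payload)
```

Collecting all hidden inputs at once, instead of naming each one, is a design choice of this sketch: it keeps working if the form gains or renames a hidden field.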

Full code

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests

s = requests.Session()


class CSDN:

    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.login_url = "https://passport.csdn.net/account/login"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebK"
                          "it/537.36 (KHTML, like Gecko) Chrome/61.0.3163.1"
                          "00 Safari/537.36 OPR/48.0.2685.52",
            "Referer": "http://my.csdn.net/my/mycsdn"
        }

    def login(self):
        # Load the login page first: this sets the session cookies and
        # returns the hidden form fields the server expects back.
        params = {
            "from": "http://my.csdn.net/my/mycsdn"
        }
        html = s.get(self.login_url, params=params, headers=self.headers)
        soup = BeautifulSoup(html.content, "lxml")
        # Read the hidden form fields from the page.
        lt = soup.select('input[name="lt"]')[0].get("value")
        execution = soup.select('input[name="execution"]')[0].get("value")
        event_id = soup.select('input[name="_eventId"]')[0].get("value")
        data = {
            "username": self.username,
            "password": self.password,
            "rememberMe": "true",
            "lt": lt,
            "execution": execution,
            "_eventId": event_id
        }
        # Submit the credentials together with the hidden fields.
        r = s.post(self.login_url, data=data)
        # Request the profile page; the session sends the login cookies.
        self.headers["Referer"] = "http://passport.csdn.net/account/login?from=http%3A%2F%2Fmy.csdn.net%2Fmy%2Fmycsdn"
        resp = s.get("http://my.csdn.net/my/mycsdn", headers=self.headers)
        print(resp.text)


username = input("Enter username: ")
password = input("Enter password: ")
cs = CSDN(username, password)
cs.login()

Project repository: Simulated JD.com login

QQ group for feedback: 173318043