Abstract: Setting up a local MySQL server, installing pymysql, creating a table, and operating MySQL from Python are all fairly simple; with a little database background you can get started right away. Just don't forget to commit at the end, otherwise the data is only cached and never written to the database. A complete example crawls the hottest few news titles on Baidu and stores them in the database.
Preparation
Start the local database server and connect as root
mysql -u root -p
Install pymysql
pip install pymysql
Create the table
CREATE DATABASE crawls;
-- SHOW DATABASES;
USE crawls;
CREATE TABLE IF NOT EXISTS baiduNews(
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ranking VARCHAR(30),
    title VARCHAR(60),
    datetime TIMESTAMP,
    hot VARCHAR(30));
-- SHOW TABLES;
Connect to the database with pymysql
db = pymysql.connect(host="localhost", port=3306, user="root",
                     passwd="123456", db="crawls", charset="utf8")
cursor = db.cursor()
cursor.execute(sql_query)
db.commit()
Operating MySQL from Python is fairly simple; with a little database background you can pick it up right away. Just don't forget to call commit() at the end, otherwise the data only sits in the pending transaction and never makes it into the database.
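To see why the commit matters, here is a minimal sketch, assuming the crawls database and baiduNews table created above: pymysql opens connections with autocommit turned off, so an INSERT stays in a pending transaction until db.commit() is called, and is rolled back if the connection closes first.

import pymysql

# Minimal sketch; assumes the crawls database and baiduNews table exist
db = pymysql.connect(host="localhost", port=3306, user="root",
                     passwd="123456", db="crawls", charset="utf8")
try:
    cursor = db.cursor()
    # NOW() fills the TIMESTAMP column; the other values are sample data
    cursor.execute(
        "INSERT INTO baiduNews(ranking, title, datetime, hot) "
        "VALUES (%s, %s, NOW(), %s)",
        ("1", "a test title", "100"))
    db.commit()  # without this line the row is discarded when the connection closes
finally:
    db.close()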
Complete example: crawl the hottest few news titles on Baidu and store them in the database (make sure the local MySQL server is running).
""" Get the hottest news title on baidu page, then save these data into mysql """ import datetime import pymysql from pyquery import PyQuery as pq import requests from requests.exceptions import ConnectionError URL = "https://www.baidu.com/s?wd=%E7%83%AD%E7%82%B9" headers = { "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36", "Upgrade-Insecure-Requests": "1" } def get_html(url): try: response = requests.get(url, headers=headers) if response.status_code == 200: return response.text return None except ConnectionError as e: print(e.args) return None def parse_html(html): doc = pq(html) trs = doc(".FYB_RD table.c-table tr").items() for tr in trs: index = tr("td:nth-child(1) span.c-index").text() title = tr("td:nth-child(1) span a").text() hot = tr("td:nth-child(2)").text().strip(""") yield { "index":index, "title":title, "hot":hot } def save_to_mysql(items): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") cursor.execute("CREATE TABLE IF NOT EXISTS baiduNews(" "id INT PRIMARY KEY NOT NULL AUTO_INCREMENT," "ranking VARCHAR(30)," "title VARCHAR(60)," "datetime TIMESTAMP," "hot VARCHAR(30));") try: for item in items: print(item) now = datetime.datetime.now() now = now.strftime("%Y-%m-%d %H:%M:%S") sql_query = "INSERT INTO baiduNews(ranking, title, datetime, hot) VALUES ("%s", "%s", "%s", "%s")" % ( item["index"], item["title"], now, item["hot"]) cursor.execute(sql_query) print("Save into mysql") db.commit() except pymysql.MySQLError as e: db.rollback() print(e.args) return except pymysql.MySQLError as e: print(e.args) return def check_mysql(): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") sql_query = "SELECT * FROM baiduNews" results = cursor.execute(sql_query) print(results) except pymysql.MySQLError as e: print(e.args) def main(): html = get_html(URL) items = parse_html(html) save_to_mysql(items) #check_mysql() if __name__ == "__main__": main()