A First Web Crawler: Scraping Code Snippets from CodeSnippet

xcold, published 2019-07-25 11:16

Abstract: A first web crawler that fetches a code snippet from the CodeSnippet home page with BeautifulSoup and saves it to a file.

Scraping code snippets from CodeSnippet

Goal

Fetch the code snippets published on CodeSnippet.

Analysis

Code

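[HTML source of the CodeSnippet home page; the markup was lost and only the text content survives:]

    Publish a snippet
    Snippet list
    If a single thread is individual heroism, then multithreading is collectivism: you are no longer a lone ranger but a conductor.
    {15106} code snippets in total
    京ICP备13038605号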

The content we want sits inside the tag li class="con-code bbor", so we use BeautifulSoup's find() method to locate that tag and then read its text content.
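To make the lookup concrete without touching the network, here is a minimal sketch that uses only the Python 3 standard library (html.parser) in place of BeautifulSoup. It mimics what find("li", {"class": "con-code bbor"}).get_text() returns: the text of the first li whose class attribute is exactly that string. The class FirstMatchText, the helper first_li_text, and SAMPLE_HTML are hypothetical names and data invented for illustration; the real page markup is an assumption.

```python
from html.parser import HTMLParser


class FirstMatchText(HTMLParser):
    """Collect the text inside the first <li> whose class equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside the matching <li>
        self.done = False   # set once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "li" or self.done:
            return
        if self.depth > 0:
            # A nested <li> inside the match: track it so we close correctly.
            self.depth += 1
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "li" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def first_li_text(html, target_class="con-code bbor"):
    parser = FirstMatchText(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<ul>
  <li class="other">skip me</li>
  <li class="con-code bbor"><pre>print("hello")</pre></li>
  <li class="con-code bbor">second match</li>
</ul>
"""

snippet = first_li_text(SAMPLE_HTML)
print(snippet)
```

Like find(), the sketch stops at the first matching element, so the second li with the same class is ignored.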

Preparation

Import the two modules the crawler needs (urllib2 is the Python 2 module):

from urllib2 import urlopen

from bs4 import BeautifulSoup

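Writing the scraping code

# Fetch the code snippet from http://www.codesnippet.cn/index.html

def GrapIndex():
    html = "http://www.codesnippet.cn/index.html"  # URL of the page to fetch
    bsObj = BeautifulSoup(urlopen(html), "html.parser")
    # find() returns the first matching <li>; get_text() returns its text
    return bsObj.find("li",  {"class":"con-code bbor"}).get_text()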

Once we have the data we want, the next step would normally be to write it to a database. Because the data scraped here is so simple, writing it to a file is enough.

def SaveResult():
    codeFile = open("code.txt", "a")  # append mode
    codeFile.write(GrapIndex())       # write the whole snippet text at once
    codeFile.close()

Writing the file then failed with the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u751f' in position 0: ordinal not in range(128)

Analysis

Python 2.7 uses ASCII as its default encoding. get_text() returns a unicode string, and writing it to a file opened with open() implicitly encodes it as ASCII, so any character outside the ASCII range (here u'\u751f') raises the exception (ordinal not in range(128)).
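The failure can be reproduced in isolation. The following is a small sketch, written for Python 3 rather than the Python 2.7 used in this article, that encodes the same character explicitly: ASCII has no representation for it, while UTF-8 does. The variable names are chosen for illustration only.

```python
ch = "\u751f"  # the character named in the error message

try:
    ch.encode("ascii")  # ASCII only covers code points 0..127
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

encoded = ch.encode("utf-8")  # UTF-8 can represent any Unicode character

print(message)
print(encoded)
```

In Python 2.7 the same ASCII encoding step happens implicitly when a unicode string is written to a byte-oriented file, which is why the error appears at the write() call.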

Solution

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
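An alternative that leaves the interpreter-wide default encoding untouched is to state the encoding when opening the file. The sketch below is not the approach used in this article; it assumes the scraped text is a unicode string and uses io.open, which accepts an encoding argument on both Python 2.6+ and Python 3. The helper name append_text and the temporary file location are hypothetical.

```python
import io
import os
import tempfile


def append_text(path, text):
    """Append unicode text to path, encoding it as UTF-8 explicitly."""
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(text)


# Write to a throwaway directory so the example leaves code.txt alone.
path = os.path.join(tempfile.mkdtemp(), "code.txt")
append_text(path, u"\u751f")  # the character that triggered the error
append_text(path, u"abc\n")

with io.open(path, encoding="utf-8") as f:
    content = f.read()

print(repr(content))
```

Because the encoding is attached to the file object, only this file is affected and other code in the process keeps its normal behavior.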

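Full code

from urllib2 import urlopen

from bs4 import BeautifulSoup

import sys
# Python 2 only: make UTF-8 the default encoding for implicit conversions
reload(sys)
sys.setdefaultencoding("utf-8")

def GrapIndex():
    html = "http://www.codesnippet.cn/index.html"  # URL of the page to fetch
    bsObj = BeautifulSoup(urlopen(html), "html.parser")
    # find() returns the first matching <li>; get_text() returns its text
    return bsObj.find("li",  {"class":"con-code bbor"}).get_text()

def SaveResult():
    codeFile = open("code.txt", "a")  # append mode
    codeFile.write(GrapIndex())       # write the whole snippet text at once
    codeFile.close()

if __name__ == "__main__":
    # Fetch the page and append the result nine times
    for i in range(0,9):
        SaveResult()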