Want to train neural networks for free on Google's resources? A detailed Colab tutorial (original work by Jinkey)

Posted by XboxYan on 2019-06-26 18:19 / 737 views

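Summary: Colab ships with the basic deep learning libraries, and further libraries can be installed with pip. This tutorial covers installing libraries, listing, reading, and writing Google Drive files, working with Google Sheets, downloading files, and a worked example that classifies article titles into three categories: health, technology, and design.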

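Original link: https://jinkey.ai/post/tech/x...
Author: Jinkey (WeChat official account jinkey-love, website https://jinkey.ai)
This article may be reposted with attribution and without alteration. Reposting it with this copyright notice deleted or modified is treated as an infringement of intellectual property, and we reserve the right to pursue legal liability.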

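1 Introduction

Colab is Google's Jupyter Notebook-like interactive Python environment, which began as an internal Google tool. It needs no installation, lets you switch quickly between Python 2 and Python 3, works with Google's suite of services (TensorFlow, BigQuery, Google Drive, and others), and lets you install any additional library with pip.
URL: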
https://colab.research.google...

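2 Installing and using libraries

Colab comes with the basic deep learning libraries preinstalled, including TensorFlow, Matplotlib, NumPy, and Pandas. If you need other dependencies, such as Keras, create a new code cell and enter: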

# Install the latest version of Keras
# https://keras.io/
!pip install keras
# Install a specific version
!pip install keras==2.0.9
# Install OpenCV
# https://opencv.org/
!apt-get -qq install -y libsm6 libxext6 && pip install -q -U opencv-python
# Install PyTorch
# http://pytorch.org/
!pip install -q http://download.pytorch.org/whl/cu75/torch-0.2.0.post3-cp27-cp27mu-manylinux1_x86_64.whl torchvision
# Install XGBoost
# https://github.com/dmlc/xgboost
!pip install -q xgboost
# Install 7-Zip support (via libarchive)
!apt-get -qq install -y libarchive-dev && pip install -q -U libarchive
# Install GraphViz and PyDot
!apt-get -qq install -y graphviz && pip install -q pydot
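
Before installing anything, it can help to check which libraries the runtime already provides. The following is a minimal sketch, assuming a Python 3 runtime; check_libraries is a hypothetical helper name, and the module names in the demo are only illustrative (note that OpenCV is imported as cv2, not opencv-python).

```python
import importlib.util


def check_libraries(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # Illustrative module names; adjust the list to the libraries you need
    for name, ok in check_libraries(["tensorflow", "keras", "cv2", "xgboost"]).items():
        print("%-12s %s" % (name, "installed" if ok else "missing"))
```

Anything reported as missing can then be installed with the matching !pip install line from the list above.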

3 Google Drive file operations

Authorize and log in

For a given notebook you only need to log in once; after that you can read and write files.

# Install the PyDrive library; this only needs to run once per notebook
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authorize and log in; authentication is only requested the first time
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

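Running this code prints an authorization link. Click the link to authorize, paste the token you obtain into the input box, and press Enter to complete the login.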

Listing a directory

# List all files in the root directory
# For the "q" query syntax, see https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile({"q": "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print("title: %s, id: %s, mimeType: %s" % (file1["title"], file1["id"], file1["mimeType"]))

The console prints the following:

title: Colab Test, id: 1cB5CHKSdL26AMXQ5xrqk2kaBv5LSkIsJ8HuEDyZpeqQ, mimeType: application/vnd.google-apps.document

title: Colab Notebooks, id: 1U9363A12345TP2nSeh2K8FzDKSsKj5Jj, mimeType: application/vnd.google-apps.folder

Here id is the unique identifier that the rest of this tutorial uses to fetch a file. The mimeType shows that Colab Test is a Google Docs document, while Colab Notebooks is a folder (the root directory where Colab stores its notebooks). To list the files inside the Colab Notebooks folder, write the query like this:

# "<folder id>" in parents; here the id is that of the Colab Notebooks folder
file_list = drive.ListFile({"q": "'1U9363A12345TP2nSeh2K8FzDKSsKj5Jj' in parents and trashed=false"}).GetList()
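
Since the query string is easy to mistype (the folder id must sit inside single quotes within the double-quoted string), a small helper can build it. This is only a sketch: children_query and folders_only are hypothetical names rather than part of PyDrive, and the sample entries mirror the listing printed above.

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def children_query(folder_id="root", include_trashed=False):
    """Build the "q" string that selects the direct children of a folder."""
    query = "'%s' in parents" % folder_id
    if not include_trashed:
        query += " and trashed=false"
    return query


def folders_only(entries):
    """Keep only the entries whose mimeType marks them as a Drive folder."""
    return [entry for entry in entries if entry["mimeType"] == FOLDER_MIME_TYPE]


# Sample entries in the same shape as the items returned by GetList()
entries = [
    {"title": "Colab Test",
     "id": "1cB5CHKSdL26AMXQ5xrqk2kaBv5LSkIsJ8HuEDyZpeqQ",
     "mimeType": "application/vnd.google-apps.document"},
    {"title": "Colab Notebooks",
     "id": "1U9363A12345TP2nSeh2K8FzDKSsKj5Jj",
     "mimeType": "application/vnd.google-apps.folder"},
]

print(children_query())
for folder in folders_only(entries):
    print(folder["title"], "->", children_query(folder["id"]))
```

The returned string can be passed as the "q" value of drive.ListFile in the calls shown above.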

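Reading file contents

So far, the format I have tested that can be read directly is .txt (mimeType: text/plain). The code to read it:

file = drive.CreateFile({"id": "replace with your .txt file id"})
file.GetContentString()

With a .csv file, GetContentString() only prints the first row of data, so use GetContentFile() instead: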

file = drive.CreateFile({"id": "replace with your .csv file id"})
# This download only caches the file in the working environment; it does not create an extra copy in your Google Drive
file.GetContentFile("iris.csv", "text/csv")

# Print the file contents directly
with open("iris.csv") as f:
  print(f.readlines())
# Read it with pandas
import pandas as pd
pd.read_csv("iris.csv", index_col=[0,1], skipinitialspace=True)

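Colab outputs the result directly as a table, here the first few rows of the iris dataset. The iris dataset is available at http://aima.cs.berkeley.edu/d... ; readers who want to follow along can upload it to their own Google Drive.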
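
The skipinitialspace=True argument in the pandas call tells the parser to ignore spaces that follow a comma. The standard csv module has an option with the same name, so the effect can be sketched without pandas. The two sample rows below are written in the shape of the iris data, with spaces added after the commas for illustration; read_rows is a hypothetical helper name.

```python
import csv
import io

# Two rows in the shape of the iris data, with a space after each comma
SAMPLE = "5.1, 3.5, 1.4, 0.2, setosa\n4.9, 3.0, 1.4, 0.2, setosa\n"


def read_rows(text, skipinitialspace=True):
    """Parse CSV text into a list of rows, each row being a list of strings."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace))


rows = read_rows(SAMPLE)
raw = read_rows(SAMPLE, skipinitialspace=False)

print(rows[0])  # fields without leading spaces
print(raw[0])   # fields keep the space that followed each comma
```

Without the option, a label column would contain " setosa" rather than "setosa", which is easy to miss when comparing strings later.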

Writing files

# Create a text file
uploaded = drive.CreateFile({"title": "example.txt"})
uploaded.SetContentString("test content")
uploaded.Upload()
print("The id of the created file is {}".format(uploaded.get("id")))

For more operations, see http://pythonhosted.org/PyDri...

4 Google Sheet spreadsheet operations

Authorize and log in

For a given notebook you only need to log in once; after that you can read and write data.

!pip install --upgrade -q gspread
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

Reading

For this demonstration, import the data from iris.csv into a new Google Sheet named iris; it can live in any directory of your Google Drive.

worksheet = gc.open("iris").sheet1

# Get a list of rows:
# [[row 1 col 1, row 1 col 2, ..., row 1 col n], ..., [row m col 1, row m col 2, ..., row m col n]]
rows = worksheet.get_all_values()
print(rows)

# Read the rows into pandas
import pandas as pd
pd.DataFrame.from_records(rows)

The printed result is:

[["5.1", "3.5", "1.4", "0.2", "setosa"], ["4.9", "3", "1.4", "0.2", "setosa"], ...
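
As the output shows, every cell comes back as a string, including the numeric measurements. Before training on such data the numbers need converting. A minimal sketch, assuming Python 3 and assuming the last column of each row is the label; parse_iris_rows is a hypothetical helper name.

```python
def parse_iris_rows(rows):
    """Convert rows of strings into (features, label) pairs."""
    parsed = []
    for row in rows:
        # Everything except the last column is a numeric measurement
        *measurements, label = row
        parsed.append(([float(value) for value in measurements], label))
    return parsed


# The two rows shown in the printed result
rows = [["5.1", "3.5", "1.4", "0.2", "setosa"], ["4.9", "3", "1.4", "0.2", "setosa"]]
samples = parse_iris_rows(rows)
print(samples[1])
```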

Writing

sh = gc.create("Google sheet")

# Open the spreadsheet and its first worksheet
worksheet = gc.open("Google sheet").sheet1
cell_list = worksheet.range("A1:C2")

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)
worksheet.update_cells(cell_list)
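
The range "A1:C2" covers two rows and three columns, so the loop above fills six cells. The following sketch expands such a range into (row, column) pairs to make that arithmetic visible. It is a standalone illustration, not gspread code: expand_a1_range and column_to_index are hypothetical names, and the row-by-row ordering is an assumption about how the cell list is laid out.

```python
import re


def column_to_index(letters):
    """Convert a column label such as "A" or "AB" to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def expand_a1_range(a1_range):
    """List the (row, column) pairs covered by a range like "A1:C2", row by row."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a1_range)
    if match is None:
        raise ValueError("expected a range like 'A1:C2', got %r" % a1_range)
    first_col, first_row = column_to_index(match.group(1)), int(match.group(2))
    last_col, last_row = column_to_index(match.group(3)), int(match.group(4))
    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


cells = expand_a1_range("A1:C2")
print(len(cells))
print(cells)
```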

5 Downloading files to your local machine

from google.colab import files

with open("example.txt", "w") as f:
  f.write("test content")
files.download("example.txt")

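6 Hands-on example

Here I use my open-source LSTM text classification project on GitHub as the example: https://github.com/Jinkeycode...
Store the three files from the master/data directory on Google Drive. The example classifies titles into three categories: health, technology, and design.

Create a notebook

Create a new Python 2 notebook in Colab.

Install dependencies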

!pip install keras
!pip install jieba
!pip install h5py

import h5py
import jieba as jb
import numpy as np
import keras as krs
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

Load the data

Authorize and log in

# Install the PyDrive library; this only needs to run once per notebook
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

def login_google_drive():
  # Authorize and log in; authentication is only requested the first time
  auth.authenticate_user()
  gauth = GoogleAuth()
  gauth.credentials = GoogleCredentials.get_application_default()
  drive = GoogleDrive(gauth)
  return drive

List all files in Google Drive

def list_file(drive):
  file_list = drive.ListFile({"q": "'root' in parents and trashed=false"}).GetList()
  for file1 in file_list:
    print("title: %s, id: %s, mimeType: %s" % (file1["title"], file1["id"], file1["mimeType"]))


drive = login_google_drive()
list_file(drive)

Cache the data in the working environment

def cache_data():
  # Replace each id with the corresponding file id printed in the previous step
  health_txt = drive.CreateFile({"id": "117GkBtuuBP3wVjES0X0L4wVF5rp5Cewi"})
  tech_txt = drive.CreateFile({"id": "14sDl4520Tpo1MLPydjNBoq-QjqOKk9t6"})
  design_txt = drive.CreateFile({"id": "1J4lndcsjUb8_VfqPcfsDeOoB21bOLea3"})

  # These downloads only cache the files; they do not create extra copies in your Google Drive
  health_txt.GetContentFile("health.txt", "text/plain")
  tech_txt.GetContentFile("tech.txt", "text/plain")
  design_txt.GetContentFile("design.txt", "text/plain")

  print("Cached successfully")

cache_data()

Read the data from the working environment

def load_data():
    titles = []
    print("Loading the health category...")
    with open("health.txt", "r") as f:
        for line in f.readlines():
            titles.append(line.strip())

    print("Loading the technology category...")
    with open("tech.txt", "r") as f:
        for line in f.readlines():
            titles.append(line.strip())

    print("Loading the design category...")
    with open("design.txt", "r") as f:
        for line in f.readlines():
            titles.append(line.strip())

    print("Loaded %s titles in total" % len(titles))

    return titles

titles = load_data()
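
The three loading blocks differ only in the file name, so the same logic can be written as a loop that also reports how many titles each file contributed. That count is useful for the labels in the next step. A minimal sketch, assuming Python 3 and UTF-8 files; load_titles is a hypothetical helper, and the demo uses small temporary files in place of the real data.

```python
import os
import tempfile


def load_titles(paths):
    """Read one title per line from each file; return (titles, per-file counts)."""
    titles, counts = [], []
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            lines = [line.strip() for line in f]
        titles.extend(lines)
        counts.append(len(lines))
    return titles, counts


# Demo with two small temporary files standing in for health.txt and tech.txt
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for name, lines in [("health.txt", ["title a", "title b"]), ("tech.txt", ["title c"])]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    demo_paths.append(path)

demo_titles, demo_counts = load_titles(demo_paths)
print(demo_titles)
print(demo_counts)
```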

Load the labels

def load_label():
    # 12000 health titles, 12000 technology titles, 7318 design titles
    arr0 = np.zeros(shape=[12000, ])
    arr1 = np.ones(shape=[12000, ])
    arr2 = np.array([2]).repeat(7318)
    target = np.hstack([arr0, arr1, arr2])
    print("Loaded %s labels in total" % target.shape[0])

    encoder = LabelEncoder()
    encoder.fit(target)
    encoded_target = encoder.transform(target)
    dummy_target = krs.utils.np_utils.to_categorical(encoded_target)

    return dummy_target

target = load_label()
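
The label array relies on the titles being stored class by class: the first 12000 are health (0), the next 12000 technology (1), and the last 7318 design (2). The dependency-free sketch below shows the same construction and the one-hot encoding that to_categorical produces. make_labels and one_hot are hypothetical helper names, not the Keras or scikit-learn functions used above.

```python
def make_labels(counts):
    """Return one integer class id per sample: counts[0] zeros, counts[1] ones, ..."""
    labels = []
    for class_id, count in enumerate(counts):
        labels.extend([class_id] * count)
    return labels


def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows, e.g. 1 becomes [0.0, 1.0, 0.0]."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


small = make_labels([2, 1, 1])
print(small)
print(one_hot(small, 3))

# Same class sizes as in load_label
print(len(make_labels([12000, 12000, 7318])))
```

If the files are loaded in a different order or their sizes change, the hard-coded counts must change with them, otherwise titles and labels fall out of step.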

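Text preprocessing

max_sequence_length = 30
embedding_size = 50

# Tokenize the titles with jieba, joining tokens with spaces as in the prediction step
titles = [" ".join(jb.cut(t, cut_all=True)) for t in titles]

# Map each word to an integer id and pad or truncate every title to max_sequence_length
# (this is a vocabulary index, not a trained word2vec embedding)
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_sequence_length, min_frequency=1)
text_processed = np.array(list(vocab_processor.fit_transform(titles)))

# Read the word-to-id mapping (note that the name dict shadows the Python builtin)
dict = vocab_processor.vocabulary_._mapping
sorted_vocab = sorted(dict.items(), key = lambda x : x[1])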
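
To make the preprocessing step less of a black box, here is a simplified, pure-Python sketch of the idea: give every word an integer id, then turn each title into a fixed-length list of ids. It is not the VocabularyProcessor implementation. build_vocab, to_ids, and min_count are hypothetical names, reserving id 0 for padding and unknown words is an assumption of this sketch, and the exact frequency-threshold rule of the real class may differ.

```python
from collections import Counter


def build_vocab(tokenized_titles, min_count=1):
    """Assign an integer id to every word seen at least min_count times.

    Id 0 is reserved for padding and for unknown words.
    """
    counts = Counter(word for title in tokenized_titles for word in title.split())
    vocab = {}
    for word, count in counts.most_common():
        if count >= min_count:
            vocab[word] = len(vocab) + 1
    return vocab


def to_ids(title, vocab, max_length):
    """Map a space-separated title to a fixed-length list of word ids."""
    ids = [vocab.get(word, 0) for word in title.split()][:max_length]
    return ids + [0] * (max_length - len(ids))


vocab = build_vocab(["good design tips", "design news"])
ids = to_ids("design tips today", vocab, max_length=5)
print(vocab)
print(ids)
```

The word "today" never appeared when the vocabulary was built, so it maps to 0, and the remaining positions are padded with 0 to reach the fixed length.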

Build the neural network

Here an Embedding layer and an LSTM layer form the first two layers, and a softmax activation produces the output.

# Configure the network structure
def build_network(num_vocabs):
    model = krs.Sequential()
    model.add(krs.layers.Embedding(num_vocabs, embedding_size, input_length=max_sequence_length))
    model.add(krs.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2))
    model.add(krs.layers.Dense(3))
    model.add(krs.layers.Activation("softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

    return model

num_vocabs = len(dict.items())
model = build_network(num_vocabs=num_vocabs)
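
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand from the layer sizes (embedding size 50, 32 LSTM units, 3 output classes). The sketch below uses the standard parameter-count formulas for these layer types; the vocabulary size of 10000 in the demo is a made-up figure for illustration, since the real value depends on the data.

```python
def embedding_params(num_vocabs, embedding_size):
    """One embedding vector per vocabulary entry."""
    return num_vocabs * embedding_size


def lstm_params(input_size, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_size + units) * units + units)


def dense_params(input_size, units):
    """A weight matrix plus one bias per output unit."""
    return input_size * units + units


def total_params(num_vocabs, embedding_size=50, lstm_units=32, num_classes=3):
    return (embedding_params(num_vocabs, embedding_size)
            + lstm_params(embedding_size, lstm_units)
            + dense_params(lstm_units, num_classes))


print(lstm_params(50, 32))
print(dense_params(32, 3))
print(total_params(10000))  # hypothetical vocabulary size
```

For any realistic vocabulary the embedding layer dominates the total, which is why vocabulary size matters more for model size than the LSTM width here.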

import time
start = time.time()
# Train the model
model.fit(text_processed, target, batch_size=512, epochs=10)
finish = time.time()
print("Training took %f seconds" % (finish - start))

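Predict a sample

sen can be replaced with any sentence of your own. The prediction has the form [probability of health, probability of technology, probability of design]. The class with the highest probability is the predicted category, but when the highest probability is not above 0.8 the article is treated as unclassifiable.

# Sample title, roughly "Tips you need to learn to do commercial design well"
sen = "做好商業設計需要學習的小技巧"
sen_processed = " ".join(jb.cut(sen, cut_all=True))
sen_processed = vocab_processor.transform([sen_processed])
sen_processed = np.array(list(sen_processed))
result = model.predict(sen_processed)

catalogue = list(result[0]).index(max(result[0]))
threshold = 0.8
if max(result[0]) > threshold:
    if catalogue == 0:
        print("This is an article about health")
    elif catalogue == 1:
        print("This is an article about technology")
    elif catalogue == 2:
        print("This is an article about design")
else:
    print("This article has no reliable classification")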
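
The decision rule can also be isolated into a small function, which makes it easy to try different thresholds without running the model. A minimal sketch: classify and LABELS are hypothetical names, and the probability lists in the demo are made-up values rather than real model output.

```python
LABELS = ["health", "technology", "design"]


def classify(probabilities, labels=LABELS, threshold=0.8):
    """Return the most likely label, or None if the top probability is not above threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] > threshold:
        return labels[best]
    return None


print(classify([0.05, 0.05, 0.90]))  # confident prediction
print(classify([0.40, 0.35, 0.25]))  # no class is above the threshold
```

With real predictions, pass list(result[0]) from the model.predict call as the probabilities argument.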
