Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu address: http://www.jianshu.com/p/13e0...
One of the reasons I like TensorFlow is that it can compute the gradient of a function automatically. We only need to define our function and then call tf.gradients. Simple, isn't it?
Let's walk through an example to make this concrete.

Softmax regression on the MNIST dataset using TensorFlow's built-in optimizer

Before implementing gradient descent with tf.gradients, let's first solve the MNIST classification problem with one of TensorFlow's built-in optimizers (here, GradientDescentOptimizer).
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])  # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Start training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))

# Output
# Extracting /tmp/data/train-images-idx3-ubyte.gz
# Extracting /tmp/data/train-labels-idx1-ubyte.gz
# Extracting /tmp/data/t10k-images-idx3-ubyte.gz
# Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
# Epoch: 0001 cost= 1.184285608
# Epoch: 0002 cost= 0.665428013
# Epoch: 0003 cost= 0.552858426
# Epoch: 0004 cost= 0.498728328
# Epoch: 0005 cost= 0.465593693
# Epoch: 0006 cost= 0.442609185
# Epoch: 0007 cost= 0.425552949
# Epoch: 0008 cost= 0.412188290
# Epoch: 0009 cost= 0.401390140
# Epoch: 0010 cost= 0.392354651
# Optimization Finished!
# Accuracy: 0.873333
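So what we did here was let the built-in optimizer minimize the loss for us. What if we want to compute the gradients and update the weights ourselves? That is where tf.gradients comes in.

Softmax regression on the MNIST dataset using tf.gradients

According to the gradient descent formula, the weights are updated as follows: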
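Assuming the standard gradient descent step is meant, and writing $\alpha$ for learning_rate (a symbol chosen here for illustration, not taken from the article), the update can be written as:

```latex
\[
W \leftarrow W - \alpha \, \frac{\partial\, \mathrm{cost}}{\partial W},
\qquad
b \leftarrow b - \alpha \, \frac{\partial\, \mathrm{cost}}{\partial b}
\]
```

This has the same shape as the two assign operations in the code that follows, where grad_W and grad_b play the role of the two partial derivatives.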
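To implement gradient descent, I will not use the optimizer code and will instead write the weight update myself.
Since we have a weight matrix W and a bias vector b, we need to compute the gradient of the cost with respect to each of them. The code is as follows: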
# Computing the gradient of cost with respect to W and b
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)

# Gradient Step
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)
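These three lines merely replace a single line from the earlier version, so why go to the trouble? Because if you need the gradient of your own loss function and do not want to derive it by hand, TensorFlow can compute it for you.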
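To make concrete what such a gradient is, here is a minimal pure-Python sketch (no TensorFlow) that compares a hand-derived softmax cross-entropy gradient with central finite differences on a single example. The helper names (logits_of, loss, analytic_grad_W, numeric_grad_W) and the toy numbers are assumptions made for this illustration; they are not part of the article's code.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(W, b, x):
    # W is a list of rows with shape (inputs, classes); returns x @ W + b.
    return [sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
            for k in range(len(b))]

def loss(W, b, x, y):
    # Cross-entropy of one example with one-hot label y.
    p = softmax(logits_of(W, b, x))
    return -sum(y[k] * math.log(p[k]) for k in range(len(b)))

def analytic_grad_W(W, b, x, y):
    # Hand-derived gradient: d loss / d W[i][k] = -x[i] * (y[k] - p[k]).
    p = softmax(logits_of(W, b, x))
    return [[-x[i] * (y[k] - p[k]) for k in range(len(b))]
            for i in range(len(x))]

def numeric_grad_W(W, b, x, y, eps=1e-6):
    # Central finite differences, perturbing one weight at a time.
    grad = [[0.0] * len(b) for _ in x]
    for i in range(len(x)):
        for k in range(len(b)):
            old = W[i][k]
            W[i][k] = old + eps
            up = loss(W, b, x, y)
            W[i][k] = old - eps
            down = loss(W, b, x, y)
            W[i][k] = old  # restore the original weight
            grad[i][k] = (up - down) / (2 * eps)
    return grad

# Toy example: 3 inputs, 2 classes (hypothetical values).
x = [0.5, -1.0, 2.0]
y = [0.0, 1.0]
W = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
b = [0.05, -0.05]

ga = analytic_grad_W(W, b, x, y)
gn = numeric_grad_W(W, b, x, y)
max_diff = max(abs(ga[i][k] - gn[i][k])
               for i in range(len(x)) for k in range(len(b)))
print("largest difference between the two gradients:", max_diff)
```

The two gradients should agree up to finite-difference error. tf.gradients saves you from writing either function by hand.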
We have already built the computation graph, so all that remains is to run it in a session. Let's try it.
# Fit training using batch data
_, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})
We do not need the outputs of new_W and new_b, so I discard them.
The complete code is as follows:
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])  # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

# Gradients of cost with respect to W and b, followed by the manual update step
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))

# Output
# Epoch: 0001 cost= 1.183741399
# Epoch: 0002 cost= 0.665312284
# Epoch: 0003 cost= 0.552796521
# Epoch: 0004 cost= 0.498697014
# Epoch: 0005 cost= 0.465521633
# Epoch: 0006 cost= 0.442611256
# Epoch: 0007 cost= 0.425528946
# Epoch: 0008 cost= 0.412203073
# Epoch: 0009 cost= 0.401364554
# Epoch: 0010 cost= 0.392398663
# Optimization Finished!
# Accuracy: 0.874

Softmax regression using the gradient formula
The gradient with respect to the weights W is as follows:
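What follows is a sketch of the standard result for softmax regression with a cross-entropy cost, on the assumption that this is the formula intended. The symbols are chosen for this sketch only: $X$ is the batch of inputs with $N$ rows, $Y$ the one-hot labels, and $P = \mathrm{softmax}(XW + b)$ the predictions.

```latex
\[
\mathrm{cost} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k} Y_{nk} \log P_{nk}
\]
\[
\frac{\partial\, \mathrm{cost}}{\partial W} = -\frac{1}{N}\, X^{\top} (Y - P),
\qquad
\frac{\partial\, \mathrm{cost}}{\partial b} = -\frac{1}{N} \sum_{n=1}^{N} (Y_{n} - P_{n})
\]
```

The code in this section computes $-X^{\top}(Y - P)$ without the $1/N$ factor, which amounts to a step $N$ times larger at the same learning_rate.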
As shown above, the gradient equation can be implemented directly, without tf.gradients and without TensorFlow's built-in optimizer. The complete code is as follows:
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])  # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
# Note: b is not added here, so in this version b has no effect on pred or cost
pred = tf.nn.softmax(tf.matmul(x, W))  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

# Hand-written gradients, replacing tf.gradients
# Note: W_grad sums over the batch instead of averaging, so each step is
# batch_size times larger than the gradient of the mean cost used in the
# two earlier versions; the cost values below are not directly comparable
W_grad = - tf.matmul(tf.transpose(x), y - pred)
# Note: this averages x^T (y - pred) over the input dimension; the usual
# bias gradient for the mean cost is the batch mean of -(y - pred)
b_grad = - tf.reduce_mean(tf.matmul(tf.transpose(x), y - pred), reduction_indices=0)

new_W = W.assign(W - learning_rate * W_grad)
new_b = b.assign(b - learning_rate * b_grad)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))

# Output
# Extracting /tmp/data/train-images-idx3-ubyte.gz
# Extracting /tmp/data/train-labels-idx1-ubyte.gz
# Extracting /tmp/data/t10k-images-idx3-ubyte.gz
# Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
# Epoch: 0001 cost= 0.432943137
# Epoch: 0002 cost= 0.330031527
# Epoch: 0003 cost= 0.313661941
# Epoch: 0004 cost= 0.306443773
# Epoch: 0005 cost= 0.300219418
# Epoch: 0006 cost= 0.298976618
# Epoch: 0007 cost= 0.293222957
# Epoch: 0008 cost= 0.291407861
# Epoch: 0009 cost= 0.288372261
# Epoch: 0010 cost= 0.286749691
# Optimization Finished!
# Accuracy: 0.898

How does TensorFlow compute gradients?
You may be wondering how TensorFlow computes the gradient of a function.
TensorFlow uses a method called automatic differentiation; see the Wikipedia article on it for details.
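To give a feel for the idea, here is a minimal pure-Python sketch of reverse-mode automatic differentiation on scalars. It illustrates the general technique only and is not TensorFlow's implementation; the Var class and the backward and exp helpers are hypothetical names invented for this sketch.

```python
import math

class Var:
    """A scalar graph node that records its parents and local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d self / d parent)
        self.grad = 0.0

    def __add__(self, other):
        other = _wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = _wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

def _wrap(v):
    # Allow plain numbers to be mixed with Var objects.
    return v if isinstance(v, Var) else Var(v)

def exp(v):
    e = math.exp(v.value)
    return Var(e, ((v, e),))  # d exp(v) / d v = exp(v)

def backward(output):
    # Order the nodes so that every node comes after all of its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# f(x, w) = x*w + exp(x*w), evaluated at x = 2.0, w = -1.5
x = Var(2.0)
w = Var(-1.5)
f = x * w + exp(x * w)
backward(f)

# Closed-form derivatives for comparison:
# df/dx = w * (1 + exp(x*w)),  df/dw = x * (1 + exp(x*w))
expected_dx = w.value * (1.0 + math.exp(x.value * w.value))
expected_dw = x.value * (1.0 + math.exp(x.value * w.value))
print(x.grad, expected_dx)
print(w.grad, expected_dw)
```

Each operation stores only its local derivative. The backward pass then combines them with the chain rule, which is why no symbolic formula for the full gradient is ever written down.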
I hope this article is helpful to you.