Introduction
On a whim, I decided to learn machine learning. These are notes from my learning process.
Preparation
I made these preparations:
- A MacBook with Python environment set up, and numpy and matplotlib installed
- Registered for Udacity’s free “Deep Learning” course (in collaboration with Google)
- Studied Liao Xuefeng’s Python introductory tutorial
- Spent two days roughly browsing through “Machine Learning in Action”
Learning these fundamentals should be sufficient for the upcoming Udacity course.
Course One: From Machine Learning to Deep Learning
Sections 1-8 mainly introduce the current state of deep learning and related knowledge.
Sections 9-12 introduce the softmax model.
From my rough pass through “Machine Learning in Action,” machine learning appears, on the surface, to be a collection of classification and clustering algorithms. One of the algorithms it introduces is logistic regression classification.
In sections 9-12, the focus is on the classifier model—logistic regression, using the softmax function as the classification function.
- What is the softmax function?
The softmax function works as follows: for each number z_i in the original sequence we compute exp(z_i), and each value’s share of the total, softmax(z_i) = exp(z_i) / Σ_j exp(z_j), is the softmax probability for that number.
- Properties
If the inputs are all scaled up proportionally, the classifier’s outputs become more polarized and confident; if they are scaled down proportionally, the outputs tend toward uniform and the classifier is less confident (see the quick check after the code below).
- Algorithm
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    expList = [np.exp(i) for i in x]      # exponentiate every score
    expSum = sum(expList)                 # sum of the exponentials
    return np.array([i / expSum for i in expList])  # each score's share of the total
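A quick check of the scaling property described above (my own example, not from the course):

scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))        # ~[0.09, 0.24, 0.67]
print(softmax(scores * 10))   # ~[0.00, 0.00, 1.00]  -> more polarized and confident
print(softmax(scores / 10))   # ~[0.30, 0.33, 0.37]  -> close to uniform, less confident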
Sections 13-14 mainly discuss One-Hot encoding. The softmax function gives us a sequence of probability values, so how do we represent the actual classification? A sequence in which the position of the correct class is 1 and every other position is 0, for example [0, 0, 1, 0] for the third of four classes, is called One-Hot encoding. An encoding like this pins the classification down.
Sections 15-16 cover cross-entropy. Softmax produces a probability sequence and One-Hot encoding is a definite classification, so how do we measure the distance from a probability sequence to a particular classification? We use cross-entropy as that distance.
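As a small illustration of my own (not from the course), using the softmax function above: the cross-entropy from a softmax output S to a one-hot label L is D(S, L) = -Σ_i L_i · log(S_i).

def cross_entropy(probs, one_hot_label):
    # D(S, L) = -sum(L * log(S)); only the term for the true class survives
    return -np.sum(one_hot_label * np.log(probs))

probs = softmax(np.array([2.0, 1.0, 0.1]))  # roughly [0.66, 0.24, 0.10]
label = np.array([1, 0, 0])                 # one-hot label: the correct class is the first
print(cross_entropy(probs, label))          # about 0.42 -- small, so the prediction is close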
Sections 17-20 explain how to use this classifier. Section 18 specifically discusses why special initial data is needed.
sum = 1000000000
for i in range(1000000):
    sum += 0.000001
sum -= 1000000000
print(sum)
The result of running this code is not 1. If we start sum from a small number such as 1 instead of 1000000000, the error becomes much smaller. For this reason, we want the input data to have a mean of 0 and equal variance in every direction. For example, for a grayscale image with pixel values from 0 to 255, we subtract 128 and then divide by 128, so that every value falls between -1 and 1. Data prepared this way is better suited for training.
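A minimal sketch of that grayscale normalization (my own illustration; the assignment code later uses (pixel - 255/2) / 255, which centers the values the same way):

import numpy as np

pixels = np.array([0, 128, 255], dtype=np.float32)
normalized = (pixels - 128.0) / 128.0
print(normalized)  # [-1.0, 0.0, ~0.99] -- zero-centered values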
With that, we can start training. To recap the video: X is the matrix of training data and W is a random weight matrix; for numerical reasons the random values are drawn from a normal distribution with mean 0 and very small variance. We compute each example’s probability sequence and its cross-entropy distance to the target, then average that distance over all targets. Our goal is to make this average distance smaller, so we adjust the weight matrix W in the direction of gradient descent, optimizing the bias b along with it, and repeat until we reach a local optimum.
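A minimal numpy sketch of that loop, as I understand it (my own illustration, not the course’s code; X is the flattened training data and y the integer labels):

import numpy as np

def train_softmax_classifier(X, y, num_classes, learning_rate=0.5, steps=1000):
    n_samples, n_features = X.shape
    W = np.random.normal(0, 0.01, (n_features, num_classes))  # small random weights, mean 0
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]                                 # one-hot targets
    for _ in range(steps):
        logits = X.dot(W) + b
        logits -= logits.max(axis=1, keepdims=True)            # for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)              # softmax probabilities
        loss = -np.mean(np.sum(Y * np.log(probs), axis=1))     # average cross-entropy distance
        grad = (probs - Y) / n_samples                         # gradient of the loss w.r.t. the logits
        W -= learning_rate * X.T.dot(grad)                     # gradient descent step on W
        b -= learning_rate * grad.sum(axis=0)                  # and on the bias b
    return W, b, loss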
- Installing Docker
https://www.docker-cn.com/community-edition#/download
Configure the official Chinese mirror.
- Installing Jupyter Notebook
$ pip3 install jupyter
$ jupyter notebook
You can now use the jupyter notebook command to open a Jupyter editor.
- Setting up TensorFlow environment
$ docker run -it -p 8888:8888 tensorflow/tensorflow
Running this command will automatically download the TensorFlow image, provided that the repository mirror is set to a Chinese mirror; otherwise, the download will be very slow. After running the command, you’ll be prompted to open a webpage. When you open this URL, you’ll see the TensorFlow Jupyter editing environment, assuming Jupyter Notebook is installed correctly.
- Mounting Docker’s file directory
We need to import the official assignments. Close the container, then reopen it with
-v host_directory:container_directory
to mount a host directory:
$ docker run -v /Users/hahaha/tensorflow/:/notebooks -it -p 8888:8888 tensorflow/tensorflow
Where /Users/hahaha/tensorflow/ is a folder on my Mac, and notebooks is the default Jupyter editing directory in TensorFlow.
Paste the first assignment file, 1_notmnist.ipynb, into the mounted directory on the host. This file can be found here: 1_notmnist.ipynb
Assignment Code Segment One
First, run the import statements in the first code segment. There should be no errors. If you see red error output, it means these imports were not successful.
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
# print function
import matplotlib.pyplot as plt
# plotting tool
import numpy as np
# matrix calculations
import os
# file paths
import sys
# standard output (used to print download progress)
import tarfile
# decompression
from IPython.display import display, Image
# display images
from scipy import ndimage
# image processing
from sklearn.linear_model import LogisticRegression
# logistic regression module for linear models
from six.moves.urllib.request import urlretrieve
# url handling
from six.moves import cPickle as pickle
# data processing
# Config the matplotlib backend as plotting inline in IPython
%matplotlib inline
# matplotlib is the most popular Python plotting library; it can output charts
# in many image formats and display them interactively through various GUI toolkits.
# The %matplotlib magic command controls how matplotlib charts are displayed;
# "inline" embeds the charts directly in the Notebook.
Assignment Code Segment Two
Next is the second code segment, which will download letter sets for training and testing, approximately 300MB in size. After successful download, you can see these two files in the mounted directory.
url = 'https://commondatastorage.googleapis.com/books1000/'
last_percent_reported = None
data_root = '.' # Change me to store data elsewhere
def download_progress_hook(count, blockSize, totalSize):
"""A hook to report the progress of a download. This is mostly intended for users with
slow internet connections. Reports every 5% change in download progress.
"""
# Hook function to display download progress in real-time
global last_percent_reported
percent = int(count * blockSize * 100 / totalSize)
if last_percent_reported != percent:
if percent % 5 == 0:
sys.stdout.write("%s%%" % percent)
sys.stdout.flush()
else:
sys.stdout.write(".")
sys.stdout.flush()
last_percent_reported = percent
def maybe_download(filename, expected_bytes, force=False):
"""Download a file if not present, and make sure it's the right size."""
dest_filename = os.path.join(data_root, filename)
# data_root is the current directory, add the filename to it, set as the location to save the file
if force or not os.path.exists(dest_filename):
# force is to force download, ignoring already downloaded files
print('Attempting to download:', filename)
filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
# Use urlretrieve to download the file, with the hook attached
print('\nDownload Complete!')
statinfo = os.stat(dest_filename)
# Get information about the downloaded file
if statinfo.st_size == expected_bytes:
# Correct size
print('Found and verified', dest_filename)
else:
# Wrong size, prompt user to use a browser to download
raise Exception(
'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
return dest_filename
train_filename = maybe_download('notMNIST_large.tar.gz', 247336696)
test_filename = maybe_download('notMNIST_small.tar.gz', 8458043)
Assignment Code Segment Three
Extracting the downloaded archives
num_classes = 10
# Total number of classes (the letters A through J)
np.random.seed(133)
# Initialize random seed
def maybe_extract(filename, force=False):
# Extract the archive, unless it has already been extracted
root = os.path.splitext(os.path.splitext(filename)[0])[0] # remove .tar.gz
# splitext(filename)[0] removes one suffix, used twice to remove both suffixes, i.e., remove the .tar.gz suffix
if os.path.isdir(root) and not force:
# You may override by setting force=True.
# If already extracted, don't extract again
print('%s already present - Skipping extraction of %s.' % (root, filename))
else:
print('Extracting data for %s. This may take a while. Please wait.' % root)
tar = tarfile.open(filename)
sys.stdout.flush()
tar.extractall(data_root)
tar.close()
# Extract to the current directory
data_folders = [
os.path.join(root, d) for d in sorted(os.listdir(root))
if os.path.isdir(os.path.join(root, d))]
if len(data_folders) != num_classes:
raise Exception(
'Expected %d folders, one per class. Found %d instead.' % (
num_classes, len(data_folders)))
print(data_folders)
# Check if the number of extracted directories matches expectations, and print the extracted directories
return data_folders
train_folders = maybe_extract(train_filename)
test_folders = maybe_extract(test_filename)
Question One
Write code to display a sample of the extracted images
- Reference answer
import random
import matplotlib.image as mpimg
def plot_samples(data_folders, sample_size, title=None):
fig = plt.figure()
# Create empty figure
if title: fig.suptitle(title, fontsize=16, fontweight='bold')
# Add title
for folder in data_folders:
# Loop through each letter
image_files = os.listdir(folder)
image_sample = random.sample(image_files, sample_size)
# Randomly select a certain number of images from that letter
for image in image_sample:
image_file = os.path.join(folder, image)
ax = fig.add_subplot(len(data_folders), sample_size, sample_size * data_folders.index(folder) +
image_sample.index(image) + 1)
# Create a subplot
image = mpimg.imread(image_file)
# Load subplot image
ax.imshow(image)
# Display subplot image
ax.set_axis_off()
# Turn off subplot coordinate lines
fig.set_size_inches(18.5, 10.5)
# Set the display size of the image
plt.show()
plot_samples(train_folders, 20, 'Train')
plot_samples(test_folders, 20, 'Test')
Running results:
As we can see, some of the training data has issues.
## Assignment Code Segment Four
After this, we need to normalize the data: shift each pixel from the 0~255 range to a small zero-centered range (the code below maps it to roughly -0.5~0.5) and persist the result to a file.
image_size = 28  # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.
# Image width, height and pixel depth

def load_letter(folder, min_num_images):
  """Load the data for a single letter label."""
  # Process the files in one letter's folder
  image_files = os.listdir(folder)
  # List all files in that directory
  dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                       dtype=np.float32)
  # Create a dataset with length equal to the number of files, width and height of 28
  print(folder)
  # Print the directory
  num_images = 0
  # Initialize num_images
  for image in image_files:
    # Process each file
    image_file = os.path.join(folder, image)
    # Get the complete file path
    try:
      image_data = (ndimage.imread(image_file).astype(float) -
                    pixel_depth / 2) / pixel_depth
      # Read in the image and normalize it
      if image_data.shape != (image_size, image_size):
        # Check image width and height
        raise Exception('Unexpected image shape: %s' % str(image_data.shape))
      dataset[num_images, :, :] = image_data
      # Store it in the dataset
      num_images = num_images + 1
      # Increment the image count
    except IOError as e:
      # If the file can't be read, skip it
      print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

  dataset = dataset[0:num_images, :, :]
  if num_images < min_num_images:
    # If fewer files were read than the minimum required, raise an error
    raise Exception('Many fewer images than expected: %d < %d' %
                    (num_images, min_num_images))

  print('Full dataset tensor:', dataset.shape)
  # Display the file count, image width and height
  print('Mean:', np.mean(dataset))
  # Mean value
  print('Standard deviation:', np.std(dataset))
  # Standard deviation
  return dataset
def maybe_pickle(data_folders, min_num_images_per_class, force=False):
  dataset_names = []
  for folder in data_folders:
    # Process each letter folder
    set_filename = folder + '.pickle'
    # Set the output file
    dataset_names.append(set_filename)
    # Record the processed folders
    if os.path.exists(set_filename) and not force:
      # You may override by setting force=True.
      # Skip folders whose processed file already exists
      print('%s already present - Skipping pickling.' % set_filename)
    else:
      print('Pickling %s.' % set_filename)
      dataset = load_letter(folder, min_num_images_per_class)
      # Normalize all images in this folder
      try:
        with open(set_filename, 'wb') as f:
          pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
          # Persist the data: save it to disk instead of keeping it in memory
      except Exception as e:
        print('Unable to save data to', set_filename, ':', e)
  return dataset_names

train_datasets = maybe_pickle(train_folders, 45000)
test_datasets = maybe_pickle(test_folders, 1800)
## Question Two
Display processed images
- Reference answer
def plot_samples_2(data_folders, sample_size, title=None):
    fig = plt.figure()
    # Create an empty figure
    if title: fig.suptitle(title, fontsize=16, fontweight='bold')
    # Add title
    for folder in data_folders:
        # Loop through each letter's pickle file
        with open(folder, 'rb') as pk_f:
            data = pickle.load(pk_f)
        for index, image in enumerate(data):
            if index < sample_size:
                # Take the first sample_size images of that letter
                ax = fig.add_subplot(len(data_folders), sample_size,
                                     sample_size * data_folders.index(folder) + index + 1)
                # Create a subplot
                ax.imshow(image)
                # Display the image in the subplot
                ax.set_axis_off()
                # Turn off the subplot's coordinate axes
    fig.set_size_inches(18.5, 10.5)
    # Set the display size of the figure
    plt.show()

plot_samples_2(train_datasets, 20, 'Train')
plot_samples_2(test_datasets, 20, 'Test')


## Question Three
Check if the number of files under each letter is similar.
- Reference answer
file_path = 'notMNIST_large/{0}.pickle'
for ele in 'ABCDEFGHIJ':
    # Loop through each letter
    with open(file_path.format(ele), 'rb') as pk_f:
        dat = pickle.load(pk_f)
        # Load the persisted file for this letter
        print('number of pictures in {}.pickle = '.format(ele), dat.shape[0])
        # Print the image count
Results show that the numbers are basically consistent.

## Code Segment—Data Splitting
The data cannot all be loaded into memory at once. This code segment merges the per-letter pickles and splits them into training and validation sets of the desired sizes.
def make_arrays(nb_rows, img_size):
  if nb_rows:
    dataset = np.ndarray((nb_rows, img_size, img_size), dtype=np.float32)
    # Create an empty dataset: nb_rows images of img_size x img_size, as 32-bit floats
    labels = np.ndarray(nb_rows, dtype=np.int32)
    # Create the labels: nb_rows 32-bit integers
  else:
    dataset, labels = None, None
  # Return the created arrays
  return dataset, labels
def merge_datasets(pickle_files, train_size, valid_size=0):
  num_classes = len(pickle_files)
  # Number of classes to process
  valid_dataset, valid_labels = make_arrays(valid_size, image_size)
  # Build the validation set, with valid_size entries
  train_dataset, train_labels = make_arrays(train_size, image_size)
  # Build the training set, with train_size entries
  vsize_per_class = valid_size // num_classes
  tsize_per_class = train_size // num_classes
  # Each class's share of the validation and training sets

  start_v, start_t = 0, 0
  # Start indices: start_v for the validation data, start_t for the training data
  end_v, end_t = vsize_per_class, tsize_per_class
  # End indices: end_v for the validation data, end_t for the training data
  end_l = vsize_per_class + tsize_per_class
  # end_l is how much is taken from each letter set: the per-class validation
  # share plus the per-class training share
  for label, pickle_file in enumerate(pickle_files):
    # Loop through each pickle_file
    try:
      with open(pickle_file, 'rb') as f:
        # Open this persistence file
        letter_set = pickle.load(f)
        # Load the dataset
        # let's shuffle the letters to have random validation and training set
        np.random.shuffle(letter_set)
        # Shuffle the dataset
        if valid_dataset is not None:
          # Only fill the validation set if one was requested (valid_size > 0)
          valid_letter = letter_set[:vsize_per_class, :, :]
          # Take vsize_per_class images from the shuffled data
          # (numpy slicing: http://brieflyx.me/2015/python-module/numpy-array-split/)
          valid_dataset[start_v:end_v, :, :] = valid_letter
          # Put this data into valid_dataset
          valid_labels[start_v:end_v] = label
          # The label is one of 0~9
          start_v += vsize_per_class
          end_v += vsize_per_class
          # Update the indices; at the end of the loop valid_dataset holds
          # valid_size images and valid_labels the label at each position

        train_letter = letter_set[vsize_per_class:end_l, :, :]
        # The remaining shuffled images after the validation part,
        # of length end_l - vsize_per_class = tsize_per_class
        train_dataset[start_t:end_t, :, :] = train_letter
        # At the end of the loop train_dataset holds train_size images
        train_labels[start_t:end_t] = label
        start_t += tsize_per_class
        end_t += tsize_per_class
        # Update the indices
    except Exception as e:
      print('Unable to process data from', pickle_file, ':', e)
      raise

  return valid_dataset, valid_labels, train_dataset, train_labels
train_size = 200000
valid_size = 10000
test_size = 10000

valid_dataset, valid_labels, train_dataset, train_labels = merge_datasets(
  train_datasets, train_size, valid_size)
_, _, test_dataset, test_labels = merge_datasets(test_datasets, test_size)

print('Training:', train_dataset.shape, train_labels.shape)
print('Validation:', valid_dataset.shape, valid_labels.shape)
print('Testing:', test_dataset.shape, test_labels.shape)
## Code Segment—Shuffling Data
Introduction to the permutation function: http://www.jianshu.com/p/f0eb10acaa2d
def randomize(dataset, labels):
  # labels.shape[0] is the number of labels
  permutation = np.random.permutation(labels.shape[0])
  # A random permutation of that many indices
  print(labels.shape[0])
  shuffled_dataset = dataset[permutation, :, :]
  # Shuffle the data
  shuffled_labels = labels[permutation]
  # Shuffle the labels with the same permutation
  return shuffled_dataset, shuffled_labels

train_dataset, train_labels = randomize(train_dataset, train_labels)
test_dataset, test_labels = randomize(test_dataset, test_labels)
valid_dataset, valid_labels = randomize(valid_dataset, valid_labels)
## Question Four
Verify if the shuffled data is correct
- Reference answer
import random

def plot_sample_3(dataset, labels, title):
    fig = plt.figure()
    plt.suptitle(title, fontsize=16, fontweight='bold')
    # Set the title style
    items = random.sample(range(len(labels)), 200)
    # Randomly sample 200 indices from the label range
    for i, item in enumerate(items):
        # For each sampled index
        plt.subplot(10, 20, i + 1)
        # Draw a subplot
        plt.axis('off')
        # Turn off the coordinate axes
        plt.title(chr(ord('A') + labels[item]))
        # Title the subplot with the letter for this label
        plt.imshow(dataset[item])
        # Display the image at the corresponding position
    fig.set_size_inches(18.5, 10.5)
    plt.show()
    # Display the figure

plot_sample_3(train_dataset, train_labels, 'train dataset shuffled')
plot_sample_3(valid_dataset, valid_labels, 'valid dataset shuffled')
plot_sample_3(test_dataset, test_labels, 'test dataset shuffled')

The two similar figures are omitted.
## Code Segment—Saving Data
pickle_file = os.path.join(data_root, 'notMNIST.pickle')
# Output file path

try:
  f = open(pickle_file, 'wb')
  # Open the output file
  save = {
    'train_dataset': train_dataset, 'train_labels': train_labels,
    'valid_dataset': valid_dataset, 'valid_labels': valid_labels,
    'test_dataset': test_dataset, 'test_labels': test_labels,
    }
  # A dictionary mapping strings to ndarrays
  pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
  f.close()
except Exception as e:
  print('Unable to save data to', pickle_file, ':', e)
  raise
## Code Segment—Displaying Saved Data Size
statinfo = os.stat(pickle_file)
print('Compressed pickle size:', statinfo.st_size)
## Question Five
The question, as stated in the notebook:
By construction, this dataset may contain a lot of overlapping samples, including in the validation and test sets. Overlap between training and test can skew the results if you expect to use your model in an environment where there is never an overlap, but in practice this doesn't usually matter. Measure how much overlap there is between training, validation, and test samples.
Optional question:
What about the duplicates between datasets? (For instance, the same letter images)
Create a sanitized validation and test set, and compare your accuracy on those versus your accuracy on the original sets.
The basic idea is that training data should not overlap with test data; otherwise the measured accuracy is misleading.
Reference code:
- Just check the number of duplicate images
import hashlib
pickle_file = os.path.join('.', 'notMNIST.pickle')
try:
    with open(pickle_file, 'rb') as f:
        data = pickle.load(f)
except Exception as e:
    print('Unable to open data from', pickle_file, ':', e)
    raise
# Since the data was saved to disk, it can be reloaded here even after the kernel
# crashes, without rerunning the earlier code. If there's an error, search for
# the exception message online.

def calcOverlap(sourceSet, targetSet, description):
    sourceSetMd5 = np.array([hashlib.md5(img).hexdigest() for img in sourceSet])
    # Build an MD5 table for the source set
    targetSetMd5 = np.array([hashlib.md5(img).hexdigest() for img in targetSet])
    # Build an MD5 table for the target set
    overlap = np.intersect1d(sourceSetMd5, targetSetMd5, assume_unique=False)
    # Intersect the two tables to find the duplicated hashes
    print(description)
    print("overlap", overlap.shape[0], "from", sourceSetMd5.shape[0], "to", targetSetMd5.shape[0])
    print("rate", overlap.shape[0]*100.0/sourceSetMd5.shape[0], "% and",
          overlap.shape[0]*100.0/targetSetMd5.shape[0], "%")
    # Print the overlap count and rate

calcOverlap(data['train_dataset'], data['valid_dataset'], "train_dataset & valid_dataset")
calcOverlap(data['train_dataset'], data['test_dataset'], "train_dataset & test_dataset")
calcOverlap(data['test_dataset'], data['valid_dataset'], "test_dataset & valid_dataset")

- Remove duplicate image resources
To be updated
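Until then, one possible approach of my own (a sketch, not a verified solution) is to reuse the MD5 idea above: hash every image and drop the validation/test images whose hash also appears in the training set, then rerun the classifier on the sanitized sets to compare accuracy.

def sanitize(dataset, labels, reference_dataset):
    # Keep only the images whose MD5 hash does not appear in reference_dataset
    reference_hashes = set(hashlib.md5(img).hexdigest() for img in reference_dataset)
    keep = np.array([hashlib.md5(img).hexdigest() not in reference_hashes
                     for img in dataset])
    return dataset[keep], labels[keep]

# Hypothetical usage:
# clean_test_dataset, clean_test_labels = sanitize(
#     data['test_dataset'], data['test_labels'], data['train_dataset'])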
## Question Six
Use logistic regression to train the model and test it
- Reference code
import random

def disp_sample_dataset(dataset, labels, trueLabels, title=None):
    # Display a sample of the prediction results
    fig = plt.figure()
    if title: fig.suptitle(title, fontsize=16, fontweight='bold')
    # Set the title style
    items = random.sample(range(len(labels)), 200)
    # Randomly select a series of images
    for i, item in enumerate(items):
        plt.subplot(10, 20, i + 1)
        # Create a subplot
        plt.axis('off')
        # Turn off the coordinate lines
        lab = str(chr(ord('A') + labels[item]))
        trueLab = str(chr(ord('A') + trueLabels[item]))
        if lab == trueLab:
            plt.title(lab)
        else:
            plt.title(lab + " but " + trueLab)
        # Title: the predicted letter, plus the true letter when the prediction is wrong
        plt.imshow(dataset[item])
        # Display this image
    fig.set_size_inches(18.5, 10.5)
    plt.show()

def train_and_predict(train_dataset, train_labels, test_dataset, test_labels, sample_size):
    regr = LogisticRegression()
    # Create the classifier
    X_train = train_dataset[:sample_size].reshape(sample_size, 784)
    # Take sample_size training images and flatten each 28x28 image into a 784-dimensional vector
    y_train = train_labels[:sample_size]
    # The corresponding training labels
    regr.fit(X_train, y_train)
    # Train the model
    X_test = test_dataset.reshape(test_dataset.shape[0], 28 * 28)
    # Flatten the test data in the same way
    y_test = test_labels
    # The true labels of the test data
    pred_labels = regr.predict(X_test)
    # The predicted labels
    print('Accuracy:', regr.score(X_test, y_test), 'when sample_size=', sample_size)
    disp_sample_dataset(test_dataset, pred_labels, test_labels, 'sample_size=' + str(sample_size))

train_and_predict(data['train_dataset'], data['train_labels'],
                  data['test_dataset'], data['test_labels'], 1000)

## Model Performance
Sections 22~27 discuss how to judge model performance. We usually hope the model's accuracy can reach 100%, which is obviously impossible, and in chasing accuracy on the training data the model may overfit. To guard against this, we should follow two rules:
- Don't use all of the training data at once; split it into parts and train on a portion at a time.
- A change to the model's parameters counts as effective only when it causes 30 or more cases to flip from wrong to correct.

## Stochastic Gradient Descent
Sections 29~31 explain what stochastic gradient descent is.
During training, to move the model in the optimal direction we need to compute the gradient (derivative) of the loss at the current point.
1. Computing the gradient over all of the data is expensive, so we randomly select a subset of the samples and use their gradient as a stand-in for the true gradient; this is stochastic gradient descent.
2. To reduce the noise introduced by the random selection, we add momentum, which averages recent gradients.
3. To keep the model stable in the later stages of training, we gradually reduce the learning step size.
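A minimal sketch of those three ideas together (my own illustration; grad(W, batch_X, batch_y) stands in for whatever function computes the gradient on a mini-batch):

import numpy as np

def sgd_with_momentum(W, X, y, grad, learning_rate=0.1, decay=0.99,
                      momentum=0.9, batch_size=128, steps=1000):
    velocity = np.zeros_like(W)
    for step in range(steps):
        idx = np.random.choice(len(X), batch_size)   # 1. a random mini-batch
        g = grad(W, X[idx], y[idx])                  # its gradient estimates the true gradient
        velocity = momentum * velocity + g           # 2. momentum smooths out the randomness
        W = W - learning_rate * velocity
        learning_rate *= decay                       # 3. shrink the step size over time
    return W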
End of Course One
> Reference for assignment code
> http://www.hankcs.com/ml/notmnist.html