This page provides a short deep learning tutorial for beginners.
1 Introduction to Deep Learning
Deep learning is a branch of machine learning that uses artificial neural networks to approximate a desired function from the provided training data. It is particularly suitable for complicated tasks such as natural language processing (NLP), computer vision (CV), and autonomous driving, because algorithms for these applications are difficult to design by hand. Deep learning can learn such algorithms automatically from large amounts of real-world training data, which makes it a good fit for these difficult tasks.
2 A Typical Deep Learning Procedure
This section takes the MNIST handwritten digits classification problem as an example to describe a typical deep learning procedure. The Keras library is used for its simplicity. Note that this is a supervised classification task.
2.1 Load Dataset
Import the required Python libraries.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers
Load the training and test datasets.
x_train and y_train are the training data (images) and labels used to update the neural network parameters.
x_test and y_test are the test data and labels used to evaluate the neural network performance.

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Scale images to the [0, 1] range.

    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
Make sure images have shape (28, 28, 1).

    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)
    print("x_train shape:", x_train.shape)
    # shape[0] is the number of samples in each split
    print(x_train.shape[0], "train samples")
    print(x_test.shape[0], "test samples")
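The two preprocessing steps above can be tried without downloading MNIST. The sketch below uses a small random array as a stand-in for the images (the name `fake_images` and the batch of 4 are assumptions for illustration only) and shows the effect of scaling and of adding the channel axis.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 random 28x28 images with uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale to the [0, 1] range as float32, then append a trailing channel axis.
scaled = fake_images.astype("float32") / 255
with_channel = np.expand_dims(scaled, -1)

print(with_channel.shape)  # (4, 28, 28, 1)
print(with_channel.dtype)  # float32
```

The trailing axis of size 1 is the single gray-scale channel that the convolutional layers expect.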
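One-hot encoding, i.e., converting labels to binary class vectors.

    # Ten digit classes (0-9); defined here because it is needed before the model is built.
    num_classes = 10
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)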
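To make the idea of one-hot encoding concrete, here is a minimal NumPy sketch. The helper `one_hot` is a hypothetical name written for this illustration; it is not the Keras implementation, only the same idea.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example[0])  # the only 1 is at index 3
```

Each row can be read as a target probability distribution over the ten digits, which is the form the categorical cross-entropy loss works with.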
2.2 Build the Model
Before training can begin, a neural network must be built. Various types of neural networks are available, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers. These architectures exhibit different inductive biases, i.e., prior assumptions about the data. For example, CNNs incorporate the assumption of translation invariance and are therefore primarily used to process images and other spatial data, whereas RNNs have an inductive bias toward capturing temporal dependencies in their inputs and are thus widely used to process sequential data.
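The translation assumption built into convolutions can be illustrated in one dimension. The sketch below is only an illustration with an arbitrary filter and toy signal (both are assumptions, not part of the MNIST example): shifting the input shifts the filter response by the same amount, and taking a maximum over positions then gives the same value for both inputs.

```python
import numpy as np

signal = np.array([0, 0, 1, 2, 3, 0, 0, 0], dtype=float)
shifted = np.roll(signal, 2)          # same pattern, moved two steps to the right
kernel = np.array([1.0, 0.0, -1.0])   # arbitrary illustrative filter

response = np.convolve(signal, kernel, mode="full")
response_shifted = np.convolve(shifted, kernel, mode="full")

# The filter response moves together with the input pattern ...
print(np.allclose(np.roll(response, 2), response_shifted))  # True
# ... so a maximum taken over all positions is unchanged by the shift.
print(response.max() == response_shifted.max())              # True
```

Strictly speaking, the convolution itself is shift-equivariant; it is the pooling over positions that makes the final value insensitive to where the pattern appears.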
The MNIST handwritten digits are gray-scale images, so a simple CNN is built to process them.

    num_classes = 10
    input_shape = (28, 28, 1)

    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
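It can help to trace by hand how the image size and the number of trainable parameters evolve through this model. The sketch below assumes the Keras defaults for the layers above (stride-1 convolutions without padding, non-overlapping 2x2 pooling); the helper names are hypothetical and chosen for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping pooling (floor division)."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

size = 28
size = pool_out(conv_out(size, 3), 2)   # 28 -> 26 -> 13
size = pool_out(conv_out(size, 3), 2)   # 13 -> 11 -> 5
flat = size * size * 64                 # features fed to the Dense layer

dense_params = flat * 10 + 10           # weights plus biases for 10 classes
total = conv_params(3, 1, 32) + conv_params(3, 32, 64) + dense_params
print(size, flat, total)  # 5 1600 34826
```

The total can be compared against the output of `model.summary()`; pooling, flatten, and dropout layers add no trainable parameters.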
Next we train the model. Keras is a highly encapsulated library that lets us train the model with a few lines of code. The internal initialization, forward/backward propagation, and parameter-update procedures are all encapsulated in the calls below.

    batch_size = 128
    epochs = 15

    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
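To give a feel for what is hidden inside the training call, here is a minimal NumPy sketch of the same loop structure: initialization, forward pass, backward pass, and parameter update over mini-batches. It makes several simplifying assumptions that differ from the Keras example: a single softmax layer instead of a CNN, plain stochastic gradient descent instead of Adam, and synthetic 2-D toy data instead of MNIST. All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data (not MNIST): 3 well-separated classes, 2 features.
num_classes, num_features, num_samples = 3, 2, 300
centers = np.array([[2.0, 0.0], [-2.0, 2.0], [-2.0, -2.0]])
labels = rng.integers(0, num_classes, size=num_samples)
x = centers[labels] + 0.5 * rng.standard_normal((num_samples, num_features))
y = np.eye(num_classes)[labels]                    # one-hot targets

def softmax_forward(inputs, w, b):
    """Forward pass of a single softmax layer."""
    logits = inputs @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

# Initialization.
w = np.zeros((num_features, num_classes))
b = np.zeros(num_classes)
learning_rate, batch_size, epochs = 0.5, 32, 20

losses = []
for epoch in range(epochs):
    order = rng.permutation(num_samples)           # reshuffle every epoch
    for start in range(0, num_samples, batch_size):
        idx = order[start:start + batch_size]
        probs = softmax_forward(x[idx], w, b)      # forward pass
        grad_logits = (probs - y[idx]) / len(idx)  # gradient of mean cross-entropy
        w -= learning_rate * x[idx].T @ grad_logits    # parameter update (plain SGD)
        b -= learning_rate * grad_logits.sum(axis=0)
    # Track the categorical cross-entropy on the full toy set after each epoch.
    probs = softmax_forward(x, w, b)
    losses.append(-np.mean(np.sum(y * np.log(probs + 1e-12), axis=1)))

accuracy = np.mean(probs.argmax(axis=1) == labels)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy: {accuracy:.2f}")
```

In the Keras example, `validation_split=0.1` holds out 6,000 of the 60,000 training images, so each epoch iterates over 54,000 images in 422 batches of up to 128.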
Once training is complete, we can evaluate the model on the test set. A commonly used evaluation metric for classification problems is the overall accuracy.

    # evaluate() returns [loss, accuracy] because metrics=["accuracy"] was passed to compile()
    score = model.evaluate(x_test, y_test, verbose=0)
    print("Test loss:", score[0])
    print("Test accuracy:", score[1])
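Overall accuracy is simply the fraction of samples whose most probable predicted class matches the true class. A minimal sketch with made-up probabilities and labels (the function name `overall_accuracy` and the numbers are assumptions for illustration):

```python
import numpy as np

def overall_accuracy(probabilities, one_hot_labels):
    """Fraction of rows where the most probable class equals the true class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Four samples, three classes; the third sample is misclassified.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [1, 0, 0]])
print(overall_accuracy(probs, labels))  # 0.75
```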
3 Development Tools
This section introduces useful development tools and frameworks for AI research. Python and MATLAB are recommended for beginners.
3.1 Python
Python is currently the most popular programming language among AI researchers.
3.1.1 TensorFlow and PyTorch
Two of the most popular deep learning frameworks, TensorFlow and PyTorch, were developed by Google and Facebook (now Meta), respectively. The best way to learn them is to follow the official tutorials listed below:
- TensorFlow: Tutorials (TensorFlow Core)
- PyTorch: Welcome to PyTorch Tutorials (official PyTorch Tutorials documentation)
Note that TensorFlow has integrated Keras as its high-level API (the two existed separately in the past). Keras offers a high level of encapsulation, making it especially suitable for beginners with limited knowledge of deep learning.
3.1.2 NumPy
NumPy is a popular Python package for scientific and numerical computing. It is often used to preprocess data before a neural network is built and trained with TensorFlow or PyTorch. Beginners with MATLAB experience should become familiar with NumPy quickly.
3.1.3 Anaconda
Anaconda simplifies package management in Python. In short, packages such as TensorFlow, PyTorch, and NumPy can be installed through the Anaconda platform. Anaconda can create isolated environments and checks the versions of the packages being installed to avoid conflicts between library versions.
- Download Anaconda: Free Download (Anaconda)
3.1.4 IDE (Integrated Development Environment)
PyCharm and VS Code are highly recommended IDEs for Python. Users familiar with MATLAB can also try Spyder.
3.2 MATLAB
The MATLAB Deep Learning Toolbox provides a framework for designing and implementing neural networks. The author’s personal impression is that MATLAB’s deep learning APIs resemble Keras and are beginner friendly as well. MATLAB provides detailed official documentation and examples that help users become familiar with deep learning quickly.
As MATLAB requires a paid license, public resources for MATLAB are limited compared with those for Python-based deep learning frameworks. Researchers who rely heavily on other MATLAB toolboxes, such as wireless researchers, may prefer the MATLAB Deep Learning Toolbox.
4 Learning Resources
- Course: CS231n: Convolutional Neural Networks for Visual Recognition (Stanford University)
- Book: Dive into Deep Learning (d2l.ai)