Machine Learning
Machine learning is the process by which a machine learns to identify features from an existing dataset so that it can accurately identify those features in new data.
This definition isn’t strictly formal, and it doesn’t need to be. The term machine simply refers to a computer or to program code. The dataset used for learning is called the training set. If we compare machine learning to a student studying for exams, the analogy would be:
- Machine: Student
- Training Set: Past exam papers
- Features: Exam patterns
- New Data: Actual exam questions
- Learning: The process of solving past papers to prepare for actual exam questions
As of 2021, the most commonly used approach to machine learning is deep learning, which increases the number of hidden layers in artificial neural networks. Artificial neural networks have improved dramatically in recent years and often deliver the best results. Before deep learning achieved satisfactory performance, mainstream machine learning consisted of models grounded in statistical theory.
To become proficient in machine learning, one needs knowledge of mathematics, statistics, and programming: mathematical and statistical knowledge to understand the theory, and programming skills to implement it. In-depth study of machine learning theory requires not only matrices and linear algebra but also measure theory, functional analysis, and more. Recently, research combining artificial intelligence with geometry, graph theory, partial differential equations, and other fields has also been underway.
The following articles are written to be as accessible as possible for mathematics majors.
Basics
Learning Concepts
- Supervised and Unsupervised Learning
- Training/Validation/Test Sets
- Online Learning, Batch Learning, Mini-batch Learning
Optimization
- Loss Functions
- Gradient Descent, Stochastic Gradient Descent (SGD)
- 🔒 (07/24/17) Graduate Student Descent
- 🔒 (07/13/24) Grid Search, Brute Force
- What is the Monte Carlo Method?
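As a taste of the optimization posts above, here is a minimal gradient descent sketch (the objective, learning rate, and iteration count are illustrative, not taken from any specific post):

```python
# Minimal gradient descent sketch: minimize f(w) = (w - 3)^2.
# The minimizer is w = 3; each step moves w against the gradient.

def grad(w):
    # derivative of (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0     # starting point (arbitrary)
lr = 0.1    # learning rate, chosen for illustration

for _ in range(100):
    w -= lr * grad(w)  # gradient descent update

print(round(w, 4))  # converges toward the minimizer, 3.0
```

Stochastic gradient descent differs only in that `grad` is evaluated on a random subset (mini-batch) of the data at each step.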
Classical Machine Learning
- Positive Definite Kernels and Reproducing Kernel Hilbert Space (k, H_k)
- Proof of the Representer Theorem
Linear Regression Models
- Linear Regression Models
- Gradient Descent
- Least Squares Method
- Maximum Likelihood
- Support Vector Machines
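The least squares method in the list above can be sketched in a few lines of NumPy via the normal equations (the synthetic data here is illustrative):

```python
import numpy as np

# Least squares sketch: fit y = w0 + w1*x on noiseless data
# generated with true coefficients w0 = 1, w1 = 2.
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x

X = np.column_stack([np.ones_like(x), x])  # design matrix with a bias column
w = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations: w = (X^T X)^{-1} X^T y

print(w)  # recovers approximately [1., 2.]
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming `X.T @ X` explicitly, for numerical stability.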
Linear Classification Models
- Linear Classification Models
- Least Squares Method
- Fisher’s Linear Discriminant
- Neyman-Pearson Criterion for Binary Classification
- Bayes Risk Classifier
Clustering
Sampling
- Monte Carlo Integration
- Rejection Sampling
- Importance Sampling
- Markov Chain Monte Carlo (MCMC)
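The simplest of the sampling topics above, Monte Carlo integration, fits in a few lines (the integrand and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo integration sketch: estimate the integral of f(x) = x^2
# over [0, 1]. For U ~ Uniform(0, 1), E[f(U)] equals the integral, 1/3.
n = 100_000
samples = rng.uniform(0.0, 1.0, size=n)
estimate = np.mean(samples ** 2)

print(estimate)  # close to 1/3, with Monte Carlo error on the order of 1/sqrt(n)
```

Rejection sampling, importance sampling, and MCMC address the harder problem of drawing the samples themselves when the target distribution is not easy to sample directly.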
Reinforcement Learning
- 🔒Basic Mathematics for Reinforcement Learning
- 🔒What is Reinforcement Learning
- 🔒Multi-Armed Bandit Problem
- 🔒Markov (Reward) Process
- 🔒Markov Decision Process
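As a small taste of the multi-armed bandit problem listed above, here is an epsilon-greedy sketch (the arm probabilities and hyperparameters are illustrative, not from any specific post):

```python
import random

random.seed(0)

# Epsilon-greedy sketch for a 3-armed Bernoulli bandit.
probs = [0.2, 0.5, 0.8]   # true success probability of each arm (illustrative)
counts = [0, 0, 0]        # number of pulls per arm
values = [0.0, 0.0, 0.0]  # running mean reward per arm
eps = 0.1                 # exploration rate

for t in range(5000):
    if random.random() < eps:
        a = random.randrange(3)            # explore: random arm
    else:
        a = values.index(max(values))      # exploit: current best estimate
    r = 1.0 if random.random() < probs[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update

best = values.index(max(values))
print(best)  # with high probability, the truly best arm (index 2)
```

The exploration/exploitation trade-off captured by `eps` is the core idea that the Markov decision process machinery later generalizes.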
Deep Learning
Cheat Sheet: Equivalent Code in Flux, PyTorch, TensorFlow
Theory
- Weights
- Layers
- Linear Layers
- Convolutional Layers
- Skip Connections
- Activation Functions
- What is an Artificial Neural Network (ANN)?
- Definition of Perceptron
- What is Deep Learning?
- Mathematical Foundation of Deep Learning, Proof of the Universal Approximation Theorem
- Continual Learning in Deep Learning
- What is Computer Vision?
- Boltzmann Machine
- Restricted Boltzmann Machine
- Batch Learning Algorithm
- Online Learning Algorithm
- RBM for Classification
- Radial Basis Function
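The perceptron from the theory list above is simple enough to sketch directly (learning the AND function here is a standard illustration; the learning rate and epoch count are arbitrary):

```python
import numpy as np

# Perceptron sketch: learn the AND function with the perceptron learning rule.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)  # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        # update only when the prediction is wrong
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

preds = [1.0 if xi @ w + b > 0 else 0.0 for xi in X]
print(preds)  # [0.0, 0.0, 0.0, 1.0] once converged
```

Since AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates with a separating hyperplane.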
Regularization Techniques
- Overfitting and Regularization
- Dropout
- Paper Review: Do We Need Zero Training Loss After Achieving Zero Training Error?
Various Neural Networks
- MLP (Multilayer Perceptron)
- CNN (Convolutional Neural Network)
- PINN (Physics-Informed Neural Networks) Paper Review
- U-net Paper Review
- Implementation in Julia
PyTorch
General
- Checking the Device a Model/Tensor is On: `.get_device()`
- Random Sampling from a Given Distribution: `torch.distributions.Distribution().sample()`
- Creating and Using Custom Datasets with Numpy Arrays: `TensorDataset`, `DataLoader`
- Saving and Loading Weights, Models, Optimizers: `torch.save(model.state_dict())`
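The calls listed above can be combined into one short CPU-only sketch (the file and variable names are illustrative):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Sampling from a distribution object
dist = torch.distributions.Normal(0.0, 1.0)
x = dist.sample((4,))  # 4 draws from N(0, 1)

# Wrapping numpy arrays in a Dataset and iterating with a DataLoader
features = np.arange(8, dtype=np.float32).reshape(4, 2)
labels = np.array([0, 1, 0, 1])
ds = TensorDataset(torch.from_numpy(features), torch.from_numpy(labels))
loader = DataLoader(ds, batch_size=2, shuffle=False)

# Checking which device a tensor lives on
t = torch.zeros(3)
print(t.get_device())  # -1 means CPU; GPU tensors return the device index

# Saving and reloading a model's weights ("weights.pt" is a hypothetical path)
model = torch.nn.Linear(2, 1)
torch.save(model.state_dict(), "weights.pt")
model.load_state_dict(torch.load("weights.pt"))
```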
Neural Networks
- Implementing a Multi-Layer Perceptron
- Defining Neural Networks with Lists and Loops: `nn.ModuleList`
- Accessing Model Weights: `.weight.data`, `.bias.data`
- Weight Initialization: `torch.nn.init`
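A minimal sketch tying these together: an MLP built from a list of layer sizes with `nn.ModuleList`, with explicit weight initialization and weight access (the sizes are illustrative):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, sizes):
        super().__init__()
        # One Linear layer per consecutive pair of sizes
        self.layers = nn.ModuleList(
            nn.Linear(m, n) for m, n in zip(sizes[:-1], sizes[1:])
        )
        for layer in self.layers:
            nn.init.xavier_uniform_(layer.weight)  # weight initialization
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)  # no activation on the output layer

model = MLP([4, 16, 16, 2])              # illustrative layer widths
print(model.layers[0].weight.data.shape)  # torch.Size([16, 4])
out = model(torch.randn(8, 4))
print(out.shape)                          # torch.Size([8, 2])
```

Using `nn.ModuleList` rather than a plain Python list matters: it registers the layers so their parameters appear in `model.parameters()`.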
Tensors
- Modular Arithmetic: `fmod`, `remainder`
- Handling Dimensions and Sizes: `.dim()`, `.ndim`, `.view()`, `.reshape()`, `.shape`, `.size()`
- Creating Random Permutations and Shuffling Tensors: `torch.randperm`, `tensor[indices]`
- Deep Copying: `.clone()`
- Concatenating or Stacking: `torch.cat()`, `torch.stack()`
- Padding: `torch.nn.functional.pad()`
- Sorting: `torch.sort()`, `torch.argsort()`
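A quick tour of the tensor operations listed above (the values are illustrative):

```python
import torch

a = torch.tensor([7.0, -7.0])
print(torch.fmod(a, 3))       # tensor([ 1., -1.])  sign follows the dividend
print(torch.remainder(a, 3))  # tensor([1., 2.])    sign follows the divisor

t = torch.arange(6)
print(t.view(2, 3).shape)     # torch.Size([2, 3])

idx = torch.randperm(6)       # random permutation of 0..5
shuffled = t[idx]             # shuffle a tensor by fancy indexing

c = t.clone()                 # deep copy: edits to c leave t untouched

stacked = torch.stack([t, t])  # new dimension: shape (2, 6)
joined = torch.cat([t, t])     # existing dimension: shape (12,)

padded = torch.nn.functional.pad(t.float(), (1, 2))  # 1 zero left, 2 right
print(padded.shape)           # torch.Size([9])

vals, order = torch.sort(torch.tensor([3, 1, 2]))
print(vals)                   # tensor([1, 2, 3])
```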
Errors
- Solving ‘RuntimeError: grad can be implicitly created only for scalar outputs’
- Fixing ‘TypeError: can’t convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first’ for Lists
- Addressing ‘RuntimeError: Boolean value of Tensor with more than one value is ambiguous’
- Resolving ‘RuntimeError: Parent directory does not exist’ When Saving Models
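The fixes behind these four errors share a common shape; a hedged one-file sketch (directory and variable names are illustrative):

```python
import os
import torch

# 'grad can be implicitly created only for scalar outputs':
# backward() needs a scalar, so reduce a vector loss first.
x = torch.randn(5, requires_grad=True)
loss = x ** 2            # calling loss.backward() here would raise
loss.sum().backward()    # reduce to a scalar before backward()

# "can't convert cuda:0 device type tensor to numpy":
# move the tensor to host memory before calling .numpy().
t = torch.ones(3)        # imagine this tensor lives on a GPU
arr = t.cpu().numpy()

# 'Boolean value of Tensor with more than one value is ambiguous':
# use .any() / .all() instead of truth-testing a multi-element tensor.
mask = torch.tensor([True, False])
assert mask.any()

# 'Parent directory does not exist' when saving:
# create the directory first ("checkpoints" is a hypothetical name).
os.makedirs("checkpoints", exist_ok=True)
torch.save(torch.nn.Linear(2, 2).state_dict(), "checkpoints/model.pt")
```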
Julia
- Deep Learning Frameworks in Julia: `Flux.jl`, `Knet.jl`, `Lux.jl`
- Using Machine Learning Datasets with `MLDatasets.jl`
Flux
- Handling Hidden Layers
- Implementing MLP and Optimizing with Gradient Descent
- One-Hot Encoding with `onehot()`, `onehotbatch()`, `onecold()`
- Implementing MLP and Approximating Non-linear Functions
- Implementing MLP and Training with MNIST
- 🔒(08/12/24) Using GPUs
- Setting Training and Testing Modes: `trainmode!`, `testmode!`
- Getting Auto Differentiation of Neural Networks
References
- Christopher M. Bishop, Pattern Recognition and Machine Learning (2006)
- Simon Haykin, Neural Networks and Learning Machines (3rd Edition, 2009)
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edition, 2017)
- Il-Seok Oh (오일석), 기계 학습 (Machine Learning) (2017)
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (2nd Edition, 2018)
All posts
- How to Use GPU in Julia Flux
- Using AdaBelief Optimizer in PyTorch
- What is Skip Connection in Artificial Neural Networks?
- What are Weights in Machine Learning?
- Difference Between torch.nn and torch.nn.functional in PyTorch
- How to Set Training and Testing Modes for Neural Networks in Julia Flux
- Confusion Matrix, Sensitivity, and Specificity
- Cross-validation
- Drawing ROC Curves in R
- Finding the optimal cutoff using ROC curves
- Comparing Models Using the AUC of ROC Curves
- What is an Artificial Neural Network?
- Loss Functions in Machine Learning
- Gradient Descent and Stochastic Gradient Descent in Machine Learning
- What is Deep Learning?
- Activation Functions in Deep Learning
- Softmax Function in Deep Learning
- Dropout in Deep Learning
- Supervised and Unsupervised Learning
- k-Means Clustering
- What is Overfitting and Regularization in Machine Learning?
- Commonly Used Datasets in Machine Learning
- Paper Review: Do We Need Zero Training Loss After Achieving Zero Training Error?
- Continual Learning in Deep Learning
- What is Computer Vision
- Perceptron Definition
- What is a Sigmoid Function?
- What is a Logistic Function?
- Linear Models for Regression in Machine Learning
- What is a Discriminant Function?
- Mathematical Foundations of Deep Learning, Proof of the Universal Approximation Theorem
- What is Reinforcement Learning in Machine Learning
- Perceptron Convergence Theorem
- Back Propagation Algorithm
- PyTorch RuntimeError: "grad can be implicitly created only for scalar outputs" Solution
- How to Implement MLP in PyTorch
- Initializing Weights in PyTorch
- Creating and Using Custom Datasets from Numpy Arrays in PyTorch
- Saving and Loading Weights, Models, and Optimizers in PyTorch
- Creating Random Permutations and Shuffling Tensor Order in PyTorch
- How to Define Artificial Neural Network Layers with Lists and Loops in PyTorch
- How to Deep Copy Tensors in PyTorch
- How to Obtain the Weight Values of a Model in PyTorch
- How to Concatenate or Stack Tensors in PyTorch
- Using Machine Learning Datasets in Julia
- How to Pad PyTorch Tensors
- Handling the Dimensions and Sizes of PyTorch Tensors
- Handling Hidden Layers in Julia Flux
- Implementing MLP in Julia Flux and Optimizing with Gradient Descent
- How to Perform One-Hot Encoding in Julia Flux
- Implementing MLP in Julia Flux and Learning with MNIST
- Resolving 'TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.' with Lists in PyTorch
- Implementing MLP in Julia Flux to Approximate Nonlinear Functions
- Hierarchical Clustering
- Gradient Descent Learning of Linear Regression Models in Machine Learning
- Paper Review: Physics-Informed Neural Networks
- What is ReLU in Machine Learning?
- Training/Validation/Test Sets in Machine Learning
- What is a Softplus Function?
- How to Check the Device on which the PyTorch Model/Tensor is loaded
- Support Vector Machine
- Definite Kernel and Reproducing Kernel Hilbert Space in Machine Learning
- What is One-Hot Encoding in Machine Learning?
- Proof of the Representer Theorem
- Various Deep Learning Frameworks of Julia
- MNIST Database
- Iris Dataset
- What is a Layer in Deep Learning?
- Sampling Randomly from a Given Distribution in PyTorch
- Solving 'RuntimeError: Boolean value of Tensor with more than one value is ambiguous' Error in PyTorch
- Functions for Tensor Sorting in PyTorch
- Flux-PyTorch-TensorFlow Cheat Sheet
- Solutions to 'RuntimeError: Parent directory does not exist' Error When Saving Models in PyTorch
- Monte Carlo Integration
- Importance Sampling
- Rejection Sampling
- Monte Carlo Method
- What is Data Augmentation?
- Online Learning vs. Batch Learning in Machine Learning
- Modular Arithmetic in PyTorch
- Momentum Method in Gradient Descent
- Adaptive Learning Rates: AdaGrad, RMSProp, Adam
- How to Define and Train MLP with the Sequence Model and Functional API in TensorFlow and Keras