L1&L2 Regularization

Machine Learning

Machine Learning Series

Learning Curves for Machine Learning

What To Optimize for? Loss Function Cheat Sheet

A Casual Discussion of Clustering (3): Gaussian Mixture Model

The area under the ROC curve

Handling imbalanced datasets in machine learning

Understanding the Bias-Variance Tradeoff

Estimators, Loss Functions, Optimizers —Core of ML Algorithms

Deep learning – Information theory & Maximum likelihood



Logistic Regression

Understanding binary cross-entropy / log loss: a visual explanation



Probabilistic Programming and Bayesian Methods for Hackers

A Principled Bayesian Workflow

Hierarchical Bayesian Neural Networks with Informative Priors

Dirichlet process mixtures for density estimation


Dimension Reduction

How to cross-validate PCA, clustering, and matrix decomposition models

Understanding Dimension Reduction



Deep Learning


How the backpropagation algorithm works

Implementing a Neural Network from Scratch in Python

CS231n Notes

The Matrix Calculus You Need For Deep Learning

The mostly complete chart of Neural Networks, explained

The Curse of Dimensionality and the Autoencoder

Neural Networks Backward Propagation


RNNs in Tensorflow

Understanding LSTM and its diagrams

Vanilla LSTM with numpy

The Unreasonable Effectiveness of Recurrent Neural Networks

Recurrent Neural Networks Tutorial

Understanding LSTM Networks

Attention and Augmented Recurrent Neural Networks

A Deep Dive into Recurrent Neural Nets

When recurrent models don’t need to be recurrent

Deriving LSTM Gradient for Backpropagation

Building RNN(LSTM cell) from scratch

Illustrated Guide to Recurrent Neural Networks: Understanding the Intuition

Illustrated Guide to LSTM’s and GRU’s: A step by step explanation

Non-Zero Initial States for Recurrent Neural Networks

A review of Dropout as applied to RNNs



Preprocessing for deep learning



ResNet, AlexNet, VGG, Inception: Understanding Various CNN Architectures




Optimization Algorithms for Cost Functions

Numerical Optimization: Understanding L-BFGS

Why Momentum Really Works

An Interactive Tutorial on Numerical Optimization


Statistical machine learning and convex optimization

Gradient Descent & AdaGrad

Series: Optimization

Distributed Deep Learning



Learning to Rank

Learning to Rank Sketchfab Models with LightFM

Intro to WARP Loss, automatic differentiation and PyTorch


Transfer Learning

Transfer Learning – Machine Learning’s Next Frontier


Information Retrieval

Fast Near-Duplicate Image Search using Locality Sensitive Hashing

ElastiK Nearest Neighbors

Building a Semantic Search Engine



Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

Document Deduplication with Locality Sensitive Hashing

How To Create Data Products That Are Magical Using Sequence-to-Sequence Models

How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning

Product Quantizers for k-NN Tutorial

Nearest neighbor methods and vector models – part 1




Yoshua Bengio

Mark Chang


Sargur N. Srihari