Chuizixiaoxing

Per aspera ad astra.

Statistics

System rebuilding...

.
.
.

Machine Learning


Principal Component Analysis (PCA)

Kernel Principal Component Analysis (Kernel PCA)

K-means

Gaussian Mixture Model


Tree-Based Methods

Decision Tree

Gradient Boosting Decision Tree

XGBoost

LightGBM


Incoming Topics...

Logistic Regression

K-Nearest Neighbors (KNN)

Support Vector Machine (SVM)

.
.
.

Introduction to Deep Learning


Description: A simple graphical walkthrough of how the MLP and RNN (LSTM, GRU) work, including the mathematical derivation of the feedforward pass, backpropagation, and backpropagation through time.

MLP (Multilayer Perceptron)

RNN (Recurrent Neural Network)

LSTM (Long Short-Term Memory) & GRU (Gated Recurrent Unit)

Attention

Transformer

.
.
.

Statistical Foundation of Data Science (A Guide)


Set 1 Probability Theory and Mathematical Statistics (I)

  • Discrete Random Variables
  • Continuous Random Variables
  • Functions of a Random Variable
  • Joint Distributions
  • Independent Random Variables
  • Conditional Distributions
  • Functions of Jointly Distributed Random Variables
  • Extrema and Order Statistics
  • Expected Values
  • Limit Theorems

Set 2 Probability Theory and Mathematical Statistics (II)

  • Examples & Reasons for Fitting Distribution
  • Parameter Estimation
  • The Method of Moments
  • The Method of Maximum Likelihood
  • Maximum Likelihood Estimates of Multinomial Cell Probabilities
  • Large Sample Theory for Maximum Likelihood Estimates
  • Confidence Intervals from Maximum Likelihood Estimates
  • Efficiency and the Cramér-Rao Lower Bound
  • Ancillary Statistics
  • Sufficient Statistics
  • A Factorization Theorem
  • The Rao-Blackwell Theorem
  • The Delta Method

Set 3 The Bayesian Approach to Parameter Estimation

  • Example of Bayesian Inference
  • Bayesian Point Estimation and Interval Estimation
  • Large Sample Normal Approximation to the Posterior
  • Computational Aspects: Gibbs Sampling, Markov Chain Monte Carlo
  • Bayesian Testing Procedures

Set 4 Testing Hypotheses and Assessing Goodness of Fit

  • Basics of Hypothesis Testing
  • The Neyman-Pearson Paradigm
  • Specification of the Significance Level and the Concept of a p-value
  • Uniformly Most Powerful Tests
  • The Duality of Confidence Intervals and Hypothesis Tests
  • Generalized Likelihood Ratio Tests
  • Likelihood Ratio Tests for the Multinomial Distribution
  • The Poisson Dispersion Test
  • Probability Plots
  • Tests for Normality

Set 5 Nonparametric Statistics

  • Nonparametric hypothesis testing
    Permutation testing; rank-based tests: Mann-Whitney test, Wilcoxon rank-sum test
  • Empirical distributions and the plug-in principle
    Empirical CDF, empirical distributions, convergence theorems, Monte Carlo integration
  • Density estimation
    Histogram estimators, Kernel density estimators
  • Nonparametric regression
    Definitions; linear regression: regressograms; kernel regression: Nadaraya-Watson kernel regression;
    cross-validation; curse of dimensionality

Set 6 Bootstrap

  • Motivation for bootstrap
  • Bootstrap basics
  • Bootstrap confidence intervals
  • Other uses of bootstrap
  • Quantifying uncertainty more generally

Set 7 Monte Carlo Sampling

  • Motivation from Bayesian inference
  • Monte Carlo Methods
    Direct Sampling, Rejection Sampling, Importance Sampling, Markov Chain Monte Carlo
  • Cont.

GitHub Resources for Sets 4-7
.
.
.