Data Science & Machine Learning Basics

Data Science
Machine Learning
Artificial Intelligence
Author

Rafiq Islam

Published

September 20, 2024

Post under construction

This page is my personal repository of the most common and useful machine learning algorithms in Python, along with other data science tips and tricks.

\(\text{Data Science}\)

Data science involves extracting knowledge from structured and unstructured data. It combines principles from statistics, machine learning, data analysis, and domain knowledge to understand and interpret the data.

Data Collection & Acquisition

  • Web scraping: Data collection through web scraping
  • API integration
  • Data Lakes, Data Warehouse

Data Cleaning & Preprocessing

This involves handling missing values, data transformation, feature engineering, encoding categorical variables, and handling outliers.
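Two of these steps, imputing missing values and encoding a categorical variable, can be sketched with the standard library alone. The toy records, the column names `age` and `color`, and the helper names `impute_median` and `one_hot` are all hypothetical; in practice a library such as pandas would usually do this work.

```python
from statistics import median

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 40, "color": "red"},
    {"age": 31, "color": None},
]

def impute_median(rows, key):
    """Replace missing numeric values with the median of the observed ones."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = median(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def one_hot(rows, key, missing="unknown"):
    """Expand a categorical column into 0/1 indicator columns."""
    values = [r[key] if r[key] is not None else missing for r in rows]
    categories = sorted(set(values))
    encoded = []
    for r, v in zip(rows, values):
        new = {k: val for k, val in r.items() if k != key}
        for c in categories:
            new[f"{key}_{c}"] = int(v == c)
        encoded.append(new)
    return encoded

cleaned = one_hot(impute_median(rows, "age"), "color")
```

Treating a missing category as its own level (`unknown` here) is only one possible choice; dropping the row or imputing the most frequent level are common alternatives.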

Exploratory Data Analysis (EDA)

This usually includes descriptive statistics, data visualization, and identifying patterns, trends, and correlations among the features and labels.
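As a small illustration, the descriptive statistics and a feature-label correlation can be computed by hand. The data below is made up for the example, and `describe` and `pearson` are hypothetical helper names, not functions from any library.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical toy data: one feature (hours) and one label (score).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

def describe(xs):
    """Basic descriptive statistics for one numeric column."""
    return {
        "n": len(xs),
        "mean": mean(xs),
        "median": median(xs),
        "std": stdev(xs),  # sample standard deviation (n - 1 denominator)
        "min": min(xs),
        "max": max(xs),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

summary = describe(hours)
r = pearson(hours, score)
```

A coefficient close to 1 or -1 suggests a strong linear relationship; it says nothing about nonlinear patterns, which is one reason plots are part of EDA.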

Statistical Methods

  • ANOVA - Categorical Features: How do we treat the categorical features for our data science project?
  • Hypothesis Testing
  • Probability Distributions
  • Inferential Statistics
  • Sampling Methods
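To make the ANOVA item concrete, here is a rough sketch of the one-way ANOVA F statistic, where each group holds the numeric target values for one level of a categorical feature. The function name `one_way_anova_f` and the data are hypothetical; the sketch stops at the F statistic and does not compute a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical target values, grouped by three levels of a categorical feature.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

A large F relative to the F distribution with those degrees of freedom suggests that the target mean differs across levels, so the categorical feature may be worth keeping.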

Big Data Techniques

  • Hadoop, Spark
  • Distributed Data Storage (e.g., HDFS, NoSQL)
  • Data Pipelines, ETL (Extract, Transform, Load)

\(\text{Machine Learning Algorithms}\)

\(\text{Supervised Learning}\)

(Training with labeled data: input-output pairs)

Regression
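As a minimal sketch of learning from input-output pairs, simple linear regression with one feature can be fitted by ordinary least squares in closed form. The names `fit_line` and `predict` and the toy data are assumptions made for illustration only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical labeled data generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predict the label for a new input using the fitted line."""
    return slope * x + intercept
```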

Classification

Parametric
Multi-Class Classification
Bayesian or Probabilistic Classification

\(\text{Unsupervised Learning}\)

(Training with unlabeled data)

Clustering
  • k-Means Clustering
  • Hierarchical Clustering
  • DBSCAN (Density-Based Spatial Clustering)
  • Gaussian Mixture Models (GMM)
Dimensionality Reduction
  • Principal Component Analysis
  • Latent Dirichlet Allocation (LDA)
  • t-SNE (t-distributed Stochastic Neighbor Embedding)
  • Factor Analysis
  • Autoencoders
Anomaly Detection
  • Isolation Forests
  • One-Class SVM
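As one illustration of the clustering methods listed above, k-means can be sketched as Lloyd's algorithm: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points. This version is a simplified sketch; it seeds the centroids with the first `k` points, which is an assumption made to keep the example deterministic, whereas real implementations typically use random or k-means++ initialization.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```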

\(\text{Semi-Supervised Learning}\)

(Combination of labeled and unlabeled data)

  • Self-training
  • Co-training
  • Label Propagation

\(\text{Reinforcement Learning}\)

(Learning via rewards and penalties)

  • Markov Decision Process (MDP)
  • Q-Learning
  • Deep Q-Networks (DQN)
  • Policy Gradient Method
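Tabular Q-learning, for instance, can be sketched on a tiny deterministic chain MDP. Everything about the environment here is an assumption for illustration: five states in a row, two actions (left and right), and a reward of 1 only when the last state is reached. Because Q-learning is off-policy, the sketch explores with a uniformly random behavior policy rather than the more usual epsilon-greedy one.

```python
import random

N_STATES = 5        # states 0..4; state 4 is terminal
ACTIONS = (0, 1)    # 0 = left, 1 = right

def step(state, action):
    """Deterministic chain: reward 1.0 only on entering the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn a Q-table with the standard one-step Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # uniformly random behavior policy
            nxt, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The greedy policy read off the learned table should prefer moving right in every non-terminal state, since that is the only way to reach the reward.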

\(\text{Deep Learning}\)

  • PyTorch
  • Artificial Neural Networks (ANN)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
  • Long Short-Term Memory (LSTM)
  • Generative Adversarial Networks (GAN)

\(\text{Model Evaluation and Fine Tuning}\)

Model Evaluation Metrics

  • For Regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), \(R^2\) score
  • For Classification: Accuracy, Precision, Recall, F1 Score, ROC-AUC
  • Cross-validation: k-fold, stratified k-fold, leave-one-out
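Most of the metrics above follow directly from their definitions. The sketch below computes them by hand, along with plain (unshuffled, unstratified) k-fold index splits; the function names are hypothetical, and ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": sqrt(mse),
        "r2": 1 - sum(e * e for e in errors) / ss_tot,
    }

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds, no shuffling."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# Hypothetical predictions, for illustration only.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
folds = list(k_fold_indices(5, 2))
```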

Model Optimization

Ensemble Methods

