Data Science
INTRODUCTION TO DATA SCIENCE
➢ Market trend of Data Science
➢ Opportunities for Data Science
➢ What is the need for Data Scientists
➢ What is Data Science
➢ Data Science Venn Diagram
➢ Data Science Use cases
➢ Knowing the roles of a Data Science practitioner
➢ Data Science – Skills set
➢ Understanding the concepts & definitions of:
o Artificial Intelligence
o Machine Learning – Deep Learning
o NLP
o Computer Vision
DATA AND TOOLS
➢ What is Business Intelligence?
➢ What is ETL?
➢ Layers of a Data Warehouse
➢ OLAP VS OLTP
➢ Facts and Dimensions
➢ Big Data tools and their uses
➢ Big Data Stack
➢ Understanding Structured text Data
➢ Understanding Unstructured text Data
DATA SCIENCE DEEP DIVE
➢ Understanding Descriptive vs Predictive vs Prescriptive Analytics
➢ Difference between Analytics vs. Analysis
➢ Data Science Project Lifecycle
➢ Technology Stack Involved in the Lifecycle
o Machine Learning tools
o Development tools
o Languages
o Data Platforms
➢ CRISP-DM – Cross-Industry Standard Process for Data Mining
➢ 5W1H – The questions that kick-start an ML project
➢ 80-20 Rule of Data Analytics
➢ Supervised Vs Unsupervised Learning
➢ Data Science – Use case bubble
➢ Data Mining
DATA
➢ Data Wrangling or Data Munging
➢ Data Categorization basics
➢ Different Types of Data
➢ Types of Data Collection
➢ Data Sources
➢ Data Collection plan
➢ Data Quality Issues
➢ Types of Data Error
➢ Ratio Scale Vs Interval Scale
➢ Predictors/Features vs Predictions/Labels
➢ Understanding Imbalance in Data
STATISTICS & PROBABILITY
➢ What is Statistics
➢ Sample Vs Population
➢ Measures of Central Tendency vs Dispersion
➢ Frequency Distribution
➢ Cumulative Frequency Distribution
➢ Mean, Median, Mode
➢ Quartiles/Percentile
➢ Range, Variance, Standard Deviation, Co-efficient of Variation
➢ 68-95-99.7 Rule of SD
➢ Z Score (Standard Score)
➢ P-Value
➢ Maximum Likelihood Estimation
➢ Probability vs Likelihood
➢ PDF vs PMF
➢ Normal Distribution of Data
➢ Skewness & its types
➢ Kurtosis & its types
➢ Kth Central Moments
➢ Co-Variance/Joint Probability Distribution
➢ Correlation
➢ Entropy
➢ ANOVA
➢ Chi-Square
➢ F tests
➢ Types of Data Distribution
SETUP
➢ Anaconda & Python
➢ Understanding Jupyter Notebooks
➢ Python Package Installation
➢ Tableau Installation
➢ Oracle Database & Server
DATA SOURCING, EXPLORATORY DATA ANALYSIS & READINESS
➢ Concept of List, DataFrame, Dictionary
➢ Connecting to Databases using Python
➢ Importing data from csv, text, Excel
➢ Converting JSON and XML to DataFrame
➢ Understanding EDA
➢ Frequency Distribution
➢ Analyzing NA, blanks
➢ Using SQL concepts inside Python
DATA TRANSFORMATION/WRANGLING
➢ Handling missing Values
➢ Handling Outliers
➢ Normalization techniques
➢ Standardization techniques
➢ Regularization techniques
➢ Feature Extraction
➢ Train Test data selection
DATA SCIENCE CONCEPTS
➢ No Free Lunch
➢ Hypothesis vs Null Hypothesis
➢ Bias Vs Variance tradeoff
➢ Local Vs Global Minima/Maxima
➢ Bias – Loss/ Loss-Cost Function
LINEAR REGRESSION
➢ Understanding Regression math
➢ Linear Algebra concepts
➢ Least Mean Square
➢ Analyzing Correlation
➢ Heat Maps, Pair Plots, Distribution Graphs
➢ Simple Vs Multiple Linear regression
➢ Train Test data selection
POLYNOMIAL REGRESSION
➢ Understanding the math
➢ Polynomial Algebra concepts
➢ Degree of Polynomial
CLASSIFICATION
➢ Overfitting/Underfitting/Optimal Fit
➢ Handling Categorical Data inside
➢ Confusion Matrix
➢ Type I & Type II errors
➢ Precision Vs Accuracy
➢ AUC/ROC curve
LOGISTIC REGRESSION
➢ Understanding the statistics behind Logistic Sigmoid
➢ Logistic regression math
RANDOM FOREST
➢ Understanding the Decision Tree & Bagging
➢ Math behind Classification and Regression in tree
➢ Decision Tree concepts
➢ Using Random Forest for Regression
➢ K-fold Cross-Validation
➢ Model Optimizers
➢ Hyperparameter Tuning
➢ Building a Decision Tree Model in R
NAÏVE BAYES THEOREM
➢ Understanding the Naïve Bayes theorem
➢ Bayesian Vs Gaussian theorems
➢ Using naïve Bayes for Regression
➢ Model Optimizers
➢ Hyperparameter Tuning
NLP FOR MACHINE LEARNING FEATURING
➢ Label Encoding
➢ One hot encoding
➢ Synonym treatment
➢ Stemming
➢ Lemmatization
➢ Stop words
➢ Parts Of Speech Tagging
➢ TF-IDF and the math behind it
SUPPORT VECTOR MACHINE
➢ Understanding the SVM Concept
➢ Hyperplane and Kernel
➢ Using SVM for Regression
➢ Grid Search
➢ Model Optimizers
➢ Hyperparameter Tuning
GRADIENT BOOSTING MACHINE & XGBOOST
➢ Understanding the Boosting Concept
➢ Hyperplane and Kernel
➢ Learning Rate
➢ Model Optimizers
➢ Hyperparameter Tuning
K MEANS CLUSTERING ALGORITHM
➢ Understanding Nearest Neighbors concept
➢ Statistics behind K Means Clustering Algorithm
KERAS TENSORFLOW – MLP DEEP LEARNING (NEURAL NETWORKS)
➢ Understanding Deep learning
➢ MLP Vs other Deep Learning
➢ How Neural Network works & Architecture
➢ Activation functions
➢ Model Optimizers
➢ Hyperparameter Tuning
➢ Best Practice and when to use DL
H2O.AI
➢ Introduction to H2O.ai
➢ Pros and Cons
➢ Available models in H2O.ai
SAMPLING & DIMENSION REDUCTION (DR)
➢ Introduction to Sampling
➢ Oversampling and Undersampling
➢ SMOTE/SMOTENC & Near Miss
➢ Pros and Cons of sampling
➢ Introduction to DR
➢ PCA & its code
DEPLOYMENT OF MODEL TO PRODUCTION
➢ Introduction to PyInstaller
➢ Pickle and Joblib
TABLEAU BASICS
➢ Introduction to Tableau
➢ Data sources
➢ Exploratory Data Analysis
➢ Clustering Analysis and Inferences using Tableau
➢ Creating visualizations
ADDED FEATURES
➢ Resume Preparation
➢ Resume Preparation Tips
➢ Sample Resumes
➢ Soft copy of Notes for each module
➢ 4 case studies on use case diagrams & 2 Real-time project specifications
Copyright © 2026 IngeniousFusionTek | All Rights Reserved