Detailed Schedule
Day 01: Exploring data with pandas
Today, we will:
Start up and navigate jupyter lab
Read, slice, and filter data with pandas
Explore data with pandas, matplotlib, and seaborn
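As a rough preview of the reading, slicing, and filtering steps, here is a minimal pandas sketch. It assumes a small hypothetical CSV inlined as a string (the column names `label`, `length`, and `mass` are made up for illustration) so it runs without a data file; the course datasets will differ.

```python
import io

import pandas as pd

# Hypothetical toy data, inlined so the sketch runs without a data file.
CSV_TEXT = """label,length,mass
a,39.1,3750
a,38.6,3800
b,46.1,4500
b,50.0,5700
c,46.5,3500
"""

# Read: pd.read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Slice by label: one column gives a Series, a list of columns gives a DataFrame.
mass = df["mass"]
subset = df[["label", "mass"]]

# Slice by position with .iloc (rows 0 and 1, all columns).
first_two = df.iloc[:2]

# Filter with boolean masks; combine conditions with & and | (parentheses required).
heavy = df[df["mass"] > 4000]
long_b = df.loc[(df["label"] == "b") & (df["length"] > 48), "mass"]

# Explore: summary statistics overall and per group.
summary = df.describe()
mean_mass_by_label = df.groupby("label")["mass"].mean()

print(heavy)
print(mean_mass_by_label)
```

For a first look at relationships, `df.plot.scatter(x="length", y="mass")` draws with matplotlib, and `seaborn.scatterplot(data=df, x="length", y="mass", hue="label")` adds per-group colors.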
Materials
Before you start
Python Review — work through this if you need a refresher on Python basics, NumPy, and matplotlib
Resources
Day 02: Using KNN to classify objects with scikit-learn and pandas
Today, we will:
Read in and clean data
Discuss and plan a classification problem
Use scikit-learn to build a KNN classification model
Evaluate the model with confusion matrices and classification reports
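The Day 02 workflow (clean, split, fit KNN, evaluate) might look roughly like the sketch below. It assumes scikit-learn's bundled iris dataset as a stand-in for the course data, and the choices `n_neighbors=5`, `test_size=0.25`, and `random_state=42` are illustrative rather than recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset bundled with scikit-learn; as_frame=True returns pandas objects.
iris = load_iris(as_frame=True)
df = iris.frame.dropna()  # basic cleaning step: drop rows with missing values

X = df[iris.feature_names]
y = df["target"]

# Hold out a stratified test set so every class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# KNN is distance-based, so features are standardized before fitting.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Wrapping the scaler and classifier in one pipeline means the scaling statistics are learned from the training split only.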
Materials
Supporting Notes
Methods and Model Validation — when do different classifiers work or fail?
Support Vector Machines — linear vs. RBF kernels; an alternative classifier to KNN
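To make the linear vs. RBF contrast concrete, the sketch below fits both kernels on data with a curved class boundary. It assumes scikit-learn's synthetic `make_moons` generator in place of a course dataset, with `C=1.0` and the sample size chosen only for illustration; on curved boundaries like this the RBF kernel typically scores higher.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data whose boundary is curved, not a straight line.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
for kernel in ("linear", "rbf"):
    # Same pipeline for both kernels; only the kernel changes.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # mean accuracy on held-out data

print(scores)
```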
Resources
Day 03: Performing Regression with scikit-learn
Today, we will:
Read in and clean data
Discuss and plan a regression problem
Use scikit-learn to build a regression model
Evaluate the model with MSE, R², residual plots, and predicted vs. actual plots
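A minimal sketch of the Day 03 evaluation steps follows. It assumes scikit-learn's bundled diabetes dataset and a plain `LinearRegression` as stand-ins, and uses matplotlib's non-interactive `Agg` backend so it runs without a display. It computes MSE and R² and draws the two diagnostic plots.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with scikit-learn.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
residuals = y_test - y_pred  # positive means the model under-predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Predicted vs. actual: a perfect model puts every point on the y = x line.
ax1.scatter(y_test, y_pred, alpha=0.6)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax1.plot(lims, lims, "k--")
ax1.set_xlabel("Actual")
ax1.set_ylabel("Predicted")

# Residuals vs. predicted: look for structure (curves, funnels) around zero.
ax2.scatter(y_pred, residuals, alpha=0.6)
ax2.axhline(0, color="k", linestyle="--")
ax2.set_xlabel("Predicted")
ax2.set_ylabel("Residual (actual - predicted)")

fig.tight_layout()
print(f"MSE = {mse:.1f}, R^2 = {r2:.3f}")
```

MSE is the mean of the squared residuals, and R² compares the residual sum of squares to the spread of the actual values around their mean. A single R² can hide a curved or fanning residual pattern that the right-hand plot makes obvious.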
Materials
Supporting Notes
The Importance of Visualization — why residual plots and predicted vs. actual plots matter
Resources
Day 04: Modeling Project
Today, we will:
Read in a new dataset
Explore, clean, and preprocess data
Build and evaluate a model
Improve the model through feature selection and cross-validation
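The improvement step (feature selection plus cross-validation) could be sketched as follows. The sketch assumes the bundled diabetes dataset in place of the Day 04 dataset, and keeping 5 features is an arbitrary illustrative choice. `make_model` is a hypothetical helper defined here, not a library function. Reporting the mean and standard deviation across folds gives a rough sense of the uncertainty in the score.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; the Day 04 dataset would be loaded and cleaned here instead.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Shuffled 5-fold CV with a fixed seed so the folds are reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def make_model(n_features=None):
    """Hypothetical helper: scale, optionally keep n_features via RFE, then fit."""
    steps = [("scale", StandardScaler())]
    if n_features is not None:
        # RFE repeatedly drops the feature with the smallest |coefficient|.
        steps.append(("rfe", RFE(LinearRegression(), n_features_to_select=n_features)))
    steps.append(("reg", LinearRegression()))
    return Pipeline(steps)


# Feature selection sits inside the pipeline, so it is redone on each
# training fold and never sees that fold's validation data.
results = {}
for label, model in [("all features", make_model()), ("RFE, 5 features", make_model(5))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    results[label] = scores
    print(f"{label}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data to see which features RFE keeps.
final = make_model(5).fit(X, y)
selected = X.columns[final.named_steps["rfe"].support_]
print(list(selected))
```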
Materials
Supporting Notes
Cross-Validation — how to estimate model performance with uncertainty
Recursive Feature Elimination — how to systematically find the best features
Principal Component Analysis — how to reduce dimensionality while preserving information
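For the PCA note, the sketch below shows one common way to decide how many components to keep: look at the cumulative explained variance ratio. It assumes the bundled iris dataset and a 95% variance threshold, both chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Fit all components to see how variance is spread across them.
pca_full = PCA().fit(X_scaled)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print("explained variance ratio:", np.round(ratios, 3))
print("cumulative:", np.round(cumulative, 3))

# With svd_solver="full", a float n_components keeps the fewest components
# whose cumulative explained variance exceeds that fraction.
pca_95 = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca_95.fit_transform(X_scaled)
print(f"{X.shape[1]} features -> {pca_95.n_components_} components")
```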