If you are interested in the course and have a year of programming experience, proficiency in Python, and a completed statistics course, you can enroll directly via CUNYFirst without additional permissions. If you took the courses elsewhere, reach out to a computer science advisor about what is needed to align your previous courses with Hunter courses on your CUNYFirst record.
| Theme | Lecture | Topics | Coursework | Reading |
|---|---|---|---|---|
| Data Science Overview | #1: Monday, 31 January | Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference; Python Recap: basics, dictionaries, & keyword parameters | CW1 | DS 100: Chapter 1 (Data Science Lifecycle), Think CS: Chapter 12 (Dictionaries), python.org: Section 4.7 (Functions) |
| | #2: Thursday, 3 February | Data Scope, Big Data, Accuracy; Python Recap: file I/O and string methods | CW2 | DS 100: Chapter 2 (Data Scope), DS 100: Section 13.1 (String Methods), Think CS: Chapter 11 (Files) |
| | #3: Monday, 7 February | Theory for Data Design, Sampling Variation, Measurement Error; Data Representation, DataFrames (Pandas), Lambda Expressions & Applying Functions | CW3 | DS 100: Chapter 3 (Data Design), DS 100: Chapter 6 (DataFrames), python.org: Section 4.7 (Functions) |
| | #4: Thursday, 10 February | Modeling and Estimation; DataFrame Basics, List Comprehensions & Zips | CW4, Program 1, Quiz 1 | DS 100: Chapter 4 (Modeling and Estimation), DS 100: Chapter 6 (DataFrames), Think CS: Section 10.23 (List Comprehensions), Zip Tutorial (RealPython), Constructing DataFrames (pydata.org) |
| Representation & Visualization | #5: Monday, 14 February | Data Representation: Structure & Granularity; Manipulating Data in Pandas: More on Subsetting & Aggregating DataFrames; Project Overview | CW5 | DS 100: Chapter 6 (DataFrames), DS 100: Section 8.5 (Table Shape & Granularity) |
| | #6: Thursday, 17 February | Data Quality & Wrangling; Joining & Transforming Data in Pandas | CW6, Program 2, Quiz 2 | DS 100: Chapter 6 (DataFrames), DS 100: Chapter 9 (Data Wrangling) |
| 21 February | Presidents' Day: College Closed | | | |
| | #7: Thursday, 24 February | Features, Visualizing Qualitative & Quantitative Data; Regular Expressions | CW7, Program 3, Quiz 3 | DS 100: Chapter 10 (Exploratory Data Analysis), DS 100: Sections 11.1-11.3 (Data Visualization), DS 100: Sections 13.2-13.3 (Regular Expressions) |
| | #8: Monday, 28 February | Customizing Plots in matplotlib & seaborn; Time-Series Data, GeoJSON Format | CW8, Opt-in for Optional Project | DS 100: Sections 11.1-11.3 (Data Visualization), Hands On ML (Matplotlib Tools), GeoJSON Editor |
| | #9: Thursday, 3 March | Visualization Principles; Visualizing GIS Data: Choropleth Maps, Voronoi Diagrams | CW9, Program 4, Quiz 4 | DS 100: Chapter 11 (Data Visualization), Folium documentation |
| Models & Loss Functions | #10: Monday, 7 March | Linear Models, Predicting Tip Amounts; Statistics Recap: Expectation, Variance, Correlation, & Sampling | CW10, Proposal for Optional Project | DS 100: Chapter 15 (Linear Models), Seeing Theory (Brown U), Guessing Correlation Coefficients (UBC), Computing Correlations (Real Python) |
| | #11: Thursday, 10 March | Probability and Generalization: Distributions, Probability Mass Functions; Sampling & Confidence Intervals | CW11, Program 5, Quiz 5 | DS 100: Chapter 16 (Probability & Generalization), Sampling from a Normally Distributed Population (UBC), Confidence Intervals (UBC), Residuals (UBC) |
| | #12: Monday, 14 March | Central Limit Theorem; Fitting Models: Convexity, Least Squares, & Validating Loss Functions: Mean Squared and Mean Absolute Error | CW12 | Central Limit Theorem (UBC), DS 8: Chapter 15 (Prediction), DS 100: Sections 4.2-4.3 (Loss Functions) |
| Multiple Linear Modeling | #13: Thursday, 17 March | Gradient Descent; Multiple Linear Regression | CW13, Program 6, Quiz 6 | DS 100: Chapter 17 (Gradient Descent), Gradient Descent Visualization (Lili Jiang), DS 100: Chapter 19 (Multiple Linear Regression) |
| | #14: Monday, 21 March | Feature Engineering: Variable Transformations & Categorical Encoding; Bias-Variance Tradeoff; Code Demo: Walmart Sales | CW14 | DS 100: Chapter 20 (Feature Engineering), DS 100: Exam Resources |
| | #15: Thursday, 24 March | Regularization: Ridge Regularization (L2) & Lasso Regularization (L1); Classwork: Modeling & Regularization | CW15, Program 7, Quiz 7 | DS 100: Chapter 20 (Feature Engineering), DS 100: Chapter 21 (Bias-Variance Tradeoff), DS 100: Chapter 22 (Regularization) |
| Classification | #16: Monday, 28 March | Regression on Probabilities; The Logistic Model & Loss Function; Serializing and Evaluating Models | CW16 | DS 100: Chapter 24 (Classification), Confusion Matrices (sklearn), Python Object Serialization Docs (Pickling) |
| | #17: Thursday, 31 March | Using Logistic Models: Fitting & Evaluating a Logistic Model; Cross Validation & Metrics | CW17, Program 8, Quiz 8 | DS 100: Chapter 24 (Classification), DS 100: Chapter 21 (Cross Validation) |
| | #18: Monday, 4 April | One-Versus-Rest (OVR) Classification; Other Approaches: Naive Bayes, Support Vector Machines (SVMs), Decision Trees & Random Forests | CW18, Project: Interim Check-In | Python DS Handbook: Chapter 5 (SVMs), Karpathy's SVM Demo (Stanford), Data Camp Tutorial (SVMs), SVMs (sklearn), Recognizing Hand-Written Digits (sklearn) |
| Dimensionality Reduction | #19: Thursday, 7 April | Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues; Principal Components Analysis | CW19, Program 9, Quiz 9 | Explained Visually (Eigenvectors and Eigenvalues), Explained Visually (Principal Components Analysis), Linear Algebra Review (MIT), DS 100: Section 26.1 (PCA Dimensions) |
| | #20: Monday, 11 April | PCA as Dimensionality Reduction; Intrinsic Dimensionality (Scree Plots); Multidimensional Scaling | CW20 | DS 100: Chapter 26 (PCA), Python DS Handbook: Section 5.09 (PCA), Python DS Handbook: Section 5.10 (Manifold Learning) |
| | #21: Thursday, 14 April | Non-Euclidean Distances; Non-Linear Dimensionality Reduction: t-SNE, UMAP | CW21, Program 10, Quiz 10 | Manifold Learning (sklearn), Python DS Handbook: Section 5.10 (Manifold Learning) |
| 15-22 April | Spring Break: No Classes | | | |
| Clustering | #22: Monday, 25 April | Supervised vs. Unsupervised Learning; K-Means Clustering | CW22, Complete Project & Website | Supervised vs. Unsupervised Learning (IBM), DS 100: Chapter 28 (Clustering), K-Means gif (wiki), Python DS Handbook: Section 5.11 (K-Means) |
| | #23: Thursday, 28 April | K-Means: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch K-Means; Gaussian Mixture Models | CW23, Program 11, Quiz 11 | DS 100: Chapter 28 (Clustering), Python DS Handbook: Section 5.11 (K-Means), Python DS Handbook: Section 5.12 (Gaussian Mixture Models) |
| | #24: Monday, 2 May | Other Clustering Approaches; Replicability, P-Hacking, A/B Testing | CW24 | Cluster Analysis (wiki), DS 100: Chapter 25 (Replicability) |
| More on Structured Data | #25: Thursday, 5 May | Relational Databases and SQL; Code Demo: SQL in Python: setting up a database, basic SQL | CW25, Program 12, Quiz 12 | DS 100: Chapter 7 (Relational Databases & SQL) |
| | #26: Monday, 9 May | Relational Databases: Subsetting, Aggregating, Joining, & Transforming Data | CW26 | DS 100: Chapter 7 (Relational Databases & SQL) |
| Review | #27: Thursday, 12 May | Project Showcase; Semester Review | CW27, Program 13, Quiz 13: End-of-semester Survey | |
| Monday, 16 May, 2:45-4pm | Final Exam: Coding | | | |
| Monday, 23 May, 1:45-3:45pm | Final Exam: Written | | | |