If you are interested in the course and have completed a year of programming, are proficient in Python, and have taken statistics, you can enroll directly via CUNYFirst without additional permissions. If you took these courses elsewhere, contact a computer science advisor about what is needed to map your previous courses to the corresponding Hunter courses on your CUNYFirst record.
| Theme: | Date: | Topics: | Coursework: | Reading: |
|---|---|---|---|---|
| Data Science Overview | #1: Monday, 31 January | Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference; Python Recap: basics, dictionaries, & keyword parameters | CW1 | DS 100: Chapter 1 (Data Science Lifecycle), Think CS: Chapter 12 (Dictionaries), python.org: Section 4.7 (Functions) |
| | #2: Thursday, 3 February | Data Scope, Big Data, Accuracy; Python Recap: file I/O and string methods | CW2 | DS 100: Chapter 2 (Data Scope), DS 100: Section 13.1 (String Methods), Think CS: Chapter 11 (Files) |
| | #3: Monday, 7 February | Theory for Data Design, Sampling Variation, Measurement Error; Data Representation, DataFrames (Pandas), Lambda Expressions & Applying Functions | CW3 | DS 100: Chapter 3 (Data Design), DS 100: Chapter 6 (DataFrames), python.org: Section 4.7 (Functions) |
| | #4: Thursday, 10 February | Modeling and Estimation; DataFrame Basics, List Comprehensions & Zips | CW4, Program 1, Quiz 1 | DS 100: Chapter 4 (Modeling and Estimation), DS 100: Chapter 6 (DataFrames), Think CS: Section 10.23 (List Comprehensions), Zip Tutorial (RealPython), Constructing DataFrames (pydata.org) |
| Representation & Visualization | #5: Monday, 14 February | Data Representation: Structure & Granularity; Manipulating Data in Pandas: More on Subsetting & Aggregating DataFrames; Project Overview | CW5 | DS 100: Chapter 6 (DataFrames), DS 100: Section 8.5 (Table Shape & Granularity) |
| | #6: Thursday, 17 February | Data Quality & Wrangling; Joining & Transforming Data in Pandas | CW6, Program 2, Quiz 2 | DS 100: Chapter 6 (DataFrames), DS 100: Chapter 9 (Data Wrangling) |
| | Monday, 21 February | Presidents' Day: College Closed | | |
| | #7: Thursday, 24 February | Features, Visualizing Qualitative & Quantitative Data; Regular Expressions | CW7, Program 3, Quiz 3 | DS 100: Chapter 10 (Exploratory Data Analysis), DS 100: Sections 11.1-11.3 (Data Visualization), DS 100: Sections 13.2-13.3 (Regular Expressions) |
| | #8: Monday, 28 February | Customizing Plots in matplotlib & seaborn; Time-Series Data, GeoJSON Format | CW8, Opt-in for Optional Project | DS 100: Sections 11.1-11.3 (Data Visualization), Hands On ML (Matplotlib Tools), GeoJSON Editor |
| | #9: Thursday, 3 March | Visualization Principles; Visualizing GIS Data: Choropleth Maps, Voronoi Diagrams | CW9, Program 4, Quiz 4 | DS 100: Chapter 11 (Data Visualization), Folium documentation |
| Models & Loss Functions | #10: Monday, 7 March | Linear Models, Predicting Tip Amounts; Statistics Recap: Expectation, Variance, Correlation, & Sampling | CW10, Proposal for Optional Project | DS 100: Chapter 15 (Linear Models), Seeing Theory (Brown U), Guessing Correlation Coefficients (UBC), Computing Correlations (Real Python) |
| | #11: Thursday, 10 March | Probability and Generalization: Distributions, Probability Mass Functions; Sampling & Confidence Intervals | CW11, Program 5, Quiz 5 | DS 100: Chapter 16 (Probability & Generalization), Sampling from a Normally Distributed Population (UBC), Confidence Intervals (UBC), Residuals (UBC) |
| | #12: Monday, 14 March | Central Limit Theorem; Fitting Models: Convexity, Least Squares, & Validating; Loss Functions: Mean Squared and Mean Absolute Error | CW12 | Central Limit Theorem (UBC), DS 8: Chapter 15 (Prediction), DS 100: Sections 4.2-4.3 (Loss Functions) |
| Multiple Linear Modeling | #13: Thursday, 17 March | Gradient Descent; Multiple Linear Regression | CW13, Program 6, Quiz 6 | DS 100: Chapter 17 (Gradient Descent), Gradient Descent Visualization (Lili Jiang), DS 100: Chapter 19 (Multiple Linear Regression) |
| | #14: Monday, 21 March | Feature Engineering: Variable Transformations & Categorical Encoding; Bias-Variance Tradeoff; Code Demo: Walmart Sales | CW14 | DS 100: Chapter 20 (Feature Engineering), DS 100: Exam Resources |
| | #15: Thursday, 24 March | Regularization: Ridge Regularization (L2) & Lasso Regularization (L1); Classwork: Modeling & Regularization | CW15, Program 7, Quiz 7 | DS 100: Chapter 20 (Feature Engineering), DS 100: Chapter 21 (Bias-Variance Tradeoff), DS 100: Chapter 22 (Regularization) |
| Classification | #16: Monday, 28 March | Regression on Probabilities; The Logistic Model & Loss Function; Serializing and Evaluating Models | CW16 | DS 100: Chapter 24 (Classification), Confusion Matrices (sklearn), Python Object Serialization Docs (Pickling) |
| | #17: Thursday, 31 March | Using Logistic Models: Fitting & Evaluating a Logistic Model; Cross Validation & Metrics | CW17, Program 8, Quiz 8 | DS 100: Chapter 24 (Classification), DS 100: Chapter 21 (Cross Validation) |
| | #18: Monday, 4 April | One-Versus-Rest (OVR) Classification; Other Approaches: Naive Bayes, Support Vector Machines (SVMs), Decision Trees & Random Forests | CW18, Project: Interim Check-In | Python DS Handbook: Chapter 5 (SVMs), Karpathy's SVM Demo (Stanford), DataCamp Tutorial (SVMs), SVMs (sklearn), Recognizing Hand-Written Digits (sklearn) |
| Dimensionality Reduction | #19: Thursday, 7 April | Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues; Principal Components Analysis | CW19, Program 9, Quiz 9 | Explained Visually (Eigenvectors and Eigenvalues), Explained Visually (Principal Components Analysis), Linear Algebra Review (MIT), DS 100: Section 26.1 (PCA Dimensions) |
| | #20: Monday, 11 April | PCA as Dimensionality Reduction; Intrinsic Dimensionality (Scree Plots); Multidimensional Scaling | CW20 | DS 100: Chapter 26 (PCA), Python DS Handbook: Section 5.09 (PCA), Python DS Handbook: Section 5.10 (Manifold Learning) |
| | #21: Thursday, 14 April | Non-Euclidean Distances; Non-Linear Dimensionality Reduction: t-SNE, UMAP | CW21, Program 10, Quiz 10 | Manifold Learning (sklearn), Python DS Handbook: Section 5.10 (Manifold Learning) |
| | 15-22 April | Spring Break: No Classes | | |
| Clustering | #22: Monday, 25 April | Supervised vs. Unsupervised Learning; K-Means Clustering | CW22, Complete Project & Website | Supervised vs. Unsupervised Learning (IBM), DS 100: Chapter 28 (Clustering), K-Means gif (wiki), Python DS Handbook: Section 5.11 (K-Means) |
| | #23: Thursday, 28 April | K-Means: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch K-Means; Gaussian Mixture Models | CW23, Program 11, Quiz 11 | DS 100: Chapter 28 (Clustering), Python DS Handbook: Section 5.11 (K-Means), Python DS Handbook: Section 5.12 (Gaussian Mixture Models) |
| | #24: Monday, 2 May | Other Clustering Approaches; Replicability, P-Hacking, A/B Testing | CW24 | Cluster Analysis (wiki), DS Chapter 25 (Replicability) |
| More on Structured Data | #25: Thursday, 5 May | Relational Databases and SQL; Code Demo: SQL in Python: setting up a database, basic SQL | CW25, Program 12, Quiz 12 | DS 100: Chapter 7 (Relational Databases & SQL) |
| | #26: Monday, 9 May | Relational Databases: Subsetting, Aggregating, Joining, & Transforming Data | CW26 | DS 100: Chapter 7 (Relational Databases & SQL) |
| Review | #27: Thursday, 12 May | Project Showcase; Semester Review | CW27, Program 13, Quiz 13: End-of-semester Survey | |
| | Monday, 16 May, 2:45-4pm | Final Exam: Coding | | |
| | Monday, 23 May, 1:45-3:45pm | Final Exam: Written | | |