For questions about the course, write to: datasci AT hunter cuny edu.
If you are interested in the course and have completed a year of programming, are proficient in Python, and have taken statistics, you can enroll directly via CUNYFirst without additional permission. If you took these courses elsewhere, ask a computer science advisor what is needed to align your previous courses with Hunter courses on your CUNYFirst record.
Class: | Topics: | Coursework: | Reading: |
---|---|---|---|
Wednesday, 25 January | Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference, Data Scope, Big Data, Accuracy; Python Recap: I/O, dictionaries, keyword parameters, & linting | Classwork 1, Quiz 1, Program 1 | DS 100: Chapter 1 (Data Science Lifecycle), DS 100: Chapter 2 (Data Scope), DS 100: Section 13.1 (String Methods), Think CS: Chapter 11 (Files), Think CS: Chapter 12 (Dictionaries), python.org: Section 4.7 (Functions), pylint documentation |
Wednesday, 1 February | Statistics Recap: Expectation, Variance, Correlation, & Sampling; Modeling and Estimation: Linear Models, Predicting Tip Amounts, Loss Functions; Data Representation, DataFrames (Pandas), Lambda Expressions & Applying Functions | Classwork 2, Quiz 2, Program 2 | Seeing Theory (Brown U), Guessing Correlation Coefficients (GeoGebra), Computing Correlations (Real Python), Residuals (UBC), DS 100: Chapter 3 (Data Design), DS 100: Chapter 4 (Modeling with Summary Statistics), DS 100: Chapter 15 (Linear Models), DS 100: Chapter 6 (DataFrames), Constructing DataFrames (pydata.org), python.org: Section 4.7 (Functions) |
Wednesday, 8 February | Data Representation: Structure & Granularity, Joining & Transforming Data in Pandas, Handling Missing Values (Imputation) & Modifying Structure; Linear Regression; Python Recap: List comprehensions & zips; Project Overview | Classwork 3, Quiz 3, Program 3 | DS 100: Chapter 6 (DataFrames), DS 100: Section 8.5 (Table Shape & Granularity), DS 100: Chapter 9 (Data Wrangling), DS 100: Chapter 15 (Linear Models), Think CS: Section 10.23 (List Comprehensions), Zip Tutorial (Real Python) |
Wednesday, 15 February | Fitting Linear Models; Loss Functions: Mean Squared and Mean Absolute Error; Serializing and Evaluating Models; Visualizing Qualitative & Quantitative Data, Time-Series Data, Customizing Plots in plotly, matplotlib & seaborn | Classwork 4, Quiz 4, Program 4 | DS 100: Sections 4.2-4.3 (Loss Functions), DS 100: Chapter 10 (Exploratory Data Analysis), DS 100: Sections 11.1-11.3 (Data Visualization), DS 100: Chapter 15 (Linear Models), Python Object Serialization Docs (Pickling), Hands On ML (Matplotlib Tools) |
Wednesday, 22 February | Visualizing GIS Data: GeoJSON Format, Choropleth Maps, Voronoi Diagrams, Visualization Principles; Multiple Linear Regression; More on Fitting Models: Convexity, Validating, & Gradient Descent; Feature Engineering: Categorical Encoding | Classwork 5, Quiz 5, Program 5, Opt-in for Optional Project | DS 8: Chapter 15 (Prediction), DS 100: Chapter 11 (Data Visualization), DS 100: Section 15.3 (Multiple Linear Regression), DS 100: Section 15.6 (Feature Engineering), DS 100: Chapter 20 (Gradient Descent), Gradient Descent Visualization (Lili Jiang), Folium documentation, GeoJSON Editor |
Wednesday, 1 March | Probability and Generalization: Distributions, Probability Mass Functions, Confidence Intervals, Smoothing; Feature Engineering: Variable Transformations; Training Models: Cross Validation, Ridge Regularization (L2) & Lasso Regularization (L1); Bias-Variance Tradeoff | Classwork 6, Quiz 6, Program 6, Project Proposal Window Opens | Chapter 16 (Model Selection), DS 100: Chapter 17 (Probability & Generalization), DS 100: Appendix (Cross Validation), Confidence Intervals (UBC), Sampling from a Normally Distributed Population (UBC) |
Wednesday, 8 March | Regression on Probabilities; The Logistic Model & Loss Function; Using Logistic Models: Fitting & Evaluating a Logistic Model; Hypothesis Testing, Central Limit Theorem | Classwork 7, Quiz 7, Program 7, Proposal for Optional Project | DS 100: Chapter 17 (Theory for Inference & Prediction), DS 100: Chapter 19 (Classification), Central Limit Theorem (UBC), Recognizing Hand-Written Digits (sklearn), Confusion Matrices (sklearn) |
Wednesday, 15 March | Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues; One-Versus-Rest (OVR) Classification; Other Approaches: Naive Bayes, Support Vector Machines (SVMs), Decision Trees & Random Forests | Classwork 8, Quiz 8, Program 8 | DS 100: Chapter 19 (Classification), Explained Visually (Eigenvectors and Eigenvalues), Explained Visually (Principal Components Analysis), Linear Algebra Review (MIT), Python DS Handbook: Chapter 5 (SVMs), Karpathy's SVM Demo (Stanford), DataCamp Tutorial (SVMs), SVMs (sklearn) |
Wednesday, 22 March | Vector Space Recap; Intrinsic Dimensionality (Scree Plots); Principal Components Analysis (PCA); Non-Euclidean Distances; Multidimensional Scaling (MDS) | Classwork 9, Quiz 9, Program 9, Project: Interim Check-In Opens | DS 100: Chapter 19 (Classification), DS 100: Chapter 22 (PCA), Python DS Handbook: Section 5.09 (PCA) |
Wednesday, 29 March | Other Dimensionality Reduction: Multidimensional Scaling; Non-Linear Dimensionality Reduction: t-SNE, UMAP; K-Means Clustering: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch; More on Clustering: Gaussian Mixture Models; Hierarchical Clustering | Classwork 10, Quiz 10, Project: Interim Check-In Closes | Python DS Handbook: Section 5.10 (Manifold Learning), Manifold Learning (sklearn), K-Means gif (wiki), DS 100: Chapter 24 (Clustering), Python DS Handbook: Section 5.11 (K-Means) |
5-13 April | Spring Break: No Classes | ||
Wednesday, 19 April | Supervised vs. Unsupervised Learning; Regular Expressions | Classwork 11, Quiz 11, Program 11 | Supervised vs. Unsupervised Learning (IBM), Python DS Handbook: Section 5.12 (Gaussian Mixture Models), Cluster Analysis (wiki) |
Wednesday, 26 April | Hypothesis Testing & A/B Testing; Relational Databases and SQL, Part 1 | Classwork 12, Quiz 12, Program 12 | DS 100: Sections 13.2-13.3 (Regular Expressions), DS 100: Chapter 7 (Relational Databases & SQL) |
Wednesday, 3 May | Relational Databases and SQL, Part 2; Code Demo: SQL in Python | Classwork 13, Quiz 13, Program 13, Project: Final Version, Project: Presentation Video | DS 100: Chapter 7 (Relational Databases & SQL) |
Wednesday, 10 May | Final Exam: Coding; Project Showcase; Semester Review | Classwork 14, Quiz 14: End-of-Semester Survey | |
Wednesday, 17 May, 11:30am-1:30pm | Final Exam: Written | | |