For questions about the course, write to: datasci AT hunter cuny edu.
Week: | Topics: | Coursework: | Reading: | |
---|---|---|---|---|
Week 0: | Friday, 25 August |
Syllabus & Frequently Asked Questions | Classwork 0 | |
Week 1: | Wednesday, 30 August |
Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference, Data Scope, Big Data, Accuracy; Python Recap: dictionaries, I/O, keyword parameters, & linting |
Classwork 1 | DS 100: Chapter 1 (Data Science Lifecycle), DS 100: Chapter 2 (Data Scope), DS 100: Chapter 4 (Modeling with Summary Statistics), Think CS: Chapter 12 (Dictionaries), DS 100: Section 13.1 (String Methods), Think CS: Chapter 11 (Files), python.org: Section 4.7 (Functions), pylint documentation |
Week 2: | Wednesday, 6 September |
Statistics Recap: Expectation, Variance, Correlation, Residuals & Sampling; Linear Regression, Loss Functions: Mean Squared and Mean Absolute Error, Data Representation, DataFrames (Pandas); Python Recap: Lambda Expressions & Applying Functions |
Classwork 2 |
Seeing Theory (Brown U), Guessing Correlation Coefficients (GeoGebra), Computing Correlations (Real Python), Residuals (UBC), DS 100: Chapter 3 (Simulation & Data Design), DS 100: Chapter 15 (Linear Models), DS 100: Chapter 6 (DataFrames), Constructing DataFrames (pydata.org), DS 100: Section 8.5 (Table Shape & Granularity), python.org: Section 4.7 (Functions) |
Friday, 8 September |
Program 1 |
|||
Week 3: | Wednesday, 13 September |
Multiple Linear Regression, Handling Missing Values (Imputation),
Feature Engineering: Categorical Encoding; Joining & Transforming Data in Pandas; Python Recap: List comprehensions & zips; Project Overview |
Classwork 3 |
DS 100: Chapter 6 (DataFrames), DS 100: Chapter 9 (Data Wrangling), DS 100: Chapter 15 (Linear Models), Think CS: Section 10.23 (List Comprehensions), Zip Tutorial (Real Python) |
Week 4: | Wednesday, 20 September |
Fitting Models with sklearn, More on Loss Functions; Visualizing Qualitative & Quantitative Data, Time-Series Data, Customizing Plots in plotly, matplotlib & seaborn; Serializing & Evaluating Models (pickling); Python Recap: dates & times; Project Overview |
Classwork 4 |
DS 100: Sections 4.2-3 (Loss Functions), DS 100: Chapter 10 (Exploratory Data Analysis), DS 100: Sections 11.1-11.3 (Data Visualization), DS 100: Chapter 15 (Linear Models), Python Object Serialization Docs (Pickling), Hands On ML (Matplotlib Tools) |
Friday, 22 September |
Program 2 | |||
Week 5: | Wednesday, 27 September |
Visualizing GIS Data: GeoJSON Format, Choropleth Maps, Voronoi Diagrams, Visualization Principles; Polynomial Models, Training Models: Cross Validation, Ridge Regularization (L2) & Lasso Regularization (L1); Bias-Variance Tradeoff; More on Fitting Models: Convexity, Validating, & Gradient Descent; Testing Frameworks |
Classwork 5 |
DS 8: Chapter 15 (Prediction), DS 100: Chapter 11 (Data Visualization), DS 100: Section 15.4 (Multiple Linear Regression), DS 100: Section 15.7 (Feature Engineering), DS 100: Section 16.3 (Cross Validation), DS 100: Chapter 20 (Gradient Descent), Folium documentation, GeoJSON Editor, Gradient Descent Visualization (Lili Jiang), Think CS: Unit Testing, Pytest |
Week 6: | Wednesday, 4 October |
Probability and Generalization: Distributions, Probability Mass Functions, Confidence Intervals, Smoothing; Hypothesis Testing, Central Limit Theorem Review |
Classwork 6; Opt-in for Optional Project |
DS 100: Chapter 17 (Theory for Inference & Prediction), DS 100: Chapter 17 (Probability & Generalization), Sampling from a Normally Distributed Population (UBC), Central Limit Theorem (UBC), Confidence Intervals (UBC) |
Friday, 6 October |
Program 3 | |||
Week 7: | Wednesday, 11 October |
Midterm Exam |
Classwork 7 | |
Thursday, 12 October |
Project Proposal Window Opens | |||
Week 8: | Wednesday, 18 October |
Regression on Probabilities; The Logistic Model & Loss Function; Using Logistic Models: Fitting & Evaluating a Logistic Model; Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues; Classification: Support Vector Machines (SVMs) |
Classwork 8 |
DS 100: Chapter 19 (Classification), Recognizing Hand-Written Digits (sklearn), Confusion Matrices (sklearn), Explained Visually (Eigenvectors and Eigenvalues), Linear Algebra Review (MIT), Python DS Handbook: Chapter 5 (SVMs), Karpathy's SVM Demo (Stanford), Data Camp Tutorial (SVMs), SVMs (sklearn) |
Friday, 20 October |
Proposal for Optional Project | |||
Week 9: | Wednesday, 25 October |
Multi-class Classification;
Other Classifiers: Naive Bayes, Decision Trees & Random Forests; Intrinsic Dimensionality (Scree Plots); Principal Components Analysis (PCA) |
Classwork 9 | DS 100: Chapter 19 (Classification), DS 100: Chapter 22 (PCA), Decision Trees; Bias & Variance (R2D3), Python DS Handbook: Section 5.09 (PCA), Explained Visually (Principal Components Analysis) |
Week 10: | Wednesday, 1 November |
Multidimensional Scaling (MDS); Non-Euclidean Distances; Other Dimensionality Reduction: Multidimensional Scaling; Non-Linear Dimensionality Reduction: t-SNE, UMAP |
Classwork 10 |
Python DS Handbook: Section 5.10 (Manifold Learning), Manifold Learning (sklearn) |
Friday, 3 November |
Program 4 | |||
Week 11: | Wednesday, 8 November |
K-Means Clustering: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch; More on Clustering: Gaussian Mixture Models, Hierarchical Clustering, Spectral Clustering |
Classwork 11 |
Spectral Clustering (Kaggle), Clustering (Carpentry), Spectral Clustering (Great Learning), K-Means gif (wiki), DS 100: Chapter 24 (Clustering), Python DS Handbook: Section 5.11 (K-Means), Python DS Handbook: Section 5.12 (Gaussian Mixture Models), Cluster Analysis (wiki) |
Friday, 10 November |
Project Draft / Interim Check-In | |||
Week 12: | Wednesday, 15 November |
Supervised vs. Unsupervised Learning; Regular Expressions; Relational Databases, Structured Query Language (SQL) Basics |
Classwork 12 |
Machine Learning Summary (PDSH), Supervised vs. Unsupervised Learning (IBM), DS 100: Sections 13.2-3 (Regular Expressions) |
Friday, 17 November |
Program 5 |
|||
22-24 November | Thanksgiving Break: No Classes | |||
Week 13: | Wednesday, 29 November |
SQL: Aggregating, Joining, & Transforming Data
More on Regular Expressions |
Classwork 13 | DS 100: Chapter 7 (Relational Databases & SQL) |
Friday, 1 December |
Project: Final Code & Slides Submission; Project: Pre-recorded Video Submission (if not doing in-class live demo) |
|||
Week 14: | Wednesday, 6 December |
Project Showcase; Semester Review |
Classwork 14 | DS 100: Chapter 7 (Relational Databases & SQL) |
Friday, 8 December |
Program 6 | |||
Wednesday, 12 December | Reading Day: No Class | |||
Wednesday, 20 December 11:30am-1:30pm |
Final Exam |