Theme: | Topics: | Coursework: | Reading: | ||
---|---|---|---|---|---|
Data Science Overview | #1: Thursday, 26 August |
Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference Code Demo: Textbook: Predicting Ages from SSN Data Code Demo: Python Recap: basics & standard packages (pandas, numpy, matplotlib, & seaborn), zips and list comprehensions |
Q1: Academic Integrity P1: Hello, world P2: Senators' Names |
DS 100: Chapter 1 (Data Science Lifecycle) | |
#2: Monday, 30 August |
Guest Speaker: Elise Harris, Coordinator for Tech Internships and External Partnerships: Tech Internships Exploratory Data Analysis, Generalizing from Data, Data Sampling, Probability Sampling Classwork: Are Senators older than Representatives? Code Demo: Python string methods |
Q2: Python Recap P3: Senators' Ages P4: ELA Proficiency |
DS 100: Chapter 2 (Data Scope), DS 100: Chapter 3 (Data Design), DS 100: Section 13.1 (Python String Methods) |
||
Rectangular Data | #3: Thursday, 2 September |
Data Representation: standard primitive types, rectangular data, data tables in Python (Pandas)
Code Demo: Regular Expressions Guest Speaker: Provost Valeda Dent: The Rural Village Libraries Research Network project Classwork: Measuring impact of libraries in NYC communities |
Q3: Data Sampling P5: URL Collection P6: Regex on Restaurant Inspection Data |
DS 100: Chapter 7 (Data Tables in Python) DS 100: Sections 13.2-3 (Regular Expressions) |
|
6 September | Labor Day: College Closed | ||||
#4: Thursday, 9 September |
Relational Databases and SQL Code Demo: SQL in Python: setting up a database, basic SQL |
Q4: Python Strings & Data Types P7: Neighborhood Tabulation Areas P8: Restaurant SQL Queries |
DS 100: Chapter 6 (Relational Databases & SQL) | ||
#5: Monday, 13 September |
Aggregating Data in SQL and Pandas; Code Demo: Revisiting Python Functions: Applying Functions to Tables |
Q5: Coding Quiz P9: Aggregating Restaurant Data (SQL) P10: Extracting Districts |
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions) |
||
16 September | No Class | ||||
#6: Monday, 20 September |
Joining Data in SQL and Pandas
Classwork: Combining NYC schools data Code Demo: Lambda Expressions Project Overview |
Q6: Regular Expressions P11: Joining Restaurant & NTA Data Project Pre-Proposal Window Opens P12: MTA Ridership |
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions) | ||
Data Visualization | #7: Thursday, 23 September |
Plotting Numerical & Categorical Data, Time-Series Data Code Demo: Customizing Plots in matplotlib & seaborn Classwork: Plotting MTA Ridership Data Code Demo: Revisiting Python Functions: Defaults, Keywords, Unpacking Argument Lists |
Q7: SQL P13: Column Summaries P14: Library Cleaning |
DS 100: Chapter 8 (Data Representation)
DS 100: Chapter 9 (Data Quality), DS 100: Sections 11.1-11.3 (Data Visualization) Matplotlib Tools (Hands On ML) |
|
#8: Monday, 27 September |
Visualizing GIS Data, GeoJSON, Choropleth Maps, Voronoi Diagrams
Code Demo: Interactive Library Maps, School District Choropleth Maps Classwork: GeoJSON Editor |
Q8: Data Frames (Python)
P15: Plotting Challenge P16: Choropleth Attendance Cleaning |
DS 100: Sections 11.1-11.3 (Data Visualization) Folium documentation, GeoJSON Editor |
||
#9: Thursday, 30 September |
Visualization Principles Code Demo: Voronoi Diagrams, Classwork: Altair: declarative visualization techniques |
Q9: Python Functions P17: Grouping ELA/Math by Districts P18: Log Scale |
DS 100: Chapter 11 (Data Visualization), Altair overview Altair maps (gallery of case studies) |
||
Models & Loss Functions | #10: Monday, 4 October |
Probability Distributions Loss Functions: Mean Squared and Mean Absolute Error Code Demo: Textbook: Restaurant Tips |
Q10: Coding Quiz
P19: Smoothing with Gaussians Project Pre-Proposal P20: Loss Functions for Tips |
DS 100: Sections 4.2-3 (Loss Functions), Seeing Theory (Brown U), Sampling from a Normally Distributed Population (UBC) | |
#11: Thursday, 7 October |
More on Loss Functions: Huber Loss, Properties of Different Loss Functions Correlation, Linear Regression, Residuals |
Q11: Data Visualization
P21: Taxi Cleaning P22: Dice Simulator |
DS 100: Chapter 4 (Modeling Intro), DS 100: Chapter 15 (Linear Models), DS 8: Chapter 15 (Prediction), Guessing Correlation Coefficients (UBC), Residuals (UBC) |
||
11 October | No Class | ||||
#12: Thursday, 14 October |
Least Squares Probability Overview: Distributions Probability Mass Functions, Central Limit Theorem Classwork: Predicting taxi tips & costs (NYC OpenData Yellow Taxi Data) |
Q12: Loss Functions P23: Correlation Coefficients P24: Enrollments |
DS 8: Chapter 9 (Randomness), DS 100: Chapter 16 (Probability & Generalization) Central Limit Theorem (UBC), DS 100: Chapter 17 (Gradient Descent) |
||
#13: Monday, 18 October |
Recap: Confidence Intervals; Gradient Descent, Convexity, Fitting Models Classwork: Gradient Descent |
Q13: Probability & Risk P25: PMF of Senators' Ages Project: Title & Proposal P26: Weekday Entries |
DS 100: Chapter 17 (Gradient Descent), Confidence Intervals (UBC) | ||
Multiple Linear Modeling | #14: Thursday, 21 October |
Multiple Linear Regression; More on Gradient Descent | Q14: Linear Models & Gradient Descent P27: Fitting OLS P28: CS Courses |
DS 100: Chapter 19 (Multiple Linear Regression) Gradient Descent Visualization (Lili Jiang) |
|
#15: Monday, 25 October |
Feature Engineering Overview Code Demo: Walmart Sales |
Q15: Coding Quiz
P29: Predictions with MLM's Project: Peer Review #1 P30: Computing Ranges |
DS 100: Chapter 20 (Feature Engineering) | ||
#16: Thursday, 28 October |
Feature Engineering: Variable Transformations;
Bias-Variance Tradeoff Classwork: Friday Attendance Code Demo: Ice Cream Ratings |
Q16: Review P31: Sampling Distributions P32: Attendance |
DS 100: Chapter 20 (Feature Engineering)
DS 100: Chapter 21 (Bias-Variance Tradeoff) |
||
Classification | #17: Monday, 1 November |
Feature Engineering: Categorical Encoding; Overview of Regularization; Regression on Probabilities; The Logistic Model & Loss Function;
Classwork: 311 Pothole Dataset |
Q17: Feature Engineering P33: Confidence Intervals P34: Polynomial Features |
DS 100: Chapter 22 (Regularization)
DS 100: Chapter 24 (Classification) |
|
#18: Thursday, 4 November |
Using Logistic Models: Approximating the Empirical Probability Distribution; Fitting & Evaluating a Logistic Model;
Code Demo: Free Throws |
Q18: Logistic Model P35: Parking Tickets P36: Multiple Locations |
DS 100: Chapter 24 (Classification) | ||
#19: Monday, 8 November |
Evaluating Logistic Models; Confusion Matrices Classwork: Binary Digit Classifier Support Vector Machines (SVM) |
Q19: Logistic Regression P37: Score Predictor Project Check-in #1 (Data Collection) P38: Ticket Prep |
DS 100: Chapter 24 (Classification) Python DS Handbook Chapter 5 (SVMs) Karpathy's SVM Demo (Stanford) |
||
#20: Thursday, 11 November |
More on SVM's; One-Versus-Rest (OVR) Classification; Survey of Classifier Techniques Classwork: Iris Classification |
Q20: Coding Quiz P39: Binary Digit Classifier P40: Enrollment by Courses |
Data Camp Tutorial (SVM's) SVM's (sklearn) SKLearn (Recognizing Hand-Written Digits) |
||
Dimensionality Reduction | #21: Monday, 15 November |
Linear Algebra Recap; Principal Components Analysis Code Demo: Explained Visually: Eigenvectors & Eigenvalues |
Q21: Classification P41: Classifier Misses Project Check-in #2 (Analysis) P42: Ticket Predictor |
Explained Visually (Eigenvectors and Eigenvalues) Explained Visually (Principal Components Analysis) Linear Algebra Review (MIT) DS 100: Section 26.1 (PCA Dimensions) |
|
#22: Thursday, 18 November |
PCA as Dimensionality Reduction; Optimal Number of Components; Multidimensional Scaling |
Q22: Linear Algebra P43: Moving P44: Model Comparison |
DS 100: Chapter 26 (PCA) Python Data Science Handbook: Section 5.9 (PCA) |
||
#23: Monday, 22 November |
Non-Linear Dimensionality Reduction: t-SNE, UMAP
Project Update: Presentation Details: Abstract, Website, and Presentation |
Q23: PCA P45: Component Retention Project Check-in #3 (Visualization) P46: Digits Components |
Manifold Learning (sklearn) | ||
25-26 November | Thanksgiving Break: College Closed | ||||
Clustering | #24: Monday, 29 November |
Recap: Dimensionality Reduction; Code Demo: transit vs. Euclidean distances K-Means Clustering |
Q24: MDS P47: Voting MDS Project: Draft Abstract & Website P48: Transit Distances |
DS 100: Chapter 28 (Clustering)
Wiki K-Means (gif) Python Data Science Handbook: Section 5.11 (K-Means) |
|
#25: Thursday, 2 December |
K-Means: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch K-Means |
Q25: Coding Quiz P49: Toy Clusters Project: Peer Review #2 |
DS 100: Chapter 28 (Clustering)
Python Data Science Handbook: Section 5.11 (K-Means) |
||
#26: Monday, 6 December |
Other Clustering Approaches; Supervised vs. Unsupervised Learning |
Q26: K-Means Clustering P50: 4-Coloring Project: Abstract |
DS 100: Chapter 28 (Clustering)
Wiki Cluster Analysis Supervised vs. Unsupervised Learning (IBM) |
||
Replicability | #27: Thursday, 9 December |
Replicability, P-Hacking, A/B testing
Classwork: Coding Challenges (Core Python Recap) |
Q27: Review Project: Website Project: Presentation Slides |
DS 100: Chapter 25 (Replicability) | |
Review | #28: Monday, 13 December |
Review | Q28: End-of-semester Survey |
||
Monday, 20 December, 1:45-3:45pm | Final Exam |