Theme:  Topics:  Coursework:  Reading:  

Data Science Overview  #1: Thursday, 26 August 
Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference Code Demo: Textbook: Predicting Ages from SSN Data Code Demo: Python Recap: basics & standard packages (pandas, numpy, matplotlib, & seaborn), zips and list comprehensions 
Q1: Academic Integrity P1: Hello, world P2: Senators' Names 
DS 100: Chapter 1 (Data Science Lifecycle)  
#2: Monday, 30 August 
Guest Speaker: Elise Harris, Coordinator for Tech Internships and External Partnerships: Tech Internships Exploratory Data Analysis, Generalizing from Data, Data Sampling, Probability Sampling Classwork: Are Senators older than Representatives? Code Demo: Python string methods 
Q2: Python Recap P3: Senators' Ages P4: ELA Proficiency 
DS 100: Chapter 2 (Data Scope), DS 100: Chapter 3 (Data Design), DS 100: Section 13.1 (Python String Methods) 

Rectangular Data  #3: Thursday, 2 September 
Data Representation: standard primitive types, rectangular data, data tables in Python (Pandas)
Code Demo: Regular Expressions Guest Speaker: Provost Valeda Dent: The Rural Village Libraries Research Network project Classwork: Measuring impact of libraries in NYC communities 
Q3: Data Sampling P5: URL Collection P6: Regex on Restaurant Inspection Data 
DS 100: Chapter 7 (Data Tables in Python), DS 100: Sections 13.2-3 (Regular Expressions) 

6 September  Labor Day: College Closed  
#4: Thursday, 9 September 
Relational Databases and SQL Code Demo: SQL in Python: setting up a database, basic SQL 
Q4: Python Strings & Data Types P7: Neighborhood Tabulation Areas P8: Restaurant SQL Queries 
DS 100: Chapter 6 (Relational Databases & SQL)  
#5: Monday, 13 September 
Aggregating Data in SQL and Pandas: Code Demo: Revisiting Python Functions: Applying Functions to Tables 
Q5: Coding Quiz P9: Aggregating Restaurant Data (SQL) P10: Extracting Districts 
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions) 

16 September  No Class  
#6: Monday, 20 September 
Joining Data in SQL and Pandas
Classwork: Combining NYC schools data Code Demo: Lambda Expressions Project Overview 
Q6: Regular Expressions P11: Joining Restaurant & NTA Data Project Pre-Proposal Window Opens P12: MTA Ridership 
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions)  
Data Visualization  #7: Thursday, 23 September 
Plotting Numerical & Categorical Data, Time-Series Data Code Demo: Customizing Plots in matplotlib & seaborn Classwork: Plotting MTA Ridership Data Code Demo: Revisiting Python Functions: Defaults, Keywords, Unpacking Argument Lists 
Q7: SQL P13: Column Summaries P14: Library Cleaning 
DS 100: Chapter 8 (Data Representation)
DS 100: Chapter 9 (Data Quality), DS 100: Sections 11.1-11.3 (Data Visualization), Matplotlib Tools (Hands On ML) 

#8: Monday, 27 September 
Visualizing GIS Data, GeoJSON, Choropleth Maps, Voronoi Diagrams
Code Demo: Interactive Library Maps, School District Choropleth Maps Classwork: GeoJSON Editor 
Q8: Data Frames (Python)
P15: Plotting Challenge P16: Choropleth Attendance Cleaning 
DS 100: Sections 11.1-11.3 (Data Visualization), Folium documentation, GeoJSON Editor 

#9: Thursday, 30 September 
Visualization Principles Code Demo: Voronoi Diagrams, Classwork: Altair: declarative visualization techniques 
Q9: Python Functions P17: Grouping ELA/Math by Districts P18: Log Scale 
DS 100: Chapter 11 (Data Visualization), Altair overview Altair maps (gallery of case studies) 

Models & Loss Functions  #10: Monday, 4 October 
Probability Distributions Loss Functions: Mean Squared and Mean Absolute Error Code Demo: Textbook: Restaurant Tips 
Q10: Coding Quiz
P19: Smoothing with Gaussians Project Pre-Proposal P20: Loss Functions for Tips 
DS 100: Sections 4.2-3 (Loss Functions), Seeing Theory (Brown U), Sampling from a Normally Distributed Population (UBC)  
#11: Thursday, 7 October 
More on Loss Functions: Huber Loss, Properties of Different Loss Functions Correlation, Linear Regression, Residuals 
Q11: Data Visualization
P21: Taxi Cleaning P22: Dice Simulator 
DS 100: Chapter 4 (Modeling Intro), DS 100: Chapter 15 (Linear Models), DS 8: Chapter 15 (Prediction), Guessing Correlation Coefficients (UBC), Residuals (UBC) 

11 October  No Class  
#12: Thursday, 14 October 
Least Squares Probability Overview: Distributions Probability Mass Functions, Central Limit Theorem Classwork: Predicting taxi tips & costs (NYC OpenData Yellow Taxi Data) 
Q12: Loss Functions P23: Correlation Coefficients P24: Enrollments 
DS 8: Chapter 9 (Randomness), DS 100: Chapter 16 (Probability & Generalization) Central Limit Theorem (UBC), DS 100: Chapter 17 (Gradient Descent) 

#13: Monday, 18 October 
Recap: Confidence Intervals; Gradient Descent, Convexity, Fitting Models Classwork: Gradient Descent 
Q13: Probability & Risk P25: PMF of Senators' Ages Project: Title & Proposal P26: Weekday Entries 
DS 100: Chapter 17 (Gradient Descent), Confidence Intervals (UBC)  
Multiple Linear Modeling  #14: Thursday, 21 October 
Multiple Linear Regression; More on Gradient Descent  Q14: Linear Models & Gradient Descent P27: Fitting OLS P28: CS Courses 
DS 100: Chapter 19 (Multiple Linear Regression) Gradient Descent Visualization (Lili Jiang) 

#15: Monday, 25 October 
Feature Engineering Overview Code Demo: Walmart Sales 
Q15: Coding Quiz
P29: Predictions with MLM's Project: Peer Review #1 P30: Computing Ranges 
DS 100: Chapter 20 (Feature Engineering)  
#16: Thursday, 28 October 
Feature Engineering: Variable Transformations;
Bias-Variance Tradeoff Classwork: Friday Attendance Code Demo: Ice Cream Ratings 
Q16: Review P31: Sampling Distributions P32: Attendance 
DS 100: Chapter 20 (Feature Engineering)
DS 100: Chapter 21 (Bias-Variance Tradeoff) 

Classification  #17: Monday, 1 November 
Feature Engineering: Categorical Encoding; Overview of Regularization; Regression on Probabilities; The Logistic Model & Loss Function;
Classwork: 311 Pothole Dataset 
Q17: Feature Engineering P33: Confidence Intervals P34: Polynomial Features 
DS 100: Chapter 22 (Regularization)
DS 100: Chapter 24 (Classification) 

#18: Thursday, 4 November 
Using Logistic Models: Approximating the Empirical Probability Distribution; Fitting & Evaluating a Logistic Model;
Code Demo: Free Throws 
Q18: Logistic Model P35: Parking Tickets P36: Multiple Locations 
DS 100: Chapter 24 (Classification)  
#19: Monday, 8 November 
Evaluating Logistic Models; Confusion Matrices Classwork: Binary Digit Classifier Support Vector Machines (SVM) 
Q19: Logistic Regression P37: Score Predictor Project Check-in #1 (Data Collection) P38: Ticket Prep 
DS 100: Chapter 24 (Classification) Python DS Handbook Chapter 5 (SVMs) Karpathy's SVM Demo (Stanford) 

#20: Thursday, 11 November 
More on SVM's; One-Versus-Rest (OVR) Classification; Survey of Classifier Techniques Classwork: Iris Classification 
Q20: Coding Quiz P39: Binary Digit Classifier P40: Enrollment by Courses 
Data Camp Tutorial (SVM's) SVM's (sklearn) SKLearn (Recognizing Hand-Written Digits) 

Dimensionality Reduction  #21: Monday, 15 November 
Linear Algebra Recap; Principal Components Analysis Code Demo: Explained Visually: Eigenvectors & Eigenvalues 
Q21: Classification P41: Classifier Misses Project Check-in #2 (Analysis) P42: Ticket Predictor 
Explained Visually (Eigenvectors and Eigenvalues) Explained Visually (Principal Components Analysis) Linear Algebra Review (MIT) DS 100: Section 26.1 (PCA Dimensions) 

#22: Thursday, 18 November 
PCA as Dimensionality Reduction; Optimal Number of Components Multiple Dimensional Scaling 
Q22: Linear Algebra P43: Moving P44: Model Comparison 
DS 100: Chapter 26 (PCA) Python Data Science Handbook: Section 5.9 (PCA) 

#23: Monday, 22 November 
Non-Linear Dimensionality Reduction: t-SNE, UMAP
Project Update: Presentation Details: Abstract, Website, and Presentation 
Q23: PCA P45: Component Retention Project Check-in #3 (Visualization) P46: Digits Components 
Manifold Learning (sklearn)  
25-26 November  Thanksgiving Break: College Closed  
Clustering  #24: Monday, 29 November 
Recap: Dimensionality Reduction; Code Demo: transit vs. Euclidean distances K-Means Clustering 
Q24: MDS P47: Voting MDS Project: Draft Abstract & Website P48: Transit Distances 
DS 100: Chapter 28 (clustering)
Wiki K-Means (gif) Python Data Science Handbook: Section 5.11 (K-Means) 

#25: Thursday, 2 December 
K-Means: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), Mini-Batch K-Means 
Q25: Coding Quiz P49: Toy Clusters Project: Peer Review #2 
DS 100: Chapter 28 (clustering)
Python Data Science Handbook: Section 5.11 (K-Means) 

#26: Monday, 6 December 
Other Clustering Approaches; Supervised vs. Unsupervised Learning 
Q26: K-Means Clustering P50: 4-Coloring Project: Abstract 
DS 100: Chapter 28 (clustering)
Wiki Cluster Analysis Supervised vs. Unsupervised Learning (IBM) 

Replicability  #27: Thursday, 9 December 
Replicability, P-Hacking, A/B testing
Classwork: Coding Challenges (Core Python Recap) 
Q27: Review Project: Website Project: Presentation Slides 
DS 100: Chapter 25 (Replicability)  
Review  #28: Monday, 13 December 
Review  Q28: End-of-semester Survey 

Monday, 20 December, 1:45-3:45pm  Final Exam 