Theme:  Topics:  Coursework:  Reading:  

Data Science Overview  #1: Thursday, 26 August 
Syllabus & Class Policies; Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference Code Demo: Textbook: Predicting Ages from SSN Data Code Demo: Python Recap: basics & standard packages (pandas, numpy, matplotlib, & seaborn), zips and list comprehensions 
Q1: Academic Integrity P1: Hello, world P2: Senators' Names 
DS 100: Chapter 1 (Data Science Lifecycle)  
#2: Monday, 30 August 
Guest Speaker: Elise Harris, Coordinator for Tech Internships and External Partnerships: Tech Internships Exploratory Data Analysis, Generalizing from Data, Data Sampling, Probability Sampling Classwork: Are Senators older than Representatives? Code Demo: Python string methods 
Q2: Python Recap P3: Senators' Ages P4: ELA Proficiency 
DS 100: Chapter 2 (Data Scope), DS 100: Chapter 3 (Data Design), DS 100: Section 13.1 (Python String Methods) 

Rectangular Data  #3: Thursday, 2 September 
Data Representation: standard primitive types, rectangular data, data tables in Python (Pandas)
Code Demo: Regular Expressions Guest Speaker: Provost Valeda Dent: The Rural Village Libraries Research Network project Classwork: Measuring impact of libraries in NYC communities 
Q3: Data Sampling P5: URL Collection P6: Regex on Restaurant Inspection Data 
DS 100: Chapter 7 (Data Tables in Python), DS 100: Sections 13.2-3 (Regular Expressions) 

6 September  Labor Day: College Closed  
#4: Thursday, 9 September 
Relational Databases and SQL Code Demo: SQL in Python: setting up a database, basic SQL 
Q4: Python Strings & Data Types P7: Neighborhood Tabulation Areas P8: Restaurant SQL Queries 
DS 100: Chapter 6 (Relational Databases & SQL)  
#5: Monday, 13 September 
Aggregating Data in SQL and Pandas: Code Demo: Revisiting Python Functions: Applying Functions to Tables 
Q5: Coding Quiz P9: Aggregating Restaurant Data (SQL) P10: Extracting Districts 
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions) 

16 September  No Class  
#6: Monday, 20 September 
Joining Data in SQL and Pandas
Classwork: Combining NYC schools data Code Demo: Lambda Expressions Project Overview 
Q6: Regular Expressions P11: Joining Restaurant & NTA Data Project Pre-Proposal Window Opens P12: MTA Ridership 
DS 100: Chapter 6 (Relational Databases & SQL), DS 100: Chapter 7 (Data Tables in Python), python.org: Section 4.7 (More on Defining Functions)  
Data Visualization  #7: Thursday, 23 September 
Plotting Numerical & Categorical Data, Time-Series Data Code Demo: Customizing Plots in matplotlib & seaborn Classwork: Plotting MTA Ridership Data Code Demo: Revisiting Python Functions: Defaults, Keywords, Unpacking Argument Lists 
Q7: SQL P13: Column Summaries P14: Library Cleaning 
DS 100: Chapter 8 (Data Representation),
DS 100: Chapter 9 (Data Quality), DS 100: Sections 11.1-11.3 (Data Visualization), Matplotlib Tools (Hands On ML) 

#8: Monday, 27 September 
Visualizing GIS Data, GeoJSON, Choropleth Maps, Voronoi Diagrams
Code Demo: Interactive Library Maps, School District Choropleth Maps Classwork: GeoJSON Editor 
Q8: Data Frames (Python)
P15: Plotting Challenge Project Pre-Proposal P16: Choropleth Attendance Cleaning 
DS 100: Sections 11.1-11.3 (Data Visualization), Folium documentation, GeoJSON Editor 

#9: Thursday, 30 September 
Visualization Principles Code Demo: Voronoi Diagrams, Classwork: Altair: declarative visualization techniques 
Q9: Python Functions P17: Grouping ELA/Math by Districts P18: Altair Challenge 
DS 100: Chapter 11 (Data Visualization), Altair overview Altair maps (gallery of case studies) 

Models & Loss Functions  #10: Monday, 4 October 
More on Modeling and Estimation,
Introduction to Models & Loss Functions: Absolute and Huber Loss
Code Demo: Textbook: Restaurant Tips 
Q10: Coding Quiz
P19: Modeling Restaurant Tips Project: Title & Proposal P20: Taxi Cleaning 
DS 100: Sections 4.2-3 (Loss Functions)  
#11: Thursday, 7 October 
Linear Regression, Least Squares
Classwork: Predicting taxi tips & costs (NYC OpenData Yellow Taxi Data) 
Q11: Data Visualization
P21: Taxi Tips Project: Peer Review #1 P22: Dice Simulator 
DS 100: Chapter 4 (Modeling Intro), DS 100: Chapter 15 (Linear Models) 

11 October  No Class  
#12: Thursday, 14 October 
Expectation & Variance, Risk,
Gradient Descent Classwork: Simulating Randomness Code Demo: Gradient Descent 
Q12: Probability & Risk P23: PMF of Senators' Ages P24: Fitting LM's to Taxi Data 
DS 8: Chapter 9 (Randomness), DS 100: Chapter 16 (Probability & Generalization) DS 100: Chapter 17 (Gradient Descent) 

#13: Monday, 18 October 
Stochastic Gradient Descent, Convexity, Fitting Models  Q13: Loss Functions P25: Project: Timeline P26: 
DS 100: Chapter 17 (Gradient Descent)  
Multiple Linear Modeling  #14: Thursday, 21 October 
Multiple Linear Regression  Q14: Linear Regression P27: MLM's for Taxi Trips P28: 
DS 100: Chapter 19 (Multiple Linear Regression)  
#15: Monday, 25 October 
Feature Engineering, Bias-Variance Tradeoff Code Demo: Predicting Ice Cream Ratings 
Q15: Coding Quiz
P29: P30: 
DS 100: Chapter 20 (Feature Engineering), DS 100: Chapter 21 (Bias-Variance Tradeoff) 

#16: Thursday, 28 October 
Regularization  Q16: Gradient Descent P31: P32: 
DS 100: Chapter 22 (Regularization)  
Classification  #17: Monday, 1 November 
Regression on Probabilities; The Logistic Model & Loss Function
Classwork: Using Logistic Regression 
Q17: P33: Project: Data Collection P34: 
DS 100: Chapter 24 (Classification)  
#18: Thursday, 4 November 
Using Logistic Models: Approximating the Empirical Probability Distribution; Fitting & Evaluating a Logistic Model;  Q18: P35: P36: 
DS 100: Chapter 24 (Classification)  
#19: Monday, 8 November 
Logistic Model: Multiclass Classification; Support Vector Machines 
Q19: P37: Project: Analysis P38: 
DS 100: Chapter 24 (Classification)  
#20: Thursday, 11 November 
Survey of Classifier Techniques  Q20: Coding Quiz P39: P40: 

Dimensionality Reduction  #21: Monday, 15 November 
Principal Components Analysis Classwork: PCA Explained Visually 
Q21: P41: Project: Visualization P42: 
Explained Visually (Principal Components Analysis) Python Data Science Handbook: Section 5.9 (PCA) DS 100: Section 26.1 (PCA Dimensions) 

#22: Thursday, 18 November 
PCA as Dimensionality Reduction, Multidimensional Scaling 
Q22: P43: Component Retention P44: Digits Components 

#23: Monday, 22 November 
Non-Linear Dimensionality Reduction: t-SNE, UMAP
Code Demo: more dimensionality reduction (sklearn) 
Q23: P45: Mystery Point P46: 
Manifold Learning (sklearn)  
25-26 November  Thanksgiving Break: College Closed  
Clustering  #24: Monday, 29 November 
Q24: P47: Project: Draft Abstract & Website P48: 

#25: Thursday, 2 December 
Q25: Coding Quiz P49: Project: Peer Review #2 

#26: Monday, 6 December 
Q26: P50: Project: Abstract 

Replicability  #27: Thursday, 9 December 
Replicability, P-Hacking, A/B Testing
Classwork: A/B Testing 
Q27: Project: Website Project: Video 
DS 100: Chapter 25 (Replicability)  
Review  #28: Monday, 13 December 
Review  Q28: End-of-semester Survey 

Monday, 20 December, 1:45-3:45pm  Final Exam 