Date:  Topics:  Handouts:  Reading:  Quiz Topics:  HW/Project: 
#1
2 February 
First Day Details, Topics Overview, Python 2 vs. 3, Python Refresher: basics; Quick look at matplotlib's line and bar charts;  Syllabus,
DS Venn diagram, Gallery: NY density, nearest airport, citibike, precincts, buses vs. subways, transit + census, life spans, ebola, disease, jobs; Printing (from __future__), Plotting recipes, matplotlib, Textbook's repo 
Academic Integrity Policy, Chapters 1-3 
#1: Academic Integrity  
#2
4 February 
More on matplotlib:
histograms and scatterplots;
Data as vectors: scaling, dot products;
Means & Variance;
Python Refresher: list comprehensions & zip 
list comprehension examples, matplotlib, Textbook's repo, summaries sometimes hide the big picture, Anscombe's Quartet  Chapters 2,4,5  
4 February  Last day to drop without "WD" grade  
9 February  No class: Classes follow a Friday schedule  
#3
11 February 
Statistics: Basics;
Python Refresher: lists, tuples, & dictionaries 
weather.py, lymeScaled.py, lists vs. tuples, basic stats, dictionary examples  Chapters 2,5  #2: Python Basics  HW #1: Simple graphs with pyplot 
#4
16 February 
More on Stats: Correlation & Causation, Simpson's Paradox; Getting Data: CSV Files 
book's statistics.py (depends on linear_algebra.py), Simpson's paradox wiki, wage growth paradox, simple csv example & data  Chapters 2,6,9  #3: Vectors, Means, and Variances  HW #2: Scaling Vector Data 
#5
18 February 
Probability: Distributions & Central Limit Theorem;
Python Refresher: collections 
dsWiki.txt (for group work), normal distribution calculator, rolling dice, Central Limit Theorem Visualized, Matt Nedrich on CLT  Chapters 2,6  
18 February  Last day to drop with "WD" grade  
#6
23 February 
Bayes Theorem;
Naive Bayes: Spam Filter Example;
Python Refresher: regular expressions 
regex cheat sheet, book's naive Bayes spam filter, spam dataset  Chapters 2,6,13  #4: Python Lists, Dictionaries, & csv  HW #3: Binning Data & Measuring Dispersion 
#7
25 February 
More on Bayes Theorem; Hypothesis & Inference;
Applications; Python Refresher: more on matplotlib & sets 
book's naive Bayes spam filter, spam dataset, twoPlots.py, subplots  Chapters 2,7,8  
#8
1 March 
Hypothesis & Inference: Confidence Intervals; Python Refresher: more on matplotlib  Khan Academy on hypothesis testing, normal distribution calculator  Chapters 2,7  #5: Correlation & Bayes Theorem  HW #4: Correlations & Distributions 
#9
3 March 
More on Confidence Intervals, A/B Testing;
Python Refresher: numpy 
Khan Academy on confidence intervals, numpy, plotting revisited  Chapters 7,25  
#10
8 March 
Manipulating image files with numpy; Gradient Descent: Estimating, Choosing Right Step Size 
scipy lecture notes on arrays, arrays & images, Matt Nedrich's intro to gradient descent & example, Quinn Liu's gradient descent image, 3d surface example code, mplot3d tutorial, matplotlib colormaps  Chapters 8,9,25  #6: Regular Expressions  HW #5: Bayes Theorem, Simpson's Paradox, & Regular Expressions 
#11
10 March 
More on gradient descent;
Example: Simple Linear Regression; Geographical maps in matplotlib: basemap 
Matt Nedrich's intro to gradient descent &
example, Andrew Ng's linear regression notes; basemap, basemap introduction 
Chapters 2,8,9  
#12
15 March 
Linear Algebra Refresher: Eigenvalues & Eigenvectors; Using standard data formats: ESRI's shapefiles, JSON, KML; More on basemap: using shapefiles; 
Eigenvectors & eigenvalues, visually,
linear transformations example;
ESRI's shapefiles, shapefile wikipage, json, KML, summary & comparison, gdal conversion tools, NYC shapefiles, shapefiles in basemap tutorial, shapefiles in basemap 
Chapters 9,10  #7: Hypothesis & Inference  HW #6: A/B Testing 
#13
17 March 
Using github; Working with Data: Exploring and Visualizing; More on Getting Data: scraping webpages, built-in methods, beautifulSoup; Python Refresher: command line, args & kwargs 
github for beginners,
github Hello World, github student pack,
github cheat sheet; Anscombe's Quartet; beautifulSoup, soup documentation, where's beautifulSoup?, Frances Zlotnick's tutorial, DOM tutorial, book's code 
Chapters 2,10,25  
#14
22 March 
Working with Multidimensional Data: Rescaling, Principal Components Analysis; Not from scratch: scipy, scikit-learn & Visualization; Python Refresher: iterators & generators 
PCA, explained visually, Lindsay Smith's computing PCA, Sebastian Raschka's PCA overview and implementing in Python; scipy, sklearn's PCA, pca on iris dataset, NY Fed's unemployment rates and by major 
Chapters 2,10,25  #8: Gradient Descent & numpy  HW #7: Gradient Descent & Images 
#15
24 March 
Machine Learning: Modeling, Overfitting, Feature Extraction & Selection;
Python Refresher: lambdas & functions as arguments 
generators in Python, lambdaSortingEx.py  Chapters 2,11  
#16
29 March 
Other plotting packages: D3 (javascript) and bokeh (python); Distances for Multidimensional Data; k-Nearest Neighbors: Language Example, Curse of Dimensionality; Python Refresher: exceptions 
Data-Driven Documents (D3),
bokeh (D3-styled graphics in Python),
bokeh quickstart,
bokehPlottingEx.py, bokehChartEx.py 
Chapters 2,11,12  #9: Eigenvectors & eigenvalues  HW #8: Mapping Data & Markov Chains; Project: Proposal 
#17
31 March 
Nearest Neighbors & Voronoi Diagrams; Clustering: k-means 
nearest airport, precincts' Voronoi diagram,
Voronoi diagrams from triangulations, scipy Voronoi module,
k-means (wiki), k-means image example, scikit-learn clustering 
Chapters 12,19  
#18
5 April 
More on clustering: hierarchical clustering, Multidimensional Scaling (MDS)  k-means example, k-nearest-neighbor versus k-means, scikit-learn clustering,
NYC Schools, MS data (for in class); scikit's MDS, Noel O'Boyle's map example, Zachary Nichols' NYC scaled to commute time and part 2 
Chapters 10,19  #10: Using github & beautifulSoup  HW #9: Shading Maps & PCA 
#19
7 April 
Linear Regression, revisited;
Multiple Regression 
regression recap  Chapters 14-15  
11 April  Last day to drop with "W" grade  
#20
12 April 
More on Regression:
The Bootstrap,
Logistic Regression; Support Vector Machines 
logistic regression wiki, Marcel Caraciolo's university entrance example,
dummies on iris data set,
sklearn logistic regression,
311 Requests (filter for Descriptor = "Pothole"), bootstrapping wiki, Auckland animation resampling from sample vs. samples 
Chapter 16  #11: PCA  HW #10: Nearest Neighbors
Project: Timeline 
#21
14 April 
More on SVMs; Natural Language Processing (NLP) 
SVM intro,
sklearn ML introduction,
sklearn svm,
face recognition,
sklearn ML intro,
sklearn ML advanced

Chapters 16,20  
#22
19 April 
More on NLP; Decision Trees 
wordle,
Google's ngram viewer,
Norvig's ngrams, wiki decision trees, sklearn decision trees 
Chapters 17,20  #12: Nearest Neighbors & Clustering  HW #11: Voronoi Diagrams & Clustering
Project: Data Collection 
#23
21 April 
Refresher: Trees & Graphs;
Network Analysis 
networkx tutorial, Cambridge tutorial, graph review  Chapter 21  
22-30 April  Spring Recess: No Classes  
#24
3 May 
Recommender Systems; Neural Networks 
book's network analysis script, networkx built-in graphs, Knuth miles data, deep learning tutorial (Stanford), neural net wiki  Chapters 18, 22  #13: Regression & NLP  HW #12: MDS & Regression; Project: Analysis; Project: Visualization & Draft Slide 
#25
5 May 
MapReduce & PageRank  PageRank as applied lin. alg. (SIAM Review 2006)  Chapter 23  
#26
10 May 
Crash Course in SQL  Khan Academy on SQL, sqlitebrowser, sqlite, SQL lab  Chapter 24  Complete Project  
#27
12 May 
Not from scratch: iPython (jupyter), pandas, and seaborn 
Thomas Wiecki's modern guide to data science,
OpenTechSchool iPython tutorial, pandas cookbook, cheat sheet, seaborn, elevator data 
Chapter 25  
#28
17 May 
Project Presentations  Project Sneak Preview Slide  
19-20 May  Reading Days (no class)  
Tuesday 24 May 11am-1pm 
Optional Review (meets in Gillet 137)  
Thursday 26 May 11am-1pm 
Final Examination (required) 