Date: | Topics: | Handouts: | Reading: | Quiz Topics: | HW/Project: |
#1
2 February |
First Day Details, Topics Overview, Python 2 vs. 3, Python Refresher: basics; Quick look at matplotlib's line and bar charts; | Syllabus,
DS venn diagram, Gallery: NY density, nearest airport, citibike, precincts, buses vs. subways, transit + census, life spans, ebola, disease, jobs; Printing (from __future__), Plotting recipes, matplotlib, Textbook's repo |
Academic Integrity Policy, Chapters 1-3 |
#1: Academic Integrity | |
#2
4 February |
More on matplotlib:
histograms and scatterplots;
Data as vectors: scaling, dot products;
Means & Variance;
Python Refresher: list comprehensions & zip |
list comprehension examples, matplotlib, Textbook's repo, summaries sometimes hide the big picture, Anscombe's Quartet | Chapters 2,4,5 | ||
4 February | Last day to drop without "WD" grade | ||||
9 February | No class: Classes follow a Friday schedule | ||||
#3
11 February |
Statistics: Basics;
Python Refresher: lists, tuples, & dictionaries |
weather.py, lymeScaled.py, lists vs. tuples, basic stats, dictionary examples | Chapters 2,5 | #2: Python Basics | HW #1: Simple graphs with pyplot |
#4
16 February |
More on Stats: Correlation & Causation, Simpson's Paradox; Getting Data: CSV Files |
book's statistics.py (depends on linear_algebra.py), Simpson's paradox wiki, wage growth paradox, simple csv example & data | Chapters 2,6,9 | #3: Vectors, Means, and Variances | HW #2: Scaling Vector Data |
#5
18 February |
Probability: Distributions & Central Limit Theorem;
Python Refresher: collections |
dsWiki.txt (for group work), normal distribution calculator, rolling dice, Central Limit Theorem Visualized, Matt Nedrich on CLT | Chapters 2,6 | ||
18 February | Last day to drop with "WD" grade | ||||
#6
23 February |
Bayes Theorem;
Naive Bayes: Spam Filter Example;
Python Refresher: regular expressions |
regex cheat sheet, book's naive Bayes spam filter, spam dataset | Chapters 2,6,13 | #4: Python Lists, Dictionaries, & csv | HW #3: Binning Data & Measuring Dispersion |
#7
25 February |
More on Bayes Theorem; Hypothesis & Inference;
Applications; Python Refresher: more on matplotlib & sets |
book's naive Bayes spam filter, spam dataset, twoPlots.py, subplots | Chapters 2,7,8 | ||
#8
1 March |
Hypothesis & Inference: Confidence Intervals; Python Refresher: more on matplotlib | Khan Academy on hypothesis testing, normal distribution calculator | Chapters 2,7 | #5: Correlation & Bayes Theorem | HW #4: Correlations & Distributions |
#9
3 March |
More on Confidence Intervals, A/B Testing;
Python Refresher: numpy |
Khan Academy on confidence intervals, numpy, plotting revisited | Chapters 7,25 | ||
#10
8 March |
Manipulating image files with numpy; Gradient Descent: Estimating, Choosing Right Step Size |
scipy lecture notes on arrays, arrays & images, Matt Nedrich's intro to gradient descent & example, Quinn Liu's gradient descent image, 3d surface example code, mplot3d tutorial, matplotlib colormaps | Chapters 8,9,25 | #6: Regular Expressions | HW #5: Bayes Theorem, Simpson's Paradox, & Regular Expressions |
#11
10 March |
More on gradient descent;
Example: Simple Linear Regression; Geographical maps in matplotlib: basemap |
Matt Nedrich's intro to gradient descent &
example, Andrew Ng's linear regression notes; basemap, basemap introduction |
Chapters 2,8,9 | ||
#12
15 March |
Linear Algebra Refresher: Eigenvalues & Eigenvectors; Using standard data formats: ESRI's shapefiles, JSON, KML; More on basemap: using shapefiles |
Eigenvectors & eigenvalues, visually,
linear transformations example;
ESRI's shapefiles, shapefile wikipage, json, KML, summary & comparison, gdal conversion tools, NYC shapefiles, shapefiles in basemap tutorial, shapefiles in basemap |
Chapters 9,10 | #7: Hypothesis & Inference | HW #6: A/B Testing |
#13
17 March |
Using github; Working with Data: Exploring and Visualizing; More on Getting Data: scraping webpages, built-in methods, beautifulSoup; Python Refresher: command line, args & kwargs |
github for beginners,
github Hello World, github student pack,
github cheat sheet; Anscombe's Quartet; beautifulSoup, soup documentation, where's beautifulSoup?, Frances Zlotnick's tutorial, DOM tutorial, book's code |
Chapters 2,10,25 | ||
#14
22 March |
Working with Multidimensional Data: Rescaling, Principal Components Analysis; Not from scratch: scipy, scikit-learn & Visualization; Python Refresher: iterators & generators |
PCA, explained visually, Lindsay Smith's computing PCA, Sebastian Raschka's PCA overview and implementing in Python; scipy, sklearn's PCA, pca on iris dataset, NY Fed's unemployment rates and by major |
Chapters 2,10,25 | #8: Gradient Descent & numpy | HW #7: Gradient Descent & Images |
#15
24 March |
Machine Learning: Modeling, Overfitting, Feature Extraction & Selection;
Python Refresher: lambdas & functions as arguments |
generators in Python, lambdaSortingEx.py | Chapters 2,11 | ||
#16
29 March |
Other plotting packages: D3 (javascript) and bokeh (python); Distances for Multidimensional Data; k-Nearest Neighbors: Language Example, Curse of Dimensionality; Python Refresher: exceptions |
Data Driven Documents (D3),
bokeh (D3 styled graphics in Python),
bokeh quickstart,
bokehPlottingEx.py, bokehChartEx.py |
Chapters 2,11,12 | #9: Eigenvectors & eigenvalues | HW #8: Mapping Data & Markov Chains
Project: Proposal |
#17
31 March |
Nearest Neighbors & Voronoi Diagrams; Clustering: k-means |
nearest airport, precincts' Voronoi diagram,
Voronoi diagrams from triangulations, scipy Voronoi module;
k-means (wiki), k-means image example, scikit-learn clustering |
Chapters 12,19 | ||
#18
5 April |
More on clustering: hierarchical clustering, Multidimensional Scaling (MDS) | k means example, k-nearest-neighbor versus k-means, scikit-learn clustering,
NYC Schools, MS data (for in class), scikit's MDS, Noel O'Boyle's map example, Zachary Nichols' NYC scaled to commute time and part 2 |
Chapters 10,19 | #10: Using github & beautifulSoup | HW #9: Shading Maps & PCA |
#19
7 April |
Linear Regression, revisited;
Multiple Regression |
regression recap | Chapters 14-15 | ||
11 April | Last day to drop with "W" grade | ||||
#20
12 April |
More on Regression:
The Bootstrap,
Logistic Regression; Support Vector Machines |
logistic regression wiki, Marcel Caraciolo's university entrance example,
dummies on iris data set,
sklearn logistic regression,
311 Requests (filter for Descriptor = "Pothole"), bootstrapping wiki, Auckland animation re-sampling from sample vs. samples |
Chapter 16 | #11: PCA | HW #10: Nearest Neighbors
Project: Timeline |
#21
14 April |
More on SVMs; Natural Language Processing (NLP) |
SVM intro,
sklearn ML introduction,
sklearn svm,
face recognition,
sklearn ML intro,
sklearn ML advanced
|
Chapters 16,20 | ||
#22
19 April |
More on NLP; Decision Trees |
wordle,
Google's ngram viewer,
Norvig's ngrams; wiki decision trees, sklearn decision trees |
Chapters 17,20 | #12: Nearest Neighbors & Clustering | HW #11: Voronoi Diagrams & Clustering
Project: Data Collection |
#23
21 April |
Refresher: Trees & Graphs;
Network Analysis |
networkx tutorial, Cambridge tutorial, graph review | Chapter 21 | ||
22-30 April | Spring Recess: No Classes | ||||
#24
3 May |
Recommender Systems; Neural Networks |
book's network analysis script, networkx built-in graphs, Knuth miles data, deep learning tutorial (Stanford), neural net wiki | Chapters 18, 22 | #13: Regression & NLP | HW #12: MDS & Regression
Project: Analysis
Project: Visualization & Draft Slide |
#25
5 May |
MapReduce & PageRank | PageRank as applied lin. alg. (SIAM Review 2006) | Chapter 23 | ||
#26
10 May |
Crash Course in SQL | Khan Academy on SQL, sqlitebrowser, sqlite, SQL lab | Chapter 24 | Complete Project | |
#27
12 May |
Not from scratch: iPython (jupyter), pandas, and seaborn |
Thomas Wiecki's modern guide to data science,
OpenTechSchool iPython tutorial, pandas cookbook, cheat sheet, seaborn, elevator data |
Chapter 25 | ||
#28
17 May |
Project Presentations | Project Sneak Preview Slide | |||
19-20 May | Reading Days (no class) | ||||
Tuesday 24 May 11am-1pm |
Optional Review (meets in Gillet 137) | ||||
Thursday 26 May 11am-1pm |
Final Examination (required) |