


CSci 39542: Introduction to Data Science
Department of Computer Science
Hunter College, City University of New York
Spring 2023

TL;DR: data-focused programming course with optional project.

For questions about the course, write to: datasci AT hunter cuny edu.

Announcements:


Calendar:

Class | Topics | Coursework | Reading
Wednesday, 25 January
Syllabus & Class Policies;

Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference;

Data Scope, Big Data, Accuracy

Python Recap: I/O, dictionaries, keyword parameters, & linting
Classwork 1
Quiz 1
Program 1
DS 100: Chapter 1 (Data Science Lifecycle),
DS 100: Chapter 2 (Data Scope),

DS 100: Section 13.1 (String Methods),
Think CS: Chapter 11 (Files),
Think CS: Chapter 12 (Dictionaries),
python.org: Section 4.7 (Functions),
pylint documentation
Wednesday, 1 February
Statistics Recap: Expectation, Variance, Correlation, & Sampling

Modeling and Estimation: Linear Models, Predicting Tip Amounts, Loss Functions

Data Representation, DataFrames (Pandas), Lambda Expressions & Applying Functions
Classwork 2
Quiz 2
Program 2
Seeing Theory (Brown U),
Guessing Correlation Coefficients (GeoGebra),
Computing Correlations (Real Python),
Residuals (UBC),

DS 100: Chapter 3 (Data Design),
DS 100: Chapter 4 (Modeling with Summary Statistics),
DS 100: Chapter 15 (Linear Models),
DS 100: Chapter 6 (DataFrames),

Constructing DataFrames (pydata.org),
python.org: Section 4.7 (Functions)
Wednesday, 8 February
Data Representation: Structure & Granularity, Joining & Transforming Data in Pandas, Handling Missing Values (Imputation) & Modifying Structure

Linear Regression

Python Recap: list comprehensions & zip()

Project Overview
Classwork 3
Quiz 3
Program 3
DS 100: Chapter 6 (DataFrames),
DS 100: Section 8.5 (Table Shape & Granularity),
DS 100: Chapter 9 (Data Wrangling),
DS 100: Chapter 15 (Linear Models),

Think CS: Section 10.23 (List Comprehensions),
Zip Tutorial (RealPython)
Wednesday, 15 February
Fitting Linear Models, Loss Functions: Mean Squared and Mean Absolute Error, Serializing and Evaluating Models

Visualizing Qualitative & Quantitative Data, Time-Series Data,
Customizing Plots in plotly, matplotlib & seaborn
Classwork 4
Quiz 4
Program 4
DS 100: Sections 4.2-4.3 (Loss Functions),
DS 100: Chapter 10 (Exploratory Data Analysis),
DS 100: Sections 11.1-11.3 (Data Visualization),
DS 100: Chapter 15 (Linear Models),

Python Object Serialization Docs (Pickling),
Hands-On ML (Matplotlib Tools)
Wednesday, 22 February
Visualizing GIS Data: GeoJSON Format, Choropleth Maps, Voronoi Diagrams, Visualization Principles

Multiple Linear Regression, More on Fitting Models: Convexity, Validating, & Gradient Descent;
Feature Engineering: Categorical Encoding
Classwork 5
Quiz 5
Program 5

Opt-in for Optional Project
DS 8: Chapter 15 (Prediction),
DS 100: Chapter 11 (Data Visualization),
DS 100: Section 15.3 (Multiple Linear Regression),
DS 100: Section 15.6 (Feature Engineering),
DS 100: Chapter 20 (Gradient Descent),

Gradient Descent Visualization (Lili Jiang),
Folium documentation,
GeoJSON Editor
Wednesday, 1 March
Probability and Generalization: Distributions, Probability Mass Functions, Confidence Intervals, Smoothing;

Feature Engineering: Variable Transformations;
Training Models: Cross Validation, Ridge Regularization (L2) & Lasso Regularization (L1); Bias-Variance Tradeoff
Classwork 6
Quiz 6
Program 6

Project Proposal Window Opens
DS 100: Chapter 16 (Model Selection),
DS 100: Chapter 17 (Probability & Generalization),
DS 100: Appendix (Cross Validation),
Confidence Intervals (UBC),

Sampling from a Normally Distributed Population (UBC),
Wednesday, 8 March
Regression on Probabilities; The Logistic Model & Loss Function; Using Logistic Models: Fitting & Evaluating a Logistic Model

Hypothesis Testing, Central Limit Theorem
Classwork 7
Quiz 7
Program 7

Proposal for Optional Project
DS 100: Chapter 17 (Theory for Inference & Prediction),
DS 100: Chapter 19 (Classification),

Central Limit Theorem (UBC),
Recognizing Hand-Written Digits (sklearn),
Confusion Matrices (sklearn)
Wednesday, 15 March
Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues

One-Versus-Rest (OVR) Classification; Other Approaches: Naive Bayes, Support Vector Machines (SVMs), Decision Trees & Random Forests
Classwork 8
Quiz 8
Program 8
DS 100: Chapter 19 (Classification),

Explained Visually (Eigenvectors and Eigenvalues),
Explained Visually (Principal Components Analysis),
Linear Algebra Review (MIT),
Python DS Handbook: Chapter 5 (SVMs),
Karpathy's SVM Demo (Stanford),
DataCamp Tutorial (SVMs),
SVMs (sklearn),
Wednesday, 22 March
Vector Space Recap; Intrinsic Dimensionality (Scree Plots); Principal Components Analysis (PCA)

Non-Euclidean Distances; Multidimensional Scaling (MDS)
Classwork 9
Quiz 9
Program 9

Project: Interim Check-In Opens
DS 100: Chapter 19 (Classification),
DS 100: Chapter 22 (PCA),
Python DS Handbook: Section 5.09 (PCA),
Wednesday, 29 March
Other Dimensionality Reduction: Multidimensional Scaling; Non-Linear Dimensionality Reduction: t-SNE, UMAP

K-Means Clustering: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch;
More on Clustering: Gaussian Mixture Models; Hierarchical Clustering
Classwork 10
Quiz 10
Program 10

Project: Interim Check-In Closes
Python DS Handbook: Section 5.10 (Manifold Learning),
Manifold Learning (sklearn),
K-Means gif (wiki),
DS 100: Chapter 24 (Clustering),
Python DS Handbook: Section 5.11 (K-Means),
5-13 April Spring Break: No Classes
Wednesday, 19 April
Supervised vs. Unsupervised Learning;

Regular Expressions
Classwork 11
Quiz 11
Program 11
Supervised vs. Unsupervised Learning (IBM),
Python DS Handbook: Section 5.12 (Gaussian Mixture Models),
Cluster Analysis (wiki),
Wednesday, 26 April
Hypothesis Testing & A/B Testing

Relational Databases and SQL, Part 1
Classwork 12
Quiz 12
Program 12
DS Chapter 21 (Replicability)
DS 100: Sections 13.2-3 (Regular Expressions)
DS 100: Chapter 7 (Relational Databases & SQL)
Wednesday, 3 May
Relational Databases and SQL, Part 2

Code Demo: SQL in Python
Classwork 13
Quiz 13
Program 13

Project: Final Version
Project: Presentation Video
DS 100: Chapter 7 (Relational Databases & SQL)
Wednesday, 10 May
Final Exam: Coding

Project Showcase

Semester Review
Classwork 14
Quiz 14: End-of-Semester Survey
Wednesday, 17 May, 11:30am-1:30pm
Final Exam: Written
(This file was last modified on 12 April 2023.)