CSci 39542: Introduction to Data Science
Department of Computer Science
Hunter College, City University of New York
Fall 2023

TL;DR: data-focused programming course with optional project.

For questions about the course, write to: datasci AT hunter cuny edu.

Announcements:


Calendar:

Tentative schedule, subject to change:
Week: Topics: Coursework: Reading:
Week 0: Friday,
25 August
Syllabus & Frequently Asked Questions Classwork 0
Week 1: Wednesday,
30 August
Syllabus & Class Policies;

Data Science Lifecycle: Question Formulation, Data Acquisition and Cleaning, Exploratory Data Analysis, Prediction and Inference

Data Scope, Big Data, Accuracy

Python Recap: dictionaries, I/O, keyword parameters, & linting
Classwork 1 DS 100: Chapter 1 (Data Science Lifecycle),
DS 100: Chapter 2 (Data Scope),
DS 100: Chapter 4 (Modeling with Summary Statistics),

Think CS: Chapter 12 (Dictionaries),
DS 100: Section 13.1 (String Methods),
Think CS: Chapter 11 (Files),
python.org: Section 4.7 (Functions),
pylint documentation
Week 2: Wednesday,
6 September
Statistics Recap: Expectation, Variance, Correlation, Residuals & Sampling

Linear Regression, Loss Functions: Mean Squared and Mean Absolute Error, Data Representation, DataFrames (Pandas)

Python Recap: Lambda Expressions & Applying Functions
Classwork 2
Seeing Theory (Brown U),
Guessing Correlation Coefficients (GeoGebra),
Computing Correlations (Real Python),
Residuals (UBC),

DS 100: Chapter 3 (Simulation & Data Design),
DS 100: Chapter 15 (Linear Models),

DS 100: Chapter 6 (DataFrames),
Constructing DataFrames (pydata.org)
DS 100: Section 8.5 (Table Shape & Granularity),
python.org: Section 4.7 (Functions)
Friday,
8 September
Program 1
Week 3: Wednesday,
13 September
Multiple Linear Regression, Handling Missing Values (Imputation), Feature Engineering: Categorical Encoding

Joining & Transforming Data in Pandas

Python Recap: List comprehensions & zips

Project Overview
Classwork 3 DS 100: Chapter 6 (DataFrames),
DS 100: Chapter 9 (Data Wrangling),
DS 100: Chapter 15 (Linear Models),

Think CS: Section 10.23 (List Comprehensions),
Zip Tutorial (RealPython)
Week 4: Wednesday,
20 September
Fitting Models with sklearn, More on Loss Functions

Visualizing Qualitative & Quantitative Data, Time-Series Data,
Customizing Plots in plotly, matplotlib & seaborn

Serializing & Evaluating Models (pickling)
Python Recap: dates & times
Project Overview
Classwork 4
DS 100: Sections 4.2-3 (Loss Functions),
DS 100: Chapter 10 (Exploratory Data Analysis),
DS 100: Sections 11.1-11.3 (Data Visualization),
DS 100: Chapter 15 (Linear Models),

Python Object Serialization Docs (Pickling),
Hands On ML (Matplotlib Tools)
Friday,
22 September
Program 2
Week 5: Wednesday,
27 September
Visualizing GIS Data: GeoJSON Format, Choropleth Maps, Voronoi Diagrams, Visualization Principles

Polynomial Models, Training Models: Cross Validation, Ridge Regularization (L2) & Lasso Regularization (L1); Bias-Variance Tradeoff
More on Fitting Models: Convexity, Validating, & Gradient Descent

Testing Frameworks
Classwork 5 DS 8: Chapter 15 (Prediction),
DS 100: Chapter 11 (Data Visualization),
DS 100: Section 15.4 (Multiple Linear Regression),
DS 100: Section 15.7 (Feature Engineering),
DS 100: Section 16.3 (Cross Validation),
DS 100: Chapter 20 (Gradient Descent),

Folium documentation,
GeoJSON Editor,
Gradient Descent Visualization (Lili Jiang),
Think CS: Unit Testing, Pytest
Week 6: Wednesday,
4 October
Probability and Generalization: Distributions, Probability Mass Functions, Confidence Intervals, Smoothing;
Hypothesis Testing, Central Limit Theorem

Review
Classwork 6


Opt-in for Optional Project
DS 100: Chapter 17 (Theory for Inference & Prediction),
DS 100: Chapter 17 (Probability & Generalization),

Sampling from a Normally Distributed Population (UBC),
Central Limit Theorem (UBC),
Confidence Intervals (UBC)
Friday,
6 October
Program 3
Week 7: Wednesday,
11 October
Midterm Exam
Classwork 7
Thursday,
12 October
Project Proposal Window Opens
Week 8: Wednesday,
18 October

Regression on Probabilities; The Logistic Model & Loss Function; Using Logistic Models: Fitting & Evaluating a Logistic Model

Linear Algebra Recap: Vectors, Matrices, Eigenvectors & Eigenvalues

Classification: Support Vector Machines (SVMs)

Classwork 8 DS 100: Chapter 19 (Classification),

Recognizing Hand-Written Digits (sklearn)
Confusion Matrices (sklearn)

Explained Visually (Eigenvectors and Eigenvalues),
Linear Algebra Review (MIT),

Python DS Handbook Chapter 5 (SVMs),
Karpathy's SVM Demo (Stanford),
Data Camp Tutorial (SVMs),
SVMs (sklearn)
Friday,
22 October
Proposal for Optional Project
Week 9: Wednesday,
25 October
Multi-class Classification; Other Classifiers: Naive Bayes, Decision Trees & Random Forests

Intrinsic Dimensionality (Scree Plots); Principal Components Analysis (PCA)
Classwork 9 DS 100: Chapter 19 (Classification),
DS 100: Chapter 22 (PCA)

Decision Trees; Bias & Variance (R2D3),
Python DS Handbook: Section 5.09 (PCA),
Explained Visually (Principal Components Analysis)
Week 10: Wednesday,
1 November
Multidimensional Scaling (MDS); Non-Euclidean Distances

Other Dimensionality Reduction: Multidimensional Scaling; Non-Linear Dimensionality Reduction: t-SNE, UMAP
Classwork 10 Python DS Handbook: Section 5.10 (Manifold Learning),
Manifold Learning (sklearn)
Friday,
3 November
Program 4
Week 11: Wednesday,
8 November
K-Means Clustering: Clustering Complexity, Lloyd's Algorithm (Naive K-Means), MiniBatch;
More on Clustering: Gaussian Mixture Models, Hierarchical Clustering, Spectral Clustering
Classwork 11 Spectral Clustering (Kaggle),
Clustering (Carpentry),
Spectral Clustering (Great Learning),
K-Means gif (wiki),
DS 100: Chapter 24 (Clustering),
Python DS Handbook: Section 5.11 (K-Means),
Python DS Handbook: Section 5.12 (Gaussian Mixture Models),
Cluster Analysis (wiki)
Friday,
10 November
Project Draft / Interim Check-In
Week 12: Wednesday,
15 November
Supervised vs. Unsupervised Learning;

Regular Expressions

Relational Databases, Structured Query Language (SQL) Basics
Classwork 12 Machine Learning Summary (PDSH)
Supervised vs. Unsupervised Learning (IBM)
DS 100: Sections 13.2-3 (Regular Expressions)
Friday,
17 November
Program 5
22-24 November Thanksgiving Break: No Classes
Week 13: Wednesday,
29 November
SQL: Aggregating, Joining, & Transforming Data

More on Regular Expressions
Classwork 13 DS 100: Chapter 7 (Relational Databases & SQL)
Friday,
1 December
Project: Final Code & Slides Submission
Project: Pre-recorded Video Submission (if not doing an in-class live demo)
Week 14: Wednesday,
6 December
Project Showcase

Semester Review
Classwork 14 DS 100: Chapter 7 (Relational Databases & SQL)
Friday,
8 December
Program 6
Wednesday, 12 December Reading Day: No Class
Wednesday, 20 December
11:30am-1:30pm
Final Exam
(This file was last modified on 15 November 2023.)