Syllabus

CMP 464-C401/MAT 456-01:
Topics Course: Data Science

Spring 2016
Tuesdays & Thursdays: 11am-12:40pm
Prof. Katherine St. John

General Information

Description: 4 hours, 4 credits: Topics Course: Data Science. Analyzing data sets to extract new insights. Topics: acquisition/scraping, data mining, storage, and visualization using Python and R. The emphasis of the course is on strengthening Python programming and analytic reasoning skills via analysis of real-world data.
Prerequisites: Linear Algebra or Data Structures, proficiency in Python.

Grading Policy

Expectations: Students are expected to learn both the material covered in class and the material in the textbook and other assigned reading. Completing homework is an essential part of the learning experience. Students should review topics from prior courses as needed using old notes and books.

Honor Code: You are encouraged to work together on the overall design of the programs and homework. However, for specific programs and homework assignments, all work must be your own. You are responsible for knowing and following Lehman's Academic Integrity Policy (available from the Undergraduate Bulletin, Graduate Bulletin, or the Office of Academic Standards and Evaluations). All incidents of cheating will be reported to the Vice President of Student Affairs.

Homework: Programming exercises are posted on the class website, usually two weeks before the due date. They reinforce concepts covered in lecture. Note that as the semester progresses, the programs will require work on design and programming outside of class to complete. To receive full credit, a program must perform correctly, include comments, and be written in good style. Accompanying written analysis is expected to be in standard written English (i.e. use complete sentences and proper grammar). Unless otherwise noted, homework is due weekly by 10:30am on Thursdays and is submitted via Blackboard. No late homework is accepted.

Quizzes: Instead of mid-term examinations, weekly quizzes will be used to assess mastery of the material.

Project: A final project is required for this course. The grade for the project is a combination of grades earned on the milestones (i.e. deadlines during the semester to keep the projects on track) and the overall submitted program.

Final Exam: As with all undergraduate courses at Lehman College, the final exam is required. The registrar has assigned the final examination time of:

Thursday, 26 May, 11am-1pm.

Grades: The grading for the course will be based on:

You must take and pass the final to pass the course.

Materials, Resources and Accommodating Disabilities

Textbook: The textbook is Data Science From Scratch by Joel Grus (ISBN 978-1491901427). Approximate price: $30 (available on-line from Amazon & O'Reilly).

Technology: This course uses the Python programming language, available from python.org. We will be using the numpy, matplotlib, and scipy libraries. These do not come with the default installation of Python, but they are included with Anaconda, or they can be installed individually from the download pages of each library.

If you would prefer not to upgrade your Python installation (or would like to work through a browser), there are several web-based services. One that already has the libraries we are using is PythonAnywhere.
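
To check whether a given Python installation (local or web-based) already has the three libraries, a short script along the following lines may help. This is only a sketch, and it assumes Python 3.4 or later; the function name missing_libraries is made up for illustration and is not part of any course material.

```python
import importlib.util

# The three libraries named in the Technology paragraph above.
REQUIRED = ["numpy", "matplotlib", "scipy"]


def missing_libraries(names=REQUIRED):
    """Return the names that the import system cannot locate.

    Hypothetical helper for illustration; it looks each module up
    without actually importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All course libraries are available.")
```

If the script reports missing libraries, installing Anaconda or using a web-based service that bundles them are the two options described above.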

Computer Access: Part of this course will use university computer laboratories. These machines are for work related to this course only, and a code of conduct applies to computer use in the department and on campus. Misusing university computers could result in losing your computer access for the rest of the term, making it exceedingly difficult to complete this course.

Accommodating Disabilities: Lehman College is committed to providing access to all programs and curricula to all students. Students with disabilities who may need classroom accommodations are encouraged to register with the Office of Student Disability Services. For more info, please contact the Office of Student Disability Services, Shuster Hall, Room 238, phone number, 718-960-8441.

Course Objectives

At the end of the course, students should be able to:
  1. Acquire data sets from multiple sources and write programs that can extract (scrape) the data into a usable form.
  2. Use data mining to extract new insights about the data.
  3. Understand basic storage techniques and constraints.
  4. Analyze data using standard techniques from statistics and linear algebra.
  5. Visualize data using popular Python modules.