Data science uses techniques from computing, mathematics, and statistics to extract new insights from large data sets. It is a very broad field but at its core is the use of automated techniques to analyze and make inferences from inputted data.
This course is aimed at motivated students who want to apply the algorithms and mathematics they have learned to real data sets. To do this, you should have a good understanding of how to represent data in 2-dimensional lists: that is, multi-dimensional arrays and graph representations (from a data structures course) or matrices (from linear algebra).
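As a rough illustration of the kind of representation meant here, the sketch below stores a small matrix and a small graph as Python lists of lists. The specific values and the variable names (matrix, adjacency, degrees) are made up for this example and are not taken from the course materials.

```python
# A 3x3 matrix stored as a list of rows (a "2-dimensional list").
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Element access is row first, then column (both indexed from 0).
center = matrix[1][1]

# Transpose: column j of the original becomes row j of the result.
transpose = [[row[j] for row in matrix] for j in range(len(matrix[0]))]

# The same structure can represent an undirected graph as an adjacency matrix:
# adjacency[i][j] == 1 means there is an edge between node i and node j.
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

# The degree of each node is the number of 1s in its row.
degrees = [sum(row) for row in adjacency]
```

If reading and writing code like this feels comfortable, you likely have the background in 2-dimensional lists that the course assumes.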
This course will focus on data acquisition (how do you take data from multiple sources and put it in usable forms), data storage, data mining and basic machine learning, and visualization.
For the undergraduate sections, the prerequisites are linear algebra or data structures. In addition, proficiency in the Python programming language is assumed. The undergraduate sections will cover all the statistics needed.
Those enrolled in the graduate level course are expected also to have completed undergraduate calculus-based statistics. The graduate level assignments will assume fluency in undergraduate statistics, linear algebra, and multivariate calculus.
No. This is a fast-paced, programming-intensive course. You need to have solid knowledge of the prerequisites (linear algebra or data structures, and Python proficiency) to pass the course. While it is often possible to circumvent the permissions system, please don't. Doing so will not benefit you and will likely result in you doing poorly in all your courses while you struggle to keep up with one for which you are not prepared.
Python and R are the key languages for data analytics because of the ease with which they manipulate files and matrices and perform statistical methods, and because of the add-on modules available for visualization. They are also key to some of the highest-paid tech fields: data analytics and machine learning.
Codecademy has a Python tutorial that is a great starting point. You can also work through the exercises for introductory programming: a series of 50 short programming challenges that cover the basics for this course.
The mathematics topics course will use computers much as the calculus labs did: using the computer to compute ideas developed in class (in this case, analyzing data focused on NYC). If you did well in the calculus labs, you will do fine in the topics course. If you would like a quick refresher on Python, here's a fun one and a more serious one at Codecademy.
Enrollment is via the CUNYFirst system. See the College Registrar for more details.
Yes. Enrollment is via epermit.cuny.edu using your portal credentials. While colleges use a common system to request e-Permits, each department and each college has its own rules and preferences about allowing students to take a course at a different campus. If you want to e-Permit a course, start the process early.
CMP 464 and MAT 456 are numbers assigned to the topics courses in the department. The topics change each semester, and there are often multiple sections offered at once. The section numbers for Data Science are:
The lectures are the same, but the assignments are different for the computer science and mathematics versions of the course. Both sections have analytic reasoning (including programming and proofs) about large data sets. The computer science section requires more programming; the mathematics section requires more proofs.
The lectures are the same, but the assignments are different for the undergraduate and graduate versions. The graduate version of the course has significantly more work, assumes knowledge of undergraduate statistics, and expects a much higher level of sophistication in the programming and projects submitted.
It takes about 24 hours from the time of registration for you to be added to a course roster on Blackboard. If it has been more than a day and the course does not appear, check your course list in your CUNYFirst account to make sure that your registration has been processed and you are officially enrolled. If you are enrolled, contact Blackboard Support in the Information Technology Division.
If you are enrolled, contact Blackboard Support in the Information Technology Division. You can also visit the Help Desk in the Computer Center (first floor, Carman Hall) in person. They can reset passwords and help with simple Blackboard issues.
Yes. Attending class is an integral part of the learning process.
Most students spend 6-12 hours a week outside of class hours. Federal guidelines state that 12 credit hours is considered full-time enrollment. Given a nominal 40-hour work week, this translates to an expected (40-12)/12, or about 2 to 3, hours a week of outside class time per credit hour. Since this is a 4-credit course, your expected time is about 8 to 12 hours of out-of-class work per week.
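The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate as stated: it assumes, as the formula does, that each credit hour corresponds to one hour in class per week, and the variable names are chosen for this example.

```python
work_week_hours = 40      # nominal full-time work week
full_time_credits = 12    # credit hours considered full-time enrollment
course_credits = 4        # credits for this course

# Hours left after class time, spread evenly over the credit hours.
# float() keeps the division exact under both Python 2 and Python 3.
outside_hours_per_credit = (work_week_hours - full_time_credits) / float(full_time_credits)

# Expected out-of-class hours per week for this course.
expected_outside_hours = course_credits * outside_hours_per_credit

print(round(outside_hours_per_credit, 2))   # about 2.33
print(round(expected_outside_hours, 2))     # about 9.33
```

The result of roughly 9 hours falls inside the 8 to 12 hour range quoted above.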
The textbook is Data Science From Scratch by Joel Grus. The book is required for the course.
Python is freely available from python.org.
For this course, we use Python 2 (any stable version). Note that there are large changes between Python 2 and Python 3. We are using Python 2.7 since many libraries do not yet run under Python 3.
We will be using the numpy, matplotlib, and scipy libraries. These do not come with the default installation of Python, but they are included with Anaconda. Alternatively, you can install them from the download pages of these libraries.
If you would prefer not to upgrade your Python installation (or would like to work through a browser), there are several web-based services. One that already has the libraries we are using is PythonAnywhere.
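One way to see whether these libraries are already present on your machine is to try importing each one. The helper below is a hypothetical convenience function written for this page, not part of any of the libraries, and it only reports whether an import succeeds.

```python
def check_libraries(names=("numpy", "matplotlib", "scipy")):
    """Return a dict mapping each library name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            __import__(name)          # built-in; imports a module by its name
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Print one line per library, in alphabetical order.
for name, ok in sorted(check_libraries().items()):
    print("%s: %s" % (name, "installed" if ok else "missing"))
```

Any library reported as missing needs to be installed before you start the assignments.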
Yes. If you do not have a computer at home, there are computers available on-campus with Python.
You can also use on-line resources such as datajoy.com.
Python 3 is available on computers in:
Active learning increases student performance. Instead of passively listening or watching someone else write or type programs, it is much more effective to have active discussions, work together in pairs or small groups, and take part in other activities that emphasize higher-order learning. It also provides an excellent avenue to practice explaining technical ideas to others, a skill you will need for future STEM courses and future jobs in technical fields.
Yes. All undergraduate courses at Lehman College are required to have final examinations offered during finals week.
The final examination is cumulative and passing it shows that you have mastered all of the learning objectives of the course.
An essential component to programming and technical work is presenting and communicating ideas concisely to others (without the use of a search engine or the Python shell). The communicating of technical information is so important to many companies that they include a paper or oral quiz (no computer allowed) on key concepts during the interview. Companies hire you for your analytic reasoning and programming skills, not your ability to google answers (since they want employees who can also solve novel problems whose solutions aren't already available via a search engine). Some go beyond just key concepts and ask for you to sketch solutions for novel questions and situations during the interview (the most famous is the Google engineering interview).
The final examination times for all courses are announced by the Registrar's Office. This class has been assigned:
No. We are happy that you are doing well in your other courses, but we are required to treat all students equally and base your grade on the work submitted for this course.