


Program 2: Parking Tickets
CSci 39542: Introduction to Data Science
Department of Computer Science
Hunter College, City University of New York
Spring 2022



Program Description

Program 2: Parking Tickets. Due noon, Thursday, 17 February.
Learning Objective: to refresh students' knowledge of Pandas' functionality to manipulate and create columns from formatted data.
Available Libraries: Pandas and core Python 3.6+.
Data Sources:
Parking Tickets, NYC OpenData and Parking ticket violation codes (summary of codes & fines).
Sample Datasets:


Recent news articles focused on the significantly higher percentage of parking tickets that are unpaid for cars with out-of-state plates:

That data is aggregated across the whole city. Does the same pattern hold when the datasets are restricted to individual neighborhoods? To answer that question, and to find the most common reasons for tickets, we will use the parking ticket data from NYC OpenData. In Lecture 3, we started cleaning the parking ticket data. This program continues that cleaning and introduces auxiliary files that link each stored violation code to a short explanation of the violation. The assignment is broken into the following functions to allow for unit testing:

For example, assuming your functions are in p2.py:

import p2
df = p2.make_df('Parking_Violations_Issued_Precinct_19_2021.csv')
print(df)
will print:
        Summons Number Plate ID  ...     Street Name Vehicle Color
0           1474094223  KDT3875  ...            E 75         BLACK
1           1474094600  GTW5034  ...  EAST 70 STREET            BK
2           1474116280  HXM6089  ...         E 72 ST            BK
3           1474116310  HRW4832  ...         E 72 ST           GRY
4           1474143209  JPR6583  ...  EAST 94 STREET         BLACK
...                ...      ...  ...             ...           ...
451504      8954357854  JRF3892  ...         5th Ave            GR
451505      8955665040   199VP4  ...       E 74th St         BLACK
451506      8955665064   196WL7  ...       E 78th St         BLACK
451507      8970451729  CNK4113  ...        York Ave            GY
451508      8998400418   XJWV98  ...        York Ave         WHITE

[451509 rows x 11 columns]
Note that all the rows are included (451,509) but that only the 11 specified columns are retained in the DataFrame.
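A minimal sketch of how make_df() might work, assuming it only needs to read the CSV and keep a fixed set of columns. The full list of 11 columns is not reproduced on this page, so ASSUMED_COLUMNS below is a hypothetical stand-in built from column names that appear elsewhere here; the columns parameter and the made-up sample rows are also illustrative, not part of the assignment.

```python
import io

import pandas as pd

# Hypothetical column list: the assignment specifies 11 columns, which are not
# all listed on this page. Replace with the columns named in the assignment.
ASSUMED_COLUMNS = ['Summons Number', 'Plate ID', 'Registration State',
                   'Plate Type', 'Violation Code', 'Street Name', 'Vehicle Color']


def make_df(file_name, columns=ASSUMED_COLUMNS):
    """Read a parking-ticket CSV, keeping every row but only `columns`."""
    # usecols discards the other columns while reading, which saves memory on
    # large files. Kept columns appear in file order, not in `columns` order.
    return pd.read_csv(file_name, usecols=columns)


# Made-up rows in an in-memory "file"; read_csv also accepts file-like objects.
sample = io.StringIO(
    "Summons Number,Plate ID,Registration State,Plate Type,Issue Date,"
    "Violation Code,Street Name,Vehicle Color\n"
    "1000000001,AAA1111,NY,PAS,01/04/2021,21,E 75 ST,BLACK\n"
    "1000000002,BBB2222,NJ,COM,01/05/2021,14,YORK AVE,WHITE\n"
)
df = make_df(sample)
print(df.shape)  # (2, 7): both rows kept, 'Issue Date' dropped
```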

Looking at the registration types (Plate Type):

print(f"Registration: {df['Plate Type'].unique()}")
print(f"\n10 Most Common:  {df['Plate Type'].value_counts()[:10]}")
prints many different types of registrations and abbreviations:
Registration: ['PAS' 'SRF' 'OMS' 'COM' '999' 'SPO' 'OMT' 'MOT' 'RGL' 'PHS' 'MED' 'TRC'
'APP' 'SRN' 'OML' 'ITP' 'CMB' 'ORG' 'AMB' 'DLR' 'IRP' 'TOW' 'MCL' 'CBS'
'LMB' 'USC' 'CME' 'RGC' 'VAS' 'ORC' 'HIS' 'STG' 'AGR' 'TRA' 'CHC' 'SOS'
'BOB' 'OMR' 'TRL' 'AGC' 'CSP' 'PSD' 'SPC' 'MCD' 'NLM' 'CMH' 'LMA' 'JCA'
'SCL' 'HAM' 'AYG' 'NYA' 'OMV']

10 Most Common:  PAS    262875
COM    168827
SRF      2834
APP      2800
OMT      2603
OMS      2464
MED      1433
999      1352
CMB      1208
LMB      1135
Name: Plate Type, dtype: int64
The two most common registration types are passenger (PAS) and commercial (COM) plates. Their combined share of all tickets is:
count = len(df)
pasCount = (df['Plate Type'] == 'PAS').sum()
comCount = (df['Plate Type'] == 'COM').sum()
print(f'{count} tickets, {100*(pasCount+comCount)/count:.2f} percent are for passenger or commercial plates.')

For the Precinct 19 dataset, which contains almost half a million tickets, this prints:

451509 tickets, 95.61 percent are for passenger or commercial plates.
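The same share can be computed without counting each type separately. The sketch below uses made-up plate types in place of df['Plate Type'] and shows two standard pandas approaches: value_counts(normalize=True), which returns fractions instead of counts, and a boolean mask from isin(), whose mean is the fraction of True values.

```python
import pandas as pd

# Made-up plate types for illustration; on the real data use df['Plate Type'].
plate_types = pd.Series(['PAS', 'COM', 'PAS', 'OMT', 'PAS', 'COM', '999', 'PAS'])

# normalize=True gives each type's fraction of all rows instead of a count.
shares = plate_types.value_counts(normalize=True)
pas_com_share = shares.get('PAS', 0) + shares.get('COM', 0)

# Boolean-mask version: isin() marks matching rows, and the mean of a
# boolean Series is the fraction of rows that are True.
pas_com_share_mask = plate_types.isin(['PAS', 'COM']).mean()

print(f'{100 * pas_com_share:.2f} percent are passenger or commercial plates.')
```

With 6 of the 8 sample rows being PAS or COM, both versions give 0.75, so the message reports 75.00 percent.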
By default, filter_reg() keeps only the passenger and commercial plates:
dff = p2.filter_reg(df)
print(f'The length of the filtered data frame is {len(dff)}.')
will print:
The length of the filtered data frame is 431702.
By passing a different list of registration types to the keep keyword argument, we can filter for other registration types (see the DMV's Registration Types), such as motorcycles:
df2 = p2.filter_reg(df,keep=['MOT','HSM','LMA','LMB'])
print(f'The length of the filtered data frame is {len(df2)}.')
will print:
The length of the filtered data frame is 2095.
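One possible shape for filter_reg(), sketched under the assumption that it simply keeps rows whose Plate Type is in keep. The default of passenger and commercial plates is consistent with the numbers above: 262875 PAS + 168827 COM = 431702, the filtered length. The sample DataFrame is made up.

```python
import pandas as pd


def filter_reg(df, keep=None):
    """Return only the rows whose 'Plate Type' is in `keep` (default PAS, COM)."""
    # None avoids a mutable default argument; fill in the default here.
    if keep is None:
        keep = ['PAS', 'COM']
    # isin() builds a boolean mask. .copy() returns an independent DataFrame,
    # so adding a column later (e.g. 'NYPlates') does not raise pandas'
    # SettingWithCopyWarning.
    return df[df['Plate Type'].isin(keep)].copy()


tickets = pd.DataFrame({'Plate ID': ['A1', 'B2', 'C3', 'D4', 'E5'],
                        'Plate Type': ['PAS', 'COM', 'MOT', 'SRF', 'LMB']})
print(len(filter_reg(tickets)))                                # 2
print(len(filter_reg(tickets, keep=['MOT', 'HSM', 'LMA', 'LMB'])))  # 2
```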
Working with the motorcycle DataFrame, we can add a column indicating whether the vehicle is registered in New York:
df2['NYPlates'] = df2['Registration State'].apply(p2.add_indicator)
print(df2.head())
will print:
      Summons Number Plate ID  ... Vehicle Color NYPlates
3888      8778381423   MD677M  ...         SILVE        1
5967      1475041184   92BF34  ...           BLK        1
6177      1477342850   40TZ78  ...            RD        1
6985      8514394770   16UD95  ...         BLACK        1
7221      8624098440   77BD79  ...         BLACK        1
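A minimal sketch of add_indicator(), assuming it returns 1 when the registration state is the two-letter code 'NY' and 0 otherwise (the output above only shows the value 1). The made-up DataFrame also shows a vectorized alternative that compares the whole column at once instead of calling a Python function per row.

```python
import pandas as pd


def add_indicator(reg):
    """Return 1 if the registration state is New York ('NY'), else 0."""
    return 1 if reg == 'NY' else 0


motorcycles = pd.DataFrame({'Plate ID': ['M1', 'M2', 'M3'],
                            'Registration State': ['NY', 'NJ', 'NY']})

# Row-by-row version, as in the example above.
motorcycles['NYPlates'] = motorcycles['Registration State'].apply(add_indicator)

# Vectorized alternative: element-wise comparison, then cast True/False to 1/0.
motorcycles['NYPlatesVec'] = (motorcycles['Registration State'] == 'NY').astype(int)

print(motorcycles['NYPlates'].tolist())  # [1, 0, 1]
```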
We can also look up the tickets issued to a given Plate ID and use the dictionary of violation codes to find out what each ticket was for:
print(f'Motorcycles with most tickets:\n {df2["Plate ID"].value_counts()[:5]}')
code_lookup = p2.make_dict('ticket_codes.csv')
ticket_codes = p2.find_tickets(df2,'19UB23')
descrip = [code_lookup[str(t)] for t in ticket_codes]
print(f'The motorcycle with plate 19UB23 got the following tickets: {descrip}')
will print:
Motorcycles with most tickets:
19UB23    14
80BD05    10
38SV33     9
66TZ74     8
70TW50     8
Name: Plate ID, dtype: int64
The motorcycle with plate 19UB23 got the following tickets: ['NO PARKING-STREET CLEANING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'INSP. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'INSP. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING', 'REG. STICKER-EXPIRED/MISSING']
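A sketch of make_dict() and find_tickets() under stated assumptions. The layout of ticket_codes.csv is not shown on this page, so the column names CODE and DEFINITION are hypothetical; check the file's header and adjust. Reading with dtype=str keeps the codes as strings, which matches lookups of the form code_lookup[str(t)] above. The two-row codes file and the tickets are stand-in data; verify codes against the real file.

```python
import io

import pandas as pd


def make_dict(file_name, key_col='CODE', value_col='DEFINITION'):
    """Map each violation code (as a string) to its description.

    key_col and value_col are hypothetical column names.
    """
    # dtype=str reads every column as text, so the keys are strings like '21'.
    codes = pd.read_csv(file_name, dtype=str)
    return dict(zip(codes[key_col], codes[value_col]))


def find_tickets(df, plate_id):
    """Return the list of violation codes issued to `plate_id`."""
    return df.loc[df['Plate ID'] == plate_id, 'Violation Code'].tolist()


# Stand-in data for illustration only.
codes_csv = io.StringIO("CODE,DEFINITION\n"
                        "21,NO PARKING-STREET CLEANING\n"
                        "70,REG. STICKER-EXPIRED/MISSING\n")
code_lookup = make_dict(codes_csv)
tickets = pd.DataFrame({'Plate ID': ['TEST01', 'OTHER1', 'TEST01'],
                        'Violation Code': [21, 70, 70]})
descrip = [code_lookup[str(t)] for t in find_tickets(tickets, 'TEST01')]
print(descrip)  # ['NO PARKING-STREET CLEANING', 'REG. STICKER-EXPIRED/MISSING']
```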

Note: you should submit a file with only the standard comments at the top, these functions, and any helper functions you have written. The grading scripts will then import the file for testing. If your file includes code outside of functions, either comment that code out before submitting or use a main function that is conditionally executed (see Think CS: Section 6.8 for details).
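The conditional main pattern looks roughly like this. count_rows() is a hypothetical placeholder standing in for the assignment's functions; the point is that code inside main() runs when the file is executed directly but not when the grading script imports it.

```python
def count_rows(rows):
    """Hypothetical helper standing in for the assignment's functions."""
    return len(rows)


def main():
    # Exploratory code and print statements go here, so they do not run
    # when the file is imported with `import p2`.
    print(count_rows(['a', 'b', 'c']))


# __name__ equals '__main__' only when the file is run directly
# (e.g. `python p2.py`), not when it is imported as a module.
if __name__ == '__main__':
    main()
```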

Hints: