Program 1: Turnstile Counts
CSci 39542: Introduction to Data Science
Department of Computer Science
Hunter College, City University of New York
Spring 2023



Program Description

Program 1: Turnstile Counts. Due 10am, Wednesday, 1 February.
Learning Objective: to build competency with dictionaries and string functions of core Python.
Available Libraries: Core Python 3.6+ only.
Data Sources: MTA's Turnstile Data.
Sample Datasets: Week ending 10/29/22 (turnstile_221029.txt), Week ending 6/11/22 (turnstile_220611.txt).

The NYC MTA provides counts of the number of entries and exits through every turnstile in every subway station, as well as daily counts for the entire system.
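As a rough illustration of the string handling involved, the sketch below splits one comma-separated record into fields and builds a key in the 'UNIT,SCP' form that appears in the sample output on this page (for example, 'R259,00-00-00'). The helper name parse_line, the column positions (unit at index 1, SCP at index 2, station at index 3, entries at index 9), and the sample row are all assumptions made for this example; check them against the header line of the file you download.

```python
def parse_line(line):
    """Hypothetical helper: split one raw record into (key, station, entries)."""
    # strip() removes the trailing newline and padding before splitting on commas.
    fields = line.strip().split(',')
    # Assumed layout: fields[1] is the remote unit, fields[2] the turnstile (SCP).
    key = fields[1] + ',' + fields[2]
    station = fields[3]
    # Counts are zero-padded strings; int() drops the leading zeros.
    entries = int(fields[9])
    return key, station, entries

# Made-up row, for illustration only (not taken from the MTA files):
sample = ("X000,R999,00-00-00,EXAMPLE ST,A,IND,01/01/2023,00:00:00,"
          "REGULAR,0000001000,0000000500\n")
print(parse_line(sample))
# ('R999,00-00-00', 'EXAMPLE ST', 1000)
```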

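Write a program that will compute the number of entries at subway stations using the MTA turnstile dataset. To allow for unit testing, your program should define the following functions, whose docstrings are given in the skeleton at the end of this page:

make_dictionary(data, kind = "min")
get_turnstiles(station_dict, stations = None)
compute_ridership(min_dict, max_dict, turnstiles = None)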

For example, open the data file turnstile_220611.txt, storing the data in data:

file_name = 'turnstile_220611.txt'
#Store the file contents in data:
with open(file_name,encoding='UTF-8') as file_d:
  lines = file_d.readlines()
#Discard first line with headers:
data = lines[1:]
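Each element of data is one raw line of the file, and a turnstile appears on many lines. One possible way to keep the smallest value seen per key is the dict.get pattern sketched below. This is only a sketch: running_min and the (key, value) pairs are hypothetical and stand in for whatever you extract from each line.

```python
def running_min(rows):
    """Hypothetical helper: rows is an iterable of (key, value) pairs.

    Returns a dictionary mapping each key to the smallest value seen for it.
    """
    smallest = {}
    for key, value in rows:
        # get() falls back to the current value the first time a key appears.
        smallest[key] = min(value, smallest.get(key, value))
    return smallest

# Made-up pairs, for illustration only:
rows = [('R999,00-00-00', 1000), ('R999,00-00-00', 1040),
        ('R999,00-00-01', 7), ('R999,00-00-00', 990)]
print(running_min(rows))
# {'R999,00-00-00': 990, 'R999,00-00-01': 7}
```

Swapping min for max gives the corresponding largest value per key.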
Next, let's use the functions above to set up the three dictionaries:
min_dict = make_dictionary(data, kind = "min")
max_dict = make_dictionary(data, kind = "max")
station_dict = make_dictionary(data, kind = "station")

#Print out the station names, alphabetically, without duplicates:
print(f'All stations: {sorted(list(set(station_dict.values())))}')
gives the output:
All stations: ['1 AV', '103 ST', '103 ST-CORONA', '104 ST', '110 ST', '111 ST', '116 ST', '116 ST-COLUMBIA', '121 ST', '125 ST', '135 ST', '137 ST CITY COL', '138/GRAND CONC', '14 ST', '14 ST-UNION SQ', '145 ST', '149/GRAND CONC', '14TH STREET', '15 ST-PROSPECT', '155 ST', '157 ST', '161/YANKEE STAD', '163 ST-AMSTERDM', '167 ST', '168 ST', '169 ST', '170 ST', '174 ST', '174-175 STS', '175 ST', '176 ST', '18 AV', '18 ST', '181 ST', '182-183 STS', '183 ST', '190 ST', '191 ST', '2 AV', '20 AV', '207 ST', '21 ST', '21 ST-QNSBRIDGE', '215 ST', '219 ST', '225 ST', '23 ST', '231 ST', '233 ST', '238 ST', '25 AV', '25 ST', '28 ST', '3 AV', '3 AV 138 ST', '3 AV-149 ST', '30 AV', '33 ST', '33 ST-RAWSON ST', '34 ST-HERALD SQ', '34 ST-HUDSON YD', '34 ST-PENN STA', '36 AV', '36 ST', '39 AV', '4 AV-9 ST', '40 ST LOWERY ST', '42 ST-BRYANT PK', '42 ST-PORT AUTH', '45 ST', '46 ST', '46 ST BLISS ST', '47-50 STS ROCK', '49 ST', '4AV-9 ST', '5 AV/53 ST', '5 AV/59 ST', '5 AVE', '50 ST', '51 ST', '52 ST', '53 ST', '55 ST', '57 ST', '57 ST-7 AV', '59 ST', '59 ST COLUMBUS', '6 AV', '61 ST WOODSIDE', '63 DR-REGO PARK', '65 ST', '66 ST-LINCOLN', '67 AV', '68ST-HUNTER CO', '69 ST', '7 AV', '71 ST', '72 ST', '72 ST-2 AVE', '74 ST-BROADWAY', '75 AV', '75 ST-ELDERTS', '77 ST', '79 ST', '8 AV', '8 ST-NYU', '80 ST', '81 ST-MUSEUM', '82 ST-JACKSON H', '85 ST-FOREST PK', '86 ST', '86 ST-2 AVE', '88 ST', '9 AV', '90 ST-ELMHURST', '96 ST', '96 ST-2 AVE', '9TH STREET', 'ALABAMA AV', 'ALLERTON AV', 'AQUEDUCT N.COND', 'AQUEDUCT RACETR', 'ASTOR PL', 'ASTORIA BLVD', 'ASTORIA DITMARS', 'ATL AV-BARCLAY', 'ATLANTIC AV', 'AVENUE H', 'AVENUE I', 'AVENUE J', 'AVENUE M', 'AVENUE N', 'AVENUE P', 'AVENUE U', 'AVENUE X', "B'WAY-LAFAYETTE", 'BAY 50 ST', 'BAY PKWY', 'BAY RIDGE AV', 'BAY RIDGE-95 ST', 'BAYCHESTER AV', 'BEACH 105 ST', 'BEACH 25 ST', 'BEACH 36 ST', 'BEACH 44 ST', 'BEACH 60 ST', 'BEACH 67 ST', 'BEACH 90 ST', 'BEACH 98 ST', 'BEDFORD AV', 'BEDFORD PK BLVD', 'BEDFORD-NOSTRAN', 'BERGEN ST', 'BEVERLEY ROAD', 
'BEVERLY RD', 'BLEECKER ST', 'BOROUGH HALL', 'BOTANIC GARDEN', 'BOWERY', 'BOWLING GREEN', 'BRIARWOOD', 'BRIGHTON BEACH', 'BROAD CHANNEL', 'BROAD ST', 'BROADWAY', 'BROADWAY JCT', 'BRONX PARK EAST', 'BROOK AV', 'BROOKLYN BRIDGE', 'BUHRE AV', 'BURKE AV', 'BURNSIDE AV', 'BUSHWICK AV', 'CANAL ST', 'CANARSIE-ROCKAW', 'CARROLL ST', 'CASTLE HILL AV', 'CATHEDRAL PKWY', 'CENTRAL AV', 'CENTRAL PK N110', 'CHAMBERS ST', 'CHAUNCEY ST', 'CHRISTOPHER ST', 'CHURCH AV', 'CITY / BUS', 'CITY HALL', 'CLARK ST', 'CLASSON AV', 'CLEVELAND ST', 'CLINTON-WASH AV', 'CONEY IS-STILLW', 'CORTELYOU RD', 'CORTLANDT ST', 'COURT SQ', 'COURT SQ-23 ST', 'CRESCENT ST', 'CROWN HTS-UTICA', 'CYPRESS AV', 'CYPRESS HILLS', 'DEKALB AV', 'DELANCEY/ESSEX', 'DITMAS AV', 'DYCKMAN ST', "E 143/ST MARY'S", 'E 149 ST', 'E 180 ST', 'EAST 105 ST', 'EAST BROADWAY', 'EASTCHSTER/DYRE', 'EASTN PKWY-MUSM', 'ELDER AV', 'ELMHURST AV', 'EUCLID AV', 'EXCHANGE PLACE', 'FAR ROCKAWAY', 'FLATBUSH AV-B.C', 'FLUSHING AV', 'FLUSHING-MAIN', 'FORDHAM RD', 'FOREST AVE', 'FOREST HILLS 71', 'FRANKLIN AV', 'FRANKLIN ST', 'FREEMAN ST', 'FRESH POND RD', 'FT HAMILTON PKY', 'FULTON ST', 'GATES AV', 'GRAHAM AV', 'GRAND ARMY PLAZ', 'GRAND ST', 'GRAND-NEWTOWN', 'GRANT AV', 'GRD CNTRL-42 ST', 'GREENPOINT AV', 'GROVE STREET', 'GUN HILL RD', 'HALSEY ST', 'HARLEM 148 ST', 'HARRISON', 'HEWES ST', 'HIGH ST', 'HOUSTON ST', 'HOWARD BCH JFK', 'HOYT ST', 'HOYT-SCHER', 'HUNTERS PT AV', 'HUNTS POINT AV', 'INTERVALE AV', 'INWOOD-207 ST', 'JACKSON AV', 'JAMAICA 179 ST', 'JAMAICA CENTER', 'JAMAICA VAN WK', 'JAY ST-METROTEC', 'JEFFERSON ST', 'JFK JAMAICA CT1', 'JKSN HT-ROOSVLT', 'JOURNAL SQUARE', 'JUNCTION BLVD', 'JUNIUS ST', 'KEW GARDENS', 'KINGS HWY', 'KINGSBRIDGE RD', 'KINGSTON AV', 'KINGSTON-THROOP', 'KNICKERBOCKER', 'KOSCIUSZKO ST', 'LACKAWANNA', 'LAFAYETTE AV', 'LEXINGTON AV/53', 'LEXINGTON AV/63', 'LIBERTY AV', 'LIVONIA AV', 'LONGWOOD AV', 'LORIMER ST', 'MARBLE HILL-225', 'MARCY AV', 'METROPOLITAN AV', 'METS-WILLETS PT', 'MIDDLETOWN RD', 'MONTROSE AV', 
'MORGAN AV', 'MORISN AV/SNDVW', 'MORRIS PARK', 'MOSHOLU PKWY', 'MT EDEN AV', 'MYRTLE AV', 'MYRTLE-WILLOUGH', 'MYRTLE-WYCKOFF', 'NASSAU AV', 'NECK RD', 'NEPTUNE AV', 'NEREID AV', 'NEVINS ST', 'NEW LOTS', 'NEW LOTS AV', 'NEW UTRECHT AV', 'NEWARK BM BW', 'NEWARK C', 'NEWARK HM HE', 'NEWARK HW BMEBE', 'NEWKIRK AV', 'NEWKIRK PLAZA', 'NORTHERN BLVD', 'NORWOOD 205 ST', 'NORWOOD AV', 'NOSTRAND AV', 'OCEAN PKWY', 'ORCHARD BEACH', 'OZONE PK LEFFRT', 'PARK PLACE', 'PARKCHESTER', 'PARKSIDE AV', 'PARSONS BLVD', 'PATH NEW WTC', 'PATH WTC 2', 'PAVONIA/NEWPORT', 'PELHAM BAY PARK', 'PELHAM PKWY', 'PENNSYLVANIA AV', 'PRESIDENT ST', 'PRINCE ST', 'PROSPECT AV', 'PROSPECT PARK', 'QUEENS PLAZA', 'QUEENSBORO PLZ', 'RALPH AV', 'RECTOR ST', 'RIT-MANHATTAN', 'RIT-ROOSEVELT', 'ROCKAWAY AV', 'ROCKAWAY BLVD', 'ROCKAWAY PARK B', 'ROOSEVELT ISLND', 'SARATOGA AV', 'SENECA AVE', 'SHEEPSHEAD BAY', 'SHEPHERD AV', 'SIMPSON ST', 'SMITH-9 ST', 'SOUTH FERRY', 'SPRING ST', 'ST LAWRENCE AV', 'ST. GEORGE', 'STEINWAY ST', 'STERLING ST', 'SUTPHIN BLVD', 'SUTPHIN-ARCHER', 'SUTTER AV', 'SUTTER AV-RUTLD', 'THIRTY ST', 'THIRTY THIRD ST', 'TIMES SQ-42 ST', 'TOMPKINSVILLE', 'TREMONT AV', 'TWENTY THIRD ST', 'UNION ST', 'UTICA AV', 'V.CORTLANDT PK', 'VAN SICLEN AV', 'VAN SICLEN AVE', 'VERNON-JACKSON', 'W 4 ST-WASH SQ', 'W 8 ST-AQUARIUM', 'WAKEFIELD/241', 'WALL ST', 'WEST FARMS SQ', 'WESTCHESTER SQ', 'WHITEHALL S-FRY', 'WHITLOCK AV', 'WILSON AV', 'WINTHROP ST', 'WOODHAVEN BLVD', 'WOODLAWN', 'WORLD TRADE CTR', 'WTC-CORTLANDT', 'YORK ST', 'ZEREGA AV']
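Note that the station names are sorted as strings, not as numbers, which is why '103 ST' is listed before '14 ST' above. A small sketch of the same set-then-sort step, using a made-up dictionary (the keys and their pairing with stations are hypothetical):

```python
# Hypothetical station dictionary: several turnstiles can share one station.
station_dict = {
    'R1,00-00-00': '14 ST',
    'R1,00-00-01': '14 ST',
    'R2,00-00-00': '103 ST',
    'R3,00-00-00': 'ASTOR PL',
}

# set() removes the duplicate names; sorted() returns a list in string order.
unique_stations = sorted(set(station_dict.values()))
print(unique_stations)
# ['103 ST', '14 ST', 'ASTOR PL']
```

Since sorted() already returns a list, the extra list() call in the print statement above is harmless but not required.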
We can print all the turnstiles from the data:
print(f'All turnstiles: {get_turnstiles(station_dict)}')
Or print only a subset, for example the turnstiles for the Hunter College and Roosevelt Island stations:
print(get_turnstiles(station_dict, stations = ['68ST-HUNTER CO','ROOSEVELT ISLND']))
which would print:
['R259,00-00-00', 'R259,00-00-01', 'R259,00-00-02', 'R259,00-00-03', 'R259,00-05-00', 'R259,00-05-01', 'R177,00-00-00', 'R177,00-00-01', 'R177,00-00-02', 'R177,00-00-03', 'R177,00-00-04', 'R177,00-00-05', 'R177,00-00-06', 'R177,00-03-00', 'R177,00-03-01', 'R177,00-03-02', 'R177,00-03-03', 'R177,00-03-04', 'R177,00-03-05', 'R177,00-03-06']
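Selecting the keys whose value matches one of several station names can be done with a list comprehension. The sketch below shows one possible approach on a made-up dictionary; keys_for_stations is a hypothetical name, not part of the assignment. Because it iterates over the dictionary, the keys come back in the dictionary's insertion order rather than in the order of the stations list.

```python
def keys_for_stations(station_dict, stations=None):
    """Hypothetical helper: return the keys whose station is in stations."""
    if stations is None:
        # No filter requested: return every key.
        return list(station_dict.keys())
    wanted = set(stations)  # set membership tests are fast
    return [key for key, name in station_dict.items() if name in wanted]

# Made-up data, for illustration only:
toy = {'R1,00-00-00': 'A ST', 'R2,00-00-00': 'B ST', 'R1,00-00-01': 'A ST'}
print(keys_for_stations(toy, stations=['A ST']))
# ['R1,00-00-00', 'R1,00-00-01']
print(keys_for_stations(toy))
# ['R1,00-00-00', 'R2,00-00-00', 'R1,00-00-01']
```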
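Checking the ridership for a single turnstile and for a station:
ridership = compute_ridership(min_dict,max_dict,turnstiles=["R051,02-00-00"])
print(f'Ridership for turnstile, R051,02-00-00:  {ridership}.')
hunter_turns = get_turnstiles(station_dict, stations = ['68ST-HUNTER CO'])
ridership = compute_ridership(min_dict,max_dict,turnstiles=hunter_turns)
print(f'Ridership for Hunter College: {ridership}.')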
gives the output:
Ridership for turnstile, R051,02-00-00:  3096.
Ridership for Hunter College: 49669.  
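The skeleton below defines ridership as the difference between the maximum and minimum values for each turnstile. As a hedged sketch of that arithmetic on made-up numbers (total_entries is a hypothetical name, and the values are not from the MTA files):

```python
def total_entries(min_dict, max_dict, turnstiles=None):
    """Hypothetical helper: sum (max - min) over the requested turnstiles."""
    if turnstiles is None:
        # Default: use every turnstile present in the dictionaries.
        turnstiles = min_dict.keys()
    return sum(max_dict[t] - min_dict[t] for t in turnstiles)

# Made-up values, for illustration only:
toy_min = {'R1,00-00-00': 100, 'R1,00-00-01': 50}
toy_max = {'R1,00-00-00': 130, 'R1,00-00-01': 75}

print(total_entries(toy_min, toy_max, turnstiles=['R1,00-00-00']))
# 30
print(total_entries(toy_min, toy_max))
# 55
```

Note that an empty list and None behave differently here: an empty list sums to 0, while None means every turnstile.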

Notes:


"""
    Name: YOUR NAME HERE (as it appears in Gradescope)
    Email: YOUR EMAIL HERE (as it appears in Gradescope)
    Resources:  ANY RESOURCES YOU USED
"""

def make_dictionary(data, kind = "min"):
    """
    Creates a dictionary keyed by the remote unit ID + turnstile unit number.
    Depending on kind, the resulting dictionary will store the minimum entry
    number seen (as an integer), the maximum entry number seen (as an integer),
    or the station name (as a string).
    Returns the resulting dictionary.

    Keyword arguments:
    kind -- kind of dictionary to be created:  min, max, station
    """

    #Placeholder-- replace with your code
    new_dict = {}
    return new_dict

def get_turnstiles(station_dict, stations = None):
    """
    If stations is None, returns the names of all the turnstiles stored as keys
    in the input dictionary.
    Otherwise, returns the keys whose value is one of the station names in stations.
    Returns a list.

    Keyword arguments:
    stations -- None or list of station names.   
    """

    #Placeholder-- replace with your code
    lst = []
    return lst

def compute_ridership(min_dict,max_dict,turnstiles = None):
    """
    Takes as input two dictionaries and a list, possibly empty, of turnstiles.
    If no value is passed for turnstiles, the default value of None is used
    (that is, the total ridership for every turnstile in the dictionaries is computed).
    Returns the ridership (the difference between the maximum and minimum values)
    summed across all turnstiles specified.

    Keyword arguments:
    turnstiles -- None or list of turnstile names    
    """

    #Placeholder-- replace with your code
    total = 0
    return total

def main():
    """
    Opens a data file and computes ridership, using functions above.
    """
    file_name = 'turnstile_220611.txt'
    #Store the file contents in data:
    with open(file_name,encoding='UTF-8') as file_d:
        lines = file_d.readlines()
    #Discard first line with headers:
    data = lines[1:]

    #Set up the three dictionaries:
    min_dict = make_dictionary(data, kind = "min")
    max_dict = make_dictionary(data, kind = "max")
    station_dict = make_dictionary(data, kind = "station")

    #Print out the station names, alphabetically, without duplicates:
    print(f'All stations: {sorted(list(set(station_dict.values())))}')

    #All the turnstiles from the data:
    print(f'All turnstiles: {get_turnstiles(station_dict)}')
    #Only those for Hunter & Roosevelt Island stations:
    print(get_turnstiles(station_dict, stations = ['68ST-HUNTER CO','ROOSEVELT ISLND']))

    #Checking the ridership for a single turnstile
    ridership = compute_ridership(min_dict,max_dict,turnstiles=["R051,02-00-00"])
    print(f'Ridership for turnstile, R051,02-00-00:  {ridership}.')

    #Checking the ridership for a station
    hunter_turns = get_turnstiles(station_dict, stations = ['68ST-HUNTER CO'])
    ridership = compute_ridership(min_dict,max_dict,turnstiles=hunter_turns)
    print(f'Ridership for Hunter College: {ridership}.')

if __name__ == "__main__":
    main()