Program 3: Restaurant Rankings.   Due noon, Thursday, 24 February.
       
The NYC Department of Health and Mental Hygiene regularly inspects restaurants and releases the results.
These results are also available in CSV files at OpenData NYC.  This programming assignment focuses on predicting letter grades for restaurants that have yet to be graded, as well as computing summary statistics by neighborhood.
The assignment is broken into the functions described below to allow for unit testing.
Together, the functions (assumed in the examples to be in the file p3.py) use the SCORE to predict a grade for each inspection, use the numeric grade to compute the averages for neighborhoods for both provided and predicted scores, and, to make it easier to find scores for neighborhoods, combine the averages with the NTA table.
           
      Hints:
        
Learning Objective: students can successfully filter formatted data using standard Pandas operations for selecting and joining data.
      
Available Libraries: Pandas and core Python 3.6+.
      
Data Sources: Neighborhood Tabulation Areas, Restaurant Inspection Data @ OpenData NYC, NYC Department of Health Restaurant Grading.
      
Sample Datasets:  Neighborhood Tabulation Areas: nynta.csv.
      Restaurant Inspections:
      restaurants1Aug21.csv,
      restaurants30July.csv.
      
          
make_insp_df(file_name):
    This function takes one input:
        file_name: the name of a CSV file containing Restaurant Inspection Data from OpenData NYC.
    The function should open the file file_name as a DataFrame, keeping only the columns:
        'CAMIS', 'DBA', 'BORO', 'BUILDING', 'STREET', 'ZIPCODE', 'SCORE', 'GRADE', 'NTA'
    If the SCORE is null for a row, that row should be dropped.  The resulting DataFrame is returned.
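The description of make_insp_df can be sketched as follows. This is one possible approach rather than the reference solution: it reads from an in-memory sample so that it runs without the data files, and the sample rows (including the extra PHONE column) are made up for illustration.

```python
import io
import pandas as pd

# The nine columns named in the assignment, in the order listed there.
KEEP_COLS = ['CAMIS', 'DBA', 'BORO', 'BUILDING', 'STREET',
             'ZIPCODE', 'SCORE', 'GRADE', 'NTA']

def make_insp_df(file_name):
    """Read inspection data, keep the nine listed columns, drop rows with a null SCORE."""
    df = pd.read_csv(file_name, usecols=KEEP_COLS, low_memory=False)
    df = df[KEEP_COLS]  # usecols keeps file order, so restore the listed order
    return df.dropna(subset=['SCORE'])

# Hypothetical sample rows, not taken from the real inspection files.
sample_csv = io.StringIO(
    "CAMIS,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,SCORE,GRADE,NTA\n"
    "1,CAFE ONE,Manhattan,10,MAIN ST,10001,555,4,A,MN15\n"
    "2,CAFE TWO,Bronx,20,SIDE ST,10451,555,,,BX29\n"
    "3,CAFE THREE,Brooklyn,30,HIGH ST,11201,555,31,,BK25\n"
)
insp_df = make_insp_df(sample_csv)
print(insp_df)
```

The second sample row has no SCORE and is dropped, while the third row keeps its null GRADE, matching the behavior described above.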
predict_grade(num_violations):
    This function takes one input:
        num_violations: the number of violation points.
    The function should return the letter grade that corresponds to the number of violation points num_violations (from NYC Department of Health Restaurant Grading).
grade2num(grade):
    This function takes one input:
        grade: a letter grade or null value.
    The function returns the grade on a 4.0 scale for grade = 'A', 'B', or 'C' (i.e. 4.0, 3.0, or 2.0, respectively).  If grade is None or some other value, return None.
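A minimal sketch of predict_grade and grade2num is shown below. The cut-offs are an assumption drawn from the NYC Health grading scheme (0 to 13 points is an A, 14 to 27 a B, 28 or more a C); they are not spelled out in this text, so check them against the Restaurant Grading source before relying on them.

```python
def predict_grade(num_violations):
    """Map violation points to a letter grade.

    Assumed cut-offs: 0-13 -> 'A', 14-27 -> 'B', 28 or more -> 'C'.
    """
    if num_violations <= 13:
        return 'A'
    if num_violations <= 27:
        return 'B'
    return 'C'

def grade2num(grade):
    """Convert 'A', 'B', or 'C' to 4.0, 3.0, or 2.0; any other value gives None."""
    return {'A': 4.0, 'B': 3.0, 'C': 2.0}.get(grade)

# Print the predicted letter and its numeric equivalent for a few point totals.
for points in (4, 16, 41):
    letter = predict_grade(points)
    print(points, letter, grade2num(letter))
```

Using a dictionary lookup with .get means that None, NaN, and letters such as 'N' all fall through to None without a separate branch for each case.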
make_nta_df(file_name):
    This function takes one input:
        file_name: the name of a CSV file containing neighborhood tabulation areas (nynta.csv).
    The function should open the file file_name as a DataFrame and return a DataFrame containing only the columns NTACode and NTAName.
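One way make_nta_df might look is sketched here. The in-memory sample stands in for nynta.csv, and its extra columns (BoroName, Shape_Area) are hypothetical placeholders used only to show that they are discarded.

```python
import io
import pandas as pd

def make_nta_df(file_name):
    """Read the NTA file and keep only the NTACode and NTAName columns."""
    cols = ['NTACode', 'NTAName']
    return pd.read_csv(file_name, usecols=cols)[cols]

# Hypothetical stand-in for nynta.csv with two extra columns.
sample_nta = io.StringIO(
    "BoroName,NTACode,NTAName,Shape_Area\n"
    "Brooklyn,BK09,Brooklyn Heights-Cobble Hill,1.0\n"
    "Brooklyn,BK26,Gravesend,2.0\n"
)
nta_df = make_nta_df(sample_nta)
print(nta_df)
```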
compute_ave_grade(df,col):
    This function takes two inputs:
        df: a DataFrame containing Restaurant Inspection Data from OpenData NYC.
        col: the name of a numeric-valued column in the DataFrame.
    This function returns a DataFrame with two columns, the NTACode and the average of col for each NTA.
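A possible sketch of compute_ave_grade follows. It assumes the result is indexed by NTA, as the sample output in this assignment suggests; whether the NTA code should instead be an ordinary column is left open by the description. The small DataFrame is made up for illustration.

```python
import pandas as pd

def compute_ave_grade(df, col):
    """Average `col` within each NTA; the result is indexed by NTA (an assumption)."""
    return df.groupby('NTA')[[col]].mean()

# Hypothetical data: null NUM values mark inspections without a usable letter grade.
demo_df = pd.DataFrame({
    'NTA': ['BK28', 'BK28', 'BK26', 'BK09'],
    'NUM': [4.0, None, None, 4.0],
    'PRE NUM': [4.0, 2.0, 2.0, 4.0],
})
actual = compute_ave_grade(demo_df, 'NUM')
predicted = compute_ave_grade(demo_df, 'PRE NUM')
demo_scores = actual.join(predicted)  # both frames share the NTA index
print(demo_scores)
```

Because mean skips null values, an NTA with only some null grades still gets an average, while one whose grades are all null comes out as NaN.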
neighborhood_grades(ave_df,nta_df):
    This function takes two inputs:
        ave_df: a DataFrame containing the column 'NTA'.
        nta_df: a DataFrame with two columns, 'NTACode' and 'NTAName'.
    This function returns a DataFrame with the neighborhood names (i.e. NTAName) and the columns from ave_df.  The columns NTA and NTACode should be dropped before returning the DataFrame.
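The join described for neighborhood_grades might be sketched like this. Two choices here are assumptions rather than requirements: the function accepts NTA either as a column or as the index name, and it uses a left merge so every row of ave_df is kept even when no matching name exists.

```python
import pandas as pd

def neighborhood_grades(ave_df, nta_df):
    """Attach NTAName to the averages, then drop the NTA and NTACode columns."""
    if 'NTA' not in ave_df.columns:
        ave_df = ave_df.reset_index()  # NTA was the index; make it a column
    merged = ave_df.merge(nta_df, left_on='NTA', right_on='NTACode', how='left')
    return merged.drop(columns=['NTA', 'NTACode'])

# Hypothetical inputs shaped like the averages and NTA tables described above.
demo_ave = pd.DataFrame(
    {'NUM': [4.0, None], 'PRE NUM': [4.0, 2.0]},
    index=pd.Index(['BK09', 'BK26'], name='NTA'),
)
demo_nta = pd.DataFrame({
    'NTACode': ['BK09', 'BK26'],
    'NTAName': ['Brooklyn Heights-Cobble Hill', 'Gravesend'],
})
named = neighborhood_grades(demo_ave, demo_nta)
print(named)
```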
For example, assuming your functions are in the file p3.py:

    import p3
    df = p3.make_insp_df('restaurants1Aug21.csv')
    print(df)

will print:

            CAMIS                         DBA           BORO  ... SCORE GRADE   NTA
    0    41178124                     CAFE 57      Manhattan  ...   4.0     A  MN15
    1    50111450              CASTLE CHICKEN          Bronx  ...  41.0     N  BX29
    2    40699339     NICK GARDEN COFFEE SHOP          Bronx  ...  31.0   NaN  BX05
    3    41181395                     DUNKIN'       Brooklyn  ...  10.0     A  BK25
    4    50052976           ZON BAKERY & CAFE      Manhattan  ...  72.0   NaN  MN36
    ..        ...                         ...            ...  ...   ...   ...   ...
    240  50052976           ZON BAKERY & CAFE      Manhattan  ...  72.0   NaN  MN36
    241  41525768               THE WEST CAFE       Brooklyn  ...  10.0     A  BK73
    242  50111132  BUONASERA RESTAURANT PIZZA       Brooklyn  ...  16.0     N  BK30
    243  40399672         BAGELS & CREAM CAFE         Queens  ...  12.0     A  QN06
    244  50104259           ROYAL COFFEE SHOP  Staten Island  ...  69.0     N  SI22
    [243 rows x 9 columns]

Note that the 243 rows with a non-null SCORE are included, but only the 9 specified columns are retained in the DataFrame.  Several rows have null entries for GRADE (e.g. rows 2, 4, and 240) while others have letter grades (such as 'N') that are not on the list of possible grades.
Using the SCORE to compute the likely grade for each inspection, as both a letter and its equivalent on a 4.0 grading scale:

    df['NUM'] = df['GRADE'].apply(p3.grade2num)
    df['PREDICTED'] = df['SCORE'].apply(p3.predict_grade)
    df['PRE NUM'] = df['PREDICTED'].apply(p3.grade2num)
    print(df[ ['DBA','SCORE','GRADE','NUM','PREDICTED','PRE NUM'] ])

prints the predicted grade and equivalent numeric grade on the 4.0 scale:

                                DBA  SCORE GRADE  NUM PREDICTED  PRE NUM
    0                       CAFE 57    4.0     A  4.0         A      4.0
    1                CASTLE CHICKEN   41.0     N  NaN         C      2.0
    2       NICK GARDEN COFFEE SHOP   31.0   NaN  NaN         C      2.0
    3                       DUNKIN'   10.0     A  4.0         A      4.0
    4             ZON BAKERY & CAFE   72.0   NaN  NaN         C      2.0
    ..                          ...    ...   ...  ...       ...      ...
    240           ZON BAKERY & CAFE   72.0   NaN  NaN         C      2.0
    241               THE WEST CAFE   10.0     A  4.0         A      4.0
    242  BUONASERA RESTAURANT PIZZA   16.0     N  NaN         B      3.0
    243         BAGELS & CREAM CAFE   12.0     A  4.0         A      4.0
    244           ROYAL COFFEE SHOP   69.0     N  NaN         C      2.0
    [243 rows x 6 columns]
We can use the numeric grade to compute the averages for neighborhoods for both provided and predicted scores:

    actual_scores = p3.compute_ave_grade(df,'NUM')
    predicted_scores = p3.compute_ave_grade(df,'PRE NUM')
    scores = actual_scores.join(predicted_scores, on='NTA')
    print(scores.head())

The first couple of rows are:

          NUM   PRE NUM
    NTA
    BK09  4.0  4.000000
    BK17  4.0  4.000000
    BK25  4.0  4.000000
    BK26  NaN  2.000000
    BK28  4.0  3.250000
To make it easier to find scores for neighborhoods, we combine with the NTA table:

    nta_df = p3.make_nta_df('nynta.csv')
    scores_with_nbhd_names = p3.neighborhood_grades(scores,nta_df)
    print(scores_with_nbhd_names.head())

The first couple of rows are:

        NUM   PRE NUM                                         NTAName
    0   4.0  4.000000                    Brooklyn Heights-Cobble Hill
    1   4.0  4.000000  Sheepshead Bay-Gerritsen Beach-Manhattan Beach
    2   4.0  4.000000                                       Homecrest
    3   NaN  2.000000                                       Gravesend
    4   4.0  3.250000                                Bensonhurst West

The averages are sometimes the same but almost always decrease when we include the predicted grades computed from the reported scores.
          
A warning such as

    sys:1: DtypeWarning: Columns (39) have mixed types. Specify dtype option on import or set low_memory=False.

may appear when reading in the data.  Pandas tries to infer the data type (dtype) of the columns from the values.  Since some columns are a mixture of numeric and character types, this can be difficult.  If the file is read in with pd.read_csv(file_name, low_memory=False), the entire column is read in and used to determine type.
            numeric_only = True.
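The two remedies named in the DtypeWarning message can be sketched as follows. The tiny CSV and its column names are hypothetical, and a file this small would not trigger the warning by itself; the point is only to show the two read_csv options side by side.

```python
import io
import pandas as pd

# Hypothetical column that mixes numeric-looking and text values.
csv_text = "ID,ZIPCODE\n1,10001\n2,UNKNOWN\n3,11201\n"

# Option 1: read each column in full before inferring its type.
df_inferred = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the type up front so no inference is needed for that column.
df_explicit = pd.read_csv(io.StringIO(csv_text), dtype={'ZIPCODE': str})

print(df_inferred['ZIPCODE'].tolist())
print(df_explicit['ZIPCODE'].tolist())
```

Specifying dtype is the more explicit of the two choices, since the column's type no longer depends on which values happen to appear in the file.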