Today's lab will focus on formatting files.

A standard use of scripting languages is to take data in one format and convert into a format accepted by another program. For example, say your boss gives you a list of new customers to be inserted into the company's database. He has collected the list in a spreadsheet, and exported it as a CSV file (a text file, where all columns are separated by commas-- "comma separated values" file):
De Blasio, Bill
Bloomberg, Mike
Giuliani, Rudy
Dinkins, David

The company's database uses SQL commands and the format for new entries is:

insert into customer (first, last) values ('FirstName','LastName')

If there's only a few names, we can easily convert the file to the desired format, by hand:

insert into customer (first, last) values ('Bill','De Blasio')
insert into customer (first, last) values ('Mike','Bloomberg')
insert into customer (first, last) values ('Rudy','Guiliani')
insert into customer (first, last) values ('David','Dinkins')

If there's many names, or if you have to add entries every night, it is much easier to write a program. Here's the basic outline of the program:

# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    #Open files to be used in the program    
    #For each line,
        #Separate out the first and last names
        #Format the insert string and write it to the file		
    #Close files
	
main()

In Lab 6, we opened and closed files, so, let's add those in here:

# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
        #Separate out the first and last names
        #Format the insert string and write it to the file
		
    #Close files
    infile.close()
    outfile.close()
	
main()
To make sure there's no typing errors, try running the above program with the mayors.txt as the input file. Nothing will print yet, since we have not included any print() statements.

Next, let's go through the input file, line by line. To do this, we will use the readlines() function, since it allows us to access each line (we've included a print statement in our for loop to test that the program is working):

# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
    for line in infile.readlines():
        #Separate out the first and last names
        #Format the insert string and write it to the file
        print(line)
		
    #Close files
    infile.close()
    outfile.close()
	
main()
When you run the program, the output is doublespaced! Let's remove the extra newline like we did in the last program (i.e. line[:-1]) and then split it in parts using the comma as our delimiter:
# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
    for line in infile.readlines():
        #Separate out the first and last names
        #Format the insert string and write it to the file
        last, first = line[:-1].split(',')
        print(first, last)
		
    #Close files
    infile.close()
    outfile.close()
	
main()
Notice that we have the variables ordered: last, first since that's the order that our boss had them in the initial file. Next, we need to format the output using the SQL format he gave us:
insert into customer (first, last) values ('FirstName','LastName')
We want to take the string and insert the values of first and last into the positions marked by {0} and {1}:
"insert into customer (first, last) values ('{0}','{1}')"
which we can do using the format statement:
"insert into customer (first, last) values ('{0}','{1}')".format(first,last)
Let's start by printing that to the screen:
# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
    for line in infile.readlines():
        #Separate out the first and last names
        #Format the insert string and write it to the file
        last, first = line[:-1].split(',')
        s = "insert into customer (first, last) values ('{0}','{1}')".format(first,last)
        print(s)
		
    #Close files
    infile.close()
    outfile.close()
	
main()
Try running this program. Note that we have some extra spaces before the first names. Those will cause problems with the database if we leave them, so, we'll use one of the useful string methods (p 140 in the textbook) to remove the leading whitespace on the first names: first.lstrip():
# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
    for line in infile.readlines():
        #Separate out the first and last names
        #Format the insert string and write it to the file
        last, first = line[:-1].split(',')
        s = "insert into customer (first, last) values ('{0}','{1}')".format(first.lstrip(),last)
        print(s)
		
    #Close files
    infile.close()
    outfile.close()
	
main()
Lastly, we need to print this information to the file (we were only printing to the screen to make it easier for us to debug the program). We'll print a message to screen to let the user know where we are in the processing:
# Program to convert CSV file of customer data into SQL insert statements

def main():
    #Ask user for input and output file names
    inName = input("Enter input filename: ")
    outName = input("Enter output filename: ")

    #Open files to be used in the program
    infile = open(inName,"r")
    outfile = open(outName,"w")
    
    #For each line,
    for line in infile.readlines():
        #Separate out the first and last names
        #Format the insert string and write it to the file
        last, first = line[:-1].split(',')
        s = "insert into customer (first, last) values ('{0}','{1}')".format(first.lstrip(),last)
        print(s, file = outfile)
        print("\tWrote {0} to {1}.".format(last,outName))
		
    #Close files
    infile.close()
    outfile.close()
	
main()
This basic structure works for processing most data files. For example, say your boss started collecting addresses from customers. How could you change it to add that information to your database?

If you finish early, you may work on the programming problems.