Session 5, #4: Extracting from USGS CSV Files


Often, each line of a file holds more than a single number or color. The USGS file puts all the information about an earthquake observation on a single line.

The first line of the file says what each part represents:

time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource

The first part of each line (up to the first comma) is the date and time the earthquake was observed. Next, separated by more commas, are the latitude and longitude, followed by additional information about the earthquake observation.
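To see which position each part occupies, one option is to split the header line at its commas and print each name with its position. This is only a sketch: the variable names header, columnNames, and position are chosen here for illustration and are not part of the lesson's files.

```python
# The header line, copied from the first line of the USGS file shown above.
header = "time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource"

# Split at the commas to get a list of column names.
columnNames = header.split(",")

# Python counts positions from 0, so "time" is at position 0.
for position, name in enumerate(columnNames):
    print(position, name)
```

Because Python counts from 0, the piece after the first comma is at position 1 and the piece after the second comma is at position 2.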


Challenge: What are the latitude and longitude in the line:
'2017-01-17T11:48:48.530Z,5.4319,94.6079,54.55,5.6,mb,,67,2.338,1,us,us10007tps,2017-01-17T12:14:46.732Z,"80km W of Banda Aceh, Indonesia",earthquake,7.4,5.8,0.045,173,reviewed,us,us'

Did you find them? In every line, they are the numbers after the first and second commas:

'2017-01-17T11:48:48.530Z,5.4319,94.6079,54.55,5.6,mb,,67,2.338,1,us,us10007tps,2017-01-17T12:14:46.732Z,"80km W of Banda Aceh, Indonesia",earthquake,7.4,5.8,0.045,173,reviewed,us,us'
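As a quick check on this one line, here is a minimal sketch that splits it at the commas and converts the two pieces into numbers. The name pieces is an assumption made for this example; lineOfData, latitude, and longitude follow the names used in the steps of this lesson.

```python
# The observation line from the challenge above, stored as a string.
lineOfData = '2017-01-17T11:48:48.530Z,5.4319,94.6079,54.55,5.6,mb,,67,2.338,1,us,us10007tps,2017-01-17T12:14:46.732Z,"80km W of Banda Aceh, Indonesia",earthquake,7.4,5.8,0.045,173,reviewed,us,us'

# Split the line into a list of strings, one for each part between commas.
pieces = lineOfData.split(",")

# Position 0 is the time, so latitude is at position 1 and longitude at 2.
latitude = float(pieces[1])
longitude = float(pieces[2])

print(latitude, longitude)
```

One thing to watch for: the place name "80km W of Banda Aceh, Indonesia" contains a comma of its own, so split(",") cuts it into two pieces. That does not affect latitude and longitude, which come earlier in the line, but positions counted after the place name are shifted by one.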

Let's have Python extract those numbers for us. Here are the steps we need to follow:

  1. For each line, lineOfData, in the file:
  2. Split the lineOfData into pieces at the commas
  3. Convert the piece after the first comma into a number and store in latitude
  4. Convert the piece after the second comma into a number and store in longitude

We have done each part of this before. Let's test this part, and then we'll move on to plotting the locations we have extracted from the file:
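The four steps can be sketched as follows. This is an illustration under stated assumptions, not the lesson's own program: infile is a stand-in built with io.StringIO from the header and the one observation quoted above, so that the example runs on its own. In the trinket you would open one of the pre-loaded files instead. The names infile and pieces are chosen for this sketch.

```python
import io

# Stand-in for an opened USGS file (an assumption for this sketch):
# the header line plus the single observation quoted in this lesson.
infile = io.StringIO(
    "time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource\n"
    '2017-01-17T11:48:48.530Z,5.4319,94.6079,54.55,5.6,mb,,67,2.338,1,us,us10007tps,2017-01-17T12:14:46.732Z,"80km W of Banda Aceh, Indonesia",earthquake,7.4,5.8,0.045,173,reviewed,us,us\n'
)

# Step 1: for each line, lineOfData
for lineOfData in infile:
    # Step 2: split the lineOfData into pieces
    pieces = lineOfData.split(",")

    # The first line holds column names, not numbers, so skip it.
    if pieces[0] == "time":
        continue

    # Steps 3 and 4: convert the pieces after the first and second commas.
    latitude = float(pieces[1])
    longitude = float(pieces[2])
    print(latitude, longitude)
```

The header check matters because float("latitude") would stop the program with an error. Skipping any line whose first piece is "time" is one simple way to avoid that.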



Challenge: Try the other files that are pre-loaded into the trinket. How are they different?



Challenge: Examine the format of the input data to figure out which column stores the magnitude of the observed earthquake. Print the magnitude for each occurrence.