EMF Webinar 2 Homework

Viewing Raw Data

In this section, you’ll try out some of the options for viewing raw data in the EMF. In the Dataset Manager, find the dataset named ARINV_2007_AREA_Jan2012.txt. This ORL nonpoint inventory is part of the 2007 v3 EI project and contains annual emissions for stationary area sources.

Select the dataset in the Dataset Manager and click the View button to open the Dataset Properties View window. Switch to the Data tab and click the View button near the top of the tab. The Data Viewer window will open.

Look for the navigation widget in the top right corner of the Data Viewer window. You should see a line like:

Current: 1 - 100 Filtered: 144721 of 144721

The Data Viewer is currently showing records 1 through 100 of the 144,721 records in the dataset. No filter is applied when you first view the data, so the filtered count equals the total count.

Use the Row Filter to display only those rows where the annual emissions value is greater than zero. You will need to set up your row filter using the column ann_emis. Once you have entered your row filter, click the Apply button to apply your filter.

Questions

What row filter did you use? [ann_emis > 0]
How many records matched your row filter? [120,530]
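The row filter reads like the condition in a SQL WHERE clause. The following is a minimal sketch of that idea, not the EMF's own code: it assumes the filter text is used as a WHERE condition, uses SQLite as a stand-in database, and works on a hypothetical four-row table named inventory with made-up values.

```python
import sqlite3

# Hypothetical sample rows (fips, scc, poll, ann_emis); the real inventory
# has 144,721 records and these values are made up.
rows = [
    ("09001", "2104008310", "CO", 1250.0),
    ("09001", "2104008310", "NOX", 0.0),
    ("09003", "2102004000", "SO2", 3.2),
    ("09003", "2102004000", "VOC", 0.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (fips TEXT, scc TEXT, poll TEXT, ann_emis REAL)"
)
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?)", rows)

# The text typed into the Row Filter box (assumed to act as a WHERE condition).
row_filter = "ann_emis > 0"

total = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
filtered = conn.execute(
    f"SELECT COUNT(*) FROM inventory WHERE {row_filter}"
).fetchone()[0]

# Mirrors the "Filtered: N of M" line in the navigation widget.
print(f"Filtered: {filtered} of {total}")
```

With the sample rows, two of the four records have annual emissions above zero, so the filtered count drops while the total stays the same.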

By default, the dataset is sorted by FIPS code, then by SCC code. Suppose you want to group the records by pollutant first. In the Sort Order textbox, change the sort order from FIPS,SCC to poll,fips,scc (the column names are not case sensitive). Click the Apply button to apply the new sort ordering.

Question

How many different pollutants are listed in the first page of the sorted data? [just one: CO]

Next, let’s use the row filter to restrict the inventory to a single county, such as Fairfield County, CT (FIPS code 09001).

Question

What happens if you use the row filter fips = '9001'? [no records are displayed because the FIPS code is compared as text and must match exactly, including the leading zero; use the row filter fips = '09001' instead]
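The leading-zero behavior is ordinary text comparison. As a small sketch in Python (the list of FIPS values is hypothetical), comparing strings shows why '9001' finds nothing while '09001' matches:

```python
# Hypothetical FIPS values stored as text, the way they appear in the filter.
fips_values = ["09001", "09003", "36001"]

# Text comparison is exact: '9001' is a different string from '09001'.
without_zero = [f for f in fips_values if f == "9001"]
with_zero = [f for f in fips_values if f == "09001"]

# If you only have the number, pad it to five characters before filtering.
padded = str(9001).zfill(5)

print(without_zero)  # no matches
print(with_zero)     # the Fairfield County code
print(padded)
```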

Sort the data for county 09001 by annual emissions value. We want the records with the largest emissions to show up first, so add the keyword desc after the column name to sort the records in descending order.

Questions

What sort order value did you use? [ann_emis desc]
Which SCC and pollutant have the highest annual emissions for Fairfield County? [2104008310, CO]
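To see how a sort order such as poll,fips,scc or ann_emis desc orders records, here is a rough Python sketch. The function apply_sort_order is hypothetical and only imitates the behavior described above (comma-separated columns, case-insensitive names, optional desc keyword); the sample records and their values are invented.

```python
def apply_sort_order(records, sort_order):
    """Sort dicts by a comma-separated column list; 'desc' reverses a column."""
    result = list(records)
    terms = [term.split() for term in sort_order.split(",")]
    # Python's sort is stable, so sorting by the last column first and the
    # first column last produces the usual multi-column ordering.
    for term in reversed(terms):
        column = term[0].lower()  # column names are not case sensitive
        descending = len(term) > 1 and term[1].lower() == "desc"
        result.sort(key=lambda r: r[column], reverse=descending)
    return result


# Hypothetical records for county 09001; emissions values are made up.
records = [
    {"fips": "09001", "scc": "2102004000", "poll": "NOX", "ann_emis": 12.5},
    {"fips": "09001", "scc": "2104008310", "poll": "CO", "ann_emis": 900.0},
    {"fips": "09001", "scc": "2102004000", "poll": "SO2", "ann_emis": 40.0},
]

by_emissions = apply_sort_order(records, "ann_emis desc")
by_pollutant = apply_sort_order(records, "POLL,fips,scc")

for r in by_emissions:
    print(r["scc"], r["poll"], r["ann_emis"])
```

The first record of by_emissions is the one with the largest ann_emis value, which is how you would read the answer off the top row in the Data Viewer.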

Use the navigation widget to page through the 4 pages of filtered data. The single left and right arrows move backwards and forwards one page at a time. The double arrows take you to the first and last pages. Each time you change pages, the updated data has to be fetched from the server, so it can take a little while to show up. Keep an eye on the Current: line to see whether the new data has loaded.

You can use the slider to jump to any record in the data. Click and drag the slider handle. You’ll see the record number in the textbox change as you move the handle. When you let go of the handle, the page of data containing the selected record will be loaded (if you’re not already on the appropriate page). You can also type a record number directly into the textbox and then press the Enter key to load the appropriate page.
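The relationship between a record number and the page that contains it is simple arithmetic. The sketch below assumes a fixed page size of 100 records, taken from the Current: 1 - 100 line shown earlier; the function names are hypothetical and are not part of the EMF.

```python
PAGE_SIZE = 100  # assumed from the "Current: 1 - 100" line


def page_count(total_records, page_size=PAGE_SIZE):
    """Number of pages needed to show all records (integer ceiling)."""
    return (total_records + page_size - 1) // page_size


def page_for_record(record_number, total_records, page_size=PAGE_SIZE):
    """Return (page, first_record, last_record) for a 1-based record number."""
    if not 1 <= record_number <= total_records:
        raise ValueError("record number is outside the dataset")
    page = (record_number - 1) // page_size + 1
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_records)  # the last page may be short
    return page, first, last


print(page_count(144721))
print(page_for_record(250, 144721))
print(page_for_record(144721, 144721))
```

For example, record 250 falls on the third page, which covers records 201 through 300.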

Here is one final example to try. Set the row filter to fips = '09001' and avd_emis*365 > ann_emis. You may want to change the number of decimal places displayed from the default of 4 to 6 to show more digits in the columns. Type the number 6 into the Decimal Places textbox and click the Format button to apply the changes.

Question

After applying the row filter, you should have 143 filtered rows. What did the row filter do? [The row filter is showing sources where the annual emissions are less than the annualized average-day emissions.]
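The arithmetic in this filter can be checked by hand. The sketch below applies the same condition, avd_emis*365 > ann_emis, to three hypothetical records with made-up values:

```python
# Hypothetical records; avd_emis is average-day emissions, ann_emis is annual.
records = [
    {"scc": "example-1", "avd_emis": 1.0, "ann_emis": 365.0},
    {"scc": "example-2", "avd_emis": 0.5, "ann_emis": 100.0},
    {"scc": "example-3", "avd_emis": 0.1, "ann_emis": 50.0},
]

# Keep records whose annualized average-day value exceeds the annual value.
matches = [r for r in records if r["avd_emis"] * 365 > r["ann_emis"]]

for r in matches:
    print(r["scc"], r["avd_emis"] * 365, r["ann_emis"])
```

Only example-2 is kept: 0.5 * 365 = 182.5, which is greater than its annual value of 100. The first record is excluded because the two values are equal, and the third because its annualized value is smaller.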

Using QA Step Templates

For this exercise, you’ll be adding QA steps to the dataset named PTINV_2007_VADGUnits_march2010.orl. This dataset has the dataset type “ORL Point Inventory” and is part of the 2007 v3 EI project.

Add two QA steps using the templates for ORL point inventories: “Summarize by SCC and Pollutant” and “Summarize by SCC and Pollutant with Descriptions”. Tip: In the Add from Template dialog, you can select multiple templates to add multiple QA steps at once. Refer to Section 4.2 Add QA Step From Template.

IMPORTANT! Please edit each QA step you create and add your initials to the QA step’s name. The EMF requires every QA step associated with a dataset to have a unique name. When adding a QA step from a template, the EMF will automatically use the name of the template (e.g. Summarize by SCC and Pollutant) as the new QA step’s name. If a QA step with that exact name already exists, the EMF won’t be able to create a new QA step.

Questions

What QA program is used for the QA steps that you added? [SQL]
What is the overall QA Status of each QA step? [Not Started]

After editing each QA step’s name, run the QA steps that you added. You can run each QA step individually from the Edit QA Step window or you can select multiple QA steps in the QA tab of the Dataset Properties Editor window and click the Run button at the bottom of the window. Check the Status window for the status of each run. Refer to Section 4.4 Running QA Steps.

Question

In the Status window, how many messages with the Message Type “RunQAStep” were generated? [4: one started and one completed message for each QA step]

After each QA step has finished running, open the Edit QA Step window for each step. Confirm that the QA step’s Run Status is now “Success” and that the Run Date is recent.

Question

What is the overall QA Status of each QA step now? [In Progress]

View the results of each QA step by clicking the View Results button. The QA step results viewer window allows you to sort and filter the data records just like the Data Viewer window.

Question

Which SCC in the inventory has the highest NOx emissions? What is the description for that SCC? [20100102; Internal Combustion Engines;Electric Generation;Distillate Oil (Diesel);Reciprocating]

Export the results of the QA steps you created to a local file. Be sure to check the box next to “Download result file to local machine?”. Refer to Section 4.5 Exporting QA Step Results.

Open the exported files in Excel on your local machine. After you have reviewed the exported results, set the QA Status for each QA step that you created to Complete (refer to Section 4.2 Add QA Step From Template).

Exercise

Use a row filter when exporting your QA step’s results to export only data for Electric Generation Internal Combustion Engines (i.e. SCCs beginning with 201) for which annual NOx emissions are greater than 70 tons/year.

Questions

What row filter did you use? [scc like '201%' and poll = 'NOX' and ann_emis > 70]
How many records are in the exported results? [1]
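The three conditions in this row filter can be read as plain tests on each record. As a rough Python equivalent (the sample results below are hypothetical, and the helper name keep is made up), scc like '201%' corresponds to a starts-with test:

```python
# Hypothetical QA step results; emissions values are made up.
results = [
    {"scc": "20100102", "poll": "NOX", "ann_emis": 85.0},
    {"scc": "20100102", "poll": "CO", "ann_emis": 120.0},
    {"scc": "20100201", "poll": "NOX", "ann_emis": 12.0},
    {"scc": "10200501", "poll": "NOX", "ann_emis": 300.0},
]


def keep(row):
    """Same logic as: scc like '201%' and poll = 'NOX' and ann_emis > 70."""
    return (
        row["scc"].startswith("201")  # '201%' means "begins with 201"
        and row["poll"] == "NOX"      # exact text match
        and row["ann_emis"] > 70
    )


exported = [r for r in results if keep(r)]
print(exported)
```

Each of the other three sample records fails exactly one condition, which is a handy way to check a filter before exporting.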

Comparing Datasets

Loaded onto the MARAMA EMF server are twelve ORL nonroad inventories containing monthly average-day emissions for 2007. These inventories are assigned to the 2007 v3 EI project.

Using Section 4.8 Compare Datasets QA Program as a guide, create and run a custom QA step to compare the average-day emissions for quarter 1 (January, February, and March) to the corresponding quarter 2 emissions (April, May, and June). Your QA step should compare the average-day emissions by SCC and pollutant.

When you create your QA step, add it to the January inventory. You’ll need to give your QA step a unique name like “Q1 vs. Q2 - initials”. When setting the arguments for the Compare Datasets QA program, set the Base Field Suffix to “q1” and the Compare Field Suffix to “q2” to make it easier to interpret the results.

Questions

Which pollutant and SCC saw the largest increase in raw emissions from quarter 1 to quarter 2? By how much did the emissions increase? [CO, 2265004071; 2,740.55 tons]
Which pollutant and SCC saw its emissions increase by 400% from quarter 1 to quarter 2? [NOx, 2282020005]

Exercise

Export the results of your comparison QA step. Use a row filter to only include records where the emissions changed (increased or decreased) by more than 100%.

Question

What row filter did you use? [avd_emis_abspctdiff > 100]
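To make the percent-difference filter concrete, here is a small sketch. It assumes the absolute percent difference is computed as |compare - base| / base * 100; that formula is an assumption for illustration, not a definition taken from the EMF. The field names follow the q1 and q2 suffixes suggested above, and the records are hypothetical.

```python
def abs_pct_diff(base, compare):
    """Assumed definition: |compare - base| / base * 100 (None if base is 0)."""
    if base == 0:
        return None  # percent change is undefined for a zero base
    return abs(compare - base) / base * 100.0


# Hypothetical comparison rows with made-up values.
rows = [
    {"scc": "SCC-A", "avd_emis_q1": 2.0, "avd_emis_q2": 10.0},  # +400%
    {"scc": "SCC-B", "avd_emis_q1": 8.0, "avd_emis_q2": 2.0},   # -75%
    {"scc": "SCC-C", "avd_emis_q1": 4.0, "avd_emis_q2": 0.0},   # -100%
    {"scc": "SCC-D", "avd_emis_q1": 1.0, "avd_emis_q2": 3.5},   # +250%
]

kept = []
for row in rows:
    pct = abs_pct_diff(row["avd_emis_q1"], row["avd_emis_q2"])
    if pct is not None and pct > 100:  # same test as the row filter
        kept.append(row)

print([r["scc"] for r in kept])
```

Note that a change of exactly 100% is not included, because the filter uses a strict greater-than comparison.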

Exercise

For this exercise, you’ll create a QA step to compare annual emissions from 2007 nonpoint inventory data with the corresponding data for 2011. Start by locating the 2007 ORL nonpoint inventory dataset named:

ARINV_2007_AREA_Jan2012

For this dataset, create a custom QA step that uses the Compare Datasets QA program. Make sure to give your QA step a unique name. When setting up the Compare Datasets arguments, include the following eight 2011 FF10 nonpoint inventories as the comparison datasets:

2011NEIv1_nonpoint_20130911_19sep2013_v1.csv
afdust_2011NEIv1_nonpoint_20130911_noepanonprecip_20nov2013_v0.csv
ag_NH3_2011NEIv1_nonpoint_20130911_18sep2013_v0.csv
agburn_2011NEIv1_nonpoint_20130911_19sep2013_v0.csv
EPA_2011_afdust_no_precip_Paved_Unpaved_noNEIstate_20nov2013_v0.csv
oilgas_2011NEIv1_nonpoint_20130911_11sep2013_v0.csv
pfc_2011NEIv1_nonpoint_20130911_19sep2013_v1.csv
rwc_2011NEIv1_nonpoint_20130911_18sep2013_v0.csv

For your comparison report, you can choose what detail level you’d like by setting the Group By Expressions. At a minimum, you’ll aggregate the annual emissions by county and pollutant code. For a more detailed report, you could include partial SCCs as a grouping expression using substr(scc,1,3) to group by the first 3 digits of the SCC code.
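If the effect of substr(scc,1,3) is unclear, the sketch below imitates it in Python. SQL's substr counts characters starting at 1, so substr(scc,1,3) corresponds to the Python slice scc[:3]. The records and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (scc, ann_emis) pairs with made-up values.
records = [
    ("2102004000", 1.5),
    ("2103006000", 2.5),
    ("2401001000", 3.0),
]

# Group by the first 3 digits of the SCC and sum the annual emissions.
totals = defaultdict(float)
for scc, ann_emis in records:
    prefix = scc[:3]  # same characters as SQL substr(scc,1,3)
    totals[prefix] += ann_emis

for prefix in sorted(totals):
    print(prefix, totals[prefix])
```

The two SCCs beginning with 210 collapse into a single row, so the report has fewer, broader groups than one grouped by full SCC.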

The two inventory dataset types don’t use the same names for all of the data fields. This means that you’ll need to use Matching Expressions when setting up the Compare Datasets arguments. You can find the names of the data fields by viewing the raw data for the dataset. For this exercise, the fields of interest are listed below.

Data field	ORL Nonpoint	FF10 Nonpoint
County	`FIPS`	`REGION_CD`
SCC code	`SCC`	`SCC`
Pollutant code	`POLL`	`POLL`
Annual emissions	`ANN_EMIS`	`ANN_VALUE`

The 2011 inventories contain data for more regions and pollutants than the 2007 inventory. You’ll use a Where Filter to limit the records that will be compared. The following Where Filter lists all the states and pollutants of interest from the 2007 inventory. We’re skipping PM emissions because of a change in how PM emissions were reported in the 2011 inventories. For your report, start by comparing emissions just in your state.

substr(fips,1,2) IN ('09', '10', '11', '23', '24', '25', '33', '34', '36', '42', '44', '50', '51') and poll IN ('CO', 'NH3', 'NOX', 'SO2', 'VOC')

There are several different ways you could set up the arguments needed by the Compare Datasets QA program. Suggestions for each argument are given below.

Group By Expressions:

fips
poll

Aggregate Expressions:

ann_emis

Matching Expressions:

fips=region_cd
ann_emis=ann_value

Where Filter:

substr(fips,1,2) = '09' and poll IN ('CO', 'NH3', 'NOX', 'SO2', 'VOC')

Base Field Suffix:

2007

Compare Field Suffix:

2011
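To show how these arguments fit together, here is a rough Python sketch of the comparison logic. It is not the Compare Datasets implementation: it assumes the Where Filter is applied to both inventories through the matching field names, that the aggregate is a sum, and that a group missing from one side counts as zero. All records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; the real values come from the EMF datasets.
base_2007 = [  # ORL nonpoint field names
    {"fips": "09001", "poll": "CO", "ann_emis": 10.0},
    {"fips": "09001", "poll": "CO", "ann_emis": 5.0},
    {"fips": "09001", "poll": "PM10", "ann_emis": 7.0},
    {"fips": "36001", "poll": "CO", "ann_emis": 2.0},
]
compare_2011 = [  # FF10 nonpoint field names
    {"region_cd": "09001", "poll": "CO", "ann_value": 12.0},
    {"region_cd": "09001", "poll": "NOX", "ann_value": 4.0},
    {"region_cd": "48001", "poll": "CO", "ann_value": 9.0},
]

# Matching Expressions: base field name -> comparison field name.
matching = {"fips": "region_cd", "ann_emis": "ann_value"}
POLLUTANTS = {"CO", "NH3", "NOX", "SO2", "VOC"}


def summarize(rows, county_field, emis_field):
    """Apply the Where Filter, then sum emissions by (county, pollutant)."""
    totals = defaultdict(float)
    for row in rows:
        # Where Filter: substr(fips,1,2) = '09' and poll IN (...)
        if row[county_field][:2] == "09" and row["poll"] in POLLUTANTS:
            totals[(row[county_field], row["poll"])] += row[emis_field]
    return totals


sum_2007 = summarize(base_2007, "fips", "ann_emis")
sum_2011 = summarize(compare_2011, matching["fips"], matching["ann_emis"])

# One report row per group: (2007 total, 2011 total, difference).
report = {}
for key in sorted(set(sum_2007) | set(sum_2011)):
    total_2007 = sum_2007.get(key, 0.0)  # assumption: missing group is zero
    total_2011 = sum_2011.get(key, 0.0)
    report[key] = (total_2007, total_2011, total_2011 - total_2007)

for key, values in report.items():
    print(key, values)
```

In the sample data, the PM10 record and the records outside state 09 are dropped by the filter, and the two 2007 CO records for county 09001 are summed before being compared with the 2011 value.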