Emissions Modeling Framework (EMF) User's Guide

NOTE: This version of the user's guide is no longer being updated. The latest version is available at https://www.cmascenter.org/cost/. Click the DOCUMENTATION link and select EMF User's Guide.

For MARAMA-specific installation instructions, see Installing the EMF Client.

Table of Contents

Chapter 1. Overview of the EMF
Chapter 2. Desktop Client
Chapter 3. Datasets
Chapter 4. Dataset Quality Assurance
Chapter 5. Troubleshooting
Chapter 6. Server Administration

Chapter 1. Overview of the EMF

1.1 Introduction

The Emissions Modeling Framework (EMF) is a software system designed to solve many long-standing difficulties of emissions modeling identified at EPA. The overall process of emissions modeling involves gathering measured or estimated emissions data into emissions inventories; applying growth and controls information to create future year and controlled emissions inventories; and converting emissions inventories into hourly, gridded, chemically speciated emissions estimates suitable for input into air quality models such as the Community Multiscale Air Quality (CMAQ) model.

This User’s Guide focuses on the data management and analysis capabilities of the EMF. The EMF also contains a Control Strategy Tool (CoST) for developing future year and controlled emissions inventories and is capable of driving SMOKE to develop CMAQ inputs.

Many types of data are involved in the emissions modeling process including:

Quality assurance (QA) is an important component of emissions modeling. Emissions inventories and other modeling data must be analyzed and reviewed for any discrepancies or outlying data points. Data files need to be organized and tracked so changes can be monitored and updates made when new data is available. Running emissions modeling software such as the Sparse Matrix Operator Kernel Emissions (SMOKE) Modeling System requires many configuration options and input files that need to be maintained so that modeling output can be reproduced in the future. At all stages, coordinating tasks and sharing data between different groups of people can be difficult and specialized knowledge may be required to use various tools.

In your emissions modeling work, you may have found yourself asking questions like:

The EMF helps with these issues by using a client-server system where emissions modeling information is centrally stored and can be accessed by multiple users. The EMF integrates quality control processes into its data management to help with development of high quality emissions results. The EMF also organizes emissions modeling data and tracks emissions modeling efforts to aid in reproducibility of emissions modeling results. Additionally, the EMF strives to allow non-experts to use emissions modeling capabilities such as future year projections, spatial allocation, chemical speciation, and temporal allocation.

1.2 EMF Components

A typical installation of the EMF system is illustrated in Figure 1-1. In this case, a group of users shares a single EMF server with multiple local machines running the client application. The EMF server consists of a database, file storage, and the server application which handles requests from the clients and communicates with the database. The client application runs on each user’s computer and provides a graphical interface for interacting with the emissions modeling data stored on the server (see Chapter 2). Each user has his or her own username and password for accessing the EMF server. Some users will have administrative privileges which allow them to perform additional system tasks such as managing users or dataset types.

Figure 1-1: Typical EMF client-server setup

For a simpler setup, all of the EMF components can be run on a single machine: database, server application, and client application. With this “all-in-one” setup, the emissions data would generally not be shared between multiple users.

1.3 Basic Workflow

Figure 1-2 illustrates the basic workflow of data in the EMF system.

Figure 1-2: Data workflow in EMF system

Emissions modeling data files are imported into the EMF system where they are represented as datasets (see Chapter 3). The EMF supports many different types of data files including emissions inventories, allocation factors, cross-reference files, and reference data. Each dataset matches a dataset type which defines the format of the data to be loaded from the file (Section 3.2). In addition to the raw data values, the EMF stores various metadata about each dataset including the time period covered, geographic region, the history of the data, and data usage in model runs or QA analysis.

Once your data is stored as a dataset, you can review and edit the dataset’s properties (Section 3.5) or the data itself (Section 3.6) using the EMF client. You can also run QA steps on a dataset or set of datasets to extract summary information, compare datasets, or convert the data to a different format (see Chapter 4).

You can export your dataset to a file and download it to your local computer (Section 3.8). You can also export reports that you create with QA steps for further analysis in a spreadsheet program or to create charts (Section 4.5).

Chapter 2. Desktop Client

2.1 Requirements

The EMF client is a graphical desktop application written in Java. While it is primarily developed and used in Windows, it will run under Mac OS X and Linux (although due to font differences the window layout may not be optimal). The EMF client can be run on Windows XP, Windows 7, or Windows 8.

2.1.1 Checking Your Java Installation

The EMF requires Java 6 or greater. The following instructions will help you check if you have Java installed on your Windows machine and what version is installed. If you need more details, please visit How to find Java version in Windows [java.com].

The latest version(s) of Java on your system will be listed as Java 7 with an associated Update number (e.g., Java 7 Update 21). Older versions may be listed as Java(TM), Java Runtime Environment, Java SE, J2SE or Java 2.

Windows 8

  1. Right-click the bottom-left corner of the screen and choose Control Panel from the pop-up menu.
  2. When the Control Panel appears, select Programs.
  3. Click Programs and Features
  4. The installed Java version(s) are listed.

Windows 7 and Vista

  1. Click Start
  2. Select Control Panel
  3. Select Programs
  4. Click Programs and Features
  5. The installed Java version(s) are listed.

Windows XP

  1. Click Start
  2. Select Control Panel
  3. Click the Add/Remove Programs control panel icon
  4. The Add/Remove control panel displays a list of software on your system, including any Java versions that are on your computer.

Figure 2-1 shows the Programs and Features Control Panel on Windows 7 with Java installed. The installed version of Java is Version 7 Update 45; this version does not need to be updated to run the EMF client.

Figure 2-1: Programs and Features Control Panel
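
If you prefer the command line, running java -version in a Command Prompt prints a version string that can be compared against the Java 6 requirement. The following Python sketch is not part of the EMF; it only illustrates how such a string might be interpreted. The function names java_major_version and meets_emf_requirement are made up for this example, and the sample strings imitate typical java -version output.

```python
import re


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    second = int(match.group(2)) if match.group(2) else 0
    # Java 8 and earlier report themselves as 1.x (1.7.0_45 is Java 7 Update 45).
    return second if first == 1 else first


def meets_emf_requirement(version_output: str, minimum: int = 6) -> bool:
    """True when the reported Java version is at least `minimum` (6 for the EMF)."""
    return java_major_version(version_output) >= minimum


# Sample strings in the style printed by `java -version`.
print(meets_emf_requirement('java version "1.7.0_45"'))  # True
print(meets_emf_requirement('java version "1.5.0_22"'))  # False
```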

2.1.2 Installing Java

If you need to install Java, please follow the instructions for downloading and installing Java for a Windows computer [java.com]. Note that you will need administrator privileges to install Java on Windows. During the installation, make a note of the directory where Java is installed on your computer. You will need this information to configure the EMF client.

2.1.3 Updating Java

If Java is installed on your computer but is not version 6 or greater, you will need to update your Java installation. Start by opening the Java Control Panel from the Windows Control Panel. Figure 2-2 shows the Java Control Panel.

Figure 2-2: Java Control Panel

Clicking the About button will display the Java version dialog seen in Figure 2-3. In Figure 2-3, the installed version of Java is Version 7 Update 45. This version of Java does not need to be updated to run the EMF client.

Figure 2-3: Java Version Dialog

To update Java, click the tab labeled Update in the Java Control Panel (see Figure 2-4). Click the button labeled Update Now in the bottom right corner of the Java Control Panel to update your installation of Java.

Figure 2-4: Java Control Panel: Update Tab

2.2 Installing the EMF Client

The following instructions are specific to the MARAMA EMF installation.

To get started, please contact MARAMA to request the EMF client package. Click on the folder named EMF_State. You should see a page similar to Figure 2-5.

Figure 2-5: EMF Client Folder on Dropbox

In the top right corner of the window, click the Download button and choose “Download as .zip”. Once the file EMF_State.zip has finished downloading, open the zip file and drag the folder EMF_State to your C: drive. You want to end up with the directory C:\EMF_State as shown in Figure 2-6. You may need administrator privileges to drag the folder to the C: drive.

Figure 2-6: EMF Client Folder on Windows

Next, determine the location where Java is installed on your computer. Depending on the version of your operating system and which version of Java you have, the location might be:

If the location of your Java executable is anything other than C:\Program Files\Java\jre6\bin\java, you will need to edit the EMFClient.bat file in the C:\EMF_State directory. Right-click the EMFClient.bat file and select Edit to open the file in Notepad. You should see a file like Figure 2-7.

Figure 2-7: Editing EMFClient.bat File

Find the line

set JAVA_EXE=C:\Program Files\Java\jre6\bin\java

and update the location to match your Java installation. Save the file and close Notepad.
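
The same edit can be expressed as a small text transformation. The Python sketch below is only an illustration and is not shipped with the EMF: the function name set_java_exe, the surrounding "@echo off" line, and the jre7 path are hypothetical, and the only detail taken from this guide is the set JAVA_EXE= line itself.

```python
def set_java_exe(bat_text: str, java_path: str) -> str:
    """Return the batch file text with its JAVA_EXE line pointing at java_path."""
    lines = bat_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Batch files are case-insensitive, so compare in lower case.
        if line.strip().lower().startswith("set java_exe="):
            lines[i] = f"set JAVA_EXE={java_path}"
            replaced = True
    if not replaced:
        raise ValueError("no 'set JAVA_EXE=' line found")
    return "\n".join(lines)


# Hypothetical file contents; only the JAVA_EXE line comes from this guide.
sample = "\n".join([
    "@echo off",
    r"set JAVA_EXE=C:\Program Files\Java\jre6\bin\java",
])

# The jre7 path is an example of a different install location, not a default.
print(set_java_exe(sample, r"C:\Program Files\Java\jre7\bin\java"))
```

Editing the file by hand in Notepad, as described above, achieves the same result.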

To launch the EMF client, double-click the file named EMFClient.bat. You may see a security warning similar to Figure 2-8. Uncheck the box labeled “Always ask before opening this file” to avoid the warning in the future.

Figure 2-8: EMF Client Security Warning

2.3 Register as a New User and Log In

When you start the EMF client application, you will initially see a login window like Figure 2-9.

Figure 2-9: Login to the Emissions Modeling Framework Window

If you are an existing EMF user, enter your EMF username and password in the login window and click the Log In button. If you forget your password, an EMF Administrator can reset it for you. Note: The Reset Password button is used to update your password when it expires; it can’t be used if you’ve lost your password. See Section 2.5 for more information on password expiration.

If you have never used the EMF before, click the Register New User button to bring up the Register New User window as shown in Figure 2-10.

Figure 2-10: Register New User Window

In the Register New User window, enter the following information:

Click OK to create your account. If there are any problems with the information you entered, an error message will be displayed at the top of the window as shown in Figure 2-11.

Figure 2-11: Error Registering New User

Once you have corrected any errors and clicked OK again, your account will be created and the EMF main window will be displayed (Figure 2-12).

Figure 2-12: EMF Main Window

2.4 Update Your Profile

If you need to update any of your profile information or change your password, click the Manage menu and select My Profile to bring up the Edit User window shown in Figure 2-13.

Figure 2-13: Edit User Profile

To change your password, enter your new password in the Password field and be sure to enter the same password in the Confirm Password field. Your password must be at least 8 characters long and must contain at least one digit.
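
As an illustration of these rules, the following Python sketch checks a candidate password before you type it into the Edit User window. It is not the EMF's own validation code; the function name password_problems and the wording of the messages are assumptions made for this example.

```python
from typing import List


def password_problems(password: str, confirmation: str) -> List[str]:
    """List the ways a new password breaks the rules stated in this section."""
    problems = []
    if len(password) < 8:
        problems.append("Password must be at least 8 characters long.")
    if not any(ch.isdigit() for ch in password):
        problems.append("Password must contain at least one digit.")
    if password != confirmation:
        problems.append("Password and Confirm Password do not match.")
    return problems


print(password_problems("emissions1", "emissions1"))  # []
print(password_problems("short", "short"))
```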

Once you have entered any updated information, click the Save button to save your changes and close the Edit User window. You can close the window without saving changes by clicking the Close button. If you have unsaved changes, you will be asked to confirm that you want to discard your changes (Figure 2-14).

Figure 2-14: Discard Changes Confirmation

2.5 Password Expiration

Passwords in the EMF expire every 90 days. If you try to log in and your password has expired, you will see the message “Password has expired. Reset Password.” as shown in Figure 2-15.

Figure 2-15: Password Expired

Click the Reset Password button to set a new password as shown in Figure 2-16. After entering your new password and confirming it, click the Save button to save your new password and you will be logged in to the EMF. Make sure to use your new password next time you log in.

Figure 2-16: Reset Expired Password
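
The 90-day rule can be pictured with a short date calculation. This Python sketch is illustrative only: the function name password_expired is hypothetical, and whether the EMF treats a password as expired on exactly day 90 or only after it is an assumption (here, only after).

```python
from datetime import date, timedelta

# Lifetime stated in this section.
PASSWORD_LIFETIME = timedelta(days=90)


def password_expired(last_changed: date, today: date) -> bool:
    """True once more than 90 days have passed since the password was set."""
    # Assumption: day 90 itself is still valid; the real boundary may differ.
    return today - last_changed > PASSWORD_LIFETIME


print(password_expired(date(2024, 1, 1), date(2024, 3, 31)))  # False (day 90)
print(password_expired(date(2024, 1, 1), date(2024, 4, 1)))   # True (day 91)
```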

2.6 Interface Concepts

As you become familiar with the EMF client application, you’ll encounter various concepts that are reused through the interface. In this section, we’ll briefly introduce these concepts. You’ll see specific examples in the following chapters of this guide.

2.6.1 Viewing vs. Editing

First, we’ll discuss the difference between viewing an item and editing an item. Viewing something in the EMF means that you are just looking at it and can’t change its information. Conversely, editing an item means that you have the ability to change something. Oftentimes, the interface for viewing vs. editing will look similar but when you’re just viewing an item, various fields won’t be editable. For example, Figure 2-17 shows the Dataset Properties View window while Figure 2-18 shows the Dataset Properties Editor window for the same dataset.

Figure 2-17: Viewing a dataset
Figure 2-18: Editing a dataset

In the edit window, you can make various changes to the dataset like editing the dataset name, selecting the temporal resolution, or changing the geographic region. Clicking the Save button will save your changes. In the viewing window, those same fields are not editable and there is no Save button. Notice the button labeled Edit Properties in the lower left-hand corner of Figure 2-17. Clicking this button will bring up the editing window shown in Figure 2-18.

Similarly, Figure 2-19 shows the QA tab of the Dataset Properties View as compared to Figure 2-20 showing the same QA tab but in the Dataset Properties Editor.

Figure 2-19: Viewing QA tab
Figure 2-20: Editing QA tab

In the View window, the only option is to view each QA step whereas the Editor allows you to interact with the QA steps by adding, editing, copying, deleting, or running the steps. If you are having trouble finding an option you’re looking for, check to see if you’re viewing an item vs. editing it.

2.6.2 Access Restrictions

Only one user can edit a given item at a time. Thus, if you are editing a dataset, you have a “lock” on it and no one else will be able to edit it at the same time. Other users will be able to view the dataset as you’re editing it. If you try to edit a locked dataset, the EMF will display a message like Figure 2-21. For some items in the EMF, you may only be able to edit the item if you created it or if your account has administrative privileges.

Figure 2-21: Dataset Locked Message
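
The one-editor-at-a-time behavior can be sketched as a small lock table. The Python example below is a conceptual illustration, not the EMF server's implementation; the class name EditLocks, its methods, and the user names are hypothetical.

```python
class EditLocks:
    """Tracks which user currently holds the edit lock on each item."""

    def __init__(self):
        self._owners = {}  # item name -> user name holding the lock

    def acquire(self, item: str, user: str) -> bool:
        """Try to start editing; False means another user already has the lock."""
        owner = self._owners.get(item)
        if owner is not None and owner != user:
            return False
        self._owners[item] = user
        return True

    def release(self, item: str, user: str) -> None:
        """Give up the lock, for example when the edit window is closed."""
        if self._owners.get(item) == user:
            del self._owners[item]


locks = EditLocks()
print(locks.acquire("nonpoint_2011", "user_a"))  # True: user_a may edit
print(locks.acquire("nonpoint_2011", "user_b"))  # False: locked, view only
locks.release("nonpoint_2011", "user_a")
print(locks.acquire("nonpoint_2011", "user_b"))  # True: lock is free again
```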

2.6.3 Unsaved Changes

Generally you will need to click the Save button to save changes that you make. If you have unsaved changes and click the Close button, you will be asked if you want to discard your changes as shown in Figure 2-14. This helps to prevent losing your work if you accidentally close a window.

2.6.4 Refresh

The EMF client application loads data from the EMF server. As you and other users work, your information is saved to the server. In order to see the latest information from other users, the client application needs to refresh its information by contacting the server. The latest data will be loaded from the server when you open a new window. If you are working in an already open window, you may need to click on the Refresh button to load the newest data. Figure 2-22 highlights the Refresh button in the Dataset Manager window. Clicking Refresh will contact the server and load the latest list of datasets.

Figure 2-22: Refresh button in the Dataset Manager window

Various windows in the EMF client application have Refresh buttons, usually in either the top right corner as in Figure 2-22 or in the row of buttons on the bottom right like in Figure 2-20.

You will also need to use the Refresh button if you have made changes and return to a previously opened window. For example, suppose you select a dataset in the Dataset Manager and edit the dataset’s name as described in Section 3.5. When you save your changes, the previously opened Dataset Manager window won’t automatically display the updated name. If you close and re-open the Dataset Manager, the dataset’s name will be refreshed; otherwise, you can click the Refresh button to update the display.

2.6.5 Status Window

Many actions in the EMF are run on the server. For example, when you run a QA step, the client application on your computer sends a message to the server to start running the step. Depending on the type of QA step, this processing can take a while and so the client will allow you to do other work while it periodically checks with the server to find out the status of your request. These status checks are displayed in the Status Window shown in Figure 2-23.

Figure 2-23: Status Window

The status window will show you messages about tasks when they are started and completed. Also, error messages will be displayed if a task could not be completed. You can click the Refresh button in the Status Window to refresh the status. The Trash icon clears the Status Window.
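
The periodic status check described above follows a common polling pattern, sketched here in Python. This is not the EMF client's code: the function name poll_until_done, the status strings, and the polling interval are all assumptions chosen for the example, and the server is simulated with a simple iterator.

```python
import time
from typing import Callable, List


def poll_until_done(check_status: Callable[[], str],
                    interval_seconds: float = 5.0,
                    max_checks: int = 100) -> List[str]:
    """Ask for the task status repeatedly and collect the messages received."""
    messages = []
    for _ in range(max_checks):
        status = check_status()
        messages.append(status)
        # Stop polling once the task has finished, successfully or not.
        if status in ("completed", "failed"):
            break
        time.sleep(interval_seconds)
    return messages


# Simulated server replies standing in for a long-running QA step.
replies = iter(["started", "running", "running", "completed"])
print(poll_until_done(lambda: next(replies), interval_seconds=0))
```

Because the waiting happens between checks, the client stays free for other work while the server processes the request.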

2.6.6 The Sort-Filter-Select Table

Most lists of data within the EMF are displayed using the Sort-Filter-Select Table, a generic table that allows sorting, filtering, and selection (as the name suggests). Figure 2-24 shows the sort-filter-select table used in the Dataset Manager. (To follow along with the figures, select the main Manage menu and then select Datasets. In the window that appears, find the Show Datasets of Type pull-down menu near the top of the window and select All.)

Figure 2-24: Sort-Filter-Select Table

Row numbers are shown in the first column, while the first row displays column headers. The column labeled Select allows you to select individual rows by checking the box in the column. Selections are used for different activities depending on where the table is displayed. For example, in the Dataset Manager window you can select various datasets and then click the View button to view the dataset properties of each selected dataset. In other contexts, you may have options to change the status of all the selected items or copy the selected items. There are toolbar buttons to allow you to quickly select all items in a table (Section 2.6.12) and to clear all selections (Section 2.6.13).

The horizontal scroll bar at the bottom indicates that there are more columns in the table than fit in the window. Scroll to the right in order to see all the columns as in Figure 2-25.

Figure 2-25: Sort-Filter-Select Table with Scrolled Columns

Notice the info line displayed at the bottom of the table. In Figure 2-25 the line reads 35 rows : 12 columns: 0 Selected [Filter: None, Sort: None]. This line gives information about the total number of rows and columns in the table, the number of selected items, and any filtering or sorting applied.

Columns can be resized by clicking on the border between two column headers and dragging it right or left. Your mouse cursor will change to a horizontal double-headed arrow when resizing columns.

You can rearrange the order of the columns in the table by clicking a column header and dragging the column to a new position. Figure 2-26 shows the sort-filter-select table with columns rearranged and resized.

Figure 2-26: Sort-Filter-Select Table with Rearranged and Resized Columns

To sort the table using data from a given column, click on the column header such as Last Modified Date. Figure 2-27 shows the table sorted by Last Modified Date in descending order (latest dates first). The table info line now includes Sort: Last Modified Date(-).

Figure 2-27: Sort-Filter-Select Table with Column Sort

If you click the Last Modified Date header again, the table will re-sort by Last Modified Date in ascending order (earliest dates first). The table info line also changes to Sort: Last Modified Date(+) as seen in Figure 2-28.

Figure 2-28: Sort-Filter-Select Table with Reversed Column Sort

The toolbar at the top of the table (as shown in Figure 2-29) has buttons for the following actions (from left to right):

Figure 2-29: Toolbar for Sort-Filter-Select Table
  1. Sort options
  2. Filter rows
  3. Show or hide columns
  4. Format data in columns
  5. Reset table’s sorting, filtering, and column layout
  6. Select all rows
  7. Clear all selections

If you hover your mouse over any of the buttons, a tooltip will pop up to remind you of each button’s function.

2.6.7 Sort Options

The Sort toolbar button brings up the Sort Columns dialog as shown in Figure 2-30. This dialog allows you to sort the table by multiple columns and also allows case-sensitive sorting. (Quick sorting by clicking a column header uses case-insensitive sorting.)

Figure 2-30: Sort Columns Dialog

In the Sort Columns dialog, select the first column you want to sort by from the Sort By pull-down menu. You can also specify whether the sort order should be ascending or descending and whether the sort comparison should be case-sensitive.

To add additional columns to sort by, click the Add button and then select the column in the new Then Sort By pull-down menu. When you have finished setting up your sort selections, click the OK button to close the dialog and re-sort the table. The info line beneath the table will show all the columns used for sorting like Sort: Creator(+), Last Modified Date(-).

To remove your custom sorting, click the Clear button in the Sort Columns dialog and then click the OK button. You can also use the Reset toolbar button to reset all custom settings as described in Section 2.6.11.
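
To make the multi-column sort concrete, the Python sketch below reproduces a sort like Sort: Creator(+), Last Modified Date(-) on a few made-up rows. It illustrates the idea only and is not how the EMF client is implemented; the function name sort_rows, the dataset names, and the creator "alice" are hypothetical.

```python
def sort_rows(rows, sort_keys):
    """Sort rows by several columns.

    sort_keys is a list of (column, ascending, case_sensitive) tuples,
    most significant column first.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-column sort.
    for column, ascending, case_sensitive in reversed(sort_keys):
        def key(row, column=column, case_sensitive=case_sensitive):
            value = row[column]
            if isinstance(value, str) and not case_sensitive:
                return value.lower()
            return value
        result.sort(key=key, reverse=not ascending)
    return result


rows = [
    {"Name": "A", "Creator": "rhc", "Last Modified Date": "2013-05-01"},
    {"Name": "B", "Creator": "alice", "Last Modified Date": "2013-01-15"},
    {"Name": "C", "Creator": "rhc", "Last Modified Date": "2013-09-30"},
]

# Equivalent of "Sort: Creator(+), Last Modified Date(-)".
ordered = sort_rows(rows, [("Creator", True, False),
                           ("Last Modified Date", False, False)])
print([row["Name"] for row in ordered])  # ['B', 'C', 'A']
```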

2.6.8 Filter Rows

The Filter Rows toolbar button brings up the Filter Rows dialog as shown in Figure 2-31. This dialog allows you to create filters to “whittle down” the rows of data shown in the table. You can filter the table’s rows based on any column with several different value matching options.

Figure 2-31: Filter Rows Dialog

To add a filter criterion, click the Add Criteria button and a new row will appear in the dialog window. Clicking the cell directly under the Column Name header displays a pull-down menu to pick which column you would like to use to filter the rows. The Operation column allows you to select how the filter should be applied; for example, you can filter for data that starts with the given value or does not contain the value. Finally, click the cell under the Value header and type in the value to use. Note that the filter values are case-sensitive. A filter value of “nonroad” would not match the dataset type “ORL Nonroad Inventory”.

If you want to specify additional criteria, click Add Criteria again and follow the same process. To remove a filter criterion, click on the row you want to remove and then click the Delete Criteria button.

If the radio button labeled Match using: is set to ALL criteria, then only rows that match all the specified criteria will be shown in the filtered table. If Match using: is set to ANY criteria, then rows will be shown if they meet any of the criteria listed.

Once you are done specifying your filter options, click the OK button to close the dialog and return to the filtered table. The info line beneath the table will include your filter criteria like Filter: Creator contains rhc, Temporal Resolution starts with Ann.

To remove your custom filtering, you can delete the filter criteria from the Filter Rows dialog or uncheck the Apply Filter? checkbox to turn off the filtering without deleting your filter rules. You can also use the Reset toolbar button to reset all custom settings as described in Section 2.6.11. Note that clicking the Reset button will delete your filter rules.
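
The filtering rules in this section (case-sensitive matching, and ALL versus ANY criteria) can be illustrated with a short Python sketch. It is not the EMF's implementation: the function name filter_rows and the sample rows are hypothetical, and only the three operations named in this section are modeled, although the dialog may offer others.

```python
# Operations mentioned in this section; all comparisons are case-sensitive.
OPERATIONS = {
    "contains": lambda cell, value: value in cell,
    "does not contain": lambda cell, value: value not in cell,
    "starts with": lambda cell, value: cell.startswith(value),
}


def filter_rows(rows, criteria, match_all=True):
    """Keep rows matching ALL criteria (match_all=True) or ANY of them.

    criteria is a list of (column name, operation, value) tuples.
    """
    combine = all if match_all else any
    return [
        row for row in rows
        if combine(OPERATIONS[op](str(row[column]), value)
                   for column, op, value in criteria)
    ]


rows = [
    {"Name": "d1", "Dataset Type": "ORL Nonroad Inventory",
     "Creator": "rhc", "Temporal Resolution": "Annual"},
    {"Name": "d2", "Dataset Type": "ORL Point Inventory (PTINV)",
     "Creator": "alice", "Temporal Resolution": "Annual"},
    {"Name": "d3", "Dataset Type": "ORL Nonroad Inventory",
     "Creator": "rhc", "Temporal Resolution": "Monthly"},
]

criteria = [("Creator", "contains", "rhc"),
            ("Temporal Resolution", "starts with", "Ann")]
print([r["Name"] for r in filter_rows(rows, criteria, match_all=True)])   # ['d1']
print([r["Name"] for r in filter_rows(rows, criteria, match_all=False)])  # ['d1', 'd2', 'd3']

# Case sensitivity: "nonroad" does not match "ORL Nonroad Inventory".
print(filter_rows(rows, [("Dataset Type", "contains", "nonroad")]))  # []
```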

2.6.9 Show or Hide Columns

The Show/Hide Columns toolbar button brings up the Show/Hide Columns dialog as shown in Figure 2-32. This dialog allows you to customize which columns are displayed in the table.

Figure 2-32: Show/Hide Columns Dialog

To hide a column, uncheck the box next to the column name under the Show? column. Click the OK button to return to the table. The columns you unchecked will no longer be seen in the table. The info line beneath the table will also be updated with the current number of displayed columns.

To make a hidden column appear again, open the Show/Hide Columns dialog and check the Show? box next to the hidden column’s name. Click OK to close the Show/Hide Columns dialog.

To select multiple columns to show or hide, click on the first column name of interest. Then hold down the Shift key and click a second column name to select it and the intervening columns. Once rows are selected, clicking the Show or Hide buttons in the middle of the dialog will check or uncheck all the Show? boxes for the selected rows. To select multiple rows that aren’t next to each other, you can hold down the Control key while clicking each row. The Invert button will invert the selected rows. After checking/unchecking the Show? checkboxes, click OK to return to the table with the columns shown/hidden as desired.

The Show/Hide Columns dialog also supports filtering to find columns to show or hide. This is an infrequently used option most useful for locating columns to show or hide when there are many columns in the table. Figure 2-33 shows an example where a filter has been set up to match column names that contain the value “Date”. Clicking the Select button above the filtering options selects matching rows which can then be hidden by clicking the Hide button.

Figure 2-33: Show/Hide Columns with Column Name Filter

2.6.10 Format Data in Columns

The Format Columns toolbar button displays the Format Columns dialog shown in Figure 2-34. This dialog allows you to customize the formatting of columns. In practice, this dialog is not used very often but it can be helpful to format numeric data by changing the number of decimal places or the number of significant digits shown.

Figure 2-34: Format Columns Dialog

To change the format of a column, first check the checkbox next to the column name in the Format? column. If you only select columns that contain numeric data, the Numeric Format Options section of the dialog will appear; otherwise, it will not be visible. The Format Columns dialog supports filtering by column name similar to the Show/Hide Columns dialog (Section 2.6.9).

From the Format Columns dialog, you can change the font, the style of the font (e.g. bold, italic), the horizontal alignment for the column (e.g. left, center, right), the text color, and the column width. For numeric columns, you can specify the number of significant digits and decimal places.
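
The difference between decimal places and significant digits is easy to mix up, so here is a brief Python illustration using standard format specifiers. The function names are made up for this example, and the EMF's own rounding and display rules may differ in detail.

```python
def format_decimal_places(value: float, places: int) -> str:
    """Fix the number of digits shown after the decimal point."""
    return f"{value:.{places}f}"


def format_significant_digits(value: float, digits: int) -> str:
    """Keep a fixed number of meaningful digits regardless of magnitude."""
    # Note: Python's "g" format switches to scientific notation for values
    # whose magnitude needs more digits than requested.
    return f"{value:.{digits}g}"


print(format_decimal_places(1234.5678, 2))      # 1234.57
print(format_significant_digits(0.012345, 3))   # 0.0123
print(format_significant_digits(12.3456, 3))    # 12.3
```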

2.6.11 Reset Table

The Reset toolbar button will remove all customizations from the table: sorting, filtering, hidden columns, and formatting. It will also reset the column order and set column widths back to the default.

2.6.12 Select All Rows

The Select All toolbar button selects all the rows in the table. After clicking the Select All button, you will see that the checkboxes in the Select column are now all checked. You can select or deselect an individual item by clicking its checkbox in the Select column.

2.6.13 Clear All Selections

The Clear All Selections toolbar button unselects all the rows in the table.

Chapter 3. Datasets

3.1 Introduction

Emissions inventories, reference data, and other types of data files are imported into the EMF and stored as datasets. A dataset encompasses both the data itself as well as various dataset properties such as the time period covered by the dataset and geographic extent of the dataset. Changes to a dataset are tracked as dataset revisions. Multiple versions of the data for a dataset can be stored in the EMF.

3.2 Dataset Types

Each dataset has a dataset type. The dataset type describes the format of the dataset’s data. For example, the dataset type for an ORL Point Inventory (PTINV) defines the various data fields of the inventory file such as FIPS code, SCC code, pollutant name, and annual emissions value. A different dataset type like Spatial Surrogates (A/MGPRO) defines the fields in the corresponding file: surrogate code, FIPS code, grid cell, and surrogate fraction.

The EMF also supports flexible dataset types without a fixed format: Comma Separated Value and Line-based. These types allow for new kinds of data to be loaded into the EMF without requiring updates to the EMF software.

When importing data into the EMF, you can choose between internal dataset types where the data itself is stored in the EMF database and external dataset types where the data remains in a file on disk and the EMF only tracks the metadata. For internal datasets, the EMF provides data editing, revision and version tracking, and data analysis using SQL queries. External datasets can be used to track files that don’t need these features or data that can’t be loaded into the EMF like binary NetCDF files.

You can view the dataset types defined in the EMF by selecting Dataset Types from the main Manage menu. EMF administrators can add, edit, and remove dataset types; non-administrative users can view the dataset types. Figure 3-1 shows the Dataset Type Manager.

Figure 3-1: Dataset Type Manager

To view the details of a particular dataset type, check the box next to the type you want to view (for example, “Flat File 2010 Nonpoint”) and then click the View button in the bottom left-hand corner.

Figure 3-2 shows the View Dataset Type window for the Flat File 2010 Nonpoint dataset type. Each dataset type has a name and a description along with metadata about who created the dataset type and when, and also the last modified date for the dataset type.

Figure 3-2: View Dataset Type: Flat File 2010 Nonpoint

The dataset type defines the format of the data file as seen in the File Format section of Figure 3-2. For the Flat File 2010 Nonpoint dataset type, the columns from the raw data file are mapped into columns in the database when the data is imported. Each column has a data type (string, integer, or floating point) that the imported values must match, and each column can be mandatory or optional.
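
The idea of typed, mandatory or optional columns can be sketched in a few lines of Python. This is a conceptual illustration rather than the EMF's import code: the names Column and check_row, the error wording, and the ann_value column are hypothetical, while region_cd and scc are borrowed from the INDICES example in Table 3-1.

```python
from typing import Callable, Dict, List, NamedTuple


class Column(NamedTuple):
    name: str
    convert: Callable[[str], object]  # str, int, or float
    mandatory: bool


def check_row(columns: List[Column], raw: Dict[str, str]) -> List[str]:
    """Return the problems found when matching one raw row to the format."""
    problems = []
    for col in columns:
        text = raw.get(col.name, "").strip()
        if text == "":
            if col.mandatory:
                problems.append(f"{col.name}: missing mandatory value")
            continue  # an empty optional column is acceptable
        try:
            col.convert(text)
        except ValueError:
            problems.append(
                f"{col.name}: {text!r} is not a valid {col.convert.__name__}")
    return problems


# Hypothetical three-column format for illustration.
columns = [
    Column("region_cd", str, True),
    Column("scc", str, True),
    Column("ann_value", float, False),
]

print(check_row(columns, {"region_cd": "37001", "scc": "2102004000",
                          "ann_value": "12.5"}))  # []
print(check_row(columns, {"region_cd": "37001", "scc": "",
                          "ann_value": "abc"}))
```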

Keyword-value pairs can be used to give the EMF more information about a dataset type. Table 3-1 lists some of the keywords available. Section 3.5.3 provides more information about using and adding keywords.

Table 3-1: Dataset Type Keywords
Keyword Description Example
EXPORT_COLUMN_LABEL Indicates if column labels should be included when exporting the data to a file FALSE
EXPORT_HEADER_COMMENTS Indicates if header comments should be included when exporting the data to a file FALSE
EXPORT_INLINE_COMMENTS Indicates if inline comments should be included when exporting the data to a file FALSE
EXPORT_PREFIX Filename prefix to include when exporting the data to a file ptinv_
EXPORT_SUFFIX Filename suffix to use when exporting the data to a file .csv
INDICES Tells the system to create indices in the database on the given columns region_cd|country_cd|scc
REQUIRED_HEADER Indicates a line that must occur in the header of a data file #FORMAT=FF10_ACTIVITY

Each dataset type can have QA step templates assigned. These are QA steps that apply to any dataset of the given type. More information about using QA step templates is given in Chapter 4.

3.2.1 Common Dataset Types ↩

Dataset types can be added, edited, or deleted by EMF administrators. In this section, we list dataset types that are commonly used. Your EMF installation may not include all of these types or may have additional types defined.

3.2.1.1 Common Inventory Dataset Types

Table 3-2: Inventory Dataset Types
Dataset Type Name Description Link to File Format
Day-Specific Point Inventory (PTDAY) Point day-specific emissions inventory in EMS–95 format SMOKE documentation
Flat File 2010 Activity Onroad mobile activity data (VMT, VPOP, speed) in Flat File 2010 (FF10) format SMOKE documentation
Flat File 2010 Activity Nonpoint Nonpoint activity data in FF10 format Same format as Flat File 2010 Activity
Flat File 2010 Activity Point Point activity data in FF10 format Not available
Flat File 2010 Nonpoint Nonpoint or nonroad emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Nonpoint Daily Nonpoint or nonroad day-specific emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Point Point emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Point Daily Point day-specific emissions inventory in FF10 format SMOKE documentation
IDA Activity Onroad mobile activity data in IDA format SMOKE documentation
IDA Mobile Onroad mobile emissions inventory in IDA format SMOKE documentation
IDA Nonpoint/Nonroad Nonpoint or nonroad emissions inventory in IDA format SMOKE documentation
IDA Point Point emissions inventory in IDA format SMOKE documentation
Individual Hour-Specific Point Inventory Point hour-specific emissions inventory in EMS–95 format SMOKE documentation
ORL Day-Specific Fires Data Inventory (PTDAY) Day-specific fires inventory SMOKE documentation
ORL Fire Inventory (PTINV) Wildfire and prescribed fire inventory SMOKE documentation
ORL Nonpoint Inventory (ARINV) Nonpoint emissions inventory in ORL format SMOKE documentation
ORL Nonroad Inventory (ARINV) Nonroad emissions inventory in ORL format SMOKE documentation
ORL Onroad Inventory (MBINV) Onroad mobile emissions inventory in ORL format SMOKE documentation
ORL Point Inventory (PTINV) Point emissions inventory in ORL format SMOKE documentation

3.2.1.2 Common Reference Data Dataset Types

Table 3-3: Reference Data Dataset Types
Dataset Type Name Description Link to File Format
Country, state, and county names and data (COSTCY) List of region names and codes with default time zones and daylight-saving time flags SMOKE documentation
Grid Descriptions (Line-based) List of projections and grids I/O API documentation
Holiday Identifications (Line-based) Holidays date list SMOKE documentation
Inventory Table Data (INVTABLE) Pollutant reference data SMOKE documentation
MACT description (MACTDESC) List of MACT codes and descriptions SMOKE documentation
NAICS description file (NAICSDESC) List of NAICS codes and descriptions SMOKE documentation
ORIS Description (ORISDESC) List of ORIS codes and descriptions SMOKE documentation
Point-Source Stack Replacements (PSTK) Replacement stack parameters SMOKE documentation
SCC Descriptions (Line-based) List of SCC codes and descriptions SMOKE documentation
SIC Descriptions (Line-based) List of SIC codes and descriptions SMOKE documentation
Surrogate Descriptions (SRGDESC) List of surrogate codes and descriptions SMOKE documentation

3.2.1.3 Common Emissions Modeling Cross-Reference and Factors Dataset Types

Table 3-4: Emissions Modeling Dataset Types
Dataset Type Name Description Link to File Format
Area-to-point Conversions (Line-based) Point locations to assign to stationary area and nonroad mobile sources SMOKE documentation
Chemical Speciation Combo Profiles (GSPRO_COMBO) Multiple speciation profile combination data SMOKE documentation
Chemical Speciation Cross-Reference (GSREF) Cross-reference data to match inventory sources to speciation profiles SMOKE documentation
Chemical Speciation Profiles (GSPRO) Factors to allocate inventory pollutant emissions to model species SMOKE documentation
Gridding Cross Reference (A/MGREF) Cross-reference data to match inventory sources to spatial surrogates SMOKE documentation
Pollutant to Pollutant Conversion (GSCNV) Conversion factors when inventory pollutant doesn’t match speciation profile pollutant SMOKE documentation
Spatial Surrogates (A/MGPRO) Factors to allocate emissions to grid cells SMOKE documentation
Spatial Surrogates (External Multifile) External dataset type to point to multiple surrogates files on disk Individual files have same format as Spatial Surrogates (A/MGPRO)
Temporal Cross Reference (A/M/PTREF) Cross-reference data to match inventory sources to temporal profiles SMOKE documentation
Temporal Profile (A/M/PTPRO) Factors to allocate inventory emissions to hourly estimates SMOKE documentation

3.2.1.4 Common Growth and Controls Dataset Types

Table 3-5: Growth and Controls Dataset Types
Dataset Type Name Description Link to File Format
Allowable Packet Allowable emissions cap or replacement values SMOKE documentation
Allowable Packet Extended Allowable emissions cap or replacement values; supports monthly values Download CSV
Control Packet Control efficiency, rule effectiveness, and rule penetration rate values SMOKE documentation
Control Packet Extended Control percent reduction values; supports monthly values Download CSV
Control Strategy Detailed Result Output from CoST Not available
Control Strategy Least Cost Control Measure Worksheet Output from CoST Not available
Control Strategy Least Cost Curve Summary Output from CoST Not available
Projection Packet Factors to grow emissions values into the past or future SMOKE documentation
Projection Packet Extended Projection factors; supports monthly values Download CSV
Strategy County Summary Output from CoST Not available
Strategy Impact Summary Output from CoST Not available
Strategy Measure Summary Output from CoST Not available
Strategy Messages (CSV) Output from CoST Not available

3.3 The Dataset Manager ↩

The main interface for finding and interacting with datasets is the Dataset Manager. To open the Dataset Manager, select the Manage menu at the top of the EMF main window, and then select the Datasets menu item. It may take a little while for the window to appear. As shown in Figure 3-3, the Dataset Manager initially does not show any datasets. This is to avoid loading a potentially large list of datasets from the server.

Figure 3-3: Empty Dataset Manager Window

From the Dataset Manager you can:

To quickly find datasets of interest, you can use the Show Datasets of Type pull-down menu at the top of the Dataset Manager window. Select “ORL Point Inventory (PTINV)” and the datasets matching that Dataset Type are loaded into the Dataset Manager as shown in Figure 3-4.

Figure 3-4: Dataset Manager Window with Datasets

The matching datasets are shown in a table that lists some of their properties, including the dataset’s name, last modified date, dataset type, status indicating how the dataset was created, and the username of the dataset’s creator. Table 3-6 describes each column in the Dataset Manager window. In the Dataset Manager window, use the horizontal scroll bar to scroll the table to the right to see all the columns.

Table 3-6: Dataset Manager Columns
Column Description
Name A unique name or label for the dataset. You choose this name when importing data and it can be edited by users with appropriate privileges.
Last Modified Date The most recent date and time when the data (not the metadata) of the dataset was modified. When the dataset is initially imported, the Last Modified Date is set to the file’s timestamp.
Type The Dataset Type of this dataset. The Dataset Type incorporates information about the structure of the data and information regarding how the data can be sorted and summarized.
Status Shows whether the dataset was imported from disk or created in some other way such as an output from a control strategy.
Creator The username of the person who originally created the dataset.
Intended Use Specifies whether the dataset is intended to be public (accessible to any user), private (accessible only to the creator), or to be used by a specific group of users.
Project The name of a study or set of work for which this dataset was created. The project field can help you organize related files.
Region The name of a geographic region to which the dataset applies.
Start Date The start date and time for the data contained in the dataset.
End Date The end date and time for the data contained in the dataset.
Temporal Resolution The temporal resolution of the data contained in the dataset (e.g. annual, daily, or hourly).

Using the Dataset Manager, you can select datasets of interest by checking the checkboxes in the Select column and then perform various actions related to those datasets. Table 3-7 lists the buttons along the bottom of the Dataset Manager window and describes the actions for each button.

Table 3-7: Dataset Manager Actions
Command Description
View Displays a read-only Dataset Properties View for each of the selected datasets. You can view a dataset even when someone else is editing that dataset’s properties or data.
Edit Properties Opens a writeable Dataset Properties Editor for each of the selected datasets. Only one user can edit a dataset at any given time.
Edit Data Opens a Dataset Versions Editor for each of the selected datasets.
Remove Marks each of the selected datasets for deletion. Datasets are not actually deleted until you click Purge.
Import Opens the Import Datasets window where you can import data files into the EMF as new datasets.
Export Opens the Export window to write the data for one version of the selected dataset to a file.
Purge Permanently removes any datasets that are marked for deletion from the EMF.
Close Closes the Dataset Manager window.

3.4 Finding Datasets ↩

There are several ways to find datasets using the Dataset Manager. First, you can show all datasets with a particular dataset type by choosing the dataset type from the Show Datasets of Type menu. If there are more than a couple hundred datasets matching the type you select, the system will warn you and suggest you enter something in the Name Contains field to limit the list.

3.4.1 Dataset Name Matching ↩

The Name Contains field allows you to enter a search term to match dataset names. For example, if you type 2020 in the textbox and then hit Enter, the Dataset Manager will show all the datasets with “2020” in their names. You can also use wildcards in your keyword. Using the keyword pt*2020 will show all datasets whose name contains “pt” followed at some point by “2020” as shown in Figure 3-5. The Name Contains search is not case sensitive.
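
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (substring match, * as a wildcard, case-insensitive). The function name name_contains and the sample dataset names are hypothetical, and the EMF's own implementation is not shown in this guide.

```python
import re


def name_contains(pattern: str, dataset_name: str) -> bool:
    """Return True if dataset_name contains pattern, treating '*' as a wildcard.

    Hypothetical helper: mimics the documented Name Contains behavior only.
    """
    # Escape every literal piece, then let '*' stand for "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.search gives "contains" semantics; IGNORECASE makes it case-insensitive.
    return re.search(regex, dataset_name, flags=re.IGNORECASE) is not None


# Made-up dataset names for illustration.
names = ["ptinv_2020_base", "PTIPM_cap2020", "nonroad_2020", "ptinv_2005"]
matches = [n for n in names if name_contains("pt*2020", n)]
print(matches)
```

With these sample names, only the first two are kept: both contain "pt" followed later by "2020", regardless of case.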

Figure 3-5: Using the Name Contains Keyword

3.4.2 Advanced Dataset Search ↩

If you want to search for datasets using attributes other than the dataset’s name or using multiple criteria, click the Advanced button. The Advanced Dataset Search dialog as shown in Figure 3-6 will be displayed.

Figure 3-6: Using the Advanced Search on the Dataset Manager

You can use the Advanced Dataset Search to search for datasets based on the contents of the dataset’s description, the dataset’s creator, project, and more. Table 3-8 lists the options for the advanced search.

Table 3-8: Advanced Dataset Search Options
Search option Description
Name contains Performs a case-insensitive search of the dataset name; supports wildcards
Description contains Performs a case-insensitive search of the dataset description; supports wildcards
Creator Matches datasets created by the specified user
Dataset type Matches datasets of the specified type
Keyword Matches datasets that have the specified keyword
Keyword value Matches datasets where the specified keyword has the specified value; must exactly match the dataset’s keyword value (case-insensitive)
QA name contains Performs a case-insensitive search of the names of the QA steps associated with datasets
Search QA arguments Searches the arguments to QA steps associated with datasets
Project Matches datasets assigned to the specified project
Used by Case Inputs Finds datasets by case (not described in this User’s Guide)
Data Value Filter Matches datasets using SQL like “FIPS='37001' and SCC like '102005%'”; must be used with the dataset type criterion

After setting your search criteria, click OK to perform the search and update the Dataset Manager window. The Advanced Dataset Search dialog will remain visible until you click Close. This allows you to refine your search or perform additional searches if needed. If you specify multiple search criteria, a dataset must satisfy all of the specified criteria to be shown in the Dataset Manager.

3.4.3 Dataset Filtering ↩

Another option for finding datasets is to use the filtering options of the Dataset Manager. (See Section 2.6.8 for a complete description of the Filter Rows dialog.) Filtering helps narrow down the list of datasets already shown in the Dataset Manager. Click the Filter Rows button in the toolbar to bring up the Filter Rows dialog. In the dialog, you can create a filter to show only datasets whose dataset type contains the word “Inventory” (see Figure 3-7).

Figure 3-7: Create Filter by Dataset Type

Once you’ve entered the filter criteria, click OK to return to the Dataset Manager. The list of datasets has now been reduced to only those matching the filter as shown in Figure 3-8.

Figure 3-8: Datasets Filtered by Dataset Type

Using filtering allows you to search for datasets using any column shown in the Dataset Manager. Remember that filtering applies only to the datasets already shown in the table; unlike the Advanced Dataset Search feature, it doesn’t search the database for additional datasets.

3.5 Viewing and Editing Dataset Properties ↩

To view or edit the properties of a dataset, select the dataset in the Dataset Manager and then click either the View or Edit Properties button at the bottom of the window. The Dataset Properties View or Editor window will be displayed with the Summary tab selected as shown in Figure 3-9. If multiple datasets are selected, separate Dataset Properties windows will be displayed for each selected dataset.

Figure 3-9: Dataset Properties Editor - Summary Tab

The interface for viewing dataset properties is very similar to the editing interface except that the values are all read-only. In this section, we will show the editing versions of the interface so that all available options are shown. In general, if you don’t need to edit a dataset, it’s better to just view the properties since viewing the dataset doesn’t lock it for editing by another user.

The Dataset Properties window divides its data into several tabs. Table 3-9 gives a brief description of each tab.

Table 3-9: Dataset Properties Tabs
Tab Description
Summary Shows high-level properties of the dataset
Data Provides access to the actual data stored for the dataset
Keywords Shows additional types of metadata not found on the Summary tab
Notes Shows comments that users have made about the dataset and questions they may have
Revisions Shows the revisions that have been made to the dataset
History Shows how the dataset has been used in the past
Sources Shows where the data came from and where it is stored in the database, if applicable
QA Shows QA steps that have been run using the dataset

There are several buttons at the bottom of the editor window that appear on all tabs:

3.5.1 Summary ↩

The Summary tab of the Dataset Properties Editor (Figure 3-9) displays high-level summary information about the dataset. Many of these properties are shown in the list of datasets displayed by the Dataset Manager and are therefore described in Table 3-6. The additional properties available in the Summary tab are described in Table 3-10.

Table 3-10: Summary Tab Dataset Properties (not included in Dataset Manager)
Column Description
Description Descriptive information about the dataset. The contents of this field are initially populated from the full-line comments found in the header and other sections of the file used to create the dataset when it is imported. Users can add to the contents of this field, which is written to the top of the resulting file when the data is exported from the EMF.
Sector The emissions sector to which this data applies.
Country The country to which the data applies.
Last Accessed Date The date/time the data was last exported.
Creation Date The date/time the dataset was created.
Default Version Indicates which version of the dataset is considered to be the default. The default version of a dataset is important in that it indicates to other users and to some quality assurance queries the appropriate version of the dataset to be used.

Values of text fields (boxes with white background) are changed by typing into the fields. Other properties are set by selecting items from pull-down menus.

Some notes about updating the various editable fields follow:

3.5.2 Data ↩

The Data tab of the Dataset Properties Editor (Figure 3-10) provides access to the actual data stored for the dataset. If the dataset has multiple versions, they will be listed in the Versions table.

Figure 3-10: Dataset Properties Editor - Data Tab

To view the data associated with a particular version, select the version and click the View button. For more information about viewing the raw data, see Section 3.6. The Copy button allows you to copy any version of the data marked as final to a new dataset.

3.5.3 Keywords ↩

The Keywords tab of the Dataset Properties Editor (Figure 3-11) shows additional types of metadata about the dataset stored as keyword-value pairs.

Figure 3-11: Dataset Properties Editor - Keywords Tab

The Keywords Specific to Dataset Type section shows keywords associated with the dataset’s type. These keywords are described in Section 3.2.

Additional dataset-specific keywords can be added by clicking the Add button. A new entry will be added to the Keyword Specific to Dataset section of the window. Type the keyword and its value in the Keyword and Value cells.

3.5.4 Notes ↩

The Notes tab of the Dataset Properties Editor (Figure 3-12) shows comments that users have made about the dataset and questions they may have. Each note is associated with a particular version of a dataset.

Figure 3-12: Dataset Properties Editor - Notes Tab

To create a new note about a dataset, click the Add button and the Create New Note dialog will open (Figure 3-13). Notes can reference other notes so that questions can be answered. Click the Set button to display other notes for this dataset and select any referenced notes.

Figure 3-13: Create New Note

The Add Existing button in the Notes tab opens a dialog to add existing notes to the dataset. This feature is useful if you need to add the same note to a set of files. Add a new note for the first dataset and then for subsequent datasets, use the “Note name contains:” field to search for the newly added note. In the list of matched notes, select the note to add and click the OK button.

Figure 3-14: Add Existing Notes to Dataset

3.5.5 Revisions ↩

The Revisions tab of the Dataset Properties Editor (Figure 3-15) shows revisions that have been made to the data contained in the dataset. See Section 3.7 for more information about editing the raw data.

Figure 3-15: Dataset Properties Editor - Revisions Tab

3.5.6 History ↩

The History tab of the Dataset Properties Editor (Figure 3-16) shows the export history of the dataset. When the dataset is exported, a history record is automatically created containing the name of the user who exported the data, the version that was exported, the location on the server where the file was exported, and statistics about how many lines were exported and the export time.

Figure 3-16: Dataset Properties Editor - History Tab

3.5.7 Sources ↩

The Sources tab of the Dataset Properties Editor (Figure 3-17) shows where the data associated with the dataset came from and where it is stored in the database, if applicable. For datasets where the data is stored in the EMF database, the Table column shows the name of the table in the EMF database and Source lists the original file the data was imported from.

Figure 3-17: Dataset Properties Editor - Sources Tab

Figure 3-18 shows the Sources tab for a dataset that references external files. In this case, there is no Table column since the data is not stored in the EMF database. The Source column lists the current location of the external file. If the location of the external file changes, you can click the Update button to browse for the file in its new location.

Figure 3-18: Sources for External Dataset

3.5.8 QA ↩

The QA tab of the Dataset Properties Editor (Figure 3-19) shows the QA steps that have been run using the dataset. See Chapter 4 for more information about setting up and running QA steps.

Figure 3-19: Dataset Properties Editor - QA Tab

3.6 Viewing Raw Data ↩

The EMF allows you to view and edit the raw data stored for each dataset. To work with the data, select a dataset from the Dataset Manager and click the Edit Data button to open the Dataset Versions Editor (Figure 3-20). This window shows the same list of versions as the Dataset Properties Data tab (Section 3.5.2).

Figure 3-20: Dataset Versions Editor

To view the data, select a version and click the View Data button. The raw data is displayed in the Data Viewer as shown in Figure 3-21.

Figure 3-21: Data Viewer

Since the data stored in the EMF may have millions of rows, the client application only transfers a small amount of data (300 rows) from the server to your local machine at a time. The area in the top right corner of the Data Viewer displays information about the currently loaded rows along with controls for paging through the data. The single left and right arrows move through the data one chunk at a time while the double arrows jump to the beginning and end of the data. If you hover your mouse over an arrow, a tooltip will pop up to remind you of its function. The slider allows you to quickly jump to different parts of the data.
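
As a rough illustration of the paging arithmetic, the following Python sketch computes which rows fall in each 300-row chunk. The function name page_bounds and the 1,000-row dataset are hypothetical; only the chunk size of 300 comes from the text above.

```python
PAGE_SIZE = 300  # rows transferred per chunk, as stated in the text above


def page_bounds(page_index: int, total_rows: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return the 1-based (first_row, last_row) for a zero-based page index."""
    if total_rows <= 0:
        raise ValueError("dataset has no rows")
    last_page = (total_rows - 1) // page_size
    if not 0 <= page_index <= last_page:
        raise IndexError(f"page index must be between 0 and {last_page}")
    first = page_index * page_size + 1
    # The final chunk may be shorter than page_size.
    last = min(first + page_size - 1, total_rows)
    return first, last


# A hypothetical dataset of 1,000 rows is split into four chunks.
for page in range(4):
    print(page, page_bounds(page, 1000))
```

In this sketch the single arrows correspond to moving page_index up or down by one, and the double arrows to jumping to the first or last page.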

You can control how the data are sorted by entering a comma-separated list of columns in the Sort Order field and then clicking the Apply button. A descending sort can be specified by following the column name with desc.
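
To make the Sort Order format concrete, here is a small Python sketch that parses a value such as "fips, ratio desc" and applies it to a few in-memory rows. It assumes the field behaves like the column list of a SQL ORDER BY clause. The helper names and the sample rows are hypothetical and are not part of the EMF.

```python
def parse_sort_order(sort_order: str) -> list[tuple[str, bool]]:
    """Split 'col1, col2 desc' into (column, descending) pairs."""
    keys = []
    for item in sort_order.split(","):
        parts = item.split()
        if not parts:
            continue  # ignore empty entries such as a trailing comma
        descending = len(parts) > 1 and parts[1].lower() == "desc"
        keys.append((parts[0], descending))
    return keys


def sort_rows(rows: list[dict], sort_order: str) -> list[dict]:
    """Sort rows by each key; the first column listed has the highest priority."""
    # Python's sort is stable, so sorting by the last key first and the
    # first key last gives a multi-column ordering.
    for column, descending in reversed(parse_sort_order(sort_order)):
        rows = sorted(rows, key=lambda r: r[column], reverse=descending)
    return rows


# Made-up rows for illustration.
rows = [
    {"fips": "37001", "ratio": 0.5},
    {"fips": "13013", "ratio": 0.9},
    {"fips": "13013", "ratio": 1.2},
    {"fips": "37001", "ratio": 2.0},
]
sorted_rows = sort_rows(rows, "fips, ratio desc")
for row in sorted_rows:
    print(row["fips"], row["ratio"])
```

Here the rows are grouped by fips in ascending order, and within each fips value the largest ratio comes first.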

The Row Filter field allows you to enter criteria and filter the rows that are displayed. The syntax is similar to a SQL WHERE clause. Table 3-11 shows some example filters and the syntax for each.

Table 3-11: Examples of Row Filter Syntax
Filter Purpose Row Filter Syntax
Filter on a particular set of SCCs scc like '101%' or scc like '102%'
Filter on a particular set of pollutants poll in ('PM10', 'PM2_5')
Filter sources only in NC (State FIPS = 37), SC (45), and VA (51); note that FIPS column format is State + County FIPS code (e.g., 37001) substring(FIPS,1,2) in ('37', '45', '51')
Filter sources only in CA (06) and include only NOx and VOC pollutants fips like '06%' and (poll = 'NOX' or poll = 'VOC')
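
The filters in Table 3-11 can be tried outside the EMF with a small, self-contained Python sketch that uses an in-memory SQLite table. This is an approximation under stated assumptions: the table name inventory, its columns, and the sample rows are made up, and SQLite stands in for the EMF database. Because older SQLite versions only provide substr, the sketch uses substr where the table uses substring.

```python
import sqlite3

# In-memory stand-in for a dataset table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (fips TEXT, scc TEXT, poll TEXT, ann_emis REAL)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        ("37001", "10100101", "NOX", 12.5),
        ("45003", "10200501", "PM10", 3.1),
        ("06037", "2270002000", "VOC", 8.4),
        ("06037", "2270002000", "CO", 40.0),
        ("51059", "10300101", "PM2_5", 0.7),
    ],
)


def apply_row_filter(row_filter: str) -> list[tuple]:
    """Run a Row Filter expression as the WHERE clause of a SELECT.

    Illustration only: pasting text into SQL like this is unsafe for
    untrusted input.
    """
    sql = f"SELECT fips, scc, poll FROM inventory WHERE {row_filter} ORDER BY rowid"
    return conn.execute(sql).fetchall()


print(apply_row_filter("scc like '101%' or scc like '102%'"))
print(apply_row_filter("poll in ('PM10', 'PM2_5')"))
# substr is the SQLite spelling of the substring call shown in Table 3-11.
print(apply_row_filter("substr(fips,1,2) in ('37', '45', '51')"))
print(apply_row_filter("fips like '06%' and (poll = 'NOX' or poll = 'VOC')"))
```

Each call returns only the rows that satisfy the expression, which is the same effect the Row Filter field has on the rows shown in the Data Viewer.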

Figure 3-22 shows the data sorted by the column “ratio” in descending order and filtered to only show rows where the FIPS code is “13013”.

Figure 3-22: Data Viewer with Custom Sort and Row Filter

The Row Filter syntax used in the Data Viewer can also be used when exporting datasets to create filtered export files (Section 3.8.1). If you would like to create a new dataset based on a filtered existing dataset, you can export your filtered dataset and then import the resulting file as a new dataset. Section 3.8 describes exporting datasets and Section 3.9 explains how to import datasets.

3.7 Editing Raw Data ↩

The EMF does not allow data to be edited after a version has been marked as final. If a dataset doesn’t have a non-final version, first you will need to create a new version. Open the Dataset Versions Editor as shown in Figure 3-20. Click the New Version button to bring up the Create a New Version dialog window like Figure 3-23.

Figure 3-23: Create New Dataset Version

Enter a name for the new version and select the base version. The base version is the starting point for the new version and can only be a version that is marked as final. Click OK to create the new version. The Dataset Versions Editor will show your newly created version (Figure 3-24).

Figure 3-24: Dataset Versions Editor with Non-Final Version

You can now select the non-final version and click the Edit Data button to display the Data Editor as shown in Figure 3-25.

Figure 3-25: Data Editor

The Data Editor uses the same paging mechanisms, sort, and filter options as the Data Viewer described in Section 3.6. You can double-click a data cell to edit the value. The toolbar shown in Figure 3-26 provides options for adding and deleting rows.

Figure 3-26: Data Editor Toolbar

The functions of each toolbar button are described below, listed left to right:

  1. Insert Above: Inserts a new row above the currently selected row.
  2. Insert Below: Inserts a new row below the currently selected row.
  3. Delete: Deletes the selected rows. When you click this button, you will be prompted to confirm the deletion.
  4. Copy Selected Rows: Copies the selected rows.
  5. Insert Copied Rows Below: Pastes the copied rows below the currently selected row.
  6. Select All: Selects all rows.
  7. Clear All: Clears all selections.
  8. Find and Replace Column Values: Opens the Find and Replace Column Values dialog shown in Figure 3-27.
Figure 3-27: Find and Replace Column Values Dialog

In the Data Editor window, you can undo your changes by clicking the Discard button. Otherwise, click the Save button to save your changes. If you have made changes, you will need to enter Revision Information before the EMF will allow you to close the window. Revisions for a dataset are shown in the Dataset Properties Revisions tab (see Section 3.5.5).

3.8 Exporting Datasets ↩

When you export a dataset, the EMF will generate a file containing the data in the format defined by the dataset’s type. To export a dataset, you can either select the dataset in the Dataset Manager window and click the Export button or you can click the Export button in the Dataset Properties window. Either way will open the Export dialog as shown in Figure 3-28. If you have multiple datasets selected in the Dataset Manager when you click the Export button, the Export dialog will list each dataset in the Datasets field.

Figure 3-28: Export Dialog

Typically, you will check the Download files to local machine? checkbox. With this option, the EMF will export the dataset to a file on the EMF server and then automatically download it to your local machine. When downloading files to your local machine, the folder input field is not active. The downloaded files will be placed in a temporary directory on your local computer. The EMF property local.temp.dir controls the location of the temporary directory. EMF properties can be edited in the EMFPrefs.txt file. Note that the Overwrite files if they exist? checkbox isn’t functional at this point.

You can enter a prefix to be added to the names of the exported files in the File Name Prefix field. Exported files will be named based on the dataset name and may have prefixes or suffixes attached based on keywords associated with the dataset or dataset type.
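
The following Python sketch shows one way the pieces of an exported file name could fit together. It is a guess for illustration, not the EMF's documented rule: the order in which the File Name Prefix and the EXPORT_PREFIX keyword are applied is an assumption, and the function name export_file_name and the sample dataset name are hypothetical. Only the keyword names and example values come from Table 3-1.

```python
def export_file_name(dataset_name: str, keywords: dict[str, str], file_name_prefix: str = "") -> str:
    """Compose an export file name from a dataset name and its keywords.

    Assumed order (not confirmed by this guide): the File Name Prefix entered
    in the Export dialog, then EXPORT_PREFIX, then the dataset name, then
    EXPORT_SUFFIX.
    """
    prefix = keywords.get("EXPORT_PREFIX", "")
    suffix = keywords.get("EXPORT_SUFFIX", "")
    return f"{file_name_prefix}{prefix}{dataset_name}{suffix}"


# Keyword values taken from the examples in Table 3-1; the dataset name is made up.
keywords = {"EXPORT_PREFIX": "ptinv_", "EXPORT_SUFFIX": ".csv"}
print(export_file_name("nc_2020", keywords))
print(export_file_name("nc_2020", keywords, file_name_prefix="test_"))
```

Check the status message after an export to see the file name the EMF actually produced.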

If you are exporting a single dataset and that dataset has multiple versions, the Version pull-down menu will allow you to select which version you would like to export. If you are exporting multiple datasets, the default version of each dataset will be exported.

The Row Filter, Filter Dataset, and Filter Dataset Join Condition fields allow for filtering the dataset during export to reduce the total number of rows exported. See Section 3.8.1 for more information about these settings.

Before clicking the Export button, enter a Purpose for your export. This will be logged as part of the history for the dataset. If you do not enter any text in the Purpose field, the fact that you exported the dataset will still be logged as part of the dataset’s history. At this time, history records are only created when the Download files to local machine? checkbox is not checked.

After clicking the Export button, check the Status window to see if any problems arise during the export. If the export succeeds, you will see a status message like

Completed export of nonroad_caps_2005v2_jul_orl_nc.txt to <server directory>/nonroad_caps_2005v2_jul_orl_nc.txt in 2.137 seconds. The file will start downloading momentarily, see the Download Manager for the download status.

You can bring up the Downloads window as shown in Figure 3-29 by opening the Window menu at the top of the EMF main window and selecting Downloads.

Figure 3-29: Downloads Window

As your file is downloading, the progress bar on the right side of the window will update to show you the progress of the download. Once it reaches 100%, your download is complete. Right-click the filename in the Downloads window and select Open Containing Folder to open the folder where the file was downloaded.

3.8.1 Export Filtering Options ↩

The export filtering options allow you to select and export portions of a dataset based on your matching criteria.

The Row Filter field shown in the Export Dialog in Figure 3-28 uses the same syntax as the Data Viewer window (Section 3.6) and allows you to export only a subset of the data. Example filters are shown in Table 3-11.

Filter Dataset and Filter Dataset Join Condition, also shown in Figure 3-28, allow for advanced filtering of the dataset using an additional dataset. For example, if you are exporting a nonroad inventory, you can choose to only export rows that match a different inventory by FIPS code or SCC. When you click the Add button, the Select Datasets dialog appears as in Figure 3-30.

Figure 3-30: Select Filter Datasets

Select the dataset type for the dataset you want to use as a filter from the pull-down menu. You can use the Dataset name contains field to further narrow down the list of matching datasets. Click on the dataset name to select it and then click OK to return to the Export dialog.

The selected dataset is now shown in the Filter Dataset box. If the filter dataset has multiple versions, click the Set Version button to select which version to use for filtering. You can remove the filter dataset by clicking the Remove button.

Next, you will enter the criteria to use for filtering in the Filter Dataset Join Condition textbox. The syntax is similar to a SQL JOIN condition where the left hand side corresponds to the dataset being exported and the right hand side corresponds to the filter dataset. You will need to know the column names you want to use for each dataset.

Table 3-12: Examples of Filter Dataset Join Conditions

Type of Filter: Export records where the FIPS, SCC, and plant IDs are the same in both datasets; both datasets have the same column names
Filter Dataset Join Condition:
fips=fips
scc=scc
plantid=plantid

Type of Filter: Export records where the SCC, state codes, and pollutants are the same in both datasets; the column names differ between the datasets
Filter Dataset Join Condition:
scc=scc_code
substring(fips,1,2)=state_cd
poll=poll_code

Once your filter conditions are set up, click the Export button to begin the export. Only records that match all of the filter conditions will be exported. Status messages in the Status window will contain additional information about your filter. If no records match your filter condition, the export will fail and you will see a status message like:

Export failure. ERROR: nonroad_caps_2005v2_jul_orl_nc.txt will not be exported because no records satisfied the filter

If the export succeeds, the status message will include a count of the number of records in the database and the number of records exported:

No. of records in database: 150845; Exported: 26011
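The effect of the second set of join conditions in Table 3-12 can be sketched in Python. This is only an illustration on small in-memory tables, not EMF code. It assumes that a record is exported when it satisfies every condition against at least one row of the filter dataset; the sample FIPS, SCC, and pollutant values are made up.

```python
# Illustrative sketch only (not EMF code). Mimics the join conditions
#   scc=scc_code
#   substring(fips,1,2)=state_cd
#   poll=poll_code
# where the left side is the dataset being exported and the right side
# is the filter dataset.

def matches_filter(export_row, filter_rows):
    """Return True if export_row meets all conditions for at least one filter row.

    Assumption: matching any single filter row is enough for export.
    """
    return any(
        export_row["scc"] == f["scc_code"]
        and export_row["fips"][:2] == f["state_cd"]  # substring(fips,1,2)
        and export_row["poll"] == f["poll_code"]
        for f in filter_rows
    )

# Made-up sample records for the dataset being exported.
inventory = [
    {"fips": "37001", "scc": "2260001010", "poll": "VOC"},
    {"fips": "37001", "scc": "2260001010", "poll": "NOX"},
    {"fips": "45003", "scc": "2260001010", "poll": "VOC"},
]

# Made-up sample records for the filter dataset (different column names).
filter_dataset = [
    {"state_cd": "37", "scc_code": "2260001010", "poll_code": "VOC"},
]

exported = [row for row in inventory if matches_filter(row, filter_dataset)]
print(f"No. of records in database: {len(inventory)}; Exported: {len(exported)}")
```

Only the first sample record satisfies all three conditions, so it is the only one kept.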

3.9 Importing Datasets

Importing a dataset is the process in which the EMF reads a data file or set of data files from disk, stores the data in the database (for internal dataset types), and creates metadata about the dataset. To import a dataset, start by clicking the Import button in the bottom right corner of the Dataset Manager window (Figure 3-4). The Import Datasets dialog will be displayed as shown in Figure 3-31. You can also bring up the Import Datasets dialog by opening the File menu in the main EMF window and selecting Import.

Figure 3-31: Import Datasets Dialog

An advantage to opening the Import Datasets dialog from the Dataset Manager as opposed to using the File menu is that if you have a dataset type selected in the Dataset Manager Show Datasets of Type pull-down menu, then that dataset type will automatically be selected for you in the Import Datasets dialog.

In the Import Datasets dialog, first use the Dataset Type pull-down menu to select the dataset type corresponding to the file you want to import. For example, if your data file is an annual point-source emissions inventory in Flat File 2010 (FF10) format, you would select the dataset type “Flat File 2010 Point”. Section 3.2.1 lists commonly used dataset types. Keep in mind that your EMF installation may have different dataset types available.

Most dataset types specify that datasets of that type will use data from a single file. For example, for the Flat File 2010 Point dataset type, you will need to select exactly one file to import per dataset. Other dataset types can require or optionally allow multiple files to import into a single dataset. Some dataset types can use a large number of files, such as the Day-Specific Point Inventory (External Multifile) dataset type, which allows up to 366 files for a single dataset. Thus, the Import Datasets dialog will allow you to select multiple files during the import process and has tools for easily matching multiple files.

Next, select the folder where the data files to import are located on the EMF server. You can either type or paste (using Ctrl-V) the folder name into the field labeled Folder, or you can click the Browse button to open the remote file browser as shown in Figure 3-32. Important! To import data files, the files must be accessible by the machine that the EMF server is running on. If the data files are on your local machine, you will need to transfer them to the EMF server before you can import them.

Figure 3-32: Remote File Browser

To use the remote file browser, you can navigate from your starting folder to the file by either typing or pasting a directory name into the Folder field or by using the Subfolders list on the left side of the window. In the Subfolders list, double-click on a folder’s name to go into that folder. If you need to go up a level, double-click the .. entry.

Once you reach the folder that contains your data files, select the files to import by clicking the checkbox next to each file’s name in the Files section of the browser. The Files section uses the Sort-Filter-Select Table described in Section 2.6.6 to list the files. If you have a large number of files in the directory, you can use the sorting and filtering options of the Sort-Filter-Select Table to help find the files you need.

You can also use the Pattern field in the remote file browser to only show files matching the entered pattern. By default the pattern is just the wildcard character * to match all files. Entering a pattern like arinv*2002*txt will match filenames that start with “arinv”, have “2002” somewhere in the filename, and then end with “txt”.
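To get a feel for how such a wildcard pattern selects files, the following Python sketch applies the same pattern to a few hypothetical filenames using the standard library's fnmatch module. This is an approximation for illustration; the EMF's own pattern matching may differ in details such as case sensitivity.

```python
from fnmatch import fnmatchcase

# Pattern from the example above: starts with "arinv", contains "2002",
# and ends with "txt".
pattern = "arinv*2002*txt"

# Hypothetical filenames, made up for this illustration.
filenames = [
    "arinv_nonpt_2002_orl.txt",
    "arinv_nonpt_2005_orl.txt",
    "ptinv_2002_orl.txt",
    "arinv_2002_summary.csv",
]

# Keep only the names that match the wildcard pattern (case-sensitive).
matching = [name for name in filenames if fnmatchcase(name, pattern)]
print(matching)
```

Of the four sample names, only the first meets all three parts of the pattern.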

Once you’ve selected the files to import, click OK to save your selections and return to the Import Datasets dialog. The files you selected will be listed in the Filenames textbox in the Import Datasets dialog as shown in Figure 3-33. If you selected a single file, the Dataset Names field will contain the filename of the selected file as the default dataset name.

Figure 3-33: Import Dataset from Single File

Update the Dataset Names field with your desired name for the dataset. If the dataset type has EXPORT_PREFIX or EXPORT_SUFFIX keywords assigned, these values will be automatically stripped from the dataset name. For example, the ORL Nonpoint Inventory (ARINV) dataset type defines EXPORT_PREFIX as “arinv_” and EXPORT_SUFFIX as “_orl.txt”. Suppose you select an ORL nonpoint inventory file named “arinv_nonpt_pf4_cap_nopfc_2017ct_ref_orl.txt” to import. By default the Dataset Names field in the Import Datasets dialog will be populated with “arinv_nonpt_pf4_cap_nopfc_2017ct_ref_orl.txt” (the filename). On import, the EMF will automatically convert the dataset name to “nonpt_pf4_cap_nopfc_2017ct_ref” removing the EXPORT_PREFIX and EXPORT_SUFFIX.
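The renaming described above can be sketched in a few lines of Python. The function name and its parameters are hypothetical and only illustrate the behavior; this is not how the EMF itself is implemented.

```python
def strip_export_keywords(name, export_prefix="", export_suffix=""):
    """Remove the EXPORT_PREFIX and EXPORT_SUFFIX values from a dataset name.

    Hypothetical helper for illustration only. A prefix or suffix is removed
    only when it is defined and actually present in the name.
    """
    if export_prefix and name.startswith(export_prefix):
        name = name[len(export_prefix):]
    if export_suffix and name.endswith(export_suffix):
        name = name[:-len(export_suffix)]
    return name


# Keyword values defined for the ORL Nonpoint Inventory (ARINV) dataset type.
dataset_name = strip_export_keywords(
    "arinv_nonpt_pf4_cap_nopfc_2017ct_ref_orl.txt",
    export_prefix="arinv_",
    export_suffix="_orl.txt",
)
print(dataset_name)
```

If neither keyword is defined for the dataset type, the name is returned unchanged.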

Click the Import button to start the dataset import. If there are any problems with your import settings, you’ll see a red error message displayed at the top of the Import Datasets window. Table 3-13 shows some example error messages and suggested solutions.

Table 3-13: Dataset Import Error Messages

Example Error Message: A Dataset Type should be selected
Solution: Select a dataset type from the Dataset Type pull-down menu.

Example Error Message: A Filename should be specified
Solution: Select a file to import.

Example Error Message: A Dataset Name should be specified
Solution: Enter a dataset name in the Dataset Names textbox.

Example Error Message: The ORL Nonpoint Inventory (ARINV) importer can use at most 1 files
Solution: You selected too many files to import for the dataset type. Select the correct number of files for the dataset type. If you want to import multiple files of the same dataset type, see Section 3.9.1.

Example Error Message: The NIF3.0 Nonpoint Inventory importer requires at least 2 files
Solution: You didn’t select enough files to import for the dataset type. Select the correct number of files for the dataset type.

Example Error Message: Dataset name nonpt_pf4_cap_nopfc_2017ct_ref has been used.
Solution: Each dataset in the EMF needs a unique dataset name. Update the dataset name to be unique. Remember that the EMF will automatically remove the EXPORT_PREFIX and EXPORT_SUFFIX if defined for the dataset type.

If your import settings are good, you will see the message “Started import. Please monitor the Status window to track your import request.” displayed at the top of the Import Datasets window as shown in Figure 3-34.

Figure 3-34: Import Datasets: Started Import

In the Status window, you will see a status message like:

Started import of nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0 [ORL Nonpoint Inventory (ARINV)] from arinv_nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0.txt

Depending on the size of your file, the import can take a while to complete. Once the import is complete, you will see a status message like:

Completed import of nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0 [ORL Nonpoint Inventory (ARINV)] in 57.6 seconds from arinv_nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0.txt

To see your newly imported dataset, open the Dataset Manager window and find your dataset by dataset type or using the Advanced search. You may need to click the Refresh button in the upper right corner of the Dataset Manager window to get the latest dataset information from the EMF server.

3.9.1 Importing Multiple Datasets

You can use the Import Datasets window to import multiple datasets of the same type at once. In the remote file browser (shown in Figure 3-32), select all the files you would like to import and click OK. In the Import Datasets window, check the Create Multiple Datasets checkbox as shown in Figure 3-35. The Dataset Names textbox disappears.

Figure 3-35: Import Multiple Datasets

For each dataset, the EMF will automatically name the dataset using the corresponding filename. If the keywords EXPORT_PREFIX or EXPORT_SUFFIX are defined for the dataset type, the keyword values will be stripped from the filenames when generating the dataset names. If these keywords are not defined for the dataset type, then the dataset name will be identical to the filename.

Click the Import button to start importing the datasets. The Status window will display Started and Completed status messages for each dataset as it is imported.

3.10 Suggestions for Dataset Organization

Figure 3-36: Edit User Profile - Hide Dataset Types

Chapter 4. Dataset Quality Assurance

4.1 Introduction

The EMF allows you to perform various types of analyses on a dataset or set of datasets. For example, you can summarize the data by different aspects such as geographic region like county or state, SCC code, pollutant, or plant ID. You can also compare or sum multiple datasets. Within the EMF, running an analysis like this is called a QA step.

A dataset can have many QA steps associated with it. To view a dataset’s QA steps, first select the dataset in the Dataset Manager and click the Edit Properties button. Switch to the QA tab to see the list of QA steps as in Figure 4-1.

Figure 4-1: QA Steps for a Dataset

At the bottom of the window you will see a row of buttons for interacting with the QA steps starting with Add from Template, Add Custom, Edit, etc. If you do not see these buttons, make sure that you are editing the dataset’s properties and not just viewing them.

4.2 Add QA Step From Template

Each dataset type can have predefined QA steps called QA Step Templates. QA step templates can be added to a dataset type and configured by EMF Administrators using the Dataset Type Manager (see Section 3.2). QA step templates are easy to run for a dataset because they’ve already been configured.

To see a list of available QA step templates for your dataset, open your dataset’s QA tab in the Dataset Properties Editor (Figure 4-1). Click the Add from Template button to open the Add QA Steps dialog. Figure 4-2 shows the available QA step templates for an ORL Nonroad Inventory.

Figure 4-2: Add QA Steps From Template

The ORL Nonroad Inventory has various QA step templates for generating different summaries of the inventory.

Summaries “with Descriptions” include more information than those without. For example, the results of the “Summarize by SCC and Pollutant with Descriptions” QA step will include the descriptions of the SCCs and pollutants. Because these summaries with descriptions need to retrieve data from additional tables, they are a bit slower to generate compared to summaries without descriptions.

Select a summary of interest (for example, Summarize by County and Pollutant) by clicking the QA step name. If your dataset has more than one version, you can choose which version to summarize using the Version pull-down menu at the top of the window. Click OK to add the QA step to the dataset.

The newly added QA step is now shown in the list of QA steps for the dataset (Figure 4-3).

Figure 4-3: QA Steps with New Step Added

To see the details of the QA step, select the step and click the Edit button. This brings up the Edit QA Step window as shown in Figure 4-4.

Figure 4-4: Edit New QA Step from Template

The QA step name is shown at the top of the window. This name was automatically set by the QA step template. You can edit this name if needed to distinguish this step from other QA steps.

The Version pull-down menu shows which version of the data this QA step will run on.

The pull-down menu to the right of the Version setting indicates what type of program will be used for this QA step. In this case, the program type is “SQL” indicating that the results of this QA step will be generated using a SQL query. Most of the summary QA steps are generated using SQL queries. The EMF allows other types of programs to be run as QA steps including Python scripts and various built-in analyses like converting average-day emissions to an annual inventory.

The Arguments textbox shows the arguments used by the QA step program. In this case, the QA step is a SQL query and the Arguments field shows the query that will be run. The special SQL syntax used for QA steps is discussed in Section 4.10.

Other items of interest in the Edit QA Step window include the description and comment textboxes where you can enter a description of your QA step and any comments you have about running the step.

The QA Status field shows the overall status of the QA step. Right now the step is listed as “Not Started” because it hasn’t been run yet. Once the step has been run, the status will automatically change to “In Progress”. After you’ve reviewed the results, you can mark the step as “Complete” for future reference.

The Edit QA Step window also includes options for exporting the results of a QA step to a file. This is described in Section 4.5.

At this point, the next step is to actually run the QA step as described in Section 4.4.

4.3 Adding Custom QA Steps

In addition to using QA steps from templates, you can define your own custom QA steps. From the QA tab of the Dataset Properties Editor (Figure 4-1), click the Add Custom button to bring up the Add Custom QA Step dialog as shown in Figure 4-5.

Figure 4-5: Add Custom QA Step Dialog

In this dialog, you can configure your custom QA step by entering its name, the program to use, and the program’s arguments.

Creating a custom QA step from scratch is an advanced feature. Oftentimes, you can start by copying an existing step and tweaking it through the Edit QA Step interface.

Section 4.7 shows how to create a custom QA step that uses the built-in QA program “Average day to Annual Inventory” to calculate annual emissions from average-day emissions. Section 4.8 demonstrates using the Compare Datasets QA program to compare two inventories. Section 4.9 gives an example of creating a custom QA step based on a SQL query from an existing QA step.

4.4 Running QA Steps

To run a QA step, open the QA tab of the Dataset Properties Editor and select the QA step you want to run as shown in Figure 4-6.

Figure 4-6: Select a QA Step to Run

Click the Run button at the bottom of the window to run the QA step. You can also run a QA step from the Edit QA Step window. The Status window will display messages when the QA step begins running and when it completes:

Started running QA step ‘Summarize by County and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonroad_caps_2005v2_jul_orl_nc.txt’

Completed running QA step ‘Summarize by County and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonroad_caps_2005v2_jul_orl_nc.txt’

In the QA tab, click the Refresh button to update the table of QA steps as shown in Figure 4-7.

Figure 4-7: Refreshed QA Steps

The overall QA step status (the QA Status column) has changed from “Not Started” to “In Progress” and the Run Status is now “Success”. The list of QA steps also shows the time the QA step was run in the When column.

To view the results of the QA step, select the step in the QA tab and click the View Results button. A dialog like Figure 4-8 will pop up asking how many records of the results you would like to preview.

Figure 4-8: View QA Results: Select Number of Records

Enter the number of records to view or click the View All button to see all records. The View QA Step Results window will display the results of the QA step as shown in Figure 4-9.

Figure 4-9: View QA Results

4.5 Exporting QA Step Results

In addition to viewing the results of a QA step in the EMF client application, you can export the results as a comma-separated values (CSV) file. CSV files can be directly opened by Microsoft Excel or other spreadsheet programs to make charts or for further analysis.

To export the results of a QA step, select the QA step of interest in the QA tab of the Dataset Properties Editor. Then click the Edit button to bring up the Edit QA Step window as shown in Figure 4-10.

Figure 4-10: Export QA Step Results

Typically, you will want to check the Download result file to local machine? checkbox so the exported file will automatically be downloaded to your local machine. You can type in a name for the exported file in the Export Name field. Then click the Export button. If you did not enter an Export Name, the application will confirm that you want to use an auto-generated name with the dialog shown in Figure 4-11.

Figure 4-11: Export Name Not Specified

Next, you’ll see the Export QA Step Results customization window (Figure 4-12).

Figure 4-12: Export QA Step Results Customization Window

The Row Filter textbox allows you to limit which rows of the QA step results to include in the exported file. Table 3-11 provides some examples of the syntax used by the row filter. Available Columns lists the column names from the results that could be used in a row filter. In Figure 4-12, the columns fips, poll, and ann_emis are available. To export only the results for counties in North Carolina (state FIPS code = 37), the row filter would be fips like '37%'.
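As a rough illustration of what the row filter fips like '37%' selects, the following Python sketch applies the equivalent "starts with 37" test to a small made-up table with the columns fips, poll, and ann_emis. The sample values are invented and the code is not part of the EMF; it only shows which rows such a filter would keep.

```python
import csv
import io

# Made-up QA step results with the columns shown in Figure 4-12.
sample_results = """fips,poll,ann_emis
37001,VOC,12.5
37183,NOX,40.0
45003,VOC,7.25
51037,NOX,3.0
"""

rows = list(csv.DictReader(io.StringIO(sample_results)))

# fips like '37%' keeps values that START with "37". Note that 51037
# contains "37" but does not start with it, so it is excluded.
nc_rows = [row for row in rows if row["fips"].startswith("37")]

for row in nc_rows:
    print(row["fips"], row["poll"], row["ann_emis"])
```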

Click the Finish button to start the export. At the top of the Edit QA Step window, you’ll see the message “Started Export. Please monitor the Status window to track your export request.” as shown in Figure 4-13.

Figure 4-13: Export QA Step Results Started

Once your export is complete, you will see a message in the Status window like:

Completed exporting QA step ‘Summarize by SCC and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonpt_pf4_cap_nopfc_2017ct_nc_sc_va’ to <server directory>avg_day_scc_poll_summary.csv. The file will start downloading momentarily, see the Download Manager for the download status.

You can bring up the Downloads window as shown in Figure 4-14 by opening the Window menu at the top of the EMF main window and selecting Downloads.

Figure 4-14: Downloads Window: QA Step Results

As your file is downloading, the progress bar on the right side of the window will update to show you the progress of the download. Once it reaches 100%, your download is complete. Right-click on the filename in the Downloads window and select Open Containing Folder to open the folder where the file was downloaded.

If you have Microsoft Excel or another spreadsheet program installed, you can double-click the downloaded CSV file to open it.

4.6 Exporting KMZ Files

QA step results that include latitude and longitude information can be mapped with geographic information systems (GIS), mapping tools, and Google Earth. Many summaries that have “with Descriptions” in their names include latitude and longitude values. For plant-level summaries, the latitude and longitude in the output are the average of all the values for the specific combination of FIPS and plant ID. For county- and state-level summaries, the latitude and longitude are the centroid values specified in the “fips” table of the EMF reference schema.

To export a KMZ file that can be loaded into Google Earth, you will first need to view the results of the QA step. You can view a QA step’s results by either selecting the QA step in the QA tab of the Dataset Properties Editor (see Figure 4-1) and then clicking the View Results button, or you can click View Results from the Edit QA Step window. Figure 4-15 shows the View QA Step Results window for a summary by county and pollutant with descriptions. The summary includes latitude and longitude values for each county.

Figure 4-15: View QA Step Results with Latitude and Longitude Values

From the File menu in the top left corner of the View QA Step Results window, select Google Earth. Make sure to look at the File menu for the View QA Step Results window, not the main EMF application. The Create Google Earth file window will be displayed as shown in Figure 4-16.

Figure 4-16: Create Google Earth File

In the Create Google Earth file window, the Label Column pull-down menu allows you to select which column will be used to label the points in the KMZ file. This label will appear when you mouse over a point in Google Earth. For a plant summary, this would typically be “plant_name”; county or state summaries would use “county” or “state_name” respectively.

If your summary has data for multiple pollutants, you will often want to specify a filter so that data for only one pollutant is included in the KMZ file. To do this, specify a Filter Column (e.g. “poll”) and then type in a Filter Value (e.g. “EVP__VOC”).

The Data Column pull-down menu specifies the column to use for the value displayed when you mouse over a point in Google Earth such as annual emissions (“ann_emis”). The mouse over information will have the form: <value from Label Column> : <value from Data Column>.

The Maximum Data Cutoff and Minimum Data Cutoff fields allow you to exclude data points above or below certain thresholds.

If you want to control the size of the points, you can adjust the value of the Icon Scale setting between 0 and 1. The default setting is 0.3; values smaller than 0.3 result in smaller circles and values larger than 0.3 will result in larger circles.

Tooltips are available for all of the settings in the Create Google Earth file window by mousing over each field.

Once you have specified your settings, click the Generate button to create the KMZ file. The location of the generated file is shown in the Output File field. If your computer has Google Earth installed, you can click the Open button to open the file in Google Earth.

If you find that you need to repeatedly create similar KMZ files, you can save your settings to a file by clicking the Save button. The next time you need to generate a Google Earth file, click the Load button next to the Properties File field to load your saved settings.
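For readers curious about what the settings in the Create Google Earth file window amount to, the following Python sketch builds a small KMZ file (a zip archive containing a KML document) from made-up summary rows. It is a simplified illustration, not the EMF's implementation. The column names latitude and longitude, the sample values, and the choice to keep points that fall exactly on a cutoff are all assumptions, and the Icon Scale setting is not modeled.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def select_points(rows, label_col, data_col, filter_col=None, filter_value=None,
                  min_cutoff=None, max_cutoff=None):
    """Apply the filter and cutoff settings; return (label, lon, lat) tuples."""
    points = []
    for row in rows:
        if filter_col is not None and row[filter_col] != filter_value:
            continue  # Filter Column / Filter Value
        value = float(row[data_col])
        if min_cutoff is not None and value < min_cutoff:
            continue  # Minimum Data Cutoff
        if max_cutoff is not None and value > max_cutoff:
            continue  # Maximum Data Cutoff
        # Mouse-over text: <value from Label Column> : <value from Data Column>
        label = f"{row[label_col]} : {value}"
        points.append((label, float(row["longitude"]), float(row["latitude"])))
    return points


def build_kmz(points):
    """Return the bytes of a KMZ archive holding one placemark per point."""
    placemarks = "".join(
        f"<Placemark><name>{escape(label)}</name>"
        f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        for label, lon, lat in points
    )
    kml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
           + placemarks + "</Document></kml>")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml)
    return buffer.getvalue()


# Made-up county summary rows (assumed column names).
summary = [
    {"county": "Wake", "poll": "EVP__VOC", "ann_emis": "120.0",
     "latitude": "35.79", "longitude": "-78.65"},
    {"county": "Wake", "poll": "NOX", "ann_emis": "300.0",
     "latitude": "35.79", "longitude": "-78.65"},
    {"county": "Hyde", "poll": "EVP__VOC", "ann_emis": "0.5",
     "latitude": "35.41", "longitude": "-76.15"},
]

points = select_points(summary, label_col="county", data_col="ann_emis",
                       filter_col="poll", filter_value="EVP__VOC", min_cutoff=1.0)
kmz_bytes = build_kmz(points)
```

In this sketch, only the Wake County EVP__VOC row passes both the pollutant filter and the minimum cutoff, so the resulting KMZ contains a single point.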

4.7 Average Day to Annual Inventory QA Program

In addition to analyzing individual datasets, the EMF can run QA steps that use multiple datasets. In this section, we’ll show how to create a custom QA step that calculates an annual inventory from 12 month-specific average-day emissions inventories.

To get started, we’ll need to select a dataset to associate the QA step with. As a best practice, add the QA step to the January-specific dataset in the set of 12 month-specific files. This isn’t required by the EMF but it can make finding multi-file QA steps easier later on. If you have more than 12 month-specific files to use (e.g. 12 non-California inventories and 12 California inventories), add the QA step to the “main” January inventory file (e.g. the non-California dataset).

After determining which dataset to add the QA step to, create a new custom QA step as described in Section 4.3. Figure 4-17 shows the Add Custom QA Step dialog. We’ve entered a name for the step and used the Program pull-down menu to select “Average day to Annual Inventory”.

Figure 4-17: Add Custom QA Step Using Average Day to Annual Inventory QA Program

“Average day to Annual Inventory” is a QA program built into the EMF that takes a set of average-day emissions inventories as input and outputs an annual inventory by calculating monthly total emissions and summing all months. Click the OK button in the Add Custom QA Step dialog to save the new QA step. We’ll enter the QA program arguments in a minute. Back in the QA tab of the Dataset Properties Editor, select the newly created QA step and click Edit to open the Edit QA Step window shown in Figure 4-18.

Figure 4-18: Edit Custom QA Step

We need to define the arguments that will be sent to the QA program that this QA step will run. The QA program is “Average day to Annual Inventory” so the arguments will be a list of month-specific inventories. Click the Set button to the right of the Arguments box to open the Set Inventories dialog as shown in Figure 4-19.

Figure 4-19: Set Inventories for Average Day to Annual Inventory QA Program

The Set Inventories dialog is specific to the “Average day to Annual Inventory” QA program. Other QA programs have different dialogs for setting up their arguments. The January inventory that we added the QA step to is already listed. We need to add the other 11 month-specific inventory files. Click the Add button to open the Select Datasets dialog shown in Figure 4-20.

Figure 4-20: Select Datasets for QA Program

In the Select Datasets dialog, the dataset type is automatically set to ORL Nonroad Inventory (ARINV) matching our January inventory. The other ORL nonroad inventory datasets are shown in a list. We can use the Dataset name contains: field to enter a search term to narrow the list. We’re using 2005 inventories so we’ll enter 2005 as our search term to match only those datasets whose name contains “2005”. Then we’ll select all the inventories in the list as shown in Figure 4-21.

Select inventories by clicking on the dataset name. You can select a range of datasets by clicking on the first dataset you want to select in the list. Then hold down the Shift key while clicking on the last dataset you want to select. All of the datasets in between will also be selected. If you hold down the Ctrl key while clicking on datasets, you can select multiple items from the list that aren’t next to each other.

Figure 4-21: Select Filtered Datasets for QA Program

Click the OK button in the Select Datasets dialog to save the selected inventories and return to the Set Inventories dialog. As shown in Figure 4-22, the list of emission inventories now contains all 12 month-specific datasets.

Figure 4-22: Inventories for Average Day to Annual Inventory QA Program

Click the OK button in the Set Inventories dialog to return to the Edit QA Step window shown in Figure 4-23. The Arguments textbox now lists the 12 month-specific inventories and the flag (-inventories) needed for the “Average day to Annual Inventory” QA program.

Figure 4-23: Custom QA Step with Arguments Set

Click the Save button at the bottom of the Edit QA Step window to save the QA step. This QA step can now be run as described in Section 4.4.
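The idea behind the “Average day to Annual Inventory” calculation can be sketched in Python. This sketch assumes that each month's total is its average-day emissions multiplied by the number of days in that month; the EMF's exact calculation may differ, and the function name and sample values are hypothetical.

```python
import calendar


def annual_from_average_day(avg_day_by_month, year):
    """Sum monthly totals computed from average-day emissions.

    avg_day_by_month maps a month number (1-12) to the average-day emissions
    for one source and pollutant. Assumption: monthly total equals
    average-day emissions times the number of days in the month.
    """
    total = 0.0
    for month, avg_day in avg_day_by_month.items():
        days_in_month = calendar.monthrange(year, month)[1]
        total += avg_day * days_in_month
    return total


# Made-up example: a constant 1.0 ton per day in every month of 2005.
avg_day_2005 = {month: 1.0 for month in range(1, 13)}
annual_2005 = annual_from_average_day(avg_day_2005, 2005)
print(annual_2005)
```

With a constant average-day value, the annual total is simply that value times the number of days in the year, which makes the sketch easy to check by hand.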

4.8 Compare Datasets QA Program

The Compare Datasets QA program allows you to aggregate and compare datasets using a variety of grouping options. You can compare datasets with the same dataset type or different types. In this section, we’ll set up a QA step to compare the average day emissions from two ORL nonroad inventories by SCC and pollutant.

First, we’ll select a dataset to associate the QA step with. In this example, we’ll be comparing January and February emissions using the January dataset as the base inventory. The EMF doesn’t dictate which dataset should have the QA step associated with it so we’ll choose the base dataset as a convention. From the Dataset Manager, select the January inventory (shown in Figure 4-24) and click the Edit Properties button.

Figure 4-24: Select Dataset to Add QA Step

Open the QA tab (shown in Figure 4-25) and click Add Custom to add a new QA step.

Figure 4-25: Dataset Editor QA Tab for Selected Dataset

In the Add Custom QA Step dialog shown in Figure 4-26, enter a name for the new QA step like “Compare to February”. Use the Program pull-down menu to select the QA program “Compare Datasets”.

Figure 4-26: Select QA Program for New QA Step

You can enter a description of the QA step as shown in Figure 4-27. Then click OK to save the QA step. We’ll be setting up the arguments to the Compare Datasets QA program in just a minute.

Figure 4-27: Add Description to New QA Step

Back in the QA tab of the Dataset Properties Editor, select the newly created QA step and click the Edit button (see Figure 4-28).

Figure 4-28: Select New QA Step from QA Tab

In the Edit QA Step window (shown in Figure 4-29), click the Set button to the right of the Arguments textbox.

Figure 4-29: Edit New QA Step

A custom dialog is displayed (Figure 4-30) to help you set up the arguments needed by the Compare Datasets QA program.

Figure 4-30: Set Up Compare Datasets QA Step

To get started, we’ll set the base datasets. Click the Add button underneath the Base Datasets area to bring up the Select Datasets dialog shown in Figure 4-31.

Figure 4-31: Select Base Datasets

Select one or more datasets to use as the base datasets in the comparison. For this example, we’ll select the January inventory by clicking on the dataset name. Then click OK to close the dialog and return to the setup dialog. The setup dialog now shows the selected base dataset as in Figure 4-32.

Figure 4-32: Base Dataset Set for Compare Datasets

Next, we’ll add the dataset we want to compare against by clicking the Add button underneath the Compare Datasets area. The Select Datasets dialog is displayed as shown in Figure 4-33. We’ll select the February inventory and click the OK button.

Figure 4-33: Select Compare Datasets

Returning to the setup dialog, the comparison dataset is now set as shown in Figure 4-34.

Figure 4-34: Compare Dataset Set for Compare Datasets

The list of base and comparison datasets includes which version of the data will be used in the QA step. For example, the base dataset 2007JanORLTotMARAMAv3.txt [0 (Initial Version)] indicates that version 0 (named “Initial Version”) will be used. When you select the base and comparison datasets, the EMF automatically uses each dataset’s Default Version. If any of the datasets have a different version that you would like to use for the QA step, select the dataset name and then click the Set Version button underneath the selected dataset. The Set Version dialog shown in Figure 4-35 lets you pick which version of the dataset you would like to use.

Figure 4-35: Set Dataset Version for Compare Datasets QA Program

Next, we need to tell the Compare Datasets QA program how to compare the two datasets. We’re going to sum the average-day emissions in each dataset by SCC and pollutant and then compare the results from January to February. In the ORL Nonroad Inventory dataset type, the SCCs are stored in a field called scc, the pollutant codes are stored in a column named poll, and the average-day emissions are stored in a field called avd_emis. In the Group By Expressions textbox, type scc, press Enter, and then type poll. In the Aggregate Expressions textbox, type avd_emis. Figure 4-36 shows the setup dialog with the arguments entered.

Figure 4-36: Arguments Set for Compare Datasets

In this example, we’re comparing two datasets of the same type (ORL Nonroad Inventory). This means that the data field names will be consistent between the base and comparison datasets. When you compare datasets with different types, the field names might not match. The Matching Expressions textbox allows you to define how the fields from the base dataset should be matched to the comparison dataset. For this case, we don’t need to enter anything in the Matching Expressions textbox or any of the remaining fields in the setup dialog. The Compare Datasets arguments are described in more detail in Section 4.8.1.

In the setup dialog, click OK to save the arguments and return to the Edit QA Step window. The Arguments textbox now lists the arguments that we set up in the previous step (see Figure 4-37).

Figure 4-37: QA Step with Arguments Set

The QA step is now ready to run. Click the Run button to start running the QA step. A message is displayed at the top of the window as shown in Figure 4-38.

Figure 4-38: Started Running QA Step

In the Status window, you’ll see a message about starting to run the QA step followed by a completion message once the QA step has finished running. Figure 4-39 shows the two status messages.

Figure 4-39: QA Step Running in Status Window

Once the status message

Completed running QA step ‘Compare to February’ for Version ‘Initial Version’ of Dataset ‘2007JanORLTotMARAMAv3.txt’

is displayed, the QA step has finished running. In the Edit QA Step window, click the Refresh button to display the latest information about the QA step. The fields Run Status and Run Date will be populated with the latest run information as shown in Figure 4-40.

Figure 4-40: QA Step with Run Status

Now, we can view the QA step results or export the results. First, we’ll view the results inside the EMF client. Click the View Results button to open the View QA Step Results window as shown in Figure 4-41.

Figure 4-41: View Compare Datasets QA Step Results

Table 4-1 describes each column in the QA step results.

Table 4-1: QA Step Results Columns
Column Name          Description
poll                 Pollutant code
scc                  SCC code
avd_emis_b           Summed average-day emissions from base dataset (January) for this pollutant and SCC
avd_emis_c           Summed average-day emissions from comparison dataset (February) for this pollutant and SCC
avd_emis_diff        avd_emis_c - avd_emis_b
avd_emis_absdiff     Absolute value of avd_emis_diff
avd_emis_pctdiff     100 * (avd_emis_diff / avd_emis_b)
avd_emis_abspctdiff  Absolute value of avd_emis_pctdiff
count_b              Number of records from base dataset included in this row’s results
count_c              Number of records from comparison dataset included in this row’s results
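
The arithmetic behind the derived columns in Table 4-1 can be sketched in a few lines of Python. This is an illustration only, not the EMF's implementation: the function name compare_row and the sample values are made up, and returning None when the base sum is zero is an assumption, since this guide does not say how the EMF reports that case.

```python
from typing import Optional


def compare_row(base_sum: float, compare_sum: float) -> dict:
    """Derive the difference columns of Table 4-1 for one SCC/pollutant group."""
    diff = compare_sum - base_sum
    # Percent difference is relative to the base value. Using None for a zero
    # base is an assumption made for this sketch.
    pct: Optional[float] = 100 * (diff / base_sum) if base_sum != 0 else None
    return {
        "avd_emis_b": base_sum,
        "avd_emis_c": compare_sum,
        "avd_emis_diff": diff,
        "avd_emis_absdiff": abs(diff),
        "avd_emis_pctdiff": pct,
        "avd_emis_abspctdiff": abs(pct) if pct is not None else None,
    }


# Hypothetical sums: 2.0 in the base (January) and 2.5 in the comparison (February).
row = compare_row(base_sum=2.0, compare_sum=2.5)
zero_base_row = compare_row(base_sum=0.0, compare_sum=1.0)
```

A negative avd_emis_diff means the comparison dataset has lower emissions than the base dataset for that group.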

To export the QA step results, return to the Edit QA Step window as shown in Figure 4-42. Select the checkbox labeled Download result file to local machine?. In this example, we have entered an optional Export Name for the output file. If you don’t enter an Export Name, the output file will use an auto-generated name. Click the Export button.

Figure 4-42: Ready to Export QA Step Results

The Export QA Step Results dialog will be displayed as shown in Figure 4-43. For more information about the Row Filter option, see Section 4.5. To export all the result records, click the Finish button.

Figure 4-43: Export QA Step Results Options

Back in the Edit QA Step window, a message is displayed at the top of the window indicating that the export has started. See Figure 4-44.

Figure 4-44: Export Started for QA Step Results

Check the Status window to see the status of the export as shown in Figure 4-45.

Figure 4-45: Export Messages in Status Window

Once the export is complete, the file will start downloading to your computer. Open the Downloads window to check the download status. Once the progress bar reaches 100%, the download is complete. Right-click the results file and select Open Containing Folder as shown in Figure 4-46.

Figure 4-46: QA Step Results in Downloads Window

Figure 4-47 shows the downloaded file in Windows Explorer. By default, files are downloaded to a temporary directory on your computer. Some disk cleanup programs can automatically delete files in temporary directories; you should move any downloads you want to keep to a more permanent location on your computer.

Figure 4-47: Downloaded QA Step Results in Windows Explorer

The downloaded file is a CSV (comma-separated values) file which can be opened in Microsoft Excel or other spreadsheet programs. Double-click the filename to open the file. Figure 4-48 shows the QA step results in Microsoft Excel.

Figure 4-48: Downloaded QA Step Results in Microsoft Excel

4.8.1 Details of Compare Datasets Arguments

4.8.1.1 Group By Expressions

The Group By Expressions are a list of columns/expressions that are used to group the dataset records for aggregation. The expressions must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis. A group by expression can be aliased by adding the AS <alias> clause to the expression; this alias is used as the column name in the QA step results. A group by expression can also contain SQL functions such as substring or string concatenation using ||.

Sample Group By Expressions

scc AS scc_code
substring(fips, 1, 2) as fipsst

or

fipsst||fipscounty as fips
substring(scc, 1, 5) as scc_lv5

4.8.1.2 Aggregate Expressions

The Aggregate Expressions are a list of columns/expressions that will be aggregated (summed) using the specified group by expressions. The expressions must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis.

Sample Aggregate Expressions

ann_emis
avd_emis
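
As a rough illustration of how group by and aggregate expressions work together, the following Python sketch groups a few records by the equivalents of substring(fips, 1, 2) and substring(scc, 1, 5) and sums ann_emis within each group. The record values are invented for the example, and the EMF performs this step in SQL rather than in Python.

```python
from collections import defaultdict

# Made-up inventory records; only the columns used in the example are shown.
records = [
    {"fips": "37001", "scc": "2260001010", "ann_emis": 1.5},
    {"fips": "37003", "scc": "2260001010", "ann_emis": 2.0},
    {"fips": "45001", "scc": "2260002006", "ann_emis": 4.0},
]

totals = defaultdict(float)
for rec in records:
    # Group key: first two characters of fips and first five characters of scc,
    # mirroring substring(fips, 1, 2) and substring(scc, 1, 5).
    key = (rec["fips"][:2], rec["scc"][:5])
    # Aggregate expression: sum ann_emis within each group.
    totals[key] += rec["ann_emis"]
```

Here the two records from state 37 fall into one group, so the results would contain two rows rather than three.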

4.8.1.3 Matching Expressions

The Matching Expressions are a list of expressions used to match base dataset columns/expressions to comparison dataset columns/expressions. A matching expression consists of three parts: the base dataset expression, the equals sign, and the comparison dataset expression (i.e. base_expression=comparison_expression).

Sample Matching Expressions

substring(fips, 1, 2)=substring(region_cd, 1, 2)
scc=scc_code
ann_emis=emis_ann
avd_emis=emis_avd
fips=fipsst||fipscounty
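
To make the three-part structure concrete, the sketch below splits each matching expression into its base and comparison sides. Splitting on the first equals sign is an assumption made for this example; the guide does not describe how the EMF parses these expressions, and the function name parse_matching_expressions is hypothetical.

```python
def parse_matching_expressions(lines):
    """Map each base expression to its comparison expression."""
    pairs = {}
    for line in lines:
        # Split at the first "=" only; everything after it is the comparison side.
        base_expr, sep, compare_expr = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in matching expression: {line!r}")
        pairs[base_expr.strip()] = compare_expr.strip()
    return pairs


pairs = parse_matching_expressions([
    "substring(fips, 1, 2)=substring(region_cd, 1, 2)",
    "scc=scc_code",
    "ann_emis=emis_ann",
])
```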

4.8.1.4 Join Type

The Join Type specifies which type of SQL join should be used when performing the comparison.

Join Type         Description
INNER JOIN        Include only rows that exist in both the base and compare datasets, matched on the group by expressions
LEFT OUTER JOIN   Include all rows from the base dataset, plus only those rows from the compare dataset that match on the group by expressions
RIGHT OUTER JOIN  Include all rows from the compare dataset, plus only those rows from the base dataset that match on the group by expressions
FULL OUTER JOIN   Include all rows from both the base and compare datasets

The default join type is FULL OUTER JOIN.
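
The effect of each join type on which groups appear in the results can be sketched with Python sets. This only models which group keys survive the join; the function name joined_keys and the SCC/pollutant keys are made up for the example.

```python
def joined_keys(base_keys, compare_keys, join_type="FULL OUTER JOIN"):
    """Return the group keys that appear in the results for a join type."""
    base, compare = set(base_keys), set(compare_keys)
    if join_type == "INNER JOIN":
        return base & compare   # groups present in both datasets
    if join_type == "LEFT OUTER JOIN":
        return base             # every base group, matched or not
    if join_type == "RIGHT OUTER JOIN":
        return compare          # every comparison group, matched or not
    if join_type == "FULL OUTER JOIN":
        return base | compare   # every group from either dataset
    raise ValueError(f"unknown join type: {join_type}")


# Hypothetical (scc, poll) groups in each dataset.
base_groups = {("2260001010", "CO"), ("2260001010", "NOX")}
compare_groups = {("2260001010", "NOX"), ("2265001010", "CO")}

inner = joined_keys(base_groups, compare_groups, "INNER JOIN")
full = joined_keys(base_groups, compare_groups)
```

With the default FULL OUTER JOIN, a group that exists in only one dataset still appears in the results, which makes added or dropped sources easy to spot.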

4.8.1.5 Where Filter

The Where Filter is a SQL WHERE clause that is used to filter both the base and comparison datasets. The expressions in the WHERE clause must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis.

Sample Where Filter

substring(fips, 1, 2) = '37' and SCC_code in ('10100202', '10100203')

or

fips like '37%' and SCC_code like '101002%'

4.8.1.6 Base Field Suffix

The Base Field Suffix is appended to the base aggregate expression name that is returned in the output. For example, an Aggregate Expression ann_emis with a Base Field Suffix 2005 will be returned as ann_emis_2005 in the QA step results.

4.8.1.7 Compare Field Suffix

The Compare Field Suffix is appended to the comparison aggregate expression name that is returned in the output. For example, an Aggregate Expression ann_emis with a Compare Field Suffix 2008 will be returned as ann_emis_2008 in the QA step results.
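
The naming pattern for suffixed columns can be sketched as simple string formatting. The function name suffixed_columns is hypothetical, and applying the same suffixes to the count columns follows the count_b and count_c names in Table 4-1 rather than a documented rule.

```python
def suffixed_columns(aggregate_expression, base_suffix, compare_suffix):
    """Build result column names from an aggregate expression and two suffixes."""
    return [
        f"{aggregate_expression}_{base_suffix}",
        f"{aggregate_expression}_{compare_suffix}",
        f"count_{base_suffix}",
        f"count_{compare_suffix}",
    ]


columns = suffixed_columns("ann_emis", "2005", "2008")
```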

4.8.2 More Examples

Figure 4-49 shows the setup dialog for the following example of the Compare Datasets QA program. We are setting up a plant level comparison of a set of two inventories (EGU and non-EGU) versus another set of two inventories (EGU and non-EGU). All four inventories are the same dataset type. The annual emissions will be grouped by FIPS code, plant ID, and pollutant. There is no mapping required because the dataset types are identical; the columns fips, plantid, poll, and ann_emis exist in both sets of datasets. This comparison is limited to the state of North Carolina via the Where Filter:

substring(fips, 1, 2)='37'

The QA step results will have columns named ann_emis_base, ann_emis_compare, count_base, and count_compare using the Base Field Suffix and Compare Field Suffix.

Figure 4-49: Compare Datasets Example 1

Figure 4-50 shows the setup dialog for a second example of the Compare Datasets QA program. This example takes a set of ORL nonpoint datasets and compares it to a single FF10 nonpoint inventory. We are grouping by state (first two digits of the FIPS code) and pollutant. A matching expression is needed between the ORL column fips and the FF10 column region_cd:

substring(fips, 1, 2)=substring(region_cd, 1, 2)

Another matching expression is needed between the columns ann_emis and ann_value:

ann_emis=ann_value

No matching expression is needed for pollutant because both dataset types use the same column name poll. This comparison is limited to three states and to sources that have annual emissions greater than 1000 tons. These constraints are specified via the Where Filter:

substring(fips, 1, 2) in ('37','45','51') and ann_emis > 1000

In the QA step results, the base dataset column will be named ann_emis_2002 and the compare dataset column will be named ann_emis_2008.

Figure 4-50: Compare Datasets Example 2

4.9 Creating a Custom SQL QA Step

Suppose you have an ORL nonroad inventory that contains average-day emissions instead of annual emissions. The QA step templates that can generate inventory summaries report summed annual emissions. If you want to get a report of the average-day emissions, you can create a custom SQL QA step.

First, let’s look at the structure of a SQL QA step created from a QA step template. Figure 4-51 shows a QA step that generates a summary of the annual emissions by county and pollutant.

Figure 4-51: QA Step Reference

This QA step uses a custom SQL query shown in the Arguments textbox:

select FIPS, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e group by FIPS, POLL order by FIPS, POLL

For the ORL nonroad inventory dataset type, the annual emission values are stored in a database column named ann_emis while the average-day emissions are in a column named avd_emis. For any dataset you can see the names of the underlying data columns by viewing the raw data as described in Section 3.6.

To create an average-day emissions report, we’ll need to switch ann_emis in the above SQL query to avd_emis. In addition, the annual emissions report sums the emissions for each county and pollutant. For average-day emissions, it might make more sense to compute the average emissions by county and pollutant. In the SQL query we can change sum(ann_emis) to avg(avd_emis) to call the SQL function that computes averages.

Our final revised SQL query is

select FIPS, POLL, avg(avd_emis) as avd_emis from $TABLE[1] e group by FIPS, POLL order by FIPS, POLL
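
To see how sum and avg differ for this kind of report, the grouping can be tried against a small in-memory table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the EMF's PostgreSQL database. The table name inventory takes the place of $TABLE[1], and the records are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table inventory (FIPS text, POLL text, avd_emis real)")
conn.executemany(
    "insert into inventory values (?, ?, ?)",
    [
        ("37001", "CO", 2.0),   # two records for the same county and pollutant
        ("37001", "CO", 4.0),
        ("37003", "NOX", 1.0),
    ],
)

# Same grouping as the revised query, with sum and avg shown side by side.
rows = conn.execute(
    "select FIPS, POLL, sum(avd_emis) as total_emis, avg(avd_emis) as avd_emis "
    "from inventory e group by FIPS, POLL order by FIPS, POLL"
).fetchall()
conn.close()
```

For county 37001 the summed value is 6.0 while the average is 3.0, so the choice of function changes the reported number whenever a group contains more than one record.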

Once we know what SQL query to run, we’ll create a custom QA step. Section 4.3 describes how to add a custom QA step to a dataset. Figure 4-52 shows the new custom QA step with a name assigned and the Program pull-down menu set to SQL so that the custom QA step will run a SQL query. Our custom SQL query is pasted into the Arguments textbox.

Figure 4-52: Custom SQL QA Step Setup

Click the OK button to save the QA step. The newly added QA step is now shown in the list of QA steps for the dataset (Figure 4-53).

Figure 4-53: Custom SQL QA Step Ready

At this point, you can run the QA step as described in Section 4.4 and view and export the QA step results (Section 4.5) just like any other QA step.

What if our custom SQL had a typo? Suppose we accidentally entered the average-day emissions column name as avg_emis instead of avd_emis. When the QA step is run, it will fail to complete successfully. The Status window will display a message like

Failed to run QA step Avg. Day by County and Pollutant for Version ‘Initial Version’ of Dataset <dataset name>. Check the query -ERROR: column “avg_emis” does not exist

Other types of SQL errors will be displayed in the Status window as well. If the SQL query uses an invalid function name like average(avd_emis) instead of avg(avd_emis), the Status window message is

Failed to run QA step Avg. Day by County and Pollutant for Version ‘Initial Version’ of Dataset <dataset name>. Check the query -ERROR: function average(double precision) does not exist

4.10 Special SQL Syntax for QA Steps

Each of the QA steps that create summaries uses a customized SQL syntax that is very similar to standard SQL, except that it includes some EMF-specific concepts that allow the queries to be defined generally and then applied to specific datasets as needed. For example, the EMF syntax for the “Summarize by SCC and Pollutant” query is:

select SCC, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e group by SCC, POLL order by SCC, POLL

The only difference between this and standard SQL is the use of the $TABLE[1] syntax. When this query is run, the $TABLE[1] portion of the query is replaced with the table name that contains the dataset’s data in the EMF database. Most datasets have their own tables in the EMF schema, so you do not normally need to worry about selecting only the records for the specific dataset of interest. The customized syntax also has extensions to refer to another dataset and to refer to specific versions of other datasets using tokens other than $TABLE. For the purposes of this discussion, it is sufficient to note that these other extensions exist.
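
The substitution described above can be pictured as a plain text replacement performed before the query is sent to the database. The sketch below is a simplified model, not the EMF's code: the function name expand_table_token and the table name emissions.ds_example_inventory are hypothetical, and only the $TABLE[1] token is handled.

```python
def expand_table_token(query: str, table_name: str) -> str:
    """Replace the $TABLE[1] token with a concrete table name."""
    return query.replace("$TABLE[1]", table_name)


template = (
    "select SCC, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e "
    "group by SCC, POLL order by SCC, POLL"
)

# Hypothetical table name; the real name is chosen by the EMF for each dataset.
sql = expand_table_token(template, "emissions.ds_example_inventory")
```

Because the template itself never names a table, the same query text can be attached to a dataset type once and reused for every dataset of that type.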

Some of the summaries are constructed using more complex queries that join information from other tables, such as the SCC and pollutant descriptions, and account for any missing descriptions. For example, the syntax for the “Summarize by SCC and Pollutant with Descriptions” query is:

select e.SCC, 
       coalesce(s.scc_description,'AN UNSPECIFIED DESCRIPTION')::character varying(248) as scc_description, 
       e.POLL, 
       coalesce(p.descrptn,'AN UNSPECIFIED DESCRIPTION')::character varying(11) as pollutant_code_desc, 
       coalesce(p.name,'AN UNSPECIFIED SMOKE NAME')::character varying(11) as smoke_name,
       p.factor, 
       p.voctog, 
       p.species, 
       coalesce(sum(ann_emis), 0) as ann_emis, 
       coalesce(sum(avd_emis), 0) as avd_emis 
from $TABLE[1] e 
left outer join reference.invtable p on e.POLL=p.cas 
left outer join reference.scc s on e.SCC=s.scc 
group by e.SCC,e.POLL,p.descrptn,s.scc_description,p.name,p.factor,p.voctog,p.species 
order by e.SCC, p.name

This query is quite a bit more complex, but is still supported by the EMF QA step processing system.

Chapter 5. Troubleshooting

5.1 Client won’t start

Problem:

On startup, an error message like the one shown in Figure 5-1 is displayed:

The EMF client was not able to contact the server due to this error:

(504)Server doesn’t respond at all.

or

(504)Server denies connection.

Figure 5-1: Error Starting the EMF Client

Solution:

The EMF client application was not able to connect to the EMF server. This could be due to a problem on your computer, the EMF server, or somewhere in between.

If you are connecting to a remote EMF server, first check your computer’s network connection by loading a page like google.com in your web browser. You must have a working network connection to use the EMF client.

Next, check the server location in the EMF client startup script C:\EMF_State\EMFClient.bat. Look for the line

set TOMCAT_SERVER=http://<server location>:8080

You can directly connect to the EMF server by loading

http://<server location>:8080/emf/services

in your web browser. You should see a response similar to Figure 5-2.

Figure 5-2: EMF Server Response

If you can’t connect to the EMF server or don’t get a response, then the EMF server may not be running. Contact the EMF server administrator for further help.
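
If you prefer to script the connection check, the steps above can be approximated in Python using only the standard library. This is a minimal sketch: the function names are made up, emf.example.org is a placeholder server location, and treating an HTTP 200 response as "server is up" is an assumption rather than documented EMF behavior.

```python
import re
import urllib.request


def services_url(script_text: str) -> str:
    """Build the /emf/services URL from the TOMCAT_SERVER line of a launch script."""
    match = re.search(
        r"^\s*set\s+TOMCAT_SERVER=(\S+)", script_text, re.IGNORECASE | re.MULTILINE
    )
    if match is None:
        raise ValueError("no TOMCAT_SERVER setting found")
    return match.group(1).rstrip("/") + "/emf/services"


def server_responds(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Example launch script contents with a placeholder server location.
example_script = "@echo off\nset TOMCAT_SERVER=http://emf.example.org:8080\n"
url = services_url(example_script)
```

Calling server_responds(url) with the URL built from your own EMFClient.bat returns False when the server cannot be reached, which corresponds to the situation described above.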

5.2 Can’t load Dataset Manager

Problem:

When I click the Datasets item from the main Manage menu, nothing happens and I can’t click on anything else.

Solution:

Clicking Datasets from the main Manage menu displays the Dataset Manager. In order to display this window, the EMF client needs to request a complete list of dataset types from the EMF server. If you are connecting to an EMF server over the Internet, fetching lists of data can take a while and the EMF client needs to wait for the data to be received. Try waiting to see if the Dataset Manager window appears.

5.3 Can’t load all datasets

Problem:

In the Dataset Manager, I selected Show Datasets of Type “All” and nothing happens and I can’t click on anything else.

Solution:

When displaying datasets of the selected type, the EMF client needs to fetch the details of the datasets from the EMF server. If you are connecting to an EMF server over the Internet or if there are many datasets imported into the EMF, loading this data can take a long time. Try waiting to see if the list of datasets is displayed. Rather than displaying all datasets, you may want to pick a single dataset type or use the Advanced search to limit the list of datasets that need to be loaded from the EMF server.

Chapter 6. Server Administration

6.1 Components

The EMF server consists of a database, file storage, and the server application which handles requests from the clients and communicates with the database.

The database server is PostgreSQL version 9.2 or later. For shapefile export, you will need the PostGIS module installed.

The server application is a Java executable that runs in the Apache Tomcat servlet container. You will need Apache Tomcat 6.0 or later.

The server components can run on Windows, Linux, or Mac OS X.

6.2 Network Access

The EMF client application communicates with the server on port 8080. For the client application, the EMFClient.bat launch script specifies the server location and port via the setting

set TOMCAT_SERVER=http://<server address>:8080

In order to import data into the EMF, the files must be locally accessible by the server. Depending on your setup, you may want to mount a network drive on the server or allow SFTP connections for users to upload files.

6.3 EMF Administrator

Inside the EMF client, users with administrative privileges have access to additional management options.

6.3.1 User Management

EMF administrators can reset users’ passwords. Administrators can also create new users.

6.3.2 Dataset Type Management

Administrators can create and edit dataset types. Administrators can also add QA step templates to dataset types.