4.8 Data management and transformation

Calculating environmental indicators for the watershed required temporal and geospatial data from numerous sources. While some indicators used a single data source, others required a combination of sources to provide a complete record. This section describes the general data management strategies used in this project. For details of the data sources and management strategies used for a specific indicator, see the corresponding section of this report.

Teams from SRWP, UC Davis, FRCRM, ESSA Technologies, and other WHIP stakeholders with knowledge of and expertise in the Feather River Watershed all participated in identifying and acquiring data for calculating the environmental indicators. For management purposes within the team, the data types were divided into point-source monitoring data and GIS-based data.

Point-source monitoring data captured temporal variation at numerous collection sites across the basin. The indicators based on point data included fish and birds, benthic macroinvertebrates, temperature, stream flow, periphyton, nutrients, mercury, and school lunch programs. Each metric was assigned to a specific point in the watershed. The condition data were often averaged across the subwatershed reporting units to calculate an overall subwatershed score. The collection sites were mapped in a GIS to identify the subwatershed to which each belonged and to provide a map visualization for the corresponding indicator reports.
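The averaging step described above can be illustrated with a minimal sketch. It assumes each collection site already has a condition score and a subwatershed assignment from the GIS mapping. The function name, site identifiers, subwatershed labels, and scores are hypothetical and are not taken from the project data.

```python
from collections import defaultdict
from statistics import mean


def subwatershed_scores(site_scores, site_to_subwatershed):
    """Average site-level condition scores within each subwatershed.

    site_scores: {site_id: score}
    site_to_subwatershed: {site_id: subwatershed name}, e.g. from a GIS overlay
    """
    grouped = defaultdict(list)
    for site, score in site_scores.items():
        subwatershed = site_to_subwatershed.get(site)
        if subwatershed is None:
            continue  # site was not matched to a reporting unit
        grouped[subwatershed].append(score)
    return {name: mean(scores) for name, scores in grouped.items()}


# Hypothetical example values, for illustration only.
site_scores = {"site_a": 80.0, "site_b": 60.0, "site_c": 90.0}
site_to_subwatershed = {"site_a": "SW-1", "site_b": "SW-1", "site_c": "SW-2"}
scores = subwatershed_scores(site_scores, site_to_subwatershed)
```

A simple unweighted mean is used here for clarity; an individual indicator may have weighted or otherwise combined its sites differently.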

Spatial data analysis was performed across the basin using various GIS-based data sources. The indicators that used GIS analysis included fire frequency, carbon budget, river flooding, stream road barriers, and fragmentation index. These data were analyzed using the same boundary base layer that identified each of the subbasins.

Temporal and spatial data were acquired from the following organizations, which had assembled data for the Feather River Watershed.

National Organizations:

National Aeronautics and Space Administration (NASA)

USDA Forest Service

United States Geological Survey (USGS): National Water Information System (NWIS)

State and County Agencies:

California Department of Education (CDE)

California Department of Fish & Game (DFG)

California Department of Water Resources (DWR)

Nevada Irrigation District (NID)

State Water Resources Control Board Surface Water Ambient Monitoring Program (SWAMP)

Sutter County Resource Conservation District (SCRCD)

NGO and Academic Centers:

Avian Knowledge Network (AKN)

Bay-Delta and Tributaries Project (BDAT)

Friends of Deer Creek

Information Center for the Environment (UC Davis)

South Yuba River Citizens League (SYRCL)

University of California, Davis

Wolf Creek Community Alliance

Yuba Accord River Management Team

Each environmental indicator included one or more data files along with other relevant supporting information. These files were shared among the indicator team, with careful attention to version control as the data were analyzed and derivative products were created.

These data were stored in various formats, including text-based delimited formats (.csv, .tsv), spreadsheet packages (such as Microsoft Excel and OpenOffice Spreadsheet), personal databases (Microsoft Access), GIS raster formats (GeoTIFF), GIS vector formats (such as Shapefiles or Google Earth KML files), and personal geodatabases (Microsoft Access). Temporal metadata were collected in various formats, but were most often available as part of a document or report that could be downloaded with the data. When available, source GIS-based metadata were stored in a standard FGDC XML format and used by the various GIS packages.

An initial search was performed to identify available data for each indicator and to collect general data attributes, such as the data provider, temporal range, spatial extent, and data representation, including units of measure and data quality attributes. These general attributes were assembled in a shared spreadsheet, which identified all relevant sources of data for the various indicators in the study. The data and metadata were then downloaded, organized, and assembled for each indicator. It was often necessary to manipulate the data manually to convert them into a common format. Additional resources that documented the data were also collected, such as Standard Operating Procedures (SOPs), lab/organization identification protocols, Quality Assurance Policy and Procedures (QAPPs), and other documents and reports that described the proper use of these data.
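As an illustration of the shared inventory described above, the following sketch records the general attributes for one data source and writes them out in delimited form. The class name, field names, and the example entry are assumptions made for illustration; they do not reproduce the project's actual spreadsheet columns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataSourceEntry:
    """One row of a hypothetical data-source inventory."""
    indicator: str
    provider: str
    start_year: int
    end_year: int
    spatial_extent: str
    units: str
    quality_notes: str


def inventory_to_csv(entries):
    """Serialize inventory entries to CSV text with a header row."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(DataSourceEntry)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Placeholder values only; not an actual record from the study.
example = DataSourceEntry(
    indicator="stream flow",
    provider="example provider",
    start_year=2000,
    end_year=2008,
    spatial_extent="example subwatershed",
    units="cfs",
    quality_notes="placeholder",
)
inventory_csv = inventory_to_csv([example])
```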

Data transformations were often required because an indicator would use data from multiple sources, and these data were frequently stored in different units of measure and at different temporal frequencies. The common data elements were extracted and stored to produce a new dataset that combined all sources. Specific descriptions of the data manipulations can be found in each indicator report (Section 3).
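The kind of transformation described above can be sketched as follows, assuming two hypothetical temperature sources: one reporting sub-daily values in degrees Fahrenheit and one reporting daily values in degrees Celsius. Both are converted to a common unit (degrees Celsius) and a common frequency (daily means). The function names, timestamps, and values are illustrative assumptions, not the procedure used for any particular indicator.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def fahrenheit_to_celsius(value_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (value_f - 32.0) * 5.0 / 9.0


def to_daily_celsius(records, unit):
    """Reduce (ISO timestamp, value) pairs to {date: daily mean in deg C}.

    unit is 'C' or 'F' and describes the values in `records`.
    """
    if unit not in ("C", "F"):
        raise ValueError(f"unsupported unit: {unit!r}")
    by_day = defaultdict(list)
    for timestamp, value in records:
        day = datetime.fromisoformat(timestamp).date()
        by_day[day].append(fahrenheit_to_celsius(value) if unit == "F" else value)
    return {day: mean(values) for day, values in by_day.items()}


# Hypothetical records from two sources with different units and frequencies.
source_f = [("2008-07-01T00:00", 50.0), ("2008-07-01T12:00", 68.0)]
source_c = [("2008-07-02", 14.5)]

combined = {}
combined.update(to_daily_celsius(source_f, "F"))
combined.update(to_daily_celsius(source_c, "C"))
```

In this sketch the two sources cover different days, so a plain dictionary merge suffices. Where sources overlap in time, a rule for reconciling them (for example, preferring one provider) would need to be chosen per indicator.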

The quality of the data was an important consideration when deciding whether they should be included in the study. Various forms of quality assurance (QA) were performed on these data, especially as additional collection sites or new data sources were added to an indicator. For many data sources, the providers had already performed rigorous QA, and these data could be used as downloaded. In rare cases, the data were found to be corrupted or to contain extreme outliers (spatially, temporally, or in terms of a valid data value), and in these circumstances the data were omitted from the study.

Another QA procedure eliminated data that were within the study area but located in canals and diversions, since these data points do not represent the natural hydrology of the Feather River. Certain data collected at highly irregular intervals or with an incompatible protocol were also removed from the study.
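A minimal sketch of these exclusion rules is shown below, assuming each record carries a measured value and a location type. The field names, location-type labels, and valid range are hypothetical; the actual QA criteria varied by indicator and data source.

```python
# Location types treated as unrepresentative of natural hydrology (assumed labels).
EXCLUDED_LOCATION_TYPES = {"canal", "diversion"}


def passes_qa(record, valid_min, valid_max):
    """Return True if a record should be kept under the sketched QA rules."""
    if record["location_type"] in EXCLUDED_LOCATION_TYPES:
        return False  # not representative of the natural river system
    value = record["value"]
    if value is None:
        return False  # missing or corrupted value
    return valid_min <= value <= valid_max  # reject extreme outliers


# Hypothetical water temperature records in degrees Celsius.
records = [
    {"site": "site_a", "location_type": "stream", "value": 12.3},
    {"site": "site_b", "location_type": "canal", "value": 13.1},
    {"site": "site_c", "location_type": "stream", "value": 999.0},
    {"site": "site_d", "location_type": "stream", "value": None},
]
kept = [r for r in records if passes_qa(r, valid_min=0.0, valid_max=40.0)]
```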