Information Management: Data Submission

Note: Returning users can click here to download the data submission form.

Introduction
Coweeta investigators use a wide variety of commercial and custom software applications to process and store the data they collect for their CWT-sponsored research programs. Information required to document the data, such as descriptions of study characteristics, methodology, and data attributes, can also be stored and managed in a variety of ways. The primary goal of the data submission process is to standardize data and documentation to the point that CWT data sets can be managed and distributed in a consistent manner, as well as meshed with other data sets to support data syntheses and cross-site comparisons. The Coweeta LTER emphasizes standardization of documentation and provides some structure for standardizing data formats, while still giving researchers the flexibility to use as wide a variety of data processing and analysis programs as possible.

This section of the IM Guide provides the rules and guidelines necessary to ensure that data are available to the widest possible user base, while also ensuring the suitability of most data sets for eventual integration into the LTER Network Office's Network Information System (NIS). The archive of submitted data has three components: data documentation, individual data sets, and, for a limited number of data sets, specialty databases built to manage data that cover broader temporal and spatial scales.

  • Data documentation (i.e. metadata) is primarily stored and organized in a relational database management system to support data set querying and cataloging, and to generate documentation in different formats to suit various uses.
  • Data sets from individual research studies are primarily stored as descriptively-named files. While files may be stored in a variety of formats, both structured binary and delimited ASCII text formats are always provided, with database-generated metadata stored along with the data table. These files are organized using a data file management system approach, with files named as follows: 

    [study area code]_[theme code]_[year and month submitted]_[order received by CWT IM].[extension] 

    Example:

    File Name: CWT-TER-1101_001.txt 

    Description: research took place in the Coweeta Basin, Terrestrial Gradient monitoring study, submitted Jan. 2011, text format. 

    Serial letters, or letters and numbers, are appended to the base file name to accommodate multiple submissions in the same month (e.g. CWT-TER-1101a.ASC). The file names and locations are stored in the metadata database to support catalog generation and to maintain linkages between metadata and data files. A complete list of region and thematic codes is provided at the bottom of the Site Standards and Definitions section of the IM Guide.

  • Large data sets from long-term monitoring efforts are primarily stored in a relational database management system, and secondarily as individual documented data sets generated from the database containing subsets of data based on spatial or temporal coverage (e.g. monthly or yearly data sets or summary data sets) and database-generated metadata.
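The file-naming convention described above can be checked mechanically before files are sent. The following is a minimal Python sketch, not part of any CWT tooling. It assumes that names follow the form of the example (CWT-TER-1101_001.txt): hyphens between the codes, a two-digit year and month, an optional underscore-delimited order number, and an optional serial suffix such as the "a" in CWT-TER-1101a.ASC. The function name parse_data_file_name, the field names, and the assumption that two-digit years fall in the 2000s are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the example "CWT-TER-1101_001.txt". The separators and
# field widths are assumptions drawn from that example, not an official spec.
_NAME_RE = re.compile(
    r"^(?P<area>[A-Z]+)-(?P<theme>[A-Z]+)-(?P<yy>\d{2})(?P<mm>\d{2})"
    r"(?:_(?P<order>\d+))?"                  # optional order received, e.g. _001
    r"(?P<serial>[A-Za-z][A-Za-z0-9]*)?"     # optional serial suffix, e.g. a
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


class DataFileName(NamedTuple):
    area: str
    theme: str
    year: int
    month: int
    order: Optional[int]
    serial: Optional[str]
    extension: str


def parse_data_file_name(name: str) -> DataFileName:
    """Split a data file name into its parts, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in file name: {name!r}")
    return DataFileName(
        area=match["area"],
        theme=match["theme"],
        year=2000 + int(match["yy"]),  # assumption: years are 20xx
        month=month,
        order=int(match["order"]) if match["order"] else None,
        serial=match["serial"],
        extension=match["ext"],
    )
```

Calling parse_data_file_name("CWT-TER-1101_001.txt") yields the area code CWT, the theme code TER, a submission date of January 2011, and order number 1. A name that does not fit the assumed pattern raises ValueError, which is a cue to compare it against the region and thematic codes in the Site Standards and Definitions section.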

Portions of the overall CWT Information Management System continue to be developed, so specific standards for the content, formatting, documentation, and submission of CWT data are subject to change. Protocols and standards will be reevaluated and adjusted annually to make the entire process of data submission as convenient and efficient as possible for everyone.

Contributor feedback is absolutely essential to meeting this goal -- if any aspect of the data submission protocols seems inefficient, error-prone, or confusing, please discuss it with the Information Manager.

What to Submit
The primary purpose of the CWT database and data archive is to provide a long-term record of ecological observations to support data discovery and analysis over long temporal scales. Consequently, data sets should predominantly consist of raw data from direct measurements or counts. Derived and calculated parameters should only be included when they are essential to the interpretation of the data, such as when the raw data require proprietary calibration steps or the main properties of interest can only be measured indirectly (e.g. by change of an indicator solution or measurement of a reciprocal property like post-combustion mass). Publication in the conventional scientific literature is usually a more appropriate outlet for detailed calculations and statistical analyses of the raw data. 

When derived or calculated parameters are included in data sets, it is imperative that all information relevant to their calculation be included in the data documentation, including equations, descriptions of processing steps, and references. It is also strongly recommended that initial measurements used in the calculations be included as well, to allow future analysts to reevaluate the data using different criteria or derive secondary information from the data values.

When the basic principles stated above are applied to actual research studies, however, the distinction between 'measured' and 'derived' values is often unclear and open to debate. This is particularly true in the case of electronic instrument-based measurements, which are increasingly prevalent in modern science. The ultimate decision about what constitutes 'data' or 'calculations' and what information to submit rests with the investigator, but contributors are encouraged to consult with the Information Manager prior to submitting data for the first time to discuss appropriate strategies for classifying and documenting each parameter. 

When to Submit
The Coweeta LTER Data Policy states that researchers shall submit data no later than six months after the end of each field season. While the policy also states that data may not always be made public until some point well beyond the initial submission date, the policy does require the data to be documented and archived after each season. Data should always be submitted at the investigator's earliest convenience, to minimize the possibility of information loss as memories fade, data sheets become misplaced, and workers move on to other projects. Time frames for data release (see Data Access) will be honored regardless of when the data are actually submitted. 

How to Submit
Investigators are strongly encouraged to schedule a meeting with the Information Manager prior to preparing their data and documentation for initial submission. Many potential content and formatting issues can easily be avoided, saving everyone time and trouble. Specific guidelines for preparing data for submission are presented in the Data Format Guidelines and Data Documentation sections of the IM Guide.

The first step is to complete the data documentation using the data submission form. The CWT Data Submission Form is a self-contained Excel spreadsheet for submitting and documenting each data table in a single file. Please submit one form for each data table file. It is permissible to associate multiple tables with a single data set designation. The CWT Data Submission Form comes with form completion instructions. If you need further help, or if you are providing a relational database or geodatabase, please contact the Information Manager.

Once formatted and documented, data files and documentation can be sent to the Information Manager via email (cwtim@uga.edu), with data and metadata packaged together in one or more zip files.
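As one way to package data and metadata together, the following Python sketch builds a single zip file from a list of files using the standard zipfile module. It is an illustration only: the function name package_submission and any file names passed to it are hypothetical, and any other zip utility works equally well.

```python
import zipfile
from pathlib import Path


def package_submission(files, archive_path):
    """Bundle data files and their submission forms into one zip archive."""
    paths = [Path(f) for f in files]

    # Check everything first so a missing file does not leave a partial archive.
    for path in paths:
        if not path.is_file():
            raise FileNotFoundError(path)

    # Files are stored by base name only, so base names must be unique.
    names = [path.name for path in paths]
    if len(set(names)) != len(names):
        raise ValueError("duplicate file names in submission")

    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=path.name)
    return archive_path
```

For example, package_submission(["CWT-TER-1101_001.txt", "submission_form.xlsx"], "submission.zip") would place a data table and its completed form in one archive; both input names here are placeholders.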

After your submission is received, the Information Manager will contact you to resolve any incomplete or missing details, discuss any requests for post-processing analyses, and establish a time frame for returning processed files for review. An on-line submission system is slated for development in the third quarter of 2011.