IM Guide » Data Submission » Data Format Guidelines » Format Guidelines for Tabular Data
Data Table Formatting
Data tables can be submitted as ASCII text files or spreadsheet files in Microsoft Excel or Lotus 1-2-3 format. The files should be given a descriptive name, such as the investigator's name, study description, and date submitted, with the appropriate extension for the file format (e.g. Sheldon_weather_020101a.txt, Sheldon_hydro_020101.xls). Please do not use spaces in file names.
General Guidelines
-
The data table must be laid out as a rectangular "matrix" with an equal number of data values in each row.
-
Any missing observations should be represented by a missing value code, such as NaN for numerical columns and NaN or a user-defined code for text columns.
-
Each column in the data set should only contain one distinct type of information and only one value, so that columns represent variables and rows represent individual observations for each variable.
-
Each column should be separated by a single tab or a single comma. The same delimiter should be used throughout the data table, and combinations of spaces and tabs or commas are not allowed (e.g. use 32.3,32.5 rather than 32.3, 32.5).
-
When column format headers are included (see bel
-
In column headers, individual text values should not contain any internal spaces or commas to prevent data parsing errors, although other characters, such as underscores, are allowed (e.g. a column header may not be named "bankfull width", but it may be named "bankfull_width"). In data columns, text fields that must require spaces in the data (e.g. descriptive observations) should include enclosing quotes around each cell value for that column.
-
The data file should not contain any header or footer information, except for optional column format headers as describ
Spreadsheets
-
When submitting spreadsheets make sure that columns are formatted consistently and numerical precisions are set appropriately, because data will be processed as formatted (i.e. non-displayed digits will be lost).
-
It is a good practice to recalculate the spreadsheet then convert all formula cells to values to prevent update anomalies caused by complex formula logic or linked cells (e.g. copy cells, then paste values and formats only).
-
If date/time information is included in the table, make sure to inform the Information Mana
- As individual columns containing integer date components (e.g. 02-Feb-2001 06:00PM as columns Year, Month, Day, Time with values 2001, 2, 2, 1800)
- As fractional Julian days based on Jan. 1, 0000 as 1 (e.g. 730884.750)
-
As integer year (e.g. 2001) and Julian date/Year date (i.e. days since Jan. 1 of the same year, e.g. 32.75)
- Documentation and column format information can be included in the same workbook as the data table, but generally avoid including extraneous worksheets, plots, graphical elements, or comments interspersed with data when submitting spreadsheets.
Data Column Format Information: GCE Matlab Users Only
This section and the Data Types section below are presented primarily for CWT investigators who use the GCE MatLab Toolbox for processing and Post-Processing of their data. People submitting other data are referred to the instructions in the CWT Data Submission Form for information on providing flagging information for Quality Assurance and Quality Control (QA/QC) of their data.
Specific format information for each column in the data set must be provided by the contributor at the time of submission. This information is needed to complete the Data Structural Descriptors portion of the metadata, and to support online data analysis and dynamic document generation. Note that some format selections require corresponding information in other sections of the data documentation, as indicated. The format information can be provided separately in list format, or elements can be included as header rows at the top of each data column with values delimited like the data values. When including header rows, precede the first value in each row with the attribute name and a colon (e.g. Names:Salinity,Temperature,Depth,...), and do not use more than one delimiter between values (e.g. do not include spaces when delimiting by commas). Note that individual column descriptions must be delimited by tabs, commas, or semicolons in order to preserve spaces between words. All attributes are required, except for Descriptions (Names will be substituted if omitted).
Attribute | Description and Formatting | Examples |
---|---|---|
Names |
See Site standards (no internal spaces, symbols are discouraged, mixed-case text and underscores are supported) |
Salinity,Stem_Count,Marsh_Zone |
Descriptions | Description of each column in the table | Pore-water salinity,Number of Spartina stems in the plot,Marsh zone code |
Units |
Measurement units for data in each column (no internal spaces, symbols are allowed) |
PSU,count,none |
DataTypes |
Physical data type of each column F = floating-point number I = integer number S = alphanumeric string |
F,I,S |
VariableTypes |
Logical data type of each column Data (measured data values) Calculation (calculated values)1 Nominal (categorical values)2 DateTime (date and/or time values)3 Logical (Boolean values, e.g. 1/0, True/False) Description (textual observation) Code (strings matching items in a list)1 |
Data,Data,Code |
NumberTypes |
Numerical data type of each column Continuous (continuous numbers) Discrete (discontinuous numbers) Angular (angular numbers, in degrees) None or N/A (not applicable, e.g. strings) |
Continuous,Discrete,None |
Precisions |
Decimal places to display for each column (i.e. indicating decimal places to use when exporting values as text files, HTML tables) |
2,0,0 |
FlagCriteria | Flag criteria expression for each column4 | x<0='?';x>36='H',x<0='?',none |
-
Equations, code lists, and flag character definitions are required in the data documentation. This information will be stored in the following section of the Metadata:
Metadata Category IV: Data Structural Descriptors
Equations in: Data_Calculations
Code lists and flag character definitions in: Data_ValueCodes
Flag criteria in: Data_FlagCriteria -
Categorical values can be numbers or strings that identify some fixed property of the observation, such as site or plot, but aren't data values in themselves and won't be used for calculations or statistics (other than as grouping columns).
-
Date and time values can be expressed as numerical components (e.g. year, month, day, hour columns) or formatted strings (e.g. 09-Feb-2001) as appropriate for the data set - by default, the datetime variable type designation will prevent these columns from being included in statistical analyses regardless of data type.
-
Flag criteria are text expressions containing one or more equations per column defining flag characters to associate with numerical values that match the specified criteria. Flagging questionable data values is a powerful quality control technique, so criteria should be defined whenever possible. (Note: value flags are maintained separately from data in the MATLAB data structure, so flagged values are not removed from the data set by default. Criteria can be edited and reevaluated any time, and flags can be suppressed or displayed in various formats when the data set is saved as text or HTML).
Flag criteria syntax is as follows (or contact the
Information Ma nager for assistance):
x[conditional][value]='[flag character]' , where:
[conditional] is <, >, <=, >=, ~= (or <>), == (or =)
[value] is a numerical data value
[flag character] is any single text character, symbol, or digitMultiple equations can be specified per column by using a semicolon to separate each one (overlapping criteria are supported, allowing flag characters to be combined when more than one criteria is matched)
For example: x<0='L';x>30='H' generates 'L' flags for negative values, 'H' flags for values over 30
(Note: contact the data manager for assistance assigning criteria for flagging coded values)
Appendix: Data types
Physical data type specifies how the values are represented and stored in the data set, whereas logical data type specifies the information content of the column (i.e. variable type or domain) and numerical data type specifies the numerical characteristics of the column. These three column attributes are used by GCE-LTER Data Structure tools to display and analyze values appropriately (e.g. statistics requiring ratio data, such as mean and standard error, will only be calculated on Floating-point numbers with logical types of Data or Calculation, whereas columns with any combination of data types can be enumerated or used for querying, sorting, or aggregating rows).
Supported combinations of these column attributes are listed in the following table, along with descriptions and common usage examples. Note that some logical data types can be represented by several physical data types, depending on the form of the data and requirements of the investigator.
Data Type Category | Description and Examples | ||
---|---|---|---|
Physical | Logical | Numerical | |
Floating-point number | Data | Continuous |
measurements of continuous variables (e.g. salinity, temperature, elapsed time) |
Angular |
measurements of directional parameters (e.g. wind direction) |
||
Calculation* | Continuous |
calculations on continuous data (e.g. fractions, percentages, rates) |
|
Angular |
calculations on directional parameters (e.g. wind direction) |
||
DateTime | Continuous |
fractional numerical date values (e.g. fractional Julian day) |
|
Integer number | Data | Discrete | count or abundance data |
Calculation* | Discrete |
calculations on count data (e.g. min, max, sum, median) |
|
DateTime | Discrete |
integer date components (e.g. year, month, day, hour) |
|
Nominal** | Discrete |
digits representing fixed categories (e.g. sites, plots, zones) |
|
Logical** | Discrete |
binary or numerical Boolean values (e.g. 1 = present, 0 = absent) |
|
Text string | DateTime |
NA or blank |
formatted date or time strings (e.g. 01-Feb-2001) |
Logical** |
NA or blank |
Textual Boolean values (e.g. True/False, Present/Absent) |
|
Description |
NA or blank |
textual observation, taxonomic names (e.g. color, sex, appearance) |
|
Code** |
NA or blank |
alphanumeric code strings (e.g. species code, description code) |
* equations must be included in the data documentation (Category IV: Data_Calculations)
** value definitions or code lists must be included in the data documentation (Category IV: Data_ValueCodes)