
Transferring data from the GCE Data Toolbox for MATLAB to PASTA

Article ID: 136
Last updated: 05 Feb, 2016

Editing data and metadata within the GCE Data Toolbox for MATLAB:

  1. Once the data is successfully loaded into the GCE Data Toolbox for MATLAB, click "Edit" - "View/ Edit Data" to check the data.
  2. When saving your changes to the .mat file, use a different file name so that the original file remains available in case you need to reference it later.
  3. Make sure that every data column without units has "none" in the "Column Units" field.
  4. To view and edit the metadata in the GCE Data Toolbox for MATLAB, click "Metadata" - "View/ Edit Metadata." The dataset metadata window will open.
  5. In the dataset metadata window, enter the dataset's accession number in "Dataset" - "Accession." The accession number should be the same as the dataset ID.
  6. In the dataset metadata window, edit the list of investigator(s) in "Dataset" - "Investigator."
  7. If the investigator is affiliated with Coweeta, fill out the "Dataset" - "Investigator" field in the first format shown below. If the investigator is not affiliated with Coweeta, use the second format, which omits the User ID.

Format for "Dataset" - "Investigator" metadata field for Coweeta-affiliated researchers:

|Name: Wayne Swank
|userid: wswank

Format for "Dataset" - "Investigator" metadata field for non-Coweeta-affiliated researchers:

|Name: Wayne Swank
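The two formats differ only in whether the userid line is present. As an illustration, the hypothetical Python helper below (not part of the GCE Data Toolbox) builds the field text for a single investigator. It assumes that each line starts with "|" exactly as in the examples above, and it does not cover how several investigators are separated within the field.

```python
def format_investigator(name, userid=None):
    """Build "Dataset" - "Investigator" text for one investigator.

    Hypothetical helper: pass the Coweeta User ID for Coweeta-affiliated
    researchers, or leave userid as None to omit the userid line.
    """
    lines = [f"|Name: {name}"]
    if userid:  # only Coweeta-affiliated researchers get a userid line
        lines.append(f"|userid: {userid}")
    return "\n".join(lines)


print(format_investigator("Wayne Swank", "wswank"))  # Coweeta-affiliated
print(format_investigator("Wayne Swank"))            # not Coweeta-affiliated
```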

Removing line breaks in content:

  1. Line breaks within data fields (whether in the Excel spreadsheet, the Toolbox, or the Metabase) can cause errors during the PASTA import process.
  2. If there are line breaks in any of the fields, remove them using Notepad++.
  3. In Notepad++, open "Search" - "Replace", set the Search Mode to "Extended" (so that '\r\n' is read as a line break rather than as literal text), find '\r\n', and replace it with a space.
  4. Then find two spaces and replace them with one space, repeating until no double spaces remain.
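The same clean-up can be expressed as a short Python sketch, which may be convenient for checking text before pasting it into the Toolbox or the Metabase. It is only an illustration of the Notepad++ steps, not part of the GCE Data Toolbox, and it also replaces lone '\n' or '\r' characters in case the text did not come from Windows.

```python
import re


def remove_line_breaks(text):
    """Mimic the Notepad++ clean-up: line breaks become spaces, then
    runs of spaces are collapsed to a single space."""
    # Windows line breaks are \r\n; bare \n or \r are replaced as well.
    text = re.sub(r"\r\n|\r|\n", " ", text)
    # Same effect as repeatedly replacing two spaces with one.
    return re.sub(r" {2,}", " ", text)


abstract = "Stream chemistry was\r\nsampled weekly.\r\n\r\nSamples were  filtered."
print(remove_line_breaks(abstract))
# Stream chemistry was sampled weekly. Samples were filtered.
```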

Formatting dates properly:

  1. Many datasets contain one or more date columns, and these dates should be formatted in a standard way. A good approach is to use separate "Year", "Month", and "Day" columns. The units for "Year" should be "YYYY", with four-digit years (for example, "1989", not "89"). The units for "Month" should be "MM", with numeric months (for example, November should be "11", not "Nov"). The units for "Day" should be "DD", with numeric days (for example, "12" for the 12th day of the month).
  2. If the dates in a dataset are not properly formatted, date functions in the GCE Data Toolbox for MATLAB can convert them to an acceptable format.
  3. If the dates are formatted like "May-99", first click "Edit" - "Split Column Values" - "Dash separator (-)" in the Toolbox.
  4. After the split, the year is still in YY format (e.g. "99") and needs to be converted to YYYY format (e.g. "1999"). In the Toolbox, click "Convert" next to "Column Units." With original units of "YY", set the converted units to "YYYY" and enter "x + 1900" as the equation. Use this method only if the dataset contains no years from 2000 onward, since "x + 1900" would turn "05" into "1905" rather than "2005."
  5. If the original dates are in "MM-DD-YYYY" format (such as "11-27-1997" for November 27, 1997), click "Edit" - "Date Functions" - "Date Components from Date Column" - "Automatic" in the Toolbox. Keep the newly added date component columns (the last columns) and delete the original "MM-DD-YYYY" column.
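The two Toolbox conversions above can be mirrored in a short Python sketch, for example to spot-check a few converted values by hand. It assumes English three-letter month abbreviations such as "May" (Python's "%b" format is locale-dependent) and, like the "x + 1900" equation, is only valid for years before 2000. It does not call the GCE Data Toolbox.

```python
from datetime import datetime


def split_mon_yy(value):
    """Convert a "May-99"-style value to (YYYY, MM) as integers.

    Adds 1900 to the two-digit year, like the "x + 1900" conversion,
    so it is only valid for years before 2000.
    """
    month_name, yy = value.split("-")  # like splitting on the dash
    month = datetime.strptime(month_name, "%b").month  # "May" -> 5
    return int(yy) + 1900, month


def split_mm_dd_yyyy(value):
    """Split an "MM-DD-YYYY" value into (YYYY, MM, DD) as integers."""
    d = datetime.strptime(value, "%m-%d-%Y")
    return d.year, d.month, d.day


print(split_mon_yy("May-99"))          # (1999, 5)
print(split_mm_dd_yyyy("11-27-1997"))  # (1997, 11, 27)
```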

In the MATLAB workspace:

  1. Once you have finished making the necessary edits to the .mat file, press Ctrl+B while still in the GCE Data Toolbox for MATLAB. This copies the dataset to the MATLAB workspace, and a pop-up window should confirm: "The data structure was successfully copied to the base MATLAB workspace as 'data'".
  2. You are now working in the MATLAB workspace. Type 'data' and press Enter to display the data structure.
  3. Next, run the following command string, which connects to the Metabase.

conn = cwt_db_conn('CWT_Metabase2','matlab_avatar','bayes1an_bb')

  4. After you run the connection command above, the message should be blank ("Message: []"), which indicates that the command ran successfully.
  5. Next, run a command to insert the dataset metadata into the Metabase. An example of the command string is shown below; replace "3100" with the dataset's accession number in both places where it appears.

[id,msg] = insert_metabase_dataset(conn,data,'3100','CWTTER','',3100)
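Because "3100" appears twice in the command above (once quoted, once as a bare number), it is easy to update only one of them. The hypothetical Python helper below assumes, as the example suggests, that both occurrences should be the dataset's accession number. It only generates the command text to paste into MATLAB; it does not connect to the Metabase.

```python
def insert_command(accession):
    """Return the insert_metabase_dataset command text for one dataset.

    Assumption: the accession number fills both the quoted and the
    numeric argument, as in the 3100 example.
    """
    acc = int(accession)  # fails early on a non-numeric accession number
    return ("[id,msg] = insert_metabase_dataset("
            f"conn,data,'{acc}','CWTTER','',{acc})")


print(insert_command(4023))
# [id,msg] = insert_metabase_dataset(conn,data,'4023','CWTTER','',4023)
```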

In the Metabase:

  1. Open "Forms" - "Data Set Metadata."
  2. Make sure "Display on Web" is checked in the Metabase.
  3. Open the Toolbox metadata for reference by clicking "Metadata" - "View/ Edit Metadata" in the GCE Data Toolbox for MATLAB. In the Metabase, you will fill in the necessary fields using the Toolbox metadata.
  4. Most of the metadata should have transferred automatically from the Toolbox to the Metabase when you ran the commands above; fill in manually any metadata that did not transfer.
  5. Select at least one of the listed study types.
  6. Under the "General" tab, go to "Data Set Themes" and choose an appropriate theme based on the content of the abstract.
  7. Under the "General" tab, on the right, for "Researcher Name", add the 'Responsible PI.' If all of the PIs affiliated with the project and/ or dataset are retired or deceased, add Dr. John Chamblee as the curator. You can find the names, User IDs, and status of Coweeta personnel (including Coweeta PIs) in the Metabase, under "Forms" - "Personnel."
  8. The "Keywords" tab should have been filled out automatically from the Toolbox metadata. Review the listed keywords and add more if appropriate. There should be at least one keyword of keyword type "LTER Core Research Areas"; add as many keywords of this type as apply. For Coweeta-affiliated datasets, make sure that 'Coweeta Hydrologic Laboratory' and 'Coweeta LTER' are both present as keywords of keyword type "User-defined keywords."
  9. Under the "Studies" tab, fill out at least the "Study Name", "Begin Date", "End Date", and "Design Characteristics" fields. For "Design Characteristics", copy the contents of "Study - Description" from the Toolbox Metadata Editor and paste them into that field in the Metabase.
  10. On the "Methods" tab, copy and paste "Study - Methods" from the Toolbox Metadata Editor.
  11. If the entire methods text does not fit in one entry, split it into separate methods (steps), giving each step a different "Method Name", such as "Nutrient Analysis." Also fill out the "Instrumentation" field(s) on the "Methods" tab.
  12. For the "Supplementary" tab, reference the Toolbox Metadata Editor. This tab sometimes requires information on software.
  13. Under the "References" tab, select the "Site References (polygons)."
  14. Save your work in the Metabase.
  15. To confirm that the dataset metadata appears in the CWT Data Catalog, search for the accession number at http://coweeta.uga.edu/dbpublic/data_catalog.asp
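The keyword rules in step 8 can be checked mechanically. The Python sketch below assumes you have listed the dataset's keywords as (keyword, keyword type) pairs read off the "Keywords" tab; it is an illustration only and does not connect to the Metabase.

```python
CORE_AREA_TYPE = "LTER Core Research Areas"
USER_DEFINED_TYPE = "User-defined keywords"
COWEETA_KEYWORDS = ("Coweeta Hydrologic Laboratory", "Coweeta LTER")


def keyword_problems(keywords, coweeta_affiliated=True):
    """Check (keyword, keyword_type) pairs against the step 8 rules.

    Returns a list of problems; an empty list means the checks pass.
    """
    keywords = list(keywords)  # accept any iterable; it is read twice
    problems = []
    if not any(ktype == CORE_AREA_TYPE for _, ktype in keywords):
        problems.append(f'no keyword of type "{CORE_AREA_TYPE}"')
    if coweeta_affiliated:
        user_defined = {kw for kw, ktype in keywords
                        if ktype == USER_DEFINED_TYPE}
        for kw in COWEETA_KEYWORDS:
            if kw not in user_defined:
                problems.append(f'missing user-defined keyword "{kw}"')
    return problems


example = [
    ("Coweeta LTER", USER_DEFINED_TYPE),
    ("Primary Production", CORE_AREA_TYPE),
]
print(keyword_problems(example))
# ['missing user-defined keyword "Coweeta Hydrologic Laboratory"']
```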

Adding units in the Metabase:

  1. When filling out and checking the units for a dataset (in Excel, MATLAB, and/ or the Metabase), make sure that each unit has an entry in both of the Metabase Forms "EML Unit Dictionary" and "EML Unit Map." Add any unit that is missing from either Form; otherwise, errors will appear later in the PASTA import process.
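As an illustration of this check, the Python sketch below compares the units used in a dataset against the entries in the two Forms. It assumes you have copied the unit names from the Metabase Forms into Python sets by hand; it does not read the Metabase, and the unit lists shown are hypothetical.

```python
FORMS = ("EML Unit Dictionary", "EML Unit Map")


def missing_units(dataset_units, unit_dictionary, unit_map):
    """Map each unit missing from a Form to the Form(s) that lack it.

    Matching is exact, so a misspelled unit is reported as missing too.
    """
    report = {}
    for unit in sorted(set(dataset_units)):
        lacking = [form for form, entries
                   in zip(FORMS, (unit_dictionary, unit_map))
                   if unit not in entries]
        if lacking:
            report[unit] = lacking
    return report


# Hypothetical unit lists, for illustration only.
units_in_dataset = ["YYYY", "MM", "DD", "mg/L", "none"]
dictionary = {"YYYY", "MM", "DD", "mg/L", "none"}
unit_map = {"YYYY", "MM", "DD", "none"}
print(missing_units(units_in_dataset, dictionary, unit_map))
# {'mg/L': ['EML Unit Map']}
```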

In the MATLAB workspace (for single-entity datasets):

  1. Re-run the following command string, which connects to the Metabase.

conn = cwt_db_conn('CWT_Metabase2','matlab_avatar','bayes1an_bb')

  2. Next, run the following command string, which creates the final product files for upload into PASTA and inserts the variable and file metadata into the Metabase. This syntax applies to single-entity datasets.

exp_datasetfiles_cwt(data,'M:\','M',[],conn,1)

In the MATLAB workspace (for multi-entity datasets):

  1. For multi-entity datasets, after completing the steps above through filling out the metadata in the Metabase, run the following command string to connect to the Metabase.

conn = cwt_db_conn('CWT_Metabase2','matlab_avatar','bayes1an_bb')

  2. Next, for each file (entity) of the multi-entity dataset, run a command in the format shown below. In this example, "4023_leaves_bugs" is based on the entity's file name, and "Data table describing insects collected on tiles data for data set 4023." is a short description of the entity's content. These two values should be different for each entity of a multi-entity dataset.

exp_datasetfiles_cwt(data,'M:\','M',[],conn,1,'4023_leaves_bugs','Data table describing insects collected on tiles data for data set 4023.',[0 0])

  3. Repeat these two commands (connecting to the Metabase and exporting the entity's files) for each of the entities in the multi-entity dataset.
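Since the file name and description change for every entity, generating the command lines from a list can help avoid copy-and-paste mistakes. The hypothetical Python helper below produces the text of each command in the same format as the 4023 example, escaping apostrophes as MATLAB requires (a ' inside a MATLAB character vector is written ''). It only builds command text; it does not run anything in MATLAB or change what is stored in the 'data' variable.

```python
def matlab_str(s):
    """Quote text as a MATLAB character vector; ' is escaped as ''."""
    return "'" + s.replace("'", "''") + "'"


def entity_export_command(file_name, description):
    """Build the text of one multi-entity exp_datasetfiles_cwt call.

    The fixed arguments are copied from the 4023 example; only the
    entity file name and description vary.
    """
    return ("exp_datasetfiles_cwt(data,'M:\\','M',[],conn,1,"
            f"{matlab_str(file_name)},{matlab_str(description)},[0 0])")


entities = [
    ("4023_leaves_bugs",
     "Data table describing insects collected on tiles data for data set 4023."),
]
for file_name, description in entities:
    print(entity_export_command(file_name, description))
```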

In the Metabase: (for both single-entity and multi-entity datasets)

  1. Check the "Entity" tab in the Metabase to see if the entities/ files were successfully added to the Metabase.
  2. At the bottom of the "Entity" tab, find the entries with FileType "Standard GCE EML-described."
  3. Go to the .TXT FileName (for instance, "1064_1_0.TXT"), scroll to the rightmost column ("Version"), and change the value from "1.0" to "1.1".
  4. Click "Save." Then move forward one record and back again in the Metabase to make sure the change is saved.

In the Coweeta Data Catalog:

  1. Search for the dataset in the Coweeta Data Catalog (http://coweeta.uga.edu/dbpublic/data_catalog.asp) using the dataset accession number.
  2. Click "Complete EML" on the webpage.
  3. After the "Complete EML" page opens, add "&metacat=yes" to the end of its URL and load the page again.
  4. Errors may appear at this point, and you will need to fix them. To locate them, right-click the page and select "View page source"; this shows which part of the dataset each error is associated with.
  5. If there are errors, you may need to make corrections in the .mat file. Unit errors are common: make sure every unit has an entry in both of the Metabase Forms "EML Unit Dictionary" and "EML Unit Map" (add any that are missing), and check that the units are spelled correctly.
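If you add the flag often, the small Python helper below appends it consistently. The exact "Complete EML" address is not given in this article, so the URL in the example is hypothetical. The helper uses "&" when the URL already has a query string (as the step above assumes) and "?" otherwise, and leaves the URL unchanged if the flag is already present.

```python
from urllib.parse import urlsplit


def add_metacat_flag(url):
    """Return the "Complete EML" URL with metacat=yes appended."""
    query = urlsplit(url).query
    if "metacat=yes" in query:
        return url  # flag already present
    separator = "&" if query else "?"
    return url + separator + "metacat=yes"


# Hypothetical URL; use the address shown after clicking "Complete EML".
print(add_metacat_flag("https://example.org/complete_eml.asp?accession=1064"))
# https://example.org/complete_eml.asp?accession=1064&metacat=yes
```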

In the Metabase: (These steps are only applicable if you needed to make corrections)

  1. Follow these steps only if you had to correct the .mat file because errors appeared after you clicked "Complete EML" on the Coweeta Data Catalog webpage (explained in the previous section).
  2. In the Metabase, go to "Queries" on the left-hand side, and choose "procDelete Files Vars." If you cannot see "procDelete Files Vars," contact the Coweeta Information Manager.
  3. Next, in the MATLAB workspace, run the following command string, which connects to the Metabase.

conn = cwt_db_conn('CWT_Metabase2','matlab_avatar','bayes1an_bb')

  4. Then, run the export dataset files command (shown below).

exp_datasetfiles_cwt(data,'M:\','M',[],conn,1)

  5. Once again, search for the dataset in the Coweeta Data Catalog (http://coweeta.uga.edu/dbpublic/data_catalog.asp) using the dataset accession number, and click "Complete EML" on the webpage.
  6. After the "Complete EML" page opens, add "&metacat=yes" to the end of its URL and load the page again. Make sure there are no errors in the EML.
  7. Copy the page source.

In the Oxygen XML Editor:

  1. Open the Oxygen XML Editor.
  2. Create a new file in the Oxygen XML Editor.
  3. Delete any default content in the new file so that it is blank.
  4. Paste the page source you previously copied into the XML file.
  5. Red tabs may appear, marking the errors you need to fix and their locations in the XML file. (If there are errors you cannot figure out how to fix, please contact the Coweeta Information Manager.)
  6. Once the square in the upper-right corner is green, save the file; it is ready for the next step.
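Before pasting, the copied page source can also be checked for basic XML errors such as unclosed tags. The Python sketch below uses the standard library's ElementTree parser and reports the line and column of the first problem. It checks well-formedness only, not validity against the EML schema, so it does not replace the check in the Oxygen XML Editor.

```python
import xml.etree.ElementTree as ET


def xml_problem(xml_text):
    """Return None if xml_text is well-formed XML, otherwise a message
    giving the line and column of the first parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


print(xml_problem("<eml><dataset/></eml>"))  # None
print(xml_problem("<eml><dataset></eml>"))   # reports a mismatched tag
```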

In Portal-S:

  1. Go to Portal-S using the following link: https://portal-s.lternet.edu/nis/home.jsp
  2. Select "Tools" - "Evaluate/ Upload Data Packages"
  3. Log in with the UserID "CWT" (do not include the quotation marks). Contact the Coweeta Information Manager for the password.
  4. Select "File" - "Browse." Find the XML file you recently saved.
  5. Select "Upload."
  6. You may encounter errors at this point; fix them before continuing.
  7. Once the file has been successfully uploaded to Portal-S, you can move on to the next step.

In LTER Network Data Portal:

  1. Go to the LTER Network Data Portal using the following link: https://portal.lternet.edu/nis/login
  2. Select "Tools" - "Evaluate/ Upload Data Packages"
  3. Log in with the UserID "CWT" (do not include the quotation marks). Contact the Coweeta Information Manager for the password.
  4. Select "File" - "Browse." Find the XML file you recently saved.
  5. Select "Upload."
  6. Once the file has successfully been uploaded, you have completed the PASTA import process.

Revision: 7