Fiehn, Oliver (fiehn@ucdavis.edu)
Subramaniam, Shankar (shankar@sdsc.edu)
The main purpose of this document is to serve as a data sharing plan for all grants and projects funded by the Common Fund Metabolomics Program, applicable across the consortium. For broader NIH-funded grants, readers are advised to visit the links at the end of this document.
Data sharing is essential for enhanced utilization of research results for translation into knowledge, products, and procedures to improve human health. Since 1996, NIH has required data sharing in areas such as DNA sequences, mapping information and crystallographic coordinates. Protein and DNA sequences are made available to researchers through public data archives, such as GenBank or the Gene Expression Omnibus. Data sharing allows the scientific community to validate research findings, create new datasets by combining data from multiple sources and explore topics not envisioned by investigators who generated the initial data set. With the creation of the Metabolomics Data Repository managed by the Data Repository and Coordination Center (DRCC), the NIH acknowledges the importance of data sharing for metabolomics.
Metabolomics represents the systematic study of low molecular weight molecules found in a biological sample, providing a "snapshot" of the current and actual state of the cell or organism at a specific point in time. Thus, the metabolome represents the functional activity of biological systems. As with other ‘omics’, metabolites are conserved across animals, plants and microbial species, facilitating the extrapolation of research findings in laboratory animals to humans. Common technologies for measuring the metabolome include mass spectrometry (MS) and nuclear magnetic resonance spectroscopy (NMR), which can measure hundreds to thousands of unique chemical entities.
Data sharing in metabolomics will include primary raw data and the biological and analytical meta-data necessary to interpret these data. Through cooperation between investigators, metabolomics laboratories and data coordinating centers, these data sets should provide a rich resource for the research community to enhance preclinical, clinical and translational research.
An open exchange format submission is encouraged, as long as the raw data and exchange format contain the same level of information. File names should use identifiers that can be linked to the final result matrix of an experiment.
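As an illustration of the file naming recommendation above, the following is a minimal sketch of a consistency check that could be run before submission. It assumes that each raw or exchange-format file is named after a sample identifier that appears in the result matrix. The file names, the sample identifiers and the function name are hypothetical, and mzML is used only as one example of an open exchange format.

```python
from pathlib import PurePath


def unmatched_raw_files(raw_files, matrix_sample_ids):
    """Return the file names whose stem is not a sample ID in the result matrix."""
    known = set(matrix_sample_ids)
    return [name for name in raw_files if PurePath(name).stem not in known]


# Hypothetical sample identifiers, e.g. the column headers of a result matrix.
sample_ids = ["S001_control", "S002_control", "S003_treated"]

# Hypothetical raw data file names prepared for upload.
raw_files = ["S001_control.mzML", "S002_control.mzML", "run_17.mzML"]

# Files listed here cannot be linked back to the result matrix by name.
print(unmatched_raw_files(raw_files, sample_ids))
```

In this sketch, a file such as run_17.mzML would be reported because its name carries no identifier found in the matrix.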
The frequently asked questions (FAQs) section covers questions that may or may not be covered in the Data Sharing Document text.
Q: When will the metabolomics data sharing policy be put in place?
A: Data sharing under the specific guidelines of this plan will become a mandatory requirement for all Common Fund metabolomics research projects starting October 1st 2013.
Q: I have an NIH study that was active before the NIH data sharing policy was put in place. Do I need to submit data from such studies?
A: No, only data from NIH Common Fund Metabolomics Program funded studies need to be submitted to the Metabolomics Repository located at DRCC. However, older data sets are welcome, and investigators are highly encouraged to submit such data whenever possible.
Q: Are paid or commercial services obtained from a metabolomics core laboratory covered by the data sharing policy?
A: Paid metabolomics core laboratory services are not covered per se. However, if the principal investigator is paying with Common Fund Metabolomics Program funds, or the costs of the project are offset in part by the Common Fund Metabolomics Program, it is the PI's responsibility to submit such data to the metabolomics data center.
Q: Is the metabolomics core laboratory director responsible for submitting data to the data center?
A: No, it is the principal investigator’s responsibility to send data to the metabolomics data center. However, investigators may have the NIH Common Fund Regional Comprehensive Metabolomics Research Cores (RCMRCs) submit files on their behalf.
Q: I obtained metabolomics funding from another federal agency such as NSF, DOE, EPA. Am I required to submit data to the Metabolomics Data Repository at DRCC?
A: No, only NIH Common Fund Metabolomics Program funded studies need to comply, but submission of any metabolomics data that is relevant to biomedical research and satisfies the data deposition requirements of the Metabolomics Data Repository is highly encouraged.
Q: I am a metabolomics researcher from another country. Can I submit metabolomics data?
A: Yes, any metabolomics data of relevance to human health is welcome.
Q: What is the license of the data once it is publicly available?
A: After the embargo date the data will be in the public domain.
Q: Can I retract data that is incorrect or that contains false annotations?
A: Curators from the Metabolomics Data Repository will work with you to resolve such issues.
Q: How can I make sure my data sets are correctly connected to my multiple NIH grants?
A: The Metabolomics Data Repository has a system function that allows selection of active NIH grants based on the NIH Reporter System. This information will automatically link related studies. Additionally, the principal investigator can update that information on the grant website using the eRA Commons login.
Q: I am an NIH Common Fund Metabolomics Program funded principal investigator (PI) and I collaborate with a clinical/medical principal investigator. The medical PI cannot release any patient data and therefore will not be able to share any data. How can I comply with the data sharing policy?
A: Medical information should not be released to the public or to the metabolomics data repository in a form that allows individuals to be identified. Data should be free of identifiers that would permit linkages to individual research participants, and should exclude variables that could lead to deductive disclosure of the identity of individual subjects. When data sharing is limited, applicants should explain such limitations in their data sharing plans. Data must comply with the Health Insurance Portability and Accountability Act (HIPAA) rules (http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html).
Q: Is patient data safe in the metabolomics data center?
A: The Metabolomics Data Repository will ensure that all deposited data are securely maintained. However, it is the responsibility of the investigator to ensure that the clinical data are free of identifiers that would permit linkages to individual research participants, and exclude variables that could lead to deductive disclosure of the identity of individual subjects. Data must comply with the Health Insurance Portability and Accountability Act (HIPAA) privacy rules before being submitted to the Metabolomics Data Repository (http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html).
Q: I received Pilot and Feasibility funding through the Common Fund Metabolomics Program Regional Comprehensive Metabolomics Research Cores (RCMRCs). Are there any special rules that apply?
A: Yes, all data generated through Pilot and Feasibility funding must be deposited with the metabolomics DRCC. However, the data will be kept in a safe, firewalled zone and will be released after the embargo period is over. RCMRCs will work with the investigators to address and clarify any concerns before the data are transferred to the Metabolomics Data Repository.
Q: My data sets are so large it will take weeks to upload all the data.
A: Please talk to the Metabolomics Data Repository curators about sending the data in physical form, such as hard disks or solid state disks (SSDs) (see contact information on the web site, http://www.metabolomicsworkbench.org).
Q: I have data from an older, pre-2012 NIH-funded study. Can I send the data to the metabolomics data center?
A: Yes, submission of any metabolomics data relevant to human health, along with the associated raw data, is highly encouraged, especially when accompanied by analytical and biological metadata annotations.
Q: How do I pay for preparing data for sharing and archiving?
A: NIH recognizes that it takes time and money to prepare data for sharing. You should request funds for data archiving and sharing as part of your grant application for collecting the data. If you have already collected the data, you may want to ask your NIH Project Officer if a competitive or administrative supplement is suitable for this purpose. You may also contact the Metabolomics Data Repository for assistance in data transfer.
Q: Can I download data from the repository and publish new research based on such data?
A: Yes, as long as the source of the data is correctly cited. That includes the accession number of the experiment as well as associated publications where the data was first released. However, there may be restrictions if you are not willing to share the data with the Metabolomics Data Repository. You are highly encouraged to contact DRCC personnel for more information.
Q: What is meta-data and what should I publish besides the raw data files?
A: Meta-data in metabolomics can be analytical (instrument-related) or biological in nature. Meta-data for analytical instrumentation can include the platform, such as GC-MS, LC-MS or NMR; the specific instrument type and vendor; the experiment, such as 1H-NMR, COSY or LC-MS/MS; and the protocols for extraction and data processing. Biological meta-data includes taxonomy data, organ and cell type, and a minimal description of the experiment, such as drug vs. non-drug treatment or time-course experiments, together with appropriate sample size information. The repository will require a minimum meta-data set, and additional data should be submitted as a protocol or standard operating procedure in Word or PDF format. All experiments (other than patient clinical studies) should be completely reproducible, and a complete set of protocols and metadata will ensure this.
We recommend the guidelines of the Metabolomics Society as published below, but the data/metadata requirements may extend beyond what is proposed in these documents.
The Metabolomics Society published a set of rules in Metabolomics (ISSN 1573-3890), Volume 3, Number 3, September 2007:
http://www.springerlink.com/content/1573-3882/3/3/
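To make the kinds of meta-data described in this answer more concrete, here is a minimal sketch of how an investigator might organize them before submission. The field names are hypothetical and are taken only from the examples in the answer above; they are not the repository's actual minimum meta-data set, which should be obtained from the repository itself.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StudyMetadata:
    """Hypothetical record; not the repository's actual schema."""
    # Analytical meta-data
    platform: Optional[str] = None             # e.g. "GC-MS", "LC-MS", "NMR"
    instrument_vendor: Optional[str] = None
    experiment_type: Optional[str] = None      # e.g. "1H-NMR", "COSY", "LC-MS/MS"
    extraction_protocol: Optional[str] = None  # e.g. name of an attached SOP file
    # Biological meta-data
    taxonomy: Optional[str] = None
    organ_or_cell_type: Optional[str] = None
    study_design: Optional[str] = None         # e.g. "drug vs. non-drug treatment"
    sample_size: Optional[int] = None


def missing_fields(record):
    """List the fields that are still empty, in declaration order."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "")]


# Hypothetical, partially completed record.
record = StudyMetadata(
    platform="LC-MS",
    experiment_type="LC-MS/MS",
    taxonomy="Homo sapiens",
    sample_size=24,
)
print(missing_fields(record))
```

A listing like this only shows which descriptive fields are still empty; detailed procedures would still be supplied as protocol documents, as the answer describes.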
Q: Are there any guidelines or examples on how to structure data for submission?
A: See http://www.metabolomicsworkbench.org
Q: How will NIH enforce the data sharing rules?
A: This document mainly provides guidance for NIH Common Fund Metabolomics Program funded grants and projects. The terms and conditions for data sharing have already been negotiated with the awardee institutions and investigators at the time of the Notice of Grant Award (NGA). Failure to comply with data sharing requirements may lead to unnecessary delays in the non-competing award process. If investigators seek a waiver for data sharing, a strong and compelling justification will be necessary to explain why the data should be withheld from sharing.
Q: Does data sharing pertain only to published data?
A: No. Data-sharing plans should encompass all data from funded research that can be shared without compromising individual subjects' rights and privacy and that would help in analyzing the metabolomics data, regardless of whether the data have been used in a publication. Furthermore, data sharing prior to the publication of major results is encouraged in many instances, if appropriate. For example, data may be collected to provide a resource for the scientific community (as in the case of many large surveys) and as such may not necessarily lead to any publications. Whenever applicable, raw data from the measurements should also be shared.
Q: How will the quality of the data be evaluated?
A: A quality index that represents missing meta-data or missing annotations will be assigned by curators. Automatic curation tools that can detect missing substance identifiers, missing metadata or broken raw data files are in development.
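The answer above does not specify how the quality index is calculated. Purely as an illustration, the sketch below assumes one possible form: the fraction of required meta-data fields and metabolite annotations that are present. The function name, the required keys, the "identifier" field and the example values are all hypothetical and do not describe the curators' actual method.

```python
def quality_index(metadata, required_keys, metabolites):
    """Return the fraction (0.0 to 1.0) of checked items that are present.

    Assumed scoring for illustration only: every required meta-data key and
    every metabolite's substance identifier counts as one check.
    """
    checks = [bool(metadata.get(key)) for key in required_keys]
    checks += [bool(m.get("identifier")) for m in metabolites]
    return sum(checks) / len(checks) if checks else 0.0


# Hypothetical study meta-data with one empty and one absent field.
metadata = {"platform": "GC-MS", "taxonomy": "Mus musculus", "organ": ""}
required_keys = ["platform", "taxonomy", "organ", "vendor"]

# Hypothetical annotation table; the second entry lacks a substance identifier.
metabolites = [
    {"name": "glucose", "identifier": "ID-0001"},
    {"name": "unknown_214", "identifier": None},
]

print(quality_index(metadata, required_keys, metabolites))  # 3 of 6 checks pass: 0.5
```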