Summary of Study ST002132
This data is available at the NIH Common Fund's National Metabolomics Data Repository (NMDR) website, the Metabolomics Workbench, https://www.metabolomicsworkbench.org, where it has been assigned Project ID PR001350. The data can be accessed directly via its Project DOI: 10.21228/M86X36. This work is supported by NIH grant U2C-DK119886.
See: https://www.metabolomicsworkbench.org/about/howtocite.php
The results data set for this study is too large to be included in the mwTab file; it is available for download only as data file(s) via FTP.
Study ID | ST002132 |
Study Title | Optimization of Imputation Strategies for High-Resolution Gas Chromatography-Mass Spectrometry (HR GC-MS) Metabolomics Data |
Study Summary | Gas chromatography-coupled mass spectrometry (GC-MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing-value imputation methods with metabolites analyzed on an HR GC-MS instrument. By introducing missing values into the complete (i.e., data without any missing values) NIST plasma dataset, we demonstrate that Random Forest (RF), Glmnet Ridge Regression (GRR), and Bayesian Principal Component Analysis (BPCA) shared the lowest Root Mean Squared Error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated that they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values. |
Institute | Wake Forest School of Medicine |
Last Name | Ampong |
First Name | Isaac |
Address | Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, North Carolina, United States |
Email | iampong@wakehealth.edu |
Phone | 3367162091 |
Submit Date | 2022-04-01 |
Raw Data Available | Yes |
Raw Data File Type(s) | mzML |
Analysis Type Detail | GC-MS |
Release Date | 2022-04-27 |
Release Version | 1 |
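The evaluation design summarized above (remove values from a complete matrix, impute them, and score the imputations by RMSE) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: values are removed completely at random (MCAR), the data are simulated rather than taken from this study, and a simple column-mean imputer stands in for RF, GRR or BPCA. The function names mask_mcar, impute_column_mean and rmse_on_masked are hypothetical.

```python
import numpy as np


def mask_mcar(X, frac, rng):
    """Return a copy of X with roughly `frac` of its entries set to NaN
    completely at random (MCAR), plus the boolean mask of removed cells."""
    mask = rng.random(X.shape) < frac
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_column_mean(X_missing):
    """Placeholder imputer: replace each NaN with its column (metabolite) mean.
    It stands in for RF, GRR or BPCA, none of which is implemented here."""
    col_means = np.nanmean(X_missing, axis=0)
    X_imputed = X_missing.copy()
    rows, cols = np.where(np.isnan(X_imputed))
    X_imputed[rows, cols] = col_means[cols]
    return X_imputed


def rmse_on_masked(X_true, X_imputed, mask):
    """Root mean squared error computed only over the artificially removed cells."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated complete matrix: 8 replicate samples x 50 metabolites (log-scale values).
    X_complete = rng.normal(loc=10.0, scale=1.0, size=(8, 50))
    X_missing, mask = mask_mcar(X_complete, frac=0.1, rng=rng)
    X_imputed = impute_column_mean(X_missing)
    print(f"masked cells: {mask.sum()}, RMSE: {rmse_on_masked(X_complete, X_imputed, mask):.3f}")
```

Any function that maps a matrix containing NaNs to a filled matrix can replace impute_column_mean. The RMSE is taken only over the deliberately removed cells, because those are the only ones whose true value is known.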
Project:
Project ID: | PR001350 |
Project DOI: | doi: 10.21228/M86X36 |
Project Title: | Optimization of Imputation Strategies for High-Resolution Gas Chromatography-Mass Spectrometry (HR GC-MS) Metabolomics Data |
Project Summary: | Gas chromatography-coupled mass spectrometry (GC-MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing-value imputation methods with metabolites analyzed on an HR GC-MS instrument. By introducing missing values into the complete (i.e., data without any missing values) NIST plasma dataset, we demonstrate that Random Forest (RF), Glmnet Ridge Regression (GRR), and Bayesian Principal Component Analysis (BPCA) shared the lowest Root Mean Squared Error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated that they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values. |
Institute: | Wake Forest School of Medicine |
Department: | Department of Internal Medicine |
Laboratory: | Olivier Lab |
Last Name: | Ampong |
First Name: | Isaac |
Address: | Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University, Winston-Salem, North Carolina, United States |
Email: | iampong@wakehealth.edu |
Phone: | 3367162091 |
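The closing caveat about correlation structure can be illustrated with a small simulation. The sketch below assumes two simulated, positively correlated metabolites and uses mean imputation, a deliberately crude method that is not one of the three recommended in the study, because its effect is easy to see: filling gaps with a constant shrinks the variance of the imputed variable and pulls its correlation with the other variable toward zero. The helper names pearson and mean_impute are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def mean_impute(values, missing):
    """Return a copy of `values` with entries flagged in `missing`
    replaced by the mean of the observed entries."""
    filled = values.copy()
    filled[missing] = values[~missing].mean()
    return filled


rng = np.random.default_rng(42)
n = 200
# Two simulated metabolites; b is constructed to correlate with a at roughly 0.8.
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)

# Remove about 30% of b completely at random, then mean-impute the gaps.
missing = rng.random(n) < 0.3
b_imputed = mean_impute(b, missing)

print(f"correlation(a, b), complete data: {pearson(a, b):.2f}")
print(f"correlation(a, b), mean-imputed:  {pearson(a, b_imputed):.2f}")
print(f"SD of b, complete vs imputed:     {b.std():.2f} vs {b_imputed.std():.2f}")
```

Better imputers such as RF, GRR or BPCA distort the correlation structure less crudely, but the same before/after comparison on a complete dataset is a simple way to check how much a given method changes it.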
Subject:
Subject ID: | SU002217 |
Subject Type: | Mammal |
Subject Species: | Papio hamadryas |
Taxonomy ID: | 9557 |
Factors:
Subject type: Mammal; Subject species: Papio hamadryas
mb_sample_id | local_sample_id | type |
---|---|---|
SA204686 | 23 | baboon liver |
SA204687 | 22 | baboon liver |
SA204688 | 21 | baboon liver |
SA204689 | 24 | baboon liver |
SA204690 | 20 | baboon liver |
SA204691 | 26 | baboon liver |
SA204692 | 19 | baboon liver |
SA204693 | 27 | baboon liver |
SA204694 | 28 | baboon liver |
SA204695 | 29 | baboon liver |
SA204696 | 25 | baboon liver |
SA204697 | 11 | baboon liver |
SA204698 | 1 | baboon liver |
SA204699 | 2 | baboon liver |
SA204700 | 30 | baboon liver |
SA204701 | 3 | baboon liver |
SA204702 | 12 | baboon liver |
SA204703 | 13 | baboon liver |
SA204704 | 17 | baboon liver |
SA204705 | 16 | baboon liver |
SA204706 | 15 | baboon liver |
SA204707 | 14 | baboon liver |
SA204708 | 18 | baboon liver |
SA204709 | 49 | baboon liver |
SA204710 | 47 | baboon liver |
SA204711 | 46 | baboon liver |
SA204712 | 45 | baboon liver |
SA204713 | 44 | baboon liver |
SA204714 | 48 | baboon liver |
SA204715 | 50 | baboon liver |
SA204716 | 10 | baboon liver |
SA204717 | 53 | baboon liver |
SA204718 | 52 | baboon liver |
SA204719 | 51 | baboon liver |
SA204720 | 43 | baboon liver |
SA204721 | 42 | baboon liver |
SA204722 | 35 | baboon liver |
SA204723 | 34 | baboon liver |
SA204724 | 33 | baboon liver |
SA204725 | 32 | baboon liver |
SA204726 | 36 | baboon liver |
SA204727 | 37 | baboon liver |
SA204728 | 41 | baboon liver |
SA204729 | 40 | baboon liver |
SA204730 | 39 | baboon liver |
SA204731 | 38 | baboon liver |
SA204732 | 31 | baboon liver |
SA204733 | 14705 | baboon plasma |
SA204734 | 15400 | baboon plasma |
SA204735 | 15149 | baboon plasma |
SA204736 | 15099 | baboon plasma |
SA204737 | 15027 | baboon plasma |
SA204738 | 15432 | baboon plasma |
SA204739 | 15537 | baboon plasma |
SA204740 | 15727 | baboon plasma |
SA204741 | 15706 | baboon plasma |
SA204742 | 15671 | baboon plasma |
SA204743 | 15636 | baboon plasma |
SA204744 | 14722 | baboon plasma |
SA204745 | 14719 | baboon plasma |
SA204746 | 12818 | baboon plasma |
SA204747 | 12656 | baboon plasma |
SA204748 | 11887 | baboon plasma |
SA204749 | 11641 | baboon plasma |
SA204750 | 13029 | baboon plasma |
SA204751 | 13238 | baboon plasma |
SA204752 | 14438 | baboon plasma |
SA204753 | 13737 | baboon plasma |
SA204754 | 13669 | baboon plasma |
SA204755 | 15898 | baboon plasma |
SA204756 | 16172 | baboon plasma |
SA204757 | 30325 | baboon plasma |
SA204758 | 30226 | baboon plasma |
SA204759 | 27948 | baboon plasma |
SA204760 | 26702 | baboon plasma |
SA204761 | 30623 | baboon plasma |
SA204762 | 30628 | baboon plasma |
SA204763 | DK63 | baboon plasma |
SA204764 | BB36 | baboon plasma |
SA204765 | AE06 | baboon plasma |
SA204766 | 26476 | baboon plasma |
SA204767 | 26392 | baboon plasma |
SA204768 | 17000 | baboon plasma |
SA204769 | 16772 | baboon plasma |
SA204770 | 16518 | baboon plasma |
SA204771 | 16215 | baboon plasma |
SA204772 | 17803 | baboon plasma |
SA204773 | 17883 | baboon plasma |
SA204774 | 20120 | baboon plasma |
SA204775 | 18565 | baboon plasma |
SA204776 | 18463 | baboon plasma |
SA204777 | EF44 | baboon plasma |
SA204536 | T2 | Nistplasma |
SA204537 | T3 | Nistplasma |
SA204538 | T4 | Nistplasma |
SA204539 | T1 | Nistplasma |
SA204540 | S9 | Nistplasma |
SA204541 | S7 | Nistplasma |
SA204542 | S8 | Nistplasma |
SA204543 | T5 | Nistplasma |
Collection:
Collection ID: | CO002210 |
Collection Summary: | The NIST plasma metabolomics dataset consisted of 150 replicate samples purchased from commercial vendors. The 12 batches of data were pooled, aligned, and processed using the open-source software MS-DIAL (v4.6). The second dataset was generated from metabolic profiling of 45 baboon plasma samples collected from 35 females and 10 males aged 6-23 years. All 45 plasma samples were analyzed using an untargeted EI-GC-MS approach as described above. The third dataset consists of a further EI-GC-MS analysis of metabolites extracted from 47 liver biopsy samples collected from the same healthy adult baboons as the plasma, comprising 39 females and 8 males aged 6-23 years. |
Sample Type: | Liver |
Treatment:
Treatment ID: | TR002229 |
Treatment Summary: | For the baboon study, normal life-course baboons were fed a control chow diet. |
Sample Preparation:
Sampleprep ID: | SP002223 |
Sampleprep Summary: | Aliquots (15 μL) of plasma or liver samples were subjected to sequential solvent extraction at 4°C, once with 1 mL of acetonitrile:isopropanol:water (3:3:2) and once with 500 μL of acetonitrile:water (1:1) [14]. An internal standard, adonitol (2 μL of a 10 mg/mL stock), was added to each aliquot prior to extraction. The extracts were dried under vacuum at 4°C prior to chemical derivatization (silylation). Blank tubes without samples were treated in the same way as sample tubes and included to account for background noise and other sources of contamination. Samples and blanks were sequentially derivatized with methoxyamine hydrochloride (MeOX) and either N-methyl-N-trimethylsilyl-trifluoroacetamide (MSTFA) containing 1% TMCS or N-(t-butyldimethylsilyl)-N-methyltrifluoroacetamide (MTBSTFA) containing 1% TMCS, as described elsewhere [15]. Briefly, 20 μL of MeOX (20 mg/mL) in pyridine was added and incubated at 55°C for 60 min, followed by trimethylsilylation at 60°C for 60 min after adding 80 μL of MTBSTFA. |
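The quantities in this protocol can be checked with simple arithmetic. The sketch below assumes the solvent ratios are by volume (v/v), which the summary does not state, and uses the fact that 1 mg/mL equals 1 μg/μL; the helper names ratio_volumes and mass_ug are hypothetical. Under those assumptions the first extraction uses 375 μL acetonitrile, 375 μL isopropanol and 250 μL water, each aliquot receives 20 μg of adonitol and 400 μg of MeOX, and the derivatization reagents total 100 μL.

```python
def ratio_volumes(total_ul, ratio):
    """Split a total volume (in uL) into parts according to a v/v ratio such as (3, 3, 2)."""
    parts = sum(ratio)
    return [total_ul * r / parts for r in ratio]


def mass_ug(volume_ul, conc_mg_per_ml):
    """Mass (in ug) delivered by volume_ul of a stock at conc_mg_per_ml.

    1 mg/mL equals 1 ug/uL, so volume x concentration is already in ug.
    """
    return volume_ul * conc_mg_per_ml


# First extraction: 1 mL acetonitrile:isopropanol:water (3:3:2, assumed v/v).
acn_1, ipa_1, water_1 = ratio_volumes(1000, (3, 3, 2))
# Second extraction: 500 uL acetonitrile:water (1:1, assumed v/v).
acn_2, water_2 = ratio_volumes(500, (1, 1))

adonitol_ug = mass_ug(2, 10)   # 2 uL of a 10 mg/mL internal-standard stock
meox_ug = mass_ug(20, 20)      # 20 uL of MeOX at 20 mg/mL in pyridine
deriv_volume_ul = 20 + 80      # MeOX solution plus 80 uL silylating reagent

print(f"extraction 1 (uL): ACN {acn_1}, IPA {ipa_1}, water {water_1}")
print(f"extraction 2 (uL): ACN {acn_2}, water {water_2}")
print(f"adonitol {adonitol_ug} ug, MeOX {meox_ug} ug, derivatization volume {deriv_volume_ul} uL")
```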
Combined analysis:
Analysis ID | AN003487 |
---|---|
Analysis type | MS |
Chromatography type | GC |
Chromatography system | Thermo Trace 1310 |
Column | Thermo Scientific Trace GOLD TG-5SIL-MS |
MS Type | EI |
MS instrument type | QTRAP |
MS instrument name | Thermo Q Exactive Orbitrap |
Ion Mode | POSITIVE |
Units | Normalized peak abundances |
Chromatography:
Chromatography ID: | CH002574 |
Instrument Name: | Thermo Trace 1310 |
Column Name: | Thermo Scientific Trace GOLD TG-5SIL-MS |
Chromatography Type: | GC |
MS:
MS ID: | MS003248 |
Analysis ID: | AN003487 |
Instrument Name: | Thermo Q Exactive Orbitrap |
Instrument Type: | QTRAP |
MS Type: | EI |
MS Comments: | Data acquisition and instrument control were carried out using Xcalibur 4.3 and TraceFinder 4.1 software; data were processed with MS-DIAL. |
Ion Mode: | POSITIVE |