https://wiki.mpi-cbg.de/lipidxscr/api.php?action=feedcontributions&user=Schwudke&feedformat=atomLipidXplorer - User contributions [en]2024-03-28T17:26:25ZUser contributionsMediaWiki 1.28.2https://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_wishlist&diff=878LipidXplorer wishlist2012-12-11T02:39:19Z<p>Schwudke: /* Positive and negative mode sequencing */</p>
<hr />
<div>===LipidXplorer wishlist===<br />
<br />
====Positive and negative mode sequencing====<br />
<br />
Would it be possible to add a function to LipidXplorer that imports and identifies positive-mode and negative-mode spectra one after the other? This would let LipidXplorer run unattended overnight when there are many spectra to process.<br />
<br />
Dominik's wish:<br />
A simple import for MS3 spectra.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=853MFQL library2012-09-02T11:10:25Z<p>Schwudke: </p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed in negative ion mode on an LTQ Orbitrap XL instrument. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectra dataset, the extract was analyzed in several independent experiments (R, target mass resolution; IT, ion trap):<br />
* Experiment I: eight acquisitions at unit mass resolution, using the IT for both MS and MS/MS spectra.<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using pulsed Q collision-induced dissociation (PQD).<br />
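The resolution settings above translate into peak widths through R = (m/z)/Δ(m/z), where Δ(m/z) is the full width at half maximum (FWHM). The following is a minimal sketch of that calculation. It assumes two things: that R is specified at m/z 400, the usual convention for LTQ Orbitrap instruments, and that Orbitrap resolving power falls roughly with 1/√(m/z). The function names and the m/z 800 example are illustrative only and are not part of LipidXplorer.<br />

```python
import math

# Orbitrap MS resolution settings of experiments II-V above (experiment I used the ion trap only).
MS_RESOLUTION = {"II": 7_500, "III": 30_000, "IV": 100_000, "V": 100_000}


def resolution_at(mz: float, r_at_400: float) -> float:
    """Approximate Orbitrap resolving power at `mz`.

    Assumption: R is specified at m/z 400 and scales with 1/sqrt(m/z),
    the usual approximation for Orbitrap analyzers.
    """
    return r_at_400 * math.sqrt(400.0 / mz)


def peak_fwhm(mz: float, resolution: float) -> float:
    """Peak width at half maximum implied by R = (m/z) / delta(m/z)."""
    return mz / resolution


if __name__ == "__main__":
    for exp, r400 in MS_RESOLUTION.items():
        for mz in (400.0, 800.0):
            r = resolution_at(mz, r400)
            print(f"Experiment {exp}: m/z {mz:.0f}  R ~ {r:,.0f}  FWHM ~ {peak_fwhm(mz, r):.4f}")
```

At m/z 400, for example, R = 100,000 corresponds to a peak width of 0.004, while R = 7,500 gives roughly 0.053.<br />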
<br />
The MFQL files listed below can be applied to data acquired with any of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[media:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use this file in LipidXplorer, rename it to lpdX_benchmark.ini).<br />
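The rename step can also be scripted, with a parse of the file as a quick sanity check. This is a sketch under an assumption: it treats lpdX_benchmark.txt as plain INI syntax that Python's `configparser` can read. The helper name `install_settings` is hypothetical and not part of LipidXplorer. `optionxform = str` is set only so that key names keep their original capitalization.<br />

```python
import configparser
from pathlib import Path


def install_settings(downloaded: str) -> configparser.ConfigParser:
    """Rename the downloaded .txt settings file to .ini and parse it.

    Assumption: the file uses plain INI syntax; section and key names
    are whatever the file itself defines.
    """
    src = Path(downloaded)
    ini = src.with_suffix(".ini")  # lpdX_benchmark.txt -> lpdX_benchmark.ini
    src.rename(ini)

    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names case-sensitive instead of lower-casing them
    config.read(ini, encoding="utf-8")
    return config
```

Usage: `cfg = install_settings("lpdX_benchmark.txt")`, then `cfg.sections()` lists the settings sections defined in the file.<br />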
<br />
* Phosphatidylethanolamine in negative mode: [[media:PE_negative_FAS.mfql|PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:LPE_negative_FAS.mfql|LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:PG_negative_FAS.mfql|PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:PI_negative_FAS.mfql|PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[media:PS_negative_FAS.mfql|PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:PA_negative_FAS.mfql|PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_PE.mfql|neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_LPE.mfql|neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[media:neg_bovine_heart_PEO.mfql|neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_PC.mfql|neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_LPC.mfql|neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[media:neg_bovine_heart_PCO.mfql|neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:neg_bovine_heart_PA.mfql|neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[media:neg_bovine_heart_LPA.mfql|neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_PG.mfql|neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_LPG.mfql|neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_PI.mfql|neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[media:neg_bovine_heart_TAG.mfql|neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[media:neg_bovine_heart_CL.mfql|neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[media:Maradolipid.mfql|Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of lipids in human blood plasma===<br />
<br />
Mass spectrometric analysis was performed on a hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a TriVersa robotic nanoflow ion source (Advion BioSciences Ltd, Ithaca, NY) using chips with a 4.1 µm nozzle diameter. The ion source was controlled by Chipsoft 6.4 software (Advion BioSciences) and operated at an ionization voltage of 0.95 kV and a gas pressure of 1.25 psi. Plates containing the lipid extracts were chilled to 12 °C.<br />
MS survey scans were acquired in positive ion mode using the Orbitrap analyzer at a target mass resolution of 100,000 (full width at half maximum, FWHM) defined at m/z 400, with the automatic gain control (AGC) target value set to 1.0×10<sup>6</sup>.
Spectra acquired between 28 s and 120 s after the start of sample infusion (the time needed to stabilize the analyte flow and the electrospray, as judged from the total ion current (TIC) trace) were averaged and recalibrated using the m/z values of the synthetic standards SM 35:1 and PC -O 20:0/-O 20:0 as references. Only peaks with a signal-to-noise ratio above 5 that were detected in more than 20% of all spectra were considered further. Lipid species were identified by accurately determined masses, requiring a mass accuracy better than 4 ppm and a retrieval rate of 90% across all plasma samples.<br />
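The peak-selection and accurate-mass criteria described above (S/N above 5, presence in more than 20% of spectra, mass error within 4 ppm) can be sketched as follows. This is a minimal illustration, not the LipidXplorer implementation. It makes three assumptions:<br />
* Each scan is a list of (m/z, S/N) pairs already restricted to the 28-120 s window.<br />
* Peaks are grouped across scans with a simple greedy 4 ppm rule.<br />
* The names `consensus_peaks`, `annotate` and `ppm_error` are hypothetical.<br />

```python
from statistics import mean

# Hypothetical input format: one scan = list of (mz, snr) tuples.
Scan = list[tuple[float, float]]


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured - theoretical) / theoretical * 1e6


def consensus_peaks(scans: list[Scan], min_snr: float = 5.0,
                    min_fraction: float = 0.2, tol_ppm: float = 4.0) -> list[float]:
    """Keep peaks with S/N > min_snr that occur in more than min_fraction of scans.

    Peaks are pooled and sorted by m/z; a peak joins the current group if it lies
    within tol_ppm of the group's first (lowest) m/z. Returns the mean m/z of each
    group found in more than min_fraction of the scans.
    """
    if not scans:
        return []
    hits = sorted((mz, i) for i, scan in enumerate(scans)
                  for mz, snr in scan if snr > min_snr)
    groups: list[list[tuple[float, int]]] = []
    for mz, i in hits:
        if groups and abs(ppm_error(mz, groups[-1][0][0])) <= tol_ppm:
            groups[-1].append((mz, i))
        else:
            groups.append([(mz, i)])
    kept = []
    for group in groups:
        scans_with_peak = len({i for _, i in group})  # count each scan once
        if scans_with_peak / len(scans) > min_fraction:
            kept.append(mean(mz for mz, _ in group))
    return kept


def annotate(peaks: list[float], library: dict[str, float],
             tol_ppm: float = 4.0) -> list[tuple[float, str]]:
    """Pair each peak with library entries whose theoretical m/z is within tol_ppm."""
    return [(mz, name) for mz in peaks for name, theo in library.items()
            if abs(ppm_error(mz, theo)) <= tol_ppm]
```

For example, a consensus peak at m/z 760.5855 lies about 0.5 ppm from m/z 760.5851, the theoretical [M+H]+ of PC 34:1, so it would be annotated. Spectrum averaging, recalibration and the 90% retrieval criterion are deliberately left out of this sketch.<br />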
<br />
The zip file [[media:Mfql_screens_MS-only_positive.zip|Positive MS-only screens]] contains the following queries used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
===MFQL used in '''[4]''' for identification of lipids in caveolae induced in ''E. coli''===<br />
<br />
The zip file [https://wiki.mpi-cbg.de/wiki/lipidx/images/b/b4/Walser_Schwudke_2012_MFQLs.zip Walser_Schwudke_2012_MFQLs.zip] contains the following scripts:<br />
<br />
* Lysophosphatidic acid (LPA)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Lysophosphatidylglycerol (LPG)<br />
* Cardiolipin (CL)<br />
* Diacylglycerol (DAG)<br />
* Phosphatidic acid (PA)<br />
* Phosphatidylglycerol (PG)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Determination of the fatty acid composition of PE<br />
* Determination of the fatty acid composition of PG<br />
<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1): R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49): 9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4: e6261.<br />
<br />
'''[4]''' [[LX Reference Walser 2012 | Walser PJ, Ariotti N, Howes M, Ferguson C, Webb R, Schwudke D, Leneva N, Cho KJ, Cooper L, Rae J, Floetenmeyer M, Oorschot VM, Skoglund U, Simons K, Hancock JF, Parton RG. <br />
'''Constitutive formation of caveolae in a bacterium.'''<br />
''Cell. 2012 Aug 17;150(4):752-63.'']]</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=850MFQL library2012-09-02T11:02:35Z<p>Schwudke: </p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed in negative ion mode on an LTQ Orbitrap XL instrument. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectra dataset, the extract was analyzed in several independent experiments (R, target mass resolution; IT, ion trap):<br />
* Experiment I: eight acquisitions at unit mass resolution, using the IT for both MS and MS/MS spectra.<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT).<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using pulsed Q collision-induced dissociation (PQD).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[media:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use this file in LipidXplorer, rename it to lpdX_benchmark.ini).<br />
<br />
* Phosphatidylethanolamine in negative mode: [[media:PE_negative_FAS.mfql|PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:LPE_negative_FAS.mfql|LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:PG_negative_FAS.mfql|PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:PI_negative_FAS.mfql|PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[media:PS_negative_FAS.mfql|PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:PA_negative_FAS.mfql|PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_PE.mfql|neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_LPE.mfql|neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[media:neg_bovine_heart_PEO.mfql|neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_PC.mfql|neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_LPC.mfql|neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[media:neg_bovine_heart_PCO.mfql|neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:neg_bovine_heart_PA.mfql|neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[media:neg_bovine_heart_LPA.mfql|neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_PG.mfql|neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_LPG.mfql|neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_PI.mfql|neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[media:neg_bovine_heart_TAG.mfql|neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[media:neg_bovine_heart_CL.mfql|neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[media:Maradolipid.mfql|Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
Mass spectrometric analysis was performed on a hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a TriVersa robotic nanoflow ion source (Advion BioSciences Ltd, Ithaca, NY) using chips with a 4.1 µm nozzle diameter. The ion source was controlled by ChipSoft 6.4 software (Advion BioSciences) and operated at an ionization voltage of 0.95 kV and a gas pressure of 1.25 psi. Plates with lipid extracts were cooled to 12°C.<br />
MS survey scans were acquired in positive ion mode using the Orbitrap analyzer at a target mass resolution of 100,000 (full width at half maximum, FWHM) defined at m/z 400, with the automatic gain control (AGC) target value set to 1.0×10<sup>6</sup>.<br />
Spectra acquired between 28 s and 120 s after the start of sample infusion (a window chosen so that the analyte flow and electrospray had stabilized, as judged by the total ion current (TIC) trace) were averaged and recalibrated using the m/z values of the synthetic standards SM 35:1 and PC O-20:0/O-20:0 as references. Only peaks detected with a signal-to-noise ratio above 5 and recognized in more than 20% of all spectra were considered further. Lipid species were identified from accurately determined masses, requiring a mass accuracy better than 4 ppm and a retrieval rate of 90% across all plasma samples.<br />
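The peak-selection and identification criteria above can be illustrated with a short Python sketch. It is not the LipidXplorer implementation: it assumes the spectra have already been recalibrated and aligned so that corresponding peaks share the same m/z key, and the Scan class and all function names are hypothetical. It keeps scans acquired between 28 s and 120 s, retains peaks with a signal-to-noise ratio above 5 in more than 20% of those scans, and tests a measured m/z against a theoretical value with a 4 ppm tolerance.

```python
from dataclasses import dataclass, field


@dataclass
class Scan:
    """Hypothetical container for one MS scan."""
    time_s: float  # acquisition time after the start of infusion, in seconds
    # Aligned peaks: m/z -> signal-to-noise ratio (alignment assumed done upstream)
    peaks: dict[float, float] = field(default_factory=dict)


def select_window(scans, start_s=28.0, end_s=120.0):
    """Keep only scans acquired inside the stable-spray window."""
    return [s for s in scans if start_s <= s.time_s <= end_s]


def frequent_peaks(scans, min_sn=5.0, min_fraction=0.2):
    """Return m/z values seen with S/N > min_sn in more than min_fraction of scans."""
    if not scans:
        return []
    counts: dict[float, int] = {}
    for scan in scans:
        for mz, sn in scan.peaks.items():
            if sn > min_sn:
                counts[mz] = counts.get(mz, 0) + 1
    return sorted(mz for mz, n in counts.items() if n / len(scans) > min_fraction)


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=4.0):
    """True if the absolute mass error is better (smaller) than tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) < tol_ppm


if __name__ == "__main__":
    scans = [
        Scan(10.0, {760.5851: 50.0}),              # before the window: ignored
        Scan(30.0, {760.5851: 40.0, 500.0: 3.0}),  # 500.0 stays below S/N 5
        Scan(60.0, {760.5851: 8.0}),
    ]
    print(frequent_peaks(select_window(scans)))
    print(within_tolerance(760.5870, 760.5851))  # error of roughly 2.5 ppm
```

Because peaks are counted per scan, a 20% threshold over, for example, 10 scans requires detection in at least 3 of them.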
<br />
The following queries, used in '''[3]''', are contained in the zip file [[media:Mfql_screens_MS-only_positive.zip|Positive MS only screens]]:<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
===MFQL used in '''[4]''' for identification of lipids in caveolae induced in ''E. coli''===<br />
<br />
[https://wiki.mpi-cbg.de/wiki/lipidx/index.php/File:Walser_Schwudke_2012_MFQLs.zip Walser_Schwudke_2012_MFQLs.zip]<br />
<br />
* Lysophosphatidic acid (LPA)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Lysophosphatidylglycerol (LPG)<br />
* Cardiolipin (CL)<br />
* Diacylglycerol (DAG)<br />
* Phosphatidic acid (PA)<br />
* Phosphatidylglycerol (PG)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Determination of the fatty acid composition of PE<br />
* Determination of the fatty acid composition of PG<br />
<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog, R., Schwudke, D., Schuhmann, K., Sampaio, J.L., Bornstein, S.R., Schroeder, M., and Shevchenko, A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1): R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49): 9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4: e6261.<br />
<br />
'''[4]''' [[LX Reference Walser 2012 | Walser PJ, Ariotti N, Howes M, Ferguson C, Webb R, Schwudke D, Leneva N, Cho KJ, Cooper L, Rae J, Floetenmeyer M, Oorschot VM, Skoglund U, Simons K, Hancock JF, Parton RG. <br />
'''Constitutive formation of caveolae in a bacterium.'''<br />
''Cell. 2012 Aug 17;150(4):752-63.'']]</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=849MFQL library2012-09-02T10:57:56Z<p>Schwudke: </p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectral dataset, the extract was analyzed in five independent experiments:<br />
* Experiment I: eight acquisitions at unit mass resolution (R), using the ion trap (IT) for both MS and MS/MS spectra;<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using Pulsed Q Collision-Induced Dissociation (PQD).<br />
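For orientation, the resolution settings of experiments I to V translate into peak widths through R = m/Δm, where Δm is the full width at half maximum (FWHM). The sketch below is illustrative only and rests on two assumptions not stated in this paragraph: that R is specified at m/z 400 (as the plasma methods on this page do), and that Orbitrap resolving power falls off with the square root of m/z. The function names are hypothetical.

```python
import math


def fwhm_at(mz: float, resolution: float) -> float:
    """Peak width (FWHM, in m/z units) implied by R = m/Δm at the given m/z."""
    return mz / resolution


def orbitrap_resolution_at(mz: float, r_at_400: float) -> float:
    """Approximate Orbitrap resolving power at mz, assuming R scales as 1/sqrt(m/z)."""
    return r_at_400 * math.sqrt(400.0 / mz)


if __name__ == "__main__":
    # Orbitrap MS settings of experiments II to V (R assumed to be defined at m/z 400)
    for label, r in [("II", 7_500), ("III", 30_000), ("IV/V", 100_000)]:
        print(f"Experiment {label}: FWHM at m/z 400 = {fwhm_at(400.0, r):.4f}")
    # The same R = 100,000 setting evaluated at m/z 800
    r_800 = orbitrap_resolution_at(800.0, 100_000)
    print(f"R at m/z 800 = {r_800:,.0f}, FWHM = {fwhm_at(800.0, r_800):.4f}")
```

Widths computed this way are nominal; observed peak widths also depend on calibration and acquisition conditions.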
<br />
The MFQL files listed below can be used with all of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[media:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use it in LipidXplorer, rename the file to lpdX_benchmark.ini).<br />
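The renaming step can also be scripted, which may be convenient when the benchmark settings are set up on several machines. This minimal Python sketch copies (rather than renames) the downloaded lpdX_benchmark.txt to lpdX_benchmark.ini so that the original download is kept. The function name and the destination folder are hypothetical placeholders; use whichever folder your LipidXplorer installation reads its *.ini files from.

```python
import shutil
from pathlib import Path


def install_benchmark_ini(downloaded: Path, target_dir: Path) -> Path:
    """Copy the downloaded lpdX_benchmark.txt into target_dir as lpdX_benchmark.ini."""
    if not downloaded.is_file():
        raise FileNotFoundError(downloaded)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / "lpdX_benchmark.ini"
    shutil.copyfile(downloaded, destination)  # the original .txt stays in place
    return destination


# Example (hypothetical paths):
# install_benchmark_ini(Path("lpdX_benchmark.txt"), Path("LipidXplorer_ini"))
```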
<br />
* Phosphatidylethanolamine in negative mode: [[media:PE_negative_FAS.mfql|PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:LPE_negative_FAS.mfql|LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:PG_negative_FAS.mfql|PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:PI_negative_FAS.mfql|PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[media:PS_negative_FAS.mfql|PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:PA_negative_FAS.mfql|PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_PE.mfql|neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[media:neg_bovine_heart_LPE.mfql|neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[media:neg_bovine_heart_PEO.mfql|neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_PC.mfql|neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[media:neg_bovine_heart_LPC.mfql|neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[media:neg_bovine_heart_PCO.mfql|neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[media:neg_bovine_heart_PA.mfql|neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[media:neg_bovine_heart_LPA.mfql|neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_PG.mfql|neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[media:neg_bovine_heart_LPG.mfql|neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_PI.mfql|neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[media:neg_bovine_heart_LPI.mfql|neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[media:neg_bovine_heart_DAG.mfql|neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[media:neg_bovine_heart_TAG.mfql|neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[media:neg_bovine_heart_CL.mfql|neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[media:Maradolipid.mfql|Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of lipids in human blood plasma===<br />
<br />
Mass spectrometric analysis was performed on a hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a TriVersa robotic nanoflow ion source (Advion BioSciences Ltd, Ithaca, NY) using chips with a 4.1 µm nozzle diameter. The ion source was controlled by ChipSoft 6.4 software (Advion BioSciences) and operated at an ionization voltage of 0.95 kV and a gas pressure of 1.25 psi. Plates with lipid extracts were cooled to 12°C.<br />
MS survey scans were acquired in positive ion mode using the Orbitrap analyzer at a target mass resolution of 100,000 (full width at half maximum, FWHM) defined at m/z 400, with the automatic gain control (AGC) target value set to 1.0×10<sup>6</sup>.<br />
Spectra acquired between 28 s and 120 s after the start of sample infusion (a window chosen so that the analyte flow and electrospray had stabilized, as judged by the total ion current (TIC) trace) were averaged and recalibrated using the m/z values of the synthetic standards SM 35:1 and PC O-20:0/O-20:0 as references. Only peaks detected with a signal-to-noise ratio above 5 and recognized in more than 20% of all spectra were considered further. Lipid species were identified from accurately determined masses, requiring a mass accuracy better than 4 ppm and a retrieval rate of 90% across all plasma samples.<br />
<br />
The following queries, used in '''[3]''', are contained in the zip file [[media:Mfql_screens_MS-only_positive.zip|Positive MS only screens]]:<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
===MFQL used in '''[4]''' for identification of lipids in caveolae induced in ''E. coli''===<br />
<br />
* Lysophosphatidic acid (LPA)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Lysophosphatidylglycerol (LPG)<br />
* Cardiolipin (CL)<br />
* Diacylglycerol (DAG)<br />
* Phosphatidic acid (PA)<br />
* Phosphatidylglycerol (PG)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Determination of the fatty acid composition of PE<br />
* Determination of the fatty acid composition of PG<br />
<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog, R., Schwudke, D., Schuhmann, K., Sampaio, J.L., Bornstein, S.R., Schroeder, M., and Shevchenko, A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1): R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49): 9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4: e6261.<br />
<br />
'''[4]''' [[LX Reference Walser 2012 | Walser PJ, Ariotti N, Howes M, Ferguson C, Webb R, Schwudke D, Leneva N, Cho KJ, Cooper L, Rae J, Floetenmeyer M, Oorschot VM, Skoglund U, Simons K, Hancock JF, Parton RG. <br />
'''Constitutive formation of caveolae in a bacterium.'''<br />
''Cell. 2012 Aug 17;150(4):752-63.'']]</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=580MFQL library2011-03-17T19:06:00Z<p>Schwudke: /* MFQL used in [3] for identification of Lipids in human blood plasma */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectral dataset, the extract was analyzed in five independent experiments:<br />
* Experiment I: eight acquisitions at unit mass resolution (R), using the ion trap (IT) for both MS and MS/MS spectra;<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using Pulsed Q Collision-Induced Dissociation (PQD).<br />
<br />
The MFQL files listed below can be used with all of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use it in LipidXplorer, rename the file to lpdX_benchmark.ini).<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of lipids in human blood plasma===<br />
<br />
Mass spectrometric analysis was performed on a hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a TriVersa robotic nanoflow ion source (Advion BioSciences Ltd, Ithaca, NY) using chips with a 4.1 µm nozzle diameter. The ion source was controlled by ChipSoft 6.4 software (Advion BioSciences) and operated at an ionization voltage of 0.95 kV and a gas pressure of 1.25 psi. Plates with lipid extracts were cooled to 12°C.<br />
MS survey scans were acquired in positive ion mode using the Orbitrap analyzer at a target mass resolution of 100,000 (full width at half maximum, FWHM) defined at m/z 400, with the automatic gain control (AGC) target value set to 1.0×10<sup>6</sup>.<br />
Spectra acquired between 28 s and 120 s after the start of sample infusion (a window chosen so that the analyte flow and electrospray had stabilized, as judged by the total ion current (TIC) trace) were averaged and recalibrated using the m/z values of the synthetic standards SM 35:1 and PC O-20:0/O-20:0 as references. Only peaks detected with a signal-to-noise ratio above 5 and recognized in more than 20% of all spectra were considered further. Lipid species were identified from accurately determined masses, requiring a mass accuracy better than 4 ppm and a retrieval rate of 90% across all plasma samples.<br />
<br />
The following queries, used in '''[3]''', are contained in the zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]]:<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog, R., Schwudke, D., Schuhmann, K., Sampaio, J.L., Bornstein, S.R., Schroeder, M., and Shevchenko, A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1): R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49): 9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4: e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=579MFQL library2011-03-17T19:03:54Z<p>Schwudke: /* MFQL used in [3] for identification of Lipids in human blood plasma */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectral dataset, the extract was analyzed in five independent experiments:<br />
* Experiment I: eight acquisitions at unit mass resolution (R), using the ion trap (IT) for both MS and MS/MS spectra;<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using Pulsed Q Collision-Induced Dissociation (PQD).<br />
<br />
The MFQL files listed below can be used with all of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use it in LipidXplorer, rename the file to lpdX_benchmark.ini).<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of lipids in human blood plasma===<br />
<br />
Mass spectrometric analysis was performed on a hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a TriVersa robotic nanoflow ion source (Advion BioSciences Ltd, Ithaca, NY) using chips with a 4.1 µm nozzle diameter. The ion source was controlled by ChipSoft 6.4 software (Advion BioSciences) and operated at an ionization voltage of 0.95 kV and a gas pressure of 1.25 psi. Plates with lipid extracts were cooled to 12°C.<br />
MS survey scans were acquired in positive ion mode using the Orbitrap analyzer at a target mass resolution of 100,000 (full width at half maximum, FWHM) defined at m/z 400, with the automatic gain control (AGC) target value set to 1.0×10<sup>6</sup>.<br />
Spectra acquired between 28 s and 120 s after the start of sample infusion (a window chosen so that the analyte flow and electrospray had stabilized, as judged by the total ion current (TIC) trace) were averaged and recalibrated using the m/z values of the synthetic standards SM 35:1 and PC O-20:0/O-20:0 as references. The recalibrated spectra were then aligned so that corresponding peaks were matched across the full dataset. Only peaks detected with a signal-to-noise ratio above 5 and recognized in more than 20% of all spectra were considered further. Lipid species were identified from accurately determined masses, requiring a mass accuracy better than 4 ppm and a retrieval rate of 90% across all plasma samples.<br />
<br />
The following queries, used in '''[3]''', are contained in the zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]]:<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog, R., Schwudke, D., Schuhmann, K., Sampaio, J.L., Bornstein, S.R., Schroeder, M., and Shevchenko, A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1): R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49): 9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4: e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=578MFQL library2011-03-17T17:35:32Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in ''E. coli''===<br />
<br />
''E. coli'' total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer by a TriVersa robotic ion source using a chip with 4.1 μm spraying nozzles. To produce the spectral dataset, the extract was analyzed in five independent experiments:<br />
* Experiment I: eight acquisitions at unit mass resolution (R), using the ion trap (IT) for both MS and MS/MS spectra;<br />
* Experiment II: six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment III: four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment IV: four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT);<br />
* Experiment V: seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
MS/MS spectra were acquired using Pulsed Q Collision-Induced Dissociation (PQD).<br />
<br />
The MFQL files listed below can be used with all of these mass spectrometric settings. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use it in LipidXplorer, rename the file to lpdX_benchmark.ini).<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed in negative ion mode on an LTQ Orbitrap XL mass spectrometer, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in dauer larvae of ''Caenorhabditis elegans''===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of lipids in human blood plasma===<br />
<br />
The following queries, used in '''[3]''', are contained in the zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]]:<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=577MFQL library2011-03-17T17:31:59Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap). MS/MS experiments in the IT were performed using pulsed Q collision-induced dissociation (PQD).<br />
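The following Python sketch gives a rough sense of what these resolution settings mean. It converts a resolution R into an approximate peak width (FWHM) using R = m/Δm. It assumes, as is usual for the LTQ Orbitrap XL, that the nominal Orbitrap resolution is quoted at m/z 400 and falls off roughly with the square root of m/z. The example precursor ([M-H]- of PE 34:1, m/z 716.5236) and all function names are illustrative choices, not part of LipidXplorer.<br />

```python
import math

# Nominal MS resolution settings (Orbitrap) for experiments II-V, as listed above.
MS_RESOLUTION = {"II": 7_500, "III": 30_000, "IV": 100_000, "V": 100_000}

# Illustrative precursor: [M-H]- of PE 34:1 (chosen for this example only).
EXAMPLE_MZ = 716.5236


def orbitrap_resolution_at(mz: float, nominal_r: float, ref_mz: float = 400.0) -> float:
    """Scale a nominal Orbitrap resolution, assumed to be quoted at ref_mz,
    to another m/z using the approximate R ~ 1/sqrt(m/z) dependence."""
    return nominal_r * math.sqrt(ref_mz / mz)


def fwhm_peak_width(mz: float, resolution: float) -> float:
    """Peak width at half maximum (in m/z units) from R = m / delta_m."""
    return mz / resolution


for experiment, nominal_r in MS_RESOLUTION.items():
    r_here = orbitrap_resolution_at(EXAMPLE_MZ, nominal_r)
    width = fwhm_peak_width(EXAMPLE_MZ, r_here)
    print(f"experiment {experiment}: nominal R = {nominal_r:,}, "
          f"R at m/z {EXAMPLE_MZ} ~ {r_here:,.0f}, FWHM ~ {width:.4f}")
```

Unit-resolution ion trap spectra (experiment I and the IT MS/MS scans) are not expressed as an R value, so the sketch leaves them out.<br />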
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use this file in LipidXplorer, rename it to lpdX_benchmark.ini)<br />
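The renaming step can be scripted if the settings file is needed on several machines. The Python sketch below assumes that lpdX_benchmark.txt has already been downloaded from this page into a local folder. The function name install_benchmark_ini is hypothetical and is not part of LipidXplorer.<br />

```python
from pathlib import Path


def install_benchmark_ini(download_dir: str) -> Path:
    """Rename the downloaded lpdX_benchmark.txt to lpdX_benchmark.ini.

    download_dir is the (assumed) folder the wiki file was saved to.
    Returns the path of the renamed settings file.
    """
    src = Path(download_dir) / "lpdX_benchmark.txt"
    if not src.is_file():
        raise FileNotFoundError(f"expected the downloaded settings file at {src}")
    dst = src.with_suffix(".ini")
    # Path.replace overwrites an existing lpdX_benchmark.ini on all platforms,
    # unlike Path.rename, which fails on Windows if the target already exists.
    return src.replace(dst)
```

The function returns the path of the .ini file to use when importing the dataset.<br />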
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed on an LTQ Orbitrap XL mass spectrometer in negative ion mode, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=576MFQL library2011-03-17T17:27:56Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]] ('''IMPORTANT:''' to use this file in LipidXplorer, rename it to lpdX_benchmark.ini)<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed on an LTQ Orbitrap XL mass spectrometer in negative ion mode, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=575MFQL library2011-03-17T17:26:47Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.txt]]<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed on an LTQ Orbitrap XL mass spectrometer in negative ion mode, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=File:LpdX_benchmark.txt&diff=574File:LpdX benchmark.txt2011-03-17T17:25:51Z<p>Schwudke: </p>
<hr />
<div></div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=573MFQL library2011-03-17T17:22:37Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset: [[File:lpdX_benchmark.ini]]<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed on an LTQ Orbitrap XL mass spectrometer in negative ion mode, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=572MFQL library2011-03-17T17:00:33Z<p>Schwudke: /* MFQL used in [1] for lipid identification in bovine heart */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
Total lipid extract of bovine heart (Avanti Polar Lipids) was analyzed on an LTQ Orbitrap XL mass spectrometer in negative ion mode, using a target resolution of 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT). Six technical replicates were acquired, each consisting of 31 MS and 310 MS/MS spectra.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=MFQL_library&diff=571MFQL library2011-03-17T16:59:16Z<p>Schwudke: /* MFQL used in [1] for lipid identification in E.coli */</p>
<hr />
<div>===MFQL used in '''[1]''' for lipid identification in E.coli===<br />
<br />
E. coli total lipid extract was purchased from Avanti Polar Lipids (Alabaster, AL, USA) and analyzed on an LTQ Orbitrap XL instrument in negative ion mode. A solution with a total lipid concentration of 2.5 μg/ml in 7.5 mM ammonium acetate in chloroform/methanol/2-propanol (1/2/4, v/v/v) was infused into the mass spectrometer with a TriVersa robotic ion source, using a chip with spraying nozzles 4.1 μm in diameter. To produce the spectral dataset, the extract was analyzed in five independent experiments: experiment I, eight acquisitions at unit mass resolution (R) using the ion trap (IT) for both MS and MS/MS spectra; experiment II, six acquisitions with R = 7,500 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment III, four acquisitions with R = 30,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment IV, four acquisitions with R = 100,000 for MS spectra (Orbitrap) and unit resolution for MS/MS spectra (IT); experiment V, seven acquisitions with R = 100,000 for MS spectra (Orbitrap) and R = 15,000 for MS/MS spectra (Orbitrap).<br />
<br />
The MFQL files listed below can be applied to data acquired with any of the mass spectrometric settings described above. The settings themselves are stored in the *.ini file used to import a dataset.<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:PE_negative_FAS.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:LPE_negative_FAS.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:PG_negative_FAS.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:PI_negative_FAS.mfql]]<br />
* Phosphatidylserine in negative mode: [[File:PS_negative_FAS.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:PA_negative_FAS.mfql]]<br />
<br />
===MFQL used in '''[1]''' for lipid identification in bovine heart===<br />
<br />
* Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_PE.mfql]]<br />
* Lyso-Phosphatidylethanolamine in negative mode: [[File:neg_bovine_heart_LPE.mfql]]<br />
* Phosphatidylethanolamine ether in negative mode: [[File:neg_bovine_heart_PEO.mfql]]<br />
* Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_PC.mfql]]<br />
* Lyso-Phosphatidylcholine in negative mode: [[File:neg_bovine_heart_LPC.mfql]]<br />
* Phosphatidylcholine ether in negative mode: [[File:neg_bovine_heart_PCO.mfql]]<br />
* Phosphatidic acid in negative mode: [[File:neg_bovine_heart_PA.mfql]]<br />
* Lyso-Phosphatidic acid in negative mode: [[File:neg_bovine_heart_LPA.mfql]]<br />
* Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_PG.mfql]]<br />
* Lyso-Phosphatidylglycerol in negative mode: [[File:neg_bovine_heart_LPG.mfql]]<br />
* Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_PI.mfql]]<br />
* Lyso-Phosphatidylinositol in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Sphingomyelin in negative mode: [[File:neg_bovine_heart_LPI.mfql]]<br />
* Ceramide in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Diacylglycerol in negative mode: [[File:neg_bovine_heart_DAG.mfql]]<br />
* Triacylglycerol in negative mode: [[File:neg_bovine_heart_TAG.mfql]]<br />
* Cardiolipin in negative mode: [[File:neg_bovine_heart_CL.mfql]]<br />
<br />
===MFQL used in '''[2]''' for identification of Maradolipids in Dauer Larva in Caenorhabditis elegans===<br />
<br />
* Maradolipids in negative ion mode: [[File:Maradolipid.mfql]]<br />
<br />
===MFQL used in '''[3]''' for identification of Lipids in human blood plasma===<br />
<br />
The zip file [[File:Mfql_screens_MS-only_positive.zip|Positive MS only screens]] contains the following queries, which were used in '''[3]''':<br />
* Ceramide<br />
* Cholesteryl ester<br />
* Diacylglycerols<br />
* Glucosylceramide<br />
* GPL-diether<br />
* Lysophosphatidylcholine (LPC)<br />
* Lysophosphatidylethanolamine (LPE)<br />
* Phosphatidylcholine (PC)<br />
* Phosphatidylcholine ether (PC-O)<br />
* Phosphatidylethanolamine (PE)<br />
* Phosphatidylethanolamine ether (PE-O)<br />
* Phosphatidylinositol (PI)<br />
* Phosphatidylserine (PS)<br />
* Sphingomyelin (SM)<br />
* Triacylglycerols (TAG)<br />
<br />
-----<br />
<br />
===References===<br />
'''[1]''' Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. 2011. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 12(1):R8.<br />
<br />
'''[2]''' Penkov S, Mende F, Zagoriy V, Erkut C, Martin R, Pässler U, Schuhmann K, Schwudke D, Gruner M, Mäntler J, Reichert-Müller T, Shevchenko A, Knölker HJ, Kurzchalia TV. 2010. Maradolipids: diacyltrehalose glycolipids specific to dauer larva in Caenorhabditis elegans. Angew Chem Int Ed Engl 49(49):9430-5.<br />
<br />
'''[3]''' Graessler J, Schwudke D, Schwarz PE, Herzog R, Shevchenko A, Bornstein SR. 2009. Top-down lipidomics reveals ether lipid deficiency in blood plasma of hypertensive patients. PLoS One 4:e6261.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Documentation&diff=570LipidXplorer Documentation2011-03-04T15:24:52Z<p>Schwudke: </p>
<hr />
<div>=== [[LipidXplorer Preface|Introduction to LipidXplorer]] ===<br />
<br />
=== [[LipidXplorer Principles|Principles of lipid identification]] ===<br />
<br />
=== [[LipidXplorer MFQL|The Molecular Fragmentation Query Language (MFQL)]] ===<br />
<br />
=== [[LipidXplorer Installation|Installation]] ===<br />
<br />
=== [[LipidXplorer Tutorial|Tutorial]] ===<br />
<br />
=== [[LipidXplorer Reference|Manual]] ===</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=569LipidXplorer Reference2011-03-04T15:20:35Z<p>Schwudke: /* Machine specific settings */</p>
<hr />
<div>== LipidXplorer Import ==<br />
<br />
=== Supported file formats ===<br />
<br />
==== *.mzXML ====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data (Pedrioli PG et al., Nat. Biotechnol. 22(11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; Lin SM et al., Expert Rev. Proteomics 2(6): 839-45, [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). Not all mass spectrometers produce mzXML files directly, but several tools can generate mzXML files from natively acquired files. The open source project Sashimi ([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some common mass spectrometric file formats. Converters are currently available: <br />
<br />
*for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
*for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
*for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidXplorer.''<br />
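In an mzXML file, each scan stores its peaks in a `<peaks>` element as base64-encoded binary floats. The sketch below decodes such a payload using only the Python standard library. It assumes big-endian ("network") byte order, interleaved m/z-intensity pairs and optional zlib compression, as declared by the element's attributes; the function name is hypothetical and this is an illustration, not LipidXplorer's importer.

```python
import base64
import struct
import zlib


def decode_mzxml_peaks(payload, precision=32, compressed=False):
    """Decode the base64 text of an mzXML <peaks> element into (m/z, intensity) pairs.

    Assumes big-endian byte order and interleaved m/z-intensity values.
    """
    raw = base64.b64decode(payload)
    if compressed:  # the element declares compressionType="zlib"
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"  # 32-bit or 64-bit IEEE floats
    count = len(raw) // struct.calcsize(code)
    values = struct.unpack(">" + code * count, raw)
    # Even positions hold m/z values, odd positions the matching intensities.
    return list(zip(values[0::2], values[1::2]))


# Usage: encode two peaks the way an mzXML writer would, then decode them again.
encoded = base64.b64encode(struct.pack(">4d", 701.4101, 20952.3, 702.4135, 6333.0))
print(decode_mzxml_peaks(encoded, precision=64))  # [(701.4101, 20952.3), (702.4135, 6333.0)]
```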
<br />
==== Import of peak lists of MS/MS in *.dta format and MS in *.csv ====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to let it import pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. Often one also wants to import the pre-processed peak list of the MS1 spectrum, which LipidXplorer supports through the widely used *.csv file format. Both text formats should be reasonably available as an alternative to *.mzXML. To import *.dta and *.csv files, some preconditions have to be met: the files have to be arranged in the following directory structure: <br />
<pre> MasterScan Dir/<br />
 |<br />
 +-- [neg_]Sample1/<br />
 |     +-- *.csv<br />
 |     +-- *.dta1, *.dta2, ...<br />
 |<br />
 +-- [neg_]Sample2/<br />
 |     +-- *.csv<br />
 |     +-- *.dta1, *.dta2, ...<br />
 |<br />
 +-- ...<br />
 |<br />
 +-- [neg_]SampleN/<br />
       +-- *.csv<br />
       +-- *.dta1, *.dta2, ...<br />
</pre> <br />
The top-level directory defines which samples go into the MasterScan database object: every subdirectory is one sample. A sample directory can contain<br />
# a single *.csv file with the MS data;<br />
# a number of *.dta files with the MS/MS data. In this case the MS precursor intensities are set to a) 1 when a *.dta file with this precursor m/z is present and b) 0 when no *.dta file with this precursor m/z was found in the sample;<br />
# one *.csv file with the MS data and a number of *.dta files with the MS/MS data.<br />
<br />
IMPORTANT: the name of each sample directory must indicate whether its data were acquired in positive or in negative mode, as follows: <br />
<br />
*if a directory name begins with 'neg', the corresponding sample is a negative-mode sample; <br />
*if a directory name begins with 'pos', the corresponding sample is a positive-mode sample.<br />
<br />
LipidXplorer uses the names of the sample directories as sample names. <br />
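The directory convention above can be checked before an import with a small script. This sketch walks a MasterScan directory, derives each sample's ion mode from the 'neg'/'pos' prefix and lists its *.csv and *.dta files. The function name is hypothetical, and rejecting directories without a prefix is an assumption of the sketch; the text does not say what LipidXplorer does in that case.

```python
import tempfile
from pathlib import Path


def collect_samples(masterscan_dir):
    """Map each sample subdirectory to its ion mode and its peak-list files."""
    samples = {}
    for sample_dir in sorted(p for p in Path(masterscan_dir).iterdir() if p.is_dir()):
        name = sample_dir.name
        if name.startswith("neg"):
            polarity = "negative"
        elif name.startswith("pos"):
            polarity = "positive"
        else:
            # Assumption: this sketch rejects names without a polarity prefix.
            raise ValueError(f"cannot infer ion mode from directory name {name!r}")
        samples[name] = {
            "polarity": polarity,
            "csv": sorted(sample_dir.glob("*.csv")),  # MS1 peak list
            "dta": sorted(sample_dir.glob("*.dta")),  # MS/MS peak lists
        }
    return samples


# Usage: build a tiny MasterScan directory in a temporary folder.
with tempfile.TemporaryDirectory() as root:
    sample = Path(root) / "neg_Sample1"
    sample.mkdir()
    (sample / "ms1.csv").write_text("701.4101,20952.3\n")
    (sample / "precursor_585.dta").write_text("585.9765 1\n197.32957 33132.1\n")
    print(collect_samples(root)["neg_Sample1"]["polarity"])  # negative
```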
<br />
===== Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma-separated file, i.e. each line contains values separated by commas. For importing survey scan information (the MS experiment data), LipidXplorer only recognizes *.csv files in the following format: <br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the precursor (MS1) mass spectrum. For example, a section of a *.csv file: <br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre> <br />
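A file in this two-column format can be read with Python's standard `csv` module. The sketch below, with a hypothetical function name, turns the text into (m/z, intensity) tuples, skips blank lines and raises an error for rows that do not have exactly two columns.

```python
import csv
import io


def read_ms1_csv(text):
    """Parse MS1 peak-list lines of the form '<precursor m/z>,<intensity>'."""
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        mz, intensity = (float(value) for value in row)  # fails on != 2 columns
        peaks.append((mz, intensity))
    return peaks


sample = "701.4101,20952.3\n701.5598,4284.7\n702.4135,6333\n"
print(read_ms1_csv(sample))  # [(701.4101, 20952.3), (701.5598, 4284.7), (702.4135, 6333.0)]
```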
===== Import of MS/MS spectra using *.dta =====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z and its charge, and the following lines hold fragment masses with their intensities: <br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor m/z 585.9765 with charge +1: <br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre> <br />
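The *.dta layout described above (a header line with precursor m/z and charge, then one fragment per line) can be parsed as follows. This is a minimal sketch with a hypothetical function name; it splits on any whitespace.

```python
def read_dta(text):
    """Parse a *.dta peak list: header '<precursor m/z> <charge>',
    followed by one '<fragment m/z> <intensity>' pair per line."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    precursor_mz, charge = float(lines[0][0]), int(lines[0][1])
    fragments = [(float(mz), float(intensity)) for mz, intensity in lines[1:]]
    return precursor_mz, charge, fragments


example = """585.9765 1
197.32957 33132.1
197.33095 12631.7
568.45007 241767.3
569.29065 14319.8
"""
precursor, z, peaks = read_dta(example)
print(precursor, z, len(peaks))  # 585.9765 1 4
```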
=== Importing mass spectra into LipidXplorer ===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally, it works only with centroided data, which we also call peak lists; data given as profiles are therefore converted to centroided data. <br />
<br />
If the spectra are given in mzXML file format, all files that should go into one MasterScan (see [[#The_MastersScan_database]]) must be placed in one folder; this folder is the input LipidXplorer uses to import the spectra. IMPORTANT: *.mzXML files have to be centroided for a correct import with LipidXplorer. <br />
<br />
If the spectra are given as *.csv/*.dta files, follow the instructions in [[#Import of peak lists of MS/MS in *.dta format and MS in *.csv|Import of peak lists]]. Here, too, the folder containing all the peak lists is the input for the LipidXplorer import. <br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button, or drag the folder onto the text field with your mouse. LipidXplorer fills in the target MasterScan file automatically; to change it, press 'Browse' next to that field. <br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit the settings and store them in the configuration file. <br />
<br />
Press 'Start import' to start the import. <br />
<br />
The tab offers various options for specifying mass spectrometric attributes. The configurations are stored in an *.ini file. A standard *.ini file is provided, but users can select their own file by pressing 'Browse' next to the *.ini file field. <br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window <span class="texhtml">''w''</span> of a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
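The selection window formula can be written as a two-line helper. This sketch (hypothetical names) returns the isolated interval for a precursor and checks whether another m/z falls inside it; all values are in Da, as the setting requires.

```python
def selection_window(precursor_mz, width):
    """Return the interval [p - w/2, p + w/2] isolated for fragmentation (Da)."""
    half = width / 2.0
    return precursor_mz - half, precursor_mz + half


def in_window(mz, precursor_mz, width):
    """True if mz is co-isolated with the precursor."""
    low, high = selection_window(precursor_mz, width)
    return low <= mz <= high


print(selection_window(700.5, 1.0))  # (700.0, 701.0)
print(in_window(701.4, 700.5, 1.0))  # False
```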
<br />
'''timerange:''' defines the time window of the spectra to be imported, as a tuple (start time, end time) with times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra within an allowed error given in '''tolerance'''. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
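The calibration step can be sketched as follows, under stated assumptions: each calibrant is matched to the closest peak within an absolute tolerance in Da, its shift is (reference minus measured), shifts are interpolated linearly between calibrants, and they are held constant beyond the outermost calibrants. The extrapolation rule and all function names are assumptions of this sketch, not documented LipidXplorer behavior.

```python
from bisect import bisect_left


def find_calibrants(reference_masses, peaks, tolerance_da):
    """Pair each reference mass with the closest measured peak within the tolerance."""
    found = []
    for ref in reference_masses:
        near = [mz for mz, _ in peaks if abs(mz - ref) <= tolerance_da]
        if near:
            found.append((ref, min(near, key=lambda mz: abs(mz - ref))))
    return found


def calibration_shift(mz, calibrants):
    """Shift (reference - measured) at mz: linear between calibrants, constant outside."""
    if not calibrants:
        return 0.0
    points = sorted((measured, ref - measured) for ref, measured in calibrants)
    xs = [x for x, _ in points]
    if mz <= xs[0]:
        return points[0][1]
    if mz >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, mz)
    (x0, s0), (x1, s1) = points[i - 1], points[i]
    return s0 + (s1 - s0) * (mz - x0) / (x1 - x0)


def recalibrate(peaks, calibrants):
    return [(mz + calibration_shift(mz, calibrants), intensity) for mz, intensity in peaks]


peaks = [(500.002, 1.0e5), (750.003, 4.0e4), (1000.004, 2.0e4)]
calibrants = find_calibrants([500.0, 1000.0], peaks, tolerance_da=0.01)
print(recalibrate(peaks, calibrants))
```

With a single calibrant, the same code applies one constant offset to the whole spectrum.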
<br />
'''massrange:''' restricts the m/z range of the imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used during the import for spectra averaging and alignment; both algorithms consider two m/z values equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spectrometer software (like Analyst or Xcalibur)!'' The threshold is applied to the peak intensities read from the mzXML file, not from the original .wiff or .raw files. LipidXplorer corrects the threshold value by dividing it by the square root of the number of scans in the given time segment of the mass spectra. This accounts for the gain in information from averaging more scans, which is modelled using the central limit theorem. <br />
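The threshold correction described above amounts to dividing by the square root of the scan count. The sketch below (hypothetical names) applies it and keeps peaks at or above the corrected cutoff; a threshold of 0 keeps every peak, matching the rule that '0' switches a setting off.

```python
import math


def effective_threshold(threshold, n_scans):
    """Threshold divided by sqrt(number of scans in the time segment)."""
    if n_scans < 1:
        raise ValueError("the time segment must contain at least one scan")
    return threshold / math.sqrt(n_scans)


def apply_threshold(peaks, threshold, n_scans):
    """Keep peaks at or above the corrected threshold."""
    cutoff = effective_threshold(threshold, n_scans)
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


print(effective_threshold(1000.0, 25))  # 200.0
print(apply_threshold([(701.5598, 150.0), (702.4135, 250.0)], 1000.0, 25))  # [(702.4135, 250.0)]
```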
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion must be present in at least 50% of all samples. <br />
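As a minimal sketch of the min occupation filter, assume each m/z has already been mapped to the set of samples in which it was found. A mass is kept if that set covers at least the requested fraction of all samples. The data layout and function name are hypothetical.

```python
def filter_by_occupation(occurrences, n_samples, min_occupation):
    """Keep masses found in at least min_occupation * n_samples samples.

    occurrences maps an m/z value to the set of sample names in which it was found.
    """
    return {mz: found for mz, found in occurrences.items()
            if len(found) / n_samples >= min_occupation}


occurrences = {703.5752: {"neg_S1", "neg_S2", "neg_S3"}, 705.5822: {"neg_S1"}}
print(sorted(filter_by_occupation(occurrences, n_samples=4, min_occupation=0.5)))  # [703.5752]
```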
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 with every increase of 1 m/z. This simulates the typical behavior of mass spectrometers, whose resolution decreases at higher masses; on Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment.<br />
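A linear reading of the resolution gradient is sketched below. The text does not say at which m/z the configured resolution applies, so the parameter `ref_mz` and the example resolution of 100,000 at m/z 300 are hypothetical. The example only reproduces the arithmetic of the Orbitrap observation: a drop of 50,000 over 900 m/z units corresponds to a gradient of about -55.6.

```python
def resolution_at(mz, resolution, gradient, ref_mz):
    """Linear model: the configured resolution holds at ref_mz and changes by
    `gradient` per unit m/z (a negative gradient means resolution drops with mass)."""
    return resolution + gradient * (mz - ref_mz)


# Orbitrap observation from the text: about 50,000 lower at m/z 1200 than at m/z 300.
gradient = -50_000 / (1200 - 300)
print(round(gradient, 1))  # -55.6
# Hypothetical setting: R = 100,000 at m/z 300.
print(round(resolution_at(1200, 100_000, gradient, ref_mz=300)))  # 50000
```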
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' the Precursor Offset Correction (PMO). This value shifts the precursor m/z values of the fragment spectra; its sign (positive or negative) gives the direction of the shift. This offset does '''not''' shift the survey scan m/z values: it shifts the precursor masses before the fragment spectra are associated with their survey scan masses. If you import only *.dta files without a *.csv file, for example, it will corrupt your data; in this case, use '''MS1 offset''' instead. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> measured with a given tolerance <span class="texhtml">''a''</span> fits to a peak <span class="texhtml">''p''</span> if <math>m \in [p-a,p+a]</math>. <br />
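The matching rule m ∈ [p - a, p + a] can be sketched as below. Because the tolerance may be given in ppm or Da, the sketch first converts it to an absolute width in Da; computing the ppm value relative to the peak mass p is an assumption, and the function names are hypothetical.

```python
def tolerance_da(mass, tolerance, unit):
    """Convert a tolerance given in 'ppm' or 'Da' into an absolute width in Da."""
    if unit == "ppm":
        return mass * tolerance * 1e-6
    if unit == "Da":
        return tolerance
    raise ValueError(f"unknown tolerance unit: {unit}")


def fits(theoretical, peak, tolerance, unit):
    """True if the theoretical mass m lies in [p - a, p + a]."""
    a = tolerance_da(peak, tolerance, unit)
    return peak - a <= theoretical <= peak + a


print(fits(760.5851, 760.5860, 5, "ppm"))  # True  (difference 0.0009 Da, a is about 0.0038 Da)
print(fits(760.5851, 760.5900, 5, "ppm"))  # False (difference 0.0049 Da)
```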
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
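The resolution criterion can be sketched directly from the formula above. The `merge_peaks` helper is a hypothetical greedy grouping added only to show the effect of R on nearby peaks; it is not LipidXplorer's averaging or alignment algorithm. The example uses m/z values from the *.csv sample shown earlier.

```python
def same_peak(p1, p2, resolution):
    """Peaks are equal if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r


def merge_peaks(mzs, resolution):
    """Greedy illustration: group sorted m/z values equal to the group's first value."""
    groups = []
    for mz in sorted(mzs):
        if groups and same_peak(mz, groups[-1][0], resolution):
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups


mzs = [705.5009, 705.528, 705.5822, 705.5908]
print(len(merge_peaks(mzs, 100_000)))  # 4: every peak stays separate
print(merge_peaks(mzs, 10_000))  # [[705.5009, 705.528], [705.5822, 705.5908]]
```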
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting and '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra have been imported, lipids are identified with MFQL scripts. MFQL queries are written in *.mfql files (files with the extension *.mfql), each of which should contain just one query. The GUI panel '''Run''' is where *.mfql files are loaded and run on the MasterScan file.<br />
<br />
===The Run panel===<br />
<br />
The large window on the left lists all *.mfql scripts used for lipid identification. It is managed with the buttons on its right side: <br />
<br />
*'''Add MFQL File''' adds a single file. <br />
*'''Add MFQL Directory''' lets you choose a directory; all *.mfql files it contains are loaded. <br />
*'''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them. <br />
*'''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
*'''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan by clicking the green '''Browse''' button or by dragging the MasterScan file, or the folder containing it, onto the text field. The output file name is filled in automatically but can be changed by clicking the grey '''Browse''' button. <br />
<br />
Under '''Optional settings for this run''' you can change the tolerance settings for a particular run. These override the tolerance settings given in the Import panel, which are stored in the MasterScan and used by default whenever you run this MasterScan. This is useful if you want to try out other settings; the values hold only for this particular run and do not permanently override the settings in the MasterScan.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower part of the panel. There is also an option to generate a complement MasterScan: a spectral database containing all entries of the originally chosen MasterScan except the identified entries and their isotopes. <br />
<br />
The options '''No head''' and '''Compress''' change the output format slightly: '''No head''' removes the header of the output file and '''Compress''' removes the query names from the output file. This can be helpful for automatic post-processing. The option '''Tab limited''' changes the output format from comma-separated to tab-separated. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental database to a comma-separated file. The MasterScan has its own data format, which other software cannot read; to inspect it, you need to dump its content into a readable file format. '''Dump MasterScan''' does this in parallel with the run of your queries: if you check '''Dump MasterScan''' and press '''Run LipidXplorer''', the MasterScan is dumped into a text file (*.csv format) that you can open easily, for example with Excel. MFQL queries are not required to dump the MasterScan.<br />
<br />
With '''Run LipidXplorer''' the lipid identification is started. The result is saved in the output file. With the '''View''' button this file can be viewed on the spot. With '''View dump file''' the *.csv file of the MasterScan can be viewed.<br />
<br />
=== The Editor panel ===<br />
<br />
The editor makes it easy to write queries for LipidXplorer. Every query is opened in a separate tab. When a file has been edited, the '''Save''' button turns red to remind the user to save the file before using the query. '''SaveAs''' stores the query under a new file name and '''Close''' closes the tab. <br />
<br />
=== The MS-Tools panel ===<br />
<br />
The MS Tools tab contains a small collection of useful functions: <br />
<br />
==== Mass vs. Sum Composition ====<br />
<br />
Calculates either the sum composition from a given m/z value or vice versa. <br />
<br />
===== Mass-to-sum-composition =====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under '''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB''' the upper bound of the double bond equivalent. Enter the charge in '''chg''' and the tolerance in ppm in '''acc'''. Then press '''Mass-to-sum-composition'''; the result is shown in the text window below.<br />
<br />
Here is an example:<br />
<br />
[[Image:MS-tools-example.PNG|center]]<br />
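The mass-to-sum-composition search can be illustrated by brute force: enumerate every composition allowed by the element ranges of an sc-constraint, keep those within the ppm tolerance and filter by double bond equivalent. This sketch uses the common rings-plus-double-bonds formula with N and P counted as trivalent, which gives half-integer values for protonated ions. Whether this matches LipidXplorer's internal definition exactly is an assumption, and all names are hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da) of the elements used here, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def mz_of(comp, charge):
    """m/z of an ion with sum composition comp (a dict) and nonzero charge."""
    mass = sum(MONO[el] * n for el, n in comp.items())
    return (mass - charge * ELECTRON) / abs(charge)


def dbe(comp):
    """Rings plus double bonds, counting N and P as trivalent."""
    return (comp.get("C", 0) - comp.get("H", 0) / 2
            + (comp.get("N", 0) + comp.get("P", 0)) / 2 + 1)


def mass_to_sum_composition(mz, ranges, charge, ppm, db_low, db_high):
    """Try every composition within the (min, max) element ranges."""
    elements = list(ranges)
    hits = []
    for counts in product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        comp = dict(zip(elements, counts))
        calc = mz_of(comp, charge)
        if abs(calc - mz) / mz * 1e6 <= ppm and db_low <= dbe(comp) <= db_high:
            hits.append((comp, round(calc, 4), dbe(comp)))
    return hits


# [PC 34:1 + H]+ with constraint C40-44, H70-90, N1, O8, P1 and DB 1.5-7.5
constraint = {"C": (40, 44), "H": (70, 90), "N": (1, 1), "O": (8, 8), "P": (1, 1)}
print(mass_to_sum_composition(760.5851, constraint, 1, 5, 1.5, 7.5))
```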
<br />
===== Sum-composition-to-mass =====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in '''chg'''. Then press '''Sum-composition-to-mass'''; the result appears in the window below. <br />
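The reverse direction is a sum of monoisotopic masses, corrected by the electron mass for the charge. In this sketch, the accepted input format ('C42 H83 N1 O8 P1' or 'C42H83NO8P'), the restriction to C, H, N, O, P and S, and the function names are assumptions.

```python
import re

# Monoisotopic masses (Da) of the supported elements, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207117}
ELECTRON = 0.00054857990946


def parse_sum_composition(text):
    """Parse e.g. 'C42 H83 N1 O8 P1' or 'C42H83NO8P' into element counts."""
    comp = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        comp[element] = comp.get(element, 0) + (int(count) if count else 1)
    return comp


def sum_composition_to_mz(text, charge):
    """Monoisotopic m/z; a charge of 0 returns the neutral mass."""
    comp = parse_sum_composition(text)
    mass = sum(MONO[el] * n for el, n in comp.items())
    if charge == 0:
        return mass
    # Cations lack |z| electrons, anions carry |z| extra electrons.
    return (mass - charge * ELECTRON) / abs(charge)


print(round(sum_composition_to_mz("C42 H83 N1 O8 P1", 1), 4))  # 760.5851
```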
<br />
==== Isotopes of molecules ====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These are the values LipidXplorer uses for isotopic correction, so here the user can double-check that everything is working properly. <br />
<br />
===== Isotopic distribution of MS masses =====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic distribution'''. The listed isotope masses are only an estimate used in LipidXplorer, whereas the abundance values are accurate. <br />
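To give a feeling for these abundances, the sketch below computes a simplified isotope pattern that considers only 13C, using a binomial distribution with an approximate natural abundance of 1.07%. It ignores H, N, O and P isotopes, so it is a rough approximation and not the exact calculation LipidXplorer performs. The function name is hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C


def carbon_isotope_pattern(n_carbons, max_isotopes=4):
    """Probabilities of M+0 ... M+k caused by 13C only (binomial distribution)."""
    p = C13_ABUNDANCE
    return [comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
            for k in range(min(max_isotopes, n_carbons) + 1)]


# C42H83NO8P: intensities of the isotopes relative to the monoisotopic peak.
pattern = carbon_isotope_pattern(42)
print([round(x / pattern[0], 3) for x in pattern])
```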
<br />
===== Isotopic distribution of MS/MS masses =====<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|center|600px|LipidXplorer Intrascan Isotopic Correction]] <br />
<br />
The scheme above shows the values LipidXplorer uses to correct precursor and fragment masses. The isotope probabilities for the fragments are calculated by multiplying the probabilities of the fragment ('''F''') containing zero, one or more isotopes with the corresponding probabilities for the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains an isotope. F1N0 is the probability that the fragment contains one isotope while the neutral loss contains none. Conversely, F0N1 is the probability that the fragment contains no isotope because the isotope is located in the neutral loss. <br />
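The F/N products can be illustrated with the same 13C-only approximation. In this sketch, the carbon counts of the fragment (18) and neutral loss (24) are hypothetical, and the table and helper names are illustrative rather than LipidXplorer's. For a precursor carrying exactly one 13C, the share of fragments that stay monoisotopic is F0N1 / (F1N0 + F0N1), which here reduces to 24/42.

```python
from math import comb

C13 = 0.0107  # approximate natural abundance of 13C


def p_heavy(n_carbons, k):
    """Probability that exactly k of n carbons are 13C (carbon-only approximation)."""
    return comb(n_carbons, k) * C13**k * (1 - C13) ** (n_carbons - k)


def fragment_neutral_loss_table(frag_carbons, nl_carbons, max_k=2):
    """Joint probabilities F{i}N{j}: i isotopes in the fragment, j in the neutral loss."""
    return {f"F{i}N{j}": p_heavy(frag_carbons, i) * p_heavy(nl_carbons, j)
            for i in range(max_k + 1) for j in range(max_k + 1)}


table = fragment_neutral_loss_table(frag_carbons=18, nl_carbons=24)
# Of precursors carrying exactly one 13C, the share whose fragment is monoisotopic:
share = table["F0N1"] / (table["F1N0"] + table["F0N1"])
print(round(share, 3))  # 0.571, i.e. 24/42
```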
<br />
[[Image:LipidX-IntrascanMSTools.png|right|500px|LipidXplorer Intrascan Isotopic Correction in MS-Tools]] <br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed. If you enter a fragment sum composition in '''Fragment sum composition''', the corresponding values are shown in the window below after pressing '''Get Isotopic distribution'''. The sum composition can describe either a real fragment or a neutral loss; this is indicated with the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=568LipidXplorer Reference2011-03-04T15:20:04Z<p>Schwudke: /* Machine specific settings */</p>
<hr />
<div>== LipidXplorer Import ==<br />
<br />
=== Supported file formats ===<br />
<br />
==== *.mzXML ====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data (Pedrioli PG et al., Nat. Biotechnol. 22(11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; Lin SM et al., Expert Rev. Proteomics 2(6): 839-45, [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). Not all mass spectrometers produce mzXML files directly, but several tools can generate mzXML files from natively acquired files. The open source project Sashimi ([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some common mass spectrometric file formats. Converters are currently available: <br />
<br />
*for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
*for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
*for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidXplorer.''<br />
<br />
==== Import of peak lists of MS/MS in *.dta format and MS in *.csv ====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to let it import pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. Often one also wants to import the pre-processed peak list of the MS1 spectrum, which LipidXplorer supports through the widely used *.csv file format. Both text formats should be reasonably available as an alternative to *.mzXML. To import *.dta and *.csv files, some preconditions have to be met: the files have to be arranged in the following directory structure: <br />
<pre> MasterScan Dir/<br />
 |<br />
 +-- [neg_]Sample1/<br />
 |     +-- *.csv<br />
 |     +-- *.dta1, *.dta2, ...<br />
 |<br />
 +-- [neg_]Sample2/<br />
 |     +-- *.csv<br />
 |     +-- *.dta1, *.dta2, ...<br />
 |<br />
 +-- ...<br />
 |<br />
 +-- [neg_]SampleN/<br />
       +-- *.csv<br />
       +-- *.dta1, *.dta2, ...<br />
</pre> <br />
The top-level directory defines which samples go into the MasterScan database object: every subdirectory is one sample. A sample directory can contain<br />
# a single *.csv file with the MS data;<br />
# a number of *.dta files with the MS/MS data. In this case the MS precursor intensities are set to a) 1 when a *.dta file with this precursor m/z is present and b) 0 when no *.dta file with this precursor m/z was found in the sample;<br />
# one *.csv file with the MS data and a number of *.dta files with the MS/MS data.<br />
<br />
IMPORTANT: the name of each sample directory must indicate whether its data were acquired in positive or in negative mode, as follows: <br />
<br />
*if a directory name begins with 'neg', the corresponding sample is a negative-mode sample; <br />
*if a directory name begins with 'pos', the corresponding sample is a positive-mode sample.<br />
<br />
LipidXplorer uses the names of the sample directories as sample names. <br />
<br />
===== Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma-separated file, i.e. each line contains values separated by commas. For importing survey scan information (the MS experiment data), LipidXplorer only recognizes *.csv files in the following format: <br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the precursor (MS1) mass spectrum. For example, a section of a *.csv file: <br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre> <br />
===== Import of MS/MS spectra using *.dta =====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z and its charge, and the following lines hold fragment masses with their intensities: <br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor m/z 585.9765 with charge +1: <br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre> <br />
=== Importing mass spectra into LipidXplorer ===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally, it works only with centroided data, which we also call peak lists; data given as profiles are therefore converted to centroided data. <br />
<br />
If the spectra are given in mzXML file format, all files that should go into one MasterScan (see [[#The_MastersScan_database]]) must be placed in one folder; this folder is the input LipidXplorer uses to import the spectra. IMPORTANT: *.mzXML files have to be centroided for a correct import with LipidXplorer. <br />
<br />
If the spectra are given as *.csv/*.dta files, follow the instructions in [[#Import of peak lists of MS/MS in *.dta format and MS in *.csv|Import of peak lists]]. Here, too, the folder containing all the peak lists is the input for the LipidXplorer import. <br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button, or drag the folder onto the text field with your mouse. LipidXplorer fills in the target MasterScan file automatically; to change it, press 'Browse' next to that field. <br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit the settings and store them in the configuration file. <br />
<br />
Press 'Start import' to start the import. <br />
<br />
The tab offers various options for specifying mass spectrometric attributes. The configurations are stored in an *.ini file. A standard *.ini file is provided, but users can select their own file by pressing 'Browse' next to the *.ini file field. <br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window <span class="texhtml">''w''</span> of a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
<br />
'''timerange:''' defines the time window of the spectra to be imported, as a tuple (start time, end time) with times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra within an allowed error given in '''tolerance'''. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
<br />
'''massrange:''' restricts the m/z range of the imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used during the import for spectra averaging and alignment; both algorithms consider two m/z values equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spectrometer software (like Analyst or Xcalibur)!'' The threshold is applied to the peak intensities read from the mzXML file, not from the original .wiff or .raw files. LipidXplorer corrects the threshold value by dividing it by the square root of the number of scans in the given time segment of the mass spectra. This accounts for the gain in information from averaging more scans, which is modelled using the central limit theorem. <br />
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion must be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 with every increase of 1 m/z. This simulates the typical behavior of mass spectrometers, whose resolution decreases at higher masses; on Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment. For details, see [[Resolution_Gradient]].<br />
<br />
'''MS1 offset:''' all MS1 m/z values are shifted by this value, given in Da. <br />
<br />
'''PMO:''' the Precursor Offset Correction (PMO). This value shifts the precursor m/z values of the fragment (MS/MS) spectra; its sign (positive or negative) gives the direction of the shift. This offset does '''not''' shift the survey scan m/z values: it shifts the precursor masses before the fragment spectra are associated with their survey scan masses. If, for example, you import only *.dta files without a *.csv file, it will corrupt your data; in this case use '''MS1 offset''' instead. <br />
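The order of operations for '''PMO''' can be sketched as follows: the precursor m/z values of the MS/MS spectra are shifted first, and only then matched to the unshifted survey scan masses. Matching each precursor to the closest survey m/z within a tolerance in Da is an assumption made for this illustration; the function names are hypothetical.

```python
def apply_pmo(precursors, pmo):
    """Shift MS/MS precursor m/z values by the signed PMO value (Da)."""
    return [p + pmo for p in precursors]


def associate(precursors, survey_mzs, pmo, tol_da):
    """Map each original precursor m/z to the closest survey m/z after the PMO shift.

    Survey (MS1) m/z values are deliberately not shifted. Precursors with no survey
    peak within tol_da map to None.
    """
    result = {}
    for original, shifted in zip(precursors, apply_pmo(precursors, pmo)):
        close = [s for s in survey_mzs if abs(s - shifted) <= tol_da]
        result[original] = min(close, key=lambda s: abs(s - shifted)) if close else None
    return result
```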
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> measured with a given tolerance <span class="texhtml">''a''</span> fits to a peak <span class="texhtml">''p''</span> if <math>m \in [p-a,p+a]</math>. <br />
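The tolerance rule can be written as a small Python check. Converting a ppm tolerance to Da relative to the peak m/z ''p'' is an assumption of this sketch; the function names are hypothetical.

```python
def tolerance_in_da(value, unit, mz):
    """Convert a tolerance to Da; ppm is taken relative to mz (assumption)."""
    if unit == "Da":
        return value
    if unit == "ppm":
        return mz * value * 1e-6
    raise ValueError(f"unknown unit: {unit}")


def fits(m, p, value, unit):
    """True if theoretical mass m lies in [p - a, p + a], with a the tolerance in Da."""
    a = tolerance_in_da(value, unit, p)
    return p - a <= m <= p + a
```

At m/z 800, for instance, 5 ppm corresponds to a window of +/- 0.004 Da.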
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
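The resolution criterion, together with a simple greedy grouping of sorted m/z values, might look like the sketch below. The greedy grouping only illustrates how such a criterion can drive averaging or alignment; it is not LipidXplorer's alignment algorithm, and the names are hypothetical. As in the formula above, ''r'' is computed from ''p''<sub>1</sub>.

```python
def same_peak(p1, p2, resolution):
    """True if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r


def group_peaks(mzs, resolution):
    """Greedily group sorted m/z values that are 'equal' under same_peak."""
    groups = []
    for mz in sorted(mzs):
        if groups and same_peak(mz, groups[-1][-1], resolution):
            groups[-1].append(mz)  # close enough to the previous value: same peak
        else:
            groups.append([mz])  # start a new peak group
    return groups
```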
<br />
==== Store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves changes to an already stored setting, and '''Delete''' deletes a setting. All configurations are stored in the *.ini file given under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra have been imported, lipids are identified with MFQL scripts. MFQL queries are written in *.mfql files, and each file should contain just one query. The '''Run''' panel of the GUI is where *.mfql files are loaded and run on the MasterScan file.<br />
<br />
===The Run panel===<br />
<br />
The large window on the left lists all *.mfql scripts used for the lipid identification. It is managed with the buttons on its right side: <br />
<br />
*'''Add MFQL File''' adds a single *.mfql file. <br />
*'''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded. <br />
*'''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them. <br />
*'''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks for the name of the file. <br />
*'''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan, either by clicking the green '''Browse''' button or by dragging the MasterScan file (or the folder containing it) onto the text field. The output file is filled in automatically but can be changed by clicking the grey '''Browse''' button. <br />
<br />
Under '''Optional settings for this run''' you can change the tolerance settings for a particular run. The tolerances given in the Import panel are stored in the MasterScan and used by default whenever you run this MasterScan; values set here override them. This is useful for trying out different settings, but the values hold only for this particular run and do not permanently change the settings stored in the MasterScan.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower part of the panel. There is also an option to generate a complement MasterScan: a spectral database containing all entries of the originally chosen MasterScan except the identified entries together with their isotopes. <br />
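A rough sketch of what the complement MasterScan contains, assuming singly charged ions, isotopes approximated as the identified m/z plus multiples of the 13C-12C mass difference (1.0033548 Da), and a tolerance in Da. The function names and default values are hypothetical, not LipidXplorer settings.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C in Da


def is_isotope_of(mz, mono_mz, n_isotopes, tol_da):
    """True if mz matches mono_mz + k * 1.0033548 for some k in 0..n_isotopes (charge 1)."""
    return any(abs(mz - (mono_mz + k * C13_MINUS_C12)) <= tol_da
               for k in range(n_isotopes + 1))


def complement(entries, identified, n_isotopes=2, tol_da=0.005):
    """MasterScan m/z entries that are neither identified nor isotopes of an identified one."""
    return [mz for mz in entries
            if not any(is_isotope_of(mz, ident, n_isotopes, tol_da) for ident in identified)]
```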
<br />
The options '''No head''' and '''Compress''' slightly change the output format: '''No head''' removes the header of the output file and '''Compress''' removes the names of the queries from it. This can be helpful for automatic post-processing. The option '''Tab limited''' switches the output from comma separated to tab separated format. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental database to a comma separated file. The MasterScan has its own data format, which other software cannot read; to look into it, dump its content into a readable file format. The dump is done together with the run of your queries: if you check '''Dump MasterScan''' and press '''Run LipidXplorer''', the MasterScan is dumped into a text file (*.csv format) that can easily be read with, for example, Excel. MFQL queries are not required to dump the MasterScan.<br />
<br />
With '''Run LipidXplorer''' the lipid identification is started and the result is saved in the output file. The '''View''' button opens this file directly, and '''View dump file''' opens the *.csv file of the MasterScan dump.<br />
<br />
=== The Editor panel ===<br />
<br />
The editor makes it easy to write queries for LipidXplorer. Every query is opened in a separate tab. When a file has been edited, the '''Save''' button turns red to remind the user to save the file before using the query. '''SaveAs''' stores the query under a new file name and '''Close''' closes the tab. <br />
<br />
=== The MS-Tools panel ===<br />
<br />
The MS Tools tab contains a small collection of useful functions: <br />
<br />
==== Mass vs. Sum Composition ====<br />
<br />
Calculates either the sum compositions matching a given m/z value or the m/z value of a given sum composition. <br />
<br />
===== Mass-to-sum-composition =====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under '''sc-constraint or sum composition'''. '''lDB''' and '''hDB''' are the lower and upper bounds of the double bond equivalent. Enter the charge in '''chg''' and the tolerance in ppm in '''acc'''. Then press '''Mass-to-sum-composition'''; the result is shown in the text window below.<br />
<br />
Here is an example:<br />
<br />
[[Image:MS-tools-example.PNG|center]]<br />
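Mass-to-sum-composition can be illustrated by a brute-force search over the element ranges of an sc-constraint. In this sketch the constraint is a Python dict of (min, max) counts rather than LipidXplorer's sc-constraint syntax, the double bond equivalent counts N and P as trivalent, and the ion m/z accounts for the electron mass; LipidXplorer's own conventions may differ. All names are hypothetical.

```python
from itertools import product

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}  # monoisotopic masses in Da
ELECTRON = 0.00054858  # electron mass in Da


def mz_of(counts, charge):
    """m/z of an ion whose sum composition (including adducts) is given by counts."""
    mass = sum(MONO[el] * n for el, n in counts.items())
    return (mass - charge * ELECTRON) / abs(charge)


def dbe(counts):
    """Double bond equivalent, counting N and P as trivalent (assumption)."""
    return (counts.get("C", 0) - counts.get("H", 0) / 2
            + (counts.get("N", 0) + counts.get("P", 0)) / 2 + 1)


def mass_to_sum_composition(mz, constraint, ldb, hdb, charge, acc_ppm):
    """Enumerate compositions within per-element (min, max) ranges that match mz."""
    elements = list(constraint)
    ranges = [range(lo, hi + 1) for lo, hi in constraint.values()]
    hits = []
    for combo in product(*ranges):
        counts = dict(zip(elements, combo))
        theo = mz_of(counts, charge)
        error_ppm = (mz - theo) / theo * 1e6
        if abs(error_ppm) <= acc_ppm and ldb <= dbe(counts) <= hdb:
            formula = "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n)
            hits.append((formula, theo, error_ppm))
    return hits
```

For instance, `mass_to_sum_composition(47.049141, {"C": (0, 4), "H": (0, 12), "O": (0, 2)}, -1, 5, 1, 5)` finds only C2H7O (protonated ethanol) within 5 ppm. The search grows with the product of all range sizes, so wide constraints quickly become slow in this naive form.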
<br />
===== Sum-composition-to-mass =====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in '''chg'''. Then press '''Sum-composition-to-mass'''; the result appears in the window below. <br />
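The reverse direction, from a sum composition to m/z, can be sketched like this. The parser accepts compact formulas such as C42H83NO8P as well as space-separated ones. The element masses are standard monoisotopic masses and the electron mass is subtracted per positive charge, which may differ from LipidXplorer's internal conventions. The names are hypothetical.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}  # monoisotopic masses in Da
ELECTRON = 0.00054858  # electron mass in Da


def parse_sum_composition(text):
    """Parse e.g. 'C42H83NO8P' into {'C': 42, 'H': 83, 'N': 1, 'O': 8, 'P': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def sum_composition_to_mz(text, charge):
    """m/z of an ion with the given sum composition and non-zero charge."""
    counts = parse_sum_composition(text)
    mass = sum(MONO[el] * n for el, n in counts.items())
    return (mass - charge * ELECTRON) / abs(charge)
```

`sum_composition_to_mz("C42H83NO8P", 1)` returns about 760.5851, the m/z of a singly charged ion with this sum composition.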
<br />
==== Isotopes of molecules ====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These are the values LipidXplorer uses for isotopic correction, so the user can double-check here that everything is working properly. <br />
<br />
===== Isotopic distribution of MS masses =====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic distribution'''. The masses in the list of isotopes are not exact; they are an estimation used in LipidXplorer. The abundance values, however, are exact. <br />
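The abundances of an isotopic distribution can be computed by convolving the isotope abundances of the individual atoms. This sketch groups isotopes by nominal mass offset (M, M+1, M+2, ...) and uses IUPAC representative abundances, which may differ slightly from the table LipidXplorer uses; the names are hypothetical.

```python
ABUNDANCE = {
    # relative abundances at nominal mass offsets +0, +1, +2 (IUPAC representative values)
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
}


def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isotope_distribution(counts, max_peaks=5):
    """Probabilities of M, M+1, M+2, ... for a sum composition given as element counts."""
    dist = [1.0]
    for element, n in counts.items():
        for _ in range(n):
            # truncating is safe: later convolutions never move probability downwards
            dist = convolve(dist, ABUNDANCE[element])[:max_peaks]
    return dist
```

For a pure carbon chain of n atoms, the ratio M+1/M is exactly n * 0.0107 / 0.9893, which is a quick sanity check.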
<br />
===== Isotopic distribution of MS/MS masses =====<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|center|600px|LipidXplorer Intrascan Isotopic Correction]] <br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and fragment peaks. The isotope probabilities of a fragment are calculated by multiplying the probabilities of the fragment ('''F''') containing no, one or more than one heavy isotope with the corresponding probabilities of the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope while the neutral loss contains none. Conversely, F0N1 is the probability that the fragment contains no heavy isotope because the isotope is contained in the neutral loss. <br />
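The F/N combinations can be illustrated for the simplest case in which only 13C is considered: each entry F''i''N''j'' is the product of the probability of ''i'' heavy atoms in the fragment and ''j'' heavy atoms in the neutral loss. Restricting to carbon, and the carbon counts in the usage example, are simplifications for this sketch; the names are hypothetical.

```python
from math import comb

P_13C = 0.0107  # natural abundance of 13C


def p_heavy(n_carbons, k, p=P_13C):
    """Probability that exactly k of n_carbons carbon atoms are 13C (binomial)."""
    return comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)


def fragment_neutral_loss_table(n_frag_c, n_nl_c, max_k=2):
    """Joint probabilities F{i}N{j}: i heavy atoms in the fragment, j in the neutral loss."""
    return {f"F{i}N{j}": p_heavy(n_frag_c, i) * p_heavy(n_nl_c, j)
            for i in range(max_k + 1) for j in range(max_k + 1)}
```

With 16 carbons in the fragment and 26 in the neutral loss, `t["F1N0"] + t["F0N1"]` equals the M+1 probability of the 42-carbon precursor; only the F1N0 part shifts the fragment by one mass unit.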
<br />
[[Image:LipidX-IntrascanMSTools.png|right|500px|LipidXplorer Intrascan Isotopic Correction in MS-Tools]] <br />
<br />
In MS-Tools the isotope probabilities calculated by LipidXplorer can be viewed. If you enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''', the corresponding values are shown in the window below. The sum composition can be either a real fragment or a neutral loss; this is indicated with the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=567LipidXplorer Reference2011-03-04T15:19:40Z<p>Schwudke: /* Machine specific settings */</p>
<hr />
<div>== LipidXplorer Import ==<br />
<br />
=== Supported file formats ===<br />
<br />
==== *.mzXML ====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data (Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). Not all mass spectrometers produce mzXML files directly, but several tools can generate mzXML files from the native acquisition files. The open source project Sashimi ([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converters for some common mass spectrometric file formats. Converters are currently available: <br />
<br />
*for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
*for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
*for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidXplorer.''<br />
<br />
==== Import of peak lists of MS/MS in *.dta format and MS in *.csv ====<br />
<br />
To make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms, LipidXplorer can import pre-processed peak lists. The software of many vendors can create *.dta files of MS/MS spectra. In many cases one also wants to import the pre-processed peak list of the MS1 scan, which LipidXplorer supports via the widely used *.csv file format. Both text file formats are therefore reasonable alternatives to *.mzXML. To import *.dta and *.csv files, some preconditions have to be met: the import files have to be arranged in the following directory structure: <br />
<pre> MasterScan Dir/<br />
   |<br />
   +-- [neg_]Sample1/<br />
   |      *.csv,<br />
   |      [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- [neg_]Sample2/<br />
   |      *.csv,<br />
   |      [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- ...<br />
   |<br />
   +-- [neg_]SampleN/<br />
          *.csv,<br />
          [*.dta1, *.dta2, ...]<br />
</pre> <br />
The top-level directory defines which samples go into the MasterScan database object: every subdirectory is one sample. <br> A sample directory can contain<br> <br />
<br />
&nbsp;1. one *.csv file with the MS data,<br> <br />
<br />
&nbsp;2. only *.dta files with the MS/MS data.<br> In this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, or b) 0 if no *.dta file with this precursor m/z was found in the sample, or<br> &nbsp;3. one *.csv file with the MS data and a number of *.dta files containing the MS/MS data. <br />
<br />
<br> IMPORTANT! The name of each sample directory must state whether its data were acquired in positive or in negative mode: <br />
<br />
*if a directory name begins with 'neg', the corresponding sample is negative. <br />
*if a directory name begins with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories. <br />
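A sketch of how the directory layout and the 'neg'/'pos' naming rule could be read with Python's pathlib. Treating names without either prefix as undetermined (None) is an assumption, since the text does not specify that case; the function names are hypothetical.

```python
from pathlib import Path


def polarity_from_name(name):
    """Polarity encoded at the start of a sample directory name ('neg...'/'pos...')."""
    if name.startswith("neg"):
        return "negative"
    if name.startswith("pos"):
        return "positive"
    return None  # not covered by the naming rule above


def discover_samples(masterscan_dir):
    """Map every sample sub-directory to its polarity and its *.csv / *.dta files."""
    samples = {}
    for sub in sorted(Path(masterscan_dir).iterdir()):
        if not sub.is_dir():
            continue  # only sub-directories are samples
        samples[sub.name] = {
            "polarity": polarity_from_name(sub.name),
            "csv": sorted(p.name for p in sub.glob("*.csv")),
            "dta": sorted(p.name for p in sub.glob("*.dta")),
        }
    return samples
```

Calling `discover_samples("MasterScan Dir")` on the layout shown above would return one entry per sample directory, keyed by the sample name.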
<br />
===== Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma separated file, i.e. every line contains values separated by commas. For importing survey scan information (the MS experiment data), LipidXplorer only recognizes *.csv files in the following format: <br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the (precursor) mass spectrum. For example, a section of a *.csv file: <br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre> <br />
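Reading the *.csv format above takes only a few lines with Python's csv module. The function name is hypothetical, and skipping blank lines is a convenience of this sketch.

```python
import csv
import io


def read_ms1_csv(text):
    """Parse '/precursor mass/, /intensity/' lines into (m/z, intensity) tuples."""
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip empty lines
        mz, intensity = float(row[0]), float(row[1])
        peaks.append((mz, intensity))
    return peaks
```

To read a file, pass its content: `with open(path, newline="") as f: peaks = read_ms1_csv(f.read())`.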
===== Import of MS/MS spectra using *.dta =====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z and its charge, and each following line holds a fragment mass and its intensity: <br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1: <br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre> <br />
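A matching sketch for *.dta files, plus the placeholder rule for samples imported without a *.csv file (intensity 1 if a sample has a *.dta file for the precursor, 0 otherwise). Exact equality of precursor m/z values across samples is a simplification; the function names are hypothetical.

```python
def read_dta(text):
    """Return (precursor_mz, charge, peaks) from the content of a *.dta file.

    The first line holds the precursor m/z and its charge; every following line
    holds a fragment m/z and its intensity, separated by whitespace.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    precursor, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return precursor, charge, peaks


def placeholder_intensities(precursors_by_sample):
    """Without *.csv data: intensity 1 where a sample has a *.dta for a precursor, else 0.

    Precursors are compared for exact equality here; a real import would match
    them within a tolerance.
    """
    all_mz = sorted(set().union(*precursors_by_sample.values()))
    return {sample: {mz: int(mz in found) for mz in all_mz}
            for sample, found in precursors_by_sample.items()}
```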
=== Importing mass spectra into LipidXplorer ===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with centroid data, which we also call peak lists. This means that data given as profiles is converted to centroided data. <br />
<br />
If the spectra are given in mzXML file format, all files that should go into one MasterScan (see [[#The_MastersScan_database]]) must be placed in one folder. This folder is what LipidXplorer is given to import the spectra. IMPORTANT: *.mzXML files have to be centroided to be imported correctly by LipidXplorer. <br />
<br />
If the spectra are given as *.csv/*.dta files, follow the instructions in [[#Import_of_peak_lists_of_MS.2FMS_in_.2A.dta_format_and_MS_in_.2A.csv|Import of peak lists of MS/MS in *.dta format and MS in *.csv]]. Here, too, the folder containing all the peak lists is the input for the LipidXplorer import. <br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or by dragging the folder into the text field with your mouse. LipidXplorer fills in the target MasterScan file automatically; to change it, press 'Browse' next to the file field. <br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit the settings and store them in the configuration file. <br />
<br />
The import starts with pressing 'Start import'. <br />
<br />
The tab offers various options for specifying mass spectrometric attributes. The configurations are stored in an *.ini file. A standard *.ini file is provided, but the user can select their own file by pressing 'Browse' next to the *.ini file field. <br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to select a precursor for fragmentation. For a selection window <span class="texhtml">''w''</span> and a peak <span class="texhtml">''p''</span>, the window is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
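The selection window can be checked with two small functions. Listing the survey peaks that fall into the same window as a precursor is just one illustrative use; the names are hypothetical.

```python
def in_selection_window(mz, precursor, w):
    """True if mz lies in [p - w/2, p + w/2] for a selection window of width w (Da)."""
    return precursor - w / 2 <= mz <= precursor + w / 2


def co_isolated(survey_mzs, precursor, w):
    """Survey peaks that fall into the same selection window as the precursor."""
    return [mz for mz in survey_mzs if in_selection_window(mz, precursor, w)]
```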
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is a tuple (start time, end time), with times given in seconds. <br />
<br />
</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=566LipidXplorer Reference2011-03-04T15:12:02Z<p>Schwudke: /* *.mzXML */</p>
<hr />
<div>== LipidXplorer Import ==<br />
<br />
=== Supported file formats ===<br />
<br />
==== *.mzXML ====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data (Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). Not all mass spectrometers produce mzXML files directly, but several tools can generate mzXML files from the native acquisition files. The open source project Sashimi ([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converters for some common mass spectrometric file formats. Converters are currently available: <br />
<br />
*for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
*for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
*for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidXplorer.''<br />
<br />
==== Import of peak lists of MS/MS in *.dta format and MS in *.csv ====<br />
<br />
To make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms, LipidXplorer can import pre-processed peak lists. The software of many vendors can create *.dta files of MS/MS spectra. In many cases one also wants to import the pre-processed peak list of the MS1 scan, which LipidXplorer supports via the widely used *.csv file format. Both text file formats are therefore reasonable alternatives to *.mzXML. To import *.dta and *.csv files, some preconditions have to be met: the import files have to be arranged in the following directory structure: <br />
<pre> MasterScan Dir/<br />
   |<br />
   +-- [neg_]Sample1/<br />
   |      *.csv,<br />
   |      [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- [neg_]Sample2/<br />
   |      *.csv,<br />
   |      [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- ...<br />
   |<br />
   +-- [neg_]SampleN/<br />
          *.csv,<br />
          [*.dta1, *.dta2, ...]<br />
</pre> <br />
The top-level directory defines which samples go into the MasterScan database object: every subdirectory is one sample. <br> A sample directory can contain<br> <br />
<br />
&nbsp;1. one *.csv file with the MS data,<br> <br />
<br />
&nbsp;2. only *.dta files with the MS/MS data.<br> In this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, or b) 0 if no *.dta file with this precursor m/z was found in the sample, or<br> &nbsp;3. one *.csv file with the MS data and a number of *.dta files containing the MS/MS data. <br />
<br />
<br> IMPORTANT! The name of each sample directory must state whether its data were acquired in positive or in negative mode: <br />
<br />
*if a directory name begins with 'neg', the corresponding sample is negative. <br />
*if a directory name begins with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories. <br />
<br />
===== Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma separated file, i.e. every line contains values separated by commas. For importing survey scan information (the MS experiment data), LipidXplorer only recognizes *.csv files in the following format: <br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the (precursor) mass spectrum. For example, a section of a *.csv file: <br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre> <br />
===== Import of MS/MS spectra using *.dta =====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z and its charge, and each following line holds a fragment mass and its intensity: <br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1: <br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre> <br />
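<br />
A Python sketch of reading a *.dta peak list as described above, together with the presence-based precursor intensities (1 or 0) used when a sample contains only *.dta files. Both function names are hypothetical, and precursors are compared for exact equality here, whereas a real import would have to align them within a tolerance. <br />

```python
def read_dta(text: str) -> dict:
    """Parse a *.dta peak list: a 'precursor charge' header, then 'mass intensity' rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    precursor, charge = float(rows[0][0]), int(rows[0][1])
    fragments = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return {"precursor": precursor, "charge": charge, "fragments": fragments}


def presence_intensities(samples: dict[str, list[float]]) -> dict[float, dict[str, float]]:
    """*.dta-only import: precursor intensity 1 if a sample has a *.dta for it, else 0.

    Precursors are matched exactly (simplification; real data need tolerance-based alignment).
    """
    precursors = sorted({p for plist in samples.values() for p in plist})
    return {
        p: {name: 1.0 if p in plist else 0.0 for name, plist in samples.items()}
        for p in precursors
    }
```
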
=== Importing mass spectra into LipidXplorer ===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it works only with centroided data, which we also call peak lists, so data given as profiles are converted to centroided data. <br />
<br />
If the spectra are given in mzXML file format, all spectra that should go into one MasterScan (see [[#The_MastersScan_database]]) must be placed in one folder. This folder is the input that LipidXplorer uses to import the spectra. IMPORTANT: *.mzXML files have to be centroided to be imported correctly by LipidXplorer. <br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in [[#Import_.2A.dta_.2F_.2A.csv_files]]. Here too, the folder containing all the peak lists is the input for the LipidXplorer import. <br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or by dragging the folder into the text field with your mouse. LipidXplorer fills in the target MasterScan file automatically; to change it, press 'Browse' next to the file field. <br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit the settings and store them in the configuration file. <br />
<br />
The import starts with pressing 'Start import'. <br />
<br />
The tab offers various options for specifying mass spectrometric attributes. The configurations are stored in an *.ini file. A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, users can select their own file. <br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' describes the size of the window which the mass spectrometer uses to select the precursor for fragmentation. A selection window of width <span class="texhtml">''w''</span> around a peak <span class="texhtml">''p''</span> covers the interval <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
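<br />
To illustrate the selection window, this hypothetical Python helper computes the interval for a precursor p and lists which survey-scan masses fall inside it, i.e. which ions would be isolated together with the precursor. <br />

```python
def selection_window(p: float, w: float) -> tuple[float, float]:
    """Interval [p - w/2, p + w/2] isolated around precursor p (w in Da)."""
    return (p - w / 2, p + w / 2)


def co_isolated(p: float, w: float, survey_mz: list[float]) -> list[float]:
    """Survey-scan masses that fall into the selection window of precursor p."""
    lo, hi = selection_window(p, w)
    return [m for m in survey_mz if lo <= m <= hi]
```

With the *.csv example above, `co_isolated(703.5752, 1.0, [702.5435, 703.547, 703.5752, 704.5786])` keeps 703.547 and 703.5752. <br />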
<br />
'''timerange:''' defines the time window for all spectra to be imported. It is a tuple (start time, end time), with times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched for in the spectra within the allowed error given in '''tolerance'''. If found, the mass error is used to calculate and apply a mass shift to the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
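<br />
A minimal Python sketch of such a linear offset correction, under stated assumptions: the tolerance is taken in Da, the closest peak within the tolerance counts as the measured position of a standard, and outside the outermost standards the nearest shift is kept constant. The function name `calibrate` is hypothetical and this is not LipidXplorer's own code. <br />

```python
import bisect


def calibrate(peaks, standards, tol_da):
    """Apply a linear offset correction to a list of (m/z, intensity) peaks."""
    found = []
    for s in standards:
        near = [mz for mz, _ in peaks if abs(mz - s) <= tol_da]
        if near:
            measured = min(near, key=lambda mz: abs(mz - s))
            found.append((measured, s - measured))  # (measured position, shift)
    if not found:
        return list(peaks)  # no standard found: leave the spectrum unchanged
    found.sort()
    xs = [m for m, _ in found]
    shifts = [d for _, d in found]

    def shift_at(mz):
        if mz <= xs[0]:
            return shifts[0]  # constant below the first standard (assumption)
        if mz >= xs[-1]:
            return shifts[-1]  # constant above the last standard (assumption)
        j = bisect.bisect_right(xs, mz)  # xs[j-1] <= mz < xs[j]
        t = (mz - xs[j - 1]) / (xs[j] - xs[j - 1])
        return shifts[j - 1] + t * (shifts[j] - shifts[j - 1])

    return [(mz + shift_at(mz), intensity) for mz, intensity in peaks]
```

With a single standard the same shift is applied to the whole spectrum; with several standards the shift is interpolated linearly between them. <br />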
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are dismissed. ''Be aware that the intensity values in your mzXML file may differ from those in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensity read from the mzXML file, not from the original .wiff or .raw files. LipidXplorer corrects the threshold value by dividing it by the square root of the number of scans in the given time segment of the mass spectra. This accounts for the increase of information with more scans, which is modeled using the central limit theorem. <br />
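<br />
The threshold correction described above amounts to the following arithmetic; this Python sketch with hypothetical function names keeps peaks whose intensity is at least the corrected threshold. <br />

```python
import math


def corrected_threshold(threshold: float, n_scans: int) -> float:
    """Divide the threshold by sqrt(number of scans); 0 keeps the filter switched off."""
    return threshold / math.sqrt(max(n_scans, 1))


def apply_threshold(peaks, threshold, n_scans):
    """Keep (m/z, intensity) peaks at or above the corrected threshold."""
    t = corrected_threshold(threshold, n_scans)
    return [(mz, i) for mz, i in peaks if i >= t]
```

For example, a threshold of 1000 with 16 scans becomes 1000 / 4 = 250. <br />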
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion has to be present in at least 50% of all samples. <br />
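<br />
A hypothetical Python check for the min occupation rule, assuming that a mass counts as present in a sample when its intensity there is greater than zero. <br />

```python
def passes_min_occupation(intensities: dict[str, float], min_occupation: float) -> bool:
    """True if the mass occurs (intensity > 0) in at least min_occupation of all samples."""
    present = sum(1 for value in intensities.values() if value > 0)
    return present / len(intensities) >= min_occupation
```
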
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 for every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases with higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment. <br />
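<br />
A sketch of a linear resolution model in Python. This section does not say at which m/z the nominal resolution applies, so the reference point `mz_ref` is an assumption of this sketch, as are the function and constant names. The constant at the end derives the gradient implied by the Orbitrap observation quoted above. <br />

```python
def resolution_at(mz: float, r_ref: float, mz_ref: float, gradient: float) -> float:
    """Linear resolution model: changes by `gradient` per 1 m/z relative to mz_ref."""
    return r_ref + gradient * (mz - mz_ref)


# Gradient implied by a drop of 50,000 between m/z 300 and m/z 1200 (Orbitrap, see above).
ORBITRAP_GRADIENT = -50_000 / (1200 - 300)
```

With this observation the gradient is -50,000 / 900, roughly -55.6 per m/z. <br />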
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This value shifts the precursor m/z values taken from the fragment spectra; the direction of the shift is given by a positive or negative sign. This offset does '''not''' shift the survey scan m/z values: it shifts the precursor masses before the fragment spectra are associated with their survey scan masses. Therefore, if you import only *.dta files without a *.csv file, for example, PMO will corrupt your data. In this case use '''MS1 offset'''. <br />
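<br />
The difference between '''MS1 offset''' and '''PMO''' can be sketched in Python as follows. The nearest-mass association within a Da tolerance is an assumption made for illustration; the section only states that PMO is applied before fragment spectra are associated with their survey scan masses. The function names are hypothetical. <br />

```python
def apply_ms1_offset(survey_mz, offset):
    """MS1 offset: shift every survey-scan m/z by a fixed value in Da."""
    return [mz + offset for mz in survey_mz]


def associate_precursors(dta_precursors, survey_mz, pmo, tol_da):
    """Map each fragment-spectrum precursor to a survey-scan mass after applying PMO.

    PMO shifts only the precursor m/z from the fragment spectra; survey masses stay unchanged.
    """
    assignment = {}
    for precursor in dta_precursors:
        shifted = precursor + pmo
        near = [mz for mz in survey_mz if abs(mz - shifted) <= tol_da]
        assignment[precursor] = min(near, key=lambda mz: abs(mz - shifted)) if near else None
    return assignment
```
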
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> matches a peak <span class="texhtml">''p''</span> within a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
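<br />
The tolerance check can be sketched in Python. Converting a ppm tolerance relative to the peak p is an assumption of this sketch (the section does not say whether ppm refers to p or m), and the function names are hypothetical. <br />

```python
def tolerance_in_da(p: float, tol: float, unit: str) -> float:
    """Absolute tolerance a in Da; a ppm value is taken relative to the peak p (assumption)."""
    if unit == "ppm":
        return p * tol / 1e6
    if unit == "Da":
        return tol
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def matches(m: float, p: float, tol: float, unit: str = "ppm") -> bool:
    """A theoretical mass m fits peak p if m lies in [p - a, p + a]."""
    a = tolerance_in_da(p, tol, unit)
    return p - a <= m <= p + a
```

For example, 5 ppm at m/z 1000 corresponds to a = 0.005 Da. <br />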
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
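<br />
The resolution criterion translates directly into a small Python helper (hypothetical name). Because r is computed from p<sub>1</sub>, the check is not perfectly symmetric in p<sub>1</sub> and p<sub>2</sub>, which only matters at the edge of the interval. <br />

```python
def same_peak(p1: float, p2: float, resolution: float) -> bool:
    """p1 and p2 count as one peak if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r
```

At R = 100,000 and m/z 700, r = 0.007, so peaks 0.005 apart are merged while peaks 0.01 apart are kept separate. <br />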
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another or a new file. <br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra data have been imported, MFQL scripts are used for lipid identification. MFQL queries are written in *.mfql files (files with the extension .mfql), each of which should contain just one query. The '''Run''' panel of the GUI is where *.mfql files are loaded and run on the MasterScan file.<br />
<br />
===The Run panel===<br />
<br />
The big window on the left contains all *.mfql scripts used for the lipid identification. This window is managed by the buttons on its right side: <br />
<br />
*'''Add MFQL File''' will add one file <br />
*'''Add MFQL Directory''' lets you choose a directory containing *.mfql files, all of which are loaded. <br />
*'''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them. <br />
*'''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
*'''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, the MasterScan has to be chosen. This is done by clicking on the green '''Browse''' button or by dragging the MasterScan file, or the folder containing it, onto the text field. The output file is filled in automatically, but can be changed by clicking on the grey '''Browse''' button. <br />
<br />
Under '''Optional settings for this run''' you can change the tolerance settings for a particular run. These override the tolerance settings given in the Import panel, which are stored in the MasterScan and used by default whenever you run this MasterScan. This is useful if you want to try out different settings: the values hold only for this particular run and do not permanently override the settings in the MasterScan.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off on the lower side of the panel. There is also the option to generate a complement MasterScan. This is a spectral database containing all entries of the originally chosen MasterScan except the identified entries and their isotopes. <br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. '''No head''' removes the header of the output file and '''Compress''' removes the names of the queries in the output file. This can be helpful if you want to do some automatic post-processing. The option '''Tab limited''' changes the output format from comma-separated to tab-separated. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental database to a comma-separated file. The MasterScan has its own data format and cannot be opened by other software; to look into it, you need to dump its content into a readable file format. '''Dump MasterScan''' does this in parallel with the run of your queries, i.e. if you check '''Dump MasterScan''' and press '''Run LipidXplorer''', the MasterScan is dumped into a text file (*.csv file format) which you can easily open with Excel, for example. MFQL queries are not required to dump the MasterScan.<br />
<br />
With '''Run LipidXplorer''' the lipid identification is started. The result is saved in the output file. With the '''View''' button this file can be viewed on the spot. With '''View dump file''' the *.csv file of the MasterScan can be viewed.<br />
<br />
=== The Editor panel ===<br />
<br />
The editor makes it easy to write queries for LipidXplorer. Every query is opened in a separate tab. When a file has been edited, the '''Save''' button turns red to remind the user to save the file before using the query. '''SaveAs''' stores the query under a chosen file name and '''Close''' closes the tab. <br />
<br />
=== The MS-Tools panel ===<br />
<br />
The MS Tools tab contains a small collection of useful functions: <br />
<br />
==== Mass vs. Sum Composition ====<br />
<br />
Calculates either the sum composition from a given m/z value or, conversely, the m/z value from a sum composition. <br />
<br />
===== Mass-to-sum-composition =====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under '''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB''' the upper bound of the double bond equivalent. Enter the charge in '''chg''' and the tolerance in ppm in '''acc'''. Then press '''Mass-to-sum-composition''' and the result will be shown in the text window below.<br />
<br />
Here is an example:<br />
<br />
[[Image:MS-tools-example.PNG|center]]<br />
<br />
===== Sum-composition-to-mass =====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in '''chg'''. Then press '''Sum-composition-to-mass''' and the result will appear in the window below. <br />
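<br />
Both MS-Tools calculations above can be sketched in Python under clearly stated assumptions: monoisotopic element masses, m/z = (M - z * electron mass) / |z| for an ion with sum composition mass M and charge z, a double bond equivalent of 1 + C + (N + P)/2 - H/2, and element bounds given as a dictionary instead of an sc-constraint string. The exhaustive search is only meant for small bounds; the function names are hypothetical and this is not LipidXplorer's implementation. <br />

```python
import itertools

# Monoisotopic masses (Da) of the elements considered in this sketch.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def ion_mz(comp: dict[str, int], charge: int) -> float:
    """Sum-composition-to-mass: m/z of an ion with the given composition and charge."""
    mass = sum(MONO[el] * n for el, n in comp.items())
    return (mass - charge * ELECTRON) / abs(charge)


def dbe(comp: dict[str, int]) -> float:
    """Double bond equivalent, counting N and P as trivalent (assumption)."""
    c, h = comp.get("C", 0), comp.get("H", 0)
    n, p = comp.get("N", 0), comp.get("P", 0)
    return 1 + c + (n + p) / 2 - h / 2


def mass_to_sum_composition(mz, bounds, ldb, hdb, charge, acc_ppm):
    """Mass-to-sum-composition: compositions within bounds matching mz within acc_ppm."""
    elements = list(bounds)
    ranges = [range(lo, hi + 1) for lo, hi in bounds.values()]
    hits = []
    for counts in itertools.product(*ranges):
        comp = dict(zip(elements, counts))
        if not ldb <= dbe(comp) <= hdb:
            continue
        error_ppm = (ion_mz(comp, charge) - mz) / mz * 1e6
        if abs(error_ppm) <= acc_ppm:
            hits.append((comp, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For example, `ion_mz({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}, 1)` gives about 760.5851, the m/z of the [M+H]+ ion of PC 34:1. <br />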
<br />
==== Isotopes of molecules ====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These are the values LipidXplorer uses for isotopic correction, so the user can double-check that everything is working properly. <br />
<br />
===== Isotopic distribution of MS masses =====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic distribution'''. The isotope masses in the list are not exact; they are an estimation used in LipidXplorer. The abundance values, however, are exact. <br />
<br />
===== Isotopic distribution of MS/MS masses =====<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|center|600px|LipidXplorer Intrascan Isotopic Correction]] <br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and fragment masses. The isotopes for the fragments are calculated by multiplying the probabilities of fragments ('''F''') having no, one or more than one isotopes with the probabilities of associated neutral losses ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains an isotope. F1N0 is the probability that the fragment contains one isotope while the neutral loss contains none. Conversely, F0N1 is the probability that the fragment contains no isotope because the isotope is contained in the neutral loss. <br />
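<br />
A simplified Python sketch of the F/N combination, assuming that only 13C contributes (natural abundance taken as 1.07%) and that isotope counts follow a binomial distribution; LipidXplorer's own calculation may consider further elements. The carbon counts used below and the function names are hypothetical. <br />

```python
from math import comb

P_13C = 0.0107  # assumed natural abundance of 13C; other elements are ignored here


def p_heavy(n_carbon: int, k: int) -> float:
    """Binomial probability that exactly k of n_carbon carbon atoms are 13C."""
    return comb(n_carbon, k) * P_13C**k * (1 - P_13C) ** (n_carbon - k)


def fragment_loss_table(c_fragment: int, c_loss: int, max_k: int = 1) -> dict[str, float]:
    """FiNj = P(fragment holds i 13C) * P(neutral loss holds j 13C)."""
    return {
        f"F{i}N{j}": p_heavy(c_fragment, i) * p_heavy(c_loss, j)
        for i in range(max_k + 1)
        for j in range(max_k + 1)
    }
```

F0N1 is the case where the precursor carries one 13C atom but the fragment is still observed at its monoisotopic m/z, because the 13C left with the neutral loss. <br />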
<br />
[[Image:LipidX-IntrascanMSTools.png|right|500px|LipidXplorer Intrascan Isotopic Correction in MS-Tools]] <br />
<br />
In MS-Tools the probabilities of the isotopes as calculated by LipidXplorer can be viewed. If you enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''', the corresponding values are shown in the window below. The sum composition can describe either a real fragment or a neutral loss; this is indicated with the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=474LipidXplorer Reference2011-01-21T14:15:11Z<p>Schwudke: /* Importing mass spectra into LipidX */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Review of Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from natively acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to import pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. In many instances one may also want to import the pre-processed peak list of the MS1 spectrum, which we support via the widely used *.csv file format. Both text file formats should be readily available as an alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files have to be organized in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
 |<br />
 +-- [neg_]Sample1/<br />
 |     +-- *.csv<br />
 |     +-- [*.dta1, *.dta2, ...]<br />
 |<br />
 +-- [neg_]Sample2/<br />
 |     +-- *.csv<br />
 |     +-- [*.dta1, *.dta2, ...]<br />
 |<br />
 +-- ...<br />
 |<br />
 +-- [neg_]SampleN/<br />
       +-- *.csv<br />
       +-- [*.dta1, *.dta2, ...]<br />
</pre><br />
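<br />
A Python sketch that walks a MasterScan directory laid out as above and collects, per sample, the *.csv file and the *.dta files. It uses only the standard library; the function name `collect_samples` and the choice of taking the first *.csv file are assumptions of this sketch. <br />

```python
from pathlib import Path


def collect_samples(masterscan_dir: str) -> dict[str, dict]:
    """Map each sample sub-directory name to its *.csv file and list of *.dta files."""
    samples = {}
    for sample in sorted(p for p in Path(masterscan_dir).iterdir() if p.is_dir()):
        csv_files = sorted(sample.glob("*.csv"))
        samples[sample.name] = {
            "csv": csv_files[0] if csv_files else None,  # at most one *.csv is expected
            "dta": sorted(sample.glob("*.dta")),
        }
    return samples
```

Files lying directly in the top-level directory are ignored, since only sub-directories define samples. <br />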
<br />
The top-level directory defines which samples go into the MasterScan database object, namely all <br />
samples present as its subdirectories. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data<br><br />
<br />
&nbsp;2. one or more .dta files with the MS/MS data<br><br />
In this case the MS precursor intensities are set to a) 1 when a *.dta file with this precursor m/z is present, or b) 0 when no *.dta file with this precursor m/z was found in the sample <br><br />
&nbsp; 3. one .csv file with the MS data and a number of .dta files containing the MS/MS data<br />
<br />
<br> IMPORTANT! The name of each sample sub-directory must encode whether its <br />
data were acquired in positive or in negative mode. This is done as follows: <br />
* if a directory name starts with 'neg', the corresponding sample is negative. <br />
* if a directory name starts with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma-separated file, i.e. every line in the file contains values <br />
separated by commas. For importing survey scan information (the MS data), LipidXplorer only recognizes *.csv files in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre><br />
<br />
=====Import of MS/MS spectra using *.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list whose header line holds the precursor m/z <br />
and its charge; the following lines hold fragment masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor m/z 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it works only with <br />
centroided data, which we also call peak lists, so data given as profiles are converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all spectra that should go into one MasterScan <br />
(see [[#The MastersScan database]]) must be placed in one folder. This folder is the input <br />
that LipidXplorer uses to import the spectra.<br />
IMPORTANT: *.mzXML files have to be centroided to be imported correctly by LipidXplorer.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import *.dta / *.csv files]]. Here too, the folder containing all the peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or by <br />
dragging the folder into the text field with your mouse. LipidXplorer fills in the <br />
target MasterScan file automatically; to change it, press 'Browse' next to the file field.<br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
The import starts with pressing 'Start import'.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, <br />
users can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' describes the size of the window which the mass spectrometer uses to select the precursor for fragmentation. A selection window of width <span class="texhtml">''w''</span> around a peak <span class="texhtml">''p''</span> covers the interval <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
<br />
'''timerange:''' defines the time window for all spectra to be imported. It is a tuple (start time, end time), with times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched for in the spectra. If found, the mass error is used to calculate and apply a mass shift to the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are dismissed. ''Be aware that the intensity values in your mzXML file may differ from those in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensity read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging. This accounts for the increase of information with more scans, which is modeled using the central limit theorem. <br />
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion has to be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 for every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases with higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment. <br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> matches a peak <span class="texhtml">''p''</span> within a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another or a new file.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra data have been imported, MFQL scripts are used for <br />
lipid identification. MFQL queries are written in *.mfql files <br />
(files with the extension .mfql), each of which should contain just one query. <br />
The '''Run''' panel of the GUI is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left contains all *.mfql scripts used <br />
for the lipid identification. This window is managed by the buttons <br />
on its right side:<br />
* '''Add MFQL File''' will add one file<br />
* '''Add MFQL Directory''' lets you choose a directory containing *.mfql files, all of which are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, the MasterScan has <br />
to be chosen. This is done by clicking on the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file is filled in automatically, but can be changed by clicking on the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off on the lower <br />
side of the panel. There is also the option to generate a complement <br />
MasterScan. This is a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries in the output file. This can be helpful if you want to do some<br />
automatic post-processing. The option '''Tab limited''' changes the output <br />
format from comma separated file format to tab separated file format. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that you can view its content <br />
(in Excel, for example).<br />
<br />
With '''Run LipidX''' the lipid identification is started. The result is saved <br />
in the output file. With the '''View''' button this file can be viewed on the spot.<br />
With '''View dump file''' the *.csv file of the MasterScan can be viewed.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a chosen file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition from a given m/z value or, conversely, the m/z value<br />
from a sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result will be shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result will appear<br />
in the window below.<br />
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These are the<br />
values LipidXplorer uses for isotopic correction, so the user can double-check<br />
that everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic <br />
distribution'''. The isotope masses in the list are not exact; they are an estimation<br />
used in LipidX. The abundance values, however, are exact.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotopes for the fragments are calculated by multiplying the<br />
probabilities of fragments ('''F''') having no, one or more than one isotopes with the<br />
probabilities of associated neutral losses ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains<br />
an isotope. F1N0 is the probability that the fragment contains one isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no isotope because the isotope is contained in the neutral loss. <br />
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools the probabilities of the isotopes as calculated by LipidXplorer can be viewed.<br />
If you enter a fragment sum composition in '''Fragment sum composition''' and press<br />
'''Get Isotopic distribution''', the corresponding values are shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss; this is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=473LipidXplorer Reference2011-01-21T14:10:15Z<p>Schwudke: /* *.dta */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Review of Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from natively acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
To make LipidXplorer usable with a wide range of mass spectrometric platforms, it can import pre-processed peak lists. The software of many vendors can export MS/MS spectra as *.dta files. Often one also wants to import the pre-processed MS1 peak list, which LipidXplorer supports through the widely used *.csv file format. Both text formats should be readily available as an alternative to *.mzXML. Importing *.dta and *.csv files requires some preconditions:<br />
The import files must be arranged in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
+-- [neg_]Sample1/<br />
|     +-- *.csv<br />
|     +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- [neg_]Sample2/<br />
|     +-- *.csv<br />
|     +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- ...<br />
|<br />
+-- [neg_]SampleN/<br />
      +-- *.csv<br />
      +-- [*.dta1, *.dta2, ...]<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object: each of its <br />
subdirectories is one sample. <br><br />
A sample directory can contain one of the following:<br><br />
<br />
&nbsp;1. one *.csv file with the MS data;<br><br />
<br />
&nbsp;2. only *.dta files with the MS/MS data. In this case the MS precursor intensities are set to 1 if a *.dta file with this precursor is present, and to 0 if no *.dta file with this precursor m/z was found in the sample;<br><br />
&nbsp;3. one *.csv file with the MS data and a number of *.dta files with the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sample directory must encode whether the data were acquired in positive or <br />
negative ion mode: <br />
* if the directory name begins with 'neg', the sample is treated as negative mode; <br />
* if the directory name begins with 'pos', the sample is treated as positive mode.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
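The directory convention above can be illustrated with a short Python sketch. It lists the samples of a MasterScan directory, derives the ion mode from the 'neg'/'pos' prefix and collects the *.csv and *.dta files. It is not LipidXplorer's import code. The function names are hypothetical. Returning no mode for an unprefixed name and rejecting more than one *.csv file per sample are assumptions.<br />
```python
from pathlib import Path
from typing import Dict, List, Optional


def sample_polarity(dir_name: str) -> Optional[str]:
    """Ion mode encoded at the start of a sample directory name."""
    if dir_name.startswith("neg"):
        return "negative"
    if dir_name.startswith("pos"):
        return "positive"
    return None  # assumption: mode not encoded in the name


def classify_files(file_names: List[str]) -> Dict[str, List[str]]:
    """Split the files of one sample into MS (*.csv) and MS/MS (*.dta) files."""
    return {
        "csv": sorted(f for f in file_names if f.lower().endswith(".csv")),
        "dta": sorted(f for f in file_names if f.lower().endswith(".dta")),
    }


def scan_masterscan_dir(root: str) -> Dict[str, dict]:
    """One entry per sample subdirectory; the sample name is the directory name."""
    samples = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue  # only subdirectories are samples
        files = classify_files([p.name for p in sub.iterdir() if p.is_file()])
        if len(files["csv"]) > 1:
            raise ValueError(f"{sub.name}: expected at most one *.csv file")
        samples[sub.name] = {"polarity": sample_polarity(sub.name), **files}
    return samples
```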
<br />
=====Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma-separated file, i.e. each line contains values <br />
separated by commas. For importing survey scan information (the MS data), LipidXplorer only recognizes *.csv files in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file represents the (precursor) mass spectrum, one peak per line. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre><br />
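Below is a minimal Python reader for this *.csv layout. It assumes plain 'm/z,intensity' lines without a header row. The function names are hypothetical, and this is not LipidXplorer's parser.<br />
```python
import csv
from typing import Iterable, List, Tuple


def parse_ms1_csv(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Parse '/precursor mass/, /intensity/' lines into (m/z, intensity) pairs."""
    peaks = []
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue  # skip empty lines
        if len(row) != 2:
            raise ValueError(f"expected 'mass,intensity', got {row!r}")
        peaks.append((float(row[0]), float(row[1])))
    return peaks


def read_ms1_csv(path: str) -> List[Tuple[float, float]]:
    """Read the MS1 peak list of one sample from a *.csv file."""
    with open(path, newline="") as fh:
        return parse_ms1_csv(fh)
```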
<br />
=====Import of MS/MS spectra using *.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list table. Its first line holds the precursor mass (m/z) <br />
and its charge, and the following lines hold the fragment masses with their intensities.<br />
<br />
<pre>/precursor mass/ /charge/<br />
/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
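Similarly, here is a minimal sketch of a *.dta reader in Python. It assumes whitespace-separated columns, a header line made of the precursor mass and an integer charge, and no comment lines. The names are hypothetical.<br />
```python
from typing import Iterable, List, Tuple


def parse_dta(lines: Iterable[str]) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Return (precursor mass, charge, [(fragment m/z, intensity), ...])."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        raise ValueError("empty *.dta file")
    precursor, charge = float(rows[0][0]), int(rows[0][1])  # header line
    peaks = [(float(mz), float(inten)) for mz, inten in rows[1:]]
    return precursor, charge, peaks


def read_dta(path: str) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Read one MS/MS spectrum from a *.dta file."""
    with open(path) as fh:
        return parse_dta(fh)
```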
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it works only with <br />
centroided data, which we also call peak lists, so data given as profiles are converted to centroided data.<br />
<br />
If the spectra are in mzXML format, all spectra that should go into one MasterScan <br />
(see [[#The MasterScan database]]) must be placed in one folder. This folder is the input <br />
LipidXplorer uses to import the spectra.<br />
<br />
If the spectra are in *.csv/*.dta format, follow the instructions in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv|Import of peak lists]]. Here, too, the folder containing all peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
by dragging the folder onto the text field. LipidXplorer fills in the name of the <br />
target MasterScan file automatically. To change it, press 'Browse' next to that field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but the user can select a custom file by pressing 'Browse' <br />
next to the *.ini file field.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to isolate a precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a precursor peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> must be given in Dalton. <br />
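The window definition can be sketched in a few lines of Python. The sketch also lists the MS peaks that would be co-isolated with a precursor. The helper names are hypothetical, and the window bounds are assumed to be inclusive.<br />
```python
from typing import List, Tuple


def selection_window(p: float, w: float) -> Tuple[float, float]:
    """Isolation window [p - w/2, p + w/2] around precursor peak p (w in Da)."""
    return (p - w / 2, p + w / 2)


def co_isolated(peaks_mz: List[float], p: float, w: float) -> List[float]:
    """MS peaks that fall into the selection window of precursor p."""
    lo, hi = selection_window(p, w)
    return [mz for mz in peaks_mz if lo <= mz <= hi]
```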
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is a tuple (start time, end time), with times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. The spectra are searched for these masses. For each mass found, the mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
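The correction described above can be sketched in Python as a piecewise-linear shift. This is an illustration under assumptions, not LipidXplorer's code. The shift is taken as theoretical minus observed m/z. Each standard is matched to the nearest peak within a hypothetical tolerance <code>tol_da</code>. Outside the range of the calibrants found, the nearest shift value is held constant.<br />
```python
from bisect import bisect_left
from typing import Callable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def find_calibrants(peaks: List[Peak], standards: List[float],
                    tol_da: float) -> List[Tuple[float, float]]:
    """Return (theoretical, observed) m/z pairs for standards found within tol_da."""
    found = []
    for theo in standards:
        nearest = min(peaks, key=lambda pk: abs(pk[0] - theo), default=None)
        if nearest is not None and abs(nearest[0] - theo) <= tol_da:
            found.append((theo, nearest[0]))
    return found


def shift_function(calibrants: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Piecewise-linear mass shift through the calibrant errors (constant outside)."""
    pts = sorted((obs, theo - obs) for theo, obs in calibrants)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def shift(mz: float) -> float:
        if not xs:
            return 0.0  # no calibrant found: no correction
        if mz <= xs[0]:
            return ys[0]
        if mz >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, mz)  # xs[i - 1] < mz <= xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)

    return shift


def recalibrate(peaks: List[Peak], standards: List[float], tol_da: float) -> List[Peak]:
    """Apply the interpolated shift to every peak of the spectrum."""
    shift = shift_function(find_calibrants(peaks, standards, tol_da))
    return [(mz + shift(mz), inten) for mz, inten in peaks]
```
With a single calibrant, the shift is constant over the whole spectrum. With two or more, it varies linearly between them.<br />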
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and increases the speed of lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It is given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan. All peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities read from the mzXML file, not from the original .wiff or .raw files. The threshold value is divided by the square root of the number of scans used for averaging, because averaging more scans increases the information content of the spectrum. This is modeled using the central limit theorem. <br />
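The correction amounts to t_eff = t / sqrt(n) for n averaged scans. Below is a minimal Python sketch, assuming that peaks exactly at the corrected threshold are kept. The function names are hypothetical.<br />
```python
import math
from typing import List, Tuple


def effective_threshold(threshold: float, n_scans: int) -> float:
    """Threshold corrected for the number of averaged scans: t / sqrt(n)."""
    if n_scans < 1:
        raise ValueError("n_scans must be >= 1")
    return threshold / math.sqrt(n_scans)


def apply_threshold(peaks: List[Tuple[float, float]], threshold: float,
                    n_scans: int) -> List[Tuple[float, float]]:
    """Keep (m/z, intensity) peaks at or above the corrected threshold."""
    t = effective_threshold(threshold, n_scans)
    return [(mz, inten) for mz, inten in peaks if inten >= t]
```
A threshold of 0 keeps every peak, which matches the rule that '0' switches a setting off.<br />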
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass must occur. For example, a min occupation of 0.5 requires each ion to be present in at least 50% of all samples. <br />
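Below is a minimal Python sketch of the occupation filter. It assumes the presence of each m/z has already been determined per sample; how LipidXplorer matches masses across samples is not shown. The names are hypothetical.<br />
```python
from typing import Dict, List


def occupation(present: List[bool]) -> float:
    """Fraction of samples in which a mass was found."""
    return sum(present) / len(present)


def filter_by_occupation(presence: Dict[float, List[bool]], min_occ: float) -> List[float]:
    """Masses whose occupation reaches at least min_occ (0 keeps everything)."""
    return sorted(mz for mz, flags in presence.items() if occupation(flags) >= min_occ)
```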
<br />
'''resolution gradient:''' the slope of the instrument's resolution as a function of m/z in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 with every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. Setting this gradient increases the accuracy of the spectra alignment. <br />
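With the Orbitrap figures above, the gradient is -50,000 / (1200 - 300), or about -55.6 per m/z. A linear resolution model can be sketched in Python as follows. The reference point (r_ref at mz_ref) and the starting resolution of 100,000 in the check are assumptions, since the text does not state where LipidXplorer anchors the gradient.<br />
```python
def gradient_from_points(mz1: float, r1: float, mz2: float, r2: float) -> float:
    """Change of resolution per 1 m/z between two observed (m/z, resolution) points."""
    return (r2 - r1) / (mz2 - mz1)


def resolution_at(mz: float, r_ref: float, mz_ref: float, gradient: float) -> float:
    """Linear model: resolution at mz given a (hypothetical) reference point."""
    return r_ref + gradient * (mz - mz_ref)
```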
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: given a tolerance <span class="texhtml">''a''</span>, a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
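Both rules translate directly into code. Below is a minimal Python sketch. It assumes that a ppm tolerance is converted to Dalton relative to the peak m/z (a = p * ppm * 1e-6). The names are hypothetical.<br />
```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a ppm tolerance into Dalton at the given m/z (assumed convention)."""
    return mz * ppm * 1e-6


def fits(m: float, p: float, a: float) -> bool:
    """Theoretical mass m fits peak p with tolerance a (Da): m in [p - a, p + a]."""
    return p - a <= m <= p + a


def same_peak(p1: float, p2: float, resolution: float) -> bool:
    """Peaks are equal if p1 in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r
```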
<br />
====Store all settings in a configuration====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored configuration, and '''Delete''' deletes one. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another or a new file.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra have been imported, lipids are identified with MFQL scripts. <br />
The MFQL queries are written in *.mfql files, <br />
and each file should contain only one query. <br />
The '''Run''' panel of the GUI is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The large window on the left lists all *.mfql scripts used <br />
for lipid identification. It is managed with the buttons <br />
on its right:<br />
* '''Add MFQL File''' adds a single file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files it contains are added.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file; a prompt asks for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan by clicking the green '''Browse''' button <br />
or by dragging the MasterScan file (or the folder containing it) onto the text field. The <br />
output file name is filled in automatically but can be changed by clicking the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. <br />
'''No head''' removes the header of the output file, and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that it can be viewed <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification and saves the result <br />
in the output file, which can be opened directly with the '''View''' button.<br />
'''View dump file''' opens the *.csv dump of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Each query opens in a<br />
separate tab. When a file is edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name, and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition from a given m/z value or the m/z value<br />
from a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and a sum composition constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' and '''hDB''' are the lower<br />
and upper bounds of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition'''; the result is shown in the text<br />
window below.<br />
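The search behind '''Mass-to-sum-composition''' can be sketched in Python as a brute-force enumeration. This is an illustration under stated assumptions, not the MS-Tools implementation:<br />
* element ranges are given as a Python dict instead of LipidXplorer's sc-constraint syntax;<br />
* the double bond equivalent is computed as C - H/2 + (N + P)/2 + 1;<br />
* the ion m/z is the monoisotopic mass corrected for the electron mass and divided by |charge|.<br />
```python
from itertools import product
from typing import Dict, List, Tuple

# Monoisotopic masses (u) of the most abundant isotopes; electron mass for ions.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def dbe(comp: Dict[str, int]) -> float:
    """Double bond equivalent, assuming N and P count as trivalent."""
    c, h = comp.get("C", 0), comp.get("H", 0)
    n, p = comp.get("N", 0), comp.get("P", 0)
    return c - h / 2 + (n + p) / 2 + 1


def mz_to_sum_compositions(mz: float, ranges: Dict[str, Tuple[int, int]],
                           charge: int, ldb: float, hdb: float,
                           acc_ppm: float) -> List[Tuple[Dict[str, int], float]]:
    """All compositions within `ranges` matching mz within acc_ppm and the DBE bounds."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    elements = sorted(ranges)
    hits = []
    for counts in product(*(range(ranges[e][0], ranges[e][1] + 1) for e in elements)):
        comp = dict(zip(elements, counts))
        mass = sum(MONO[e] * k for e, k in comp.items())
        ion_mz = (mass - charge * ELECTRON) / abs(charge)
        err_ppm = (ion_mz - mz) / mz * 1e6
        if abs(err_ppm) <= acc_ppm and ldb <= dbe(comp) <= hdb:
            hits.append((comp, err_ppm))
    return hits
```
The enumeration grows with the product of all range sizes, so realistic lipid constraints need much narrower ranges than a naive search would use.<br />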
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass'''; the result appears<br />
in the window below.<br />
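Conversely, here is a minimal Python sketch of '''Sum-composition-to-mass'''. It assumes monoisotopic masses and an electron-mass correction for charged ions. The parsing of 'C6 H13 O6'-style strings and all names are hypothetical.<br />
```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotopes; electron mass for ions.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
ELECTRON = 0.00054857990946


def parse_sum_composition(sc: str) -> Dict[str, int]:
    """'C6 H13 O6' or 'C6H13O6' -> {'C': 6, 'H': 13, 'O': 6}."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def sum_composition_to_mz(sc: str, charge: int) -> float:
    """Monoisotopic m/z of an ion sum composition (neutral mass if charge == 0)."""
    counts = parse_sum_composition(sc)
    unknown = set(counts) - set(MONO)
    if unknown:
        raise ValueError(f"no mass for element(s): {sorted(unknown)}")
    mass = sum(MONO[e] * n for e, n in counts.items())
    if charge == 0:
        return mass
    return (mass - charge * ELECTRON) / abs(charge)
```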
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These are the values<br />
LipidXplorer uses for isotopic correction, so the user can double-check here<br />
that everything works properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The masses listed for the isotopes are estimates used in LipidX and are not exact,<br />
but the abundance values are accurate.<br />
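One way to compute such abundances is to convolve, atom by atom, the natural isotope abundances of each element. The results are binned by nominal mass shift (M+0, M+1, ...). The sketch below shows this generic method, which is not necessarily the one LipidXplorer uses. The abundance table holds representative IUPAC values, and the names are hypothetical.<br />
```python
from typing import Dict, List

# Representative natural isotope abundances; list index = nominal mass shift.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def _convolve(a: List[float], b: List[float], max_len: int) -> List[float]:
    """Discrete convolution of two distributions, truncated to max_len bins."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_distribution(comp: Dict[str, int], n_peaks: int = 5) -> List[float]:
    """Probabilities of M+0, M+1, ... (nominal mass bins) for a sum composition."""
    dist = [1.0]
    for element, count in comp.items():
        for _ in range(count):
            dist = _convolve(dist, ABUNDANCE[element], n_peaks)
    return dist
```
Binning by nominal mass is consistent with the remark above that only the abundances, not the isotope masses, are exact.<br />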
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and<br />
fragment peaks. The isotope probabilities of a fragment are calculated by multiplying the<br />
probabilities of the fragment ('''F''') containing zero, one or more than one heavy isotopes with the<br />
corresponding probabilities of the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 is the probability that neither the fragment nor the neutral loss<br />
contains a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope and<br />
the neutral loss none. Conversely, F0N1 is the probability that the fragment contains no heavy<br />
isotope because the isotope is located in the neutral loss. <br />
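The products in the scheme can be illustrated with a simplified Python sketch. It considers only 13C and models the number of heavy carbons binomially. The real correction also covers other elements and higher isotopes, and the function names are hypothetical.<br />
```python
from math import comb
from typing import Dict

P_13C = 0.0107  # natural abundance of 13C


def p_heavy(n_carbons: int, k: int, p: float = P_13C) -> float:
    """Probability that exactly k of n carbons are 13C (binomial model)."""
    return comb(n_carbons, k) * p ** k * (1 - p) ** (n_carbons - k)


def fragment_isotope_terms(c_fragment: int, c_neutral_loss: int) -> Dict[str, float]:
    """F0N0, F1N0 and F0N1 as products of fragment and neutral-loss probabilities."""
    f0, f1 = p_heavy(c_fragment, 0), p_heavy(c_fragment, 1)
    n0, n1 = p_heavy(c_neutral_loss, 0), p_heavy(c_neutral_loss, 1)
    return {"F0N0": f0 * n0, "F1N0": f1 * n0, "F0N1": f0 * n1}
```
For instance, <code>fragment_isotope_terms(16, 8)</code> gives the three terms for a fragment with 16 carbons whose neutral loss carries 8 carbons.<br />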
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press<br />
'''Get Isotopic distribution'''; the corresponding values are shown in the window below.<br />
The sum composition can describe either a fragment ion or a neutral loss; this is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=472LipidXplorer Reference2011-01-21T14:09:25Z<p>Schwudke: /* Import of MS1 information using *.csv file format */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is a XML (eXtensible Markup Language) based common file format for mass spectrometric data. <br />
[Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459, 66 [http://dx.doi.org/10.1038/nbt1031 doi]) <br />
(Lin SM et al., Expert review of proteomics 2 (6): 839, 45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]) <br />
Not all mass spectrometers directly produce mzXML files but there are several tools available <br />
that generate mzXML files from native acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ SASHMI]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Currently there are converters available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files[http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversation of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) provided that the instruments software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
As an easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to provide the ability to import pre-processed peak-lists. Many vendors enable the functionality in their software to create *.dta files of MS/MS Spectra. In many instances one might be interested to import also the pre-processed peaklist of the MS1 which we support with the widely used *.csv file format. Both text file formats should be reasonable available as alternative for *mzXml. For the import of *.dta and *.csv files, some pre-conditions have to met:<br />
The import files have to be given in a certain directory structure, which is:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
|<br />
------------------------------------------------------ <br />
| | |<br />
| | |<br />
[neg_]Sample1/ [neg_]Sample2/ ... ... ... [neg_]SampleN/<br />
| | |<br />
/\ | /\ <br />
*.csv, /\ *.csv,<br />
[*.dta1, *.dta2, ...] *.csv, [*.dta1, *.dta2, ...]<br />
[*.dta1, *.dta2, ...] <br />
<br />
</pre><br />
<br />
The top level directory defines which samples go into the MasterScan database object. This are <br />
namely all samples occurring as subdirectories. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data<br><br />
<br />
&nbsp;2. a .dta files with the MS/MS data<br><br />
MS precursor intensities are set to a) 1 - when *.dta with this precursor is present b) 0 - when no *.dta with this precursor m/z was found in a sample <br><br />
&nbsp; 3. one .csv file with the MS data and a number of .dta files containing MS/MS data<br />
<br />
<br> IMPORTANT! In the names of the sub-directory folders it should be ciphered if its the <br />
data is obtained in positive or in negative mode. This is done as follows: <br />
* if a directory has 'neg' at the beginning of its name, the according sample is negative. <br />
* if a directory has 'pos' at the beginning of its name, the according sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma separated file, i.e. every line in the file contains data which is <br />
separated by commas. LipidXplorer will solely recognize *.csv files for importing survey scan information(the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv is utilized for representing the (precursor-)mass spectrum. For example - a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre><br />
<br />
=====*.dta=====<br />
<br />
Many mass spectrometers software are able to generate a peak lists of MS/MS spectra and save them in <br />
the *.dta file format. It contains a peak list table, which has as head the precursor mass in m/z <br />
and its charge and the tables content are masses with the according intensity.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example - the content of a *.dta file of the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroid data, which we also call peak lists. This means that data given as profiles is converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all which should be put in one MasterScan <br />
(see [[#The MastersScan database]]) should also be in one folder. The folder is the information <br />
which is given to LipidXplorer to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import *.dta / *.csv files]]. Also here, the folder where all the peak lists are contained <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder into the text field with your mouse. LipidXplorer will fill the fields for the <br />
target MasterScan file automatically. To change this press 'Browse' next to the file.<br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
The import starts with pressing 'Start import'.<br />
<br />
The tab contains various possibilities of <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
There is a standard *.ini file provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select an own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings holds that '0' switches it off.'' <br />
<br />
'''selection window:''' describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window <span class="texhtml">''w''</span> of a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
<br />
'''timerange:''' defines time window for all spectra which should be imported. It is a tuple with (start time, end time) with the time is given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
<br />
'''massrange:''' restrict the imported masses. This helps to decrease import time, resources the speed of lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' The tolerance value is the error LipidXplorer allows for a lipid to be identified. The unit has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' is the minimum intensity a peak has to have to be in the MasterScan. ''Be aware that the intensity values may be different in your mzXML file than in your mass spec software (like Analyst or Xcalibur)!'' Note that for the threshold value the peak intensity is read from the mzXML file and not from the original .wiff or .raw files. All the other peaks below threshold are dismissed. The threshold value is corrected by dividing it with the square root of the number of scans used by the averaging. This is due to the increase of information with more scans. The central limit theorem is used to model this. <br />
<br />
'''min occupation:''' it states the minimum relative number of acquisitions where a mass has to occur. For example: a min occupation of 0.5 states, that each ion should be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' is the gradient of the machines resolution in MS and MS/MS mode. E.g., a value of -78.5 means that the resolution decreases about 78.5 with every increase of 1 m/z. This simulates a typical behavior of mass spectrometers. The resolution decreases with higher masses. On Orbitrap machines we discovered a decrease of 50,000 from m/z 300 to m/z 1200. This gradient value increases the accuracy of the spectra alignment. <br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> measured with a given tolerance <span class="texhtml">''a''</span> fits to a peak <span class="texhtml">''p''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file which is stated under '''Select *.ini ''' configurations file'''. With '''Browse'''one can choose another or a new file.'''<br />
<br />
==Run queries on the MasterScan==<br />
<br />
MFQL scripts are used for lipid identification, after the spectra data <br />
was imported. Therefore MFQL queries are written in so-called *.mfql files <br />
(with the ending *.mfql) where each file should contain just one query. <br />
The GUI panel '''Run''' is the site where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left contains all *.mfql scripts which are used <br />
for the lipid identification. This window is managed by the the buttons <br />
on its right side:<br />
* '''Add MFQL File''' will add one file<br />
* '''Add MFQL Directory''' lets you chose a directory containing *.mfql files which are all uploaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on it.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt will open and ask you about the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, the MasterScan has <br />
to be chosen. This is done by clicking on the green '''Browse''' button or by dragging <br />
the MasterScan file or the folder in which it is onto the text field. The <br />
output file is automatically filled, but can be changed by clicking on the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off on the lower <br />
site of the panel. There is also the option for generation of a complement <br />
MasterScan. This is a spectral database containing all entries from the <br />
original chosen MasterScan but the identified entries together with their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. <br />
'''No head''' removes the head of the output file and '''Compress''' removes the names <br />
of the queries in the output file. This can be helpful if you want to do some<br />
automatic post-processing. The option '''Tab limited''' changes the output <br />
format from comma separated file format to tab separated file format. <br />
<br />
'''Dump MasterScan''' lets you write down the content of the MasterScan experimental <br />
database to a comma separated file. This lets you view its content <br />
(in Excel for example).<br />
<br />
With '''Run LipidX''' the lipid identification is started. The result is saved <br />
in the output file. With the '''View''' button this file can be viewed on the spot.<br />
With '''View dump file''' the *.csv file of the MasterScan can be viewed.<br />
<br />
===The editor panel===<br />
<br />
With the editor it is easy to write queries for LipidX. Every query is opened in a<br />
separate tab. If a file is edited the '''Save''' button changes the color to red <br />
to remind the user to save to file before using the query. '''SaveAs''' will store<br />
the query under a certain file name and '''Close''' will close the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition out of a given m/z value or the other way<br />
round. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Input a m/z value under '''m/z value''' and an sc-constraint under<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower border and '''hDB'''<br />
the higher border of the double bond equivalent. In '''chg''' the charge has<br />
to be given and in '''acc''' the tolerance value in ppm. Then press<br />
'''Mass-to-sum-composition''' and the result will be shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Input a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result will occure<br />
in the window below.<br />
<br />
====Isotopes of molecules====<br />
<br />
shows the abundances of the isotopes of a given sum composition. Those values<br />
are the ones used in LipidXplorer for isotopic correction. Here the user can double<br />
check if everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Input a sum composition under '''Ion sum composition''' and press '''Get Isotopic <br />
distribution'''. The list of isotopes is not 100% correct with the masses. This<br />
is an estimation used in LipidX. But the abundance values are 100% accurate.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The scheme above depicts the values LipidXplorer uses to correct precursor and<br />
fragment peaks for isotopes. The isotope probabilities of a fragment are calculated by multiplying the<br />
probabilities of the fragment ('''F''') carrying zero, one or more than one heavy isotope with the<br />
probabilities of the associated neutral loss ('''N''') carrying them. <br />
<br />
For example, F0N0 is the probability that neither the fragment nor the neutral loss<br />
contains a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope<br />
while the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no heavy isotope because the isotope is located in the neutral loss. <br />
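<br />
The FiNj products can be sketched in a few lines. In this minimal illustration, the fragment and neutral-loss isotope patterns are taken as given lists. The numbers in the example are made up and only show the arithmetic. The function names are hypothetical. The sketch also shows how the precursor isotope M+1 splits between fragments seen at F+0 (F0N1) and at F+1 (F1N0).<br />
<br />
```python
def fragment_isotope_table(f_pattern, n_pattern):
    """Return {(i, j): F_i * N_j}.

    f_pattern[i]: probability that the fragment carries i extra nominal mass
    units from heavy isotopes; n_pattern[j]: the same for the neutral loss.
    Multiplying treats both parts as independent, as in the scheme above.
    """
    return {(i, j): f * n
            for i, f in enumerate(f_pattern)
            for j, n in enumerate(n_pattern)}


def precursor_isotope(table, k):
    """Probability of the precursor isotope M+k: sum of all FiNj with i + j == k."""
    return sum(p for (i, j), p in table.items() if i + j == k)


# Made-up two-peak patterns, only to illustrate the arithmetic.
table = fragment_isotope_table([0.9, 0.1], [0.8, 0.2])
m1 = precursor_isotope(table, 1)     # F1N0 + F0N1 = 0.08 + 0.18
print(round(m1, 2))                  # 0.26
print(round(table[(0, 1)] / m1, 3))  # 0.692: M+1 fragments appearing at F+0
```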
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
The isotope probabilities calculated by LipidXplorer can be viewed in MS-Tools.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press<br />
'''Get Isotopic distribution'''; the corresponding values are shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss, which is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=471LipidXplorer Reference2011-01-21T14:06:36Z<p>Schwudke: /* *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is a common XML (eXtensible Markup Language) based file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that convert natively acquired files to mzXML. The open source project Sashimi <br />
([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to import pre-processed peak lists. The software of many vendors can export MS/MS spectra as *.dta files. Often one also wants to import the pre-processed peak list of the MS1 spectrum, which LipidXplorer supports through the widely used *.csv file format. Both text file formats are reasonable alternatives to *.mzXML. The import of *.dta and *.csv files has some preconditions:<br />
The import files must be organized in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
+-- [neg_]Sample1/<br />
|     +-- *.csv<br />
|     +-- *.dta1, *.dta2, ...<br />
|<br />
+-- [neg_]Sample2/<br />
|     +-- *.csv<br />
|     +-- *.dta1, *.dta2, ...<br />
|<br />
+-- ...<br />
|<br />
+-- [neg_]SampleN/<br />
      +-- *.csv<br />
      +-- *.dta1, *.dta2, ...<br />
</pre><br />
<br />
The top level directory defines which samples go into the MasterScan database object: every subdirectory is treated as one sample. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data only,<br><br />
<br />
&nbsp;2. .dta files with the MS/MS data only; in this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, or b) 0 if no *.dta file with this precursor m/z was found in the sample, or<br><br />
&nbsp;3. one .csv file with the MS data and a number of .dta files with the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sample directory must encode whether its <br />
data was acquired in positive or in negative mode: <br />
* if the directory name begins with 'neg', the sample is treated as a negative mode sample; <br />
* if the directory name begins with 'pos', the sample is treated as a positive mode sample.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
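<br />
The directory rules above can be checked before an import with a short script. This minimal sketch is not part of LipidXplorer. The helper name scan_masterscan_dir is hypothetical, and it assumes that MS/MS peak lists use the file extension .dta. The script reads the polarity from the 'neg'/'pos' prefix and reports None when neither prefix is present, a case the rules above do not cover.<br />
<br />
```python
from pathlib import Path


def scan_masterscan_dir(root):
    """List every sample directory below `root` with its polarity and files."""
    samples = {}
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        name = sample_dir.name  # the sample name used in LipidXplorer
        if name.startswith("neg"):
            polarity = "negative"
        elif name.startswith("pos"):
            polarity = "positive"
        else:
            polarity = None  # not covered by the naming rule
        samples[name] = {
            "polarity": polarity,
            "csv": sorted(f.name for f in sample_dir.glob("*.csv")),
            "dta": sorted(f.name for f in sample_dir.glob("*.dta")),
        }
    return samples
```
<br />
For example, scan_masterscan_dir("MasterScan Dir") returns one entry per sample directory. A sample with an empty "csv" list corresponds to case 2 above, where precursor intensities are derived from the *.dta files.<br />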
<br />
=====Import of MS1 information using *.csv file format =====<br />
<br />
A *.csv file is a comma-separated text file, i.e. every line contains values that are <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file thus represents the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.3<br />
</pre><br />
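<br />
The following is a minimal sketch for reading such an MS1 peak list with Python's standard csv module, for example to check a file before importing it. The helper read_ms1_csv is hypothetical and not part of LipidXplorer. It takes the file content as a string. For a file on disk, pass open(path).read().<br />
<br />
```python
import csv
import io


def read_ms1_csv(text):
    """Parse 'm/z,intensity' lines into a list of (float, float) tuples."""
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        mz, intensity = float(row[0]), float(row[1])  # float() ignores surrounding spaces
        peaks.append((mz, intensity))
    return peaks


peaks = read_ms1_csv("701.4101,20952.3\n701.5598,4284.7\n702.4135,6333\n")
print(peaks[2])  # (702.4135, 6333.0)
```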
<br />
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list table whose header line holds the precursor m/z <br />
and its charge; each following line holds a mass and its intensity.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor m/z 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
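<br />
A *.dta file can be read the same way. In the following minimal sketch, the helper read_dta is hypothetical, and the fields are assumed to be separated by whitespace, as in the example above.<br />
<br />
```python
def read_dta(text):
    """Parse *.dta text: header 'precursor_mz charge', then 'mass intensity' lines."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    precursor_mz, charge = float(rows[0][0]), int(rows[0][1])
    fragments = [(float(mass), float(intensity)) for mass, intensity in rows[1:]]
    return precursor_mz, charge, fragments


example = """585.9765 1
197.32957 33132.1
197.33095 12631.7
568.45007 241767.3
569.29065 14319.8
"""
precursor, z, fragments = read_dta(example)
print(precursor, z, len(fragments))  # 585.9765 1 4
```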
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally, it works only with <br />
centroided data, which we also call peak lists, so data given as profiles is converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all spectra that should go into one MasterScan <br />
(see [[#The MastersScan database]]) have to be in one folder. This folder is what <br />
LipidXplorer is given to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv|Import of peak lists]]. Here, too, the folder containing all the peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
by dragging the folder onto the text field with your mouse. LipidXplorer fills in the <br />
target MasterScan file automatically; to change it, press 'Browse' next to the file field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, the value '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to select the precursor for fragmentation. A selection window of width <span class="texhtml">''w''</span> around a peak <span class="texhtml">''p''</span> covers the m/z range <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
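<br />
The interval <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math> translates directly into code. In this minimal sketch, the helper names are hypothetical and the window width of 1 Da is an assumed example value. The sketch lists which MS1 peaks from the *.csv excerpt above fall into the window of a precursor and would therefore be co-isolated with it.<br />
<br />
```python
def selection_window(p, w):
    """Return the m/z interval [p - w/2, p + w/2] isolated around precursor p (w in Da)."""
    return p - w / 2.0, p + w / 2.0


def co_isolated(p, w, ms1_mz):
    """MS1 m/z values that fall inside the selection window of precursor p."""
    low, high = selection_window(p, w)
    return [mz for mz in ms1_mz if low <= mz <= high]


print(co_isolated(703.5752, 1.0, [702.5435, 703.547, 703.5752, 704.5786]))
# [703.547, 703.5752]
```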
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is a tuple (start time, end time), with both times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. The standard masses are searched for in the spectra; if one is found, its mass error is used to calculate a mass shift that is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
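<br />
One plausible reading of this correction is sketched below. It is a minimal illustration, not LipidXplorer's implementation, and the function names and calibrant values are hypothetical. Each calibrant found in the spectrum yields a shift (reference minus measured). Between calibrants, the shift is interpolated linearly. Outside the calibrant range, the nearest shift is used, which is an assumption because the description above does not cover that case. With a single calibrant, the shift is constant.<br />
<br />
```python
from bisect import bisect_right


def calibration_shift(mz, calibrants):
    """Mass shift to add at `mz`.

    calibrants: (measured_mz, reference_mz) pairs for standards found in the
    spectrum. Linear interpolation between calibrants; the nearest shift is
    used outside their range (assumption).
    """
    points = sorted((measured, reference - measured) for measured, reference in calibrants)
    if mz <= points[0][0]:
        return points[0][1]
    if mz >= points[-1][0]:
        return points[-1][1]
    k = bisect_right([m for m, _ in points], mz)
    (m0, s0), (m1, s1) = points[k - 1], points[k]
    return s0 + (s1 - s0) * (mz - m0) / (m1 - m0)


def recalibrate(peaks, calibrants):
    """Apply the shift to every (mz, intensity) peak of a spectrum."""
    return [(mz + calibration_shift(mz, calibrants), i) for mz, i in peaks]


# Hypothetical calibrants, both measured 4 ppm too high.
cal = [(500.002, 500.0), (1000.004, 1000.0)]
print(round(calibration_shift(750.003, cal), 6))  # -0.003
```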
<br />
'''massrange:''' restricts the range of imported masses. This reduces import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer accepts when identifying a lipid. It is given either in parts per million (ppm) or in Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values may be different in your mzXML file than in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities read from the mzXML file, not from the original .wiff or .raw files. Because averaging more scans reduces the noise level, the threshold value is divided by the square root of the number of scans used for averaging; this correction is modeled on the central limit theorem. <br />
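<br />
The square-root correction is easy to reproduce. In the following minimal sketch, the function names and the example numbers (a threshold of 1000 and 4 averaged scans) are hypothetical. With a threshold of 0, every peak passes, which matches the rule that '0' switches a setting off.<br />
<br />
```python
import math


def effective_threshold(threshold, n_scans):
    """Threshold after averaging n_scans scans: threshold / sqrt(n_scans)."""
    return threshold / math.sqrt(n_scans)


def apply_threshold(peaks, threshold, n_scans):
    """Keep (mz, intensity) peaks that reach the corrected threshold."""
    t = effective_threshold(threshold, n_scans)
    return [(mz, i) for mz, i in peaks if i >= t]


# Threshold 1000 with 4 averaged scans gives a cut-off of 1000 / 2 = 500.
print(apply_threshold([(701.5598, 420.0), (702.4135, 633.0)], 1000, 4))
# [(702.4135, 633.0)]
```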
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that an ion must be present in at least 50% of all samples. <br />
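<br />
Expressed as code, the occupation check is a simple ratio test. The following is a minimal sketch with a hypothetical function name. A min occupation of 0 accepts every ion.<br />
<br />
```python
def meets_min_occupation(found_in, n_samples, min_occupation):
    """True if an ion found in `found_in` of `n_samples` samples meets the required fraction."""
    return found_in / n_samples >= min_occupation


print(meets_min_occupation(3, 6, 0.5))  # True: present in 50% of the samples
print(meets_min_occupation(2, 6, 0.5))  # False: present in only 33% of the samples
```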
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 for every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses; on Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. Using this gradient increases the accuracy of the spectra alignment. <br />
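<br />
The following minimal sketch shows a linear resolution model. The reference m/z at which the base resolution applies (mz_ref) is a hypothetical parameter, since the setting above only names the resolution and its gradient. The function names are also hypothetical. The second function derives a gradient from two points. Only the difference in resolution matters, so the Orbitrap observation above corresponds to about -55.6 per m/z, whatever the absolute resolution values are.<br />
<br />
```python
def resolution_at(mz, resolution_ref, gradient, mz_ref):
    """Linear model: R(mz) = resolution_ref + gradient * (mz - mz_ref)."""
    return resolution_ref + gradient * (mz - mz_ref)


def gradient_between(mz1, r1, mz2, r2):
    """Change in resolution per 1 m/z between two measured points."""
    return (r2 - r1) / (mz2 - mz1)


print(resolution_at(401.0, 100000, -78.5, 400.0))             # 99921.5
print(round(gradient_between(300, 100000, 1200, 50000), 1))  # -55.6
```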
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> within a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
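<br />
The two rules above can be written as two predicates. In this minimal sketch, the function names are hypothetical, and so is the ppm-to-Da conversion, which assumes the tolerance is taken relative to the peak mass <span class="texhtml">''p''</span>. The example values come from the *.csv excerpt above.<br />
<br />
```python
def ppm_to_da(p, ppm):
    """Convert a ppm tolerance to Da relative to peak p (assumed conversion)."""
    return p * ppm * 1e-6


def fits(m, p, a):
    """Theoretical mass m fits peak p if m lies in [p - a, p + a] (a in Da)."""
    return p - a <= m <= p + a


def considered_equal(p1, p2, resolution):
    """Peaks p1 and p2 are equal if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r


print(fits(703.5750, 703.5752, ppm_to_da(703.5752, 5)))  # True
print(considered_equal(705.5822, 705.5908, 100000))      # False: r is about 0.0071 Da
print(considered_equal(705.5822, 705.5908, 50000))       # True: r is about 0.0141 Da
```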
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting, and '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another or a new file.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
MFQL scripts are used for lipid identification after the spectra data has been <br />
imported. MFQL queries are written in *.mfql files (with the file extension <br />
*.mfql), and each file should contain just one query. <br />
The GUI panel '''Run''' is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The large window on the left lists all *.mfql scripts used <br />
for the lipid identification. It is managed with the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds one file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are added.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan by clicking the green <br />
'''Browse''' button or by dragging the MasterScan file, or the folder containing it, <br />
onto the text field. The output file is filled in automatically but can be changed <br />
by clicking the grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file so that it can be inspected <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification and saves the result <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' opens the *.csv dump file of the MasterScan.<br />
<br />
</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=470LipidXplorer Reference2011-01-21T14:05:27Z<p>Schwudke: /* *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is a XML (eXtensible Markup Language) based common file format for mass spectrometric data. <br />
[Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459, 66 [http://dx.doi.org/10.1038/nbt1031 doi]) <br />
(Lin SM et al., Expert review of proteomics 2 (6): 839, 45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]) <br />
Not all mass spectrometers directly produce mzXML files but there are several tools available <br />
that generate mzXML files from native acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ SASHMI]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Currently there are converters available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files[http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversation of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) provided that the instruments software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
As an easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to provide the ability to import pre-processed peak-lists. Many vendors enable the functionality in their software to create *.dta files of MS/MS Spectra. In many instances one might be interested to import also the pre-processed peaklist of the MS1 which we support with the widely used *.csv file format. Both text file formats should be reasonable available as alternative for *mzXml. For the import of *.dta and *.csv files, some pre-conditions have to met:<br />
The import files have to be given in a certain directory structure, which is:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
|<br />
------------------------------------------------------ <br />
| | |<br />
| | |<br />
[neg_]Sample1/ [neg_]Sample2/ ... ... ... [neg_]SampleN/<br />
| | |<br />
/\ | /\ <br />
*.csv, /\ *.csv,<br />
[*.dta1, *.dta2, ...] *.csv, [*.dta1, *.dta2, ...]<br />
[*.dta1, *.dta2, ...] <br />
<br />
</pre><br />
<br />
The top level directory defines which samples go into the MasterScan database object. This are <br />
namely all samples occurring as subdirectories. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data<br><br />
<br />
&nbsp;2. a .dta files with the MS/MS data<br><br />
MS precursor intensities are set to a) 1 - when *.dta with this precursor is present b) 0 - when no *.dta with this precursor m/z was found in a sample <br><br />
&nbsp; 3. one .csv file with the MS data and a number of .dta files containing MS/MS data<br />
<br />
<br> IMPORTANT! In the names of the sub-directory folders it should be ciphered if its the <br />
data is obtained in positive or in negative mode. This is done as follows: <br />
* if a directory has 'neg' at the beginning of its name, the according sample is negative. <br />
* if a directory has 'pos' at the beginning of its name, the according sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma separated file, i.e. every line in the file contains data which is <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the information of <br />
the mass spectrometers survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv is utilized for representing the (precursor-)mass spectrum. For example - a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.<br />
</pre><br />
<br />
=====*.dta=====<br />
<br />
Many mass spectrometers software are able to generate a peak lists of MS/MS spectra and save them in <br />
the *.dta file format. It contains a peak list table, which has as head the precursor mass in m/z <br />
and its charge and the tables content are masses with the according intensity.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example - the content of a *.dta file of the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroid data, which we also call peak lists. This means that data given as profiles is converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all which should be put in one MasterScan <br />
(see [[#The MastersScan database]]) should also be in one folder. The folder is the information <br />
which is given to LipidXplorer to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import *.dta / *.csv files]]. Also here, the folder where all the peak lists are contained <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder into the text field with your mouse. LipidXplorer will fill the fields for the <br />
target MasterScan file automatically. To change this press 'Browse' next to the file.<br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
The import starts with pressing 'Start import'.<br />
<br />
The tab contains various possibilities of <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
There is a standard *.ini file provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select an own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings holds that '0' switches it off.'' <br />
<br />
'''selection window:''' describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window <span class="texhtml">''w''</span> of a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
<br />
'''timerange:''' defines time window for all spectra which should be imported. It is a tuple with (start time, end time) with the time is given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
<br />
'''massrange:''' restrict the imported masses. This helps to decrease import time, resources the speed of lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' The tolerance value is the error LipidXplorer allows for a lipid to be identified. The unit has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' is the minimum intensity a peak has to have to be in the MasterScan. ''Be aware that the intensity values may be different in your mzXML file than in your mass spec software (like Analyst or Xcalibur)!'' Note that for the threshold value the peak intensity is read from the mzXML file and not from the original .wiff or .raw files. All the other peaks below threshold are dismissed. The threshold value is corrected by dividing it with the square root of the number of scans used by the averaging. This is due to the increase of information with more scans. The central limit theorem is used to model this. <br />
<br />
'''min occupation:''' it states the minimum relative number of acquisitions where a mass has to occur. For example: a min occupation of 0.5 states, that each ion should be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' is the gradient of the machines resolution in MS and MS/MS mode. E.g., a value of -78.5 means that the resolution decreases about 78.5 with every increase of 1 m/z. This simulates a typical behavior of mass spectrometers. The resolution decreases with higher masses. On Orbitrap machines we discovered a decrease of 50,000 from m/z 300 to m/z 1200. This gradient value increases the accuracy of the spectra alignment. <br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> measured with a given tolerance <span class="texhtml">''a''</span> fits to a peak <span class="texhtml">''p''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math> <br />
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file which is stated under '''Select *.ini ''' configurations file'''. With '''Browse'''one can choose another or a new file.'''<br />
<br />
==Run queries on the MasterScan==<br />
<br />
MFQL scripts are used for lipid identification, after the spectra data <br />
was imported. Therefore MFQL queries are written in so-called *.mfql files <br />
(with the ending *.mfql) where each file should contain just one query. <br />
The GUI panel '''Run''' is the site where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left contains all *.mfql scripts which are used <br />
for the lipid identification. This window is managed by the the buttons <br />
on its right side:<br />
* '''Add MFQL File''' will add one file<br />
* '''Add MFQL Directory''' lets you chose a directory containing *.mfql files which are all uploaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on it.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt will open and ask you about the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, the MasterScan has <br />
to be chosen. This is done by clicking on the green '''Browse''' button or by dragging <br />
the MasterScan file or the folder in which it is onto the text field. The <br />
output file is automatically filled, but can be changed by clicking on the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off on the lower <br />
site of the panel. There is also the option for generation of a complement <br />
MasterScan. This is a spectral database containing all entries from the <br />
original chosen MasterScan but the identified entries together with their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' change the format of the output slightly. <br />
'''No head''' removes the head of the output file and '''Compress''' removes the names <br />
of the queries in the output file. This can be helpful if you want to do some<br />
automatic post-processing. The option '''Tab limited''' changes the output <br />
format from comma separated file format to tab separated file format. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that it can be inspected <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv file of the MasterScan dump.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file is edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition of a given m/z value or, conversely, the m/z<br />
value of a given sum composition.<br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under<br />
'''sc-constraint or sum composition'''. '''lDB''' and '''hDB''' are the lower and<br />
upper bounds of the double bond equivalent. Enter the charge in '''chg''' and<br />
the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition'''; the result is shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass'''; the result appears<br />
in the window below.<br />
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
that the correction works as expected.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are only estimates used internally by LipidX,<br />
but the abundance values are accurate.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The scheme above depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities for the fragments are calculated by multiplying the<br />
probabilities of the fragment ('''F''') carrying no, one or more than one heavy isotope with the<br />
probabilities of the associated neutral loss ('''N''') doing so.<br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains<br />
a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no heavy isotope because it is located in the neutral loss.<br />
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''';<br />
the corresponding values are shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss. This is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=469LipidXplorer Reference2011-01-21T14:03:44Z<p>Schwudke: /* *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Review of Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from the natively acquired files. The open source project <br />
[http://sashimi.sourceforge.net/ Sashimi] offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW],<br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and<br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument vendor's software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
To make LipidXplorer usable on a wide range of mass spectrometric platforms, it can import pre-processed peak lists. Many vendors' software can export MS/MS spectra as *.dta files. In many instances one also wants to import the pre-processed peak list of the MS1 (survey) scan, which is supported via the widely used *.csv file format. Both text formats are a reasonable alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files must be arranged in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
   |<br />
   +-- [neg_]Sample1/<br />
   |       *.csv<br />
   |       [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- [neg_]Sample2/<br />
   |       *.csv<br />
   |       [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- ...<br />
   |<br />
   +-- [neg_]SampleN/<br />
           *.csv<br />
           [*.dta1, *.dta2, ...]<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object, <br />
namely all samples occurring as its subdirectories. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. one .csv file with the MS data,<br><br />
<br />
&nbsp;2. only .dta files with the MS/MS data; in this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, and b) 0 if no *.dta file with this precursor m/z was found in the sample, or<br><br />
&nbsp;3. one .csv file with the MS data and a number of .dta files containing the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sample directory must encode whether its <br />
data was acquired in positive or in negative mode, as follows: <br />
* if a directory name begins with 'neg', the corresponding sample is treated as negative mode.<br />
* if a directory name begins with 'pos', the corresponding sample is treated as positive mode.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
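The directory rules above can be checked before an import. The following Python sketch walks a MasterScan folder, uses each subdirectory name as the sample name, derives the polarity from the 'neg'/'pos' prefix and lists the *.csv and *.dta files. The function names, the *.dta file extension, the case-sensitive prefix test and the limit of one *.csv file per sample are assumptions made for illustration. This is not LipidXplorer's own import code.<br />

```python
from pathlib import Path


def sample_polarity(dir_name):
    """Return '-' for negative mode, '+' for positive mode, None otherwise.

    A 'neg' prefix marks negative mode and a 'pos' prefix marks positive
    mode. The case-sensitive test and the None fallback are assumptions.
    """
    if dir_name.startswith("neg"):
        return "-"
    if dir_name.startswith("pos"):
        return "+"
    return None


def scan_masterscan_dir(root):
    """Describe every sample sub-directory of a *.csv/*.dta import folder."""
    samples = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue  # plain files in the top-level folder are not samples
        csv_files = sorted(p.name for p in sub.glob("*.csv"))
        dta_files = sorted(p.name for p in sub.glob("*.dta"))
        if len(csv_files) > 1:  # assumption: one MS peak list per sample
            raise ValueError(f"{sub.name}: more than one *.csv file")
        samples[sub.name] = {  # the sample name is the directory name
            "polarity": sample_polarity(sub.name),
            "ms_csv": csv_files[0] if csv_files else None,
            "msms_dta": dta_files,
        }
    return samples
```

A sample that has only *.dta files shows up with `ms_csv` set to `None`. This corresponds to case 2 above, where precursor intensities are set to 1 or 0.<br />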
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated text file, i.e. every line contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/<br />
</pre> <br />
The *.csv file is thus a peak list representing the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3<br />
701.5598,4284.7<br />
702.4135,6333<br />
702.5435,23323.7<br />
703.547,7105.8<br />
703.5752,218373.4<br />
704.5786,81777.7<br />
705.5009,253758<br />
705.528,18535.5<br />
705.5822,8314.5<br />
705.5908,35523.1<br />
706.5044,107847.<br />
</pre><br />
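Below is a minimal Python sketch for reading such a peak list, for example to inspect it before importing. `read_ms_csv` is a hypothetical helper. It reads the first two columns as m/z and intensity, ignores any further columns and skips blank lines. These choices are assumptions, not documented LipidXplorer behavior.<br />

```python
import csv
import io


def read_ms_csv(text):
    """Parse an MS peak list with 'm/z,intensity' on every line.

    Returns (mz, intensity) tuples sorted by m/z. Extra columns are
    ignored and blank lines are skipped (assumptions of this sketch).
    """
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue
        peaks.append((float(row[0]), float(row[1])))
    peaks.sort()
    return peaks
```

To read a file on disk, pass its content, e.g. `read_ms_csv(open("Sample1/ms.csv").read())`.<br />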
<br />
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. Such a file contains a peak list table whose header line holds the precursor mass (m/z) <br />
and its charge; the following lines list the fragment masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
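The *.dta layout can be parsed in a similar way. The hypothetical `read_dta` helper below assumes whitespace-separated columns, as in the example above. It returns the precursor m/z, its charge and the list of fragment peaks.<br />

```python
def read_dta(text):
    """Parse a *.dta MS/MS peak list.

    Line 1 holds the precursor m/z and its charge; every following line
    holds a fragment m/z and its intensity, separated by whitespace.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty *.dta file")
    precursor = float(lines[0][0])
    charge = int(lines[0][1])  # int() also accepts a leading '+'
    fragments = [(float(cols[0]), float(cols[1])) for cols in lines[1:]]
    return precursor, charge, fragments
```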
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroided data, which we also call peak lists, so data given as profiles is converted to centroided data.<br />
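As an illustration, a simple textbook centroiding scheme is sketched below in Python. Every local maximum of the profile becomes one peak, and its m/z is the intensity-weighted mean of its flanks. This is a generic method chosen for the example. The documentation does not state which centroiding algorithm LipidXplorer uses, and `centroid` is a hypothetical name.<br />

```python
def centroid(profile):
    """Convert a profile spectrum into a peak list (generic sketch).

    profile: list of (mz, intensity) sorted by m/z. Each local maximum
    becomes one centroid. Its m/z is the intensity-weighted mean of the
    points on both flanks that keep decreasing away from the apex, and its
    intensity is the apex intensity.
    """
    mz = [p[0] for p in profile]
    y = [p[1] for p in profile]
    peaks = []
    for i in range(1, len(y) - 1):
        if not (y[i] > y[i - 1] and y[i] >= y[i + 1]):
            continue  # not an apex
        lo = i
        while lo > 0 and 0 < y[lo - 1] < y[lo]:
            lo -= 1  # walk down the left flank
        hi = i
        while hi < len(y) - 1 and 0 < y[hi + 1] < y[hi]:
            hi += 1  # walk down the right flank
        weight = sum(y[lo:hi + 1])
        c = sum(m * v for m, v in zip(mz[lo:hi + 1], y[lo:hi + 1])) / weight
        peaks.append((c, y[i]))
    return peaks
```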
<br />
If the spectra are given in mzXML file format, all files that should go into one MasterScan <br />
(see [[#The MasterScan database]]) must be placed in one folder. This folder is the input <br />
that LipidXplorer uses to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv]]. Here too, the folder containing all the peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder onto the text field with your mouse. LipidXplorer fills in the field for the <br />
target MasterScan file automatically; to change it, press 'Browse' next to that field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file field, <br />
the user can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.''<br />
<br />
'''selection window:''' the width of the window that the mass spectrometer uses to select the precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Da.<br />
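The window formula translates directly into code. The sketch below uses hypothetical helper names. It also shows which MS1 peaks would be co-isolated with a precursor for a given window width w.<br />

```python
def selection_window(p, w):
    """Isolation window [p - w/2, p + w/2] (Da) around precursor peak p."""
    return p - w / 2.0, p + w / 2.0


def co_isolated(precursor, ms1_peaks, w):
    """MS1 m/z values that fall into the precursor's selection window."""
    lo, hi = selection_window(precursor, w)
    return [m for m in ms1_peaks if lo <= m <= hi]
```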
<br />
'''timerange:''' defines the time window of all spectra to be imported. It is a tuple (start time, end time), with times given in seconds.<br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. The standard masses are searched for in the spectra. If a standard is found, its mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function.<br />
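Below is a minimal sketch of the described correction. It assumes sorted peak lists and a tolerance in Da for finding each standard. The shift of every found standard is its expected minus its observed m/z. One standard gives a constant shift, and several standards are joined by straight line segments. Keeping the shift constant outside the range of the standards, and the helper names, are assumptions of this sketch rather than documented LipidXplorer behavior.<br />

```python
import bisect


def find_peak(peaks, target, tol):
    """Closest peak m/z to target within +/- tol Da, or None (peaks sorted)."""
    i = bisect.bisect_left(peaks, target)
    candidates = [peaks[j] for j in (i - 1, i) if 0 <= j < len(peaks)]
    best = min(candidates, key=lambda m: abs(m - target), default=None)
    if best is None or abs(best - target) > tol:
        return None
    return best


def recalibrate(peaks, calibrants, tol):
    """Shift all peaks using the mass errors of the calibrants found."""
    anchors = []
    for cal in sorted(calibrants):
        found = find_peak(peaks, cal, tol)
        if found is not None:
            anchors.append((found, cal - found))  # (observed m/z, shift)
    if not anchors:
        return list(peaks)  # no standard found: spectrum left unchanged
    xs = [a[0] for a in anchors]
    ds = [a[1] for a in anchors]

    def shift(m):
        if m <= xs[0]:
            return ds[0]  # assumption: constant below the first standard
        if m >= xs[-1]:
            return ds[-1]  # assumption: constant above the last standard
        k = bisect.bisect_right(xs, m)
        t = (m - xs[k - 1]) / (xs[k] - xs[k - 1])
        return ds[k - 1] + t * (ds[k] - ds[k - 1])  # linear interpolation

    return [m + shift(m) for m in peaks]
```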
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification.<br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It is given either in parts per million (ppm) or in Dalton (Da).<br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities as read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging, because averaging more scans increases the information content; the central limit theorem is used to model this.<br />
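The threshold correction amounts to one division. Here is a Python sketch with hypothetical helper names. It assumes that peaks whose intensity equals the corrected threshold are kept.<br />

```python
import math


def effective_threshold(threshold, n_scans):
    """Threshold for an average of n_scans scans: threshold / sqrt(n_scans)."""
    if n_scans < 1:
        raise ValueError("n_scans must be >= 1")
    return threshold / math.sqrt(n_scans)


def apply_threshold(peaks, threshold, n_scans):
    """Keep (mz, intensity) peaks at or above the corrected threshold."""
    t = effective_threshold(threshold, n_scans)
    return [(mz, i) for mz, i in peaks if i >= t]
```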
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 requires each ion to be present in at least 50% of all samples.<br />
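Applied to aligned masses, the occupation filter could look like the following sketch. The `occurrences` mapping (from m/z to the set of samples in which it was found) is an assumed data structure, not LipidXplorer's internal one.<br />

```python
def filter_by_occupation(occurrences, n_samples, min_occupation):
    """Keep masses found in at least min_occupation * n_samples samples.

    occurrences maps an aligned m/z value to the set of sample names in
    which it was detected.
    """
    needed = min_occupation * n_samples
    return {mz: s for mz, s in occurrences.items() if len(s) >= needed}
```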
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by about 78.5 with every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment.<br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> with a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>.<br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>.<br />
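Both rules, together with the resolution gradient described above, are sketched in Python below. Two things here are assumptions: a ppm tolerance is converted relative to the peak p, and the resolution is modelled as a straight line through a reference resolution at a reference m/z. The helper names are hypothetical.<br />

```python
def tolerance_da(p, tol, unit="ppm"):
    """Absolute tolerance a (Da) around peak p for a ppm or Da setting."""
    return p * tol * 1e-6 if unit == "ppm" else tol


def fits(m, p, tol, unit="ppm"):
    """Theoretical mass m fits peak p if m lies in [p - a, p + a]."""
    a = tolerance_da(p, tol, unit)
    return p - a <= m <= p + a


def resolution_at(mz, r_ref, gradient, mz_ref):
    """Linear resolution model R(mz) = r_ref + gradient * (mz - mz_ref)."""
    return r_ref + gradient * (mz - mz_ref)


def same_peak(p1, p2, resolution):
    """Peaks are equal if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r
```

For example, `same_peak(p1, p2, resolution_at(p1, 100000, -78.5, 400.0))` compares two peaks using the resolution expected at p1; the reference values here are only illustrative.<br />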
<br />
==== Store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file stated under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra data has been imported, MFQL scripts are used for lipid identification. <br />
MFQL queries are written in so-called *.mfql files <br />
(with the extension *.mfql), where each file should contain just one query. <br />
The GUI panel '''Run''' is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left contains all *.mfql scripts which are used <br />
for the lipid identification. This window is managed by the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds a single *.mfql file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files it contains are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks for the name of the file.<br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, select the MasterScan by clicking on the <br />
green '''Browse''' button or by dragging the MasterScan file (or the folder <br />
containing it) onto the text field. The output file is filled in automatically, <br />
but can be changed by clicking on the grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off at the bottom <br />
of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' slightly change the output format. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated.<br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that it can be inspected <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv file of the MasterScan dump.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file is edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition of a given m/z value or, conversely, the m/z<br />
value of a given sum composition.<br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value under '''m/z value''' and an sc-constraint under<br />
'''sc-constraint or sum composition'''. '''lDB''' and '''hDB''' are the lower and<br />
upper bounds of the double bond equivalent. Enter the charge in '''chg''' and<br />
the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition'''; the result is shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass'''; the result appears<br />
in the window below.<br />
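Both calculations can be sketched in a few lines of Python. The example makes several assumptions. It uses standard monoisotopic masses and handles the charge by adding or removing electron masses. It writes the sc-constraint as a Python dictionary of element ranges instead of the MFQL syntax. It computes the double bond equivalent with the common formula C - H/2 + N/2 + 1. These choices and the function names are assumptions, and LipidXplorer's own implementation may differ in detail.<br />

```python
import re
from itertools import product

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
ELECTRON = 0.00054857990946


def parse_sc(sc):
    """'C42H83NO8P' or 'C42 H83 N1 O8 P1' -> {'C': 42, 'H': 83, ...}."""
    counts = {}
    for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        counts[el] = counts.get(el, 0) + (int(n) if n else 1)
    return counts


def sc_to_mz(sc, charge):
    """m/z of an ion sum composition; charge 0 returns the neutral mass."""
    counts = parse_sc(sc) if isinstance(sc, str) else sc
    mass = sum(MONO[el] * n for el, n in counts.items())
    if charge == 0:
        return mass
    return (mass - charge * ELECTRON) / abs(charge)  # remove/add electrons


def dbe(counts):
    """Double bond equivalent C - H/2 + N/2 + 1 (other elements ignored)."""
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


def format_sc(counts):
    """C and H first, then the remaining elements alphabetically."""
    order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(f"{el}{counts[el]}" for el in order if counts.get(el))


def mz_to_sc(mz, constraint, ldb, hdb, charge, ppm):
    """All sum compositions allowed by constraint that match mz within ppm.

    constraint maps element -> (min, max) count and must contain 'H'.
    """
    others = [el for el in constraint if el != "H"]
    ranges = [range(constraint[el][0], constraint[el][1] + 1) for el in others]
    h_min, h_max = constraint["H"]
    neutral = mz if charge == 0 else mz * abs(charge) + charge * ELECTRON
    hits = []
    for combo in product(*ranges):
        counts = dict(zip(others, combo))
        rest = sum(MONO[el] * n for el, n in counts.items())
        h = round((neutral - rest) / MONO["H"])  # nearest hydrogen count
        if not h_min <= h <= h_max:
            continue
        counts["H"] = h
        err = (sc_to_mz(counts, charge) - mz) / mz * 1e6
        if abs(err) <= ppm and ldb <= dbe(counts) <= hdb:
            hits.append((format_sc(counts), round(err, 2)))
    return hits
```

The search only tries the nearest hydrogen count for each combination of the other elements. This is sufficient as long as the tolerance is far below the mass of one hydrogen atom.<br />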
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
that the correction works as expected.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition under '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are only estimates used internally by LipidX,<br />
but the abundance values are accurate.<br />
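Such abundances can be derived by convolving the natural isotope distributions of the atoms. The distributions are indexed by the number of additional neutrons (M+0, M+1, ...). The abundance table below holds common IUPAC representative values, and the function names are hypothetical. LipidXplorer's internal table and normalization may differ.<br />

```python
import re

# Natural isotope abundances; list index = number of extra neutrons.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, n):
    """Product of two isotope-pattern polynomials, truncated to n terms."""
    out = [0.0] * min(n, len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(sc, n_peaks=4):
    """Fractional abundances of M+0, M+1, ... for a sum composition."""
    pattern = [1.0]
    for el, cnt in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        for _ in range(int(cnt) if cnt else 1):
            pattern = convolve(pattern, ABUNDANCE[el], n_peaks)  # add one atom
    return pattern
```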
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The scheme above depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities for the fragments are calculated by multiplying the<br />
probabilities of the fragment ('''F''') carrying no, one or more than one heavy isotope with the<br />
probabilities of the associated neutral loss ('''N''') doing so.<br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss contains<br />
a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no heavy isotope because it is located in the neutral loss.<br />
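The products in the scheme can be tabulated with a short Python sketch. It takes the isotope patterns of the fragment and of the neutral loss as plain lists (for example, computed from their sum compositions). It assumes that both parts carry heavy isotopes independently. From the table, the share of M+1 precursor ions whose heavy isotope stays on the fragment is F1N0 / (F1N0 + F0N1). The function names are hypothetical.<br />

```python
def fn_table(frag, loss, max_total=2):
    """Return {(i, j): FiNj} with FiNj = P(fragment has i) * P(loss has j).

    frag and loss are isotope patterns [P(M+0), P(M+1), ...] of the fragment
    and of the neutral loss; only pairs with i + j <= max_total are kept.
    """
    return {
        (i, j): f * n
        for i, f in enumerate(frag)
        for j, n in enumerate(loss)
        if i + j <= max_total
    }


def precursor_share(table, k):
    """Probability that the precursor carries k heavy isotopes in total."""
    return sum(p for (i, j), p in table.items() if i + j == k)


def m1_fragment_fraction(table):
    """F1N0 / (F1N0 + F0N1): heavy isotope of an M+1 precursor on the fragment."""
    m1 = precursor_share(table, 1)
    return table.get((1, 0), 0.0) / m1 if m1 else 0.0
```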
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''';<br />
the corresponding values are shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss. This is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=468LipidXplorer Reference2011-01-21T14:01:50Z<p>Schwudke: /* Import of peak lists of MS/MS in *.dta format and MS in / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Review of Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from the natively acquired files. The open source project <br />
[http://sashimi.sourceforge.net/ Sashimi] offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW],<br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and<br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff].<br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument vendor's software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
To make LipidXplorer usable on a wide range of mass spectrometric platforms, it can import pre-processed peak lists. Many vendors' software can export MS/MS spectra as *.dta files. In many instances one also wants to import the pre-processed peak list of the MS1 (survey) scan, which is supported via the widely used *.csv file format. Both text formats are a reasonable alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files must be arranged in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
   |<br />
   +-- [neg_]Sample1/<br />
   |       *.csv<br />
   |       [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- [neg_]Sample2/<br />
   |       *.csv<br />
   |       [*.dta1, *.dta2, ...]<br />
   |<br />
   +-- ...<br />
   |<br />
   +-- [neg_]SampleN/<br />
           *.csv<br />
           [*.dta1, *.dta2, ...]<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object, <br />
namely all samples occurring as its subdirectories. <br><br />
A sample directory can contain<br><br />
<br />
&nbsp;1. one .csv file with the MS data,<br><br />
<br />
&nbsp;2. only .dta files with the MS/MS data; in this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, and b) 0 if no *.dta file with this precursor m/z was found in the sample, or<br><br />
&nbsp;3. one .csv file with the MS data and a number of .dta files containing the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sample directory must encode whether its <br />
data was acquired in positive or in negative mode, as follows: <br />
* if a directory name begins with 'neg', the corresponding sample is treated as negative mode.<br />
* if a directory name begins with 'pos', the corresponding sample is treated as positive mode.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated text file, i.e. every line contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
The *.csv file is thus a peak list representing the (precursor) mass spectrum; the third column, the relative intensity, is optional. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. Such a file contains a peak list table whose header line holds the precursor mass (m/z) <br />
and its charge; the following lines list the fragment masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroided data, which we also call peak lists, so data given as profiles is converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all files that should go into one MasterScan <br />
(see [[#The MasterScan database]]) must be placed in one folder. This folder is the input <br />
that LipidXplorer uses to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv]]. Here too, the folder containing all the peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder onto the text field with your mouse. LipidXplorer fills in the field for the <br />
target MasterScan file automatically; to change it, press 'Browse' next to that field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file field, <br />
the user can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.''<br />
<br />
'''selection window:''' the width of the window that the mass spectrometer uses to select the precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Da.<br />
<br />
'''timerange:''' defines the time window of all spectra to be imported. It is a tuple (start time, end time), with times given in seconds.<br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. The standard masses are searched for in the spectra. If a standard is found, its mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function.<br />
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification.<br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It is given either in parts per million (ppm) or in Dalton (Da).<br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities as read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging, because averaging more scans increases the information content; the central limit theorem is used to model this.<br />
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 requires each ion to be present in at least 50% of all samples.<br />
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by about 78.5 with every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment.<br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> with a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>.<br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>.<br />
<br />
==== Store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting. '''Delete''' deletes a setting. All configurations are stored in the *.ini file stated under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
MFQL scripts are used for lipid identification after the spectra data <br />
has been imported. MFQL queries are written in *.mfql files <br />
(files with the ending *.mfql), and each file should contain just one query. <br />
The GUI panel '''Run''' is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The large window on the left lists all *.mfql scripts used <br />
for the lipid identification. It is managed by the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds one file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, select the MasterScan <br />
by clicking the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file name is filled in automatically but can be changed by clicking the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
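The complement MasterScan is essentially a set difference. A minimal sketch, assuming entries can be represented by hashable keys (here m/z values) and that the isotope entries of each identified entry are already known; `complement_masterscan` is a hypothetical name.

```python
def complement_masterscan(entries, identified, isotopes_of):
    """Entries that were neither identified nor are isotopes of an identified entry.

    isotopes_of maps an identified entry to its isotope entries (assumed input).
    """
    excluded = set(identified)
    for entry in identified:
        excluded.update(isotopes_of.get(entry, ()))
    return [e for e in entries if e not in excluded]
```

Entries that remain in the complement are candidates that none of the loaded queries explained.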
<br />
The options '''No head''' and '''Compress''' slightly change the format of the output. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated values. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that it can be viewed <br />
in other programs (in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv dump file of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the possible sum compositions for a given m/z value or the m/z value<br />
of a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and an sc-constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result will be shown in the text<br />
window below.<br />
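The mass-to-sum-composition search can be sketched as a brute-force enumeration. This is an illustration, not LipidXplorer's implementation. It makes three assumptions: the sc-constraint is represented as a Python dict of (min, max) element counts instead of LipidXplorer's own syntax; the double bond equivalent is computed with N and P treated as trivalent; and the DBE is evaluated on the ion formula as entered.

```python
import itertools

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946  # electron mass (u)


def ion_mz(comp, charge):
    """m/z of an ion with composition comp; charge must be non-zero."""
    mass = sum(MONO[el] * n for el, n in comp.items())
    return (mass - charge * ELECTRON) / abs(charge)


def dbe(comp):
    """Double bond equivalent; assumes N and P are trivalent."""
    return (comp.get("C", 0) - comp.get("H", 0) / 2
            + (comp.get("N", 0) + comp.get("P", 0)) / 2 + 1)


def mass_to_sum_composition(mz, constraint, charge, l_db, h_db, acc_ppm):
    """Enumerate compositions allowed by constraint ({element: (min, max)})
    whose DBE lies in [l_db, h_db] and whose m/z is within acc_ppm of mz."""
    elements = list(constraint)
    ranges = [range(lo, hi + 1) for lo, hi in constraint.values()]
    hits = []
    for counts in itertools.product(*ranges):
        comp = dict(zip(elements, counts))
        if not l_db <= dbe(comp) <= h_db:
            continue
        theo = ion_mz(comp, charge)
        error_ppm = (mz - theo) / theo * 1e6
        if abs(error_ppm) <= acc_ppm:
            hits.append((comp, theo, error_ppm))
    return hits
```

For example, searching m/z 760.5851 (charge +1, 5 ppm, DBE 0 to 10) with C 40 to 44, H 70 to 90, N 1, O 8 and P 1 returns the single composition C42H83NO8P. The enumeration grows multiplicatively with the ranges, so tight constraints keep it fast.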
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result will appear<br />
in the window below.<br />
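The reverse direction is a direct calculation. The sketch below parses a compact formula such as `C42H83NO8P` and returns the monoisotopic m/z for the given charge, correcting for the electron mass. The formula syntax accepted by the GUI field may differ; the parser and its name are illustrative assumptions.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946  # electron mass (u)


def parse_sum_composition(sc):
    """'C42H83NO8P' -> {'C': 42, 'H': 83, 'N': 1, 'O': 8, 'P': 1}"""
    comp = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        comp[element] = comp.get(element, 0) + (int(count) if count else 1)
    return comp


def sum_composition_to_mz(sc, charge):
    """Monoisotopic m/z of the ion sc; charge must be non-zero."""
    comp = parse_sum_composition(sc)
    mass = sum(MONO[el] * n for el, n in comp.items())
    # Positive ions lack electrons, negative ions carry extra ones.
    return (mass - charge * ELECTRON) / abs(charge)
```

For the ion C42H83NO8P with charge +1 this gives m/z ≈ 760.5851.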
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
that the correction works properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic <br />
distribution'''. The isotope masses in the list are not exact; they are<br />
estimates used internally by LipidX. The abundance values, however, are exact.<br />
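A common way to compute such abundances is to convolve the natural isotope distributions of all atoms, binned by nominal mass shift (M+0, M+1, ...). This sketch uses IUPAC natural abundances; it illustrates the general technique and is not necessarily LipidXplorer's exact algorithm. Like the MS-Tools list, it gives abundances per nominal shift rather than exact isotope masses.

```python
# Natural isotope abundances (IUPAC), listed by nominal mass shift 0, +1, +2.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
}


def convolve(a, b):
    """Combine two shift distributions (polynomial multiplication)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isotopic_distribution(comp, max_shift=4):
    """Abundances of M+0 ... M+max_shift for composition comp, e.g. {'C': 42}."""
    dist = [1.0]
    for element, count in comp.items():
        for _ in range(count):
            # Truncating is safe: higher shifts never feed back into lower ones.
            dist = convolve(dist, ISOTOPES[element])[: max_shift + 1]
    return dist
```

For two carbon atoms, M+1 has abundance 2 × 0.9893 × 0.0107, the probability that exactly one of them is 13C.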
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The above scheme depicts the values LipidXplorer uses for the isotopic correction of precursor and<br />
fragment masses. The isotope probabilities for the fragments are calculated by multiplying the<br />
probabilities of the fragment ('''F''') containing zero, one or more heavy isotopes with the<br />
corresponding probabilities of the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 is the probability that neither the fragment<br />
nor the neutral loss contains a heavy isotope. F1N0 is the probability that the fragment contains one heavy isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no heavy isotope because the isotope is located in the neutral loss. <br />
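The FiNj values are products of two independent distributions. The following sketch simplifies by considering only 13C, so each distribution is binomial in the number of carbons. The carbon counts in the usage note are hypothetical, and the helper names are illustrative.

```python
from math import comb

P13C = 0.0107  # natural abundance of 13C (only isotope considered here)


def carbon_isotope_distribution(n_carbons, max_shift=2):
    """P(k heavy carbons) for k = 0 .. max_shift (binomial, 13C only)."""
    return [comb(n_carbons, k) * P13C ** k * (1 - P13C) ** (n_carbons - k)
            for k in range(max_shift + 1)]


def fragment_neutral_loss_table(f_dist, n_dist):
    """FiNj = P(fragment has i heavy isotopes) * P(neutral loss has j)."""
    return {f"F{i}N{j}": f * n
            for i, f in enumerate(f_dist)
            for j, n in enumerate(n_dist)}
```

For example, `fragment_neutral_loss_table(carbon_isotope_distribution(5), carbon_isotope_distribution(37))` would describe a hypothetical 5-carbon fragment whose neutral loss carries 37 carbons; F0N1 is then the chance that the M+1 precursor yields the monoisotopic fragment.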
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools the isotope probabilities calculated by LipidXplorer can be viewed.<br />
If you enter a fragment sum composition in '''Fragment sum composition''', the corresponding<br />
values are shown in the window below after pressing '''Get Isotopic distribution'''.<br />
The sum composition can be either a real fragment or a neutral loss; this is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=467LipidXplorer Reference2011-01-21T14:00:08Z<p>Schwudke: /* Import of peak lists of MS/MS in *.dta format and MS in / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Review of Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from natively acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to import pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. In many cases one also wants to import the pre-processed peak list of the MS1 spectrum, which we support via the widely used *.csv file format. Both text file formats are therefore a reasonable alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files have to be organised in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
 |<br />
 +-- [neg_]Sample1/<br />
 |     *.csv, [*.dta1, *.dta2, ...]<br />
 |<br />
 +-- [neg_]Sample2/<br />
 |     *.csv, [*.dta1, *.dta2, ...]<br />
 |<br />
 +-- ...<br />
 |<br />
 +-- [neg_]SampleN/<br />
       *.csv, [*.dta1, *.dta2, ...]<br />
<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object: <br />
every subdirectory is one sample. A sample directory can contain<br><br />
<br />
&nbsp;1. a *.csv file with the MS data,<br><br />
<br />
&nbsp;2. *.dta files with the MS/MS data<br><br />
(in this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present, b) 0 if no *.dta file with this precursor m/z was found in the sample), or<br><br />
&nbsp;3. a *.csv file with the MS data and *.dta files with the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sample subdirectory must indicate whether its <br />
data was acquired in positive or in negative mode, as follows: <br />
* if a directory name starts with 'neg', the sample is treated as negative mode. <br />
* if a directory name starts with 'pos', the sample is treated as positive mode.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
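The directory convention can be checked with a short script before importing. This sketch walks the top-level directory, derives each sample's polarity from the 'neg'/'pos' prefix and lists its *.csv and *.dta files. The function name and the returned structure are hypothetical; directories with neither prefix are reported with polarity `None`, since the rule above does not cover them.

```python
from pathlib import Path


def scan_import_dir(masterscan_dir):
    """Return {sample_name: (polarity, csv_files, dta_files)} per subdirectory."""
    samples = {}
    for sub in sorted(Path(masterscan_dir).iterdir()):
        if not sub.is_dir():
            continue  # only subdirectories are samples
        name = sub.name
        if name.startswith("neg"):
            polarity = "negative"
        elif name.startswith("pos"):
            polarity = "positive"
        else:
            polarity = None  # not covered by the naming rule
        csv_files = sorted(p.name for p in sub.glob("*.csv"))
        dta_files = sorted(p.name for p in sub.glob("*.dta"))
        samples[name] = (polarity, csv_files, dta_files)
    return samples
```

The sample names are the directory names, matching how LipidXplorer names samples.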
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated file, i.e. every line in the file contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
The *.csv file is thus a peak list representing the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
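Reading such a peak list needs only the standard `csv` module. In this sketch the optional third column (relative intensity) is ignored and empty lines are skipped; the function name is hypothetical.

```python
import csv
import io


def read_ms1_csv(text):
    """Parse 'mass, intensity[, relative intensity]' lines into (mz, intensity)."""
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip empty lines
        mz, intensity = float(row[0]), float(row[1])
        peaks.append((mz, intensity))  # optional 3rd column is ignored
    return peaks
```

To read a file, pass its content, e.g. `read_ms1_csv(open(path, newline="").read())`.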
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list table whose first line holds the precursor mass in m/z <br />
and its charge; the remaining lines hold the fragment masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
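A minimal reader for this layout, assuming whitespace-separated columns as in the example above; `read_dta` is a hypothetical name.

```python
def read_dta(text):
    """Return (precursor_mz, charge, [(mz, intensity), ...]) from *.dta content."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # First line: precursor m/z and charge.
    precursor, charge = float(lines[0][0]), int(lines[0][1])
    # Remaining lines: fragment m/z and intensity.
    fragments = [(float(mz), float(inten)) for mz, inten in lines[1:]]
    return precursor, charge, fragments
```

Applied to the example file, it returns precursor 585.9765, charge 1 and four fragment peaks.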
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroided data, which we also call peak lists, so data given as profiles is converted to centroided data.<br />
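To illustrate what converting profile data to centroided data means, here is a simple generic centroider: every local intensity maximum becomes one peak, positioned at the intensity-weighted mean of the apex and its two neighbours. This is a textbook approach shown for illustration only; the text does not say which centroiding method LipidXplorer uses.

```python
def centroid(mzs, intensities):
    """Convert a profile spectrum into (mz, apex_intensity) peaks."""
    peaks = []
    for i in range(1, len(mzs) - 1):
        left, apex, right = intensities[i - 1], intensities[i], intensities[i + 1]
        if apex > left and apex >= right and apex > 0:  # local maximum
            total = left + apex + right
            # Intensity-weighted m/z over the apex and its neighbours.
            mz = (mzs[i - 1] * left + mzs[i] * apex + mzs[i + 1] * right) / total
            peaks.append((mz, apex))
    return peaks
```

A symmetric profile around m/z 100.2 yields a single centroid at 100.2. Real instruments need smoothing and noise handling that this sketch omits.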
<br />
If the spectra are given as mzXML files, all files which should go into one MasterScan <br />
(see [[#The MastersScan database]]) should be placed in one folder. This folder is what <br />
is given to LipidXplorer to import the spectra.<br />
<br />
If the spectra are given as *.csv/*.dta files, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv]]. Here, too, the folder containing all peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button, or <br />
drag the folder into the text field with your mouse. LipidXplorer fills in the target <br />
MasterScan file automatically. To change it, press 'Browse' next to the file field.<br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Pressing 'Start import' starts the import.<br />
<br />
The tab offers various settings for specifying <br />
mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, <br />
users can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window which the mass spectrometer uses to select the precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
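The window <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math> determines which MS1 peaks are isolated together with a precursor. A minimal sketch; the function names are hypothetical.

```python
def in_selection_window(p, w, candidate):
    """True if candidate m/z lies in [p - w/2, p + w/2] (w in Da)."""
    return p - w / 2 <= candidate <= p + w / 2


def co_isolated(p, w, ms1_peaks):
    """MS1 peaks that fall into the selection window of precursor p."""
    return [mz for mz in ms1_peaks if in_selection_window(p, w, mz)]
```

With w = 1.0 Da and a precursor at m/z 760.585, the window spans 760.085 to 761.085, so a peak at 761.0 would be co-isolated while one at 761.2 would not.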
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is a tuple (start time, end time), with the times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. The standard masses are searched for in the spectra; if a standard is found, its mass error is used to calculate a mass shift that is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
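A sketch of the offset correction, assuming the standards have already been located in the spectrum (the search step is omitted). With one standard the shift is constant. With several, the errors are interpolated linearly between neighbouring standards; outside the calibrated range the nearest error is reused, which is an assumption since the text does not say how LipidXplorer extrapolates. Names are hypothetical.

```python
def interpolated_error(mz, calib):
    """Mass error at mz from sorted (reference_mass, error) pairs."""
    if mz <= calib[0][0]:
        return calib[0][1]  # assumption: constant below the first standard
    if mz >= calib[-1][0]:
        return calib[-1][1]  # assumption: constant above the last standard
    for (m1, e1), (m2, e2) in zip(calib, calib[1:]):
        if m1 <= mz <= m2:
            return e1 + (e2 - e1) * (mz - m1) / (m2 - m1)


def recalibrate(peaks, references, observed):
    """Shift every m/z by the interpolated error of the calibration standards.

    references: expected masses of the standards; observed: where they were found.
    """
    if not references:  # no calibration masses: nothing to correct
        return list(peaks)
    calib = sorted((ref, obs - ref) for ref, obs in zip(references, observed))
    return [mz - interpolated_error(mz, calib) for mz in peaks]
```

For example, standards at 500.0 and 800.0 found at 500.002 and 800.004 give an interpolated error of about 0.003 Da at m/z 650.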
<br />
'''massrange:''' restricts the range of imported masses. This reduces import time and resource usage and speeds up the lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the mass error LipidXplorer allows for a lipid to be identified. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=466LipidXplorer Reference2011-01-21T13:59:13Z<p>Schwudke: /* Import of peak lists of MS/MS in *.dta format and MS in / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is a XML (eXtensible Markup Language) based common file format for mass spectrometric data. <br />
[Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459, 66 [http://dx.doi.org/10.1038/nbt1031 doi]) <br />
(Lin SM et al., Expert review of proteomics 2 (6): 839, 45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]) <br />
Not all mass spectrometers directly produce mzXML files but there are several tools available <br />
that generate mzXML files from native acquired files. An open source project known as Sashimi <br />
([http://sashimi.sourceforge.net/ SASHMI]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Currently there are converters available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files[http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversation of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) provided that the instruments software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
As an easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to provide the ability to import pre-processed peak-lists. Many vendors enable the functionality in their software to create *.dta files of MS/MS Spectra. In many instances one might be interested to import also the pre-processed peaklist of the MS1 which we support with the widely used *.csv file format. Both text file formats should be reasonable available as alternative for *mzXml. For the import of *.dta and *.csv files, some pre-conditions have to met:<br />
The import files have to be given in a certain directory structure, which is:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
|<br />
------------------------------------------------------ <br />
| | |<br />
| | |<br />
[neg_]Sample1/ [neg_]Sample2/ ... ... ... [neg_]SampleN/<br />
| | |<br />
/\ | /\ <br />
*.csv, /\ *.csv,<br />
[*.dta1, *.dta2, ...] *.csv, [*.dta1, *.dta2, ...]<br />
[*.dta1, *.dta2, ...] <br />
<br />
</pre><br />
<br />
The top level directory defines which samples go into the MasterScan database object. This are <br />
namely all samples occurring as subdirectories. A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data<br><br />
<br />
&nbsp;2. a .dta files with the MS/MS data<br><br />
(MS precursor intensities are set to a) 1 - when *.dta with this precursor is present b) 0 when no *.dta with this precursor m/z was found in a sample)<br><br />
&nbsp; 3. .csv file with the MS data and dta files with the MS/MS data<br />
<br />
<br> IMPORTANT! In the names of the sub-directory folders it should be ciphered if its the <br />
data is obtained in positive or in negative mode. This is done as follows: <br />
* if a directory has 'neg' at the beginning of its name, the according sample is negative. <br />
* if a directory has 'pos' at the beginning of its name, the according sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma separated file, i.e. every line in the file contains data which is <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the information of <br />
the mass spectrometers survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
So the *.csv is a peak list, representing the (precursor-)mass spectrum. For example - a section of a *.csv file:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
=====*.dta=====<br />
<br />
Many mass spectrometers software are able to generate a peak lists of MS/MS spectra and save them in <br />
the *.dta file format. It contains a peak list table, which has as head the precursor mass in m/z <br />
and its charge and the tables content are masses with the according intensity.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example - the content of a *.dta file of the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroid data, which we also call peak lists. This means that data given as profiles is converted to centroided data.<br />
<br />
If the spectra are given in mzXML file format, all which should be put in one MasterScan <br />
(see [[#The MastersScan database]]) should also be in one folder. The folder is the information <br />
which is given to LipidXplorer to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import *.dta / *.csv files]]. Also here, the folder where all the peak lists are contained <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder into the text field with your mouse. LipidXplorer will fill the fields for the <br />
target MasterScan file automatically. To change this press 'Browse' next to the file.<br />
<br />
Select a machine specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
The import starts with pressing 'Start import'.<br />
<br />
The tab contains various possibilities of <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
There is a standard *.ini file provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select an own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings holds that '0' switches it off.'' <br />
<br />
'''selection window:''' describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window <span class="texhtml">''w''</span> of a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton. <br />
<br />
'''timerange:''' defines time window for all spectra which should be imported. It is a tuple with (start time, end time) with the time is given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values. <br />
<br />
'''massrange:''' restrict the imported masses. This helps to decrease import time, resources the speed of lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' The tolerance value is the error LipidXplorer allows for a lipid to be identified. The unit has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging, because averaging more scans increases the information content; this is modeled with the central limit theorem. <br />
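The scan-number correction of the threshold reduces to a short piece of arithmetic. This sketch assumes that peaks exactly at the corrected threshold are kept, and the function names are hypothetical.

```python
import math


def corrected_threshold(threshold, n_scans):
    """Divide the threshold by sqrt(n) for n averaged scans; 0 stays 0 (switched off)."""
    return threshold / math.sqrt(max(n_scans, 1))


def apply_threshold(peaks, threshold, n_scans):
    """Keep (m/z, intensity) pairs whose intensity reaches the corrected threshold."""
    limit = corrected_threshold(threshold, n_scans)
    return [(mz, inten) for mz, inten in peaks if inten >= limit]


# With a threshold of 1000 and 4 averaged scans, the effective limit is 500.
print(apply_threshold([(700.1, 400.0), (701.2, 600.0)], 1000.0, 4))
```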
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion has to be present in at least 50% of all samples. <br />
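The occupation filter can be sketched as follows. The sketch assumes that each mass carries one presence flag per acquisition; this data layout and the function names are hypothetical. With min occupation set to 0, every mass passes, which is consistent with '0' switching a setting off.

```python
def occupation(presence):
    """Fraction of acquisitions in which a mass was found (presence: list of bools)."""
    return sum(presence) / len(presence) if presence else 0.0


def filter_by_occupation(masses, min_occupation):
    """Keep masses whose occupation reaches min_occupation.

    masses maps an m/z value to one presence flag per acquisition (hypothetical layout).
    """
    return {mz: flags for mz, flags in masses.items() if occupation(flags) >= min_occupation}


masses = {
    760.585: [True, True, False, True],    # found in 3 of 4 acquisitions
    782.567: [True, False, False, False],  # found in 1 of 4 acquisitions
}
print(sorted(filter_by_occupation(masses, 0.5)))
```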
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 with every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment. <br />
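The gradient describes a linear resolution model. This minimal sketch assumes that the configured resolution refers to a reference m/z. The parameter `mz_ref` is hypothetical, because the text does not state which m/z the resolution value refers to. The last lines convert the Orbitrap observation quoted above into a gradient: a loss of 50,000 over 900 m/z units is roughly -55.6 per m/z.

```python
def resolution_at(mz, resolution_ref, mz_ref, gradient):
    """Linear model: R(mz) = R_ref + gradient * (mz - mz_ref)."""
    return resolution_ref + gradient * (mz - mz_ref)


def peak_distance(mz, resolution_ref, mz_ref, gradient):
    """Distance r = mz / R(mz) within which two m/z values are treated as equal."""
    return mz / resolution_at(mz, resolution_ref, mz_ref, gradient)


# Gradient of -78.5: 1000 m/z units above mz_ref the resolution is 78,500 lower.
print(resolution_at(1300.0, 100_000, 300.0, -78.5))

# Orbitrap observation from the text: -50,000 between m/z 300 and m/z 1200
orbitrap_gradient = -50_000 / (1200 - 300)
print(round(orbitrap_gradient, 1))
```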
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' the precursor offset correction (PMO). This is a workaround for the offset shift of precursor masses caused by settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that LipidXplorer applies the tolerance setting as follows: a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> within a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>. <br />
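Both rules can be written as small predicates. This sketch assumes that a ppm tolerance is converted to Da relative to the peak m/z ''p''. The function names are hypothetical and are not part of LipidXplorer.

```python
def tolerance_in_da(peak_mz, tol, unit="ppm"):
    """Convert a tolerance to Dalton; a ppm tolerance scales with the peak m/z."""
    if unit == "ppm":
        return peak_mz * tol * 1e-6
    if unit == "Da":
        return tol
    raise ValueError(f"unknown tolerance unit: {unit}")


def fits(theoretical_mz, peak_mz, tol, unit="ppm"):
    """A theoretical mass m fits peak p if m lies in [p - a, p + a]."""
    a = tolerance_in_da(peak_mz, tol, unit)
    return peak_mz - a <= theoretical_mz <= peak_mz + a


def same_peak(p1, p2, resolution):
    """p1 and p2 are equal if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / resolution
    return p2 - r <= p1 <= p2 + r


print(fits(760.5851, 760.5870, 5))           # 1.9 mDa apart; 5 ppm is about 3.8 mDa
print(same_peak(760.585, 760.600, 100_000))  # 15 mDa apart; r is about 7.6 mDa
```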
<br />
==== Store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting, and '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra data has been imported, MFQL scripts are used for lipid identification. <br />
MFQL queries are written in *.mfql files (files with the ending .mfql), <br />
and each file should contain just one query. <br />
The GUI panel '''Run''' is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left lists all *.mfql scripts that are used <br />
for the lipid identification. This window is managed with the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds a single file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan <br />
by clicking on the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file is filled in automatically, but can be changed by clicking on the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan. This is a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' slightly change the output format. <br />
'''No head''' removes the header of the output file, and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated values. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that its content can be viewed <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv dump file of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. If a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition of a given m/z value or, the other way<br />
round, the m/z value of a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and an sc-constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result is shown in the text<br />
window below.<br />
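The search behind '''Mass-to-sum-composition''' can be sketched as a brute-force enumeration. This is an illustration under stated assumptions, not LipidXplorer's algorithm:

* The `ranges` dictionary stands in for the sc-constraint.
* Only C, H, N, O and P are handled.
* The double bond equivalent uses the common convention DBE = C - H/2 + (N + P)/2 + 1. LipidXplorer's definition may differ.

The number of hydrogens is solved for from the remaining mass instead of being enumerated.

```python
import itertools

# Monoisotopic masses (u) of the most abundant isotopes and the electron mass
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def ion_mz(counts, charge):
    """m/z of an ion; one electron mass is removed per positive charge."""
    neutral = sum(MASS[el] * n for el, n in counts.items())
    return (neutral - charge * ELECTRON) / abs(charge)


def dbe(counts):
    """Double bond equivalent, counting N and P as trivalent (assumed convention)."""
    return (counts.get("C", 0) - counts.get("H", 0) / 2
            + (counts.get("N", 0) + counts.get("P", 0)) / 2 + 1)


def mass_to_sum_composition(mz, ranges, charge=1, acc_ppm=5.0, ldb=0.0, hdb=30.0):
    """Return (counts, ppm error) pairs within acc_ppm of mz, best match first.

    ranges maps each element to an inclusive (min, max) count and must include "H".
    """
    atoms_mass = mz * abs(charge) + charge * ELECTRON  # mass of the ion's atoms
    non_h = [el for el in ranges if el != "H"]
    count_ranges = [range(ranges[el][0], ranges[el][1] + 1) for el in non_h]
    hits = []
    for combo in itertools.product(*count_ranges):
        counts = dict(zip(non_h, combo))
        rest = sum(MASS[el] * n for el, n in counts.items())
        n_h = round((atoms_mass - rest) / MASS["H"])  # hydrogens fill the remaining mass
        if not ranges["H"][0] <= n_h <= ranges["H"][1]:
            continue
        counts["H"] = n_h
        error_ppm = (ion_mz(counts, charge) - mz) / mz * 1e6
        if abs(error_ppm) <= acc_ppm and ldb <= dbe(counts) <= hdb:
            hits.append((counts, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))


ranges = {"C": (40, 44), "H": (70, 90), "N": (0, 1), "O": (6, 10), "P": (0, 1)}
for counts, err in mass_to_sum_composition(760.5851, ranges, ldb=0, hdb=10):
    print(counts, round(err, 2))
```

With these ranges, C42H83NO8P (the protonated PC 34:1 ion) appears among the hits.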
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result appears<br />
in the window below.<br />
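The reverse direction needs only a formula parser and a table of monoisotopic masses. This sketch assumes a plain element-count notation such as `C42H83NO8P`; LipidXplorer's own input syntax may allow more. Unknown elements raise a `KeyError`, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes and the electron mass
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956,
        "P": 30.97376163, "Na": 22.9897692820}
ELECTRON = 0.00054857990946


def parse_sum_composition(sc):
    """Turn 'C42H83NO8P' into {'C': 42, 'H': 83, 'N': 1, 'O': 8, 'P': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def sum_composition_to_mz(sc, charge=1):
    """m/z of the ion; one electron mass is removed per positive charge."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    neutral = sum(MASS[el] * n for el, n in parse_sum_composition(sc).items())
    return (neutral - charge * ELECTRON) / abs(charge)


print(round(sum_composition_to_mz("C42H83NO8P", 1), 4))  # protonated PC 34:1
```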
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
that everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are only estimates used in LipidX,<br />
but the abundance values are accurate.<br />
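The following sketch shows how such abundances arise by convolving the natural isotope distributions of all atoms. This is a generic textbook approach, not necessarily the algorithm LipidXplorer uses. It works on nominal mass shifts only (M+0, M+1, ...), which fits the point above that the abundances, not the exact isotope masses, are the reliable part. The table uses commonly cited IUPAC representative abundances and covers only C, H, N, O and P.

```python
# Natural isotope abundances, indexed by nominal mass shift (0, +1, +2)
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "P": [1.0],
}


def convolve(a, b, max_len):
    """Combine two mass-shift distributions, truncated to max_len entries."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_distribution(counts, max_len=5):
    """Probabilities of M+0, M+1, ... for a sum composition given as element counts."""
    dist = [1.0]
    for element, n in counts.items():
        for _ in range(n):
            dist = convolve(dist, ABUNDANCE[element], max_len)
    return dist


for shift, p in enumerate(isotope_distribution({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1})):
    print(f"M+{shift}: {p:.4f}")
```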
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The scheme above depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities of a fragment are calculated by multiplying the<br />
probabilities of the fragment ('''F''') containing no, one or more than one isotope with the<br />
corresponding probabilities of the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss<br />
contains an isotope. F1N0 is the probability that the fragment contains one isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no isotope because the isotope is in the neutral loss. <br />
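A simplified version of this calculation can be sketched in Python. The sketch makes two assumptions:

* Only 13C is considered, and all other elements are ignored.
* The numbers of carbon atoms in the fragment and in the neutral loss are known.

The function names and the example carbon counts are hypothetical. The sketch also shows why the correction matters: if the M+1 isotope of a precursor is fragmented, the share F0N1 / (F1N0 + F0N1) of it still yields the monoisotopic fragment.

```python
from math import comb

P_13C = 0.0107  # natural abundance of 13C


def p_13c(n_carbons, k):
    """Binomial probability that exactly k of n_carbons atoms are 13C."""
    return comb(n_carbons, k) * P_13C**k * (1 - P_13C) ** (n_carbons - k)


def fn_probabilities(c_fragment, c_neutral_loss, max_k=1):
    """F{i}N{j}: the fragment carries i 13C atoms and the neutral loss carries j."""
    return {
        f"F{i}N{j}": p_13c(c_fragment, i) * p_13c(c_neutral_loss, j)
        for i in range(max_k + 1)
        for j in range(max_k + 1)
    }


# Hypothetical split: 5 carbons in the fragment, 37 in the neutral loss
probs = fn_probabilities(5, 37)
for name, p in probs.items():
    print(name, round(p, 4))

# Share of an M+1 precursor that still gives the monoisotopic fragment
share = probs["F0N1"] / (probs["F1N0"] + probs["F0N1"])
print(round(share, 3))
```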
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''';<br />
the corresponding values are then shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss. This is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=465LipidXplorer Reference2011-01-21T13:58:38Z<p>Schwudke: /* Import of peak lists of MS/MS in *.dta format and MS in / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers directly produce mzXML files, but several tools are available <br />
that generate mzXML files from natively acquired files. The open source project <br />
[http://sashimi.sourceforge.net/ Sashimi] offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to support the import of pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. In many cases one may also want to import the pre-processed peak list of the MS1 scan, which we support via the widely used *.csv file format. Both text file formats should be reasonably available as an alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files have to be organized in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
+-- [neg_]Sample1/<br />
|      +-- *.csv<br />
|      +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- [neg_]Sample2/<br />
|      +-- *.csv<br />
|      +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- ...<br />
|<br />
+-- [neg_]SampleN/<br />
       +-- *.csv<br />
       +-- [*.dta1, *.dta2, ...]<br />
<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object, namely all samples <br />
occurring as its subdirectories. A sample directory can contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data,<br><br />
<br />
&nbsp;2. .dta files with the MS/MS data (in this case the MS precursor intensities are set to a) 1 if a *.dta file with this precursor m/z is present in the sample and b) 0 if no *.dta file with this precursor m/z was found in the sample), or<br><br />
<br />
&nbsp;3. a .csv file with the MS data and .dta files with the MS/MS data.<br />
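For samples that provide only *.dta files, the intensity rule in item 2 can be sketched as follows. The sketch assumes that precursor m/z values from different samples are merged when they lie within a tolerance `tol_da`. That parameter, the input layout and the function name are hypothetical.

```python
def presence_intensities(precursors_by_sample, tol_da=0.01):
    """Set the MS intensity of each precursor to 1 if a sample has a *.dta for it, else 0.

    precursors_by_sample maps a sample name to the precursor m/z values of its *.dta files.
    """
    # Merge precursor m/z values from all samples that lie within tol_da of each other
    merged = []
    for mz in sorted(mz for mzs in precursors_by_sample.values() for mz in mzs):
        if not merged or mz - merged[-1] > tol_da:
            merged.append(mz)
    return {
        sample: {m: int(any(abs(m - p) <= tol_da for p in mzs)) for m in merged}
        for sample, mzs in precursors_by_sample.items()
    }


print(presence_intensities({"neg_A": [701.5, 760.585], "neg_B": [760.586]}))
```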
<br />
<br> IMPORTANT! The names of the sample sub-directories have to indicate whether the <br />
data was acquired in positive or in negative mode. This is done as follows: <br />
* if a directory name begins with 'neg', the corresponding sample is negative. <br />
* if a directory name begins with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
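The naming rule can be sketched as a small helper that walks the MasterScan directory. The sketch makes three assumptions:

* The prefix check is case-sensitive.
* The peak-list files carry the extensions .csv and .dta.
* Directories without a 'neg' or 'pos' prefix get `None` as their polarity, because the text does not say how such directories are treated.

All names are hypothetical.

```python
from pathlib import Path


def sample_polarity(dir_name):
    """Polarity encoded in a sample directory name ('neg...' or 'pos...')."""
    if dir_name.startswith("neg"):
        return "negative"
    if dir_name.startswith("pos"):
        return "positive"
    return None  # not specified in the text


def list_samples(masterscan_dir):
    """Map each sample sub-directory to its polarity and its *.csv / *.dta files."""
    samples = {}
    for sub in sorted(Path(masterscan_dir).iterdir()):
        if sub.is_dir():
            samples[sub.name] = {
                "polarity": sample_polarity(sub.name),
                "csv": sorted(p.name for p in sub.glob("*.csv")),
                "dta": sorted(p.name for p in sub.glob("*.dta")),
            }
    return samples
```

The keys of the returned dictionary are the directory names, which are also the sample names used by LipidXplorer.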
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated values file, i.e. every line in the file contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
The *.csv file is thus a peak list representing the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
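Reading such a peak list needs nothing beyond Python's `csv` module. This minimal sketch assumes there is no header line and returns `None` when the optional relative intensity is missing. The function name is hypothetical.

```python
import csv
import io


def read_ms_csv(text):
    """Parse 'mass, intensity[, relative intensity]' lines into (mz, intensity, rel) tuples."""
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip empty lines
        mz, intensity = float(row[0]), float(row[1])
        rel = float(row[2]) if len(row) > 2 and row[2].strip() else None
        peaks.append((mz, intensity, rel))
    return peaks


sample = "701.4101,20952.3,0.85\n701.5598,4284.7,0.17\n702.4135,6333\n"
print(read_ms_csv(sample))
```

To read a file instead of a string, its content can be passed in, e.g. `read_ms_csv(open(path).read())`.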
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z <br />
and its charge, and the following lines hold the masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
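Here is a corresponding reader for *.dta files that follows the layout described above. The first line holds the precursor m/z and charge, and each following line holds one mass/intensity pair separated by whitespace. This sketch uses a hypothetical function name and does not handle comments or multiple spectra per file.

```python
def read_dta(text):
    """Return (precursor_mz, charge, [(mz, intensity), ...]) from *.dta content."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    precursor_mz, charge = float(lines[0][0]), int(lines[0][1])
    peaks = [(float(mz), float(intensity)) for mz, intensity in lines[1:]]
    return precursor_mz, charge, peaks


example = "585.9765 1\n197.32957 33132.1\n568.45007 241767.3\n"
print(read_dta(example))
```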
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroid data, which we also call peak lists, so data given as profiles is converted to centroid data.<br />
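The following generic three-point centroiding sketch illustrates what converting profile data to centroid data means. Every local intensity maximum becomes one peak, whose m/z is the intensity-weighted mean of the maximum and its two neighbours. This is a common textbook technique and not necessarily the algorithm LipidXplorer uses. The function name is hypothetical.

```python
def centroid(profile):
    """Convert a profile spectrum [(mz, intensity), ...], sorted by m/z, to centroids.

    Every local maximum becomes one peak: its m/z is the intensity-weighted mean of
    the maximum and its two neighbours, its intensity the maximum's intensity.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        (m0, y0), (m1, y1), (m2, y2) = profile[i - 1], profile[i], profile[i + 1]
        if y1 > y0 and y1 >= y2:
            total = y0 + y1 + y2
            peaks.append(((m0 * y0 + m1 * y1 + m2 * y2) / total, y1))
    return peaks


profile = [(100.0, 0.0), (100.1, 5.0), (100.2, 10.0), (100.3, 5.0), (100.4, 0.0)]
print(centroid(profile))
```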
<br />
If the spectra are given in mzXML file format, all spectra that should go into one MasterScan <br />
(see [[#The MastersScan database]]) have to be placed in one folder. This folder is the input <br />
LipidXplorer uses to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv]]. Here, too, the folder containing all peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectral data by pressing the green 'Browse' button or <br />
drag the folder onto the text field with your mouse. LipidXplorer fills in the <br />
target MasterScan file automatically; to change it, press 'Browse' next to the file field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for specifying mass spectrometric attributes. <br />
The configurations are stored in an *.ini file. A standard *.ini file is provided, <br />
but the user can select their own file by pressing 'Browse' next to the *.ini file field.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to select the precursor for fragmentation. A selection window of width <span class="texhtml">''w''</span> around a peak <span class="texhtml">''p''</span> covers the interval <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Dalton (Da). <br />
<br />
'''timerange:''' defines the time window of all spectra to be imported. It is a tuple (start time, end time), with both times given in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. LipidXplorer searches for the standard masses in the spectra. If a standard is found, its mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
<br />
'''massrange:''' restricts the range of imported masses. This reduces import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used during import for spectra averaging and alignment; both algorithms treat m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown in your mass spec software (like Analyst or Xcalibur)!'' The threshold is compared with the peak intensities read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging, because averaging more scans increases the information content; this is modeled with the central limit theorem. <br />
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 means that each ion has to be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' the gradient of the machine's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 with every increase of 1 m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. The gradient value increases the accuracy of the spectra alignment. <br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' the precursor offset correction (PMO). This is a workaround for the offset shift of precursor masses caused by settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that LipidXplorer applies the tolerance setting as follows: a theoretical mass <span class="texhtml">''m''</span> fits a peak <span class="texhtml">''p''</span> within a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>. <br />
<br />
==== Store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting, and '''Delete''' deletes a setting. All configurations are stored in the *.ini file shown under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra data has been imported, MFQL scripts are used for lipid identification. <br />
MFQL queries are written in *.mfql files (files with the ending .mfql), <br />
and each file should contain just one query. <br />
The GUI panel '''Run''' is where *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The big window on the left lists all *.mfql scripts that are used <br />
for the lipid identification. This window is managed with the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds a single file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries which are selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan <br />
by clicking on the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file is filled in automatically, but can be changed by clicking on the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan. This is a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' slightly change the output format. <br />
'''No head''' removes the header of the output file, and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for automatic <br />
post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated values. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that its content can be viewed <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv dump file of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. If a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition of a given m/z value or, the other way<br />
round, the m/z value of a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and an sc-constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result is shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result appears<br />
in the window below.<br />
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
that everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are only estimates used in LipidX,<br />
but the abundance values are accurate.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The scheme above depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities of a fragment are calculated by multiplying the<br />
probabilities of the fragment ('''F''') containing no, one or more than one isotope with the<br />
corresponding probabilities of the associated neutral loss ('''N'''). <br />
<br />
For example, F0N0 means that neither the fragment nor the neutral loss<br />
contains an isotope. F1N0 is the probability that the fragment contains one isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no isotope because the isotope is in the neutral loss. <br />
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press '''Get Isotopic distribution''';<br />
the corresponding values are then shown in the window below.<br />
The sum composition can describe either a real fragment or a neutral loss. This is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=463LipidXplorer Reference2011-01-21T13:52:15Z<p>Schwudke: /* Import of peak lists of MS/MS in *.dta format and MS in / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers directly produce mzXML files, but several tools are available <br />
that generate mzXML files from natively acquired files. The open source project <br />
[http://sashimi.sourceforge.net/ Sashimi] offers a collection of converter programs for some <br />
common mass spectrometric file formats. Converters are currently available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf] and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument software is installed on the same computer as LipidX.''<br />
<br />
====Import of peak lists of MS/MS in *.dta format and MS in / *.csv====<br />
<br />
An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to support the import of pre-processed peak lists. Many vendors' software can create *.dta files of MS/MS spectra. In many cases one may also want to import the pre-processed peak list of the MS1 scan, which we support via the widely used *.csv file format. Both text file formats should be reasonably available as an alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met.<br />
The import files have to be organized in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
|<br />
+-- [neg_]Sample1/<br />
|      +-- *.csv<br />
|      +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- [neg_]Sample2/<br />
|      +-- *.csv<br />
|      +-- [*.dta1, *.dta2, ...]<br />
|<br />
+-- ...<br />
|<br />
+-- [neg_]SampleN/<br />
       +-- *.csv<br />
       +-- [*.dta1, *.dta2, ...]<br />
<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object, namely all samples <br />
occurring as its subdirectories. A sample directory has to contain<br><br />
<br />
&nbsp;1. a .csv file with the MS data and<br><br />
<br />
&nbsp;2. a number of .dta files with the MS/MS data, or<br><br />
<br />
&nbsp;only one of 1. and 2.<br />
<br />
<br> IMPORTANT! The names of the sample sub-directories have to indicate whether the <br />
data was acquired in positive or in negative mode. This is done as follows: <br />
* if a directory name begins with 'neg', the corresponding sample is negative. <br />
* if a directory name begins with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated values file, i.e. every line in the file contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the data of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
The *.csv file is thus a peak list representing the (precursor) mass spectrum. For example, a section of a *.csv file:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
=====*.dta=====<br />
<br />
The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in <br />
the *.dta file format. A *.dta file contains a peak list table: its first line holds the precursor m/z <br />
and its charge, and the following lines hold the masses with their intensities.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
For example, the content of a *.dta file for the precursor mass 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally it only works with <br />
centroid data, which we also call peak lists, so data given as profiles is converted to centroid data.<br />
<br />
If the spectra are given in mzXML file format, all spectra that should go into one MasterScan <br />
(see [[#The MastersScan database]]) have to be placed in one folder. This folder is the input <br />
LipidXplorer uses to import the spectra.<br />
<br />
If the spectra are given in *.csv/*.dta file format, follow the instructions given in <br />
[[#Import of peak lists of MS/MS in *.dta format and MS in / *.csv]]. Here, too, the folder containing all peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectra by pressing the green 'Browse' button, or <br />
drag the folder onto the text field with your mouse. LipidXplorer fills in the field for the <br />
target MasterScan file automatically; to change it, press 'Browse' next to that field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to select the precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Da. <br />
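The selection window formula in Python, together with a hypothetical helper `co_isolated` that lists the MS1 peaks falling into the window of a precursor. This is why several isobaric precursors can end up in one MS/MS spectrum. The peak values are taken from the *.csv example above.

```python
from typing import List, Tuple


def selection_window(p: float, w: float) -> Tuple[float, float]:
    """Isolation range [p - w/2, p + w/2] (in Da) around precursor p."""
    return p - w / 2.0, p + w / 2.0


def co_isolated(p: float, w: float, ms1_peaks: List[float]) -> List[float]:
    """All MS1 peaks that fall inside the selection window of p."""
    lo, hi = selection_window(p, w)
    return [mz for mz in ms1_peaks if lo <= mz <= hi]


ms1 = [701.4101, 701.5598, 702.4135, 702.5435, 703.5752]
# With w = 1 Da, the window around m/z 702.0 is [701.5, 702.5]
print(co_isolated(702.0, 1.0, ms1))  # [701.5598, 702.4135]
```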
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is given as a tuple (start time, end time), with the times in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. LipidXplorer searches the spectra for the standard masses; if a standard is found, its mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
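A rough Python sketch of such a correction. The text above does not fix the following details, so they are assumptions:
* a standard counts as found if the nearest peak lies within a tolerance in Da;
* the shift between two standards is interpolated linearly;
* outside the outermost standards, the nearest shift is kept constant.

`find_shifts` and `recalibrate` are hypothetical names, not LipidXplorer functions.

```python
import bisect
from typing import List, Sequence, Tuple


def find_shifts(peaks: Sequence[float], standards: Sequence[float],
                tol: float) -> List[Tuple[float, float]]:
    """Sorted (observed m/z, shift) pairs for the standards found in peaks.

    Assumption: a standard is found if the closest peak lies within tol (Da);
    its shift is the theoretical minus the observed mass.
    """
    found = []
    for std in standards:
        nearest = min(peaks, key=lambda mz: abs(mz - std))
        if abs(nearest - std) <= tol:
            found.append((nearest, std - nearest))
    return sorted(found)


def recalibrate(peaks: Sequence[float], standards: Sequence[float],
                tol: float) -> List[float]:
    """Shift every peak by a shift interpolated linearly between the standards."""
    if not peaks:
        return []
    anchors = find_shifts(peaks, standards, tol)
    if not anchors:
        return list(peaks)  # no standard found: spectrum unchanged
    xs = [x for x, _ in anchors]
    shifts = [s for _, s in anchors]

    def shift_at(mz: float) -> float:
        if mz <= xs[0]:
            return shifts[0]   # assumption: constant below the lowest standard
        if mz >= xs[-1]:
            return shifts[-1]  # ... and above the highest one
        i = bisect.bisect_right(xs, mz)  # xs[i - 1] <= mz < xs[i]
        frac = (mz - xs[i - 1]) / (xs[i] - xs[i - 1])
        return shifts[i - 1] + frac * (shifts[i] - shifts[i - 1])

    return [mz + shift_at(mz) for mz in peaks]


spectrum = [500.001, 600.0, 700.003]
print(recalibrate(spectrum, standards=[500.0, 700.0], tol=0.01))
```

With a single standard, the same shift is applied to the whole spectrum.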
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown by your mass spectrometer software (like Analyst or Xcalibur)!'' The threshold is applied to the peak intensities read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging. This accounts for the gain in information when more scans are averaged, which is modelled using the central limit theorem. <br />
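The square-root correction is easy to check by hand. This is a small sketch of the rule as described above. The helper names are hypothetical, and the point in the import at which LipidXplorer applies the threshold is not modelled.

```python
import math
from typing import List, Tuple


def effective_threshold(threshold: float, n_scans: int) -> float:
    """Divide the user threshold by sqrt(n_scans); 0 keeps the filter switched off."""
    if threshold == 0 or n_scans <= 1:
        return threshold
    return threshold / math.sqrt(n_scans)


def apply_threshold(peaks: List[Tuple[float, float]], threshold: float,
                    n_scans: int) -> List[Tuple[float, float]]:
    """Keep (m/z, intensity) peaks whose intensity reaches the corrected threshold."""
    t = effective_threshold(threshold, n_scans)
    return [(mz, inten) for mz, inten in peaks if inten >= t]


# Averaging 4 scans halves a threshold of 1000 to 500.
print(effective_threshold(1000.0, 4))  # 500.0
print(apply_threshold([(701.41, 600.0), (701.56, 400.0)], 1000.0, 4))  # [(701.41, 600.0)]
```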
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 requires each ion to be present in at least 50% of all samples. <br />
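A sketch of the occupation filter. It assumes the masses have already been aligned, so that one ion has the same m/z key in every sample that contains it. LipidXplorer's real alignment uses the resolution settings and is not reproduced here. `filter_by_occupation` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def filter_by_occupation(samples: Dict[str, Set[float]],
                         min_occupation: float) -> List[float]:
    """Aligned masses that occur in at least min_occupation of all samples."""
    n = len(samples)
    counts = Counter(mz for masses in samples.values() for mz in masses)
    return sorted(mz for mz, c in counts.items() if c / n >= min_occupation)


samples = {
    "Sample1": {760.585, 788.616},
    "Sample2": {760.585, 788.616, 810.601},
    "Sample3": {760.585},
    "Sample4": {760.585},
}
print(filter_by_occupation(samples, 0.5))  # [760.585, 788.616]
```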
<br />
'''resolution gradient:''' the gradient of the instrument's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 for every increase of 1 in m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap instruments we observed a decrease of 50,000 from m/z 300 to m/z 1200. Setting this gradient increases the accuracy of the spectra alignment. <br />
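A linear resolution model in Python. The text above does not say which m/z the gradient is anchored to, so the reference point (`r_ref`, `mz_ref`) is a hypothetical parameter. The peak-equality width uses r = m/z / R as in the note on tolerance and resolution below.

```python
def resolution_at(mz: float, r_ref: float, mz_ref: float, gradient: float) -> float:
    """Linear model R(mz) = r_ref + gradient * (mz - mz_ref)."""
    return r_ref + gradient * (mz - mz_ref)


def equality_window(mz: float, r_ref: float, mz_ref: float, gradient: float) -> float:
    """Half-width r = mz / R(mz) within which two peaks are treated as equal."""
    return mz / resolution_at(mz, r_ref, mz_ref, gradient)


# Gradient -78.5 per m/z unit, starting from R = 100,000 at m/z 400 (made-up reference)
print(resolution_at(500.0, 100_000, 400.0, -78.5))  # 92150.0
```

For the Orbitrap observation above, a decrease of 50,000 over 900 m/z units corresponds to a gradient of about -55.6.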
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> matches a peak <span class="texhtml">''p''</span> with a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>. <br />
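Both rules translate directly into Python. The ppm-to-Da conversion is an assumption: it scales the tolerance with the mass it is applied to. The function names are hypothetical.

```python
def tolerance_da(mass: float, tol: float, unit: str = "ppm") -> float:
    """Convert a tolerance to Da; a ppm tolerance scales with the mass."""
    if unit == "ppm":
        return mass * tol * 1e-6
    if unit == "Da":
        return tol
    raise ValueError(f"unknown tolerance unit: {unit}")


def fits(m: float, p: float, a: float) -> bool:
    """Theoretical mass m fits peak p with tolerance a (Da): m in [p - a, p + a]."""
    return p - a <= m <= p + a


def same_peak(p1: float, p2: float, R: float) -> bool:
    """Peaks are equal at resolution R if p1 in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / R
    return p2 - r <= p1 <= p2 + r


a = tolerance_da(184.0733, 5)                # 5 ppm of m/z 184.0733, about 0.00092 Da
print(fits(184.0733, 184.0738, a))           # True
print(fits(184.0733, 184.0750, a))           # False
print(same_peak(700.000, 700.005, 100_000))  # True, since r = 0.007
```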
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting and '''Delete''' deletes a setting. All configurations are stored in the *.ini file stated under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra have been imported, lipids are identified with MFQL queries. <br />
MFQL queries are written in *.mfql files (files with the extension *.mfql), <br />
and each file should contain just one query. <br />
In the GUI panel '''Run''', *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The large window on the left lists all *.mfql scripts used <br />
for the lipid identification. It is managed with the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds one file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan. <br />
This is done by clicking the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file name is filled in automatically, but can be changed by clicking the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' slightly change the output format. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for <br />
automatic post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated values. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that you can view it <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv file of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition from a given m/z value or, the other way<br />
round, the m/z value of a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and an sc-constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result is shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result appears<br />
in the window below.<br />
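Both directions can be sketched in Python with monoisotopic masses and an electron-mass correction for the charge. The sketch makes these simplifications:
* it uses a plain brute-force search over element ranges;
* it does not model the double bond equivalent bounds (lDB/hDB);
* it parses only sum compositions with explicit counts, such as 'C5 H15 O4 N1 P1'.

All function names are hypothetical; the MS-Tools implementation may differ.

```python
import itertools
import re
from typing import Dict, List, Tuple

# Monoisotopic masses (u) of the lightest stable isotopes
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def parse_sc(sc: str) -> Dict[str, int]:
    """'C5 H15 O4 N1 P1' -> {'C': 5, 'H': 15, 'O': 4, 'N': 1, 'P': 1}."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", sc)}


def sc_to_mz(sc: str, charge: int) -> float:
    """Monoisotopic m/z of an ion; charge 0 returns the neutral mass."""
    mass = sum(MONO[el] * n for el, n in parse_sc(sc).items())
    if charge == 0:
        return mass
    # a cation has lost |charge| electrons, an anion has gained them
    return (mass - charge * ELECTRON) / abs(charge)


def mz_to_sc(mz: float, ranges: Dict[str, Tuple[int, int]],
             charge: int, ppm: float) -> List[str]:
    """Every composition within the element ranges whose m/z matches mz within ppm."""
    elements = list(ranges)
    hits = []
    for counts in itertools.product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        sc = " ".join(f"{el}{n}" for el, n in zip(elements, counts))
        if abs(sc_to_mz(sc, charge) - mz) <= mz * ppm * 1e-6:
            hits.append(sc)
    return hits


print(round(sc_to_mz("C5 H15 O4 N1 P1", +1), 4))  # 184.0733
ranges = {"C": (4, 6), "H": (10, 20), "O": (4, 4), "N": (1, 1), "P": (1, 1)}
print(mz_to_sc(184.0733, ranges, +1, 5))  # ['C5 H15 O4 N1 P1']
```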
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
here that everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are not exact; they are an estimation used in LipidX.<br />
The abundance values, however, are exact.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities of the fragments are calculated by multiplying the<br />
probabilities of fragments ('''F''') containing no, one or more than one isotope by the<br />
probabilities of the associated neutral losses ('''N'''). <br />
<br />
For example, F0N0 is the probability that neither the fragment nor<br />
the neutral loss contains an isotope. F1N0 is the probability that the fragment contains one isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no isotope because the isotope is contained in the neutral loss. <br />
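The F/N products can be illustrated with a deliberately simplified model in Python. It considers only 13C, with an approximate natural abundance of 1.07%, and uses a binomial distribution per fragment and neutral loss. It ignores isotopes of H, N, O and P, so it will not reproduce the MS-Tools values exactly. The function names are hypothetical.

```python
from math import comb

P13C = 0.0107  # approximate natural abundance of 13C


def p_13c(n_carbons: int, k: int) -> float:
    """Binomial probability that exactly k of n_carbons carbon atoms are 13C."""
    return comb(n_carbons, k) * P13C**k * (1 - P13C) ** (n_carbons - k)


def fn_probabilities(c_fragment: int, c_neutral_loss: int) -> dict:
    """F<i>N<j>: fragment holds i and neutral loss holds j 13C atoms (i, j in 0, 1)."""
    f = [p_13c(c_fragment, k) for k in (0, 1)]
    n = [p_13c(c_neutral_loss, k) for k in (0, 1)]
    return {f"F{i}N{j}": f[i] * n[j] for i in (0, 1) for j in (0, 1)}


# PC head group fragment (C5) from a C42 precursor: the neutral loss has C37
for key, value in fn_probabilities(5, 37).items():
    print(key, round(value, 4))
```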
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press<br />
'''Get Isotopic distribution''' to show the corresponding values in the window below.<br />
The sum composition can either be a real fragment or a neutral loss, which is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_Reference&diff=460LipidXplorer Reference2011-01-21T13:41:50Z<p>Schwudke: /* *.dta / *.csv */</p>
<hr />
<div>==LipidXplorer Import==<br />
<br />
===Supported file formats=== <br />
<br />
====*.mzXML====<br />
<br />
mzXML is a common XML (eXtensible Markup Language) based file format for mass spectrometric data <br />
(Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66, [http://dx.doi.org/10.1038/nbt1031 doi]; <br />
Lin SM et al., Expert Rev. Proteomics 2 (6): 839-45, <br />
[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17342793 PMID]). <br />
Not all mass spectrometers produce mzXML files directly, but several tools are available <br />
that generate mzXML files from the natively acquired files. The open source project Sashimi <br />
([http://sashimi.sourceforge.net/ Sashimi]) offers a collection of converter programs for some <br />
common mass spectrometric file formats. Currently, converters are available:<br />
* for Thermo Scientific Xcalibur *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW ReAdW], <br />
* for Waters MassLynx *.raw files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:massWolf MassWolf], and <br />
* for Sciex/ABI Analyst *.wiff files: [http://tools.proteomecenter.org/wiki/index.php?title=Software:mzWiff mzWiff]. <br />
<br />
''LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidX.''<br />
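To check what a converter wrote, an mzXML file can be inspected with the Python standard library. This sketch assumes the layout of the mzXML schema:
* `scan` elements carry `num` and `msLevel` attributes;
* MS/MS scans may be nested inside their survey scan;
* each `peaks` element holds base64-encoded, big-endian ("network") m/z/intensity pairs, optionally zlib-compressed, with 32- or 64-bit precision.

It is not LipidXplorer's own reader, and `iter_scans` is a hypothetical name. Pass the file content as bytes, for example `Path("run.mzXML").read_bytes()`.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib
from typing import Iterator, List, Optional, Tuple

Scan = Tuple[int, int, Optional[float], List[Tuple[float, float]]]


def _local(tag: str) -> str:
    """Drop the XML namespace: '{...}scan' -> 'scan'."""
    return tag.rsplit("}", 1)[-1]


def _child(el: ET.Element, name: str) -> Optional[ET.Element]:
    return next((c for c in el if _local(c.tag) == name), None)


def _decode_peaks(peaks_el: Optional[ET.Element]) -> List[Tuple[float, float]]:
    if peaks_el is None or not (peaks_el.text or "").strip():
        return []
    raw = base64.b64decode(peaks_el.text)
    if peaks_el.get("compressionType") == "zlib":
        raw = zlib.decompress(raw)
    fmt = ">d" if peaks_el.get("precision") == "64" else ">f"  # network byte order
    values = [v for (v,) in struct.iter_unpack(fmt, raw)]
    return list(zip(values[0::2], values[1::2]))


def iter_scans(xml_data: bytes) -> Iterator[Scan]:
    """Yield (scan number, msLevel, precursor m/z or None, [(m/z, intensity)])."""
    root = ET.fromstring(xml_data)
    for scan in root.iter():  # document order, nested MS/MS scans included
        if _local(scan.tag) != "scan":
            continue
        prec_el = _child(scan, "precursorMz")
        precursor = float(prec_el.text) if prec_el is not None else None
        yield (int(scan.get("num")), int(scan.get("msLevel")), precursor,
               _decode_peaks(_child(scan, "peaks")))
```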
<br />
====Import of peak lists of MS/MS in *.dta format and MS in *.csv format====<br />
<br />
For the import of *.dta and *.csv files, some preconditions have to be met:<br />
the files have to be organized in the following directory structure:<br />
<pre><br />
MasterScan Dir/<br />
|-- [neg_]Sample1/<br />
|     |-- *.csv<br />
|     |-- [*.dta1, *.dta2, ...]<br />
|-- [neg_]Sample2/<br />
|     |-- *.csv<br />
|     |-- [*.dta1, *.dta2, ...]<br />
|-- ...<br />
|-- [neg_]SampleN/<br />
      |-- *.csv<br />
      |-- [*.dta1, *.dta2, ...]<br />
</pre><br />
<br />
The top-level directory defines which samples go into the MasterScan database object, namely <br />
all samples occurring as its subdirectories. A sample directory has to contain at least one of the following:<br><br />
<br />
&nbsp;1. a .csv file with the MS data;<br><br />
<br />
&nbsp;2. a number of .dta files with the MS/MS data.<br />
<br />
<br> IMPORTANT! The name of each sub-directory must encode whether its <br />
data was obtained in positive or in negative mode. This is done as follows: <br />
* if a directory name begins with 'neg', the corresponding sample is negative; <br />
* if a directory name begins with 'pos', the corresponding sample is positive.<br />
<br />
The names of the samples occurring in LipidXplorer are the names of the sample directories.<br />
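A Python sketch that walks such a directory and derives each sample's ion mode from the 'neg'/'pos' prefix. The text does not say how an unprefixed directory is treated, so this sketch returns `None` in that case. `polarity` and `collect_samples` are hypothetical helpers, not part of LipidXplorer.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def polarity(sample_dir: Path) -> Optional[str]:
    """Ion mode encoded by the 'neg'/'pos' prefix of the sample directory name."""
    if sample_dir.name.startswith("neg"):
        return "negative"
    if sample_dir.name.startswith("pos"):
        return "positive"
    return None  # mode not encoded in the name


def collect_samples(top: Path) -> Dict[str, dict]:
    """Map every sample subdirectory to its ion mode and peak list files."""
    samples = {}
    for sample_dir in sorted(p for p in top.iterdir() if p.is_dir()):
        ms = sorted(sample_dir.glob("*.csv"))
        msms = sorted(sample_dir.glob("*.dta"))
        if not ms and not msms:
            raise ValueError(f"{sample_dir.name}: needs a *.csv file and/or *.dta files")
        samples[sample_dir.name] = {"mode": polarity(sample_dir), "ms": ms, "msms": msms}
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = Path(tmp)
        (top / "neg_Sample1").mkdir()
        (top / "neg_Sample1" / "ms.csv").write_text("701.4101,20952.3\n")
        (top / "neg_Sample1" / "585.9765.dta").write_text("585.9765 1\n197.33 3313.1\n")
        for name, info in collect_samples(top).items():
            print(name, info["mode"], len(info["ms"]), len(info["msms"]))
```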
<br />
=====*.csv=====<br />
<br />
A *.csv file is a comma-separated file, i.e. every line in the file contains values <br />
separated by commas. The *.csv files imported by LipidXplorer should contain the information of <br />
the mass spectrometer's survey scan (the MS experiment data) in the following format:<br />
<pre>/precursor mass/, /intensity/[, /relative intensity/]?<br />
</pre> <br />
Thus, the *.csv file is a peak list representing the (precursor) mass spectrum. Here is a section of a *.csv file as an example:<br />
<pre>701.4101,20952.3,0.85<br />
701.5598,4284.7,0.17<br />
702.4135,6333,0.26<br />
702.5435,23323.7,0.95<br />
703.547,7105.8,0.29<br />
703.5752,218373.4,8.87<br />
704.5786,81777.7,3.32<br />
705.5009,253758,10.3<br />
705.528,18535.5,0.75<br />
705.5822,8314.5,0.34<br />
705.5908,35523.1,1.44<br />
706.5044,107847.3,4.38<br />
</pre> <br />
=====*.dta=====<br />
<br />
The software of many mass spectrometers can export MS/MS peak lists in the <br />
*.dta file format. A *.dta file contains a peak list whose first line (the header) holds the precursor m/z <br />
and its charge; each following line holds a fragment mass and its intensity.<br />
<br />
<pre>/mass/ /intensity/<br />
</pre> <br />
Example: the content of a *.dta file for the precursor m/z 585.9765 with charge +1:<br />
<pre>585.9765 1<br />
197.32957 33132.1<br />
197.33095 12631.7<br />
568.45007 241767.3<br />
569.29065 14319.8<br />
</pre><br />
<br />
===Importing mass spectra into LipidX===<br />
<br />
LipidXplorer can import spectra acquired in '''profile mode''' and in '''centroid mode'''. Internally, it works only with <br />
centroided data, which we also call peak lists, so data given as profiles is converted to centroided data.<br />
<br />
If the spectra are in mzXML format, all spectra that should go into one MasterScan <br />
(see [[#The MastersScan database]]) should also be placed in one folder. This folder is the input <br />
given to LipidXplorer to import the spectra.<br />
<br />
If the spectra are in *.csv/*.dta format, follow the instructions in <br />
[[#Import *.dta / *.csv files]]. Here, too, the folder containing all the peak lists <br />
is the input for the LipidXplorer import.<br />
<br />
Choose the folder with your mass spectra by pressing the green 'Browse' button, or <br />
drag the folder onto the text field with your mouse. LipidXplorer fills in the field for the <br />
target MasterScan file automatically; to change it, press 'Browse' next to that field.<br />
<br />
Select a machine-specific configuration from the '''Select configuration''' list, edit<br />
the settings and store them in the configuration file.<br />
<br />
Press 'Start import' to start the import.<br />
<br />
The tab offers various options for <br />
specifying mass spectrometric attributes. The configurations are stored in an *.ini file. <br />
A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, <br />
the user can select their own file.<br />
<br />
=== Machine specific settings ===<br />
<br />
''For all settings, a value of '0' switches the setting off.'' <br />
<br />
'''selection window:''' the width of the window the mass spectrometer uses to select the precursor for fragmentation. For a selection window of width <span class="texhtml">''w''</span>, the window around a peak <span class="texhtml">''p''</span> is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value <span class="texhtml">''w''</span> has to be given in Da. <br />
<br />
'''timerange:''' defines the time window of the spectra to be imported. It is given as a tuple (start time, end time), with the times in seconds. <br />
<br />
'''calibration masses:''' a list of standard masses used for a linear offset correction of MS and MS/MS spectra. LipidXplorer searches the spectra for the standard masses; if a standard is found, its mass error is used to calculate a mass shift, which is applied to the whole spectrum. If more than one mass is given, the shift values are connected by a linear function. <br />
<br />
'''massrange:''' restricts the range of imported masses. This decreases import time and resource usage and speeds up lipid identification. <br />
<br />
'''resolution:''' the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows. <br />
<br />
'''tolerance:''' the maximum mass error LipidXplorer allows when identifying a lipid. It has to be given in parts per million (ppm) or Dalton (Da). <br />
<br />
'''threshold:''' the minimum intensity a peak must have to be included in the MasterScan; all peaks below the threshold are discarded. ''Be aware that the intensity values in your mzXML file may differ from those shown by your mass spectrometer software (like Analyst or Xcalibur)!'' The threshold is applied to the peak intensities read from the mzXML file, not from the original .wiff or .raw files. The threshold value is corrected by dividing it by the square root of the number of scans used for averaging. This accounts for the gain in information when more scans are averaged, which is modelled using the central limit theorem. <br />
<br />
'''min occupation:''' the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 requires each ion to be present in at least 50% of all samples. <br />
<br />
'''resolution gradient:''' the gradient of the instrument's resolution in MS and MS/MS mode. For example, a value of -78.5 means that the resolution decreases by 78.5 for every increase of 1 in m/z. This models the typical behavior of mass spectrometers, whose resolution decreases at higher masses. On Orbitrap instruments we observed a decrease of 50,000 from m/z 300 to m/z 1200. Setting this gradient increases the accuracy of the spectra alignment. <br />
<br />
'''MS1 offset:''' All MS1 m/z values will be shifted by this value. The value has to be given in Da. <br />
<br />
'''PMO:''' The Precursor Offset Correction (PMO). This is a workaround for the offset shift of precursor masses due to settings on LTQ Orbitrap machines. <br />
<br />
<br> <br />
<br />
Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass <span class="texhtml">''m''</span> matches a peak <span class="texhtml">''p''</span> with a given tolerance <span class="texhtml">''a''</span> if <math>m \in [p-a,p+a]</math>. <br />
<br />
The same holds for the resolution <span class="texhtml">''R''</span>: two peaks <span class="texhtml">''p''<sub>1</sub></span> and <span class="texhtml">''p''<sub>2</sub></span> are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math>, where <math>r=\frac{p_1}{R}</math>. <br />
<br />
==== store all settings in a configuration ====<br />
<br />
All settings can be stored under a user-specified name with '''Save As ...'''. '''Save ...''' saves an already stored setting and '''Delete''' deletes a setting. All configurations are stored in the *.ini file stated under '''Select *.ini configurations file'''. With '''Browse''' one can choose another file or a new one.<br />
<br />
==Run queries on the MasterScan==<br />
<br />
After the spectra have been imported, lipids are identified with MFQL queries. <br />
MFQL queries are written in *.mfql files (files with the extension *.mfql), <br />
and each file should contain just one query. <br />
In the GUI panel '''Run''', *.mfql files are loaded and run on <br />
the MasterScan file. <br />
<br />
The large window on the left lists all *.mfql scripts used <br />
for the lipid identification. It is managed with the buttons <br />
on its right side:<br />
* '''Add MFQL File''' adds one file.<br />
* '''Add MFQL Directory''' lets you choose a directory; all *.mfql files in it are loaded.<br />
* '''Edit MFQL Entry''' opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.<br />
* '''New MFQL Entry''' opens an editor panel with an empty *.mfql file. A prompt asks you for the name of the file. <br />
* '''Remove MFQL Entry''' removes all entries selected in the left window.<br />
<br />
After choosing your *.mfql files, choose the MasterScan. <br />
This is done by clicking the green '''Browse''' button or by dragging <br />
the MasterScan file, or the folder containing it, onto the text field. The <br />
output file name is filled in automatically, but can be changed by clicking the <br />
grey '''Browse''' button.<br />
<br />
'''Isotopic correction''' for MS and MS/MS can be switched on and off in the lower <br />
part of the panel. There is also an option to generate a complement <br />
MasterScan: a spectral database containing all entries of the <br />
originally chosen MasterScan except the identified entries and their <br />
isotopes.<br />
<br />
The options '''No head''' and '''Compress''' slightly change the output format. <br />
'''No head''' removes the header of the output file and '''Compress''' removes the names <br />
of the queries from the output file. This can be helpful for <br />
automatic post-processing. The option '''Tab limited''' changes the output <br />
format from comma-separated to tab-separated values. <br />
<br />
'''Dump MasterScan''' writes the content of the MasterScan experimental <br />
database to a comma-separated file, so that you can view it <br />
(in Excel, for example).<br />
<br />
'''Run LipidX''' starts the lipid identification. The result is saved <br />
in the output file, which can be viewed directly with the '''View''' button.<br />
'''View dump file''' shows the *.csv file of the MasterScan.<br />
<br />
===The editor panel===<br />
<br />
The editor makes it easy to write queries for LipidX. Every query is opened in a<br />
separate tab. When a file has been edited, the '''Save''' button turns red <br />
to remind the user to save the file before using the query. '''SaveAs''' stores<br />
the query under a new file name and '''Close''' closes the tab.<br />
<br />
===The MS-Tools panel===<br />
<br />
The MS Tools tab contains a small collection of useful functions:<br />
<br />
====Mass vs. Sum Composition====<br />
<br />
Calculates either the sum composition from a given m/z value or, the other way<br />
round, the m/z value of a given sum composition. <br />
<br />
=====Mass-to-sum-composition=====<br />
<br />
Enter an m/z value in '''m/z value''' and an sc-constraint in<br />
'''sc-constraint or sum composition'''. '''lDB''' is the lower bound and '''hDB'''<br />
the upper bound of the double bond equivalent. Enter the charge in '''chg'''<br />
and the tolerance in ppm in '''acc'''. Then press<br />
'''Mass-to-sum-composition''' and the result is shown in the text<br />
window below.<br />
<br />
=====Sum-composition-to-mass=====<br />
<br />
Enter a sum composition in '''sc-constraint or sum composition''' and a charge in<br />
'''chg'''. Then press '''Sum-composition-to-mass''' and the result appears<br />
in the window below.<br />
<br />
====Isotopes of molecules====<br />
<br />
Shows the abundances of the isotopes of a given sum composition. These values<br />
are the ones LipidXplorer uses for isotopic correction, so the user can double-check<br />
here that everything is working properly.<br />
<br />
=====Isotopic distribution of MS masses=====<br />
<br />
Enter a sum composition in '''Ion sum composition''' and press '''Get Isotopic distribution'''.<br />
The isotope masses in the list are not exact; they are an estimation used in LipidX.<br />
The abundance values, however, are exact.<br />
<br />
=====Isotopic distribution of MS/MS masses=====<br />
<br />
<br />
[[Image:LipidX-IsotopicCorrection.png|600px|center|LipidXplorer Intrascan Isotopic Correction]]<br />
<br />
The above scheme depicts the values LipidXplorer uses to correct precursor and<br />
fragment masses. The isotope probabilities of the fragments are calculated by multiplying the<br />
probabilities of fragments ('''F''') containing no, one or more than one isotope by the<br />
probabilities of the associated neutral losses ('''N'''). <br />
<br />
For example, F0N0 is the probability that neither the fragment nor<br />
the neutral loss contains an isotope. F1N0 is the probability that the fragment contains one isotope while<br />
the neutral loss contains none. Conversely, F0N1 is the probability that the fragment<br />
contains no isotope because the isotope is contained in the neutral loss. <br />
<br />
[[Image:LipidX-IntrascanMSTools.png|500px|right|LipidXplorer Intrascan Isotopic Correction in MS-Tools]]<br />
<br />
In MS-Tools, the isotope probabilities calculated by LipidXplorer can be viewed.<br />
Enter a fragment sum composition in '''Fragment sum composition''' and press<br />
'''Get Isotopic distribution''' to show the corresponding values in the window below.<br />
The sum composition can either be a real fragment or a neutral loss, which is indicated with<br />
the checkbox '''Neutral Loss'''.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=453LipidXplorer MFQL2011-01-21T11:33:20Z<p>Schwudke: /* A more complex example for PE-plasmalogen */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, individual molecular species cannot <br />
be characterized solely by their intact masses, however accurately <br />
these are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within an MFQL query, these constraints can be combined by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, <br />
molecular cations of PC species produce a specific phosphorylcholine fragment of <br />
their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with identifying probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for a phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in its MS/MS spectrum. <br />
* If both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
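The logic of these steps can be sketched in plain Python. This is not how LipidXplorer evaluates MFQL. The scan list, the 5 ppm tolerance and all helper names are hypothetical, and the DBR constraint of the full query (see below) is left out. Precursors and the head group are both treated as singly charged cations.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def cation_mz(counts: Dict[str, int]) -> float:
    """m/z of a singly charged cation (CHG = +1) with the given element counts."""
    return sum(MONO[el] * n for el, n in counts.items()) - ELECTRON


def sc_string(counts: Dict[str, int]) -> str:
    return " ".join(f"{el}{n}" for el, n in counts.items())


# prPC: 'C[30..48] H[30..200] N[1] O[8] P[1]' with CHG = +1 (DBR filter left out)
PR_PC = [(sc, cation_mz(sc)) for sc in (
    {"C": nc, "H": nh, "N": 1, "O": 8, "P": 1}
    for nc, nh in itertools.product(range(30, 49), range(30, 201)))]
# headPC: 'C5 H15 O4 N1 P1' with CHG = +1
HEAD_PC = cation_mz({"C": 5, "H": 15, "N": 1, "O": 4, "P": 1})


def within_ppm(measured: float, theoretical: float, ppm: float) -> bool:
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


def identify_pc(scans: Sequence[Tuple[float, Sequence[float]]],
                ppm: float = 5.0) -> List[Tuple[float, str]]:
    """Report every precursor that fits prPC AND shows headPC in its MS/MS spectrum."""
    hits = []
    for precursor, fragments in scans:
        fitting = [sc for sc, mz in PR_PC if within_ppm(precursor, mz, ppm)]
        has_head = any(within_ppm(f, HEAD_PC, ppm) for f in fragments)
        if fitting and has_head:
            hits.extend((precursor, sc_string(sc)) for sc in fitting)
    return hits


# Toy data: (precursor m/z, MS/MS fragment m/z values)
scans = [
    (760.5851, [184.0733, 577.5190]),  # fits prPC and shows the head group
    (788.6164, [603.5353]),            # fits prPC, but no head group fragment
    (700.0000, [184.0733]),            # head group present, precursor does not fit
]
print(identify_pc(scans))  # [(760.5851, 'C42 H83 N1 O8 P1')]
```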
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within a <br />
1 Da window and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species; details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL query; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of its individual statements, we now walk through the example script for identifying PC lipid species.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment, so we define: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
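To see where m/z 184.07 comes from, the m/z of an ion can be computed from its sum composition and charge. The sketch below is an illustration, not LipidXplorer's code; it assumes that the sum composition already describes the ion (as for <tt>headPC</tt>), uses standard monoisotopic element masses and accounts for the electron mass. <tt>parse_sc</tt> and <tt>ion_mz</tt> are hypothetical names.

```python
import re

# Monoisotopic masses (u) of the elements used in this page's examples.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946  # electron mass (u)


def parse_sc(sc: str) -> dict:
    """Parse a sum composition such as 'C5 H15 O4 N1 P1' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d+)", sc):
        counts[element] = counts.get(element, 0) + int(n)
    return counts


def ion_mz(sc: str, chg: int) -> float:
    """m/z of an ion with the given formula and non-zero charge.

    A positive charge means electrons were removed, a negative one that
    electrons were added.
    """
    if chg == 0:
        raise ValueError("a charge of zero describes a neutral loss, not an ion")
    mass = sum(MONO[el] * n for el, n in parse_sc(sc).items())
    return (mass - chg * ELECTRON) / abs(chg)


print(round(ion_mz("C5 H15 O4 N1 P1", +1), 4))  # 184.0733
```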
<br />
In a shotgun experiment not all fragmented peaks originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on precursor <br />
masses: besides the sum composition requirements, it requires that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) lies within a given range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
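The <tt>DBR</tt> check can be illustrated with a short sketch. The formula below is an assumption: the common rings-plus-double-bonds count with tetravalent C, monovalent H and trivalent N and P. It is used here because it gives 1.5 for a fully saturated PC [M+H]+ ion, which matches the offset of 1.5 subtracted in the <tt>NAME</tt> column of the REPORT section below; LipidXplorer's internal calculation may differ. <tt>dbe</tt> and <tt>in_dbr</tt> are hypothetical names.

```python
def dbe(counts: dict) -> float:
    """Double bond equivalent (rings + double bonds), assumed formula:
    C - H/2 + N/2 + P/2 + 1 (oxygen does not contribute)."""
    return (counts.get("C", 0) - counts.get("H", 0) / 2
            + counts.get("N", 0) / 2 + counts.get("P", 0) / 2 + 1)


def in_dbr(counts: dict, low: float, high: float) -> bool:
    """True if the double bond equivalent lies within DBR = (low, high)."""
    return low <= dbe(counts) <= high


pc_32_0 = {"C": 40, "H": 81, "N": 1, "O": 8, "P": 1}  # saturated PC, [M+H]+
pc_34_1 = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}  # one C=C in the fatty acids
print(dbe(pc_32_0), dbe(pc_34_1))  # 1.5 2.5
print(in_dbr(pc_34_1, 1.5, 7.5))   # True
```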
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requires that <tt>headPC</tt> <br />
is searched only in the MS/MS spectra of <tt>prPC</tt>: <br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, an identification implying an odd-numbered <br />
fatty acid moiety is likely to be false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively (8 in total, an even number), the two fatty acid moieties must together contain an even <br />
number of carbon atoms: a lipid passing this filter cannot comprise one odd- and one even-numbered <br />
fatty acid moiety. Species with two odd-numbered moieties are not excluded by this test.<br />
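The parity argument can be checked with a few lines of Python. This is only an illustration of the arithmetic, with a hypothetical helper <tt>passes_is_even</tt>; it assumes a diacyl PC whose carbons are the 5 head group carbons, the 3 glycerol carbons and the carbons of the two fatty acid chains.

```python
HEAD_AND_BACKBONE_C = 5 + 3  # PC head group + glycerol backbone: 8, an even number


def passes_is_even(fa1_c: int, fa2_c: int) -> bool:
    """Mimic isEven(prPC.chemsc[C]) for a PC with chains of fa1_c and fa2_c carbons."""
    total_c = HEAD_AND_BACKBONE_C + fa1_c + fa2_c
    return total_c % 2 == 0


print(passes_is_even(16, 18))  # True:  both chains even
print(passes_is_even(16, 17))  # False: one odd and one even chain
print(passes_is_even(15, 17))  # True:  both chains odd, not excluded by this test
```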
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
ions reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or to apply certain <br />
functions, such as text formatting, to these attributes. Text <br />
formatting uses two strings separated by <tt>%</tt>: the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two <tt>%d</tt> placeholders of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from that of the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6); the number of double bonds is the double bond equivalent of <tt>prPC</tt> minus 1.5.<br />
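Because the NAME column relies on Python's %-formatting, its arithmetic can be reproduced directly in Python. In this sketch the sum compositions are plain dictionaries (a hypothetical stand-in for <tt>prPC.chemsc</tt> and <tt>headPC.chemsc</tt>), the precursor is a PC 34:1 [M+H]+ ion, and its double bond equivalent of 2.5 is an assumed value.

```python
# Hypothetical stand-ins for prPC.chemsc and headPC.chemsc ("db" = double bond equivalent).
pr_pc = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1, "db": 2.5}
head_pc = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}


def subtract(a: dict, b: dict) -> dict:
    """Element-wise difference of two sum compositions, like prPC.chemsc - headPC.chemsc."""
    return {key: a[key] - b.get(key, 0) for key in a}


diff = subtract(pr_pc, head_pc)
# "PC [%d:%d]" % ((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)
name = "PC [%d:%d]" % (diff["C"] - 3, pr_pc["db"] - 1.5)
print(name)  # PC [34:1]
```

Note that <tt>%d</tt> truncates non-integer values; a placeholder such as <tt>%.2f</tt> keeps two decimals, as used for the ppm error in later examples.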
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; use it for comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (SUCHTHAT is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for specific precursors in MS and/or fragment ions and/or neutral losses in MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' comes a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. Each column's content is defined after the <tt>=</tt>. The '''REPORT''' section <br />
is described in more detail in Part 4 below.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint describes a collection of chemical sum compositions and can thus <br />
specify a whole class of lipids. It is used for several purposes, <br />
especially for screening tasks and for emulating multiple scans. Its format is <br />
largely self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of the double bond equivalent. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
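An sc-constraint can be thought of as a membership test over sum compositions. The sketch below parses the example constraint above and checks a composition against the element ranges and the <tt>DBR</tt>. It is an illustration only: the parser, <tt>fits</tt> and the double bond equivalent formula (C - H/2 + N/2 + P/2 + 1) are assumptions, not LipidXplorer's implementation, and the tested compositions are hypothetical.

```python
import re


def parse_constraint(sc: str) -> dict:
    """Parse "C[38..54] H[30..130] O[10]" into {element: (min, max)}."""
    ranges = {}
    for el, low, high in re.findall(r"([A-Z][a-z]?)\[(\d+)(?:\.\.(\d+))?\]", sc):
        ranges[el] = (int(low), int(high) if high else int(low))
    return ranges


def dbe(counts: dict) -> float:
    # Assumed double bond equivalent formula.
    return (counts.get("C", 0) - counts.get("H", 0) / 2
            + counts.get("N", 0) / 2 + counts.get("P", 0) / 2 + 1)


def fits(counts: dict, ranges: dict, dbr: tuple) -> bool:
    """True if counts uses exactly the constrained elements, within range and DBR."""
    if set(counts) != set(ranges):
        return False
    if any(not low <= counts[el] <= high for el, (low, high) in ranges.items()):
        return False
    return dbr[0] <= dbe(counts) <= dbr[1]


ranges = parse_constraint("C[38..54] H[30..130] O[10] N[1] P[1]")
print(ranges["C"], ranges["O"])  # (38, 54) (10, 10)
print(fits({"C": 42, "H": 79, "O": 10, "N": 1, "P": 1}, ranges, (2.5, 9.5)))  # True
```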
<br />
==The 4 sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = <name of the query></pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a list of sum compositions or a mass. Sum compositions and <br />
sc-constraints are written in single quotes. The content can be followed by <br />
<tt>WITH</tt> and a list of options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint: a 2-tuple with the minimum and the maximum double bond equivalent allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
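The rule that a neutral loss is always taken relative to the precursor can be expressed as follows. This sketch assumes singly charged ions, so that the loss is simply the precursor m/z minus the fragment m/z; the masses (a PE 34:1 [M+H]+ precursor and the neutral loss of the PE head group 'C2 H8 O4 N1 P1') are illustrative, and <tt>has_neutral_loss</tt> is a hypothetical helper. The 0.5 Da default matches the tolerance used for <tt>peHead</tt> in the examples below.

```python
PE_HEAD_LOSS = 141.0191  # monoisotopic mass (u) of neutral 'C2 H8 O4 N1 P1'


def neutral_losses(precursor_mz: float, fragment_mzs: list) -> list:
    """Losses between the precursor and each fragment; never fragment-to-fragment."""
    return [precursor_mz - f for f in fragment_mzs]


def has_neutral_loss(precursor_mz, fragment_mzs, loss, tol_da=0.5):
    """True if any fragment corresponds to 'loss' within tol_da (singly charged ions)."""
    return any(abs(nl - loss) <= tol_da
               for nl in neutral_losses(precursor_mz, fragment_mzs))


print(has_neutral_loss(718.5381, [577.52, 184.07], PE_HEAD_LOSS))  # True
print(has_neutral_loss(718.5381, [184.07], PE_HEAD_LOSS))          # False
```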
<br />
====Examples====<br />
Define the PC-O sc-constraint and the PC head group fragment expected in the <br />
precursor's MS/MS spectrum:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and PE's head group, which is detected as a neutral loss <br />
from the precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they are not available to other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are searched for in the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this is typical for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+ | MS1- | MS2+ | MS2-) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain precursor or fragment masses. The scope (MS level) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify the MS level and polarity in which to look for the sum composition ('MS1+' means positive mode MS, while 'MS2-' means negative mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. Three units are possible: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass of interest. <br />
# 'MINOCC' is a number between 0 and 1 which states the minimum occupation threshold of this mass across all samples, i.e. the fraction of samples in which the mass must occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
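The following sketch shows one way to turn the <tt>TOLERANCE</tt> units into an absolute m/z window and to apply <tt>MASSRANGE</tt> and <tt>MINOCC</tt>. It is an illustration under assumptions: 'res' is read as a resolution R = m/Δm (so Δm = m/R), and a peak counts as present in a sample when its intensity is greater than zero. The function names are hypothetical.

```python
def tolerance_da(mz: float, value: float, unit: str) -> float:
    """Absolute tolerance window (Da) for a TOLERANCE given in ppm, da or res."""
    unit = unit.lower()
    if unit == "ppm":
        return mz * value / 1e6
    if unit == "da":
        return value
    if unit == "res":
        return mz / value  # assumed: resolution R = m / delta_m
    raise ValueError("unknown tolerance unit: %s" % unit)


def occupation(intensities: list) -> float:
    """Fraction of samples in which the peak was found (intensity > 0 assumed)."""
    return sum(1 for i in intensities if i > 0) / len(intensities)


def passes_options(mz, massrange, intensities, minocc):
    """Apply MASSRANGE = (low, high) and MINOCC to one peak."""
    low, high = massrange
    return low <= mz <= high and occupation(intensities) >= minocc


print(tolerance_da(800.0, 10, "ppm"))   # 0.008
print(tolerance_da(800.0, 0.5, "da"))   # 0.5
print(passes_options(750.0, (700, 1000), [1e4, 0, 5e3, 0], 0.5))  # True
```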
<br />
== Emulating (Multiple) Precursor Ion Scan / Neutral Loss Scan with MFQL ==<br />
<br />
In the <tt>IDENTIFY</tt> section, precursor ion scans (PIS) and neutral loss <br />
scans (NLS) can be emulated. If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Whether a variable acts as PIS or NLS is decided in the definition part: a variable with <br />
charge zero (<tt>CHG = 0</tt>) or with the keyword <tt>AS NEUTRALLOSS</tt> is <br />
treated as a neutral loss; otherwise it is treated as a (fragment) mass.<br />
<br />
(Comment: The above feature should not be mistaken for the LipidXplorer functionality to import PIS and NLS mass spectrometric acquisitions.)<br />
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, more constraints can be added to the query. For example, the identification of PE plasmalogens requires matching 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see example above), and a test whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR". Such constraints are formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by Boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built with the basic arithmetic operators +, -, *, /; parentheses can also be used. Terms are compared as equations with '==' and as inequalities with '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values for the terms can be marked masses (given with their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can be also addressed. This can be done by writing the attribute after the variable name connected with a dot. The intensity of the peak 'PR' for example is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
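As an illustration of how such a constraint is evaluated, the sketch below mirrors the PE-plasmalogen test from the examples (<tt>FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND FRAG1.intensity * 3 &lt; FRAG2.intensity * 10</tt>). Variables are treated as plain m/z values, and an intensity comparison is taken to hold only if it holds for every sample, as described for <tt>intensity</tt> in the list of peak attributes. The function name and the numbers are hypothetical.

```python
H1 = 1.00782503207  # monoisotopic mass of 'H1'


def pe_plasmalogen_suchthat(frag1_mz, frag2_mz, pr_mz, frag1_int, frag2_int, tol_da=0.5):
    """Evaluate the two SUCHTHAT conditions joined by AND.

    frag1_int / frag2_int are per-sample intensity lists; the inequality must
    hold in every sample.
    """
    mass_ok = abs(frag1_mz + frag2_mz - H1 - pr_mz) <= tol_da
    intensity_ok = all(f1 * 3 < f2 * 10 for f1, f2 in zip(frag1_int, frag2_int))
    return mass_ok and intensity_ok


print(pe_plasmalogen_suchthat(400.0, 300.5, 699.49, [100, 200], [50, 80]))  # True
print(pe_plasmalogen_suchthat(400.0, 300.5, 699.49, [100, 200], [20, 80]))  # False
```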
<br />
====Functions====<br />
<br />
In addition to attributes, SUCHTHAT supports the use of functions. A list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general the <tt>REPORT</tt> <br />
consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both fragment variables can match the same fragment (for example with two fatty acid scans), so LipidXplorer provides a special function that does not add up the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string mechanism is borrowed from Python: MFQL uses Python's standard <br />
string formatting operator, i.e. the format string is evaluated as in Python <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or below, it is set to '-1' (a placeholder).<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
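The three error forms can be computed as in the sketch below. Only the ppm and Dalton definitions are standard; the sign convention (observed minus theoretical) and reading <tt>errres</tt> as the resolution equivalent of the deviation (theoretical mass divided by the absolute error) are assumptions. <tt>peak_errors</tt> is a hypothetical helper.

```python
def peak_errors(observed: float, theoretical: float) -> dict:
    """Return errda, errppm and errres for one peak (sign: observed - theoretical)."""
    errda = observed - theoretical
    errppm = errda / theoretical * 1e6
    # Assumed interpretation of errres: the resolution m / |delta m| of the deviation.
    errres = theoretical / abs(errda) if errda else float("inf")
    return {"errda": errda, "errppm": errppm, "errres": errres}


err = peak_errors(760.5860, 760.5851)
print("%2.2fppm" % err["errppm"])  # 1.18ppm
```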
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in square brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms of a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>, if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>, if it is a fragment, it returns the sum composition of the neutral loss of the precursor.<br />
====intensity====<br />
All the intensities of a mass from all the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, which could be all blank samples. This feature allows sample groups to be formed by naming the samples according to their group. In this way many different constraints can be stated, which increase the accuracy of the interpretation or even already interpret the result. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that one percent of the average intensity of all experimental samples ("*exp*") should be greater than the average intensity found in the blank samples. This simply discards every "lipid" that is obviously noise.<br />
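Selecting samples by a wildcard pattern and comparing group averages can be sketched with Python's standard <tt>fnmatch</tt> and <tt>statistics</tt> modules. The sample names and intensities are hypothetical, and whether LipidXplorer matches names case-sensitively is not stated here; the sketch uses case-sensitive matching.

```python
from fnmatch import fnmatchcase
from statistics import mean


def select(intensity_by_sample: dict, pattern: str) -> list:
    """Intensities of the samples whose names match a '*'/'?' wildcard pattern."""
    return [v for name, v in intensity_by_sample.items() if fnmatchcase(name, pattern)]


# Hypothetical per-sample intensities of one precursor 'PR'.
pr_intensity = {"blank_01": 120.0, "blank_02": 80.0,
                "exp_01": 25000.0, "exp_02": 15000.0}

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
keep = mean(select(pr_intensity, "*blank*")) < mean(select(pr_intensity, "*exp*")) / 100
print(keep)  # True: blank average 100 is below 1 % of the experimental average (200)
```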
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: the number of samples in which the peak occurs divided by the total number of samples.<br />
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to the intensity of v.<br />
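A minimal sketch of the idea behind <tt>isStandard</tt>, assuming that "relative to v" means dividing each sample's intensity by the standard's intensity in the same sample; any further scaling (for example by the amount of spiked standard) is not shown, and <tt>standardize</tt> is a hypothetical name.

```python
import math


def standardize(intensities: list, standard: list) -> list:
    """Per-sample intensity divided by the standard's intensity in that sample.

    Samples in which the standard was not detected (intensity <= 0) yield NaN.
    """
    return [i / s if s > 0 else math.nan for i, s in zip(intensities, standard)]


print(standardize([200.0, 50.0, 30.0], [100.0, 25.0, 0.0]))  # [2.0, 2.0, nan]
```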
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments carrying isotopic-correction placeholders (see above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i \times (-1) = -n_i</math>, where <math>n_i</math> is the number of the attributes. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. The example below assumes that two MS2 entries were used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences for interpreting the results:<br />
<br />
A) intensity = 0: none of the required fragments was present in this specific sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but set to '-1' in this sample; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
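The placeholder rules of sumIntensity() can be sketched per sample as follows. This is an illustration reconstructed from the rules and the example above, not LipidXplorer's code: each argument is a hypothetical list of per-sample intensities, and the additional behavior of not adding up identical fragments twice is not modelled.

```python
def sum_intensity(*fragments: list) -> list:
    """Per-sample sum of fragment intensities with '-1' placeholder handling.

    In a sample where at least one fragment intensity is > 0, '-1' placeholders
    count as zero; otherwise all values are simply added (n placeholders -> -n).
    """
    result = []
    for values in zip(*fragments):  # one tuple of fragment intensities per sample
        if any(v > 0 for v in values):
            result.append(sum(v for v in values if v != -1))
        else:
            result.append(sum(values))
    return result


# The five rows of the example above, one sample per row.
print(sum_intensity([-1, 0, 1, 2, 2], [-1, -1, -1, -1, 0]))  # [-2, -1, 1, 2, 2]
```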
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section);<br />
* the Boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
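The run order described above can be sketched as two nested loops. The sketch is a simplification under assumptions: MasterScan entries are hypothetical dictionaries with an <tt>mz</tt> value and an <tt>ms2</tt> list, and each query section is reduced to a Python function; the class <tt>Query</tt> and the function <tt>run</tt> are illustrative names, not part of LipidXplorer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Query:
    name: str
    identify: Callable[[dict], bool]                         # DEFINE + IDENTIFY checks
    suchthat: Callable[[dict], bool] = lambda entry: True    # optional section
    report: Callable[[dict], dict] = lambda entry: {"MASS": entry["mz"]}


def run(queries: list, masterscan: list) -> dict:
    """Run all queries one after another over the MS entries, smallest m/z first."""
    results = {}
    for query in queries:
        rows = []
        for entry in sorted(masterscan, key=lambda e: e["mz"]):
            if query.identify(entry) and query.suchthat(entry):
                rows.append(query.report(entry))  # accepted: passed to REPORT
        results[query.name] = rows
    return results


masterscan = [{"mz": 786.60, "ms2": [184.07]},
              {"mz": 718.54, "ms2": [577.52]},
              {"mz": 760.59, "ms2": [184.07, 577.51]}]
pc = Query("Phosphatidylcholine",
           identify=lambda e: any(abs(f - 184.0733) <= 0.05 for f in e["ms2"]))
print(run([pc], masterscan))
# {'Phosphatidylcholine': [{'MASS': 760.59}, {'MASS': 786.6}]}
```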
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. For <br />
screening to work properly the masses must be highly accurate, because otherwise<br />
the rate of false identifications is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary. Meaningful <br />
names make a query easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acids of the lipid must either both be even-numbered or both be<br />
odd-numbered. In this way we can sort out lipids that are not expected<br />
in the organism under study. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|Output screenshot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===Analysis of Phosphatidylcholine lipid species emulating PIS 184===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the specific PC head group <br />
fragment found in the MS/MS spectra of <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be present in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===Application of Boolean operation "AND" for identification of PE-plasmalogen===<br />
<br />
A complete example script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sf-constrains and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precurosor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of the<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass;<br />
<br />
# second is the lipid's name, generated with Python's string formatting<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)";<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc;<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity;<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=452LipidXplorer MFQL2011-01-21T11:31:41Z<p>Schwudke: /* Analysis of Phosphatidylcholine lipid species emulating 184 PIS */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, however <br />
accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within an MFQL query, these constraints can be combined by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, molecular cations of PC species produce a specific <br />
phosphorylcholine fragment of their head group with the sum composition <br />
'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in their MS/MS spectra.<br />
<br />
A query for a phosphatidylcholine (PC) lipid could be: <br />
* Find all precursor masses that fit the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]"; <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra; <br />
* if both conditions hold, a phosphatidylcholine has been identified and the lipid species can be reported. <br />
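The three steps above can be sketched in Python. This illustrates the logic only and is not LipidXplorer's implementation. The following are assumptions made for this sketch: the helper names (<tt>mono_mass</tt>, <tt>fits_pc_precursor</tt>, <tt>has_head_group</tt>), the monoisotopic element masses, the electron-mass correction and the 0.01 Da fragment tolerance. The DBR check is left out.<br />
```python
# Sketch of the PC query: check the precursor sum composition against the
# element ranges of prPC, then look for the head group ion (m/z 184.07)
# in the MS/MS peak list. Names and tolerances are illustrative assumptions.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946

def mono_mass(formula, charge=0):
    """Monoisotopic m/z of a formula {element: count}, corrected for electrons."""
    mass = sum(MONO[el] * n for el, n in formula.items())
    if charge:
        mass = (mass - charge * ELECTRON) / abs(charge)
    return mass

# 'C[30..48] H[30..200] N[1] O[8] P[1]' as element -> (min, max)
PR_PC = {"C": (30, 48), "H": (30, 200), "N": (1, 1), "O": (8, 8), "P": (1, 1)}
HEAD_PC = {"C": 5, "H": 15, "O": 4, "P": 1, "N": 1}

def fits_pc_precursor(formula):
    """True if the formula has exactly these elements, each within its range."""
    if set(formula) != set(PR_PC):
        return False
    return all(lo <= formula[el] <= hi for el, (lo, hi) in PR_PC.items())

def has_head_group(ms2_mzs, tol_da=0.01):
    """True if any MS/MS peak lies within tol_da of the PC head group ion."""
    target = mono_mass(HEAD_PC, charge=1)
    return any(abs(mz - target) <= tol_da for mz in ms2_mzs)

precursor = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}   # PC 34:1 [M+H]+
ms2_peaks = [184.0733, 496.34, 577.52]                   # illustrative MS/MS peaks
print(round(mono_mass(HEAD_PC, charge=1), 4))            # 184.0733
print(fits_pc_precursor(precursor) and has_head_group(ms2_peaks))  # True
```
A precursor is reported only if both checks succeed, which mirrors the AND between the MS and MS/MS conditions.<br />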
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within <br />
a 1 Da mass window and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
MFQL query above; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we now explain the example script for the identification of PC lipid species step by step.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment, which is defined as follows: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment, not all fragmented peaks originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on precursor <br />
masses: besides the sum composition requirements, it requires that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) lies within a certain range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requires that <tt>headPC</tt> <br />
is only searched in the MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional, project-specific <br />
compositional constraints formulated in the SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids with an odd <br />
number of carbon atoms. Therefore, if a recognized lipid comprises an <br />
odd-numbered fatty acid moiety, the identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
</pre><br />
In this case the function <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, the two fatty acid moieties together must also contain an even <br />
number of carbon atoms. This excludes lipids that combine an odd-numbered and an <br />
even-numbered fatty acid moiety (species with two odd-numbered moieties still pass).<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
along with five columns based on peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or to apply certain <br />
functions, such as text formatting, to these attributes. A format <br />
expression consists of two strings separated by <tt>%</tt>: the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6). <br />
The number of double bonds is the double bond equivalent of <tt>prPC</tt> minus 1.5, the value of a PC precursor with saturated fatty acid moieties (the lower bound of the <tt>DBR</tt> range).<br />
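The arithmetic behind the NAME column can be checked with a short Python sketch for PC 34:1 ([M+H]+, C42 H83 N1 O8 P1). The double bond equivalent formula used here (1 + C + (N + P)/2 - H/2, counting P like N) is an assumption. It was chosen so that a saturated PC cation gives 1.5, and it is not taken from the LipidXplorer source.<br />
```python
# Reproduce NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)"
def subtract(a, b):
    """Element-wise difference of two sum compositions."""
    return {el: a.get(el, 0) - b.get(el, 0) for el in set(a) | set(b)}

def dbe(formula):
    """Assumed double bond equivalent: 1 + C + (N + P)/2 - H/2."""
    n_and_p = formula.get("N", 0) + formula.get("P", 0)
    return 1 + formula.get("C", 0) + n_and_p / 2 - formula.get("H", 0) / 2

pr_pc = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}    # PC 34:1 [M+H]+
head_pc = {"C": 5, "H": 15, "O": 4, "P": 1, "N": 1}

carbons = subtract(pr_pc, head_pc)["C"] - 3    # remove the 3 glycerol carbons
double_bonds = dbe(pr_pc) - 1.5                # 2.5 - 1.5 = 1.0
name = "PC [%d:%d]" % (carbons, double_bonds)
print(name)  # PC [34:1]
```
Under the same assumption, a saturated species such as PC 32:0 ([M+H]+, C40 H81 N1 O8 P1) has a double bond equivalent of exactly 1.5 and is therefore reported with 0 double bonds.<br />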
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; use it for comments in the code.<br />
# Every statement (<tt>QUERYNAME</tt>, each <tt>DEFINE</tt> and each <tt>REPORT</tt> column) has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (SUCHTHAT is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for specific precursors in MS and/or fragment ions and/or neutral losses in MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' follows a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...), each representing a column <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section below.<br />
<br />
==sc-constraints==<br />
<br />
To handle sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint is a collection of chemical sum compositions and can <br />
therefore specify a whole class of lipids. It is used for several purposes, <br />
especially for screening and for emulating multiple precursor ion or neutral loss scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
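To make the idea of an sc-constraint concrete, the following Python sketch expands a constraint into its individual sum compositions. It keeps those whose double bond equivalent lies in the DBR window. The function names and the double bond equivalent formula (1 + C + (N + P)/2 - H/2) are assumptions for illustration. LipidXplorer may apply additional rules that are not modelled here.<br />
```python
from itertools import product

def dbe(formula):
    """Assumed double bond equivalent: 1 + C + (N + P)/2 - H/2."""
    return (1 + formula["C"] + (formula.get("N", 0) + formula.get("P", 0)) / 2
            - formula["H"] / 2)

def expand(ranges, dbr):
    """Yield every sum composition allowed by element ranges and a DBR window."""
    elements = list(ranges)
    spans = [range(lo, hi + 1) for lo, hi in ranges.values()]
    for counts in product(*spans):
        formula = dict(zip(elements, counts))
        if dbr[0] <= dbe(formula) <= dbr[1]:
            yield formula

# 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5)
pc = {"C": (30, 48), "H": (30, 200), "N": (1, 1), "O": (8, 8), "P": (1, 1)}
candidates = list(expand(pc, (1.5, 7.5)))
print(len(candidates))                                           # 247
print({"C": 40, "H": 81, "N": 1, "O": 8, "P": 1} in candidates)  # True
```
For each carbon number, the DBR window admits 13 hydrogen counts, so 19 carbon values give 247 compositions. This sketch keeps both integer and half-integer double bond equivalents inside the window.<br />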
<br />
==The 4 sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?;<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. Optionally, the keyword <br />
<tt>WITH</tt> introduces further options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum double bond equivalent allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
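For a singly charged precursor and fragment, the neutral loss is simply the difference between the two m/z values. The Python sketch below compares this difference with the mass of the PE head group loss 'C2 H8 O4 N1 P1' (about 141.019 Da). The following are assumptions for illustration: the helper names, the example m/z values (roughly PE 34:1 [M+H]+ and its fragment after head group loss) and the restriction to singly charged ions.<br />
```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}

def mono_mass(formula):
    """Monoisotopic mass of a neutral formula {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

PE_HEAD_LOSS = {"C": 2, "H": 8, "O": 4, "N": 1, "P": 1}

def is_neutral_loss(precursor_mz, fragment_mz, loss, tol_da=0.5):
    """True if precursor - fragment matches the loss (singly charged ions only).

    The loss is always taken relative to the precursor, never between fragments.
    """
    return abs((precursor_mz - fragment_mz) - mono_mass(loss)) <= tol_da

print(round(mono_mass(PE_HEAD_LOSS), 4))                  # 141.0191
print(is_neutral_loss(718.5381, 577.5190, PE_HEAD_LOSS))  # True
print(is_neutral_loss(718.5381, 184.0733, PE_HEAD_LOSS))  # False
```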
<br />
====examples====<br />
Define the PC-O sc-constraint and the PC head group fragment expected in the <br />
MS/MS spectra of these precursors:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and the PE head group, which is detected as a neutral <br />
loss from the precursor:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define the sc-constraint and fragments for PE-plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
Any number of variables can be defined, but they are only valid within the <br />
current query, i.e. not in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a single mass or a set of masses, since some variables, especially sc-constraints, match more than one mass. This section is the first filtering step: it returns <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?<br />
</pre> <br />
<br />
Here LipidXplorer checks whether certain masses or fragment masses are present. The scope (MS level and polarity) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags point to the MS level where to look for the sum composition ('MS1+' means in positive MS, while 'MS2-' means in negative MS/MS). Options can be specified after optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million,<br />
## 'da' - Dalton, or<br />
## 'res' - resolution.<br />
# 'MASSRANGE' is a 2-tuple restricting the m/z range of interest. <br />
# 'MINOCC' is a float between 0 and 1 that states the minimum occupation threshold for this mass across all samples, i.e. the minimum fraction of samples in which the mass must occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
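The TOLERANCE and MASSRANGE options can be read as a window around the theoretical m/z. In the Python sketch below, the ppm window is computed relative to the theoretical mass, and a resolution value R is interpreted as a window of m/z / R. Both choices, and the function names, are assumptions rather than a description of LipidXplorer's code.<br />
```python
def tolerance_da(mz, value, unit):
    """Convert a TOLERANCE setting into an absolute window (Da) at the given m/z."""
    if unit == "ppm":
        return mz * value / 1e6
    if unit == "da":
        return value
    if unit == "res":
        return mz / value          # assumption: resolution R -> m/z / R
    raise ValueError("unknown tolerance unit: %s" % unit)

def matches(observed, theoretical, value, unit, massrange=None):
    """Emulate 'X IN MS1+ WITH TOLERANCE = ..., MASSRANGE = (lo, hi)' for one peak."""
    if massrange is not None and not massrange[0] <= observed <= massrange[1]:
        return False
    return abs(observed - theoretical) <= tolerance_da(theoretical, value, unit)

print(tolerance_da(760.0, 10, "ppm"))                            # 0.0076
print(matches(760.5880, 760.5851, 5, "ppm"))                     # True  (about 3.8 ppm)
print(matches(760.5890, 760.5851, 5, "ppm"))                     # False (about 5.1 ppm)
print(matches(650.0, 650.0, 10, "ppm", massrange=(700, 1000)))   # False (outside range)
```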
<br />
== Emulating (Multiple) Precursor Ion Scan / Neutral Loss Scan with MFQL ==<br />
<br />
In the <tt>IDENTIFY</tt> section, precursor ion scans (PIS) and neutral loss <br />
scans (NLS) can be emulated. If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the DEFINE part: when a variable is given <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt>, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
<br />
(Comment: This feature should not be confused with the LipidXplorer functionality for importing PIS and NLS mass spectrometric acquisitions.)<br />
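The difference between an emulated PIS and NLS can be sketched in Python on a toy MasterScan. Here the MasterScan is represented, as an assumption, by a dictionary that maps each precursor m/z to its MS/MS fragment m/z values. A PIS keeps precursors that show a given fragment. An NLS keeps precursors whose fragments differ from the precursor by a given loss. An sc-constraint simply corresponds to many such scans at once. Function names and m/z values are illustrative.<br />
```python
def precursor_ion_scan(spectra, fragment_mz, tol_da):
    """Precursors whose MS/MS spectrum contains fragment_mz (emulated PIS)."""
    return sorted(p for p, frags in spectra.items()
                  if any(abs(f - fragment_mz) <= tol_da for f in frags))

def neutral_loss_scan(spectra, loss_mass, tol_da):
    """Precursors with a fragment at precursor - loss_mass (emulated NLS)."""
    return sorted(p for p, frags in spectra.items()
                  if any(abs((p - f) - loss_mass) <= tol_da for f in frags))

def multiple_pis(spectra, fragment_mzs, tol_da):
    """An sc-constraint expands to several fragment masses: one PIS per mass."""
    return {mz: precursor_ion_scan(spectra, mz, tol_da) for mz in fragment_mzs}

spectra = {
    760.5851: [184.0733, 577.5190],
    718.5381: [577.5190],
    788.6164: [184.0733],
}
print(precursor_ion_scan(spectra, 184.07, 0.5))   # [760.5851, 788.6164]
print(neutral_loss_scan(spectra, 141.0191, 0.5))  # [718.5381]
```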
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
QUERYNAME = Phosphatidylcholineether;<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# the MS mass should fit 'PR' and it should have an MS/MS fragment mass fitting 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are less strict with the tolerance for the low-resolution MS/MS spectra<br />
pcHead IN MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
QUERYNAME = Phosphatidylethanolamine;<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead IN MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
QUERYNAME = PEplasmalogen;<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogen requires marking 'FRAG1' and 'FRAG2'. Both cover several possibilities since they are sc-constraints (see the example above). It also requires a test of whether the two fragments together, minus one hydrogen, match the precursor mass, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR". Such constraints are formulated in the optional 'SUCHTHAT' section as Boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
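FRAG1 and FRAG2 are sc-constraints, so several candidate peaks can be marked for each, and the SUCHTHAT conditions are tested on combinations of them. The Python sketch below illustrates this for the PE plasmalogen constraints: FRAG1 + FRAG2 - 'H1' == PR within 0.5 Da, and FRAG1.intensity * 3 &lt; FRAG2.intensity * 10 for every sample. The data layout, function name and peak values are hypothetical. The sketch does not claim to mirror how LipidXplorer enumerates candidates internally.<br />
```python
from itertools import product

H1 = 1.00782503207  # monoisotopic mass of one hydrogen atom

def suchthat_pairs(frag1_peaks, frag2_peaks, precursor_mz, tol_da=0.5):
    """Return (FRAG1, FRAG2) mass pairs that satisfy both SUCHTHAT conditions."""
    kept = []
    for f1, f2 in product(frag1_peaks, frag2_peaks):
        sum_ok = abs(f1["mass"] + f2["mass"] - H1 - precursor_mz) <= tol_da
        # intensities are per-sample lists: the inequality must hold for all samples
        ratio_ok = all(a * 3 < b * 10
                       for a, b in zip(f1["intensity"], f2["intensity"]))
        if sum_ok and ratio_ok:
            kept.append((f1["mass"], f2["mass"]))
    return kept

frag1 = [{"mass": 339.29, "intensity": [100, 200]},
         {"mass": 339.30, "intensity": [1000, 100]}]
frag2 = [{"mass": 364.27, "intensity": [50, 80]},
         {"mass": 390.28, "intensity": [60, 90]}]
print(suchthat_pairs(frag1, frag2, precursor_mz=702.54))  # [(339.29, 364.27)]
```
The second FRAG1 candidate fits the mass sum but fails the intensity condition in the first sample, so it is discarded.<br />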
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> <br />
section consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column with the name <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass;<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity;<br />
</pre><br />
<br />
Often both variables can match the same fragment (for example, in two fatty acid scans of a lipid with two identical fatty acids). LipidXplorer therefore provides a special function that does not add the intensity of the same fragment twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2);<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string formatting works as follows: two strings are given, <br />
separated by <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains the list of values for the placeholders, in the order of <br />
the placeholders. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])";<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
the double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is borrowed from Python: MFQL uses Python's standard <br />
string formatting with the <tt>%</tt> operator <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
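The formatting follows Python's % operator, so its behaviour can be tried directly in Python. In this sketch the values are passed as a plain Python tuple, and the fatty acid numbers are illustrative. Note that %d truncates non-integer values, which matters when a double bond value such as 1.5 is formatted.<br />
```python
# Hypothetical fatty acid values for fa1PC and fa2PC
fa1 = {"C": 18, "db": 1}
fa2 = {"C": 18, "db": 1}

name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
print(name)                     # PC [36:2]
print("%2.2fppm" % 2.4567)      # 2.46ppm (two decimals, as in the ERROR column)
print("%s has %d carbons" % ("PC", 36))
print("%d" % 1.5)               # 1 (%d truncates floats)
```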
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in 3 forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
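The three error attributes can be illustrated with a small Python function. The sign convention (measured minus theoretical) and the reading of <tt>errres</tt> as theoretical m/z divided by the absolute error are assumptions of this sketch. The documentation only states that the error is the difference between the two masses.<br />
```python
def peak_errors(measured, theoretical):
    """Return (errppm, errda, errres) for one peak (sign/errres are assumptions)."""
    errda = measured - theoretical
    errppm = errda / theoretical * 1e6
    errres = theoretical / abs(errda) if errda else float("inf")
    return errppm, errda, errres

errppm, errda, errres = peak_errors(760.5866, 760.5851)
print("%2.2fppm" % errppm)    # 1.97ppm
print("%.4f Da" % errda)      # 0.0015 Da
print(round(errres, -3))      # 507000.0
```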
====mass==== <br />
The m/z value of the peak.<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>; <tt>[db]</tt> returns the number of double bonds (double bond equivalent) instead. For example, the number of <tt>C</tt> atoms of a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass across the samples. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry per sample. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, e.g. all blank samples. This allows sample groups to be defined simply by naming the samples according to their group. Many different constraints can be stated this way, which can increase the accuracy of the interpretation or even interpret the result directly. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity found in the blank samples to be lower than one percent of the average intensity of all experimental samples ("*exp*"). This simply discards every "lipid" that is obviously noise.<br />
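The list semantics and the sample-group wildcards can be emulated in Python with <tt>fnmatch.fnmatchcase</tt>, which supports the same <tt>*</tt> and <tt>?</tt> wildcards. The sample names and intensities below are hypothetical. The dictionary layout is an assumption of this sketch, not LipidXplorer's internal data structure.<br />
```python
from fnmatch import fnmatchcase

# Hypothetical intensities of one peak, keyed by sample name
intensity = {
    "blank_01": 120.0, "blank_02": 80.0,
    "exp_01": 25000.0, "exp_02": 31000.0, "exp_03": 28000.0,
}

def select(intens, pattern):
    """Emulate PR.intensity["<pattern>"]: keep samples whose name matches."""
    return [v for name, v in intens.items() if fnmatchcase(name, pattern)]

def avg(values):
    return sum(values) / len(values)

# PR.intensity > 10000 is true only if it holds for every sample
print(all(v > 10000 for v in intensity.values()))   # False (the blanks fail)

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
print(avg(select(intensity, "*blank*")) < avg(select(intensity, "*exp*")) / 100)  # True
```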
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
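As a small illustration, occupation and a MINOCC filter can be computed from a peak's per-sample intensities. This Python sketch makes two assumptions. A sample counts as an occurrence when its intensity is greater than zero, so zeros and '-1' placeholders do not count. A peak is accepted when its occupation is at least MINOCC.<br />
```python
def occupation(intensities):
    """occ = samples in which the peak occurs / all samples (assumed: intensity > 0)."""
    return sum(1 for v in intensities if v > 0) / len(intensities)

def passes_minocc(intensities, minocc):
    """Assumed MINOCC semantics: keep the peak if occ >= minocc."""
    return occupation(intensities) >= minocc

peak = [1500.0, 0.0, 2200.0, 1800.0]
print(occupation(peak))           # 0.75
print(passes_minocc(peak, 0.5))   # True
print(passes_minocc(peak, 0.8))   # False
```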
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even, e.g. <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average value of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to that of v.<br />
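A standardized intensity can be sketched as a per-sample ratio. The Python function below divides each intensity by the intensity of the standard in the same sample. The following are assumptions, not documented LipidXplorer behaviour: the name <tt>standardize</tt>, returning NaN when the standard is missing, and applying no further scaling (for example by the spiked amount).<br />
```python
import math

def standardize(intensities, standard):
    """Divide each sample's intensity by the standard's intensity in that sample."""
    return [v / s if s > 0 else math.nan for v, s in zip(intensities, standard)]

lipid = [2000.0, 4500.0, 3000.0]
std = [1000.0, 1500.0, 1200.0]
print(standardize(lipid, std))  # [2.0, 3.0, 2.5]
```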
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up intensities of different MS2 entries where multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by '-1' placeholders after isotopic correction (see Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i\times(-1)</math>, where <math>n_i</math> is the number of summed entries. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences when such results are interpreted:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
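The placeholder rules above can be written as a short Python sketch, applied sample by sample. Two assumptions are made here: each fragment is represented as a (mass, per-sample intensities) tuple, and "the same fragment" means an identical mass, which is counted only once. The function names are hypothetical.<br />
```python
def sum_intensity_sample(values):
    """Combine one sample's fragment intensities following the rules above.

    If any value is > 0, the -1 placeholders count as zero; otherwise all
    values (zeros and -1 placeholders) are simply added.
    """
    if any(v > 0 for v in values):
        return sum(v for v in values if v > 0)
    return sum(values)

def sum_intensity(*fragments):
    """Per-sample sum over fragments; a fragment given twice counts once."""
    unique = {}
    for mass, intensities in fragments:
        unique.setdefault(mass, intensities)
    return [sum_intensity_sample(sample) for sample in zip(*unique.values())]

# The rows of the table above, one sample per column (illustrative masses)
f1 = (281.25, [-1, 0, 1, 2, 2])
f2 = (283.26, [-1, -1, -1, -1, 0])
print(sum_intensity(f1, f2))  # [-2, -1, 1, 2, 2]
print(sum_intensity(f1, f1))  # [-1, 0, 1, 2, 2]
```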
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
A LipidXplorer run works as follows: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest, checks the conditions given in the DEFINE, <tt>IDENTIFY</tt> and <br />
<tt>SUCHTHAT</tt> sections and formats accepted masses according to the <tt>REPORT</tt> section, i.e. <br />
* it loads an MS mass,<br />
* it checks whether it fits a given sum composition or sc-constraint (DEFINE and <tt>IDENTIFY</tt> sections),<br />
* it looks into its MS/MS spectrum (if provided) and does the same (DEFINE and <tt>IDENTIFY</tt> sections),<br />
* it checks the Boolean constraints (<tt>SUCHTHAT</tt> section) and, if the result is positive, accepts the MS mass and sends it to the <tt>REPORT</tt> section.<br />
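The loop described above can be sketched in Python. The <tt>Query</tt> class, its fields and the dictionary used as a MasterScan (precursor m/z mapped to MS/MS fragment m/z values) are hypothetical stand-ins for the compiled DEFINE/IDENTIFY/SUCHTHAT/REPORT sections. They are not LipidXplorer classes.<br />
```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Query:
    """Hypothetical stand-in for one compiled MFQL query."""
    name: str
    identify_ms1: Callable[[float], bool]          # DEFINE + IDENTIFY on the MS mass
    identify_ms2: Callable[[List[float]], bool]    # DEFINE + IDENTIFY on MS/MS peaks
    suchthat: Callable[[float, List[float]], bool] = lambda mz, frags: True
    report: Callable[[float], Dict] = lambda mz: {"MASS": mz}

def run(queries, masterscan):
    """Run all queries successively; each scans the MS masses in ascending order."""
    results = {}
    for q in queries:
        rows = []
        for mz in sorted(masterscan):              # smallest to largest MS mass
            frags = masterscan[mz]
            if q.identify_ms1(mz) and q.identify_ms2(frags) and q.suchthat(mz, frags):
                rows.append(q.report(mz))          # accepted: send to REPORT
        results[q.name] = rows
    return results

masterscan = {788.6164: [184.0733], 760.5851: [184.0733, 577.5190], 718.5381: [577.5190]}
pc = Query(
    name="Phosphatidylcholine",
    identify_ms1=lambda mz: 700 <= mz <= 900,
    identify_ms2=lambda frags: any(abs(f - 184.0733) <= 0.01 for f in frags),
)
print(run([pc], masterscan))
# {'Phosphatidylcholine': [{'MASS': 760.5851}, {'MASS': 788.6164}]}
```
The precursor at m/z 718.5381 passes the MS check but has no head group fragment, so the query does not report it.<br />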
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. Screening <br />
requires highly accurate masses, because otherwise the error rate<br />
of identification is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary. Users should choose meaningful <br />
names to make their queries easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acids of the lipid must either both be even-numbered or<br />
both be odd-numbered. In this way we can discard lipids that we know should<br />
not occur in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. We obtain these numbers by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and reducing them by the contribution of the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|Output screenshot]]<br />
<br />
The screenshot shows the resulting data from the query opened in <br />
spreadsheet software. At the top are the variable names, followed by the <br />
name of the query and then the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected species is not found, the query or the import settings <br />
should be refined. <br />
<br />
===Analysis of Phosphatidylcholine lipid species emulating PIS 184===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the head group specific <br />
for PC, found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually occur in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
A complete example script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth, the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=451LipidXplorer MFQL2011-01-21T11:31:21Z<p>Schwudke: /* In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within an MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, molecular cations of PC species produce a specific phosphorylcholine <br />
head group fragment with the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 <br />
(see [[#MFQL identification of phosphatidylcholines (PC)]]). The identification of PC species <br />
therefore starts with identifying probable precursors in the MS spectrum using accurately <br />
determined masses and proceeds with identifying the phosphorylcholine head group fragment <br />
in their MS/MS spectra.<br />
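<br />
The value m/z 184.07 can be reproduced from the sum composition: add up the monoisotopic atomic masses and, for a singly charged cation, subtract one electron mass. The Python sketch below is an illustration only; the helpers <tt>parse_sc</tt> and <tt>mz</tt> are hypothetical, the element table covers only C, H, N, O and P, and ranges such as <tt>C[30..48]</tt> are not supported.<br />
```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946  # electron mass (u)


def parse_sc(sc: str) -> dict:
    # Parse a fixed sum composition such as 'C5 H15 O4 N1 P1' (no ranges).
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def mz(sc: str, charge: int = 1) -> float:
    # m/z of an ion with this sum composition: electrons are removed for
    # positive charges and added for negative ones.
    if charge == 0:
        raise ValueError("a neutral species has no m/z")
    mass = sum(MONO[el] * n for el, n in parse_sc(sc).items())
    return (mass - charge * ELECTRON) / abs(charge)


print(round(mz("C5 H15 O4 N1 P1"), 4))  # -> 184.0733
```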
<br />
A query for a phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra. <br />
* If both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL query; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we now explain the example script for identifying PC lipid species step by step.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched for in the MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, if a recognized lipid comprises an <br />
odd-numbered fatty acid moiety, the identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively (8 in total, an even number), a lipid passing this test cannot comprise <br />
one fatty acid moiety with an odd and one with an even number of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - to report the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6).<br />
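<br />
Because the NAME string uses Python's <tt>%</tt> formatting, the arithmetic can be reproduced in plain Python. In this sketch the sum compositions are dictionaries, the precursor is assumed to be the protonated PC 34:1 ion, and its <tt>chemsc[db]</tt> value of 2.5 is an assumed input chosen to match the <tt>- 1.5</tt> offset of the query; how LipidXplorer computes <tt>db</tt> internally is not modelled, and <tt>subtract</tt> is a hypothetical helper.<br />
```python
# Hypothetical inputs: sum composition of a PC precursor ion (here the
# protonated PC 34:1, C42 H83 N1 O8 P1) and of the PC head group fragment.
prPC = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}
prPC_db = 2.5  # assumed value of prPC.chemsc[db] for this example
headPC = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}

GLYCEROL_C = 3  # carbons of the glycerol backbone


def subtract(a: dict, b: dict) -> dict:
    # Element-wise difference, like (prPC.chemsc - headPC.chemsc) in MFQL.
    return {el: a.get(el, 0) - b.get(el, 0) for el in set(a) | set(b)}


fa_carbons = subtract(prPC, headPC)["C"] - GLYCEROL_C  # (42 - 5) - 3 = 34
fa_db = prPC_db - 1.5                                  # 2.5 - 1.5 = 1
name = "PC [%d:%d]" % (fa_carbons, fa_db)
print(name)  # -> PC [34:1]
```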
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. This is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (SUCHTHAT is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for specific precursors in MS and/or fragment ions and/or neutral losses in MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. Each column's content is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section below.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format which is called sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids: an <br />
sc-constraint describes a collection of chemical sum compositions. It is used <br />
for several purposes, especially for screening tasks or multiple scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
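<br />
To make the idea of "a collection of sum compositions" concrete, the Python sketch below expands a small sc-constraint into all element combinations and keeps those inside a DBR window. The helper names are hypothetical, and the double bond equivalent formula used here (C - H/2 + N/2 + 1, ignoring O and P) is an assumption for illustration; LipidXplorer's own definition, in particular for charged species, may differ, so the values need not match its <tt>chemsc[db]</tt>.<br />
```python
import itertools
import re


def parse_constraint(sc):
    # 'C[30..48] H[30..200] N[1] O[8] P[1]'
    #   -> {'C': (30, 48), 'H': (30, 200), 'N': (1, 1), 'O': (8, 8), 'P': (1, 1)}
    ranges = {}
    for el, lo, hi in re.findall(r"([A-Z][a-z]?)\[?(\d+)(?:\.\.(\d+))?\]?", sc):
        ranges[el] = (int(lo), int(hi) if hi else int(lo))
    return ranges


def dbe(counts):
    # Assumed double bond equivalent: C - H/2 + N/2 + 1 (O and P ignored).
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


def expand(sc, dbr):
    # Enumerate every sum composition allowed by the element ranges whose
    # double bond equivalent lies inside the DBR window (inclusive).
    ranges = parse_constraint(sc)
    elements = list(ranges)
    axes = [range(lo, hi + 1) for lo, hi in ranges.values()]
    result = []
    for combo in itertools.product(*axes):
        counts = dict(zip(elements, combo))
        if dbr[0] <= dbe(counts) <= dbr[1]:
            result.append(counts)
    return result


print(len(expand("C[2..3] H[4..8] O[1]", (0, 1))))  # -> 6
```
For the PC constraint 'C[30..48] H[30..200] N[1] O[8] P[1]' the same enumeration runs over 19 × 171 carbon/hydrogen combinations before the DBR filter is applied.<br />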
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. This can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and maximum number of double bonds (double bond equivalents) allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
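<br />
For singly charged ions this rule means that a neutral loss is the difference between precursor m/z and fragment m/z. A minimal Python sketch with hypothetical helper names; 141.0191 is the monoisotopic mass of 'C2 H8 O4 N1 P1' (the PE head group loss used in the examples), and the two m/z values are the theoretical values for protonated PE 34:1 and its fragment after that loss.<br />
```python
PE_HEAD_LOSS = 141.0191  # monoisotopic mass of neutral C2 H8 O4 N1 P1


def neutral_loss(precursor_mz: float, fragment_mz: float) -> float:
    # Neutral loss is always precursor minus fragment (singly charged ions),
    # never fragment minus fragment.
    return precursor_mz - fragment_mz


def is_neutral_loss(precursor_mz: float, fragment_mz: float,
                    loss_mass: float, tol_da: float = 0.5) -> bool:
    # True if the observed loss matches loss_mass within tol_da Dalton.
    return abs(neutral_loss(precursor_mz, fragment_mz) - loss_mass) <= tol_da


# Theoretical m/z of protonated PE 34:1 and of its [M+H-141]+ fragment.
print(is_neutral_loss(718.5381, 577.5190, PE_HEAD_LOSS, tol_da=0.01))  # -> True
```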
<br />
====examples====<br />
Define the PC-O sc-constraint and PC-O's head group, which is linked to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and PE's head group (as a neutral loss), which is linked to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE-Plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid within the <br />
current query, i.e. they are not visible to other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are searched for in the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a single mass or a set of masses, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+ | MS1- | MS2+ | MS2-) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain masses/fragment masses. The scope (MS level) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify the MS level in which to look for the sum composition ('MS1+' means positive-mode MS, while 'MS2-' means negative-mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given as: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass range of interest. <br />
# 'MINOCC' is a float number between 0 and 1 which states the minimum occupation threshold for this mass across all samples, i.e. the minimal fraction of samples in which the mass must occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
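<br />
How a TOLERANCE and a MASSRANGE might be applied can be sketched in Python. The ppm and Dalton cases follow from their definitions; treating a 'res' tolerance as an allowed deviation of m/R (from the usual definition of resolution R = m/Δm) is an assumption, not taken from LipidXplorer, and all function names are hypothetical.<br />
```python
def allowed_deviation(theoretical: float, value: float, unit: str) -> float:
    # Translate a TOLERANCE setting into an absolute deviation in Dalton.
    unit = unit.lower()
    if unit == "ppm":
        return theoretical * value * 1e-6
    if unit == "da":
        return value
    if unit == "res":
        # Assumption: resolution R corresponds to a deviation of m / R.
        return theoretical / value
    raise ValueError("unknown tolerance unit: %s" % unit)


def within_tolerance(observed: float, theoretical: float, value: float, unit: str) -> bool:
    # True if the observed mass deviates from the theoretical one by no more
    # than the allowed deviation.
    return abs(observed - theoretical) <= allowed_deviation(theoretical, value, unit)


def in_mass_range(mass: float, massrange: tuple) -> bool:
    # MASSRANGE = (700, 1000) keeps masses from m/z 700 to m/z 1000.
    low, high = massrange
    return low <= mass <= high


print(within_tolerance(760.5855, 760.5851, 5, "ppm"))  # -> True (about 0.5 ppm)
```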
<br />
== Emulating (Multiple) Precursor Ion Scan / Neutral Loss Scan with MFQL ==<br />
<br />
In the <tt>IDENTIFY</tt> section, precursor ion scans (PIS) and neutral loss <br />
scans (NLS) can be emulated. If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable has <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
<br />
(Comment: The above feature should not be mistaken for the LipidXplorer functionality to import PIS and NLS mass spectrometric acquisitions.)<br />
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, more constraints can be added to the query. For example, the identification of PE plasmalogen requires marking 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see the example above), and a test of whether these two fragments together match the precursor mass, i.e. is "FRAG1 + FRAG2 == PR"? Such a constraint is formulated in the optional 'SUCHTHAT' section as boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. The terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' for not equal.<br />
The values for the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by writing the attribute after the variable name, connected with a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
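<br />
What the two SUCHTHAT terms of the PE-plasmalogen example compute can be mimicked in Python. The sum compositions of <tt>FRAG1</tt> and <tt>FRAG2</tt> below are hypothetical, chosen only so that FRAG1 + FRAG2 - H1 equals PR, and the intensity lists are made up; electron masses are ignored since they are negligible against a 0.5 Da tolerance. Following the rule for the <tt>intensity</tt> attribute, a comparison on intensity lists holds only if it holds for every sample.<br />
```python
from collections import Counter

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}


def mass(sc: Counter) -> float:
    # Neutral monoisotopic mass of a sum composition.
    return sum(MONO[el] * n for el, n in sc.items())


# Hypothetical sum compositions (illustrative, not measured data).
PR = Counter({"C": 39, "H": 77, "N": 1, "O": 7, "P": 1})
FRAG1 = Counter({"C": 21, "H": 39, "O": 3})
FRAG2 = Counter({"C": 18, "H": 39, "N": 1, "O": 4, "P": 1})
H1 = Counter({"H": 1})

# FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da
combined = FRAG1 + FRAG2 - H1
mass_ok = abs(mass(combined) - mass(PR)) <= 0.5

# FRAG1.intensity * 3 < FRAG2.intensity * 10, required for every sample
frag1_int = [1000.0, 2000.0]
frag2_int = [500.0, 900.0]
intensity_ok = all(f1 * 3 < f2 * 10 for f1, f2 in zip(frag1_int, frag2_int))

print(mass_ok and intensity_ok)  # -> True
```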
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> <br />
section consists of a list of variables, each representing a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often these fragments can be identical (for example, for two fatty acid scans); therefore LipidXplorer provides a special function that does not add up the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the contents of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is taken from Python: MFQL uses Python's standard <br />
string formatting operator <tt>%</tt> here <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
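<br />
The placeholders can be tried out directly in Python. In MFQL the argument list is written as a quoted string; in plain Python it is an ordinary tuple. The fatty acid values below are illustrative.<br />
```python
# Placeholders: %d (integer), %.nf (float with n decimals), %s (string).
fa1 = {"C": 18, "db": 1}  # illustrative fatty acid compositions
fa2 = {"C": 18, "db": 1}

name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
error = "%.2fppm" % 1.23456
label = "%s (%s)" % ("PC", "positive mode")

print(name)   # -> PC [36:2]
print(error)  # -> 1.23ppm
print(label)  # -> PC (positive mode)
```
Note that <tt>%d</tt> truncates floating point values (<tt>"%d" % 2.7</tt> gives <tt>2</tt>), so an error reported with <tt>%dppm</tt> loses its decimals.<br />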
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error is available in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a certain element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, to get the number of <tt>C</tt> atoms of a formula: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>, if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>, if it is a fragment, it returns the sum composition of the neutral loss of the precursor.<br />
====intensity====<br />
All intensities of a mass across the samples. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). E.g. <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, which could be all blank samples. Naming the samples according to their group thus makes it possible to define sample groups and to state many different constraints, which increase the accuracy of the interpretation or even already interpret the result. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that the average intensity found in the blank samples is less than one percent of the average intensity of all experimental samples ("*exp*"). This simply discards every "lipid" that is obviously noise.<br />
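<br />
The sample-group selection can be emulated with Python's <tt>fnmatch</tt> module, which supports the same <tt>*</tt> and <tt>?</tt> wildcards. The sample names and intensities below are made up, <tt>select</tt> is a hypothetical helper, and whether LipidXplorer matches sample names case-sensitively is not stated here (the sketch uses case-sensitive matching).<br />
```python
from fnmatch import fnmatchcase
from statistics import mean

# Made-up intensities of one peak ('PR') per sample file.
intensity = {
    "blank_01": 50.0,
    "blank_02": 70.0,
    "exp_liver_01": 12000.0,
    "exp_liver_02": 15000.0,
}


def select(intensities: dict, pattern: str) -> list:
    # Emulates PR.intensity["*blank*"]: keep samples whose name matches the
    # wildcard pattern (* = any run of characters, ? = one character).
    return [v for name, v in intensities.items() if fnmatchcase(name, pattern)]


# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
not_noise = mean(select(intensity, "*blank*")) < mean(select(intensity, "*exp*")) / 100

# PR.intensity > 10000 holds only if every sample exceeds 10000
all_above = all(v > 10000 for v in intensity.values())

print(not_noise, all_above)  # -> True False
```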
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
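<br />
A Python sketch of the occupation and of a MINOCC check. It assumes that a sample counts as an occurrence when its intensity is positive (not-found peaks have intensity zero, see the notes above) and that MINOCC keeps a mass when its occupation is at least MINOCC; how the '-1' placeholders from isotopic correction are counted is not modelled. Function names are hypothetical.<br />
```python
def occupation(intensities):
    # Fraction of samples in which the peak was found; not-found samples
    # carry intensity zero, so only positive values are counted.
    found = sum(1 for v in intensities if v > 0)
    return found / len(intensities)


def passes_minocc(intensities, minocc):
    # Assumed MINOCC semantics: keep the mass if occupation >= minocc.
    return occupation(intensities) >= minocc


print(occupation([1200.0, 0.0, 950.0, 800.0]))  # -> 0.75
```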
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used to sum up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments with isotope-corrected placeholders (see above), the following rules are implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i\times (-1) = -n_i</math>, where <math>n_i</math> is the number of summed attributes. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the example below we assume that two MS2 entries are used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
That has following consequences when such results have to be interpreted:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but set to '-1' by the isotopic correction, and no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
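<br />
The placeholder rules above can be written as a small Python function. This is a sketch for a single sample, assuming intensities are plain numbers with -1 as the isotope-correction placeholder; it reproduces the example table but is not the LipidXplorer source.<br />
<br />
```python
def sum_intensity(*values):
    """Combine one sample's fragment intensities following the placeholder rules.

    -1 marks a fragment whose isotope-corrected intensity fell to zero or below.
    """
    if any(v > 0 for v in values):
        # At least one real signal: placeholders count as zero and are dropped.
        return sum(v for v in values if v > 0)
    # No signal above zero: zeros and -1 placeholders are simply added up.
    return sum(values)


# Rows of the example table: (F1, F2) -> sumIntensity(F1, F2)
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(f1, f2))  # last column: -2, -1, 1, 2, 2
```
<br />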
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
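<br />
To make the second constraint concrete, the Python sketch below evaluates it for one candidate. It reads <tt>FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da</tt> as a comparison of m/z values and, as an assumption, applies the intensity inequality sample by sample. The masses and intensities are hypothetical; the hydrogen mass is the monoisotopic mass of 1H.<br />
<br />
```python
H1 = 1.00782503207  # monoisotopic mass of 1H in Da


def plasmalogen_constraint(frag1_mz, frag2_mz, pr_mz, frag1_int, frag2_int, tol_da=0.5):
    """Sketch of: FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
    FRAG1.intensity * 3 < FRAG2.intensity * 10 (checked for every sample)."""
    masses_match = abs(frag1_mz + frag2_mz - H1 - pr_mz) <= tol_da
    intensities_ok = all(i1 * 3 < i2 * 10 for i1, i2 in zip(frag1_int, frag2_int))
    return masses_match and intensities_ok


# Hypothetical candidate: masses add up, FRAG2 > 0.3 * FRAG1 in both samples.
print(plasmalogen_constraint(300.0, 500.0, 798.99, [1000.0, 800.0], [400.0, 300.0]))  # True
# Same masses, but in the second sample FRAG2 is below 0.3 * FRAG1.
print(plasmalogen_constraint(300.0, 500.0, 798.99, [1000.0, 800.0], [400.0, 200.0]))  # False
```
<br />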
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections. In detail:<br />
* it loads an MS mass<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> sections).<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> sections). <br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section), and if the result is positive the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
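<br />
The control flow described above can be sketched in Python as follows. The <tt>Query</tt> class, its three callables and the dictionary layout of a MasterScan entry are hypothetical stand-ins chosen for illustration; they are not LipidXplorer's internal data structures.<br />
<br />
```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    """Hypothetical stand-in for one compiled MFQL query."""
    name: str
    identify: Callable[[dict], Optional[dict]]  # definitions + IDENTIFY: marked variables or None
    suchthat: Callable[[dict], bool]            # SUCHTHAT: boolean constraints
    report: Callable[[dict], dict]              # REPORT: one output row


def run(queries, masterscan):
    """Run all queries one after another over the MS entries of a MasterScan."""
    results = {}
    for query in queries:
        rows = []
        # Iterate over the MS masses from the smallest to the largest m/z.
        for entry in sorted(masterscan, key=lambda e: e["mass"]):
            marked = query.identify(entry)
            if marked is not None and query.suchthat(marked):
                rows.append(query.report(marked))
        results[query.name] = rows
    return results


# Toy MasterScan: each MS entry carries the m/z values of its MS/MS fragments.
masterscan = [
    {"mass": 786.60, "ms2": [184.07, 603.53]},
    {"mass": 758.57, "ms2": [184.07]},
    {"mass": 700.50, "ms2": []},
]

pc_query = Query(
    name="Phosphatidylcholine",
    identify=lambda e: {"PR": e} if any(abs(f - 184.07) <= 0.5 for f in e["ms2"]) else None,
    suchthat=lambda marked: True,
    report=lambda marked: {"MASS": marked["PR"]["mass"]},
)

print(run([pc_query], masterscan))
# {'Phosphatidylcholine': [{'MASS': 758.57}, {'MASS': 786.6}]}
```
<br />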
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. To screen <br />
properly the masses should be highly accurate, because otherwise<br />
too many identifications would be false.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Every query <br />
must be given a name. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names make a query <br />
easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. Since the head group and the<br />
glycerol backbone of PC contain 8 carbon atoms in total, the two fatty<br />
acids of the lipid must be either both even-numbered or both odd-numbered.<br />
In this way we can sort out lipids that we know should not occur in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to PC's head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC by checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, followed by the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===Analysis of Phosphatidylcholine lipid species emulating 184 PIS ===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the PC-specific head group <br />
fragment found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures we do not find borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs by checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sf-constrains and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=450LipidXplorer MFQL2011-01-21T11:29:48Z<p>Schwudke: /* The structure of an MFQL query */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce a specific phosphorylcholine fragment of <br />
their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species therefore starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in their MS/MS spectra.<br />
<br />
A query for a phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether there is a "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra. <br />
* If these two conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
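<br />
The m/z 184.07 quoted for the head group follows directly from its sum composition. The Python sketch below computes the monoisotopic m/z of a sum composition written in the same 'C5 H15 O4 P1 N1' style. It handles only the five elements used on this page, expects an explicit count after every element, uses standard monoisotopic atomic masses and subtracts one electron mass per positive charge; it is an illustration, not the LipidXplorer mass calculator.<br />
<br />
```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946  # electron mass in Da


def parse_sc(sc):
    """Parse a sum composition such as 'C5 H15 O4 P1 N1' into {'C': 5, ...}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d+)", sc):
        counts[element] = counts.get(element, 0) + int(n)
    return counts


def mz(sc, charge=1):
    """Monoisotopic m/z of a sum composition; charge 0 returns the neutral mass."""
    mass = sum(MONOISOTOPIC[el] * n for el, n in parse_sc(sc).items())
    if charge == 0:
        return mass
    # Positive ions lack electrons, negative ions carry extra ones.
    return (mass - charge * ELECTRON) / abs(charge)


print(f"{mz('C5 H15 O4 P1 N1'):.2f}")  # 184.07
```
<br />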
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we explain the example script for the identification of PC lipid species in the following.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid cannot comprise one fatty acid <br />
moiety with an odd and one with an even number of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the head group fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt>, where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons in the glycerol backbone (Figures 5 and 6).<br />
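<br />
The arithmetic behind the NAME column can be reproduced with plain Python %-formatting. In this sketch the sum compositions are dictionaries, the precursor is the [M+H]+ ion of PC 34:1 ('C42 H83 N1 O8 P1'), and its double-bond value of 2.5 is supplied by hand as an assumption rather than computed, since this page does not describe how LipidXplorer derives <tt>chemsc[db]</tt>.<br />
<br />
```python
# Hypothetical values standing in for prPC.chemsc and headPC.chemsc.
pr_pc = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1, "db": 2.5}  # db value assumed
head_pc = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}

fa_carbons = (pr_pc["C"] - head_pc["C"]) - 3  # remove head group and glycerol carbons
fa_double_bonds = pr_pc["db"] - 1.5           # same offset as in the query above

name = "PC [%d:%d]" % (fa_carbons, fa_double_bonds)
print(name)  # PC [34:1]
```
<br />
Note that <tt>%d</tt> truncates a float such as 1.0 to the integer 1, so the half-integer double-bond offset does not leak into the name.<br />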
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. This is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of 3-4 sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for specific precursors in MS and/or fragment ions and/or neutral losses in MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. Each column's content is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the part on the '''REPORT''' section.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format called sum composition constraint (sc-constraint). <br />
An sc-constraint describes a collection of chemical sum compositions and can <br />
thus specify a whole class of lipids. It is used for several purposes, <br />
especially for screening tasks or multiple scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the range of the possible number of double bonds. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint will be treated as a collection of neutral losses.<br />
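<br />
An sc-constraint is a compact description of a finite set of sum compositions. The Python sketch below expands the element ranges of an sc-constraint and keeps the compositions whose double-bond value lies within DBR. The rings-plus-double-bonds formula used here, 1 + C - H/2 + N/2 + P/2, is an assumption (a common convention that treats P like N); LipidXplorer's own calculation may differ, and no further filters are applied. The ranges are those of the PC precursor constraint used in the tutorial.<br />
<br />
```python
from itertools import product


def dbe(c, h, n, p):
    # Assumed rings-plus-double-bonds formula; oxygen does not contribute.
    return 1 + c - h / 2 + n / 2 + p / 2


def expand(ranges, dbr):
    """Enumerate all (C, H, N, O, P) compositions within the ranges and the DBR."""
    lo, hi = dbr
    result = []
    for c, h, n, o, p in product(*(range(a, b + 1) for a, b in ranges)):
        if lo <= dbe(c, h, n, p) <= hi:
            result.append((c, h, n, o, p))
    return result


# 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5, 7.5)
pc_ranges = [(30, 48), (30, 200), (1, 1), (8, 8), (1, 1)]
pc_set = expand(pc_ranges, (1.5, 7.5))
print(len(pc_set))                  # 247
print((42, 83, 1, 8, 1) in pc_set)  # True: C42 H83 N1 O8 P1 has a DBE of 2.5
```
<br />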
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = <name of the query></pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be either a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. Optionally, the content is followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
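<br />
Because neutral losses are always taken relative to the precursor, checking for one amounts to comparing the precursor m/z minus each fragment m/z with the mass of the neutral loss. A minimal Python sketch for singly charged ions: the monoisotopic mass of the PE head group loss 'C2 H8 O4 N1 P1' is computed from standard atomic masses, while the precursor and fragment m/z values are illustrative and the 0.5 Da default tolerance is an arbitrary choice.<br />
<br />
```python
# Monoisotopic atomic masses (Da).
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956, "P": 30.97376163}

# Neutral loss 'C2 H8 O4 N1 P1' (PE head group), about 141.019 Da.
PE_HEAD_LOSS = 2 * MASS["C"] + 8 * MASS["H"] + 4 * MASS["O"] + MASS["N"] + MASS["P"]


def has_neutral_loss(precursor_mz, fragment_mzs, loss, tol_da=0.5):
    """True if some fragment differs from the precursor by `loss` (singly charged ions).

    The difference is only ever taken between the precursor and a fragment,
    never between two fragments.
    """
    return any(abs((precursor_mz - f) - loss) <= tol_da for f in fragment_mzs)


print(round(PE_HEAD_LOSS, 3))                                       # 141.019
print(has_neutral_loss(744.554, [603.535, 184.073], PE_HEAD_LOSS))  # True
```
<br />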
<br />
====examples====<br />
Define the PC-O sc-constraint and PC-O's head group, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and PE's head group, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define the sc-constraints and fragments for PE plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they are not valid in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if all of the individual identifications are true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the existence of certain masses/fragment masses. The scope (level of MS) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags point to the MS level where to look for the sum composition ('MS1+' means in positive MS, while 'MS2-' means in negative MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in three units: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass of interest. <br />
# 'MINOCC' is a float between 0 and 1 which states the minimum occupation threshold for this mass across all samples, i.e. the minimum fraction of samples in which the mass has to occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers masses only from m/z 700 to m/z 1000.<br />
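<br />
The TOLERANCE and MASSRANGE options can be illustrated with a small Python sketch. Two points are assumptions, not documented behaviour: for 'res' we take a resolution R at mass m to allow a deviation of m / R, and MASSRANGE is treated as inclusive on both ends. The masses are illustrative.<br />
<br />
```python
def within_tolerance(measured, theoretical, value, unit):
    """Check a measured m/z against a theoretical one for TOLERANCE = <value><unit>."""
    unit = unit.lower()
    delta = abs(measured - theoretical)
    if unit == "ppm":
        return delta / theoretical * 1e6 <= value
    if unit == "da":
        return delta <= value
    if unit == "res":
        # Assumption: resolution R at mass m allows a deviation of m / R.
        return delta <= theoretical / value
    raise ValueError(f"unknown tolerance unit: {unit}")


def in_mass_range(mz, mass_range):
    """MASSRANGE = (low, high), assumed inclusive on both ends."""
    low, high = mass_range
    return low <= mz <= high


print(within_tolerance(760.5890, 760.5851, 10, "ppm"))  # True, about 5.1 ppm
print(within_tolerance(760.5890, 760.5851, 5, "ppm"))   # False
print(in_mass_range(760.5851, (700, 1000)))             # True
```
<br />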
<br />
== Emulating (Multiple) Precursor Ion Scan / Neutral Loss Scan with MFQL ==<br />
<br />
In the <tt>IDENTIFY</tt> section, precursor ion scans (PIS) and neutral loss <br />
scans (NLS) can be defined. If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part. When a variable gets <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
<br />
(Comment: The above feature should not be mistaken for the LipidXplorer functionality to import PIS and NLS mass spectrometric acquisitions.)<br />
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, it is possible to add more constraints to the query. For example, the identification of PE plasmalogen requires the marking of 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see the example above), and a test of whether those two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR". Such a constraint is formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by Boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic mathematical operators +, -, *, /. Parentheses can also be used. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values for the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by writing the attribute after the variable name, connected with a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are piped to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general the <tt>REPORT</tt> <br />
consists of a list of variables where each represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column with the name <tt>MASS</tt> and the m/z values of <tt>PR</tt>'s <br />
identified species as content:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often those fragments can be the same (for example for two fatty acid scans); therefore LipidXplorer has a special function, <tt>sumIntensity()</tt>, which does not sum the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
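<br />
The difference between plain addition and a sum that counts identical fragments once can be sketched in Python. Here each marked fragment is a hypothetical (m/z, intensity) pair, and fragments are assumed to be identical when their m/z values are equal; how LipidXplorer decides that two fragments are the same is not described on this page, and the placeholder rules for isotope-corrected intensities are left out.<br />
<br />
```python
def naive_sum(*fragments):
    """frag1.intensity + frag2.intensity: every marked fragment is added."""
    return sum(intensity for _, intensity in fragments)


def sum_distinct(*fragments):
    """Add each distinct fragment peak only once (assumed identity: equal m/z)."""
    unique = {}
    for mz, intensity in fragments:
        unique.setdefault(mz, intensity)
    return sum(unique.values())


# Both fatty acid scans mark the same 18:1 fragment (hypothetical values).
frag1 = (281.25, 1000.0)
frag2 = (281.25, 1000.0)
print(naive_sum(frag1, frag2))     # 2000.0
print(sum_distinct(frag1, frag2))  # 1000.0
```
<br />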
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;))+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;"),)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for decimal values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders according to <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids (<tt>fa1PC, fa2PC</tt>) and the second with the sum of their <br />
double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is borrowed from Python: MFQL evaluates it with Python's <br />
standard string formatting operator <tt>%</tt> <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
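Because this part of MFQL is plain Python string formatting, its behaviour can be checked in a Python shell. The minimal sketch below uses hypothetical values as stand-ins for <tt>fa1PC.chemsc[C]</tt>, <tt>fa1PC.chemsc[db]</tt> and <tt>prPC.errppm</tt>. In plain Python the right-hand side is a tuple, whereas in MFQL it is written as a quoted string that LipidXplorer evaluates.

```python
# Hypothetical stand-ins for fa1PC.chemsc[C], fa1PC.chemsc[db], fa2PC..., prPC.errppm
fa1_c, fa1_db = 18, 1
fa2_c, fa2_db = 18, 1
err_ppm = 1.2345

# MFQL: "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"
lipid_name = "PC [%d:%d]" % (fa1_c + fa2_c, fa1_db + fa2_db)

# MFQL: "%2.2fppm" % "(prPC.errppm)"
error_text = "%2.2fppm" % err_ppm

print(lipid_name)  # PC [36:2]
print(error_text)  # 1.23ppm
```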
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, the intensity is set to -1, which serves as a placeholder.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the marked mass from the spectrum. The error is available in three units: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a single element of the sum composition, write the element in square brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms of a formula is obtained with: <pre>PR.chemsc[C]</pre> The number of double bonds is addressed in the same way with <tt>[db]</tt>, e.g. <tt>PR.chemsc[db]</tt>.<br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, this is the same as <tt>chemsc</tt>; if it is a neutral loss, <tt>frsc</tt> returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, this is the same as <tt>chemsc</tt>; if it is a fragment, <tt>nlsc</tt> returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
The intensities of a mass in all samples in which it occurs. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If it is used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, which could be all blank samples. By naming the samples according to their group, this feature makes it possible to define sample groups. Many different constraints can then be stated, which increase the accuracy of the interpretation or even already interpret the result, e.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be smaller than one percent of the average intensity of all experimental samples ("*exp*"). It simply discards every "lipid" that is obviously noise.<br />
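The list semantics of <tt>intensity</tt> can be modelled in a few lines of Python. This is a sketch under stated assumptions, not LipidXplorer's implementation: the per-sample intensities are held in a hypothetical dictionary keyed by sample name, and the MFQL wildcards <tt>*</tt> and <tt>?</tt> are emulated with Python's shell-style <tt>fnmatch</tt> matching.

```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical intensities of one peak, keyed by sample name.
intensity = {
    "blank_01": 120.0,
    "blank_02": 80.0,
    "exp_A_01": 25000.0,
    "exp_A_02": 31000.0,
}


def select(values, pattern):
    """Model of PR.intensity["<pattern>"]: keep samples whose name matches."""
    return [v for name, v in values.items() if fnmatchcase(name, pattern)]


def greater_than_all(values, threshold):
    """Model of PR.intensity > threshold: true only if every entry exceeds it."""
    return all(v > threshold for v in values)


# Model of: avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
not_noise = mean(select(intensity, "*blank*")) < mean(select(intensity, "*exp*")) / 100

print(not_noise)                                            # True
print(greater_than_all(intensity.values(), 10000))          # False (blanks are low)
print(greater_than_all(select(intensity, "*exp*"), 10000))  # True
```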
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: the number of samples in which the peak occurs divided by the total number of samples.<br />
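A minimal sketch of the occupation, and of a threshold check in the spirit of the <tt>MINOCC</tt> option of <tt>IDENTIFY</tt>. It assumes that a sample contains the peak whenever its intensity is non-zero; the function names and values are hypothetical.

```python
def occupation(intensities):
    """Fraction of samples in which the peak occurs.

    ASSUMPTION: a sample contains the peak when its intensity is non-zero
    (missing peaks are reported as 0, see the Notes above).
    """
    if not intensities:
        raise ValueError("no samples given")
    found = sum(1 for value in intensities if value != 0)
    return found / len(intensities)


def passes_minocc(intensities, minocc):
    """Hypothetical check in the spirit of MINOCC: occupation >= threshold."""
    return occupation(intensities) >= minocc


per_sample = [15000.0, 0.0, 9800.0, 12000.0]  # hypothetical, 4 samples
print(occupation(per_sample))          # 0.75
print(passes_minocc(per_sample, 0.8))  # False
```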
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v over all samples. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. Instead, it enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to that of v.<br />
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up intensities of different MS2 entries where multiple peaks are required for identification and quantification. <br />
For fragment intensities that were replaced by the -1 placeholder after isotopic correction (see the Notes above), the following rules are implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to -1), the values are simply added, which results in <math>-n_i</math>, where <math>n_i</math> is the number of fragments passed to sumIntensity(). <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the example below we assume that two entries in the MS2 were used for the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences when interpreting such results:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but set to -1; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = <math>-n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
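The placeholder rules can be written as a short Python function. It is a sketch reconstructed from the table above and is applied to the fragment intensities of one sample at a time (that per-sample application is an assumption); <tt>sum_intensity</tt> is a hypothetical name, not LipidXplorer's internal function.

```python
def sum_intensity(values):
    """Sketch of the sumIntensity() rules for the fragment intensities of ONE sample.

    values: intensities of the fragments passed to sumIntensity(), where -1 is
    the placeholder for an intensity that isotopic correction set to <= 0.
    """
    if any(v > 0 for v in values):
        # At least one fragment above zero: -1 placeholders count as zero.
        return sum(v for v in values if v > 0)
    # No positive entry: the values are simply added (all placeholders -> -n_i).
    return sum(values)


# The rows of the table: (F1, F2) -> sumIntensity(F1, F2)
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity([f1, f2]))
```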
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
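The second <tt>SUCHTHAT</tt> example can be re-expressed as plain arithmetic. This Python sketch models the two conditions with hypothetical peak values; the function names are illustrative only, and the per-sample comparison follows the rule for intensity lists given in the list of peak attributes.

```python
H1 = 1.00782503207  # monoisotopic mass of one hydrogen atom in Da


def masses_add_up(frag1_mz, frag2_mz, precursor_mz, tolerance_da=0.5):
    """Model of: FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da"""
    return abs((frag1_mz + frag2_mz - H1) - precursor_mz) <= tolerance_da


def intensity_ratio_ok(frag1_intensities, frag2_intensities):
    """Model of: FRAG1.intensity * 3 < FRAG2.intensity * 10 (FRAG2 > 0.3 * FRAG1).

    As for any intensity list, the comparison must hold in every sample.
    """
    return all(i1 * 3 < i2 * 10 for i1, i2 in zip(frag1_intensities, frag2_intensities))


# Hypothetical peaks, chosen only to illustrate the arithmetic.
print(masses_add_up(300.0, 400.0, 698.99))                   # True
print(intensity_ratio_ok([1000.0, 2000.0], [400.0, 500.0]))  # False
```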
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
The principle of a LipidXplorer run is as follows: all queries run one after another on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the greatest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether the mass fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into the mass's MS/MS spectrum (if provided) and does the same for the fragments (definition and <tt>IDENTIFY</tt> section);<br />
* it checks the Boolean constraints (<tt>SUCHTHAT</tt> section); if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
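The control flow described above can be sketched in Python as follows. All types and callables here (<tt>MSEntry</tt>, <tt>Query</tt>, <tt>run</tt>) are hypothetical stand-ins for the compiled sections of a query, not LipidXplorer's actual classes; the example query looks for the PC head group fragment at m/z 184.07.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MSEntry:
    """One MS1 mass of a (hypothetical) MasterScan with its MS/MS peaks."""
    mass: float
    fragments: list[float] = field(default_factory=list)


@dataclass
class Query:
    """Hypothetical stand-in for a compiled MFQL query."""
    name: str
    precursor_ok: Callable[[float], bool]        # definition + IDENTIFY (MS1)
    fragments_ok: Callable[[list[float]], bool]  # definition + IDENTIFY (MS2)
    suchthat: Callable[[MSEntry], bool]          # Boolean constraints
    report: Callable[[MSEntry], dict]            # REPORT columns


def run(queries: list[Query], masterscan: list[MSEntry]) -> dict[str, list[dict]]:
    results: dict[str, list[dict]] = {}
    for query in queries:  # queries run one after another
        rows = []
        for entry in sorted(masterscan, key=lambda e: e.mass):  # smallest m/z first
            if not query.precursor_ok(entry.mass):
                continue
            if not query.fragments_ok(entry.fragments):  # look into MS/MS spectrum
                continue
            if query.suchthat(entry):
                rows.append(query.report(entry))  # accepted: send to REPORT
        results[query.name] = rows
    return results


# Example: precursors between m/z 700 and 900 with a fragment near m/z 184.07
pc = Query(
    name="Phosphatidylcholine",
    precursor_ok=lambda m: 700 <= m <= 900,
    fragments_ok=lambda frags: any(abs(f - 184.0733) <= 0.05 for f in frags),
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mass},
)
scan = [MSEntry(760.585, [184.073]), MSEntry(718.538, [577.519]), MSEntry(758.570, [184.07])]
print(run([pc], scan))  # {'Phosphatidylcholine': [{'MASS': 758.57}, {'MASS': 760.585}]}
```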
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. To <br />
screen properly, the masses should be highly accurate, because otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. Users should choose meaningful <br />
names to keep their queries understandable.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. Since the head group and the glycerol<br />
backbone of PC together contain 8 carbon atoms, this means that the two fatty acids<br />
of the lipid must be either both even-numbered or both odd-numbered. In this way,<br />
we can sort out lipids that we know should not be present in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
fragment of PC, which is found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses which cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example of a complete script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipid's name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=449LipidXplorer MFQL2011-01-21T11:27:37Z<p>Schwudke: /* Stating (Multiple) Precursor Ion Scan / Neutral Loss Scan */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, however <br />
accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing a MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head group having <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for phosphatidylcholine (PC) lipids could read: <br />
* Find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra. <br />
* If both conditions hold, a phosphatidylcholine has been identified and the lipid species can be reported. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of its statements, we now walk through the example script for the identification of PC lipid species.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid cannot combine one fatty acid <br />
moiety with an odd and one with an even number of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6).<br />
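The NAME arithmetic can be checked by hand for PC 34:1, whose [M+H]+ ion has the sum composition C42 H83 N1 O8 P1. In the Python sketch below, sum compositions are <tt>Counter</tt> objects, and the double bond value uses the ring/double-bond formula C + 1 - H/2 + (N + P)/2. That formula is an assumption, chosen because it reproduces the <tt>- 1.5</tt> offset used in the query; LipidXplorer's internal definition of <tt>chemsc[db]</tt> may differ.

```python
from collections import Counter


def rdb(sc):
    """Double bond equivalents. ASSUMPTION: C + 1 - H/2 + (N + P)/2;
    LipidXplorer's own definition of chemsc[db] may differ."""
    return sc["C"] + 1 - sc["H"] / 2 + (sc["N"] + sc["P"]) / 2


# [M+H]+ of PC 34:1 and the phosphorylcholine head group fragment
pr_pc = Counter({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1})
head_pc = Counter({"C": 5, "H": 15, "N": 1, "O": 4, "P": 1})

# (prPC.chemsc - headPC.chemsc)[C] - 3: Counter subtraction keeps positive counts
fa_carbons = (pr_pc - head_pc)["C"] - 3
# prPC.chemsc[db] - 1.5
fa_double_bonds = rdb(pr_pc) - 1.5

name = "PC [%d:%d]" % (fa_carbons, fa_double_bonds)
print(name)  # PC [34:1]
```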
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; this is used for writing comments in the code.<br />
# Every statement (e.g. <tt>QUERYNAME</tt> and each <tt>DEFINE</tt>) has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (<tt>SUCHTHAT</tt> is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the chapter on the '''REPORT''' section.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format which is called sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids: an sc-constraint <br />
describes a collection of chemical sum compositions. It is used for several purposes, <br />
especially for screening tasks and for emulating multiple precursor ion or neutral loss scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the range of the possible number of double bonds (double bond equivalents). <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
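To make the idea of an sc-constraint as a collection of sum compositions concrete, the Python sketch below enumerates the C/H combinations of the PC precursor constraint used elsewhere on this page and keeps those whose double bond equivalent lies within <tt>DBR</tt>. The double bond formula C + 1 - H/2 + (N + P)/2 and the inclusive reading of <tt>[a..b]</tt> are assumptions, and charge handling is not modelled; the function names are hypothetical.

```python
from itertools import product


def rdb(c, h, n, p):
    """ASSUMPTION: double bond equivalents as C + 1 - H/2 + (N + P)/2."""
    return c + 1 - h / 2 + (n + p) / 2


def expand_sc_constraint(c_range, h_range, n, o, p, dbr):
    """Yield the sum compositions of an sc-constraint such as
    'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5)."""
    low, high = dbr
    carbons = range(c_range[0], c_range[1] + 1)  # [a..b] assumed inclusive
    hydrogens = range(h_range[0], h_range[1] + 1)
    for c, h in product(carbons, hydrogens):
        if low <= rdb(c, h, n, p) <= high:
            yield {"C": c, "H": h, "N": n, "O": o, "P": p}


compositions = list(expand_sc_constraint((30, 48), (30, 200), 1, 8, 1, (1.5, 7.5)))
print(len(compositions))  # 247
print({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1} in compositions)  # True
```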
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by <br />
an equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. The content can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is calculated<br />
always between the precursor mass and the fragment, never between two<br />
fragments.<br />
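As a worked example of the precursor-minus-fragment rule, the Python sketch below computes the monoisotopic mass of the PE head group loss 'C2 H8 O4 N1 P1' (about 141.019 Da) from standard monoisotopic atomic masses and tests a hypothetical precursor/fragment pair with the 0.5 Da tolerance used for <tt>peHead</tt> elsewhere on this page. The function names and m/z values are illustrative, not LipidXplorer internals.

```python
MONOISOTOPIC_MASS = {  # monoisotopic atomic masses in Da
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}


def sc_mass(sc):
    """Monoisotopic mass of a neutral sum composition given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * n for element, n in sc.items())


def is_neutral_loss(precursor_mz, fragment_mz, loss_mass, tolerance_da=0.5):
    """Neutral loss = precursor - fragment (never fragment - fragment)."""
    return abs((precursor_mz - fragment_mz) - loss_mass) <= tolerance_da


pe_head_loss = sc_mass({"C": 2, "H": 8, "O": 4, "N": 1, "P": 1})
print(round(pe_head_loss, 4))                           # 141.0191
print(is_neutral_loss(718.538, 577.519, pe_head_loss))  # True
```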
<br />
====examples====<br />
Define the sc-constraint for PC-O and the PC-O head group fragment belonging to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the sc-constraint for PE and the PE head group, which is lost as a neutral loss from the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE-Plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query. I.e. they are not valid in other queries of the same Run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification in the Boolean expression is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks the existence of certain masses/fragment masses. The scope (MS level) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify the MS level in which to look for the sum composition ('MS1+' means positive mode MS, while 'MS2-' means negative mode MS/MS). Options can be specified after the optional keyword 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple restricting the m/z range of interest. <br />
# 'MINOCC' is a number between 0 and 1 which states the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass has to occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers masses only from m/z 700 to m/z 1000.<br />
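The three tolerance units can be converted into an absolute window in Dalton, as sketched below in Python. The ppm conversion is standard; for 'res' the sketch assumes the usual definition R = m/Δm, which the text above does not spell out. The function names are hypothetical.

```python
def tolerance_in_da(mz, value, unit):
    """Convert a TOLERANCE value into an absolute window in Da (sketch)."""
    if unit == "da":
        return value
    if unit == "ppm":
        return mz * value / 1e6
    if unit == "res":
        # ASSUMPTION: resolution R = m / delta_m, hence delta_m = m / R
        return mz / value
    raise ValueError("unknown tolerance unit: %s" % unit)


def within_tolerance(observed_mz, theoretical_mz, value, unit):
    return abs(observed_mz - theoretical_mz) <= tolerance_in_da(theoretical_mz, value, unit)


def in_massrange(mz, massrange):
    """Model of MASSRANGE = (low, high)."""
    low, high = massrange
    return low <= mz <= high


print(tolerance_in_da(788.5, 10, "ppm"))               # about 0.0079 Da
print(within_tolerance(184.07, 184.0733, 250, "ppm"))  # True
print(in_massrange(650.0, (700, 1000)))                # False
```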
<br />
== Emulating (Multiple) Precursor Ion Scan / Neutral Loss Scan with MFQL ==<br />
<br />
In the <tt>IDENTIFY</tt> section, precursor ion scans (PIS) and neutral loss <br />
scans (NLS) can be emulated. If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part. When a variable gets <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
<br />
(Comment: The above feature should not be mistaken for the LipidXplorer functionality to import PIS and NLS mass spectrometric acquisitions.)<br />
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogen requires marking 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see the example above), and a test whether these two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR" holds (see [[#Some examples]]). Such constraints are formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built up with the basic arithmetic operators +, -, *, /; parentheses can also be used. Terms are compared as equations with '==' and as inequalities with '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to attributes, SUCHTHAT supports functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> <br />
section consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both fragments can be the same peak (for example, with two fatty acid scans), so LipidXplorer provides a special function that does not add up the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;"),)*<br />
</pre> <br />
<br />
The string formatting works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the contents of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first decimal value is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second decimal value the sum of <br />
the double bonds. The output could be for example <tt>"PC [36:2]"</tt>.<br />
<br />
The format string is borrowed from Python: MFQL passes it to Python's standard <br />
string formatting, i.e. it works like Python's <tt>%</tt> operator <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
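Because REPORT strings follow Python's %-formatting, their behaviour can be checked directly in Python. The second MFQL string corresponds to the tuple on the right of <tt>%</tt>. The fatty acid numbers and the error value below are made up.

```python
# Made-up values standing in for fa1PC.chemsc[C], fa1PC.chemsc[db], ...
fa1_c, fa1_db = 18, 1
fa2_c, fa2_db = 18, 1

# "PC [%d:%d]" % "(... , ...)" in MFQL corresponds to:
name = "PC [%d:%d]" % (fa1_c + fa2_c, fa1_db + fa2_db)
print(name)     # PC [36:2]

# "%2.2fppm" % "(prPC.errppm)" with a made-up error of 1.2345 ppm
error = "%2.2fppm" % (1.2345,)
print(error)    # 1.23ppm
```

Note that <tt>%d</tt> truncates floating point values, so any half-integer double bond value has to be reduced to a whole number inside the expression before it is formatted.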
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error is available in three units: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
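A hedged sketch of how the three error values relate to each other. The sign convention (measured minus theoretical) and expressing <tt>errres</tt> as m/|Δm| are assumptions, and the function name is hypothetical.

```python
def peak_errors(theoretical_mz: float, measured_mz: float) -> dict:
    """Hypothetical computation of errda, errppm and errres for one peak."""
    err_da = measured_mz - theoretical_mz          # assumed sign convention
    err_ppm = err_da / theoretical_mz * 1e6
    # assumed: the error expressed as a resolution value m / |delta m|
    err_res = theoretical_mz / abs(err_da) if err_da else float("inf")
    return {"errda": err_da, "errppm": err_ppm, "errres": err_res}


e = peak_errors(184.0733, 184.0751)
print(round(e["errda"], 4), round(e["errppm"], 1))   # 0.0018 9.8
```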
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms in a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>, if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>, if it is a fragment, it returns the sum composition of the neutral loss of the precursor.<br />
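Element access and the <tt>frsc</tt>/<tt>nlsc</tt> relationship amount to simple sum composition arithmetic. The sketch below uses a hypothetical parser built on Python's <tt>collections.Counter</tt> and takes the PE 34:1 [M+H]+ precursor 'C39 H77 N1 O8 P1' with the PE head group loss 'C2 H8 N1 O4 P1' as an example; for such a neutral loss peak, <tt>nlsc</tt> is the loss itself and <tt>frsc</tt> is precursor minus loss.

```python
import re
from collections import Counter


def sc(formula: str) -> Counter:
    """Parse a sum composition such as 'C39 H77 N1 O8 P1' into element counts."""
    return Counter({el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)})


def fmt(comp: Counter) -> str:
    """Format element counts back into a sum composition string."""
    return " ".join(f"{el}{n}" for el, n in comp.items() if n > 0)


precursor = sc("C39 H77 N1 O8 P1")   # PE 34:1 [M+H]+
loss = sc("C2 H8 N1 O4 P1")          # PE head group neutral loss

print(precursor["C"])            # like PR.chemsc[C]: 39
print(fmt(loss))                 # nlsc of the loss peak: C2 H8 N1 O4 P1
print(fmt(precursor - loss))     # frsc of the loss peak: C37 H69 O4
```

<tt>Counter</tt> subtraction drops elements whose count falls to zero, which is why N and P disappear from the fragment composition.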
====intensity====<br />
All intensities of a mass in the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, which could be all blank samples. This makes it possible to define sample groups simply by naming the samples according to their group. In this way many different constraints can be stated, which increase the accuracy of the interpretation or even interpret the result directly. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be less than one percent of the average intensity of all experimental samples ("*exp*"). It simply discards every "lipid" that is obviously noise.<br />
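Sample groups selected with wildcards behave much like Python's <tt>fnmatch</tt> patterns. The sketch below uses made-up sample names and intensities and assumes case-sensitive matching; it mirrors the blank filter above and the all-samples meaning of a comparison.

```python
from fnmatch import fnmatchcase
from statistics import mean

# Made-up intensities of one peak, keyed by sample name
intensity = {
    "blank_01": 120.0, "blank_02": 80.0,
    "exp_01": 25000.0, "exp_02": 31000.0, "exp_03": 19000.0,
}


def select(values: dict, pattern: str) -> list:
    """Like PR.intensity["pattern"]: keep the samples whose name matches."""
    return [v for name, v in values.items() if fnmatchcase(name, pattern)]


# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
print(mean(select(intensity, "*blank*")) < mean(select(intensity, "*exp*")) / 100)  # True

# PR.intensity > 10000 only holds if it holds for every sample
print(all(v > 10000 for v in intensity.values()))             # False
print(all(v > 10000 for v in select(intensity, "*exp*")))     # True
```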
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
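A minimal sketch of occupation and a MINOCC-style filter. Counting a sample as occupied when its intensity is positive, and using a greater-or-equal comparison for the threshold, are assumptions for illustration; the intensities are made up.

```python
def occupation(intensities) -> float:
    """Fraction of samples in which the peak was found (intensity > 0 assumed)."""
    found = sum(1 for v in intensities if v > 0)
    return found / len(intensities)


def passes_minocc(intensities, minocc: float) -> bool:
    """Hypothetical MINOCC check: occupation must reach the threshold."""
    return occupation(intensities) >= minocc


peak = [15000.0, 0.0, 12000.0, 9000.0]   # found in 3 of 4 samples
print(occupation(peak))                  # 0.75
print(passes_minocc(peak, 0.5))          # True
print(passes_minocc(peak, 0.8))          # False
```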
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities with respect to the standard given in v, i.e. every intensity is calculated relative to v.<br />
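Python equivalents of these functions, as a sketch. The names are hypothetical, and <tt>standardize</tt> only shows the idea behind <tt>isStandard</tt> as a per-sample ratio; LipidXplorer's actual calculation (for example, scaling by the amount of spiked standard) may differ. The intensities are made up.

```python
from statistics import mean


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_odd(n: int) -> bool:
    return n % 2 == 1


def standardize(intensities, standard):
    """Idea behind isStandard: each intensity relative to the standard's
    intensity in the same sample (a simple ratio is assumed here)."""
    return [v / s for v, s in zip(intensities, standard)]


pc_intens = [20000.0, 30000.0]     # one lipid in two samples
std_intens = [10000.0, 15000.0]    # the standard in the same two samples
print(is_even(42), is_odd(42))                # True False
print(mean(pc_intens))                        # 25000.0
print(standardize(pc_intens, std_intens))     # [2.0, 2.0]
```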
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() sums up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by the isotopic-correction placeholder '-1' (see Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), these values are simply added, resulting in <math>n_i\times(-1) = -n_i</math>, where <math>n_i</math> is the number of fragments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the following example we assume that two MS2 entries are used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences when such results are interpreted:<br />
<br />
A) intensity = 0: none of the required fragments was present in this sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but were set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
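The per-sample rules above can be written as a short Python function. The combination rule reproduces the table; treating two arguments with the same m/z as the same fragment (and counting it once) is an assumption about how identical fragments are recognised, and the m/z values are made up.

```python
def combine(values):
    """Combine one sample's fragment intensities (-1 = isotopic placeholder)."""
    positive = [v for v in values if v > 0]
    if positive:
        # at least one real signal: placeholders and zeros add nothing
        return sum(positive)
    # no positive entry: values are simply added, e.g. -1 + -1 = -2
    return sum(values)


def sum_intensity(*fragments):
    """fragments: (mz, [intensity per sample]) pairs. A fragment passed more
    than once (same m/z, an assumption of this sketch) is counted only once."""
    unique = {}
    for mz, intensities in fragments:
        unique.setdefault(mz, intensities)
    return [combine(list(sample)) for sample in zip(*unique.values())]


# One sample per row of the table above
f1 = (281.25, [-1, 0, 1, 2, 2])
f2 = (283.26, [-1, -1, -1, -1, 0])
print(sum_intensity(f1, f2))    # [-2, -1, 1, 2, 2]
print(sum_intensity(f1, f1))    # [-1, 0, 1, 2, 2], the same fragment counted once
```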
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
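The second example can be re-checked in Python. For simplicity the sketch compares sum compositions exactly instead of masses within 0.5 Da. The compositions are made up so that they lie within the element ranges of the PE-plasmalogen definitions, and the intensities for two samples are made up as well.

```python
import re
from collections import Counter


def sc(formula: str) -> Counter:
    """Parse a sum composition into element counts (hypothetical helper)."""
    return Counter({el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)})


PR = sc("C39 H77 N1 O7 P1")
FRAG1 = sc("C19 H37 O3")
FRAG2 = sc("C20 H41 N1 O4 P1")

# FRAG1 + FRAG2 - 'H1' == PR, checked on the sum compositions
print(FRAG1 + FRAG2 - sc("H1") == PR)    # True

# FRAG1.intensity * 3 < FRAG2.intensity * 10 has to hold in every sample
frag1_int = [9000.0, 12000.0]
frag2_int = [4000.0, 5000.0]
print(all(a * 3 < b * 10 for a, b in zip(frag1_int, frag2_int)))   # True
```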
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the greatest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass<br />
* it checks whether the mass fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section).<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section). <br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section<br />
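The run can be pictured as the following simplified Python loop for a single query; each query of a run would repeat it. All names and the data model are hypothetical, the MS/MS step assumes that the query requires a fragment, and the MasterScan entries are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    mz: float                                  # precursor m/z from MS
    intensity: list                            # one value per sample
    ms2: list = field(default_factory=list)    # fragment m/z values


def run_query(masterscan, precursor_ok, fragment_ok, suchthat, report):
    rows = []
    for entry in sorted(masterscan, key=lambda e: e.mz):   # smallest to greatest
        if not precursor_ok(entry.mz):                     # definition + IDENTIFY (MS)
            continue
        if not any(fragment_ok(f) for f in entry.ms2):     # definition + IDENTIFY (MS/MS)
            continue
        if suchthat(entry):                                # SUCHTHAT
            rows.append(report(entry))                     # REPORT
    return rows


scan = [
    Entry(786.6007, [5e4, 6e4], [184.0733, 603.53]),
    Entry(760.5851, [8e4, 7e4], [184.0733]),
    Entry(718.5381, [3e4, 2e4], [577.52]),
]
rows = run_query(
    scan,
    precursor_ok=lambda mz: 700 <= mz <= 900,
    fragment_ok=lambda mz: abs(mz - 184.0733) <= 0.01,
    suchthat=lambda e: min(e.intensity) > 1e4,
    report=lambda e: {"MASS": e.mz, "INTENS": e.intensity},
)
print([r["MASS"] for r in rows])   # [760.5851, 786.6007]
```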
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. Screening <br />
requires highly accurate masses, because otherwise<br />
the risk of misidentification is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. The user should choose meaningful <br />
names to keep the query understandable.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. Since the head group and the glycerol<br />
backbone contribute 8 carbons, this means that the two fatty acids of the lipid have to be<br />
either both even-numbered or both odd-numbered. This way, we can filter out lipids that we<br />
know should not occur in the organism under study. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC's head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the file name from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
fragment of PC, found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally have the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=448LipidXplorer MFQL2011-01-21T11:23:11Z<p>Schwudke: /* Part 2: The IDENTIFY section */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing a MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, molecular cations of PC species produce a specific <br />
phosphorylcholine fragment of their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for a Phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses, which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* look if there is the "C5 H15 O4 P1 N1" fragment (or m/z 184.07) in its MS/MS spectrum. <br />
* if those two conditions hold, we have identified a Phosphatidylcholine and can report the lipid species <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content is determined by REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
For a better illustration of the structure of MFQL and the meaning of the different command lines, we explain the example script for the identification of PC lipid species in the following.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on precursor <br />
masses: besides the sum composition requirements, it requires that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the operator <tt>isEven</tt> requests that candidate PC <br />
precursors should contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid could not comprise fatty acid <br />
moieties with odd and even number of carbon atoms at the same time.<br />
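The parity argument can be checked exhaustively with a few lines of Python. The chain length range of 12 to 24 carbons is arbitrary, and the function name is hypothetical.

```python
HEAD_AND_GLYCEROL_C = 5 + 3   # phosphorylcholine head group + glycerol backbone


def passes_is_even(fa1_c: int, fa2_c: int) -> bool:
    """isEven(prPC.chemsc[C]) for a PC with acyl chains of fa1_c and fa2_c carbons."""
    return (HEAD_AND_GLYCEROL_C + fa1_c + fa2_c) % 2 == 0


# The precursor is even exactly when both chains share the same parity
for fa1 in range(12, 25):
    for fa2 in range(12, 25):
        assert passes_is_even(fa1, fa2) == (fa1 % 2 == fa2 % 2)

print(passes_is_even(16, 18), passes_is_even(16, 17), passes_is_even(17, 17))
# True False True
```

The last case shows the limitation of this filter: a PC with two odd-numbered chains (e.g. 17:0/17:0) still passes <tt>isEven</tt>.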
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
four precursor attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; and <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6).<br />
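The same arithmetic can be replayed in Python for PC 34:1 ([M+H]+ 'C42 H83 N1 O8 P1'). The double bond equivalent formula 1 + C - H/2 + N/2 + P/2 is an assumption about how <tt>chemsc[db]</tt> is computed; with it, the offset of 1.5 in the NAME line maps PC 34:0 to 0 and PC 34:1 to 1 double bond. The helper names are hypothetical.

```python
import re
from collections import Counter


def sc(formula: str) -> Counter:
    """Parse a sum composition into element counts (hypothetical helper)."""
    return Counter({el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)})


def db(comp: Counter) -> float:
    # Assumed double bond equivalent with P counted as trivalent;
    # LipidXplorer's internal convention may differ.
    return 1 + comp["C"] - comp["H"] / 2 + comp["N"] / 2 + comp["P"] / 2


pr_pc = sc("C42 H83 N1 O8 P1")     # PC 34:1 [M+H]+
head_pc = sc("C5 H15 N1 O4 P1")    # phosphorylcholine fragment

carbons = (pr_pc - head_pc)["C"] - 3    # minus the glycerol backbone
double_bonds = db(pr_pc) - 1.5
print("PC [%d:%d]" % (carbons, double_bonds))   # PC [34:1]
```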
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; this is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections, since <tt>SUCHTHAT</tt> is optional:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More details on '''REPORT''' <br />
are given in the '''REPORT''' chapter.<br />
<br />
==SC-constrains==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called sum composition constraint (sc-constraint). <br />
An sc-constraint describes a collection of chemical sum compositions and <br />
can thus specify a class of lipids. It is used for several functions, <br />
especially for screening tasks or multiple scans. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of the double bond equivalent. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
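To see what an sc-constraint stands for, the following Python sketch expands one into its individual sum compositions. The double bond equivalent formula (1 + C - H/2 + N/2 + P/2) is an assumption, the function names are hypothetical, and the sketch does not model any further filtering LipidXplorer might apply.

```python
from itertools import product


def double_bond_equivalent(c, h, n, p):
    # Assumed convention: 1 + C - H/2 + N/2 + P/2 (P counted as trivalent)
    return 1 + c - h / 2 + n / 2 + p / 2


def expand(c_range, h_range, n, o, p, dbr):
    """Yield the sum compositions of an sc-constraint such as
    'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5, 7.5)."""
    lo, hi = dbr
    for c, h in product(range(c_range[0], c_range[1] + 1),
                        range(h_range[0], h_range[1] + 1)):
        if lo <= double_bond_equivalent(c, h, n, p) <= hi:
            yield f"C{c} H{h} N{n} O{o} P{p}"


pc = list(expand((30, 48), (30, 200), 1, 8, 1, (1.5, 7.5)))
print(len(pc))                       # 247
print("C42 H83 N1 O8 P1" in pc)      # True, PC 34:1 [M+H]+
print("C42 H60 N1 O8 P1" in pc)      # False, too unsaturated
```

The DBR filter is what keeps the expansion small: without it, the same element ranges would give 19 x 171 = 3249 compositions.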
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum composition, sc-constrains and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by <br />
an equals sign and its content. The content can be a chemical sum composition, <br />
an sc-constraint (i.e. a set of sum compositions) or a mass. Sum compositions and <br />
sc-constraints are written in single quotes. They may be followed by <br />
<tt>WITH</tt> and one or more options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint: a 2-tuple with the minimum and maximum number of double bond equivalents allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment is a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
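<br />
The arithmetic behind a neutral loss can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LipidXplorer code: it uses standard monoisotopic element masses, assumes a singly charged precursor, and the helper names are hypothetical. For the PE head group 'C2 H8 O4 N1 P1', which the examples below define as a neutral loss, the expected fragment lies about 141.019 below the precursor.<br />
<br />
```python
# Monoisotopic element masses (u) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def mz(comp, charge=0):
    """Hypothetical helper: m/z of a sum composition (charge=0 -> neutral mass)."""
    mass = sum(MONO[el] * n for el, n in comp.items())
    if charge == 0:
        return mass
    # a positive ion has lost electrons, a negative ion has gained them
    return (mass - charge * ELECTRON) / abs(charge)


def expected_fragment_mz(precursor_mz, neutral_loss):
    """Fragment m/z left after a singly charged precursor loses a neutral.

    The loss is always taken from the precursor, never between two fragments.
    """
    return precursor_mz - mz(neutral_loss)


pe_head = {"C": 2, "H": 8, "O": 4, "N": 1, "P": 1}    # peHead AS NEUTRALLOSS
pe_34_1 = {"C": 39, "H": 77, "N": 1, "O": 8, "P": 1}  # PE 34:1 [M+H]+
precursor = mz(pe_34_1, charge=1)
print(round(precursor, 4), round(expected_fragment_mz(precursor, pe_head), 4))
```
<br />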
<br />
====Examples====<br />
Define the sc-constraint for PC-O precursors and the PC head group fragment, which is linked to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the sc-constraint for PE precursors and PE's head group, which is lost from the <br />
precursor as a neutral loss:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE-Plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
Any number of variables can be defined, but they are valid only within the <br />
current query, i.e. they cannot be used in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The variables defined above are queried against the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a singleton or a set, i.e. more than one mass can be identified for a variable; this is typical for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+ | MS1- | MS2+ | MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain masses or fragment masses. The scope (MS level) is stated after 'IN':<br />
the tags 'MS1+', 'MS1-', 'MS2+' and 'MS2-' specify the MS level and polarity in which to look for the sum composition ('MS1+' means positive mode MS, 'MS2-' means negative mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass is identified. It can be given in: <br />
## 'ppm' - parts per million,<br />
## 'da' - Dalton, or<br />
## 'res' - resolution.<br />
# 'MASSRANGE' is a 2-tuple restricting the m/z range of interest. <br />
# 'MINOCC' is a float between 0 and 1 stating the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm is written as "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
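<br />
The three tolerance units can be read as the half-width of a matching window. The Python sketch below shows one plausible interpretation, which is an assumption and not taken from the LipidXplorer source: ppm scales with the mass, Da is absolute, and a resolution value R gives a window of m/R. <tt>MASSRANGE</tt> is applied as a simple inclusive filter. The function names are hypothetical.<br />
<br />
```python
# Hypothetical helpers: one plausible reading of TOLERANCE and MASSRANGE,
# not taken from the LipidXplorer source.
def tolerance_da(mz, value, unit):
    """Half-width in Dalton of the matching window around mz."""
    unit = unit.lower()
    if unit == "ppm":      # parts per million of the mass
        return mz * value / 1e6
    if unit == "da":       # absolute window in Dalton
        return value
    if unit == "res":      # resolution R = m / delta m  ->  delta m = m / R
        return mz / value
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def matches(measured, theoretical, value, unit, massrange=None):
    """True if measured lies in the tolerance window (and in MASSRANGE, if given)."""
    if massrange is not None and not massrange[0] <= measured <= massrange[1]:
        return False
    return abs(measured - theoretical) <= tolerance_da(theoretical, value, unit)


print(tolerance_da(760.5851, 5, "ppm"))        # about 0.0038 Da
print(matches(760.5870, 760.5851, 5, "ppm"))   # 1.9 mDa off: inside the window
print(matches(760.5900, 760.5851, 5, "ppm"))   # 4.9 mDa off: outside the window
```
<br />
A fixed ppm tolerance tightens the absolute window at low m/z, which is one reason the examples use a wider tolerance for low-resolution MS/MS spectra.<br />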
<br />
== Stating (Multiple) Precursor Ion Scan / Neutral Loss Scan ==<br />
<br />
The <tt>IDENTIFY</tt> part emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable has <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss; otherwise it is treated as a (fragment) mass.<br />
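<br />
The difference between an emulated PIS and NLS can be sketched as follows. The data layout (a dict from precursor m/z to the fragment m/z values of its MS/MS spectrum), the function names and the absolute tolerance are hypothetical simplifications of a MasterScan, chosen only to show the logic: a PIS looks for a fixed fragment m/z, an NLS looks for the precursor m/z minus the loss, and an sc-constraint turns either into one scan per sum composition.<br />
<br />
```python
# Hypothetical sketch of PIS/NLS emulation on a toy "MasterScan".
def precursor_ion_scan(spectra, fragment_mz, tol_da):
    """Precursors whose MS/MS spectrum contains fragment_mz (within tol_da)."""
    return [prec for prec, frags in spectra.items()
            if any(abs(f - fragment_mz) <= tol_da for f in frags)]


def neutral_loss_scan(spectra, loss_mass, tol_da):
    """Precursors whose MS/MS spectrum contains (precursor m/z - loss_mass)."""
    return [prec for prec, frags in spectra.items()
            if any(abs(f - (prec - loss_mass)) <= tol_da for f in frags)]


def multiple_pis(spectra, fragment_mzs, tol_da):
    """One PIS per fragment mass, as an sc-constraint would produce."""
    return {m: precursor_ion_scan(spectra, m, tol_da) for m in fragment_mzs}


# Invented toy data: precursor m/z -> fragment m/z values.
spectra = {
    760.5851: [184.0733, 577.52],   # PC-like: head group fragment 184.07
    718.5381: [577.5190, 184.5],    # PE-like: loss of 141.019
    790.0: [100.0],
}
print(precursor_ion_scan(spectra, 184.0733, 0.05))   # PIS for m/z 184.07
print(neutral_loss_scan(spectra, 141.0191, 0.05))    # NLS for 141.02
```
<br />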
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
Once the specific masses have been collected, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see example above), and a test of whether the two fragments together (minus one hydrogen) match the precursor mass 'PR'. Such constraints are formulated in the optional 'SUCHTHAT' section as boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built up with the basic arithmetic operators +, -, *, / and with parentheses. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by appending the attribute name to the variable name with a dot. For example, the intensity of the peak 'PR' is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> section <br />
consists of a list of variables, each of which represents a column; the content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both fragments can be the same peak (for example, in two fatty acid scans), so LipidXplorer provides a special function that does not add up the intensity of the same fragment twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;),)*<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this feature is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;",)*<br />
</pre> <br />
<br />
String formatting works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for decimal (integer) values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the contents of the placeholders in <br />
order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first decimal placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
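<br />
Because the format string follows Python's %-formatting, the name in this example can be reproduced directly in Python. The fatty acid compositions below are hypothetical stand-ins for <tt>fa1PC.chemsc</tt> and <tt>fa2PC.chemsc</tt>. Note that in MFQL the second part is written as a quoted string, which LipidXplorer presumably fills in from the identified peaks, whereas plain Python takes a tuple.<br />
<br />
```python
# Hypothetical compositions standing in for fa1PC.chemsc and fa2PC.chemsc.
fa1 = {"C": 18, "db": 1}
fa2 = {"C": 18, "db": 1}

name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
error = "%2.2fppm" % 1.234          # two decimals, as in the ERROR columns
db_count = "%d" % (2.5 - 1.5)       # %d converts floats to integers

print(name, error, db_count)        # PC [36:2] 1.23ppm 1
```
<br />
Keep in mind that <tt>%d</tt> truncates rather than rounds. This matters when a term such as <tt>chemsc[db] - 1.5</tt> yields a non-integer value.<br />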
<br />
The format string syntax is borrowed from Python: MFQL uses Python's standard <br />
string formatting with the <tt>%</tt> operator <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1' (a placeholder).<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error is available in three units: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a specific element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, to get the number of <tt>C</tt> atoms from a formula: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass from all the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, e.g. all blank samples. This makes it possible to define sample groups simply by naming the samples according to their group. In this way many different constraints can be stated that increase the accuracy of the interpretation or even already interpret the result. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be less than one percent of the average intensity in all experimental samples ("*exp*"). It simply removes every "lipid" that is obviously noise.<br />
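<br />
The wildcard selection and the blank filter above can be mimicked in Python with the standard <tt>fnmatch</tt> module. This is a sketch assuming that sample names and intensities are available as a dict. The sample names and values are invented for illustration, and LipidXplorer's own pattern matching may differ in details such as case sensitivity.<br />
<br />
```python
from fnmatch import fnmatchcase
from statistics import mean


# Hypothetical helpers mimicking PR.intensity["pattern"] and the blank filter.
def select(intensities, pattern):
    """Values of the samples whose names match a wildcard pattern."""
    return [v for name, v in intensities.items() if fnmatchcase(name, pattern)]


def passes_blank_filter(intensities):
    """avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100"""
    return mean(select(intensities, "*blank*")) < mean(select(intensities, "*exp*")) / 100


# Invented intensities of one peak in three samples.
pr = {"blank_01": 50.0, "exp_01": 10000.0, "exp_02": 12000.0}
noisy = {"blank_01": 500.0, "exp_01": 10000.0, "exp_02": 12000.0}

print(passes_blank_filter(pr), passes_blank_filter(noisy))
# A list comparison such as PR.intensity > 10000 must hold for every sample:
print(all(v > 10000 for v in pr.values()))
```
<br />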
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occ = number of samples in which the peak occurs / total number of samples.<br />
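<br />
Here is a minimal sketch of the occupation and of the <tt>MINOCC</tt> check from the <tt>IDENTIFY</tt> options. It assumes that a sample in which the peak was not found carries intensity 0 (see Notes) and counts any non-zero value, including a '-1' placeholder, as found. Both the function names and this counting rule are assumptions.<br />
<br />
```python
# Hypothetical helpers; the counting rule (non-zero = found) is an assumption.
def occupation(intensities):
    """occ = samples in which the peak was found / total number of samples."""
    found = sum(1 for v in intensities.values() if v != 0)
    return found / len(intensities)


def meets_minocc(intensities, minocc):
    """True if the peak's occupation reaches the MINOCC threshold (0..1)."""
    return occupation(intensities) >= minocc


peak = {"s1": 120.0, "s2": 0.0, "s3": 80.0, "s4": -1}
print(occupation(peak))   # found in s1, s3 and s4
```
<br />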
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even, e.g. <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v, e.g. <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special because it does not return anything. It enables the automatic calculation of standardized intensities with respect to the standard given in v, i.e. every intensity is calculated relative to that of v.<br />
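<br />
What "relative to v" could mean per sample is sketched below. The sketch assumes a simple per-sample ratio to the standard's intensity and leaves samples without a positive standard intensity undefined (<tt>None</tt>). The exact scaling LipidXplorer applies is not described here, so treat this as an illustration only; the function name is hypothetical.<br />
<br />
```python
# Hypothetical sketch of standardization; the real scaling may differ.
def standardize(intensities, standard):
    """Express each sample's intensity relative to the standard in that sample.

    intensities, standard -- dicts: sample name -> intensity
    Samples where the standard is missing or not positive map to None.
    """
    result = {}
    for sample, value in intensities.items():
        ref = standard.get(sample, 0)
        result[sample] = value / ref if ref > 0 else None
    return result


lipid = {"s1": 500.0, "s2": 300.0, "s3": 40.0}
std = {"s1": 1000.0, "s2": 600.0, "s3": 0.0}
print(standardize(lipid, std))
```
<br />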
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() sums up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities are isotope-correction placeholders (see Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i\times -1</math>, where <math>n_i</math> is the number of fragments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences when interpreting such results:<br />
<br />
A) intensity = 0: in this specific sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
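<br />
For a single sample, the rules above can be written as a short Python function. It is a sketch that reproduces the table, not the LipidXplorer implementation. Fragments are passed as hypothetical (key, intensity) pairs, so that the same peak matched by two variables, as in two identical fatty acid scans, is counted only once.<br />
<br />
```python
# Hypothetical per-sample sketch of the sumIntensity() rules.
def sum_intensity(*fragments):
    """Sum fragment intensities for one sample.

    Each fragment is a (key, intensity) pair; -1 is the isotope-correction
    placeholder. A key given twice (the same peak matched by two variables)
    is counted once.
    """
    values = list(dict(fragments).values())   # drop duplicate keys
    if any(v > 0 for v in values):
        # at least one real signal: placeholders (and zeros) add nothing
        return sum(v for v in values if v > 0)
    # no positive signal: zeros and placeholders are simply added
    return sum(values)


# The rows of the table: (F1, F2) -> sumIntensity(F1, F2)
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(("F1", f1), ("F2", f2)))
```
<br />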
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
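<br />
Evaluated by hand, the second <tt>SUCHTHAT</tt> example amounts to two checks. The sketch below works on plain m/z values and per-sample intensity lists (all numbers invented) and uses the monoisotopic mass of one hydrogen atom for 'H1'. It applies the intensity comparison to every sample, as described for list-valued intensities. It is an illustration with hypothetical function names, not the MFQL interpreter.<br />
<br />
```python
# Hypothetical re-implementation of the two SUCHTHAT conditions above.
H_MASS = 1.00782503207  # monoisotopic mass of one hydrogen atom (u)


def fragments_add_up(frag1_mz, frag2_mz, pr_mz, tol_da=0.5):
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da"""
    return abs(frag1_mz + frag2_mz - H_MASS - pr_mz) <= tol_da


def intensity_ratio_ok(frag1_int, frag2_int):
    """FRAG1.intensity * 3 < FRAG2.intensity * 10, required in every sample."""
    return all(f1 * 3 < f2 * 10 for f1, f2 in zip(frag1_int, frag2_int))


print(fragments_add_up(300.0, 450.0, 748.99))       # within 0.5 Da
print(intensity_ratio_ok([100, 200], [40, 70]))     # holds in both samples
```
<br />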
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
A LipidXplorer run works as follows: all queries run one after another on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from smallest to greatest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section);<br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
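<br />
The control flow of a run can be summarised in a short Python sketch. The <tt>Query</tt> class, the MasterScan layout (precursor m/z mapped to fragment m/z values) and the predicate-style checks are hypothetical simplifications. The real definition and <tt>IDENTIFY</tt> steps match sum compositions rather than arbitrary functions.<br />
<br />
```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Query:
    """Hypothetical stand-in for one compiled MFQL query."""
    name: str
    identify_ms1: Callable[[float], bool]           # does the MS mass fit?
    identify_ms2: Callable[[List[float]], bool]     # does its MS/MS spectrum fit?
    suchthat: Callable[[float, List[float]], bool] = lambda mz, frags: True
    report: Callable[[float], Dict] = lambda mz: {"MASS": mz}


def run(queries, masterscan):
    """masterscan: precursor m/z -> list of fragment m/z values."""
    results = {}
    for query in queries:                    # queries run one after another
        rows = []
        for mz in sorted(masterscan):        # smallest to greatest MS mass
            frags = masterscan[mz]
            if (query.identify_ms1(mz) and query.identify_ms2(frags)
                    and query.suchthat(mz, frags)):
                rows.append(query.report(mz))   # accepted: send to REPORT
        results[query.name] = rows
    return results


pc = Query("PC",
           identify_ms1=lambda mz: 700 <= mz <= 1000,
           identify_ms2=lambda frags: any(abs(f - 184.0733) <= 0.05 for f in frags))
masterscan = {760.5851: [184.0733], 718.5381: [577.519], 700.0: [184.07]}
print(run([pc], masterscan))
```
<br />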
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. To screen <br />
properly, the masses must be highly accurate; otherwise <br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Every query <br />
must be given a name. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names <br />
make a query easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an even total number of carbon atoms. Since the head group and glycerol<br />
backbone of PC together contain 8 carbons, both fatty acids of the lipid must then be<br />
even-numbered or both odd-numbered. This way we can sort out lipids that we know<br />
should not occur in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting those belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|Output screenshot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names, followed by the <br />
name of the query; then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the PC-specific head group <br />
fragment found in the MS/MS spectra of PC <br />
species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example of a complete script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=447LipidXplorer MFQL2011-01-21T11:21:50Z<p>Schwudke: /* (Multiple) Precursor Ion Scan / Neutral Loss Scan */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, however <br />
accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing a MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with identifying probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for Phosphatidylcholine lipids (PC) could be: <br />
* Find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* look for the "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra. <br />
* If both conditions hold, we have identified a Phosphatidylcholine and can report the lipid species. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we now explain the example script for the identification of PC lipid species.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment, so we define: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on precursor <br />
masses: besides the sum composition requirements, it requires that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requires that <tt>headPC</tt> <br />
is searched for only in the MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional, project-specific <br />
compositional constraints formulated in the SUCHTHAT section that follows. For example, <br />
it is generally assumed that mammals do not produce fatty acids with an odd <br />
number of carbon atoms. Therefore, if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety, this identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a recognized lipid cannot combine fatty acid <br />
moieties with odd and even numbers of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name; <br />
<tt>MASS</tt> - the precursor mass; <tt>CHEMSC</tt> - the chemical sum composition; <br />
<tt>ERROR</tt> - the difference from the calculated mass; <tt>INTENS</tt> - the intensities <br />
of the precursor ions in each individual acquisition; and <tt>FRAGINTENS</tt> - the <br />
intensities of the <tt>headPC</tt> fragment ions. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or to apply certain <br />
functions, such as text formatting, to these attributes. A format <br />
expression consists of two strings separated by <tt>%</tt>, where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> in the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from that of the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone; the number of double bonds <br />
is obtained by subtracting 1.5 from the double bond equivalent of <tt>prPC</tt> (Figures 5 and 6).<br />
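<br />
The following Python sketch reproduces the arithmetic of the NAME expression for one ion, the [M+H]+ ion of PC 34:1 (C42H83NO8P+). Sum compositions are modelled as plain dictionaries. The double bond equivalent formula C - H/2 + N/2 + P/2 + 1 is an assumption: it was chosen because, with it, the offset of 1.5 maps this ion to one fatty acid double bond, but LipidXplorer's internal calculation may differ. Because MFQL formatting follows Python's <tt>%</tt> operator, <tt>%d</tt> turns the float 1.0 into 1.<br />
<br />
```python
# Illustration only: the arithmetic of
#   NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)"
# The DBE formula below (P counted like N) is an assumed convention.
from collections import Counter


def subtract(a: dict, b: dict) -> Counter:
    """Element-wise difference of two sum compositions (like chemsc - chemsc)."""
    result = Counter(a)
    result.subtract(b)
    return result


def dbe(sc: dict) -> float:
    """Double bond equivalent: C - H/2 + N/2 + P/2 + 1 (assumption)."""
    return (sc.get("C", 0) - sc.get("H", 0) / 2
            + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1)


# [M+H]+ of PC 34:1 (C42H83NO8P+) and the phosphorylcholine head group fragment.
prPC = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}
headPC = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}

fa_carbons = subtract(prPC, headPC)["C"] - 3  # minus the 3 glycerol carbons
fa_double_bonds = dbe(prPC) - 1.5             # 2.5 - 1.5 = 1.0

name = "PC [%d:%d]" % (fa_carbons, fa_double_bonds)
print(name)  # PC [34:1]
```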
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; this is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (SUCHTHAT is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#SC-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output.<br><br />
<br />
After '''REPORT''' follows a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...), each of which represents a column <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section below.<br />
<br />
==SC-constraints==<br />
<br />
To handle sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint describes a whole collection of chemical sum compositions, <br />
so it can be used to specify an entire class of lipids. It is used for several purposes, <br />
especially for screening tasks and multiple scans. Each element is followed either by a count <br />
(e.g. <tt>C5</tt> or <tt>N[1]</tt>) or by a range (e.g. <tt>C[38..54]</tt>). Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of the number of double bonds (double bond equivalents). <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
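<br />
To make the idea of a collection of sum compositions concrete, here is a hedged Python sketch that expands the example sc-constraint into explicit compositions and keeps those whose double bond equivalent falls inside DBR. The parser, the helper names and the DBE formula (C - H/2 + N/2 + P/2 + 1) are assumptions for illustration. The sketch applies no further chemical checks (for example on hydrogen parity), which LipidXplorer may or may not perform.<br />
<br />
```python
# Illustration only: expanding an sc-constraint into explicit sum compositions.
import re
from itertools import product

# element, then either [n] / [min..max] or a plain count such as C5
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\[(\d+)(?:\.\.(\d+))?\]|(\d+))")


def parse_sc_constraint(text: str) -> dict:
    """Map each element to an inclusive (min, max) count range."""
    ranges = {}
    for element, lo, hi, plain in TOKEN.findall(text):
        if plain:                                   # 'C5'
            ranges[element] = (int(plain), int(plain))
        else:                                       # 'N[1]' or 'C[38..54]'
            ranges[element] = (int(lo), int(hi) if hi else int(lo))
    return ranges


def dbe(sc: dict) -> float:
    """Assumed convention: C - H/2 + N/2 + P/2 + 1."""
    return (sc.get("C", 0) - sc.get("H", 0) / 2
            + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1)


def expand(text: str, dbr: tuple) -> list:
    """All sum compositions of the constraint whose DBE lies within dbr."""
    ranges = parse_sc_constraint(text)
    elements = list(ranges)
    hits = []
    for counts in product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        sc = dict(zip(elements, counts))
        if dbr[0] <= dbe(sc) <= dbr[1]:
            hits.append(sc)
    return hits


candidates = expand("C[38..54] H[30..130] O[10] N[1] P[1]", (2.5, 9.5))
print(len(candidates))  # 255
# The [M-H]- ion of PS 36:1 (C42H79NO10P-) is one of them:
print({"C": 42, "H": 79, "O": 10, "N": 1, "P": 1} in candidates)  # True
```

Under these assumptions every carbon count from 38 to 54 admits 15 hydrogen counts, 255 compositions in total, which is why a single sc-constraint can stand for a whole lipid class.<br />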
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is:<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. They can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
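<br />
As a sketch of this rule, the following Python code derives the fragment expected after the PE head group neutral loss ('C2 H8 O4 N1 P1') from a precursor. The precursor is the [M+H]+ ion of PE 34:1 (C39H77NO8P+), chosen for illustration. Helper names are hypothetical, and the electron mass is ignored because it cancels in the precursor minus fragment difference.<br />
<br />
```python
# Illustration only: a neutral loss is matched as precursor minus fragment.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}  # monoisotopic masses (Da)


def mass(sc: dict) -> float:
    """Monoisotopic mass of a sum composition (electron mass ignored)."""
    return sum(MONO[el] * n for el, n in sc.items())


def fragment_after_loss(precursor: dict, loss: dict) -> dict:
    """Sum composition left after removing a neutral loss from the precursor."""
    remaining = {el: n - loss.get(el, 0) for el, n in precursor.items()}
    if any(n < 0 for n in remaining.values()) or set(loss) - set(precursor):
        raise ValueError("loss is not contained in the precursor")
    return {el: n for el, n in remaining.items() if n}


pr = {"C": 39, "H": 77, "N": 1, "O": 8, "P": 1}    # [M+H]+ of PE 34:1
peHead = {"C": 2, "H": 8, "N": 1, "O": 4, "P": 1}  # 'C2 H8 O4 N1 P1' AS NEUTRALLOSS

frag = fragment_after_loss(pr, peHead)
print(frag)                             # {'C': 37, 'H': 69, 'O': 4}
print(round(mass(pr) - mass(frag), 4))  # 141.0191, the neutral loss mass
```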
<br />
====Examples====<br />
Define the sc-constraint for PC-O (ether PC) precursors and the PC head group fragment <br />
expected in their MS/MS spectra:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the sc-constraint for PE precursors and the PE head group, which is lost from the <br />
precursor as a neutral loss:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define the sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid within the <br />
current query, i.e. they cannot be used in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are searched for in the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if the whole boolean expression is true, i.e. if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+ | MS1- | MS2+ | MS2-) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain masses or fragment masses. The scope (MS level) is stated after 'IN':<br />
The tags 'MS1+', 'MS1-', 'MS2+' and 'MS2-' specify the MS level in which to look for the sum composition ('MS1+' means positive-mode MS, while 'MS2-' means negative-mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass range of interest. <br />
# 'MINOCC' is a floating point number between 0 and 1 stating the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass has to be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm is written as "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
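<br />
The following Python sketch shows one possible reading of these options. The <tt>ppm</tt> and <tt>da</tt> checks follow the usual definitions; interpreting <tt>res</tt> as a window of m/z divided by the resolution value is an assumption, as are the function names and the measured m/z value.<br />
<br />
```python
# Illustration only: one possible reading of the IDENTIFY options.
# The 'res' branch (window = theoretical m/z / resolution) is an assumption.


def within_tolerance(measured: float, theoretical: float, value: float, unit: str) -> bool:
    """Check a measured m/z against a theoretical one for TOLERANCE = <value><unit>."""
    delta = abs(measured - theoretical)
    if unit == "ppm":
        return delta / theoretical * 1e6 <= value
    if unit == "da":
        return delta <= value
    if unit == "res":
        return delta <= theoretical / value
    raise ValueError(f"unknown tolerance unit: {unit}")


def in_mass_range(mz: float, mass_range: tuple) -> bool:
    """MASSRANGE = (low, high); both ends treated as inclusive here."""
    low, high = mass_range
    return low <= mz <= high


def occupation(intensities: list) -> float:
    """Fraction of samples in which the peak was found (intensity > 0)."""
    return sum(1 for i in intensities if i > 0) / len(intensities)


# Theoretical m/z of PC 34:1 [M+H]+ is 760.5851; the measured value is hypothetical.
measured, theoretical = 760.5880, 760.5851
print(within_tolerance(measured, theoretical, 5, "ppm"))  # True (about 3.8 ppm)
print(within_tolerance(measured, theoretical, 2, "ppm"))  # False
print(in_mass_range(measured, (700, 1000)))               # True
print(occupation([1.2e5, 0.0, 9.8e4, 1.1e5]) >= 0.5)      # True, MINOCC = 0.5 met
```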
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both contain several possibilities since they are sc-constraints (see example above), and a test of whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR" holds (corrected by one hydrogen atom). Such a constraint is formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(NOT)? &lt;constraint&gt; ((AND | OR) (NOT)? &lt;constraint&gt;)* (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
<br />
# where &lt;constraint&gt; is an &lt;equation&gt;, an &lt;inequality&gt; or a &lt;function&gt;<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
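<br />
A hedged Python sketch of how the PE plasmalogen test described above (written in MFQL as FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da) could be evaluated together with an intensity condition: the sum compositions are combined, compared with PR within 0.5 Da, and the intensity comparison must hold in every sample. The compositions and intensities are illustrative choices, not measured values, and the electron mass is ignored.<br />
<br />
```python
# Illustration only: evaluating
#   FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
#   FRAG1.intensity * 3 < FRAG2.intensity * 10
# on hypothetical sum compositions and per-sample intensities.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}  # monoisotopic masses (Da)


def mass(sc: dict) -> float:
    """Monoisotopic mass of a sum composition (electron mass ignored)."""
    return sum(MONO[el] * n for el, n in sc.items())


def combine(*terms) -> dict:
    """Sum signed sum compositions, e.g. combine((1, a), (-1, b)) for a - b."""
    total = {}
    for sign, sc in terms:
        for el, n in sc.items():
            total[el] = total.get(el, 0) + sign * n
    return {el: n for el, n in total.items() if n != 0}


PR = {"C": 39, "H": 77, "N": 1, "O": 7, "P": 1}     # hypothetical PE plasmalogen [M+H]+
FRAG1 = {"C": 21, "H": 39, "O": 3}                  # hypothetical FRAG1 match
FRAG2 = {"C": 18, "H": 39, "N": 1, "O": 4, "P": 1}  # hypothetical FRAG2 match
H1 = {"H": 1}

lhs = combine((1, FRAG1), (1, FRAG2), (-1, H1))
mass_ok = abs(mass(lhs) - mass(PR)) <= 0.5

frag1_int = [4.0e4, 5.5e4, 3.8e4]  # FRAG1.intensity, one value per sample
frag2_int = [2.1e4, 1.9e4, 2.4e4]  # FRAG2.intensity
intensity_ok = all(f1 * 3 < f2 * 10 for f1, f2 in zip(frag1_int, frag2_int))

print(lhs == PR, mass_ok, intensity_ok)  # True True True
```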
<br />
====Functions====<br />
<br />
In addition to attributes, SUCHTHAT supports the use of functions. A list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> section <br />
consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of the species <br />
identified by <tt>PR</tt>:<br />
<pre><br />
REPORT<br />
MASS = PR.mass;<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity;<br />
</pre><br />
<br />
Often both fragments can be the same peak (for example, in two fatty acid scans when both fatty acid moieties are identical), so the same intensity would be added twice. Therefore LipidXplorer has a special function which does not sum the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2);<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this feature is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for decimal (integer) values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])";<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first decimal placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string is borrowed from Python: MFQL uses Python's standard <br />
string formatting operator, i.e. the format string is evaluated as in Python <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
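<br />
Since MFQL format strings follow Python's printf-style <tt>%</tt> operator, the placeholders can be tried out directly in Python. In MFQL the argument list after <tt>%</tt> is itself written as a quoted string; in plain Python it is a tuple. The values below are hypothetical.<br />
<br />
```python
# Illustration only: MFQL format strings behave like Python's '%' operator.
fa1 = {"C": 18, "db": 0}  # hypothetical values for fa1PC (an 18:0 moiety)
fa2 = {"C": 18, "db": 2}  # hypothetical values for fa2PC (an 18:2 moiety)

lipid_name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
error = "%2.2fppm" % 1.2345                            # two decimals
label = "%s in %d of %d samples" % (lipid_name, 3, 4)  # %s inserts a string

print(lipid_name)  # PC [36:2]
print(error)       # 1.23ppm
print(label)       # PC [36:2] in 3 of 4 samples
```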
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms in a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>, if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>, if it is a fragment, it returns the sum composition of the neutral loss of the precursor.<br />
====intensity====<br />
All the intensities of a mass from all the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for each sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing the name of a sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, which could be all blank samples. This feature allows sample groups to be defined by naming the samples according to their group. In this way many different constraints can be stated, which increase the accuracy of the interpretation or even already interpret the result. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that one percent of the average intensity of all experimental samples ("*exp*") must be greater than the average intensity found in the blank samples. This removes every "lipid" that is obviously noise.<br />
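<br />
The following Python sketch mimics the sample-group selection and the blank filter above. It assumes that the wildcards behave like shell-style patterns (as in Python's <tt>fnmatch</tt> module); sample names and intensities are hypothetical.<br />
<br />
```python
# Illustration only: sample-group selection with wildcards and the blank filter
#   avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical per-sample intensities of one peak ('PR').
pr_intensity = {
    "blank_01": 150.0,
    "blank_02": 250.0,
    "exp_liver_01": 52000.0,
    "exp_liver_02": 48000.0,
}


def select(intensities: dict, pattern: str) -> list:
    """Intensities of the samples whose names match a '*'/'?' wildcard pattern."""
    return [v for name, v in intensities.items() if fnmatchcase(name, pattern)]


blank_avg = mean(select(pr_intensity, "*blank*"))  # 200.0
exp_avg = mean(select(pr_intensity, "*exp*"))      # 50000.0
keep = blank_avg < exp_avg / 100                   # 200.0 < 500.0
print(keep)  # True

# A comparison on the whole list must hold for every sample:
print(all(v > 10000 for v in pr_intensity.values()))  # False, the blanks are low
```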
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average value of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to that of v.<br />
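<br />
A minimal reading of "relative to v" is a per-sample ratio between each intensity and the intensity of the standard. The Python sketch below assumes exactly that; whether LipidXplorer additionally scales by the amount of spiked standard is not stated here, and the sample names and values are hypothetical.<br />
<br />
```python
# Illustration only: "every intensity is calculated relative to v", read as a
# per-sample ratio to the intensity of the standard. All values are hypothetical.
standard = {"sample_01": 20000.0, "sample_02": 25000.0}  # intensities of v
species = {"sample_01": 50000.0, "sample_02": 50000.0}   # intensities of a lipid

relative = {name: species[name] / standard[name] for name in species}
print(relative)  # {'sample_01': 2.5, 'sample_02': 2.0}
```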
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by the isotopic correction placeholder '-1' (see Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), these values are simply added, resulting in <math>n_i \times (-1) = -n_i</math>, where <math>n_i</math> is the number of fragments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the following example we assume that two MS2 entries were used for the sumIntensity() function:<br />
<br />
<math>(F_1, F_2) \rightarrow \mathrm{sumIntensity}(F_1, F_2)</math><br />
<math>(-1, -1) \rightarrow -2</math><br />
<math>(0, -1) \rightarrow -1</math><br />
<math>(1, -1) \rightarrow 1</math><br />
<math>(2, -1) \rightarrow 2</math><br />
<math>(2, 0) \rightarrow 2</math><br />
<br />
This has the following consequences for interpreting such results:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but were set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
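<br />
The rules above can be condensed into a short Python sketch that evaluates one sample. Fragments are passed as (identifier, intensity) pairs so that the same fragment matched twice is counted only once, as described for the REPORT section; the identifiers and values are hypothetical, and this is not LipidXplorer's implementation.<br />
<br />
```python
# Illustration only: per-sample behaviour of sumIntensity() as described above.
# -1 is the placeholder for an intensity that isotopic correction set to <= 0.


def sum_intensity(*fragments):
    """fragments: (identifier, intensity) pairs for one sample."""
    unique = {}
    for frag_id, intensity in fragments:
        unique.setdefault(frag_id, intensity)  # the same fragment is counted once
    values = list(unique.values())
    if any(v > 0 for v in values):
        # at least one real signal: -1 placeholders are treated as zero
        return sum(v for v in values if v > 0)
    # no real signal: zeros and -1 placeholders are simply added
    return sum(values)


table = [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]
results = [sum_intensity(("F1", f1), ("F2", f2)) for f1, f2 in table]
print(results)  # [-2, -1, 1, 2, 2], matching the table above

# Two fatty acid scans hitting the same 18:1 fragment are not double-counted.
print(sum_intensity(("FA 18:1", 5000.0), ("FA 18:1", 5000.0)))  # 5000.0
```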
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
The principle of a LipidXplorer Run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section); <br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
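<br />
The loop described above can be sketched in Python on toy data. Queries are modelled as dictionaries of hypothetical predicate functions standing in for the definition, IDENTIFY and SUCHTHAT logic; real queries, the MasterScan format and the report output are far richer.<br />
<br />
```python
# Illustration only: the order in which a Run evaluates queries (toy data).
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One MS mass of a toy MasterScan with the m/z values of its MS/MS fragments."""
    mz: float
    ms2: list = field(default_factory=list)


def near(target: float, tol: float = 0.01):
    """Hypothetical predicate factory: matches an m/z within +/- tol."""
    return lambda mz: abs(mz - target) <= tol


def run(master_scan: list, queries: list) -> list:
    report = []
    for query in queries:                                      # queries run one after another
        for entry in sorted(master_scan, key=lambda e: e.mz):  # smallest m/z first
            if not query["precursor"](entry.mz):               # definition + IDENTIFY (MS)
                continue
            if not all(any(f(mz) for mz in entry.ms2) for f in query["fragments"]):
                continue                                       # definition + IDENTIFY (MS/MS)
            if not query["suchthat"](entry):                   # SUCHTHAT
                continue
            report.append((query["name"], entry.mz))           # accepted -> REPORT
    return report


pc_query = {
    "name": "Phosphatidylcholine",
    "precursor": lambda mz: 700 <= mz <= 900,  # stand-in for the prPC sc-constraint
    "fragments": [near(184.07)],               # phosphorylcholine head group
    "suchthat": lambda entry: True,            # no extra constraint in this toy query
}

scan = [Entry(786.60, [184.07, 603.53]), Entry(760.59, [184.07]), Entry(744.55, [603.53])]
print(run(scan, [pc_query]))
# [('Phosphatidylcholine', 760.59), ('Phosphatidylcholine', 786.6)]
```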
<br />
<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. To do <br />
screening properly, the masses should be highly accurate, because otherwise<br />
the rate of false identifications is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. Users should choose meaningful <br />
names so that their queries are easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acid moieties of the lipid have to be either both even-numbered or<br />
both odd-numbered. In this way we can sort out lipids that we know should<br />
not occur in the organism under study. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and reducing it by the carbons/double bonds attributed to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This screenshot of a spreadsheet program shows the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
of PC that is found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be present in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE plasmalogens<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass;<br />
<br />
# second is the lipid's name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)";<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc;<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity;<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=446LipidXplorer MFQL2011-01-21T11:19:44Z<p>Schwudke: /* How does LipidXplorer run multiple queries */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. The most generic constraints ("All lipids of the PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, molecular cations of PC species produce specific phosphorylcholine <br />
fragments of their head group having the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 <br />
(see [[#MFQL identification of phosphatidylcholines (PC)]]). The identification of PC species <br />
starts with the identification of probable precursors in the MS spectrum using accurately <br />
determined masses and proceeds with identifying the phosphorylcholine head group fragment <br />
in the MS/MS spectra.<br />
<br />
A query for a phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether there is the "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra. <br />
* If both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor at m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within a <br />
1 Da window, and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by the MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species; details are <br />
provided in the text. '''C:''' Screenshot of the output spreadsheet file. <br />
Column annotation and content are determined by the REPORT section of the <br />
MFQL query in B; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of its individual statements, we now walk through the example query for identifying PC species.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used to identify the species. <br />
The query should recognize the singly charged PC head group <br />
fragment, which is defined as: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
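The following Python sketch shows how a sum composition with a charge maps to an m/z value. It assumes standard monoisotopic element masses and subtracts one electron mass per positive charge. The helper names <tt>parse_sc</tt> and <tt>sc_mz</tt> are hypothetical and not part of LipidXplorer. For <tt>CHG = 0</tt> the sketch simply returns the neutral mass.<br />

```python
import re

# Assumed monoisotopic element masses (Da) and electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def parse_sc(text):
    """Parse a fixed sum composition such as 'C5 H15 O4 N1 P1' into {element: count}."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", text)}


def sc_mz(text, chg):
    """m/z of the sum composition with charge chg; neutral mass if chg == 0."""
    mass = sum(MONO[el] * n for el, n in parse_sc(text).items())
    if chg == 0:
        return mass
    return (mass - chg * ELECTRON) / abs(chg)


print(round(sc_mz("C5 H15 O4 N1 P1", +1), 4))  # 184.0733
```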
<br />
In a shotgun experiment, not all fragmented peaks originate from PC. <br />
For higher search specificity we therefore also define the precursors (<tt>prPC</tt>) that are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on the precursor <br />
masses. Besides the sum composition requirements, it requires that the precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a given range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
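The Python sketch below computes a double bond equivalent to show what the <tt>DBR = (1.5, 7.5)</tt> range means for the ions covered by <tt>prPC</tt>. The formula C - H/2 + N/2 + P/2 + 1 (N and P counted as trivalent) is an assumption. It was chosen because it reproduces the 1.5 offset used for PC names in the REPORT section. Whether LipidXplorer applies further filters is not covered here.<br />

```python
def dbe(c, h, n=0, p=0):
    """Assumed double bond equivalent: C - H/2 + N/2 + P/2 + 1."""
    return c - h / 2 + n / 2 + p / 2 + 1


def in_dbr(c, h, dbr=(1.5, 7.5)):
    """Check a C/H combination of 'C[..] H[..] N[1] O[8] P[1]' against a DBR range."""
    lo, hi = dbr
    return lo <= dbe(c, h, n=1, p=1) <= hi


print(dbe(42, 85, n=1, p=1))  # 1.5  (C42 H85 N1 O8 P1, no fatty acyl double bonds)
print(dbe(42, 83, n=1, p=1))  # 2.5  (C42 H83 N1 O8 P1, one fatty acyl double bond)
print(in_dbr(42, 71))         # False: DBE 8.5 lies outside (1.5, 7.5)
```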
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operator AND requires that <tt>headPC</tt> <br />
is only searched for in the MS/MS spectra of the matched <tt>prPC</tt> precursors:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional, project-specific <br />
compositional constraints formulated in the subsequent SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids with an odd <br />
number of carbon atoms. Therefore, if a recognized lipid would have to <br />
comprise an odd-numbered fatty acid moiety, this identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
Here the function <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. The head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively (8 in total), so the fatty acid moieties must then contain an even number of carbons in sum. <br />
Hence a lipid cannot comprise one odd-numbered and one even-numbered fatty acid moiety at the same time.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections, LipidXplorer <br />
recognizes spectra pertinent to PC species. The last section, REPORT, <br />
defines how these findings are reported. This includes the annotation <br />
of the recognized lipid species, the abundances of characteristic <br />
ions for subsequent quantification, and any additional <br />
information pertinent to the analysis, such as masses and mass differences <br />
(errors). LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name; <br />
<tt>MASS</tt> - species mass; <tt>CHEMSC</tt> - chemical sum composition; <br />
<tt>ERROR</tt> - difference to the calculated mass; <tt>INTENS</tt> - intensities of the <br />
precursor ion reported for each individual acquisition; and <tt>FRAGINTENS</tt> - intensities <br />
of the <tt>headPC</tt> fragment for each acquisition. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or apply certain <br />
functions, such as text formatting, to these attributes. A format <br />
expression consists of two strings separated by <tt>%</tt>. The <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the annotation convention remains at the user's discretion. <br />
In this example the two <tt>%d</tt> placeholders of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6). The number of double bonds <br />
is obtained by subtracting 1.5 from the double bond equivalent of the precursor (<tt>prPC.chemsc[db]</tt>).<br />
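The following Python sketch reproduces the arithmetic behind the <tt>NAME</tt> column for a single hypothetical precursor, C42 H83 N1 O8 P1 with charge +1. The dictionaries stand in for <tt>prPC.chemsc</tt> and <tt>headPC.chemsc</tt>. The sketch assumes the double bond equivalent is C - H/2 + N/2 + P/2 + 1, which is an illustration and not LipidXplorer's documented formula.<br />

```python
# Hypothetical stand-ins for prPC.chemsc and headPC.chemsc
pr_pc = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}
head_pc = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}


def db(sc):
    """Assumed double bond equivalent, playing the role of chemsc[db]."""
    return sc["C"] - sc["H"] / 2 + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1


# (prPC.chemsc - headPC.chemsc)[C] - 3: remove head group and glycerol carbons
fa_carbons = (pr_pc["C"] - head_pc["C"]) - 3
# prPC.chemsc[db] - 1.5: remove the offset of a fully saturated PC ion
fa_double_bonds = db(pr_pc) - 1.5

name = "PC [%d:%d]" % (fa_carbons, fa_double_bonds)
print(name)  # PC [34:1]
```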
<br />
==General rules in MFQL queries==<br />
<br />
# Everything after <tt>#</tt> on a line is ignored by the interpreter; use it to write comments.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (the SUCHTHAT section is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
The '''REPORT''' keyword is followed by a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...), each representing a column <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More details on '''REPORT''' <br />
are given in the '''REPORT''' section below.<br />
<br />
==SC-constraints==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint describes a collection of chemical sum compositions and can <br />
therefore specify a whole class of lipids. It is used in several places, <br />
especially for screening and for emulating multiple precursor ion or neutral loss scans. <br />
Each element is followed by its allowed count, given either as a fixed number or as a range [min..max]. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
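The bracket notation is straightforward to parse. The Python sketch below is a hypothetical helper, not LipidXplorer's parser. It turns the element part of an sc-constraint into (minimum, maximum) ranges and counts how many sum compositions these ranges span before the <tt>DBR</tt> and <tt>CHG</tt> options are applied.<br />

```python
import re
from math import prod


def parse_sc_constraint(text):
    """Map 'C[38..54] H[30..130] O[10]' to {'C': (38, 54), 'H': (30, 130), 'O': (10, 10)}."""
    ranges = {}
    for el, lo, hi in re.findall(r"([A-Z][a-z]?)\[(\d+)(?:\.\.(\d+))?\]", text):
        ranges[el] = (int(lo), int(hi) if hi else int(lo))
    return ranges


def n_compositions(ranges):
    """Number of sum compositions covered before DBR filtering."""
    return prod(hi - lo + 1 for lo, hi in ranges.values())


r = parse_sc_constraint("C[38..54] H[30..130] O[10] N[1] P[1]")
print(r)                  # {'C': (38, 54), 'H': (30, 130), 'O': (10, 10), 'N': (1, 1), 'P': (1, 1)}
print(n_compositions(r))  # 1717
```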
<br />
==The 4 sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. The content can be followed by <br />
<tt>WITH</tt> and a comma-separated list of options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and maximum double bond equivalent allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
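For singly charged ions, the neutral loss is simply the precursor m/z minus the fragment m/z. The Python sketch below checks an MS/MS peak list for the PE head group loss 'C2 H8 O4 N1 P1' (about 141.0191 Da). The precursor m/z 718.5381 is the calculated value for C39 H77 N1 O8 P1 with charge +1. The fragment m/z 577.5190 is the same ion after the head group loss, and the second peak is hypothetical. The 0.01 Da tolerance and the helper names are assumptions.<br />

```python
# Monoisotopic element masses (Da); neutral losses carry no charge.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}

PE_HEAD_LOSS = sum(MONO[el] * n for el, n in
                   {"C": 2, "H": 8, "O": 4, "N": 1, "P": 1}.items())  # ~141.0191


def neutral_losses(precursor_mz, fragment_mzs):
    """Neutral loss = precursor m/z - fragment m/z (both singly charged)."""
    return [precursor_mz - f for f in fragment_mzs]


def has_neutral_loss(precursor_mz, fragment_mzs, loss, tol_da=0.01):
    """True if any fragment corresponds to the given neutral loss from the precursor."""
    return any(abs(nl - loss) <= tol_da
               for nl in neutral_losses(precursor_mz, fragment_mzs))


print(has_neutral_loss(718.5381, [577.5190, 300.0], PE_HEAD_LOSS))  # True
```

Consistent with the note above, only precursor-fragment differences are formed, never differences between two fragments.<br />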
<br />
====Examples====<br />
Define the PC-O sc-constraint and the PC head group fragment expected <br />
in the MS/MS spectra of these precursors:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and the neutral loss of the PE head group <br />
from these precursors:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid within the <br />
query in which they are defined, i.e. they are not visible to other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
In this section the previously defined variables are searched for in the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a single mass or a set of masses, i.e. for some variables, especially sc-constraints, more than one mass is identified. This section is the first filtering step: it returns <i>True</i> only if every individual identification is successful.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of the specified masses or fragment masses. The scope (MS level and polarity) is stated after 'IN'.<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify where the sum composition is searched for: 'MS1+' means positive mode MS, while 'MS2-' means negative mode MS/MS. Options can be specified after an optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million,<br />
## 'da' - Dalton, or<br />
## 'res' - resolution.<br />
# 'MASSRANGE' is a 2-tuple (minimum, maximum) restricting the m/z range that is searched. <br />
# 'MINOCC' is a float between 0 and 1 that states the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm is written as "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
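The three tolerance units can be read as a mass window around the theoretical m/z. The Python sketch below shows one plausible interpretation. Treating 'res' as a window of m/R is an assumption, and the helper names are also assumptions.<br />

```python
def window_da(theo_mz, value, unit):
    """Half-width of the matching window in Dalton for TOLERANCE = <value><unit>."""
    if unit == "ppm":
        return theo_mz * value / 1e6
    if unit == "da":
        return value
    if unit == "res":
        return theo_mz / value  # assumption: resolution R = m / delta m
    raise ValueError("unknown tolerance unit: %s" % unit)


def matches(measured_mz, theo_mz, value, unit, massrange=None):
    """Tolerance check, optionally restricted to MASSRANGE = (low, high)."""
    if massrange is not None and not massrange[0] <= measured_mz <= massrange[1]:
        return False
    return abs(measured_mz - theo_mz) <= window_da(theo_mz, value, unit)


print(window_da(1000.0, 10, "ppm"))                  # 0.01
print(matches(760.5851, 760.5881, 5, "ppm"))         # True (about 3.9 ppm)
print(matches(650.0, 650.0, 5, "ppm", (700, 1000)))  # False: outside MASSRANGE
```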
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
QUERYNAME = Phosphatidylcholineether;<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# the MS mass should fit 'PR' and it should have an MS/MS fragment mass fitting 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are less strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead IN MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
QUERYNAME = Phosphatidylethanolamine;<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead IN MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
QUERYNAME = PEplasmalogen;<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, PE plasmalogens are identified by marking 'FRAG1' and 'FRAG2', which both cover several possibilities because they are sc-constraints (see the example above). In addition, a test checks whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR" holds (up to one hydrogen atom, as in the examples below). Such constraints are formulated in the optional 'SUCHTHAT' section as Boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values for the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, connected with a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to peak attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> <br />
section consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
These fragments can be identical (for example, when two fatty acid scans match the same fragment). For this case LipidXplorer provides the function <tt>sumIntensity</tt>, which does not add up the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first decimal value is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second decimal value the sum of <br />
the double bonds. The output could be for example <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is borrowed from Python: MFQL uses Python's <br />
standard string formatting operator <tt>%</tt> to fill the placeholders <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
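Because the formatting follows Python's %-operator, its behaviour can be tried out directly in Python. The values below are hypothetical stand-ins for the attribute expressions in the second MFQL string, for example the carbon counts of <tt>fa1PC</tt> and <tt>fa2PC</tt>.<br />

```python
# Hypothetical fatty acid compositions standing in for fa1PC.chemsc and fa2PC.chemsc
fa1 = {"C": 16, "db": 0}
fa2 = {"C": 18, "db": 1}

# %d: integer placeholders, filled in order from the tuple
lipidname = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
print(lipidname)                     # PC [34:1]

# %.2f: floating point with two decimals; %s: string
print("%.2fppm" % 0.5522)            # 0.55ppm
print("%s [%d:%d]" % ("PE", 36, 2))  # PE [36:2]
```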
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, the intensity is set to -1 as a placeholder.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (calculated from the sum composition) and the mass tagged in the spectrum. The error is available in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in Dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
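The ppm and Dalton errors follow directly from the definition above. In this Python sketch the sign convention (measured minus theoretical) is an assumption. <tt>errres</tt> is left out because its exact definition is not given here.<br />

```python
def err_da(measured_mz, theo_mz):
    """Error in Dalton (assumed sign: measured - theoretical)."""
    return measured_mz - theo_mz


def err_ppm(measured_mz, theo_mz):
    """Error in parts per million relative to the theoretical m/z."""
    return err_da(measured_mz, theo_mz) / theo_mz * 1e6


theo = 760.58508  # e.g. the calculated m/z of C42 H83 N1 O8 P1, charge +1
print("%.2fppm" % err_ppm(760.5855, theo))  # 0.55ppm
```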
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address an individual element of the sum composition, write the element symbol in brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms of a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
The double bond equivalent is addressed in the same way with <tt>[db]</tt>, e.g. <tt>PR.chemsc[db]</tt>.<br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass from all samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is also possible to address only a subset of the samples by writing the name of a sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, e.g. all blank samples. Sample groups can thus be formed simply by naming the samples according to their group. This allows many different constraints to be stated, which can increase the reliability of the identification or even directly interpret the result. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires that one percent of the average intensity of all experimental samples ("*exp*") is greater than the average intensity found in the blank samples. This removes every "lipid" that is obviously noise.<br />
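The sample selection by wildcards and the blank filter can be mimicked in Python with <tt>fnmatch</tt>. The sample names and intensities are hypothetical. Following the description above, the sketch assumes that a comparison on a list of intensities must hold for every sample.<br />

```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical intensities of one peak (PR.intensity), keyed by sample name
pr_intensity = {"blank_1": 120.0, "blank_2": 280.0,
                "exp_1": 41000.0, "exp_2": 59000.0}


def select(intensities, pattern):
    """Intensities of the samples whose names match a * / ? wildcard pattern."""
    return [v for name, v in intensities.items() if fnmatchcase(name, pattern)]


def all_greater(values, threshold):
    """List semantics: true only if every intensity exceeds the threshold."""
    return all(v > threshold for v in values)


# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
keep = mean(select(pr_intensity, "*blank*")) < mean(select(pr_intensity, "*exp*")) / 100
print(keep)                                               # True: 200 < 500
print(all_greater(select(pr_intensity, "*exp*"), 10000))  # True
print(all_greater(pr_intensity.values(), 10000))          # False
```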
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: the number of samples in which the peak occurs divided by the total number of samples.<br />
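The Python sketch below shows how occupation relates to the <tt>MINOCC</tt> option of the <tt>IDENTIFY</tt> section. It counts a sample as an occurrence when the intensity is greater than zero. This is an assumption, based on absent peaks having intensity zero (see the Notes above). The intensity values are hypothetical.<br />

```python
def occupation(intensities):
    """Fraction of samples in which the peak occurs (intensity > 0 assumed)."""
    if not intensities:
        return 0.0
    return sum(1 for v in intensities if v > 0) / len(intensities)


def passes_minocc(intensities, minocc):
    """MINOCC check: occupation must reach the given threshold between 0 and 1."""
    return occupation(intensities) >= minocc


samples = [15000.0, 0.0, 22000.0, 18000.0]  # hypothetical; absent in one sample
print(occupation(samples))          # 0.75
print(passes_minocc(samples, 0.8))  # False
```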
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average value of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to the intensity of v.<br />
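Standardization can be pictured as a per-sample division by the standard's intensity. The Python sketch below is only an illustration under that assumption, since LipidXplorer's exact calculation is not described here. Samples without a standard signal are returned as <tt>None</tt>, which is a hypothetical choice.<br />

```python
def standardize(intensities, standard_intensities):
    """Intensity of each sample relative to the standard measured in the same sample."""
    result = []
    for value, std in zip(intensities, standard_intensities):
        result.append(value / std if std > 0 else None)  # no usable standard signal
    return result


lipid = [4000.0, 9000.0, 500.0]      # hypothetical lipid intensities, three samples
standard = [2000.0, 3000.0, 0.0]     # hypothetical standard intensities
print(standardize(lipid, standard))  # [2.0, 3.0, None]
```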
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
The following rules apply to fragments whose intensity was replaced by the -1 placeholder after isotopic correction (see the Notes above).<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to -1), those values are simply added, resulting in <math>n_i\times -1</math>, where <math>n_i</math> is the number of attributes passed to sumIntensity(). <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. Otherwise, the values are added as they are. In the following example we assume that two MS2 entries were used for the sumIntensity() function:<br />
<br />
<math>\begin{array}{rr|r} F_1 & F_2 & \mathrm{sumIntensity}(F_1, F_2) \\ \hline -1 & -1 & -2 \\ 0 & -1 & -1 \\ 1 & -1 & 1 \\ 2 & -1 & 2 \\ 2 & 0 & 2 \end{array}</math><br />
<br />
This has the following consequences when such results are interpreted:<br />
<br />
A) intensity = 0: in this specific sample, none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample, some of the required fragments were found in the initial MasterScan but set to -1; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
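The placeholder rules can be written down compactly. The Python sketch below applies them sample by sample and reproduces the table of two-fragment cases above. It is an illustration only and does not cover how <tt>sumIntensity</tt> avoids counting the same fragment twice.<br />

```python
def sum_intensity_sample(*values):
    """Sum fragment intensities of one sample, honouring -1 placeholders.

    If any value is > 0, the -1 placeholders are treated as zero;
    otherwise all values (zeros and -1 placeholders) are simply added.
    """
    if any(v > 0 for v in values):
        return sum(v for v in values if v > 0)
    return sum(values)


def sum_intensity(*fragments):
    """Apply the rule to every sample; each fragment is a list of per-sample intensities."""
    return [sum_intensity_sample(*per_sample) for per_sample in zip(*fragments)]


f1 = [-1, 0, 1, 2, 2]
f2 = [-1, -1, -1, -1, 0]
print(sum_intensity(f1, f2))  # [-2, -1, 1, 2, 2]
```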
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
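The second <tt>SUCHTHAT</tt> example can be mirrored in Python to make the arithmetic explicit. The m/z values and intensity lists are hypothetical, and 'H1' is taken as the monoisotopic hydrogen mass. Following the list semantics of <tt>intensity</tt>, the intensity condition must hold in every sample.<br />

```python
H1 = 1.00782503207  # monoisotopic mass of 'H1'


def fragments_match_precursor(frag1_mz, frag2_mz, pr_mz, tol_da=0.5):
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da"""
    return abs(frag1_mz + frag2_mz - H1 - pr_mz) <= tol_da


def intensity_ratio_ok(frag1_int, frag2_int):
    """FRAG1.intensity * 3 < FRAG2.intensity * 10, required in every sample."""
    return all(a * 3 < b * 10 for a, b in zip(frag1_int, frag2_int))


# Hypothetical values for one candidate
print(fragments_match_precursor(300.2, 450.3, 749.5))       # True
print(intensity_ratio_ok([1000.0, 800.0], [500.0, 300.0]))  # True
print(intensity_ratio_ok([1000.0, 800.0], [500.0, 200.0]))  # False
```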
<br />
== How LipidXplorer runs multiple MFQL queries ==<br />
<br />
A LipidXplorer run works as follows: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the <tt>DEFINE</tt>, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether the mass fits a given sum composition or sc-constraint (<tt>DEFINE</tt> and <tt>IDENTIFY</tt> sections);<br />
* it looks into the corresponding MS/MS spectrum (if provided) and does the same (<tt>DEFINE</tt> and <tt>IDENTIFY</tt> sections); <br />
* it checks the Boolean constraints (<tt>SUCHTHAT</tt> section) and, if the result is <br />
positive, accepts the MS mass and sends it to the <tt>REPORT</tt> section.<br />
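The control flow of a run can be summarized in a short Python sketch. The <tt>Entry</tt> and <tt>Query</tt> classes and their fields are hypothetical stand-ins for MasterScan entries and parsed MFQL sections, not LipidXplorer's internal API. The sketch only illustrates the order of the checks.<br />

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical MasterScan entry: an MS mass and its MS/MS peaks."""
    mass: float
    msms: List[float] = field(default_factory=list)


@dataclass
class Query:
    """Hypothetical stand-in for one parsed MFQL query."""
    name: str
    ms_check: Callable[[float], bool]                    # DEFINE + IDENTIFY on MS
    msms_check: Optional[Callable[[List[float]], bool]]  # DEFINE + IDENTIFY on MS/MS
    suchthat: Callable[[Entry], bool]                    # SUCHTHAT constraints
    report: Callable[[Entry], Dict[str, object]]         # REPORT columns


def run(queries, masterscan):
    """Run all queries one after another over the MasterScan."""
    results = {}
    for q in queries:
        rows = []
        for entry in sorted(masterscan, key=lambda e: e.mass):  # smallest to largest
            if not q.ms_check(entry.mass):
                continue
            if q.msms_check is not None and not q.msms_check(entry.msms):
                continue
            if q.suchthat(entry):
                rows.append(q.report(entry))
        results[q.name] = rows
    return results


pc_like = Query(
    name="PClike",
    ms_check=lambda m: 700 <= m <= 900,
    msms_check=lambda peaks: any(abs(p - 184.0733) <= 0.01 for p in peaks),
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mass},
)
scan = [Entry(788.6164, [184.0733]), Entry(718.5381, [577.5190]),
        Entry(760.5851, [184.0733])]
print(run([pc_like], scan))  # {'PClike': [{'MASS': 760.5851}, {'MASS': 788.6164}]}
```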
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> section emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the <tt>DEFINE</tt> section: when a variable is given <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt>, it is <br />
treated as a neutral loss. Otherwise it is treated as a fragment mass.<br />
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. Screening <br />
requires highly accurate masses, because otherwise the rate of false <br />
identifications is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names make a query easier <br />
to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acid moieties of the lipid have to be either both even-numbered or<br />
both odd-numbered. In this way we can exclude lipids that are not expected<br />
in the organism under study. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds attributed to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the specific PC head group <br />
fragment that is found in the MS/MS spectra of <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually occur in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sf-constrains and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') within a tolerance of 0.5 dalton, and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipid's name, generated with Python's string formatting operator<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=445LipidXplorer MFQL2011-01-21T11:19:22Z<p>Schwudke: /* The principle of the lipid identification process */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy the sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ in the number of carbon atoms and double bonds, but also in <br />
their relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. The most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass the sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as PE) might meet the same constraint. Therefore, for the most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, no matter <br />
how accurately they are measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, a characteristic <br />
head group fragment, or a characteristic loss of a fatty acid moiety. <br />
Within an MFQL query, these constraints can be combined by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce a specific phosphorylcholine fragment of <br />
their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for phosphatidylcholine (PC) lipids could be: <br />
* Find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra. <br />
* If both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
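<br />
The m/z value 184.07 quoted above follows directly from the sum composition. The sketch below is a minimal illustration, not LipidXplorer's own code. It sums monoisotopic element masses and subtracts one electron mass for a singly charged cation. It assumes that the sum composition already describes the ion, as 'C5 H15 O4 P1 N1' does for the phosphorylcholine fragment. The helper names <tt>parse_sc</tt> and <tt>ion_mz</tt> are hypothetical.<br />
<br />
```python
import re

# Monoisotopic masses (u) of the elements used in the PC examples.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946  # electron mass (u)


def parse_sc(sc: str) -> dict:
    """Parse a fixed sum composition such as 'C5 H15 O4 P1 N1'."""
    counts = {}
    for token in sc.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d+)", token)
        if match is None:
            raise ValueError(f"cannot parse element token {token!r}")
        element, number = match.group(1), int(match.group(2))
        counts[element] = counts.get(element, 0) + number
    return counts


def ion_mz(sc: str, charge: int) -> float:
    """m/z of an ion whose sum composition is sc (hypothetical helper)."""
    if charge == 0:
        raise ValueError("charge must be non-zero for an m/z value")
    mass = sum(MONO[el] * n for el, n in parse_sc(sc).items())
    # A cation has lost electrons, an anion has gained them.
    return (mass - charge * ELECTRON) / abs(charge)


print(round(ion_mz("C5 H15 O4 P1 N1", 1), 4))   # 184.0733 (head group fragment)
print(round(ion_mz("C42 H83 N1 O8 P1", 1), 4))  # 760.5851 ([M+H]+ of PC 34:1)
```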
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within <br />
a 1 Da mass window and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species; details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we now walk through the example script for the identification of PC species.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define the precursors (<tt>prPC</tt>) that are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on the precursor <br />
masses: besides the sum composition requirements, it requests that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) lies within a given range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operator AND requests that <tt>headPC</tt> <br />
is only searched for in the MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional, project-specific <br />
compositional constraints formulated in the SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, if a recognized lipid comprises an <br />
odd-numbered fatty acid moiety, this identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively (8 in total, an even number), the two fatty acid moieties must together <br />
contain an even number of carbon atoms. Hence a lipid passing this test cannot comprise one odd- and one even-numbered fatty acid moiety at the same time (two odd-numbered moieties still pass).<br />
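<br />
The parity argument can be checked exhaustively in a few lines of Python. This only illustrates the arithmetic. <tt>total_carbons</tt> is a hypothetical helper, and the chain-length range 12 to 24 is an arbitrary choice for the check.<br />
<br />
```python
HEAD_AND_BACKBONE_C = 5 + 3  # phosphocholine head group + glycerol backbone


def total_carbons(fa1_c: int, fa2_c: int) -> int:
    """Carbon count of a diacyl PC with the given fatty acid chain lengths."""
    return HEAD_AND_BACKBONE_C + fa1_c + fa2_c


def is_even(n: int) -> bool:
    """Python counterpart of MFQL's isEven()."""
    return n % 2 == 0


# isEven(total) holds exactly when both chains have the same parity.
for fa1 in range(12, 25):
    for fa2 in range(12, 25):
        same_parity = fa1 % 2 == fa2 % 2
        assert is_even(total_carbons(fa1, fa2)) == same_parity

print(is_even(total_carbons(16, 18)))  # True:  both chains even, passes
print(is_even(total_carbons(16, 17)))  # False: one odd chain is rejected
print(is_even(total_carbons(15, 17)))  # True:  two odd chains still pass
```
<br />
The last line shows the limit of this filter: a species with two odd-numbered chains is not rejected.<br />
<br />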
By executing the DEFINE, IDENTIFY and SUCHTHAT sections, LipidXplorer <br />
recognizes spectra pertinent to PC species. The last section, REPORT, <br />
defines how these findings are reported. This includes the annotation <br />
of the recognized lipid species, the abundances of characteristic <br />
ions for subsequent quantification, and all additional <br />
information pertinent to the analysis, such as masses and mass differences <br />
(errors). LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name, <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensity of the precursor; and <br />
<tt>FRAGINTENS</tt> - intensity of the head group fragment, both intensities reported for each individual acquisition. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to use mathematical terms or certain <br />
functions, such as text formatting, on these attributes. Text <br />
formatting takes two strings separated by <tt>%</tt>, where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two <tt>%d</tt> placeholders of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from that of the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6). The number of double bonds is obtained by subtracting 1.5 from the double bond equivalent of the precursor (<tt>prPC.chemsc[db]</tt>).<br />
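<br />
The arithmetic behind the NAME column can be reproduced outside LipidXplorer. The sketch assumes that <tt>chemsc[db]</tt> is the usual double bond equivalent, 1 + C - H/2 + (N + P)/2, with phosphorus counted as trivalent. Under that formula a fully saturated diacyl PC ion gets 1.5, matching the lower DBR bound above, but LipidXplorer's internal formula may differ. <tt>pc_name</tt> and the other helpers are hypothetical.<br />
<br />
```python
import re


def parse_sc(sc: str) -> dict:
    """Parse a fixed sum composition such as 'C42 H83 N1 O8 P1'."""
    counts = {}
    for token in sc.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d+)", token)
        if match is None:
            raise ValueError(f"cannot parse element token {token!r}")
        counts[match.group(1)] = counts.get(match.group(1), 0) + int(match.group(2))
    return counts


def dbe(counts: dict) -> float:
    """Assumed double bond equivalent, counting N and P as trivalent."""
    c, h = counts.get("C", 0), counts.get("H", 0)
    n, p = counts.get("N", 0), counts.get("P", 0)
    return 1 + c - h / 2 + (n + p) / 2


def pc_name(pr_sc: str, head_sc: str = "C5 H15 O4 P1 N1") -> str:
    """Mirror of the NAME expression in the PC query (hypothetical helper)."""
    pr, head = parse_sc(pr_sc), parse_sc(head_sc)
    carbons = pr["C"] - head["C"] - 3  # minus head group and glycerol carbons
    double_bonds = dbe(pr) - 1.5
    return "PC [%d:%d]" % (carbons, double_bonds)


print(pc_name("C42 H83 N1 O8 P1"))  # PC [34:1]
print(pc_name("C44 H85 N1 O8 P1"))  # PC [36:2]
```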
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. Use it to write comments in the code.<br />
# Every statement has to end with <tt>;</tt>.<br />
# Every query has to end with an extra <tt>;</tt>.<br />
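<br />
The comment rule and the final-<tt>;</tt> rule can be illustrated with a small checker. This is a simplified sketch, not LipidXplorer's parser. It treats every <tt>#</tt> as the start of a comment, even inside quoted strings, and it only tests whether a query ends with the extra <tt>;</tt>. Both function names are hypothetical.<br />
<br />
```python
def strip_comments(query: str) -> str:
    """Remove everything after '#' on each line."""
    return "\n".join(line.split("#", 1)[0] for line in query.splitlines())


def is_terminated(query: str) -> bool:
    """True if the query, ignoring comments and whitespace, ends with ';;'."""
    return strip_comments(query).rstrip().endswith(";;")


example = """QUERYNAME = Phosphatidylcholine;
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;  # head group fragment
REPORT
MASS = prPC.mass;
INTENS = prPC.intensity;;
################ end script ##################
"""
print(is_terminated(example))              # True
print(is_terminated("MASS = prPC.mass;"))  # False: the extra ';' is missing
```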
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (SUCHTHAT is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#SC-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be combined by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output.<br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in Part 4 below.<br />
<br />
==SC-constraints==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint specifies a class of lipids as a collection of chemical sum <br />
compositions. It is used for several purposes, <br />
especially for screening tasks or multiple scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
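<br />
An sc-constraint can be thought of as a compact description of all the sum compositions it contains. The sketch below expands one by brute force and keeps the compositions whose double bond equivalent lies inside DBR. It is an illustration only. The DBE formula 1 + C - H/2 + (N + P)/2 is an assumption, charge handling is ignored, and <tt>expand</tt> and <tt>parse_constraint</tt> are hypothetical names.<br />
<br />
```python
import itertools
import re


def parse_constraint(sc: str) -> dict:
    """Parse 'C[30..48] H[30..200] N[1] O[8] P[1]' into {element: (lo, hi)}."""
    ranges = {}
    for token in sc.split():
        match = re.fullmatch(r"([A-Z][a-z]?)\[(\d+)(?:\.\.(\d+))?\]", token)
        if match is None:
            raise ValueError(f"cannot parse element range {token!r}")
        lo = int(match.group(2))
        hi = int(match.group(3)) if match.group(3) else lo
        ranges[match.group(1)] = (lo, hi)
    return ranges


def dbe(counts: dict) -> float:
    """Assumed double bond equivalent, counting N and P as trivalent."""
    return (1 + counts.get("C", 0) - counts.get("H", 0) / 2
            + (counts.get("N", 0) + counts.get("P", 0)) / 2)


def expand(sc: str, dbr: tuple) -> list:
    """All sum compositions of the constraint whose DBE lies within dbr."""
    ranges = parse_constraint(sc)
    elements = list(ranges)
    axes = [range(lo, hi + 1) for lo, hi in ranges.values()]
    result = []
    for combo in itertools.product(*axes):
        counts = dict(zip(elements, combo))
        if dbr[0] <= dbe(counts) <= dbr[1]:
            result.append(counts)
    return result


pcs = expand("C[30..48] H[30..200] N[1] O[8] P[1]", (1.5, 7.5))
print(len(pcs))  # 247 compositions under these assumptions
print({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1} in pcs)  # True (PC 34:1 ion)
```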
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
which gives the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by an <br />
equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. They can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and maximum double bond equivalent allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
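<br />
For a neutral loss, the difference between precursor and fragment is compared with the mass of the lost neutral. The sketch below shows that arithmetic for the PE head group loss 'C2 H8 O4 N1 P1' used in the examples. The precursor m/z 718.5381 (the [M+H]+ ion of PE 34:1) is an illustrative value, and the helper names are hypothetical.<br />
<br />
```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}


def neutral_mass(sc: str) -> float:
    """Monoisotopic mass of a neutral sum composition like 'C2 H8 O4 N1 P1'."""
    total = 0.0
    for token in sc.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d+)", token)
        if match is None:
            raise ValueError(f"cannot parse element token {token!r}")
        total += MONO[match.group(1)] * int(match.group(2))
    return total


def fragment_mz_after_loss(precursor_mz: float, loss_sc: str) -> float:
    """Expected fragment m/z of a singly charged precursor after the loss."""
    # The loss is always taken from the precursor, never from another fragment.
    return precursor_mz - neutral_mass(loss_sc)


loss = neutral_mass("C2 H8 O4 N1 P1")
print(round(loss, 4))                                                # 141.0191
print(round(fragment_mz_after_loss(718.5381, "C2 H8 O4 N1 P1"), 4))  # 577.519
```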
<br />
====Examples====<br />
Define the PC-O sc-constraint and PC-O's head group fragment, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and PE's head group, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they cannot be used in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step. It returns <i>True</i> if the Boolean expression is true, which is the case only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH &lt;option&gt; = &lt;value&gt; (, &lt;option&gt; = &lt;value&gt;)*)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain masses or fragment masses. The scope (MS level) is stated after 'IN':<br />
the 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify the MS level in which to look for the sum composition ('MS1+' means positive-mode MS, while 'MS2-' means negative-mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million,<br />
## 'da' - Dalton, or<br />
## 'res' - resolution.<br />
# 'MASSRANGE' is a 2-tuple constraining the mass range of interest. <br />
# 'MINOCC' is a number between 0 and 1 that states the minimum occupation threshold for this mass across all samples, i.e. the minimum fraction of samples in which the mass must occur.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
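<br />
The three tolerance units and the MASSRANGE and MINOCC options can be sketched as simple predicates. This illustration rests on assumptions. In particular, the 'res' window is taken here as m/R (the peak width at resolution R), which may not match LipidXplorer's exact definition. All function names are hypothetical.<br />
<br />
```python
def within_tolerance(measured: float, theoretical: float,
                     value: float, unit: str) -> bool:
    """True if measured lies within the tolerance window around theoretical."""
    if unit == "ppm":
        window = theoretical * value * 1e-6
    elif unit == "da":
        window = value
    elif unit == "res":
        window = theoretical / value  # assumed: peak width m/R at resolution R
    else:
        raise ValueError(f"unknown tolerance unit {unit!r}")
    return abs(measured - theoretical) <= window


def in_massrange(mz: float, massrange: tuple) -> bool:
    """MASSRANGE = (lo, hi): keep only masses inside the closed interval."""
    lo, hi = massrange
    return lo <= mz <= hi


def passes_minocc(found_in: int, n_samples: int, minocc: float) -> bool:
    """MINOCC: the fraction of samples containing the mass must reach minocc."""
    return found_in / n_samples >= minocc


print(within_tolerance(760.5890, 760.5851, 10, "ppm"))  # True  (about 5 ppm off)
print(within_tolerance(760.6000, 760.5851, 10, "ppm"))  # False (about 20 ppm off)
print(within_tolerance(184.2, 184.0733, 0.5, "da"))     # True
print(in_massrange(650.0, (700, 1000)))                 # False
print(passes_minocc(3, 4, 0.5))                         # True
```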
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead IN MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead IN MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both comprise several possibilities since they are sc-constraints (see the example above), and a test whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR" holds. Such a constraint is formulated in the optional 'SUCHTHAT' section as Boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;inequality&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. Terms are compared as equations with '==' and as inequalities with '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute name after the variable name, separated by a dot. For example, the intensity of the peak 'PR' is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
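<br />
The PE plasmalogen constraint quoted above can be written out as two Python predicates to make its meaning explicit. The sketch works on m/z values of singly charged fragments. That ignores a difference of one electron mass, which is negligible at a 0.5 Da tolerance. It applies the intensity condition to every sample, following the list semantics described under [[#intensity]]. The function names and numbers are hypothetical.<br />
<br />
```python
H_MASS = 1.00782503207  # monoisotopic mass of 'H1'


def fragments_match_precursor(frag1_mz: float, frag2_mz: float,
                              pr_mz: float, tol_da: float = 0.5) -> bool:
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da."""
    return abs(frag1_mz + frag2_mz - H_MASS - pr_mz) <= tol_da


def intensity_ratio_ok(frag1_int: list, frag2_int: list) -> bool:
    """FRAG1.intensity * 3 < FRAG2.intensity * 10 in every sample."""
    return all(i1 * 3 < i2 * 10 for i1, i2 in zip(frag1_int, frag2_int))


# Hypothetical values for one candidate measured in two samples.
print(fragments_match_precursor(300.0, 500.0, 799.0))  # True
print(fragments_match_precursor(300.0, 500.0, 801.0))  # False
print(intensity_ratio_ok([100, 200], [40, 70]))          # True
print(intensity_ratio_ok([100, 200], [40, 50]))          # False
```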
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> section <br />
consists of a list of variables, each representing a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of the <br />
species identified by <tt>PR</tt>:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
The two fragments may turn out to be the same peak (for example, when two fatty acid scans match the same fragment), so LipidXplorer provides a special function that does not sum the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;";)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first integer placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
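<br />
Because the format strings follow Python's <tt>%</tt> operator, their behaviour can be tried directly in Python. In MFQL the right-hand side is written as a quoted string and evaluated by LipidXplorer. The plain tuples below stand in for those expressions, and the chain values are made up.<br />
<br />
```python
fa1_c, fa1_db = 18, 1  # hypothetical first fatty acid: 18:1
fa2_c, fa2_db = 18, 1  # hypothetical second fatty acid: 18:1

name = "PC [%d:%d]" % (fa1_c + fa2_c, fa1_db + fa2_db)
error = "%2.2fppm" % 1.234   # two decimals
truncated = "%dppm" % 1.9    # %d drops the fractional part of a float

print(name)       # PC [36:2]
print(error)      # 1.23ppm
print(truncated)  # 1ppm
```
<br />
The last line shows why <tt>%d</tt> is a poor placeholder for ppm errors: it truncates the value instead of rounding it.<br />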
<br />
The format string feature is borrowed from Python: MFQL uses Python's <br />
standard string formatting operator, i.e. the format string is evaluated by Python <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the matched mass from the spectrum. The error can be given in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
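<br />
The three error types can be sketched as follows. Two details are assumptions made for illustration, not documented LipidXplorer behaviour: the sign convention (measured minus theoretical) and the reading of <tt>errres</tt> as the resolution m/Δm at which the deviation equals one peak width. The function names are hypothetical.<br />
<br />
```python
def err_da(measured: float, theoretical: float) -> float:
    """Error in Dalton (assumed sign: measured - theoretical)."""
    return measured - theoretical


def err_ppm(measured: float, theoretical: float) -> float:
    """Error in parts per million of the theoretical mass."""
    return (measured - theoretical) / theoretical * 1e6


def err_res(measured: float, theoretical: float) -> float:
    """Assumed: resolution m/dm corresponding to the absolute deviation."""
    deviation = abs(measured - theoretical)
    return float("inf") if deviation == 0 else theoretical / deviation


print(round(err_da(760.5890, 760.5851), 4))   # 0.0039
print(round(err_ppm(760.5890, 760.5851), 2))  # 5.13
```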
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, to get the number of <tt>C</tt> atoms from a formula: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass across all samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). E.g. <tt>PR.intensity["*blanck*"]</tt> returns just the samples with the string <tt>blanck</tt> in their name, which could be all blank samples. This feature allows sample groups to be formed by naming the samples according to their group. In this way many different constraints can be stated, which increase the accuracy of the interpretation or even interpret the result directly. E.g.<br />
<pre> avg(PR.intensity["*blanck*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that one percent of the average intensity over all experimental samples ("*exp*") should be greater than the average intensity found in the blank samples. This simply discards every "lipid" that is obviously noise.<br />
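<br />
Wildcard selection of sample groups and the blank filter above can be mimicked with Python's <tt>fnmatch</tt> module. The sample names, intensities and helper functions are hypothetical. Case-sensitive matching is also an assumption, and LipidXplorer's own matching rules may differ.<br />
<br />
```python
from fnmatch import fnmatchcase
from statistics import mean


def select(intensities: dict, pattern: str) -> list:
    """Intensities of all samples whose name matches a * / ? wildcard pattern."""
    return [value for sample, value in intensities.items()
            if fnmatchcase(sample, pattern)]


def all_greater(values: list, threshold: float) -> bool:
    """List semantics: PR.intensity > threshold holds only if every entry does."""
    return all(value > threshold for value in values)


# Hypothetical intensities of one precursor 'PR' across five samples.
pr_intensity = {
    "blanck_01": 50.0, "blanck_02": 70.0,
    "exp_01": 9000.0, "exp_02": 11000.0, "exp_03": 10000.0,
}

# avg(PR.intensity["*blanck*"]) < avg(PR.intensity["*exp*"]) / 100
# (mean() raises StatisticsError if a pattern matches no sample.)
blank_avg = mean(select(pr_intensity, "*blanck*"))
exp_avg = mean(select(pr_intensity, "*exp*"))
keep = blank_avg < exp_avg / 100
print(keep)                                              # True: 60 < 100
print(all_greater(list(pr_intensity.values()), 10000))  # False
```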
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities with respect to the standard given in v, i.e. every intensity is calculated relative to v.<br />
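<br />
The effect of <tt>isStandard</tt> on the reported intensities can be sketched as a per-sample division by the standard's intensity. This illustration rests on assumptions. The exact normalization LipidXplorer applies is not specified above, nor is how it treats samples without a standard signal (reported here as <tt>None</tt>). <tt>standardize</tt> is a hypothetical name.<br />
<br />
```python
def standardize(intensities: dict, standard: dict) -> dict:
    """Express each sample's intensity relative to the standard in that sample."""
    result = {}
    for sample, value in intensities.items():
        reference = standard.get(sample, 0.0)
        # Without a positive standard signal no ratio can be formed.
        result[sample] = value / reference if reference > 0 else None
    return result


lipid = {"exp_01": 200.0, "exp_02": 50.0, "exp_03": 80.0}
standard = {"exp_01": 100.0, "exp_02": 25.0, "exp_03": 0.0}
print(standardize(lipid, standard))
# {'exp_01': 2.0, 'exp_02': 2.0, 'exp_03': None}
```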
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments with isotope-corrected placeholders (see [[#Notes]]), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i\times -1</math>, where <math>n_i</math> is the number of fragments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the example below we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
<math>(F_1, F_2) \mapsto \mathrm{sumIntensity}(F_1, F_2)</math><br />
<math>(-1, -1) \mapsto -2</math><br />
<math>(0, -1) \mapsto -1</math><br />
<math>(1, -1) \mapsto 1</math><br />
<math>(2, -1) \mapsto 2</math><br />
<math>(2, 0) \mapsto 2</math><br />
<br />
That has following consequences when such results have to be interpreted:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
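<br />
The summation rules and the table above can be expressed compactly. The sketch below is an illustration, not LipidXplorer's code. It takes one intensity per fragment for a single sample. It identifies "the same fragment" by a hypothetical peak identifier, so that a peak matched twice is counted once. It treats '-1' placeholders as zero only when some intensity is positive.<br />
<br />
```python
def sum_intensity(*fragments):
    """fragments: (peak_id, intensity) pairs for one sample."""
    seen = set()
    values = []
    for peak_id, intensity in fragments:
        if peak_id in seen:
            continue  # the same peak matched twice is counted once
        seen.add(peak_id)
        values.append(intensity)
    if any(v > 0 for v in values):
        # At least one real signal: '-1' placeholders count as zero.
        return sum(v for v in values if v > 0)
    # Only zeros and placeholders: add them up unchanged.
    return sum(values)


# The rows of the table above (F1 and F2 are distinct peaks).
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(("F1", f1), ("F2", f2)))
```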
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') within a tolerance of 0.5 dalton, and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== How does LipidXplorer run multiple queries ==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass,<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> sections),<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> sections),<br />
* the Boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> section emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable gets <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
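<br />
The difference between an emulated precursor ion scan and a neutral loss scan can be shown on a toy data set in which each precursor m/z maps to the fragment m/z values of its MS/MS spectrum. Everything here is hypothetical: the data layout, the tolerance and the function names. The sketch only shows that a PIS matches fragment m/z values directly, an NLS matches precursor minus fragment, and an sc-constraint corresponds to running several such scans at once.<br />
<br />
```python
def precursor_ion_scan(ms2: dict, fragment_mz: float, tol_da: float) -> list:
    """Precursors whose MS/MS spectrum contains the given fragment m/z."""
    return sorted(pr for pr, frags in ms2.items()
                  if any(abs(f - fragment_mz) <= tol_da for f in frags))


def neutral_loss_scan(ms2: dict, loss_mass: float, tol_da: float) -> list:
    """Precursors showing a fragment at precursor m/z minus the loss."""
    return sorted(pr for pr, frags in ms2.items()
                  if any(abs((pr - f) - loss_mass) <= tol_da for f in frags))


def multiple_pis(ms2: dict, fragment_mzs: list, tol_da: float) -> dict:
    """Several PIS at once, as an sc-constraint fragment would trigger."""
    return {frag: precursor_ion_scan(ms2, frag, tol_da) for frag in fragment_mzs}


# Toy MS/MS data: precursor m/z -> fragment m/z values.
ms2 = {
    760.5851: [184.0733, 478.3],   # PC-like: head group fragment
    718.5381: [577.5190, 196.0],   # PE-like: loss of 141.0191
}
print(precursor_ion_scan(ms2, 184.0733, 0.01))  # [760.5851]
print(neutral_loss_scan(ms2, 141.0191, 0.01))   # [718.5381]
```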
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. To screen <br />
properly, the masses should be highly accurate, because otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary. Users should choose meaningful <br />
names to keep their queries understandable.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an even total number of carbon atoms. Because the head group and<br />
glycerol backbone of PC contribute 8 carbons, this means that the two fatty<br />
acids of the lipid must be either both even-numbered or both odd-numbered. This<br />
way we can discard lipids that we know should not occur in the organism under study. <br />
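The parity argument behind the even-carbon filter can be checked with a short Python sketch (not LipidXplorer code; `is_even` merely stands in for MFQL's `isEven`). PC's head group and glycerol backbone contribute 5 + 3 = 8 carbons, so the total is even exactly when the two fatty acid chains have the same parity.

```python
NON_FA_CARBONS_PC = 5 + 3  # phosphocholine head group + glycerol backbone


def is_even(n):
    """Stand-in for MFQL's isEven()."""
    return n % 2 == 0


def passes_even_filter(fa1_c, fa2_c):
    """Apply the isEven test to the total carbon count of a PC species."""
    return is_even(NON_FA_CARBONS_PC + fa1_c + fa2_c)


for fa1, fa2 in [(16, 18), (17, 19), (16, 17)]:
    print(fa1, fa2, passes_even_filter(fa1, fa2))
# 16 18 True   (both even)
# 17 19 True   (both odd)
# 16 17 False  (mixed parity)
```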
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds of the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
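To make these REPORT columns concrete, the following Python sketch computes MASS, NAME and ERROR for a single PC precursor. It rests on assumptions that this page does not confirm: the double bond value is taken to be the ring/double-bond equivalent 1 + C - H/2 + N/2 + P/2 (phosphorus counted as trivalent), which reproduces the `- 1.5` offset used in the MS/MS example below, and the ppm error is computed as (observed - theoretical) / theoretical. The observed m/z is made up.

```python
# Monoisotopic element masses (Da) and the electron mass (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def mz(formula, charge=1):
    """m/z of an ion whose sum composition already includes the adduct."""
    mass = sum(MONO[element] * count for element, count in formula.items())
    return (mass - charge * ELECTRON) / abs(charge)


def rdb(formula):
    """Ring/double-bond equivalent; N and P counted as trivalent (assumption)."""
    return (1 + formula.get("C", 0) - formula.get("H", 0) / 2
            + formula.get("N", 0) / 2 + formula.get("P", 0) / 2)


def ppm_error(observed, theoretical):
    """Signed mass error in ppm (the sign convention is an assumption)."""
    return (observed - theoretical) / theoretical * 1e6


pr = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}  # PC 34:1 [M+H]+
theoretical = mz(pr)
observed = 760.5870  # hypothetical measured m/z

MASS = "%.4f" % observed
NAME = "PC [%d:%d]" % (pr["C"] - 8, rdb(pr) - 1.5)  # 8 = head group + glycerol carbons
ERROR = "%2.2fppm" % ppm_error(observed, theoretical)
print(MASS, NAME, ERROR)  # 760.5870 PC [34:1] 2.52ppm
```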
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected species is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
fragment of PC, which is found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=444LipidXplorer MFQL2011-01-21T11:14:27Z<p>Schwudke: /* Structural complexity of lipid species and sum composition constraints */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
'''Figure:''' Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy the sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
their relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. The most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as PE) might meet the same constraint. Therefore, for the most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, a characteristic <br />
head group fragment or the characteristic loss of a fatty acid moiety. <br />
Within an MFQL query, these constraints can be bundled by Boolean operations.<br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce a specific phosphorylcholine fragment of <br />
their head group with the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species therefore starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
the identification of the phosphorylcholine head group fragment in their MS/MS spectra.<br />
<br />
A query for phosphatidylcholine (PC) lipids could read: <br />
* Find all precursor masses which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra. <br />
* If both conditions hold, the precursor is identified as a phosphatidylcholine and the lipid species can be reported. <br />
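The three steps can be mimicked in plain Python to show how the two conditions combine. This is a simplified sketch, not LipidXplorer's algorithm: candidate sum compositions are listed explicitly instead of being enumerated from the sc-constraint, the spectra are a hypothetical mapping from precursor m/z to fragment m/z values, and the 5 ppm and 0.5 Da tolerances are arbitrary. The second candidate, C41H79NO8P (PE 36:1 [M+H]+), also satisfies the PC sum composition bounds, so only the missing m/z 184.07 fragment excludes it.

```python
# Monoisotopic element masses (Da) and the electron mass (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def mz(formula, charge=1):
    """m/z of an ion whose sum composition already includes the adduct."""
    mass = sum(MONO[element] * count for element, count in formula.items())
    return (mass - charge * ELECTRON) / abs(charge)


HEAD_PC_MZ = mz({"C": 5, "H": 15, "O": 4, "P": 1, "N": 1})  # about 184.0733


def fits_pc_constraint(f):
    """Step 1 bounds: 'C[30..48] H[30..200] O[8] P[1] N[1]'."""
    return (30 <= f["C"] <= 48 and 30 <= f["H"] <= 200
            and f["O"] == 8 and f["P"] == 1 and f["N"] == 1)


def identify_pc(spectra, candidates, ms_ppm=5.0, msms_da=0.5):
    """Return (precursor m/z, formula) pairs that pass steps 1 and 2."""
    hits = []
    for precursor_mz, fragments in spectra.items():
        for formula in candidates:
            if not fits_pc_constraint(formula):
                continue
            theoretical = mz(formula)
            # Step 1: the precursor matches a sum composition of the constraint.
            if abs(precursor_mz - theoretical) / theoretical * 1e6 > ms_ppm:
                continue
            # Step 2: its MS/MS spectrum contains the head group fragment.
            if any(abs(f - HEAD_PC_MZ) <= msms_da for f in fragments):
                hits.append((precursor_mz, formula))  # step 3: report it
    return hits


candidates = [
    {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1},  # PC 34:1 [M+H]+
    {"C": 41, "H": 79, "N": 1, "O": 8, "P": 1},  # PE 36:1 [M+H]+, also within bounds
]
# Hypothetical precursors with their MS/MS fragment m/z values.
spectra = {760.5851: [184.07, 577.52], 744.5538: [603.53]}

for precursor_mz, formula in identify_pc(spectra, candidates):
    print(precursor_mz, formula)  # only the 760.5851 precursor is reported
```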
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL query; see also the text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the different statements, we now explain the example script for the identification of PC lipid species step by step.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define the precursors (<tt>prPC</tt>) that are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on the precursor <br />
masses: besides the sum composition requirements, it requests that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in the MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety, this identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid cannot comprise one fatty acid <br />
moiety with an odd and one with an even number of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format consists of two strings separated by <tt>%</tt>, where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6). Likewise, the number of double bonds is obtained by subtracting 1.5 from the double bond value of <tt>prPC</tt>.<br />
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. This function is used for writing comments in the code.<br />
# Statements such as <tt>QUERYNAME</tt>, each <tt>DEFINE</tt> and each <tt>REPORT</tt> entry have to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections (<tt>SUCHTHAT</tt> is optional):<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#SC-constraints|sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities over numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output.<br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section below.<br />
<br />
==SC-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids: an <br />
sc-constraint describes a collection of chemical sum compositions. It is used for several purposes, <br />
especially for screening tasks or multiple scans. Its format is <br />
largely self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the range of the possible number of double bonds. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint will be treated as a collection of neutral losses.<br />
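An sc-constraint can be read as a generator of sum compositions. The Python sketch below expands the example above into explicit formulas. It is an illustration under stated assumptions, not LipidXplorer's parser: the element ranges are passed as a dict rather than parsed from the string, the double bond value is assumed to be the ring/double-bond equivalent 1 + C - H/2 + N/2 + P/2, and the charge is ignored.

```python
from itertools import product


def rdb(f):
    """Ring/double-bond equivalent; N and P counted as trivalent (assumption)."""
    return 1 + f["C"] - f["H"] / 2 + f.get("N", 0) / 2 + f.get("P", 0) / 2


def expand(ranges, dbr):
    """Yield every formula with element counts inside `ranges` and RDB inside `dbr`."""
    elements = list(ranges)
    lo, hi = dbr
    for counts in product(*(range(a, b + 1) for a, b in ranges.values())):
        formula = dict(zip(elements, counts))
        if lo <= rdb(formula) <= hi:
            yield formula


# 'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR = (2.5, 9.5)
ranges = {"C": (38, 54), "H": (30, 130), "O": (10, 10), "N": (1, 1), "P": (1, 1)}
formulas = list(expand(ranges, (2.5, 9.5)))
print(len(formulas))  # 255
```

Under these assumptions each carbon count from 38 to 54 admits 15 hydrogen counts (H from 2C - 15 to 2C - 1), giving 17 × 15 = 255 formulas. Whether LipidXplorer additionally keeps only half-integral (even-electron) values is not stated on this page.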
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable followed by <br />
an equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. This can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is calculated<br />
always between the precursor mass and the fragment, never between two<br />
fragments.<br />
<br />
====examples====<br />
Define an sc-constraint for PC-O and the PC-O head group fragment, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define an sc-constraint for PE and the PE head group neutral loss, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they cannot be used in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The variables defined above are queried against the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a single mass or a set, i.e. for some variables more than one mass is identified. This holds especially for sc-constraints. This section is the first filtering step. It returns <i>True</i> if the boolean expression is true, which is the case only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+ | MS1- | MS2+ | MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the existence of certain masses/fragment masses. The scope (level of MS) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags point to the MS level where the sum composition is searched for ('MS1+' means positive MS, while 'MS2-' means negative MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in three units: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton and<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass range of interest. <br />
# 'MINOCC' is a floating point number between 0 and 1 which states the minimum occupation threshold for this mass across all samples, i.e. the minimum fraction of samples in which the mass has to be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers masses only from m/z 700 to m/z 1000.<br />
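The options can be illustrated with small Python helpers. These are sketches under assumptions: 'ppm' is taken relative to the theoretical mass, 'da' as an absolute window, and 'res' as a window of m/z divided by the resolution, which may differ from how LipidXplorer converts a resolution value; occupation is computed as the fraction of samples with a non-zero intensity, following the occ attribute and the notes below. The function names and numbers are hypothetical.

```python
def within_tolerance(observed, theoretical, value, unit):
    """Check a mass against a TOLERANCE given in 'ppm', 'da' or 'res'."""
    if unit == "ppm":
        window = theoretical * value / 1e6
    elif unit == "da":
        window = value
    elif unit == "res":
        window = theoretical / value  # assumption: full window of m/R
    else:
        raise ValueError("unknown tolerance unit: %s" % unit)
    return abs(observed - theoretical) <= window


def in_mass_range(mz, mass_range):
    """MASSRANGE = (lo, hi): keep only masses from lo to hi."""
    lo, hi = mass_range
    return lo <= mz <= hi


def occupation(intensities):
    """Fraction of samples in which the peak was found (0 means not found)."""
    return sum(1 for i in intensities if i != 0) / len(intensities)


theo = 760.5851
print(within_tolerance(760.5890, theo, 10, "ppm"))  # True  (about 5.1 ppm)
print(within_tolerance(760.6000, theo, 10, "ppm"))  # False (about 19.6 ppm)
print(within_tolerance(760.6000, theo, 0.5, "da"))  # True
print(in_mass_range(theo, (700, 1000)))             # True
print(occupation([1.2e5, 0.0, 8.7e4, 9.9e4]))       # 0.75, passes MINOCC = 0.7
```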
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both cover several possibilities since they are sc-constraints (see the example above), and a test of whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR". Such a constraint is formulated in the optional 'SUCHTHAT' section as boolean-connected equations, inequalities and functions. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. The terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' for not equal.<br />
The values for the terms can be marked masses (given with their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed. This is done by writing the attribute after the variable name, connected with a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
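To show how the two constraints of the PE plasmalogen query combine, here is a minimal Python sketch. Peaks are modelled as plain objects with a `mass` and a per-sample `intensity` list, and an intensity comparison is taken to hold only if it holds in every sample (as described for the intensity attribute below). 'H1' is taken as the monoisotopic mass of one hydrogen atom. The masses and intensities are invented to exercise the logic.

```python
from dataclasses import dataclass
from typing import List

H1 = 1.00782503207  # monoisotopic mass of one hydrogen atom ('H1')


@dataclass
class Peak:
    mass: float
    intensity: List[float]  # one value per sample


def plasmalogen_constraints(pr, frag1, frag2, tol_da=0.5):
    """Evaluate both SUCHTHAT terms of the PE plasmalogen example."""
    # FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da
    complementary = abs((frag1.mass + frag2.mass - H1) - pr.mass) <= tol_da
    # FRAG1.intensity * 3 < FRAG2.intensity * 10, required in every sample
    ratio_ok = all(i1 * 3 < i2 * 10
                   for i1, i2 in zip(frag1.intensity, frag2.intensity))
    return complementary and ratio_ok


pr = Peak(700.50, [5.0e5, 4.2e5])
frag1 = Peak(300.25, [2.0e4, 1.8e4])
frag2 = Peak(401.26, [1.1e4, 9.0e3])
print(plasmalogen_constraints(pr, frag1, frag2))  # True
```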
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are piped to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general the <tt>REPORT</tt> <br />
consists of a list of variables where each represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column with the name <tt>MASS</tt> and the m/z values of <tt>PR</tt>'s <br />
identified species as content:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often the two fragments can be the same peak (for example, when two fatty acid scans find the same fatty acid fragment), so LipidXplorer provides a special function that does not add up the intensity of an identical fragment twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)*<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;"),)*<br />
</pre> <br />
<br />
The string format works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
the double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string mechanism is borrowed from Python: MFQL uses Python's standard <br />
string formatting operator <tt>%</tt> for it <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
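Because the mechanism is Python's `%` operator, the placeholders can be tried out directly in Python. The sketch assumes that the second MFQL string ends up as an ordinary tuple of numbers or strings; the values are made up. Note that `%d` truncates a float, such as a half-integral double bond value, instead of rounding it.

```python
fa1 = {"C": 16, "db": 0}  # hypothetical fatty acid sum compositions
fa2 = {"C": 18, "db": 1}

name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
print(name)                                 # PC [34:1]

print("%d" % 2.5)                           # 2  (%d truncates floats)
print("%.2fppm" % 1.237)                    # 1.24ppm
print("PC [%s/%s]" % ("16:0", "18:1"))      # PC [16:0/18:1]
```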
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or less, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a certain element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, to get the number of <tt>C</tt> atoms of a formula: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
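The relation between chemsc, frsc and nlsc boils down to subtracting sum compositions. Below is a minimal Python sketch with `collections.Counter`, assuming that fragment and neutral loss are exactly complementary to the precursor. The formulas correspond to the PE examples on this page (PE 36:1 [M+H]+ losing the 'C2 H8 O4 N1 P1' head group), and the helper names are hypothetical.

```python
from collections import Counter


def subtract(a, b):
    """Element-wise a - b; raises ValueError if b is not contained in a."""
    result = Counter(a)
    result.subtract(b)
    if any(n < 0 for n in result.values()):
        raise ValueError("formula is not contained in the precursor")
    return {element: n for element, n in result.items() if n > 0}


def to_string(formula):
    """Render a formula in the 'C39 H71 O4' style used on this page."""
    return " ".join("%s%d" % (el, formula[el]) for el in "CHNOP" if el in formula)


precursor = {"C": 41, "H": 79, "N": 1, "O": 8, "P": 1}   # PE 36:1 [M+H]+
neutral_loss = {"C": 2, "H": 8, "N": 1, "O": 4, "P": 1}  # peHead AS NEUTRALLOSS

frsc = subtract(precursor, neutral_loss)  # sum composition of the fragment
nlsc = subtract(precursor, frsc)          # and back to the neutral loss
print(to_string(frsc))  # C39 H71 O4
print(to_string(nlsc))  # C2 H8 N1 O4 P1
```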
====intensity====<br />
All the intensities of a mass from all the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples. This is done by writing the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). E.g. <tt>PR.intensity["*blank*"]</tt> returns just the samples with the string <tt>blank</tt> in their name, which could be all blank samples. This feature allows sample groups to be defined by naming the samples according to their group. Thus many different constraints can be stated, which increase the accuracy of the interpretation or even already interpret the result. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that the average intensity found in the blank samples is less than one percent of the average intensity of all experimental samples ("*exp*"). This simply discards every "lipid" that is obviously noise.<br />
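The list semantics of intensity and the wildcard selection of sample groups can be sketched with Python's `fnmatch` module. This is an illustration, not LipidXplorer's code: sample names and intensities are hypothetical, a comparison holds only if it holds for every selected sample, and avg is taken to be the arithmetic mean.

```python
from fnmatch import fnmatch
from statistics import mean

# Hypothetical intensities of one peak (PR), keyed by sample name.
pr_intensity = {
    "blank_01": 120.0,
    "blank_02": 95.0,
    "exp_liver_01": 41000.0,
    "exp_liver_02": 38500.0,
}


def select(intensities, pattern):
    """Emulate PR.intensity["<pattern>"] with * and ? wildcards."""
    return [v for name, v in intensities.items() if fnmatch(name, pattern)]


def all_greater(values, threshold):
    """PR.intensity > threshold holds only if it holds for every sample."""
    return all(v > threshold for v in values)


print(all_greater(list(pr_intensity.values()), 10000))    # False (blanks are low)
print(all_greater(select(pr_intensity, "*exp*"), 10000))  # True

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
noise_free = mean(select(pr_intensity, "*blank*")) < mean(select(pr_intensity, "*exp*")) / 100
print(noise_free)  # True (107.5 < 397.5)
```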
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / number of samples.<br />
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average value of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
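One plausible reading of "every intensity is calculated relative to v" is a per-sample division by the standard's intensity. The Python sketch below shows that reading only; the exact normalisation LipidXplorer applies is not documented here, and the numbers are hypothetical.

```python
def standardize(intensities, standard):
    """Divide each sample's intensity by the standard's intensity in that sample.

    Samples where the standard is missing (intensity <= 0) yield None
    instead of raising ZeroDivisionError.
    """
    return [i / s if s > 0 else None for i, s in zip(intensities, standard)]


pc_34_1 = [42000.0, 39000.0, 0.0]       # hypothetical lipid intensities per sample
standard = [21000.0, 26000.0, 20000.0]  # hypothetical internal standard
print(standardize(pc_34_1, standard))   # [2.0, 1.5, 0.0]
```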
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() sums up the intensities of several MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by placeholders during isotopic correction (see above), the following rules apply.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), these values are simply added, resulting in <math>n_i\times -1 = -n_i</math>, where <math>n_i</math> is the number of arguments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
<pre><br />
F1 + F2 -> sumIntensity(F1, F2)<br />
-1 + -1 = -2<br />
 0 + -1 = -1<br />
 1 + -1 =  1<br />
 2 + -1 =  2<br />
 2 +  0 =  2<br />
</pre><br />
<br />
This has the following consequences for interpreting the results:<br />
<br />
A) intensity = 0: in this sample none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample some of the required fragments were found in the initial MasterScan but were set to '-1' by the isotopic correction, and no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
<br />
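The placeholder rules and the interpretation cases A to D can be condensed into a short Python sketch for a single sample. It reproduces the example table above; the names <tt>sum_intensity</tt> and <tt>interpret</tt> are hypothetical, and the sketch does not model how LipidXplorer handles identical fragments.<br />
```python
def sum_intensity(*fragments: float) -> float:
    """Sum the intensities of several MS2 fragments in one sample.

    -1 marks an intensity that isotopic correction pushed to zero or below.
    If any fragment is above zero, the -1 placeholders are treated as zero;
    otherwise all values (zeros and -1 placeholders) are simply added.
    """
    if any(value > 0 for value in fragments):
        return sum(value for value in fragments if value > 0)
    return sum(fragments)

def interpret(total: float, n_fragments: int) -> str:
    """Map a sumIntensity() result to the cases A-D."""
    if total > 0:
        return "D"
    if total == 0:
        return "A"
    if total == -n_fragments:
        return "C"
    return "B"

for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    total = sum_intensity(f1, f2)
    print(f1, f2, total, interpret(total, 2))
```
Checking case C before case B matters, since a result of <math>-n_i</math> is also negative.<br />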
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is as follows: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and processes the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section)<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section)<br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section<br />
<br />
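The control flow of a run can be sketched in Python. This is a strongly simplified model under stated assumptions: the MasterScan is reduced to a list of hypothetical <tt>MSEntry</tt> objects, and each query section is represented by a plain function instead of sum-composition matching. It only shows the order of the checks, not LipidXplorer's implementation.<br />
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MSEntry:
    mass: float                                           # precursor m/z in the MS spectrum
    fragments: List[float] = field(default_factory=list)  # MS/MS fragment m/z values

@dataclass
class Query:
    name: str
    precursor_ok: Callable[[MSEntry], bool]  # definition + IDENTIFY, MS level
    fragments_ok: Callable[[MSEntry], bool]  # definition + IDENTIFY, MS/MS level
    suchthat: Callable[[MSEntry], bool]      # SUCHTHAT constraints
    report: Callable[[MSEntry], Dict]        # REPORT columns

def run(queries: List[Query], masterscan: List[MSEntry]) -> List[Dict]:
    rows = []
    for query in queries:  # queries run one after another
        for entry in sorted(masterscan, key=lambda e: e.mass):  # smallest to largest
            if (query.precursor_ok(entry) and query.fragments_ok(entry)
                    and query.suchthat(entry)):
                rows.append({"QUERY": query.name, **query.report(entry)})
    return rows

# Hypothetical MasterScan with three MS masses
masterscan = [MSEntry(786.601, [184.073]), MSEntry(700.0), MSEntry(760.585, [184.073])]
pc = Query(
    name="Phosphatidylcholine",
    precursor_ok=lambda e: 700.0 <= e.mass <= 900.0,
    fragments_ok=lambda e: any(abs(f - 184.073) <= 0.01 for f in e.fragments),
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mass},
)
print([row["MASS"] for row in run([pc], masterscan)])  # [760.585, 786.601]
```
The entry at m/z 700.0 passes the precursor check but has no MS/MS fragments, so it is not reported.<br />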
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> section emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part. When a variable has <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a fragment mass.<br />
<br />
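The difference between an emulated PIS and an emulated NLS can be shown with a small Python sketch. It assumes singly charged precursors and fragments and an absolute tolerance in Dalton; a list of several targets stands in for the many masses of an sc-constraint (multiple PIS/NLS). The function names are hypothetical.<br />
```python
def pis_hit(fragments, targets, tol_da):
    """Precursor ion scan: some fragment m/z matches one of the target m/z values."""
    return any(abs(f - t) <= tol_da for f in fragments for t in targets)

def nls_hit(precursor_mz, fragments, losses, tol_da):
    """Neutral loss scan: precursor m/z minus fragment m/z matches a neutral loss mass."""
    return any(abs((precursor_mz - f) - loss) <= tol_da
               for f in fragments for loss in losses)

# PC head group fragment 'C5 H15 O4 P1 N1' (m/z 184.0733) as a PIS target
print(pis_hit([184.0735, 500.0], [184.0733], 0.01))        # True
# PE head group 'C2 H8 O4 N1 P1' (141.0191 Da) as a neutral loss from m/z 718.5381
print(nls_hit(718.5381, [577.519], [141.0191], 0.01))     # True
```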
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. To do <br />
screening properly the masses should be highly accurate, because otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. Users should choose meaningful <br />
names so that their queries are easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acids of the lipid have to be either both even-numbered or<br />
both odd-numbered. This way, we can sort out lipids which we know should<br />
not occur in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC's head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
for PC which is found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=443LipidXplorer MFQL2011-01-21T11:13:45Z<p>Schwudke: /* MFQL identification of phosphatidylcholines (PC) */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations. <br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head group having <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for a Phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* look whether the "C5 H15 O4 P1 N1" fragment (m/z 184.07) is present in their MS/MS spectra. <br />
* If both conditions hold, we have identified a Phosphatidylcholine and can report the lipid species. <br />
<br />
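The link between the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 can be checked with a short Python sketch that adds up monoisotopic element masses and, for a cation, subtracts the electron mass. The element table below holds standard monoisotopic masses; LipidXplorer's own mass table and rounding may differ slightly.<br />
```python
# Monoisotopic masses of the most abundant isotopes (u)
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946

def mz(composition: dict, charge: int) -> float:
    """m/z of an ion with the given sum composition and charge."""
    mass = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    if charge == 0:  # neutral loss: plain neutral mass
        return mass
    return (mass - charge * ELECTRON) / abs(charge)

head_pc = {"C": 5, "H": 15, "O": 4, "N": 1, "P": 1}
print(round(mz(head_pc, +1), 4))  # 184.0733
```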
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
'''Figure:''' The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content is determined by REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
To illustrate the structure of MFQL and the meaning of the individual statements, we now explain the example script for the identification of PC lipid species step by step.<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors should contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid could not comprise fatty acid <br />
moieties with odd and even number of carbon atoms at the same time.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
along with five further columns: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the specified <br />
ions reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the head group fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6); the number of double bonds is obtained by subtracting 1.5 from the double bond value of <tt>prPC</tt>.<br />
<br />
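The arithmetic behind the <tt>NAME</tt> column can be reproduced in Python for a single precursor, here 'C42 H83 N1 O8 P1' (PC 34:1 [M+H]+). <tt>collections.Counter</tt> stands in for sum-composition subtraction. The double bond value is computed with an assumed double bond equivalent formula that counts P like N; LipidXplorer's <tt>db</tt> may be defined differently.<br />
```python
from collections import Counter

def dbe(sc: Counter) -> float:
    # Assumed double bond equivalent, counting P like N (not LipidXplorer's code)
    return sc["C"] - sc["H"] / 2 + sc["N"] / 2 + sc["P"] / 2 + 1

pr_pc = Counter({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1})   # e.g. PC 34:1 [M+H]+
head_pc = Counter({"C": 5, "H": 15, "O": 4, "N": 1, "P": 1})

# (prPC.chemsc - headPC.chemsc)[C] - 3 and prPC.chemsc[db] - 1.5
name = "PC [%d:%d]" % ((pr_pc - head_pc)["C"] - 3, dbe(pr_pc) - 1.5)
print(name)  # PC [34:1]
```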
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. This is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More details on '''REPORT''' <br />
are given in the '''REPORT''' section below.<br />
<br />
==SC-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format which is called sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids, since an <br />
sc-constraint describes a collection of chemical sum compositions. It is used for several purposes, <br />
especially for screening tasks and multiple precursor ion/neutral loss scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range for the number of double bonds (double bond equivalents). <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
<br />
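How an sc-constraint stands for a whole set of sum compositions can be sketched in Python: parse the element ranges, enumerate every combination and keep those whose double bond equivalent lies within <tt>DBR</tt>. The double bond formula (P counted like N) and the inclusive range check are assumptions; the charge and LipidXplorer's exact filtering rules are not modelled.<br />
```python
import re
from itertools import product

# Element with a range "[lo..hi]", a single bracketed count "[n]" or a plain count "n"
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\[(\d+)(?:\.\.(\d+))?\]|(\d+))")

def parse(sc: str) -> dict:
    """Map each element to its (min, max) count."""
    ranges = {}
    for element, lo, hi, fixed in TOKEN.findall(sc):
        if fixed:
            ranges[element] = (int(fixed), int(fixed))
        else:
            ranges[element] = (int(lo), int(hi or lo))
    return ranges

def dbe(comp: dict) -> float:
    # Assumed double bond equivalent, counting P like N
    return (comp.get("C", 0) - comp.get("H", 0) / 2
            + comp.get("N", 0) / 2 + comp.get("P", 0) / 2 + 1)

def expand(sc: str, dbr: tuple) -> list:
    """All sum compositions of the sc-constraint whose DBE lies within dbr."""
    ranges = parse(sc)
    elements = list(ranges)
    compositions = []
    for counts in product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        comp = dict(zip(elements, counts))
        if dbr[0] <= dbe(comp) <= dbr[1]:
            compositions.append(comp)
    return compositions

print(len(expand("C[42..42] H[80..86] N[1] O[8] P[1]", (1.5, 7.5))))  # 6
```
In this small example only 'H86' (double bond equivalent 1.0) falls outside the range.<br />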
==The 4 sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable followed by <br />
an equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. They can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is calculated<br />
always between the precursor mass and the fragment, never between two<br />
fragments.<br />
<br />
====examples====<br />
Define PC-O sc-constraints and PC-O's head group which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define PE sc-constraints and PE's head group which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE-Plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid within the <br />
current query, i.e. they are not valid in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified. This holds especially for sc-constraints. This section is the first filtering step. It returns <i>True</i> if the boolean expression is true, which is the case if all individual identifications are true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the existence of certain masses/fragment masses. The scope (level of MS) is stated after 'IN':<br />
the 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags point to the MS level in which to look for the sum composition ('MS1+' means positive MS, while 'MS2-' means negative MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. There are several possibilities for that: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass of interest. <br />
# 'MINOCC' is a floating point number between 0 and 1 which states the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass has to be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers masses only from m/z700 to m/z1000.<br />
<br />
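The three tolerance units translate into an absolute mass window in different ways. The Python sketch below converts them and also applies an optional <tt>MASSRANGE</tt>. Interpreting 'res' as resolution R = m/Δm is an assumption, and the function names are hypothetical.<br />
```python
def tolerance_da(mz: float, value: float, unit: str) -> float:
    """Convert a TOLERANCE value into an absolute window in Dalton at the given m/z."""
    unit = unit.lower()
    if unit == "ppm":
        return mz * value / 1e6
    if unit == "da":
        return value
    if unit == "res":
        return mz / value  # assumption: resolution R = m / delta m
    raise ValueError(f"unknown tolerance unit: {unit}")

def within(observed: float, theoretical: float, value: float, unit: str,
           massrange=None) -> bool:
    """True if the observed m/z matches the theoretical m/z and lies in MASSRANGE."""
    if massrange is not None and not (massrange[0] <= observed <= massrange[1]):
        return False
    return abs(observed - theoretical) <= tolerance_da(theoretical, value, unit)

print(tolerance_da(700.0, 10, "ppm"))        # about 0.007 Da
print(within(184.0740, 184.0733, 5, "ppm"))  # True
```
A ppm tolerance therefore widens with increasing m/z, whereas a Dalton tolerance stays constant.<br />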
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, it is possible to add more constraints to the query. For example, the identification of PE plasmalogen requires the marking of 'FRAG1' and 'FRAG2', which both contain several possibilities since they are sc-constraints (see example above), and a test whether those two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR". Such a constraint is formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. Terms are connected into equations by '==' and into inequalities by '<', '>', '<=', '>=' and '!=' for not equal.<br />
The values for the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Certain attributes of marked masses can also be addressed by writing the attribute after the variable name, connected with a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are piped to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> <br />
consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column with the name <tt>MASS</tt> and the m/z values of <tt>PR</tt>'s <br />
identified species as content:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often those fragments can be the same (for example, with two fatty acid scans), therefore LipidXplorer has a special function which does not add up the intensities of identical fragments:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
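The effect of not adding up identical fragments can be illustrated with a Python sketch. It assumes that two fragment variables are "the same" when they hit the same peak, approximated here by equal m/z rounded to four decimals; how LipidXplorer actually decides this is not described here, and <tt>sum_distinct</tt> is a hypothetical name.<br />
```python
def sum_distinct(*fragments):
    """fragments: (m/z, intensity) pairs; a peak hit by several fragment variables counts once."""
    by_peak = {}
    for mz, intensity in fragments:
        # Assumption: same peak = same m/z to four decimals
        by_peak.setdefault(round(mz, 4), intensity)
    return sum(by_peak.values())

# Two fatty acid scans hitting the same 18:1 fragment (m/z 281.2486)
print(sum_distinct((281.2486, 1000.0), (281.2486, 1000.0)))  # 1000.0
# Two different fatty acid fragments (16:0 and 18:1)
print(sum_distinct((255.2330, 600.0), (281.2486, 1000.0)))   # 1600.0
```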
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;))*<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;"),)*<br />
</pre> <br />
<br />
The string formatting works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the contents of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of their <br />
double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is borrowed from Python: MFQL uses Python's standard <br />
string formatting here, i.e. the format string is evaluated by Python <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
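Because the format string is handed to Python, the placeholders behave as in Python's <tt>%</tt> operator. The following minimal Python sketch reproduces the PC naming example outside LipidXplorer. The dictionaries <tt>fa1PC</tt> and <tt>fa2PC</tt> are hypothetical stand-ins for the fatty acid sum compositions, not LipidXplorer objects.

```python
# Hypothetical stand-ins for fa1PC.chemsc and fa2PC.chemsc:
# only the carbon count ("C") and the double bond count ("db") are needed.
fa1PC = {"C": 18, "db": 1}
fa2PC = {"C": 18, "db": 1}

# %d -> integer, %.nf -> float with n decimals, %s -> string;
# the values on the right fill the placeholders in order.
name = "PC [%d:%d]" % (fa1PC["C"] + fa2PC["C"], fa1PC["db"] + fa2PC["db"])
error = "%.2fppm" % 3.14159
label = "%s from %s" % (name, "sample_01")

print(name)   # PC [36:2]
print(error)  # 3.14ppm
print(label)  # PC [36:2] from sample_01
```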
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, the intensity is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three ways: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in Dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in square brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms of a formula is given by: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass in all samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples whose names contain the string <tt>blank</tt>, which could be all blank samples. This feature allows sample groups to be defined simply by naming the samples according to their group. In this way many different constraints can be stated, which increase the accuracy of the interpretation or even interpret the result directly. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be less than one percent of the average intensity of all experimental samples ("*exp*"). It simply discards every "lipid" that is obviously noise.<br />
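The semantics of list-valued intensities and wildcard sample selection can be illustrated with a short Python sketch. The sample names and intensity values are hypothetical, and Python's <tt>fnmatch</tt> is used as a stand-in for MFQL's wildcard matching (it also accepts <tt>[seq]</tt> patterns, which MFQL may not).

```python
from fnmatch import fnmatchcase

# Hypothetical intensities of one peak (PR.intensity), keyed by sample name.
intensity = {
    "blank_01": 120.0,
    "blank_02": 80.0,
    "exp_ctrl_01": 25000.0,
    "exp_ctrl_02": 15000.0,
}

def select(intensities, pattern):
    """Mimic PR.intensity["pattern"]: keep samples whose names match * and ? wildcards."""
    return [v for name, v in intensities.items() if fnmatchcase(name, pattern)]

def avg(values):
    return sum(values) / len(values)

# PR.intensity > 10000 holds only if *every* intensity in the list exceeds 10000.
all_above = all(v > 10000 for v in intensity.values())                 # False: blanks fail
exp_above = all(v > 10000 for v in select(intensity, "*exp*"))         # True

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
not_noise = avg(select(intensity, "*blank*")) < avg(select(intensity, "*exp*")) / 100
print(all_above, exp_above, not_noise)  # False True True
```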
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
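As a worked example, a peak found in three out of four samples has an occupation of 0.75. The Python sketch below assumes that a missing peak is stored as intensity 0 (as described in the Notes above); the sample names and the threshold of 0.5 are hypothetical.

```python
# Hypothetical per-sample intensities of one peak; 0 means "not found in this sample".
intensities = {"s1": 5400.0, "s2": 0.0, "s3": 6100.0, "s4": 5900.0}

def occupation(per_sample):
    """Number of samples in which the peak occurs / total number of samples."""
    found = sum(1 for v in per_sample.values() if v > 0)
    return found / len(per_sample)

occ = occupation(intensities)   # 3 / 4 = 0.75
passes_minocc = occ >= 0.5      # e.g. a minimum occupation threshold of 0.5
```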
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
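The exact normalization formula is not given here. A minimal Python sketch, assuming that "relative to v" means dividing each sample's intensity by the intensity of the standard measured in the same sample (LipidXplorer's actual calculation may differ); all values are hypothetical.

```python
# Hypothetical per-sample intensities of a lipid and of the standard v.
lipid = {"s1": 4000.0, "s2": 9000.0}
standard = {"s1": 2000.0, "s2": 3000.0}

def standardize(values, std):
    """Express each intensity relative to the standard in the same sample.

    Samples without a (non-zero) standard intensity are skipped.
    """
    return {s: values[s] / std[s] for s in values if std.get(s)}

relative = standardize(lipid, standard)  # {'s1': 2.0, 's2': 3.0}
```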
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by placeholders during isotopic correction (see the Notes above), the following rules are implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), these values are simply added, which results in <math>n_i \times (-1) = -n_i</math>, where <math>n_i</math> is the number of fragments passed to sumIntensity(). <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
<math>F_1 + F_2 \rightarrow \mathrm{sumIntensity}(F_1, F_2)</math><br />
<math>(-1) + (-1) \rightarrow -2</math><br />
<math>0 + (-1) \rightarrow -1</math><br />
<math>1 + (-1) \rightarrow 1</math><br />
<math>2 + (-1) \rightarrow 2</math><br />
<math>2 + 0 \rightarrow 2</math><br />
<br />
This has the following consequences when such results are interpreted:<br />
<br />
A) intensity = 0: none of the required fragments was present in this sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but were set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
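The rules above can be summarized in a small Python sketch for a single sample. It is an illustration, not LipidXplorer's code: it assumes that "the same fragment" means the same fragment mass, and the fragment masses used in the loop are hypothetical.

```python
PLACEHOLDER = -1  # intensity assigned by isotopic correction (see Notes)

def sum_intensity(*fragments):
    """Sketch of the sumIntensity() rules for one sample.

    Each fragment is a (mass, intensity) pair. Identical fragments
    (same mass) are counted only once.
    """
    unique = {}
    for mass, value in fragments:
        unique.setdefault(mass, value)
    values = list(unique.values())
    if any(v > 0 for v in values):
        # at least one real signal: placeholders are treated as zero
        return sum(v for v in values if v != PLACEHOLDER)
    # no signal above zero: the values (0 or -1) are simply added
    return sum(values)

# the rows of the table above, using two different (hypothetical) fragment masses
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity((281.25, f1), (255.23, f2)))
```

The loop prints -2, -1, 1, 2 and 2 in its last column, matching the table.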
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
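The second <tt>SUCHTHAT</tt> example combines a mass balance with an intensity ratio. A minimal Python sketch of the same check, assuming all peaks are singly charged m/z values and ignoring electron mass; the function name and the numbers in the usage line are hypothetical.

```python
H_MASS = 1.00782503  # monoisotopic mass of one hydrogen atom (Da)

def plasmalogen_constraint(frag1_mz, frag2_mz, pr_mz, frag1_int, frag2_int, tol_da=0.5):
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
    FRAG1.intensity * 3 < FRAG2.intensity * 10"""
    masses_match = abs(frag1_mz + frag2_mz - H_MASS - pr_mz) <= tol_da
    # equivalent to FRAG2.intensity > 0.3 * FRAG1.intensity
    intensity_ok = frag1_int * 3 < frag2_int * 10
    return masses_match and intensity_ok

print(plasmalogen_constraint(300.0, 400.0, 698.99, 1000.0, 500.0))  # True
```

Writing the ratio as <tt>FRAG1.intensity * 3 &lt; FRAG2.intensity * 10</tt> avoids a division, which also keeps the test well defined when an intensity is zero.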
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt> <br />
and <tt>SUCHTHAT</tt> sections; accepted masses are passed to the <tt>REPORT</tt> section. That is, <br />
* it loads an MS mass<br />
* it checks whether the mass fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> sections).<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> sections). <br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and if the result is <br />
positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section <br />
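The loop described above can be sketched in Python. The <tt>Entry</tt> and <tt>Query</tt> classes are hypothetical simplifications (one callable per section, one fragment list per mass); LipidXplorer's real MasterScan and query objects are more complex, and an sc-constraint may match several sum compositions per mass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Entry:
    """Hypothetical MasterScan entry: an MS mass and its MS/MS fragment masses."""
    mass: float
    fragments: List[float] = field(default_factory=list)

@dataclass
class Query:
    """Hypothetical compiled query with one callable per MFQL section."""
    name: str
    identify_ms1: Callable[[Entry], bool]   # definition + IDENTIFY on the MS mass
    identify_ms2: Callable[[Entry], bool]   # same check in MS/MS (return True if not used)
    suchthat: Callable[[Entry], bool]       # boolean constraints
    report: Callable[[Entry], Dict[str, object]]

def run(queries: List[Query], masterscan: List[Entry]) -> Dict[str, List[dict]]:
    results = {}
    for query in queries:                                        # queries run one after another
        rows = []
        for entry in sorted(masterscan, key=lambda e: e.mass):   # smallest to largest
            if not query.identify_ms1(entry):
                continue
            if not query.identify_ms2(entry):
                continue
            if query.suchthat(entry):
                rows.append(query.report(entry))                 # accepted -> REPORT
        results[query.name] = rows
    return results

pc = Query(
    name="PC",
    identify_ms1=lambda e: 700 <= e.mass <= 900,
    identify_ms2=lambda e: any(abs(f - 184.07) <= 0.5 for f in e.fragments),
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mass},
)
print(run([pc], [Entry(786.6, [184.07]), Entry(744.55, [141.0]), Entry(760.58, [184.07])]))
```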
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> part emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part. When a variable has <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
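The difference between an emulated PIS and an emulated NLS can be shown with a short Python sketch. It assumes singly charged ions and a tolerance in Dalton; the function names are hypothetical. For a multiple PIS/NLS (an sc-constraint), the same test would be repeated for every mass of the constraint.

```python
def within(observed, target, tol_da):
    return abs(observed - target) <= tol_da

def precursor_ion_scan(fragments, fragment_mz, tol_da=0.5):
    """PIS: the MS/MS spectrum contains a fragment at the given m/z."""
    return any(within(f, fragment_mz, tol_da) for f in fragments)

def neutral_loss_scan(precursor_mz, fragments, loss_mass, tol_da=0.5):
    """NLS: precursor m/z minus a fragment m/z equals the neutral loss.

    The loss is always taken between the precursor and a fragment,
    never between two fragments.
    """
    return any(within(precursor_mz - f, loss_mass, tol_da) for f in fragments)

# PC head group fragment (PIS 184) and PE head group loss (NLS 141)
print(precursor_ion_scan([184.07, 500.0], 184.07))              # True
print(neutral_loss_scan(718.54, [577.52, 184.07], 141.02))       # True
```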
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. To screen <br />
properly, the masses should be highly accurate, because otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names make a query <br />
easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. Since the PC head group and the<br />
glycerol backbone together contain 8 carbon atoms, this means that the two fatty<br />
acids must either both be even-numbered or both be odd-numbered. This way we can<br />
sort out lipids that we know should not occur in the organism under study. <br />
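The parity argument can be checked with a few lines of Python. The fatty acid chain lengths are hypothetical examples; the 8 carbons are the 5 head group and 3 glycerol backbone carbons of PC.

```python
PC_HEAD_AND_BACKBONE_C = 5 + 3  # head group carbons + glycerol backbone carbons

def is_even(n):
    return n % 2 == 0

# isEven(prPC.chemsc[C]) for a few combinations of fatty acid chain lengths
for fa1, fa2 in [(16, 18), (17, 19), (16, 17)]:
    total_c = PC_HEAD_AND_BACKBONE_C + fa1 + fa2
    print(fa1, fa2, total_c, is_even(total_c))
# 16 18 42 True
# 17 19 44 True
# 16 17 41 False
```

Only the mixed even/odd combination is rejected.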
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
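The derivation of the name from the sum composition can be sketched in Python. This sketch assumes the common double bond equivalent formula C - H/2 + (N + P)/2 + 1 (P counted like N); LipidXplorer's internal definition of <tt>[db]</tt> may differ. It uses the offsets 8 (carbons) and 1.5 (double bond equivalents) that the in-depth PC query on this page uses; the parsing helper is hypothetical.

```python
import re

def parse_sum_composition(text):
    """'C44 H85 N1 O8 P1' -> {'C': 44, 'H': 85, 'N': 1, 'O': 8, 'P': 1}"""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", text)}

def double_bond_equivalent(sc):
    # Assumption: rings plus double bonds = C - H/2 + (N + P)/2 + 1
    return sc.get("C", 0) - sc.get("H", 0) / 2 + (sc.get("N", 0) + sc.get("P", 0)) / 2 + 1

sc = parse_sum_composition("C44 H85 N1 O8 P1")  # [PC 36:2 + H]+
name = "PC [%d:%d]" % (sc["C"] - 8, double_bond_equivalent(sc) - 1.5)
print(name)  # PC [36:2]
```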
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OutputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names, followed by the <br />
name of the query, and then the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the PC-specific head group <br />
fragment found in the MS/MS spectra of <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example for a whole script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipid's name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=442LipidXplorer MFQL2011-01-21T11:10:50Z<p>Schwudke: /* A short tutorial */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations. <br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head group having <br />
the sum composition of 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum using accurately determined masses and proceeds with<br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for a Phosphatidylcholine lipid (PC) could be: <br />
* Find all precursor masses, which fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]" and <br />
* look if there is the "C5 H15 O4 P1 N1" fragment (or m/z 184.07) in its MS/MS spectrum. <br />
* if those two conditions hold, we have identified a Phosphatidylcholine and can report the lipid species <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, if a recognized lipid comprises an <br />
odd-numbered fatty acid moiety, the identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors should contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid cannot comprise one odd-numbered <br />
and one even-numbered fatty acid moiety at the same time.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define 6 columns: <tt>NAME</tt> - to report the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the specified <br />
ions reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains in the users discretion. <br />
In this example two placeholders <tt>%d</tt> of the lipids class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6).<br />
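The sum composition arithmetic behind <tt>(prPC.chemsc - headPC.chemsc)[C] - 3</tt> can be mimicked with Python's <tt>collections.Counter</tt>. This is only an analogy: Counter subtraction drops elements whose count falls to zero or below, which MFQL's own sum composition type may handle differently. The precursor composition corresponds to [PC 36:1 + H]+.

```python
from collections import Counter

prPC = Counter({"C": 44, "H": 87, "N": 1, "O": 8, "P": 1})   # [PC 36:1 + H]+
headPC = Counter({"C": 5, "H": 15, "N": 1, "O": 4, "P": 1})  # m/z 184.07 head group fragment

# (prPC.chemsc - headPC.chemsc)[C] - 3
remainder = prPC - headPC
fatty_acid_carbons = remainder["C"] - 3   # subtract the glycerol backbone carbons
print(dict(remainder))     # {'C': 39, 'H': 72, 'O': 4}
print(fatty_acid_carbons)  # 36
```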
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter. This function is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of 3-4 sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' chapter.<br />
<br />
==SC-constrains==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids; an sc-constraint is <br />
essentially a collection of chemical sum compositions. It is used for several functions, <br />
especially for screening tasks or multiple scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the range of the number of possible double bonds. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
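Conceptually, an sc-constraint stands for every sum composition inside the element ranges whose double bond equivalent lies within <tt>DBR</tt>. The Python sketch below enumerates that set explicitly. It is an illustration, not LipidXplorer's internal representation, and it assumes the double bond equivalent formula C - H/2 + (N + P)/2 + 1; the helper names are hypothetical.

```python
import itertools
import re

# Reads tokens such as 'C[38..54]' or 'O[10]'.
TOKEN = re.compile(r"([A-Z][a-z]?)\[(\d+)(?:\.\.(\d+))?\]")

def parse_sc_constraint(text):
    """Return (element, min_count, max_count) tuples in the order given."""
    return [(el, int(lo), int(hi) if hi else int(lo)) for el, lo, hi in TOKEN.findall(text)]

def dbe(counts):
    # Assumption: rings plus double bonds = C - H/2 + (N + P)/2 + 1
    return (counts.get("C", 0) - counts.get("H", 0) / 2
            + (counts.get("N", 0) + counts.get("P", 0)) / 2 + 1)

def expand(text, dbr):
    """List every sum composition of the sc-constraint that lies within the DBR."""
    ranges = parse_sc_constraint(text)
    elements = [el for el, _, _ in ranges]
    axes = [range(lo, hi + 1) for _, lo, hi in ranges]
    result = []
    for combo in itertools.product(*axes):
        counts = dict(zip(elements, combo))
        if dbr[0] <= dbe(counts) <= dbr[1]:
            result.append(" ".join(f"{el}{n}" for el, n in zip(elements, combo)))
    return result

compositions = expand("C[38..54] H[30..130] O[10] N[1] P[1]", (2.5, 9.5))
print("C42 H79 O10 N1 P1" in compositions)  # True (double bond equivalent 4.5)
```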
<br />
==The 4 sections of a MFQL query==<br />
<br />
===Part 1: Definition of sum composition, sc-constrains and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. Their syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by <br />
an equals sign and its content. This can be either a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. They can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum number of double bonds allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is calculated<br />
always between the precursor mass and the fragment, never between two<br />
fragments.<br />
<br />
====examples====<br />
Define the PC-O sc-constraint and PC-O's head group, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the PE sc-constraint and PE's head group, which is connected to the <br />
precursor mass:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE-Plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query. I.e. they are not valid in other queries of the same Run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are queried against the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified. This holds especially for sc-constraints. This section is the first filtering step. It returns <i>True</i> if the Boolean expression is true, which is the case when all individual identifications are true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+/-|MS2+/-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the existence of certain masses or fragment masses. The scope (MS level) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags indicate the MS level at which to look for the sum composition ('MS1+' means positive MS, while 'MS2-' means negative MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass should be identified. It can be given in: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton and<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple constraining the mass of interest. <br />
# 'MINOCC' is a floating point number between 0 and 1 which states the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
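The three tolerance units and <tt>MASSRANGE</tt> can be sketched in Python. The ppm and Dalton checks follow directly from the definitions; for 'res' this sketch assumes that a resolution R allows a window of m/R around the theoretical mass, which may not be exactly how LipidXplorer converts resolution into a tolerance. Function names are hypothetical.

```python
def within_tolerance(observed, theoretical, value, unit):
    """Sketch of TOLERANCE = <value><unit> for a single mass."""
    diff = abs(observed - theoretical)
    unit = unit.lower()
    if unit == "ppm":
        return diff / theoretical * 1e6 <= value
    if unit == "da":
        return diff <= value
    if unit == "res":
        # Assumption: resolution R allows a window of theoretical / R
        return diff <= theoretical / value
    raise ValueError("unknown tolerance unit: " + unit)

def in_mass_range(mz, mass_range):
    """MASSRANGE = (low, high)"""
    low, high = mass_range
    return low <= mz <= high

print(within_tolerance(184.0735, 184.0733, 5, "ppm"))  # True (about 1.1 ppm)
print(in_mass_range(650.0, (700, 1000)))              # False
```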
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the specific masses have been collected, further constraints can be added to the query. For example, the identification of PE plasmalogen requires marking 'FRAG1' and 'FRAG2', each of which covers several possibilities since they are sc-constraints (see example above), and a test of whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR". Such constraints are formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built up with the basic arithmetic operators +, -, *, /, and parentheses can be used for grouping. Terms are connected into equations by '==' and into inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in a term can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be accessed by appending the attribute name to the variable name, separated by a dot. For example, the intensity of the peak 'PR' is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to attributes, SUCHTHAT supports functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, <tt>REPORT</tt> <br />
consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both variables can match the same fragment (for example, in a scan for two fatty acids), so LipidXplorer provides a special function that does not add up the intensity of the same fragment twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;))+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;"),)*<br />
</pre> <br />
<br />
The string formatting works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer (decimal) values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains the list of values for the placeholders, in the <br />
same order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first integer placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string feature is borrowed from Python: MFQL passes these strings to <br />
Python's standard %-style string formatting <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
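Because the format strings are handed to Python, they behave exactly like Python's % operator. The plain-Python snippet below (no MFQL involved, numbers illustrative) reproduces the "PC [36:2]" example and the "%2.2fppm" pattern used in the REPORT examples, and shows that %d truncates a float rather than rounding it, which matters when a computed double bond count is not an integer.

```python
# Plain Python %-formatting, the mechanism the REPORT format strings rely on.
carbons = 18 + 18        # illustrative: summed carbons of two fatty acids
double_bonds = 1 + 1     # illustrative: summed double bonds

name = "PC [%d:%d]" % (carbons, double_bonds)
error = "%2.2fppm" % 1.234
truncated = "%d" % 1.9   # %d truncates a float instead of rounding it

print(name)       # PC [36:2]
print(error)      # 1.23ppm
print(truncated)  # 1
```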
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error is available in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
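As an illustration of <tt>errda</tt> and <tt>errppm</tt>, here is a small Python sketch. It assumes error = measured minus theoretical (the sign convention LipidXplorer uses is not stated on this page) and leaves out <tt>errres</tt>, whose exact definition is not given here. The function name is hypothetical.

```python
def mass_errors(measured: float, theoretical: float) -> dict:
    """Deviation of a measured m/z from its theoretical value.

    Assumption: error = measured - theoretical.
    """
    err_da = measured - theoretical
    err_ppm = err_da / theoretical * 1e6
    return {"errda": err_da, "errppm": err_ppm}


# Illustrative values: a peak measured 0.0039 Da above m/z 760.5851.
errors = mass_errors(760.5890, 760.5851)
print("%.4f Da, %.2f ppm" % (errors["errda"], errors["errppm"]))
```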
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, to get the number of <tt>C</tt> atoms: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
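The relationship between <tt>chemsc</tt>, <tt>frsc</tt> and <tt>nlsc</tt> is element-wise arithmetic on sum compositions: precursor = fragment + neutral loss. The Python sketch below models a fixed sum composition as a <tt>collections.Counter</tt>; it assumes every element is written with an explicit count, as in the MFQL examples, and is not LipidXplorer's own parser. Note that <tt>Counter</tt> subtraction drops elements whose count reaches zero.

```python
import re
from collections import Counter


def parse_sc(formula: str) -> Counter:
    """Parse a fixed sum composition such as 'C5 H15 O4 P1 N1' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d+)", formula):
        counts[element] += int(number)
    return counts


precursor = parse_sc("C42 H83 N1 O8 P1")  # e.g. a protonated PC ion
fragment = parse_sc("C5 H15 O4 P1 N1")    # PC head group fragment

print(precursor["C"])                     # 42, analogous to PR.chemsc[C]
neutral_loss = precursor - fragment       # analogous to nlsc of the fragment
print(dict(sorted(neutral_loss.items())))  # {'C': 37, 'H': 68, 'O': 4}
```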
====intensity====<br />
The intensities of a mass in all samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, e.g. all blank samples. This feature allows sample groups to be defined by naming the samples according to their group. Many different constraints can be stated this way, which can increase the accuracy of the interpretation or even interpret the result directly. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires that the average intensity in the blank samples be less than one percent of the average intensity of all experimental samples ("*exp*"). This discards every "lipid" that is obviously noise.<br />
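The wildcard selection and the "whole list" semantics can be sketched in Python with <tt>fnmatch.fnmatchcase</tt>, which understands the same <tt>*</tt> and <tt>?</tt> wildcards (it also accepts <tt>[...]</tt> sets, which MFQL may not). The sample names, intensities and the helper name are hypothetical.

```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical per-sample intensities of one precursor (sample name -> intensity).
intensities = {
    "blank_01": 120.0,
    "blank_02": 80.0,
    "exp_A_01": 25000.0,
    "exp_A_02": 31000.0,
}


def select(values: dict, pattern: str) -> list:
    """Intensities of the samples whose names match a '*'/'?' wildcard pattern."""
    return [v for name, v in values.items() if fnmatchcase(name, pattern)]


blank_avg = mean(select(intensities, "*blank*"))
exp_avg = mean(select(intensities, "*exp*"))

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
keep = blank_avg < exp_avg / 100

# A comparison on a whole intensity list holds only if it holds in every sample.
all_above = all(v > 10000 for v in select(intensities, "*exp*"))
print(blank_avg, exp_avg, keep, all_above)
```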
====binsize====<br />
The width of the peak's bin produced by the averaging algorithm, given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
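A minimal Python sketch of occupation and of a MINOCC-style filter follows. It assumes that a peak "occurs" in a sample when its intensity there is greater than zero and that the threshold is inclusive; both are assumptions, and the function names are hypothetical.

```python
def occupation(intensities: list) -> float:
    """Fraction of samples in which the peak occurs (intensity > 0)."""
    found = sum(1 for value in intensities if value > 0)
    return found / len(intensities)


def passes_minocc(intensities: list, minocc: float) -> bool:
    """MINOCC-style filter: keep the peak if its occupation reaches the threshold."""
    return occupation(intensities) >= minocc


peak = [1500.0, 0.0, 2200.0, 1800.0]  # hypothetical: not found in the 2nd sample
print(occupation(peak))          # 0.75
print(passes_minocc(peak, 0.5))  # True
```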
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
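What "relative to v" means numerically is not spelled out here. One plausible reading, sketched below in Python as an assumption, is a per-sample ratio of each intensity to the standard's intensity in the same sample; samples without a positive standard intensity are reported as <tt>None</tt>. The function name is hypothetical.

```python
def standardize(sample_intensities: dict, standard_intensities: dict) -> dict:
    """Express each sample's intensity relative to the standard in that sample.

    Assumption: normalization is a per-sample ratio.
    """
    result = {}
    for sample, value in sample_intensities.items():
        std = standard_intensities.get(sample, 0.0)
        result[sample] = value / std if std > 0 else None
    return result


lipid = {"s1": 5000.0, "s2": 7500.0}       # hypothetical intensities
standard = {"s1": 10000.0, "s2": 15000.0}  # hypothetical standard intensities
print(standardize(lipid, standard))  # {'s1': 0.5, 's2': 0.5}
```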
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() sums up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by placeholders during isotopic correction (see Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), these values are simply added, resulting in <math>n_i \times (-1)</math>, where <math>n_i</math> is the number of fragments passed to the function. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the following example we assume that two MS2 entries were used for the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences for interpreting the results:<br />
<br />
A) intensity = 0: in this sample, none of the required fragments was present.<br />
<br />
B) intensity < 0: in this sample, some of the required fragments were found in the initial MasterScan but set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = <math>-n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
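The placeholder rules can be written as a short Python function. This sketch covers only the -1 handling described above, not the de-duplication of identical fragments, and the function name is hypothetical; it reproduces every row of the table above.

```python
def sum_intensity(*fragment_intensities: float) -> float:
    """Combine the intensities of several MS2 fragments for one sample.

    -1 marks an intensity that isotopic correction pushed to zero or below.
    """
    if any(value > 0 for value in fragment_intensities):
        # At least one real signal: ignore the -1 placeholders (zeros add nothing).
        return sum(value for value in fragment_intensities if value > 0)
    # No positive signal: the placeholders are simply added up.
    return sum(fragment_intensities)


for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(f1, f2))
```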
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
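To show what the second SUCHTHAT example tests, here is a Python sketch with scalar values. It assumes that <tt>== ... WITH TOLERANCE</tt> means an absolute difference no larger than the tolerance, uses the monoisotopic mass of 1H for <tt>'H1'</tt>, and treats each intensity as a single number rather than a per-sample list (for a list, the comparison would have to hold in every sample). The function name is hypothetical, and the numbers in the usage lines are made up only to exercise the check.

```python
H_MASS = 1.00782503207  # monoisotopic mass of 1H in Da


def plasmalogen_check(pr_mz: float, frag1_mz: float, frag2_mz: float,
                      frag1_int: float, frag2_int: float,
                      tol_da: float = 0.5) -> bool:
    """Mirror of the SUCHTHAT example:
    FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
    FRAG1.intensity * 3 < FRAG2.intensity * 10
    """
    mass_ok = abs((frag1_mz + frag2_mz - H_MASS) - pr_mz) <= tol_da
    # FRAG2 must exceed 3/10 of FRAG1; cross-multiplied exactly as in the query.
    ratio_ok = frag1_int * 3 < frag2_int * 10
    return mass_ok and ratio_ok


print(plasmalogen_check(700.0, 350.0, 351.0, 1000.0, 400.0))  # True
print(plasmalogen_check(700.0, 350.0, 351.0, 1000.0, 200.0))  # False: ratio fails
```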
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the greatest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section);<br />
* the boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
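The loop described above can be sketched in Python. Everything here is hypothetical: a <tt>Spectrum</tt> holds only a precursor m/z and its fragment m/z values, each section is reduced to a callable, and only one MS/MS condition is checked, so this illustrates the control flow rather than LipidXplorer's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Spectrum:
    """Hypothetical MasterScan entry: precursor m/z plus its MS/MS fragment m/z values."""
    mz: float
    fragments: List[float] = field(default_factory=list)


def run_query(entries: List[Spectrum],
              fits_precursor: Callable[[float], bool],
              fits_fragment: Callable[[float], bool],
              suchthat: Callable[[Spectrum], bool],
              report: Callable[[Spectrum], dict]) -> List[dict]:
    rows = []
    for entry in sorted(entries, key=lambda e: e.mz):        # smallest to greatest MS mass
        if not fits_precursor(entry.mz):                      # definition + IDENTIFY (MS)
            continue
        if not any(fits_fragment(f) for f in entry.fragments):  # IDENTIFY (MS/MS)
            continue
        if not suchthat(entry):                               # SUCHTHAT
            continue
        rows.append(report(entry))                            # REPORT
    return rows


HEAD_PC = 184.0733  # m/z of the PC head group fragment
entries = [
    Spectrum(760.585, [184.073, 577.519]),
    Spectrum(718.538, [577.519]),   # no head group fragment
    Spectrum(650.000, [184.073]),   # outside the precursor range
]
rows = run_query(
    entries,
    fits_precursor=lambda mz: 700 <= mz <= 900,
    fits_fragment=lambda mz: abs(mz - HEAD_PC) <= 0.5,
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mz},
)
print(rows)  # [{'MASS': 760.585}]
```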
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> section emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable has <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
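The difference between a fragment (PIS) and a neutral loss (NLS) comes down to which value is compared with the target: the fragment m/z itself, or the precursor m/z minus the fragment m/z. A Python sketch, assuming singly charged ions and an absolute tolerance in Dalton; the function name and the peak values are illustrative.

```python
def find_in_msms(precursor_mz, fragment_mzs, target, tol_da, neutral_loss=False):
    """Return the MS/MS peaks that match `target` within `tol_da`.

    For a neutral loss the compared value is precursor m/z minus fragment m/z,
    i.e. the loss is always taken between precursor and fragment.
    """
    hits = []
    for frag in fragment_mzs:
        observed = precursor_mz - frag if neutral_loss else frag
        if abs(observed - target) <= tol_da:
            hits.append(frag)
    return hits


spectrum = [184.0733, 577.5190]  # illustrative MS/MS peaks of a precursor at m/z 718.5381
print(find_in_msms(718.5381, spectrum, 141.0191, 0.01, neutral_loss=True))  # [577.519]
print(find_in_msms(718.5381, spectrum, 184.0733, 0.01))                      # [184.0733]
```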
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. To do <br />
screening properly, the masses must be measured with high accuracy; otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Every query <br />
must be given a name. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names <br />
make a query easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acids of the lipid must either both be even-numbered or<br />
both be odd-numbered. In this way we can sort out lipids that we know should<br />
not occur in the organism under study. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OuputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software showing the resulting <br />
data from the query. At the top are the variable names, followed by the <br />
name of the query, and then the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
for PC, which is found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures we do not report borderline <br />
masses that cannot actually be in the sample. In the output <br />
we additionally have the abundance of the head group fragment <br />
in <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example of a complete script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=441LipidXplorer MFQL2011-01-21T11:10:00Z<p>Schwudke: /* A short tutorial */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, however <br />
accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within an MFQL query, these constraints can be bundled by Boolean operations. <br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments, molecular cations of PC species produce a specific <br />
phosphorylcholine fragment of their head group with <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species starts with the identification of probable precursors in the MS spectrum by their accurately determined masses and proceeds by <br />
identifying the phosphorylcholine head group fragment in the MS/MS spectra.<br />
<br />
A query for phosphatidylcholine (PC) lipids could be: <br />
* find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether there is a "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra; <br />
* if both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
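The m/z 184.07 quoted for the head group follows directly from the sum composition and the charge. The Python sketch below sums monoisotopic element masses and corrects for the electrons lost or gained; it assumes the formula already describes the complete ion, and the function name is hypothetical.

```python
import re

# Monoisotopic masses (Da) of the elements used in the examples.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946


def mz_of(formula: str, charge: int) -> float:
    """m/z of an ion with the given sum composition and charge (0 = neutral mass)."""
    mass = sum(MONOISOTOPIC[el] * int(n)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula))
    if charge == 0:
        return mass
    # A positive ion has lost `charge` electrons, a negative one has gained them.
    return (mass - charge * ELECTRON) / abs(charge)


print(round(mz_of("C5 H15 O4 N1 P1", +1), 4))  # 184.0733
```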
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>), which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched for in the MS/MS spectra of <tt>prPC</tt>.<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid cannot comprise one fatty acid <br />
moiety with an odd and one with an even number of carbon atoms.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the content of the columns is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name; <br />
<tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
reported for each individual acquisition; and <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or use certain <br />
functions, such as text formatting, on these attributes. The text <br />
format implies two strings separated by <tt>%</tt> , where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string such that <br />
the actual annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for carbons in the glycerol backbone (Figures 5 and 6).<br />
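The NAME arithmetic can be checked by hand. The Python sketch below applies it to C42 H83 N1 O8 P1, the sum composition of a protonated PC 34:1. The double bond equivalent formula is an assumption (the standard rings-plus-double-bonds count with N and P treated as trivalent); it is chosen because it reproduces the 1.5 offset used in the query, but LipidXplorer's internal definition of <tt>[db]</tt> may differ. The helper names are hypothetical.

```python
import re


def parse_sc(formula: str) -> dict:
    """Element counts of a fixed sum composition such as 'C5 H15 O4 N1 P1'."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)}


def double_bond_equivalent(sc: dict) -> float:
    # Assumption: rings-plus-double-bonds formula, N and P counted as trivalent.
    return sc.get("C", 0) - sc.get("H", 0) / 2 + (sc.get("N", 0) + sc.get("P", 0)) / 2 + 1


pr = parse_sc("C42 H83 N1 O8 P1")   # prPC candidate (protonated PC 34:1)
head = parse_sc("C5 H15 O4 N1 P1")  # headPC

carbons = (pr["C"] - head["C"]) - 3       # (prPC.chemsc - headPC.chemsc)[C] - 3
db = double_bond_equivalent(pr) - 1.5     # prPC.chemsc[db] - 1.5
name = "PC [%d:%d]" % (carbons, db)
print(name)  # PC [34:1]
```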
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; this is used for writing comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them to user defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions LipidXplorer uses a <br />
special format which is called sum composition constraint (sc-constraint). <br />
With sc-constraints it is possible to specify a class of lipids. It is like <br />
a collection of chemical sum compositions. It is used for several functions, <br />
especially for screening tasks or multiple scans. Its format is <br />
self-explanatory. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
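Conceptually, an sc-constraint expands into every sum composition within the element ranges whose double bond equivalent lies inside DBR. The Python sketch below enumerates them; it assumes the standard rings-plus-double-bonds formula with N and P counted as trivalent, ignores <tt>CHG</tt>, and does not check whether each composition is a chemically valid ion. The function name is hypothetical.

```python
from itertools import product


def expand_sc_constraint(ranges: dict, dbr: tuple):
    """Yield every sum composition allowed by inclusive element ranges and a DBR window."""
    elements = list(ranges)
    for counts in product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        sc = dict(zip(elements, counts))
        # Assumption: rings-plus-double-bonds formula, N and P counted as trivalent.
        dbe = sc.get("C", 0) - sc.get("H", 0) / 2 + (sc.get("N", 0) + sc.get("P", 0)) / 2 + 1
        if dbr[0] <= dbe <= dbr[1]:
            yield sc


# The example constraint 'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5)
example = {"C": (38, 54), "H": (30, 130), "O": (10, 10), "N": (1, 1), "P": (1, 1)}
print(sum(1 for _ in expand_sc_constraint(example, (2.5, 9.5))))
```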
<br />
==The 4 sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by <br />
an equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. The content can be followed by <br />
<tt>WITH</tt> and certain options. The options can be:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and maximum number of double bond equivalents allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
<br />
====Examples====<br />
Define the sc-constraint for PC-O precursors and the PC head group fragment <br />
expected in their MS/MS spectra:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define the sc-constraint for PE precursors and the PE head group, which is lost <br />
from the precursor as a neutral loss:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogen:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they are not visible to other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The variables defined before are searched for in the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The headline 'IDENTIFY' is followed by identifications which are connected by 'AND'. The result of an identification can be a single mass or a set of masses, i.e. for some variables more than one mass is identified. This holds especially for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks whether certain masses or fragment masses exist. The scope (level of MS) is stated after 'IN':<br />
The 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags state the MS level in which the sum composition is searched ('MS1+' means positive mode MS, while 'MS2-' means negative mode MS/MS). Options can be specified after the optional keyword 'WITH':<br />
<br />
# 'TOLERANCE' states the tolerance with which a mass is identified. It can be given in three units: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple restricting the m/z range of interest. <br />
# 'MINOCC' is a float number between 0 and 1 which states the minimum occupation threshold of this mass over all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
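<br />
Below is a minimal sketch of how the three tolerance units could translate into an m/z search window. The 'ppm' and 'da' cases follow directly from their definitions. This page does not document how LipidXplorer converts a 'res' value, so the half-peak-width rule below is an assumption, as is the inclusive MASSRANGE check.<br />
```python
def tolerance_window(mz: float, value: float, unit: str) -> tuple[float, float]:
    """Return the (low, high) m/z window for a TOLERANCE setting (illustrative sketch)."""
    if unit == "ppm":
        delta = mz * value / 1e6          # e.g. 5ppm at m/z 800 gives +/-0.004
    elif unit == "da":
        delta = value                     # absolute window in Dalton
    elif unit == "res":
        delta = mz / value / 2            # ASSUMPTION: peak width = m/z / resolution, split evenly
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return mz - delta, mz + delta

def in_massrange(mz: float, massrange: tuple[float, float]) -> bool:
    """MASSRANGE = (700, 1000): keep only m/z values inside the interval (inclusive by assumption)."""
    return massrange[0] <= mz <= massrange[1]

low, high = tolerance_window(800.0, 5, "ppm")
print(round(low, 4), round(high, 4))
```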
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead IN MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead IN MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both match several sum compositions because they are sc-constraints (see example above). It also requires a test of whether the two fragments add up to the precursor, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR" holds. Such a constraint is formulated in the optional 'SUCHTHAT' section as equations, unequations and functions connected by boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
The terms can be built up with the basic arithmetic operators +, -, *, /. Parentheses can also be used. The terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. The intensity of the peak 'PR', for example, is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to the attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general the <tt>REPORT</tt> <br />
consists of a list of variables, each of which represents a column. The content <br />
of the variable is the content of the column. For example, the following code <br />
generates a column with the name <tt>MASS</tt> and the m/z values of <tt>PR</tt>'s <br />
identified species as content:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both fragments are the same (for example in two fatty acid scans of a lipid with two identical fatty acids). Therefore LipidXplorer has a special function which does not add up the intensity of the same fragment twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;);)+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this feature is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "(&lt;list of variables for the format string&gt;)";)*<br />
</pre> <br />
<br />
The string formatting works as follows: two strings are given, separated <br />
by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for decimal (integer) values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains a list with the content of the placeholders in <br />
their order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first decimal value is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second decimal value the sum of <br />
the double bonds. The output could be for example <tt>"PC [36:2]"</tt>.<br />
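<br />
MFQL relies on Python's %-style formatting, so the placeholders can be tried out directly in Python. The values below are hypothetical stand-ins for the fatty acid attributes. The sketch also assumes that LipidXplorer has already evaluated the second string into numbers.<br />
```python
# Hypothetical values standing in for fa1PC.chemsc[C], fa2PC.chemsc[C], fa1PC.chemsc[db], fa2PC.chemsc[db]
fa1_c, fa2_c = 18, 18
fa1_db, fa2_db = 1, 1.0   # a float also works with %d; it is truncated to an integer

lipid_name = "PC [%d:%d]" % (fa1_c + fa2_c, fa1_db + fa2_db)
error = "%2.2fppm" % (1.234,)   # two decimals, as in the ERROR columns of the examples
print(lipid_name, error)        # PC [36:2] 1.23ppm
```
Note that <tt>%d</tt> truncates floating point values, so a double bond value such as 2.0 is printed as an integer.<br />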
<br />
The format string syntax is borrowed from Python: MFQL uses Python's standard <br />
%-style string formatting, i.e. the format string is processed by Python's <tt>%</tt> operator <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three units: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
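<br />
As a sketch, the ppm and Dalton errors can be computed as shown below. The sign convention (measured minus theoretical) is an assumption. The sketch leaves out <tt>errres</tt> because its exact definition is not given here.<br />
```python
def mass_errors(measured: float, theoretical: float) -> tuple[float, float]:
    """Return (errda, errppm); the sign convention measured - theoretical is an assumption."""
    err_da = measured - theoretical
    err_ppm = err_da / theoretical * 1e6
    return err_da, err_ppm

# Hypothetical peak measured at m/z 1000.001 for a theoretical m/z of 1000.0
err_da, err_ppm = mass_errors(1000.001, 1000.0)
print(round(err_da, 6), round(err_ppm, 3))
```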
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. To get the number of <tt>C</tt> atoms of a formula, for example: <pre>PR.chemsc[C]</pre> The double bond equivalent is addressed in the same way with <tt>.chemsc[db]</tt>.<br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor that leads to this fragment.<br />
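<br />
The sum composition arithmetic behind <tt>chemsc</tt>, <tt>frsc</tt> and <tt>nlsc</tt> can be sketched with Python's <tt>collections.Counter</tt>. The parser and function names are hypothetical. The example precursor 'C42 H83 N1 O8 P1' corresponds to a protonated PC 34:1.<br />
```python
from collections import Counter

def sc(formula: str) -> Counter:
    """Parse a plain sum composition such as 'C5 H15 O4 P1 N1' into element counts (hypothetical helper)."""
    comp = Counter()
    for token in formula.split():
        element = token.rstrip("0123456789")
        digits = token[len(element):]
        comp[element] += int(digits) if digits else 1
    return comp

def nlsc(precursor: Counter, fragment: Counter) -> Counter:
    """Neutral loss = precursor minus fragment; unary + drops elements whose count is zero (or negative)."""
    loss = Counter(precursor)
    loss.subtract(fragment)
    return +loss

pr = sc("C42 H83 N1 O8 P1")   # e.g. a protonated PC 34:1 precursor
head = sc("C5 H15 O4 P1 N1")  # PC head group fragment
print(pr["C"], dict(nlsc(pr, head)))
```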
====intensity====<br />
All intensities of a mass in all samples. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample. If used in an equation or unequation, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is also possible to address only a subset of the samples. To do this, write the name of the sample group as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). E.g. <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, for example all blank samples. This feature allows defining sample groups by naming the samples according to their group. Thus many different constraints can be stated, which increase the accuracy of the interpretation or even interpret the result directly. E.g.<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement asserts that one percent of the average intensity of all experimental samples ("*exp*") is greater than the average intensity found in the blank samples. This simply removes every "lipid" that is obviously noise.<br />
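<br />
The following sketch shows the wildcard sample selection and the "all samples" semantics of comparisons. Python's <tt>fnmatch.fnmatchcase</tt> stands in for LipidXplorer's wildcard matching, since this page does not say whether LipidXplorer matches case-sensitively. The sample names and intensities are hypothetical.<br />
```python
from fnmatch import fnmatchcase
from statistics import mean

def select(intensities: dict[str, float], pattern: str) -> list[float]:
    """Mimic PR.intensity["*blank*"]: keep the samples whose name matches the wildcard pattern."""
    return [value for name, value in intensities.items() if fnmatchcase(name, pattern)]

# Hypothetical sample names and intensities of one precursor 'PR'
pr_intensity = {"blank_01": 120.0, "blank_02": 80.0, "exp_01": 25000.0, "exp_02": 35000.0}

# PR.intensity > 10000 is only true if it holds for every sample
all_above = all(value > 10000 for value in pr_intensity.values())

# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
not_noise = mean(select(pr_intensity, "*blank*")) < mean(select(pr_intensity, "*exp*")) / 100
print(all_above, not_noise)
```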
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: the number of samples in which the peak occurs divided by the total number of samples.<br />
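<br />
Here is a sketch of the occupation and of a <tt>MINOCC</tt> filter. Two choices in it are assumptions: only intensities greater than zero count as occurrences, and the threshold comparison uses <tt>&gt;=</tt>.<br />
```python
def occupation(intensities: list[float]) -> float:
    """occ = samples in which the peak occurs / all samples; a missing peak has intensity 0."""
    found = sum(1 for value in intensities if value > 0)
    return found / len(intensities)

def passes_minocc(intensities: list[float], minocc: float) -> bool:
    """MINOCC filter; using >= (rather than >) is an assumption."""
    return occupation(intensities) >= minocc

peak = [0.0, 5400.0, 7100.0, 0.0]   # hypothetical intensities in four samples
print(occupation(peak), passes_minocc(peak, 0.5), passes_minocc(peak, 0.6))
```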
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average value of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up intensities of different MS2 entries where multiple peaks are required for identification and quantification. <br />
For fragments with isotope-corrected placeholders (see above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i \times (-1) = -n_i</math>, where <math>n_i</math> is the number of the attributes. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences for interpreting such results:<br />
<br />
A) intensity = 0: none of the required fragments was present in this sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but were set to '-1'; no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
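<br />
The placeholder rules can be condensed into a small per-sample Python function and checked against the example values above. This sketch only models the -1 placeholders. It does not model the de-duplication of identical fragments described for the <tt>REPORT</tt> section.<br />
```python
def sum_intensity(*fragments: float) -> float:
    """Per-sample sketch of the sumIntensity() rules for -1 placeholders.

    If at least one fragment intensity is > 0, placeholders (and zeros) are ignored;
    otherwise all values, including the -1 placeholders, are simply added.
    """
    if any(value > 0 for value in fragments):
        return sum(value for value in fragments if value > 0)
    return sum(fragments)

# The example values: (F1, F2) -> sumIntensity(F1, F2)
for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(f1, f2))
```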
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
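<br />
The two constraints of the second example can be sketched in Python as plain mass and intensity checks. The masses are hypothetical. The sketch also assumes that two intensity lists are compared sample by sample.<br />
```python
H1 = 1.00782503207  # monoisotopic mass of one hydrogen atom ('H1')

def mass_constraint(frag1: float, frag2: float, pr: float, tol_da: float = 0.5) -> bool:
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da"""
    return abs(frag1 + frag2 - H1 - pr) <= tol_da

def intensity_constraint(frag1_int: list[float], frag2_int: list[float]) -> bool:
    """FRAG1.intensity * 3 < FRAG2.intensity * 10 in every sample, i.e. FRAG2 > 0.3 * FRAG1."""
    return all(a * 3 < b * 10 for a, b in zip(frag1_int, frag2_int))

print(mass_constraint(400.0, 303.0, 702.0), intensity_constraint([100.0, 200.0], [40.0, 70.0]))
```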
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the greatest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections. That is, <br />
* it loads an MS mass,<br />
* it checks whether the mass fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section),<br />
* it looks into the MS/MS spectrum of that mass (if provided) and does the same for the fragments (definition and <tt>IDENTIFY</tt> section),<br />
* it checks the boolean constraints (<tt>SUCHTHAT</tt> section) and, if the result is positive, accepts the MS mass and sends it to the <tt>REPORT</tt> section.<br />
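<br />
The loop described above can be sketched in Python as follows. The <tt>Entry</tt> structure, the predicate arguments and all masses are hypothetical. In LipidXplorer the checks come from the definitions and the <tt>IDENTIFY</tt> and <tt>SUCHTHAT</tt> sections of the query.<br />
```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Entry:
    """One MasterScan entry: an MS precursor m/z plus its MS/MS fragment m/z values (hypothetical)."""
    mz: float
    fragments: list[float] = field(default_factory=list)

def run_query(masterscan: list[Entry],
              precursor_hit: Callable[[float], bool],
              fragment_hit: Optional[Callable[[float], bool]],
              suchthat: Callable[[Entry], bool],
              report: Callable[[Entry], object]) -> list[object]:
    results = []
    for entry in sorted(masterscan, key=lambda e: e.mz):   # smallest to greatest MS mass
        if not precursor_hit(entry.mz):                     # definition + IDENTIFY on MS
            continue
        if fragment_hit is not None and not any(fragment_hit(f) for f in entry.fragments):
            continue                                        # same check on the MS/MS spectrum
        if suchthat(entry):                                 # SUCHTHAT constraints
            results.append(report(entry))                   # accepted mass goes to REPORT
    return results

scan = [Entry(760.585, [184.073, 577.519]), Entry(700.0, [500.0]), Entry(758.570, [184.074])]
hits = run_query(scan,
                 precursor_hit=lambda mz: 700 < mz < 800,
                 fragment_hit=lambda f: abs(f - 184.0733) < 0.01,
                 suchthat=lambda e: True,
                 report=lambda e: e.mz)
print(hits)
```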
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> part emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part. When a variable gets <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
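<br />
Here is a sketch of how one MS/MS spectrum can be tested in precursor ion scan mode or neutral loss scan mode, and how a multiple scan covers several targets. The function names and masses are hypothetical. The value 141.0191 is the monoisotopic mass of 'C2 H8 O4 N1 P1', the PE head group loss.<br />
```python
def scan_hit(precursor_mz: float, fragment_mzs: list[float], target: float,
             neutral_loss: bool, tol_da: float) -> bool:
    """Emulate a PIS (fragment m/z == target) or NLS (precursor - fragment == target) on one MS/MS spectrum."""
    for fragment_mz in fragment_mzs:
        observed = precursor_mz - fragment_mz if neutral_loss else fragment_mz
        if abs(observed - target) <= tol_da:
            return True
    return False

def multiple_scan(precursor_mz: float, fragment_mzs: list[float], targets: list[float],
                  neutral_loss: bool, tol_da: float) -> list[float]:
    """An sc-constraint yields several targets, i.e. a multiple PIS/NLS: return the matching targets."""
    return [t for t in targets if scan_hit(precursor_mz, fragment_mzs, t, neutral_loss, tol_da)]

spectrum = [184.0733, 603.5350]   # hypothetical fragments of a precursor at m/z 744.5541
print(scan_hit(744.5541, spectrum, 184.0733, neutral_loss=False, tol_da=0.01))  # PIS
print(scan_hit(744.5541, spectrum, 141.0191, neutral_loss=True, tol_da=0.01))   # NLS
```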
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. To do <br />
screening properly the masses should be measured with high accuracy, because otherwise<br />
the rate of false identifications is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. Users should choose meaningful <br />
names to keep their queries understandable.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectra.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. Since the head group and glycerol backbone of PC contribute 8 carbons, the two fatty<br />
acids of the lipid must either both be even-numbered or<br />
both be odd-numbered. This way we can sort out lipids which we know should<br />
not be present in the organism we examine. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) by subtracting the carbons of the PC head group and glycerol backbone and the double bond equivalents that do not belong to the fatty acid chains. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OutputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names followed by the <br />
name of the query, then comes the content. Note that for 'INTENS' <br />
the file name from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
fragment of PC, found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses which cannot actually be in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
An example of a complete script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE-Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be bigger than 3/10th of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=440LipidXplorer MFQL2011-01-21T11:06:13Z<p>Schwudke: /* A short tutorial */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. The most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as, PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species could not solely rely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within a MFQL query, these constraints can be bundled by Boolean operations. <br />
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head group having <br />
the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07. The <br />
identification of PC species proceeds by identifying this fragment ion <br />
in MS/MS spectra together with the accurately determined masses of intact <br />
precursors in the MS spectrum.<br />
<br />
A query for phosphatidylcholine (PC) lipids could be: <br />
* find all precursor masses which fit the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", <br />
* check whether there is a "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra, and <br />
* if both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
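<br />
The m/z 184.07 of the head group fragment can be recomputed from its sum composition. This sketch uses standard monoisotopic masses and subtracts one electron mass for the positive charge. It is an illustration, not LipidXplorer's mass calculation.<br />
```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946

def ion_mz(comp: dict[str, int], charge: int) -> float:
    """m/z of an ion with the given sum composition; each positive charge removes one electron."""
    neutral = sum(MONO[element] * count for element, count in comp.items())
    return (neutral - charge * ELECTRON) / abs(charge)

head_pc = {"C": 5, "H": 15, "O": 4, "N": 1, "P": 1}
print(round(ion_mz(head_pc, 1), 2))   # 184.07
```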
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented at the lower panel. The precursor ion was isolated within <br />
1 Da mass range and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content is determined by REPORT section of the <br />
above MFQL, see also text for details. <br />
<br />
<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment and therefore: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks will originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>) which are expected <br />
to produce <tt>headPC</tt> fragment in MS/MS spectra. We impose the sc-constraint on precursor <br />
masses: besides sum composition requirements, it requests that precursors are singly <br />
charged and their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) is within a certain (here from 1.5 to 7.5) range: <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requests that <tt>headPC</tt> <br />
should only be searched in MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints formulated in the next SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids having an odd <br />
number of carbon atoms. Therefore, it is likely that if a recognized lipid <br />
comprises an odd-numbered fatty acid moiety this identification is false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
In this case the function <tt>isEven</tt> requests that candidate PC <br />
precursors should contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively, this implies that a lipid could not comprise fatty acid <br />
moieties with odd and even number of carbon atoms at the same time.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections LipidXplorer will <br />
recognize spectra pertinent to PC species. The last section REPORT <br />
defines how these findings will be reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting all additional <br />
information pertinent to the analysis, such as masses, mass differences <br />
(errors) etc. LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - to report the species name; <br />
along with five peak attributes: <tt>MASS</tt> - species mass; <br />
<tt>CHEMSC</tt> - chemical sum composition; <tt>ERROR</tt> - difference <br />
to the calculated mass; <tt>INTENS</tt> - intensities of the precursor <br />
ions reported for each individual acquisition; <tt>FRAGINTENS</tt> - intensities of the <tt>headPC</tt> fragment. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to use mathematical terms or certain <br />
functions, such as text formatting, on these attributes. A format <br />
expression consists of two strings separated by <tt>%</tt>: the <br />
first string contains placeholders and the second their <br />
content. This formatting is used in the NAME string so that <br />
the annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> in the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from that of the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone; the number of double bonds is obtained by subtracting 1.5 from the double bond value of <tt>prPC</tt> (Figures 5 and 6).<br />
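<br />
Because MFQL borrows Python's % string formatting, the NAME expression can be reproduced in plain Python. In this sketch sum compositions are modelled as dictionaries, the PC 36:2 precursor is a hypothetical input, and its double bond value of 3.5 is an assumption chosen to match the 1.5 offset used in the query; none of the helper names are LipidXplorer API.<br />
```python
def subtract(sc_a, sc_b):
    """Element-wise difference a - b of two sum compositions {element: count}."""
    return {el: sc_a.get(el, 0) - sc_b.get(el, 0) for el in set(sc_a) | set(sc_b)}


# Hypothetical PC 36:2 precursor [M+H]+ (C44 H85 N1 O8 P1) and the PC head group.
pr_pc = {"C": 44, "H": 85, "N": 1, "O": 8, "P": 1}
pr_pc_db = 3.5   # assumed double bond value of this precursor ion
head_pc = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}

carbons = subtract(pr_pc, head_pc)["C"] - 3   # remove the 3 glycerol carbons
double_bonds = pr_pc_db - 1.5                 # same offset as in the query
name = "PC [%d:%d]" % (carbons, double_bonds)
print(name)  # PC [36:2]
```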
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; use it for comments.<br />
# Every statement has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt> (i.e. its last statement ends with <tt>;;</tt>)<br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output.<br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. More on '''REPORT''' <br />
can be found in the '''REPORT''' section below.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint specifies a class of lipids as a collection of chemical sum <br />
compositions, given by ranges of element counts. It is used for several purposes, <br />
especially for screening tasks and multiple precursor ion or neutral loss scans. <br />
Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'double bond range' and specifies the allowed range (minimum, maximum) of double bond equivalents. <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
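<br />
To illustrate what an sc-constraint stands for, the sketch below expands a small one into the sum compositions it admits. The double bond equivalent formula (N and P counted as trivalent) and the rule that charged, even-electron ions have half-integer values are assumptions made for this example; with them, a diacyl PC [M+H]+ ion without fatty acid double bonds gets the value 1.5 used elsewhere on this page. LipidXplorer's own implementation may differ.<br />
```python
from itertools import product


def double_bond_equivalent(c, h, n=0, p=0):
    """Double bond equivalent, assuming N and P are trivalent (O does not contribute)."""
    return c - h / 2 + n / 2 + p / 2 + 1


def expand_sc_constraint(c_range, h_range, n, p, dbr, charge):
    """List the (C, H, DBE) combinations admitted by a simple sc-constraint."""
    low, high = dbr
    # Assumption: charged even-electron ions have half-integer DBE values,
    # neutral species (CHG = 0, i.e. neutral losses) have integer ones.
    fraction = 0.5 if charge != 0 else 0.0
    hits = []
    for c, h in product(range(c_range[0], c_range[1] + 1),
                        range(h_range[0], h_range[1] + 1)):
        dbe = double_bond_equivalent(c, h, n, p)
        if low <= dbe <= high and dbe % 1 == fraction:
            hits.append((c, h, dbe))
    return hits


# 'C[40..40] H[70..90] N[1] O[8] P[1]' WITH DBR = (1.5, 7.5), CHG = 1
for hit in expand_sc_constraint((40, 40), (70, 90), n=1, p=1, dbr=(1.5, 7.5), charge=1):
    print(hit)
```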
<br />
==The four sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = <name of the query></pre><br />
to give the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> come the name of the variable, an <br />
equals sign and the content. The content can be a chemical sum composition, <br />
an sc-constraint, a mass or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. The definition can be followed by <br />
<tt>WITH</tt> and one or more options. The options are:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint: a 2-tuple with the minimum and maximum double bond equivalents allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
If the fragment should be a neutral loss, this can be stated by setting <br />
the charge to zero with <tt>CHG = 0</tt> or by writing <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
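<br />
The relation between a precursor, a neutral loss and the resulting fragment can be sketched as follows: for a singly charged precursor, the fragment m/z is the precursor m/z minus the neutral mass of the loss. The monoisotopic element masses are standard values; the parser, the function names and the precursor m/z 718.5381 (used here as a hypothetical PE [M+H]+ ion) are assumptions for illustration only.<br />
```python
import re

# Monoisotopic masses (u) of the elements used on this page.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}


def sum_composition_mass(sc):
    """Neutral monoisotopic mass of a sum composition such as 'C2 H8 O4 N1 P1'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", sc):
        mass += MONOISOTOPIC[element] * int(count or 1)
    return mass


def fragment_after_loss(precursor_mz, loss_sc):
    """m/z of the fragment a singly charged precursor leaves after a neutral loss."""
    return precursor_mz - sum_composition_mass(loss_sc)


loss = sum_composition_mass("C2 H8 O4 N1 P1")   # PE head group loss, about 141.019
print(round(loss, 4))
print(round(fragment_after_loss(718.5381, "C2 H8 O4 N1 P1"), 4))
```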
<br />
====Examples====<br />
Define an sc-constraint for PC-O precursors and the head group fragment expected in their MS/MS spectra:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define an sc-constraint for PE precursors and the neutral loss of the PE head group from the precursor:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
Any number of variables can be defined, but they are only valid within the <br />
current query, i.e. they are not visible to other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are searched for in the experiment database (the MasterScan). The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword <tt>IDENTIFY</tt> is followed by identifications connected by <tt>AND</tt>. The result of an identification can be a single mass or a set of masses, i.e. more than one mass can be identified for a variable; this is typical for sc-constraints. This section is the first filtering step: it returns <i>True</i> only if every individual identification in the expression is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+/-|MS2+/-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of the specified precursor or fragment masses. The scope (MS level and polarity) is stated after <tt>IN</tt>:<br />
the tags 'MS1+', 'MS1-', 'MS2+' and 'MS2-' specify the spectra in which to look for the sum composition ('MS1+' means positive mode MS, 'MS2-' means negative mode MS/MS). Options can be specified after the optional <tt>WITH</tt>:<br />
<br />
# 'TOLERANCE' states the tolerance within which a mass is identified. It can be given in three units: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple (minimum, maximum) restricting the m/z range of interest. <br />
# 'MINOCC' is a floating point number between 0 and 1 giving the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
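<br />
A minimal sketch of how the TOLERANCE units and MASSRANGE could be applied when matching a measured m/z to a theoretical one. Interpreting 'res' as an allowed deviation of m/R is an assumption, as are all function names and the example m/z values; this is not LipidXplorer's matching code.<br />
```python
def within_tolerance(measured, theoretical, value, unit):
    """Return True if |measured - theoretical| lies inside the given tolerance."""
    deviation = abs(measured - theoretical)
    if unit == "ppm":
        return deviation / theoretical * 1e6 <= value
    if unit == "da":
        return deviation <= value
    if unit == "res":
        # Assumption: a resolution R allows a deviation of theoretical / R.
        return deviation <= theoretical / value
    raise ValueError("unknown tolerance unit: %s" % unit)


def in_mass_range(mz, mass_range):
    """MASSRANGE = (low, high): keep only masses inside the interval."""
    low, high = mass_range
    return low <= mz <= high


theoretical = 786.6007   # hypothetical PC precursor m/z
measured = 786.6040
print(within_tolerance(measured, theoretical, 5, "ppm"))    # deviation of about 4.2 ppm
print(within_tolerance(measured, theoretical, 0.5, "da"))
print(in_mass_range(measured, (700, 1000)))
```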
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine <br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the collection of specific masses, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which can each match several masses because they are sc-constraints (see the example above), and a test whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 - 'H1' == PR". Such constraints are formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by Boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built from the basic arithmetic operators +, -, *, / and parentheses. Terms are connected into equations by '==' and into inequalities by '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable names), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. For example, the intensity of the peak 'PR' is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to peak attributes, SUCHTHAT supports functions. A list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general, the <tt>REPORT</tt> section <br />
consists of a list of variables, each representing a column; the content <br />
of a variable is the content of its column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of the species <br />
identified by <tt>PR</tt>:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often these fragments are the same (for example, in two fatty acid scans of a species with two identical fatty acids). Therefore LipidXplorer provides a special function that does not add up the intensities of identical fragments twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
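<br />
The difference between plain addition and sumIntensity can be sketched as below. Identifying "the same fragment" by its m/z, the (m/z, intensity) data layout and the function names are assumptions made for this sketch, not LipidXplorer internals.<br />
```python
def naive_sum(*fragments):
    """frag1.intensity + frag2.intensity: every argument is added."""
    return sum(intensity for _, intensity in fragments)


def sum_distinct(*fragments):
    """Add each distinct fragment (keyed by m/z) only once, in the spirit of sumIntensity."""
    seen = {}
    for mz, intensity in fragments:
        seen.setdefault(mz, intensity)
    return sum(seen.values())


# Two identical fatty acids: both fatty acid scans hit the same peak.
fa1 = (281.2486, 5000.0)   # hypothetical (m/z, intensity) of a fatty acid fragment
fa2 = (281.2486, 5000.0)
print(naive_sum(fa1, fa2))     # 10000.0, the peak is counted twice
print(sum_distinct(fa1, fa2))  # 5000.0
```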
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;))+<br />
</pre><br />
<br />
The content of a variable can be any attribute and/or term allowed in the <br />
<tt>SUCHTHAT</tt> section. In addition, the <tt>REPORT</tt> section offers a <br />
feature for generating lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;",)*<br />
</pre> <br />
<br />
String formatting works as follows: two strings are given, <br />
separated by <tt>%</tt>. The first string defines the output <br />
format, i.e. a string with placeholders. Placeholders can be <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for strings. The second <br />
string contains the list of values for the placeholders, in <br />
their order of appearance. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string syntax is borrowed from Python: MFQL uses Python's <br />
%-style string formatting for it <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
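<br />
Since the format string is handled by Python's % operator, the placeholder types can be tried directly in Python. The values below are made up; note that <tt>%d</tt> truncates a floating point value instead of rounding it, which matters for terms such as <tt>prPC.chemsc[db] - 1.5</tt>.<br />
```python
carbons, double_bonds, error_ppm = 36, 2.0, -1.237

print("PC [%d:%d]" % (carbons, double_bonds))   # PC [36:2]
print("%.2fppm" % error_ppm)                    # -1.24ppm
print("%s" % "PC-O")                            # PC-O
print("%d" % 2.7)                               # 2 (truncated, not rounded)
```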
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If the isotopic correction reduces an intensity to zero or below, the intensity is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the mass matched in the spectrum. The error can be expressed in three units: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in Dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
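<br />
The three error attributes can be sketched as below. The sign convention (measured minus theoretical) and the reading of errres as a resolving power m/|dm| are assumptions, not documented LipidXplorer behaviour; the m/z values are hypothetical.<br />
```python
def mass_errors(measured, theoretical):
    """Return (errppm, errda, errres) for one peak under the stated assumptions."""
    err_da = measured - theoretical
    err_ppm = err_da / theoretical * 1e6
    # Assumption: errres expresses the deviation as a resolving power m / |dm|.
    err_res = theoretical / abs(err_da) if err_da else float("inf")
    return err_ppm, err_da, err_res


errppm, errda, errres = mass_errors(184.0740, 184.0733)
print("%.2fppm, %.4f Da, R = %.0f" % (errppm, errda, errres))
```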
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a single element of the sum composition, write the element symbol in brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms of a formula is obtained with: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the fragment that remains after the loss.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor to this fragment.<br />
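<br />
The relationship between chemsc, frsc and nlsc is plain sum composition arithmetic: fragment = precursor - neutral loss. The sketch below shows it with dictionaries; the PE precursor composition and all helper names are illustrative assumptions.<br />
```python
def subtract(a, b):
    """Element-wise a - b for sum compositions stored as {element: count}."""
    result = {el: a.get(el, 0) - b.get(el, 0) for el in set(a) | set(b)}
    return {el: n for el, n in result.items() if n != 0}


precursor = {"C": 39, "H": 77, "N": 1, "O": 8, "P": 1}   # hypothetical PE [M+H]+
head_loss = {"C": 2, "H": 8, "N": 1, "O": 4, "P": 1}     # neutral loss of the head group

frsc = subtract(precursor, head_loss)   # fragment remaining after the loss
nlsc = subtract(precursor, frsc)        # neutral loss recovered from the fragment
print(frsc)
print(nlsc == head_loss)
```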
====intensity====<br />
All intensities of a mass in all samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry per sample. If used in an equation or inequality, the whole list is considered: <tt>PR.intensity &gt; 10000</tt> is true if and only if all intensities are greater than 10000.<br />
It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their names, e.g. all blank samples. Naming the samples according to their groups thus makes it possible to define sample groups and to state constraints that increase the accuracy of the interpretation or even interpret the result directly. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be less than one percent of the average intensity in the experimental samples ("*exp*"). It discards every "lipid" that is obviously noise.<br />
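<br />
The sample-group selection and the all-samples semantics described above can be mimicked with Python's <tt>fnmatch</tt> module. The sample names, intensities and helper functions are hypothetical; LipidXplorer's own matching may differ in details such as case sensitivity.<br />
```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical intensities of one precursor, keyed by sample (file) name.
intensity = {"exp_01": 52000.0, "exp_02": 48000.0,
             "blank_01": 120.0, "blank_02": 80.0}


def select(values, pattern):
    """Counterpart of PR.intensity["<pattern>"]: keep samples whose name matches."""
    return [v for name, v in values.items() if fnmatchcase(name, pattern)]


def all_greater(values, threshold):
    """PR.intensity > threshold holds only if it holds for every sample."""
    return all(v > threshold for v in values)


keep = mean(select(intensity, "*blank*")) < mean(select(intensity, "*exp*")) / 100
print(keep)                                             # True: 100 < 500
print(all_greater(select(intensity, "*exp*"), 10000))   # True
print(all_greater(intensity.values(), 10000))           # False, the blanks are below
```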
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.<br />
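<br />
Occupation, and the MINOCC threshold of the IDENTIFY section that is based on it, amount to a simple ratio. In this sketch the presence of a peak per sample is given as a list of booleans; the function names and data are hypothetical.<br />
```python
def occupation(found_in_sample):
    """occ = number of samples in which the peak occurs / number of samples."""
    return sum(found_in_sample) / len(found_in_sample)


def passes_minocc(found_in_sample, minocc):
    """MINOCC check: the peak must be present in at least this fraction of samples."""
    return occupation(found_in_sample) >= minocc


presence = [True, True, False, True]   # hypothetical: peak seen in 3 of 4 samples
print(occupation(presence))            # 0.75
print(passes_minocc(presence, 0.5))    # True
print(passes_minocc(presence, 0.8))    # False
```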
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities with respect to the standard given in v, i.e. every intensity is expressed relative to the intensity of v.<br />
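<br />
A sketch of what "relative to v" means per sample: each intensity is divided by the intensity of the standard in the same sample. Whether LipidXplorer additionally scales by the spiked amount, and how it handles samples without a standard signal, is not stated here, so the zero handling below is an assumption; the data are hypothetical.<br />
```python
def standardize(intensities, standard):
    """Divide each sample's intensity by the standard's intensity in that sample."""
    result = []
    for value, std in zip(intensities, standard):
        # Assumption: without a standard signal the ratio is undefined (None).
        result.append(value / std if std > 0 else None)
    return result


species = [2000.0, 3000.0, 500.0]    # hypothetical lipid intensities in three samples
standard = [1000.0, 1500.0, 0.0]     # hypothetical internal standard intensities
print(standardize(species, standard))  # [2.0, 2.0, None]
```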
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() sums up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments whose intensities were replaced by the '-1' placeholder during isotopic correction (see the Notes above), the following rules were implemented.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i\times (-1)</math>, where <math>n_i</math> is the number of arguments. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. The following examples assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
{| class="wikitable"<br />
! F1 !! F2 !! sumIntensity(F1, F2)<br />
|-<br />
| -1 || -1 || -2<br />
|-<br />
| 0 || -1 || -1<br />
|-<br />
| 1 || -1 || 1<br />
|-<br />
| 2 || -1 || 2<br />
|-<br />
| 2 || 0 || 2<br />
|}<br />
<br />
This has the following consequences when such results are interpreted:<br />
<br />
A) intensity = 0: none of the required fragments was present in this sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but were set to '-1', and no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
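<br />
The placeholder rules above can be written as a small per-sample function. It reproduces the rows of the table, but it is a sketch of the documented behaviour, not LipidXplorer's code; the function names and the one-list-per-fragment layout are assumptions.<br />
```python
PLACEHOLDER = -1  # intensity set to -1 by the isotopic correction


def sum_intensity_sample(*values):
    """Combine the intensities of several fragments within one sample."""
    if any(v > 0 for v in values):
        # At least one real signal: placeholders count as zero.
        return sum(v for v in values if v != PLACEHOLDER)
    # No fragment above zero: values are simply added (n placeholders give -n).
    return sum(values)


def sum_intensity(*fragments):
    """Apply the rule sample by sample to per-fragment intensity lists."""
    return [sum_intensity_sample(*sample) for sample in zip(*fragments)]


f1 = [-1, 0, 1, 2, 2]
f2 = [-1, -1, -1, -1, 0]
print(sum_intensity(f1, f2))  # [-2, -1, 1, 2, 2]
```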
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
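<br />
The second SUCHTHAT example combines a mass equation checked within a tolerance and an intensity inequality that must hold in every sample. The sketch below evaluates both on made-up numbers; H1 is the monoisotopic hydrogen mass, and the function names and values are hypothetical.<br />
```python
H1 = 1.00782503207  # monoisotopic mass of 'H1'


def mass_matches(frag1_mz, frag2_mz, precursor_mz, tolerance_da=0.5):
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da"""
    return abs(frag1_mz + frag2_mz - H1 - precursor_mz) <= tolerance_da


def ratio_holds(frag1_intensity, frag2_intensity):
    """FRAG1.intensity * 3 < FRAG2.intensity * 10, required in every sample."""
    return all(i1 * 3 < i2 * 10 for i1, i2 in zip(frag1_intensity, frag2_intensity))


print(mass_matches(300.20, 400.81, 700.00))            # True (off by about 0.002 Da)
print(ratio_holds([1000.0, 800.0], [400.0, 300.0]))    # True: 3000 < 4000, 2400 < 3000
print(ratio_holds([1000.0, 800.0], [250.0, 300.0]))    # False in the first sample
```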
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is the following: all queries run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass;<br />
* it checks whether the mass fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section);<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section);<br />
* it checks the Boolean constraints (<tt>SUCHTHAT</tt> section); if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
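<br />
The loop just described can be outlined in Python. Everything here is hypothetical: the MasterScan is reduced to a list of dictionaries, and each query is a set of plain functions standing in for its IDENTIFY, SUCHTHAT and REPORT sections.<br />
```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Query:
    name: str
    match_precursor: Callable[[Dict], bool]   # definition + IDENTIFY on the MS mass
    match_fragments: Callable[[Dict], bool]   # definition + IDENTIFY on its MS/MS
    such_that: Callable[[Dict], bool]         # optional SUCHTHAT constraints
    report: Callable[[Dict], Dict]            # REPORT columns


def run(master_scan: List[Dict], queries: List[Query]) -> Dict[str, List[Dict]]:
    results = {}
    for query in queries:                                          # queries run one after another
        rows = []
        for entry in sorted(master_scan, key=lambda e: e["mz"]):  # smallest m/z first
            if (query.match_precursor(entry)
                    and query.match_fragments(entry)
                    and query.such_that(entry)):
                rows.append(query.report(entry))
        results[query.name] = rows
    return results


scan = [{"mz": 786.60, "fragments": [184.07]},
        {"mz": 760.59, "fragments": [184.07]},
        {"mz": 744.55, "fragments": [603.53]}]
pc = Query("Phosphatidylcholine",
           match_precursor=lambda e: 700 <= e["mz"] <= 1000,
           match_fragments=lambda e: any(abs(f - 184.07) < 0.5 for f in e["fragments"]),
           such_that=lambda e: True,
           report=lambda e: {"MASS": e["mz"]})
print(run(scan, [pc]))
```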
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> section emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable is given <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt>, it is <br />
treated as a neutral loss; otherwise it is treated as a (fragment) mass.<br />
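<br />
Emulating a PIS and an NLS on MS/MS spectra boils down to two simple tests per precursor, sketched below. The spectra, the 0.01 Da tolerance and the function names are assumptions; a multiple PIS/NLS corresponds to running the same test for every mass admitted by an sc-constraint.<br />
```python
def precursor_ion_scan(spectra, fragment_mz, tol=0.01):
    """Precursors whose MS/MS spectrum contains the given fragment m/z."""
    return [pr for pr, frags in spectra.items()
            if any(abs(f - fragment_mz) <= tol for f in frags)]


def neutral_loss_scan(spectra, loss_mass, tol=0.01):
    """Precursors whose MS/MS spectrum contains precursor m/z minus the loss."""
    return [pr for pr, frags in spectra.items()
            if any(abs(f - (pr - loss_mass)) <= tol for f in frags)]


# Hypothetical MS/MS spectra: precursor m/z -> fragment m/z values
spectra = {786.6007: [184.0733, 603.5351], 718.5381: [577.5190]}
print(precursor_ion_scan(spectra, 184.0733))   # [786.6007]
print(neutral_loss_scan(spectra, 141.0191))    # [718.5381]
```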
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based only on MS information. Screening <br />
requires highly accurate masses, because otherwise the rate of false <br />
identifications is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Variable names are arbitrary, but meaningful names make a query <br />
easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an even total number of carbon atoms. This means that the two fatty<br />
acids of the lipid are either both even-numbered or<br />
both odd-numbered. In this way we can discard lipids that should<br />
not occur in the organism under investigation. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|OutputScreenShot]]<br />
<br />
This is a screenshot of spreadsheet software displaying the resulting <br />
data of the query. At the top are the variable names, followed by the <br />
name of the query, and then the content. Note that for 'INTENS' <br />
the name of the file from which the sample data were taken is also given. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we define a variable 'headPC' <br />
which contains the sum composition of the specific head group <br />
fragment of PC found in the MS/MS spectra of <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for an even number of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures that we do not report borderline <br />
masses that cannot actually be present in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
as <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
A complete example script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE plasmalogens<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 Dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1' <br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass,<br />
<br />
# second is the lipids name generated with Python's string formatting function<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc,<br />
<br />
# fourth the intensity<br />
INTENS = PR.intensity,<br />
<br />
# fifth the sum of the error of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_MFQL&diff=439LipidXplorer MFQL2011-01-21T11:05:29Z<p>Schwudke: /* A short tutorial */</p>
<hr />
<div>==Introduction==<br />
<br />
MFQL is the first query language developed for the identification of molecules <br />
in complex shotgun spectra datasets. It formalizes the available or assumed<br />
knowledge of lipid fragmentation pathways into queries that are used for <br />
probing a MasterScan database. <br />
<br />
===Structural complexity of lipid species and sum composition constraints===<br />
<br />
[[Image:Figure5.png|600px|center|Structural complexity of lipid species and sum composition constraints]]<br />
Let us consider PC as a representative example: PC molecules consist of a<br />
phosphorylcholine head group attached to the glycerol backbone at the sn-3 <br />
position, while fatty acid moieties occupy sn-1 and sn-2 positions (alternatively, <br />
a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid <br />
moieties differ by the number of carbon atoms and double bonds, but also by <br />
the relative location at the glycerol backbone, so that isomeric structures <br />
having exactly the same fatty acid moieties are possible. Note that isomeric <br />
structures are always isobaric, whereas isobaric molecules are not necessarily <br />
isomeric. Most generic constraints ("All lipids of PC class" or "All PC esters") <br />
encompass sum compositions of species with all naturally occurring fatty acids. <br />
However, because of the fatty acid variability, some species of other lipid <br />
classes (such as PE) might meet the same constraint. Therefore, for most <br />
common glycerophospholipid classes, the characterization of individual <br />
molecular species cannot rely solely on their intact masses, irrespective <br />
of how accurately they were measured. MS/MS experiments that produce <br />
structure-specific ions contribute more specific constraints, such as the <br />
number of carbons and double bonds in individual moieties, characteristic <br />
head group fragment, characteristic loss of a fatty acid moiety, among others. <br />
Within an MFQL query, these constraints can be bundled by Boolean operations. <br />
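<br />
A short worked example of why intact masses alone are not enough: the [M+H]+ sum compositions of diacyl PC and PE differ by exactly C3H6, so a PE with three more fatty acid carbons has the same composition as a PC. The formulas in the sketch follow from PC 32:0 (C40H81NO8P+) and PE 32:0 (C37H75NO8P+); the function names are illustrative.<br />
```python
def pc_protonated(fa_carbons, fa_double_bonds):
    """Sum composition of a diacyl PC [M+H]+ ion."""
    return {"C": fa_carbons + 8, "H": 2 * fa_carbons - 2 * fa_double_bonds + 17,
            "N": 1, "O": 8, "P": 1}


def pe_protonated(fa_carbons, fa_double_bonds):
    """Sum composition of a diacyl PE [M+H]+ ion."""
    return {"C": fa_carbons + 5, "H": 2 * fa_carbons - 2 * fa_double_bonds + 11,
            "N": 1, "O": 8, "P": 1}


print(pc_protonated(34, 1) == pe_protonated(37, 1))  # True: both are C42 H83 N1 O8 P1
```
Only structure-specific evidence, such as the PC head group fragment at m/z 184.07 in MS/MS, separates the two candidates.<br />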
<br />
==A short tutorial==<br />
<br />
Below we present an <br />
example of composing an MFQL query for identifying PC lipids in a typical shotgun dataset.<br />
<br />
In MS/MS experiments (see [[#MFQL identification of phosphatidylcholines (PC)]]), <br />
molecular cations of PC species produce specific phosphorylcholine fragments of <br />
their head groups having <br />
the sum composition of 'C5 H15 O4 N1 P1' and m/z 184.07 (see [[#MFQL identification of phosphatidylcholines (PC)]]). The <br />
identification of PC species proceeds by identifying this fragment ion <br />
in MS/MS spectra together with the accurately determined masses of intact <br />
precursors in the MS spectrum (see [[#MFQL identification of phosphatidylcholines (PC)]]).<br />
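<br />
The m/z 184.07 quoted above follows from the sum composition 'C5 H15 O4 N1 P1': add the monoisotopic masses of the atoms and, for a singly charged cation, subtract the mass of one electron. The parser and function name are illustrative only.<br />
```python
import re

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def ion_mz(sc, charge=1):
    """m/z of a cation with the given sum composition and positive charge."""
    mass = sum(MONOISOTOPIC[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", sc))
    return (mass - charge * ELECTRON) / charge


print("%.2f" % ion_mz("C5 H15 O4 N1 P1"))   # 184.07
```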
<br />
A query for phosphatidylcholine (PC) lipids could be: <br />
* find all precursor masses that fit into the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and <br />
* check whether there is a "C5 H15 O4 P1 N1" fragment (m/z 184.07) in their MS/MS spectra. <br />
* If both conditions hold, we have identified a phosphatidylcholine and can report the lipid species. <br />
<br />
===MFQL identification of phosphatidylcholines (PC)===<br />
<br />
[[Image:figure6.png|600px|center|MFQL identification of phosphatidylcholines (PC)]]<br />
The chemical structure of PC is shown in the figure above. Upon their collisional <br />
fragmentation, molecular cations of PC produce a specific head group <br />
fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. '''A:''' MS <br />
spectrum acquired by direct infusion of a total lipid extract into a <br />
QSTAR mass spectrometer (inset). All detectable peaks were subjected <br />
to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) <br />
is presented in the lower panel. The precursor ion was isolated within <br />
a 1 Da window, and therefore several isobaric lipid precursors were <br />
co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. <br />
These ions were disregarded by this MFQL query and did not affect PC <br />
identification. '''B:''' MFQL query identifying PC species, details are <br />
provided in the text. '''C:''' screenshot of the output spreadsheet file; <br />
column annotation and content are determined by the REPORT section of the <br />
above MFQL query (see the text for details). <br />
<br />
<br />
First, let us assign a name to the query:<br />
<pre>QUERYNAME = Phosphatidylcholine;</pre><br />
Next, we define the variables used for identifying the species. <br />
Our query should identify the singly charged PC head group <br />
fragment, which we define as follows: <br />
<pre><br />
DEFINE<br />
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;<br />
</pre><br />
The keyword <tt>CHG</tt> states the charge of the ion.<br />
<br />
In a shotgun experiment not all fragmented peaks originate from PCs. <br />
For higher search specificity we next define precursors (<tt>prPC</tt>) that are expected <br />
to produce the <tt>headPC</tt> fragment in MS/MS spectra. We impose an sc-constraint on precursor <br />
masses: besides the sum composition requirements, it requires that precursors are singly <br />
charged and that their unsaturation (expressed as a double bond equivalent with the keyword <br />
<tt>DBR</tt>) lies within a certain range (here from 1.5 to 7.5): <br />
<pre><br />
DEFINE<br />
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);<br />
</pre><br />
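<br />
The <tt>DBR</tt> window can be checked with the usual double bond equivalent formula. The sketch below assumes that phosphorus is counted as trivalent; this is our assumption, chosen because it gives 1.5 for a protonated PC without fatty acid double bonds, which matches the offset of 1.5 used in the REPORT section of this query.<br />
<br />
```python
# Double bond equivalent (DBE) of a sum composition, as used for DBR filtering.
# Assumption: valences C=4, H=1, N=3, O=2 and P=3 (phosphorus counted like nitrogen).
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3}


def dbe(formula):
    """DBE = 1 + sum(n_i * (v_i - 2)) / 2 over all elements i."""
    return 1 + sum(n * (VALENCE[el] - 2) for el, n in formula.items()) / 2


def in_dbr(formula, low, high):
    """True if the DBE lies within the closed range [low, high]."""
    return low <= dbe(formula) <= high


pc_34_0 = {"C": 42, "H": 85, "N": 1, "O": 8, "P": 1}  # [M+H]+, no C=C in the fatty acids
pc_34_1 = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}  # [M+H]+, one C=C
print(dbe(pc_34_0), dbe(pc_34_1), in_dbr(pc_34_1, 1.5, 7.5))  # 1.5 2.5 True
```
<br />
With this convention, DBR = (1.5, 7.5) admits protonated PC species with 0 to 6 double bonds in their fatty acid moieties.<br />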
<br />
Next, the IDENTIFY section specifies that <tt>prPC</tt> precursors should be <br />
identified in MS spectra and <tt>headPC</tt> fragments in MS/MS spectra, both <br />
acquired in positive mode. The logical operation AND requires that <tt>headPC</tt> <br />
is only searched for in the MS/MS spectra of <tt>prPC</tt>:<br />
<pre><br />
IDENTIFY<br />
prPC IN MS1+ AND<br />
headPC IN MS2+<br />
</pre><br />
We further limit the search space by applying optional project-specific <br />
compositional constraints, formulated in the SUCHTHAT section. For example, <br />
it is generally assumed that mammals do not produce fatty acids with an odd <br />
number of carbon atoms. Therefore, if a recognized lipid would have to comprise <br />
an odd-numbered fatty acid moiety, the identification is likely false. <br />
<pre><br />
SUCHTHAT<br />
isEven(prPC.chemsc[C]);<br />
</pre><br />
Here the operator <tt>isEven</tt> requires that candidate PC <br />
precursors contain an even number of carbon atoms. Since the head <br />
group of PC and the glycerol backbone contain 5 and 3 carbon atoms, <br />
respectively (8 in total), the two fatty acid moieties must then be either both <br />
even-numbered or both odd-numbered; species combining one odd- and one <br />
even-numbered fatty acid are excluded.<br />
By executing the DEFINE, IDENTIFY and SUCHTHAT sections, LipidXplorer <br />
recognizes spectra pertinent to PC species. The last section, REPORT, <br />
defines how these findings are reported. This includes annotation <br />
of the recognized lipid species, reporting the abundances of characteristic <br />
ions for subsequent quantification and reporting any additional <br />
information pertinent to the analysis, such as masses and mass differences <br />
(errors). LipidXplorer outputs the findings as a *.csv file in which <br />
identified species are in rows, while the column content is user-defined. <br />
In this example we define six columns: <tt>NAME</tt> - the species name; <br />
<tt>MASS</tt> - the species mass; <tt>CHEMSC</tt> - the chemical sum composition; <br />
<tt>ERROR</tt> - the difference to the calculated mass; <tt>INTENS</tt> - the <br />
precursor intensities; and <tt>FRAGINTENS</tt> - the <tt>headPC</tt> fragment <br />
intensities, both reported for each individual acquisition. <br />
<pre><br />
REPORT<br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%dppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
</pre><br />
<br />
<br />
It is also possible to define mathematical terms or to apply certain <br />
functions, such as text formatting, to these attributes. A text <br />
format consists of two strings separated by <tt>%</tt>, where the <br />
first string contains placeholders and the second string their <br />
content. This formatting is used in the NAME string so that <br />
the annotation convention remains at the user's discretion. <br />
In this example the two placeholders <tt>%d</tt> of the lipid class <br />
name <tt>PC [%d:%d]</tt> are filled with the number of carbon <br />
atoms and double bonds in the fatty acid moieties. The number <br />
of carbon atoms is calculated by subtracting the sum composition <br />
of <tt>headPC</tt> from the precursor <tt>prPC</tt> and <br />
subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6); <br />
the number of double bonds is the double bond equivalent of <tt>prPC</tt> minus 1.5.<br />
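<br />
The arithmetic behind the NAME column can be reproduced in a few lines of Python. This hypothetical sketch re-implements only the two expressions of the query; the double bond equivalent is computed with phosphorus counted as trivalent, which is our assumption.<br />
<br />
```python
# Hypothetical re-implementation of the NAME column of the PC query.
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3}  # assumption: P counted as trivalent
HEAD_PC = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}


def dbe(formula):
    """Double bond equivalent of an elemental composition."""
    return 1 + sum(n * (VALENCE[el] - 2) for el, n in formula.items()) / 2


def pc_name(precursor, head=HEAD_PC):
    # carbons: precursor minus head group (5 C) minus glycerol backbone (3 C)
    fa_carbons = precursor["C"] - head["C"] - 3
    # double bonds: precursor DBE minus the 1.5 offset used in the query
    fa_double_bonds = dbe(precursor) - 1.5
    return "PC [%d:%d]" % (fa_carbons, fa_double_bonds)


print(pc_name({"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}))  # PC [34:1]
print(pc_name({"C": 44, "H": 85, "N": 1, "O": 8, "P": 1}))  # PC [36:2]
```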
<br />
==General rules in MFQL queries==<br />
<br />
# Everything written after <tt>#</tt> is ignored by the interpreter; use it to write comments in the code.<br />
# Every line has to end with <tt>;</tt><br />
# Every query has to end with an extra <tt>;</tt><br />
<br />
<br />
==The structure of an MFQL query== <br />
An MFQL query consists of three or four sections:<br />
<br />
1. '''DEFINE''': defines sum compositions, sc-constraints (see also [[#sc-constraints]]), <br />
masses or groups of masses and associates them with user-defined names.<br><br />
<br />
2. '''IDENTIFY''': determines where and how the DEFINE content is applied. <br />
It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.<br><br />
<br />
3. '''SUCHTHAT''': ''is optional''. It defines constraints that are formulated as mathematical <br />
expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), <br />
sum compositions and functions. Several individual constraints can be bundled by <br />
logical operations and applied together.<br><br />
<br />
4. '''REPORT''': establishes the content and format of the output <br><br />
<br />
After '''REPORT''' there is a list of variables (<tt>MASS</tt>, <tt>NAME</tt>, ...) which represent columns <br />
in the output file. The content of each column is defined after the <tt>=</tt>. The '''REPORT''' <br />
section is described in more detail below.<br />
<br />
==sc-constraints==<br />
<br />
For dealing with sets of chemical sum compositions, LipidXplorer uses a <br />
special format called the sum composition constraint (sc-constraint). <br />
An sc-constraint describes a collection of chemical sum compositions, for <br />
example all compositions of one lipid class. It is used mainly for screening <br />
tasks and for emulating multiple precursor ion or neutral loss scans. Here is an example:<br />
<br />
<pre>'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;</pre><br />
<br />
* <tt>DBR</tt> means 'Double Bond Range' and specifies the allowed range of the double bond equivalent (the degree of unsaturation). <br />
* <tt>CHG</tt> states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.<br />
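<br />
Conceptually, an sc-constraint expands into every sum composition inside the element ranges whose double bond equivalent lies in <tt>DBR</tt>. The Python sketch below is a hypothetical illustration of that expansion, not LipidXplorer's parser: it counts phosphorus as trivalent (an assumption), ignores <tt>CHG</tt>, and keeps integer as well as half-integer DBE values, since the source does not say whether LipidXplorer filters those further.<br />
<br />
```python
import itertools
import re

# Assumption: P counted as trivalent when computing the double bond equivalent.
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3}
# Matches 'C[30..48]', 'O[8]' and plain 'C5'.
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\[(\d+)(?:\.\.(\d+))?\]|(\d+))")


def parse_sc(text):
    """Map each element of an sc-constraint string to a (min, max) count range."""
    ranges = {}
    for element, low, high, fixed in TOKEN.findall(text):
        if fixed:
            ranges[element] = (int(fixed), int(fixed))
        else:
            ranges[element] = (int(low), int(high) if high else int(low))
    return ranges


def dbe(formula):
    return 1 + sum(n * (VALENCE[el] - 2) for el, n in formula.items()) / 2


def expand_sc(text, dbr=None):
    """Yield every sum composition allowed by the sc-constraint (and DBR, if given)."""
    ranges = parse_sc(text)
    elements = list(ranges)
    for counts in itertools.product(*(range(lo, hi + 1) for lo, hi in ranges.values())):
        formula = dict(zip(elements, counts))
        if dbr is None or dbr[0] <= dbe(formula) <= dbr[1]:
            yield formula
```
<br />
For example, <tt>list(expand_sc('C[30..48] H[30..200] N[1] O[8] P[1]', dbr=(1.5, 7.5)))</tt> contains the protonated PC 34:1 composition C42H83NO8P but not C42H95NO8P, whose DBE is negative.<br />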
<br />
==The four sections of an MFQL query==<br />
<br />
===Part 1: Definition of sum compositions, sc-constraints and masses===<br />
<br />
The first statement of any query is<br />
<pre>QUERYNAME = &lt;name of the query&gt;;</pre><br />
which gives the query a unique name.<br />
<br />
Next, variables are defined. The syntax is<br />
<pre>DEFINE &lt;variable name&gt; = (&lt;chemical sum composition&gt; | &lt;sc-constraint&gt; | &lt;mass&gt;) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
After the keyword <tt>DEFINE</tt> comes the name of the variable, followed by <br />
an equals sign and its content. This can be a chemical sum composition, <br />
an sc-constraint or a list of sum compositions. Sum compositions and <br />
sc-constraints are written in single quotes. The content can be followed by <br />
<tt>WITH</tt> and one or more options:<br />
<br />
# <tt>DBR</tt> is the double bond range of an sc-constraint: a 2-tuple with the minimum and maximum double bond equivalent allowed for the sc-constraint.<br />
# <tt>CHG</tt> states the charge.<br />
<br />
To define a neutral loss instead of a fragment, set the charge <br />
to zero with <tt>CHG = 0</tt> or write <tt>AS NEUTRALLOSS</tt> <br />
after the sum composition or sc-constraint. <br />
<br />
NOTE: The neutral loss is always calculated<br />
between the precursor mass and the fragment, never between two<br />
fragments.<br />
<br />
====Examples====<br />
Define an sc-constraint for PC-O precursors and the PC head group fragment <br />
linked to these precursors:<br />
<pre><br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
</pre><br />
<br />
Define an sc-constraint for PE precursors and the PE head group neutral loss <br />
linked to these precursors:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;<br />
</pre><br />
<br />
Define sc-constraints and fragments for PE plasmalogens:<br />
<pre><br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
</pre> <br />
<br />
An arbitrary number of variables can be defined, but they are only valid for the <br />
current query, i.e. they are not valid in other queries of the same run.<br />
<br />
===Part 2: The <tt>IDENTIFY</tt> section===<br />
<br />
The previously defined variables are searched for in the experiment database. The syntax is:<br />
<pre>IDENTIFY<br />
<br />
&lt;identification 1&gt; AND<br />
&lt;identification 2&gt; AND<br />
...<br />
&lt;identification n&gt;<br />
</pre><br />
<br />
The keyword 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a singleton or a set, i.e. for some variables more than one mass is identified; this holds especially for sc-constraints. This section is the first filtering step: it evaluates to <i>True</i> only if every individual identification is true.<br />
<br />
An identification looks like this:<br />
<pre><br />
&lt;variable name&gt; IN (MS1+|MS1-|MS2+|MS2-) (WITH (&lt;option&gt; = &lt;value&gt;,)+)?<br />
</pre> <br />
<br />
Here LipidXplorer checks for the presence of certain masses or fragment masses. The scope (MS level and polarity) is stated after 'IN':<br />
the 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags specify where to look for the sum composition ('MS1+' means positive-mode MS, while 'MS2-' means negative-mode MS/MS). Options can be specified after the optional 'WITH':<br />
<br />
# 'TOLERANCE' sets the tolerance within which a mass is matched. It can be given as: <br />
## 'ppm' - parts per million<br />
## 'da' - Dalton<br />
## 'res' - resolution<br />
# 'MASSRANGE' is a 2-tuple restricting the masses of interest. <br />
# 'MINOCC' is a float between 0 and 1 stating the minimum occupation of this mass across all samples, i.e. the minimum fraction of samples in which the mass must be found.<br />
<br />
For example:<br />
* A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".<br />
* "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.<br />
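<br />
The TOLERANCE and MASSRANGE options amount to simple window checks. The Python sketch below is a hypothetical illustration; in particular, treating a 'res' tolerance as the window m/R (the peak width at resolution R) is our assumption, and LipidXplorer's exact definition may differ.<br />
<br />
```python
# Sketch: turning a TOLERANCE setting into an absolute m/z window.
# Assumption: for 'res' the window is m / R; LipidXplorer's definition may differ.


def window_da(theoretical_mz, value, unit):
    """Absolute tolerance window (in Da) for a 'ppm', 'da' or 'res' setting."""
    if unit == "da":
        return value
    if unit == "ppm":
        return theoretical_mz * value / 1e6
    if unit == "res":
        return theoretical_mz / value
    raise ValueError("unknown tolerance unit: %r" % unit)


def matches(measured_mz, theoretical_mz, value, unit):
    """True if the measured peak lies within the tolerance window."""
    return abs(measured_mz - theoretical_mz) <= window_da(theoretical_mz, value, unit)


def in_mass_range(mz, mass_range):
    """MASSRANGE = (low, high) keeps only masses inside the closed interval."""
    low, high = mass_range
    return low <= mz <= high


print(window_da(1000.0, 10, "ppm"))          # 0.01
print(matches(184.0733, 184.2, 250, "ppm"))  # False
print(in_mass_range(650.0, (700, 1000)))     # False
```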
<br />
Some examples:<br />
<br />
<pre># Phosphatidylcholine ether species<br />
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholineether WHERE<br />
<br />
# the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
# we are not so strict with the tolerance for the low resolution MS/MS spectra<br />
pcHead in MS2+ WITH TOLERANCE = 250ppm<br />
<br />
################################################################################<br />
<br />
# Phosphatidylethanolamine<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;<br />
<br />
IDENTIFY Phosphatidylethanolamine WHERE<br />
<br />
# marking <br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
peHead in MS2+ WITH TOLERANCE = 0.5Da<br />
<br />
################################################################################<br />
<br />
# PE Plasmalogen<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 5ppm AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
</pre><br />
<br />
===Part 3: The <tt>SUCHTHAT</tt> section===<br />
<br />
After the specific masses have been collected, further constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', which both cover several possible compositions since they are sc-constraints (see the example above), and a test whether the two fragments together match the precursor mass, i.e. whether "FRAG1 + FRAG2 == PR" holds. Such constraints are formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by Boolean operators. The syntax is:<br />
<pre>SUCHTHAT<br />
(((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;)) |<br />
((NOT)? (&lt;equation&gt; | &lt;unequation&gt; | &lt;function&gt;) (AND | OR))+) (WITH (&lt;option&gt; = &lt;value&gt;)+)?<br />
</pre> <br />
Terms can be built with the basic arithmetic operators +, -, *, /, and parentheses can be used. Terms are compared as equations with '==' and as inequalities with '<', '>', '<=', '>=' and '!=' (not equal).<br />
The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot. For example, the intensity of the peak 'PR' is addressed as <tt>PR.intensity</tt>. A list of peak attributes can be found here: [[#List of peak attributes]]<br />
<br />
====Functions====<br />
<br />
In addition to peak attributes, SUCHTHAT supports the use of functions. The list of all functions can be found here: [[#List of functions]]<br />
<br />
===Part 4: The <tt>REPORT</tt> section===<br />
<br />
All successful identifications are passed to the <tt>REPORT</tt> section, <br />
where the format of the output is specified. In general the <tt>REPORT</tt> <br />
section consists of a list of variables, each of which represents a column. The content <br />
of a variable is the content of its column. For example, the following code <br />
generates a column named <tt>MASS</tt> containing the m/z values of <tt>PR</tt>'s <br />
identified species:<br />
<pre><br />
REPORT<br />
MASS = PR.mass<br />
</pre><br />
<br />
The next example reports the sum of the intensities of two fragments:<br />
<pre><br />
REPORT<br />
INTENS = frag1.intensity + frag2.intensity<br />
</pre><br />
<br />
Often both fragments can be the same peak (for example, in two fatty acid scans that match the same fatty acid). LipidXplorer therefore provides a special function that does not add up the intensities of identical fragments twice:<br />
<pre><br />
REPORT<br />
INTENS = sumIntensity(frag1, frag2)<br />
</pre><br />
<br />
The syntax of <tt>REPORT</tt> is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = (&lt;variable&gt; | &lt;equation&gt;))+<br />
</pre><br />
<br />
The content of the variable can be any attribute and/or term as in the <br />
<tt>SUCHTHAT</tt> section. The <tt>REPORT</tt> section has an additional <br />
feature with which it is possible to generate lipid names or other formatted strings. <br />
<br />
The syntax for this function is:<br />
<pre>REPORT<br />
(&lt;variable name&gt; = "&lt;format string&gt;" % "&lt;list of variables for the format string&gt;")*<br />
</pre> <br />
<br />
String formatting works as follows: two strings are given, <br />
separated by a <tt>%</tt>. The first string contains the output <br />
format, i.e. a string with placeholders. Placeholders can be: <tt>%d</tt> <br />
for integer values, <tt>%.</tt><i>n</i><tt>f</tt> for floating point values <br />
with <i>n</i> decimals and <tt>%s</tt> for string values. The second <br />
string contains the list of values for the placeholders, in <br />
order. For example:<br />
<pre>REPORT<br />
LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"<br />
</pre><br />
The variable <tt>LIPIDNAME</tt> contains the string <tt>"PC [... : ...]"</tt>. <br />
The first integer placeholder is filled with the sum of the carbon atoms of both <br />
fatty acids <tt>(fa1PC, fa2PC)</tt> and the second with the sum of <br />
their double bonds. The output could be, for example, <tt>"PC [36:2]"</tt>.<br />
<br />
The format string mechanism is borrowed from Python: MFQL uses Python's <br />
standard %-style string formatting <br />
(see [http://docs.python.org/library/stdtypes.html#string-formatting-operations here] for more information).<br />
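<br />
Because the format strings are plain Python %-formatting, the placeholder types can be tried directly in Python. The values below are made up for illustration. Note that <tt>%d</tt> also accepts floating point values and truncates them, so a computed double bond count such as 1.0 prints as 1.<br />
<br />
```python
# Python's %-style formatting, which MFQL format strings rely on.
carbons, double_bonds = 36, 2.0

name = "PC [%d:%d]" % (carbons, double_bonds)  # %d: integer (floats are truncated)
error = "%.2fppm" % -1.234                      # %.nf: float with n decimals
label = "%s [%d:%d]" % ("PE", 34, 1)            # %s: string
truncated = "%d" % 2.5                          # truncation, not rounding

print(name, error, label, truncated)  # PC [36:2] -1.23ppm PE [34:1] 2
```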
<br />
===Notes===<br />
<br />
* If a lipid was not found in a particular sample, its intensity is set to zero.<br />
* If isotopic correction reduces an intensity to zero or below, it is set to '-1'.<br />
<br />
==List of peak attributes==<br />
<br />
====error====<br />
The difference between the theoretical mass (according to the sum composition) and the matched mass from the spectrum. The error can be given in three forms: <br />
# <tt>errppm</tt> -&gt; error in ppm<br />
# <tt>errda</tt> -&gt; error in dalton<br />
# <tt>errres</tt> -&gt; error as resolution value<br />
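<br />
The signed errors can be sketched as follows, assuming the convention theoretical minus measured and a ppm value relative to the theoretical mass (the denominator is our assumption). <tt>errres</tt> is not covered because its exact definition is not given here.<br />
<br />
```python
# Sketch of signed mass errors: error = theoretical - measured.
# Assumption: ppm is taken relative to the theoretical mass.


def err_da(theoretical_mz, measured_mz):
    return theoretical_mz - measured_mz


def err_ppm(theoretical_mz, measured_mz):
    return (theoretical_mz - measured_mz) / theoretical_mz * 1e6


print("%.1f" % err_ppm(1000.0, 999.999))   # 1.0  (measured mass too low)
print("%.1f" % err_ppm(1000.0, 1000.002))  # -2.0 (measured mass too high)
```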
====mass==== <br />
The m/z value of the peak<br />
====chemsc==== <br />
The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after <tt>.chemsc</tt>. For example, the number of <tt>C</tt> atoms in a formula is: <pre>PR.chemsc[C]</pre><br />
# <tt>frsc</tt> -&gt; the chemical sum composition of the fragment. If the peak is a fragment, it is the same as <tt>chemsc</tt>; if it is a neutral loss, it returns the sum composition of the corresponding fragment.<br />
# <tt>nlsc</tt> -&gt; the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as <tt>chemsc</tt>; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.<br />
====intensity====<br />
All intensities of a mass across the samples in which it occurred. Note that <tt>intensity</tt> is usually not a single value but a list of intensities, with one entry for every sample in which the peak was found. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity &gt; 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by giving a sample name pattern as a string with wildcards (<tt>*</tt> and/or <tt>?</tt>). For example, <tt>PR.intensity["*blank*"]</tt> returns only the samples with the string <tt>blank</tt> in their name, e.g. all blank samples. Naming samples according to their group therefore makes it possible to define sample groups and to state constraints that increase the accuracy of the interpretation or even interpret the result directly. For example:<br />
<pre> avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 </pre> <br />
This statement requires the average intensity in the blank samples to be less than one percent of the average intensity in the experimental samples ("*exp*"). It thereby removes every "lipid" that is obviously noise.<br />
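<br />
The list semantics and the wildcard selection can be illustrated in Python. This is a hypothetical sketch with made-up sample names and intensities; using case-sensitive <tt>fnmatchcase</tt> is our assumption about how LipidXplorer matches sample names.<br />
<br />
```python
from fnmatch import fnmatchcase
from statistics import mean


def select(intensities, pattern):
    """Intensities of the samples whose names match a '*'/'?' wildcard pattern."""
    return [value for name, value in intensities.items() if fnmatchcase(name, pattern)]


def all_greater(values, threshold):
    """A comparison on an intensity list holds only if it holds for every sample."""
    return all(value > threshold for value in values)


# Hypothetical intensities of one precursor 'PR' across four samples
pr_intensity = {"blank_01": 120.0, "blank_02": 80.0, "exp_01": 25000.0, "exp_02": 31000.0}

not_noise = mean(select(pr_intensity, "*blank*")) < mean(select(pr_intensity, "*exp*")) / 100
print(not_noise, all_greater(pr_intensity.values(), 10000))  # True False
```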
====binsize====<br />
The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.<br />
====occ====<br />
The occupation of the peak: the number of samples in which the peak occurs divided by the total number of samples.<br />
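<br />
The occupation and a MINOCC-style threshold can be sketched as below. Counting a sample as "found" when its intensity is greater than zero is our assumption; how '-1' placeholders enter the count is not covered.<br />
<br />
```python
def occupation(intensities):
    """Fraction of samples in which the peak was found (assumption: intensity > 0)."""
    found = sum(1 for value in intensities if value > 0)
    return found / len(intensities)


def passes_minocc(intensities, minocc):
    """Keep the peak if its occupation is at least minocc (as with MINOCC)."""
    return occupation(intensities) >= minocc


intens = [0.0, 1520.0, 980.0, 0.0]
print(occupation(intens), passes_minocc(intens, 0.5), passes_minocc(intens, 0.75))  # 0.5 True False
```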
<br />
==List of functions==<br />
<br />
====isEven(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is even. E.g.: <tt>isEven(PR.chemsc[C])</tt>.<br />
<br />
====isOdd(n)==== <br />
<br />
where n is an integer value. The function returns True, if n is odd.<br />
<br />
====avg(v.intensity)==== <br />
<br />
where v is a variable. The function returns the average of the intensities of v. E.g.: <pre>avg(PR.intensity)</pre><br />
<br />
====isStandard(v, scope)==== <br />
<br />
where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special since it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.<br />
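<br />
The idea of intensities expressed relative to a standard can be sketched per sample as follows. This is a hypothetical illustration with made-up values; skipping samples in which the standard is missing or zero is our assumption, not documented LipidXplorer behaviour.<br />
<br />
```python
def standardize(intensities, standard):
    """Express each sample's intensity relative to the standard in the same sample.

    Samples where the standard is missing or zero are skipped (an assumption).
    """
    return {sample: value / standard[sample]
            for sample, value in intensities.items()
            if standard.get(sample)}


lipid = {"exp_01": 5000.0, "exp_02": 3000.0, "exp_03": 800.0}
std = {"exp_01": 10000.0, "exp_02": 6000.0, "exp_03": 0.0}
print(standardize(lipid, std))  # {'exp_01': 0.5, 'exp_02': 0.5}
```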
<br />
====sumIntensity(f1, f2, ...)====<br />
<br />
The function sumIntensity() is used for summing up intensities of different MS2 entries when multiple peaks are required for identification and quantification. <br />
For fragments carrying the '-1' placeholder after isotopic correction (see the Notes above), the following rules apply.<br />
<br />
If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i \times (-1)</math>, where <math>n_i</math> is the number of attributes. <br />
<br />
If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and not added to the overall sum. In the following example we assume that two MS2 entries were used in the sumIntensity() function:<br />
<br />
<math>F_1 + F_2 \rightarrow \mathrm{sumIntensity}(F_1, F_2)</math><br />
<math>-1 + (-1) \rightarrow -2</math><br />
<math>0 + (-1) \rightarrow -1</math><br />
<math>1 + (-1) \rightarrow 1</math><br />
<math>2 + (-1) \rightarrow 2</math><br />
<math>2 + 0 \rightarrow 2</math><br />
<br />
This has the following consequences for interpreting such results:<br />
<br />
A) intensity = 0: none of the required fragments was present in this sample.<br />
<br />
B) intensity < 0: some of the required fragments were found in the initial MasterScan but set to '-1', and no fragment above the threshold (1) was present.<br />
<br />
C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.<br />
<br />
D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.<br />
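<br />
The placeholder rules above can be written as a short Python function for the intensities of one sample. This is a sketch of the rules as stated on this page, not LipidXplorer's source code; running the loop reproduces the example table.<br />
<br />
```python
def sum_intensity(*values):
    """Per-sample sum of fragment intensities with '-1' placeholders."""
    if any(value > 0 for value in values):
        # at least one real signal: '-1' placeholders count as zero
        return sum(value for value in values if value != -1)
    # no signal above zero: values (including placeholders) are simply added
    return sum(values)


for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(f1, f2))
```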
<br />
===Some examples===<br />
<br />
<pre>SUCHTHAT<br />
# the number of 'C' atoms in 'PR's chemical sum composition should be odd<br />
isOdd(PR.chemsc[C])<br />
<br />
SUCHTHAT<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 dalton and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
</pre><br />
<br />
== The principle of the lipid identification process==<br />
<br />
The principle of a LipidXplorer run is as follows: all queries are run successively on the given <br />
MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan<br />
from the smallest to the largest and checks the conditions given in the definition, <tt>IDENTIFY</tt>, <br />
<tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections, i.e. <br />
* it loads an MS mass,<br />
* it checks whether it fits a given sum composition or sc-constraint (definition and <tt>IDENTIFY</tt> section),<br />
* it looks into its MS/MS spectrum (if provided) and does the same (definition and <tt>IDENTIFY</tt> section),<br />
* the Boolean constraints are checked (<tt>SUCHTHAT</tt> section) and, if the result is positive, the MS mass is accepted and sent to the <tt>REPORT</tt> section.<br />
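<br />
The loop described above can be sketched in Python with a simplified, hypothetical MasterScan in which each MS mass carries a list of MS/MS peaks. The predicates passed to <tt>run_query</tt> stand in for the definition, <tt>IDENTIFY</tt>, <tt>SUCHTHAT</tt> and <tt>REPORT</tt> sections; none of these names are LipidXplorer APIs.<br />
<br />
```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One MS mass of a simplified, hypothetical MasterScan with its MS/MS peaks."""
    mz: float
    ms2: list = field(default_factory=list)


def run_query(master_scan, fits_precursor, fits_fragment, such_that, report):
    rows = []
    for entry in sorted(master_scan, key=lambda e: e.mz):  # smallest to largest
        if not fits_precursor(entry.mz):                    # definition + IDENTIFY (MS)
            continue
        if fits_fragment is not None and not any(fits_fragment(mz) for mz in entry.ms2):
            continue                                        # definition + IDENTIFY (MS/MS)
        if such_that(entry):                                # SUCHTHAT
            rows.append(report(entry))                      # REPORT
    return rows


scan = [Entry(786.6007, [184.0733]), Entry(744.5538, [603.5347]),
        Entry(760.5851, [184.0733, 577.5190])]
rows = run_query(scan,
                 fits_precursor=lambda mz: 700 <= mz <= 800,
                 fits_fragment=lambda mz: abs(mz - 184.0733) <= 0.01,
                 such_that=lambda e: True,
                 report=lambda e: e.mz)
print(rows)  # [760.5851, 786.6007]
```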
<br />
==(Multiple) Precursor Ion Scan / Neutral Loss Scan==<br />
<br />
The <tt>IDENTIFY</tt> part emulates precursor ion scans (PIS) and neutral loss <br />
scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. <br />
Switching from PIS to NLS is done in the definition part: when a variable gets <br />
charge zero (<tt>CHG = 0</tt>) or the keyword <tt>AS NEUTRALLOSS</tt> is given, it is <br />
treated as a neutral loss. Otherwise it is treated as a (fragment) mass.<br />
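<br />
The emulated scans can be sketched in Python on a hypothetical MasterScan that maps precursor m/z values to their MS/MS fragments (the values are illustrative). A PIS looks for a fragment m/z directly, an NLS compares precursor minus fragment with the loss (never fragment minus fragment), and an sc-constraint simply contributes many masses whose hits are combined.<br />
<br />
```python
# Hypothetical MasterScan: precursor m/z -> list of MS/MS fragment m/z values.
master_scan = {
    760.5851: [184.0733, 577.5190],
    718.5381: [577.5190],
    744.5538: [603.5347],
}


def precursor_ion_scan(scan, fragment_mz, tol_da):
    """Precursors whose MS/MS spectrum contains fragment_mz."""
    return sorted(pr for pr, frags in scan.items()
                  if any(abs(f - fragment_mz) <= tol_da for f in frags))


def neutral_loss_scan(scan, loss_mass, tol_da):
    """Precursors with a fragment at precursor - loss (never fragment - fragment)."""
    return sorted(pr for pr, frags in scan.items()
                  if any(abs((pr - f) - loss_mass) <= tol_da for f in frags))


def multiple_scan(single_scan, scan, masses, tol_da):
    """An sc-constraint stands for many masses: combine the single scans."""
    hits = set()
    for mass in masses:
        hits.update(single_scan(scan, mass, tol_da))
    return sorted(hits)


print(precursor_ion_scan(master_scan, 184.0733, 0.01))  # [760.5851]
print(neutral_loss_scan(master_scan, 141.0191, 0.01))   # [718.5381, 744.5538]
```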
<br />
==Examples==<br />
<br />
===Screen (without MS/MS experiments) for Phosphatidylcholine species===<br />
<br />
A "screen" is a fast identification based on MS information only. For <br />
screening to work properly, the masses should be highly accurate, because otherwise<br />
the identification error is too high.<br />
<br />
The name of the query here is <tt>Phosphatidylcholine</tt>. Giving a name <br />
to a query is obligatory and has to be done for every query. We define <br />
the sc-constraint <tt>prPC</tt> (short for "precursor of PC") and state <br />
that it should be found in the positive MS spectra. <br />
<br />
Names for variables are arbitrary. Meaningful names make a query <br />
easier to understand.<br />
<br />
The <tt>IDENTIFY</tt> section instructs LipidXplorer to look for the precursor mass<br />
in the MS spectrum.<br />
<br />
In <tt>SUCHTHAT</tt> we use a function to restrict the result to lipids<br />
having an overall even number of carbon atoms. This means that the two fatty<br />
acids of the lipid must be either both even-numbered or<br />
both odd-numbered. This way, we can sort out lipids that we know should<br />
not be present in the organism under examination. <br />
<br />
The <tt>REPORT</tt> section uses the following variables:<br />
* 'MASS' returns the m/z value of the MS mass<br />
* 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone. <br />
* 'CHEMSC' returns the chemical sum composition<br />
* 'INTENS' returns the abundance of the identified lipid species for all samples<br />
* 'ERROR' returns the error of the finding in ppm.<br />
<br />
<pre>##########################################################<br />
# Identify PC with checking the precursor mass #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
INTENS = prPC.intensity;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
The output of the query is the following:<br />
<br />
[[Image:Screenshot-output.png|center|600px|Output screenshot]]<br />
<br />
This is a screenshot of spreadsheet software holding the resulting <br />
data from the query. At the top are the variable names, followed by the <br />
name of the query and then the content. Note that for 'INTENS' <br />
the name of the file from which the sample data was taken is also written. <br />
Every entry in the result fulfills the constraints given in the query. <br />
If an expected value is not found, the query or the import settings <br />
should be refined. <br />
<br />
===In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode===<br />
<br />
In addition to the previous query, we have a variable 'headPC' <br />
which contains the sum composition of the specific PC head group <br />
fragment found in the fragment spectra after MS/MS of a <br />
PC species. This variable is added as a constraint in <tt>IDENTIFY</tt>. <br />
Thus a lipid is only identified if it fits the constraints <br />
of <tt>prPC</tt> <tt>AND</tt> has a <tt>headPC</tt> fragment <br />
in its MS/MS spectrum. Again, we test for even numbers of <br />
carbons in <tt>SUCHTHAT</tt>, which ensures we do not report borderline <br />
masses that cannot actually be present in the sample. In the output <br />
we additionally report the abundance of the head group fragment <br />
with <tt>FRAGINTENS</tt>.<br />
<br />
<pre>##########################################################<br />
# Identify PCs with checking the precursor mass #<br />
# AND check for PIS 184 in MS2 #<br />
##########################################################<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
<br />
IDENTIFY<br />
<br />
# marking<br />
prPC IN MS1+ AND<br />
headPC in MS2+<br />
<br />
SUCHTHAT<br />
<br />
isEven(prPC.chemsc[C])<br />
<br />
REPORT <br />
MASS = prPC.mass;<br />
NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";<br />
CHEMSC = prPC.chemsc;<br />
ERROR = "%2.2fppm" % "(prPC.errppm)";<br />
INTENS = prPC.intensity;<br />
FRAGINTENS = headPC.intensity;;<br />
<br />
################ end script ##################<br />
</pre><br />
<br />
===A more complex example for PE-plasmalogen===<br />
<br />
A complete script:<br />
<pre>###########################################################<br />
##### find PE-plasmalogens with MS2 in positive mode ######<br />
###########################################################<br />
<br />
# define sc-constraints and fragments for PE plasmalogens<br />
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;<br />
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;<br />
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;<br />
<br />
IDENTIFY PEplasmalogen WHERE<br />
<br />
# marking<br />
PR IN MS1+ AND<br />
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND<br />
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm<br />
<br />
SUCHTHAT<br />
<br />
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to<br />
# the precursor mass ('PR') with a tolerance of 0.5 Da and<br />
# the intensity of 'FRAG2' should be greater than 3/10 of<br />
# the intensity of 'FRAG1'<br />
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND<br />
FRAG1.intensity * 3 &lt; FRAG2.intensity * 10<br />
<br />
REPORT<br />
<br />
# first column is the precursor mass<br />
MASS = PR.mass;<br />
<br />
# second is the lipid name, generated with Python's string formatting operator<br />
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)";<br />
<br />
# third is the precursor's chemical sum composition<br />
CHEMSC = PR.chemsc;<br />
<br />
# fourth is the precursor intensity<br />
INTENS = PR.intensity;<br />
<br />
# fifth is the sum of the errors of both fragments in ppm<br />
ERROR = FRAG1.errppm + FRAG2.errppm;;<br />
</pre><br />
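The two SUCHTHAT conditions of this script can be checked by hand. The Python sketch below uses standard monoisotopic element masses (electron mass neglected) and hypothetical sum compositions: 'PR' = C39 H77 N1 O7 P1, 'FRAG1' = C19 H35 O3 and 'FRAG2' = C20 H43 N1 O4 P1. These were chosen only so that FRAG1 + FRAG2 - H equals PR. The intensities are made up for illustration.

```python
# Hypothetical hand check of the SUCHTHAT section of the PE-plasmalogen script.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}


def mass(sc):
    """Monoisotopic mass of a sum composition (electron mass neglected)."""
    return sum(MONO[el] * n for el, n in sc.items())


def suchthat(frag1, frag2, pr, int1, int2, tol_da=0.5):
    # FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da
    mass_ok = abs(mass(frag1) + mass(frag2) - MONO["H"] - mass(pr)) <= tol_da
    # FRAG1.intensity * 3 < FRAG2.intensity * 10
    intensity_ok = int1 * 3 < int2 * 10
    return mass_ok and intensity_ok


PR = {"C": 39, "H": 77, "N": 1, "O": 7, "P": 1}
FRAG1 = {"C": 19, "H": 35, "O": 3}
FRAG2 = {"C": 20, "H": 43, "N": 1, "O": 4, "P": 1}

print(suchthat(FRAG1, FRAG2, PR, int1=1000.0, int2=500.0))  # True
print(suchthat(FRAG1, FRAG2, PR, int1=1000.0, int2=200.0))  # False: 3000 >= 2000
```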
<br />
==More Examples==<br />
<br />
More examples can be found in the MFQL collection provided in<br />
the LipidXplorer wiki.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_News&diff=150LipidXplorer News2010-08-03T15:26:49Z<p>Schwudke: /* 2010-07-25 Bugfix for acquisition in different ionization modes / polarities */</p>
<hr />
<div>== 2008-07-16 NOT statement for IDENTIFY ==<br />
<br />
The 'NOT' statement can now be used in the IDENTIFY section, which allows negative queries. For example:<br />
DEFINE PR = 'C[20..100] H[30..200] N[1] O[8] P[1]' WITH DBR = (0,16), CHG = 1;<br />
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;<br />
IDENTIFY phosphatidylcholine WHERE<br />
PR IN MS1+ AND<br />
NOT pcHead IN MS2+ WITH TOLERANCE = 50ppm<br />
<br />
This returns the precursors matching 'PR' that show no 'pcHead' fragment in MS2+, e.g. to check whether 'PR' ignored some PC lipids.<br />
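In other words, the NOT condition keeps a precursor only if none of its MS/MS peaks matches 'pcHead' within the tolerance. Below is a minimal Python sketch of that filter. It assumes a ppm tolerance relative to the theoretical fragment mass. The spectrum values are made up, and 184.0733 is the monoisotopic m/z of the C5 H15 O4 P1 N1 cation.

```python
# Hypothetical sketch of a negative (NOT) query on MS/MS spectra.
PC_HEAD_MZ = 184.0733  # 'C5 H15 O4 P1 N1', charge 1


def within_ppm(measured, theoretical, tol_ppm):
    return abs(measured - theoretical) / theoretical * 1e6 <= tol_ppm


def lacks_fragment(ms2_mzs, frag_mz, tol_ppm=50.0):
    """True if no MS/MS peak matches frag_mz (the NOT ... IN MS2+ condition)."""
    return not any(within_ppm(mz, frag_mz, tol_ppm) for mz in ms2_mzs)


# precursor m/z -> fragment m/z values (made-up example data)
spectra = {
    760.5851: [184.0734, 577.5190],
    744.5538: [603.5347],
}
non_pc = [pr for pr, frags in spectra.items() if lacks_fragment(frags, PC_HEAD_MZ)]
print(non_pc)  # [744.5538]
```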
<br />
== 2008-07-16 column() function for SUCHTHAT ==<br />
<br />
A new function for the SUCHTHAT section: 'column(<regular expression>, <variable>)'. It restricts the given variable to the samples whose names match the regular expression, which allows selective queries.<br />
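The selection can be pictured as filtering the per-sample values of a variable by sample name. The Python sketch below makes two assumptions: the values are stored in a dict keyed by sample name, and a name matches if the regular expression is found anywhere in it (re.search). The sample names are hypothetical.

```python
import re


def column(pattern, sample_values):
    """Keep only the samples whose name matches the regular expression."""
    regex = re.compile(pattern)
    return {name: v for name, v in sample_values.items() if regex.search(name)}


intensities = {"hilde01": 1.2e5, "hilde02": 9.8e4, "blank01": 3.1e2}
print(column(r"^hilde", intensities))  # {'hilde01': 120000.0, 'hilde02': 98000.0}
```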
<br />
== 2008-07-20 +/- added to error attribute == <br />
<br />
The errors (errppm and errda) now carry the direction of the shift, i.e. they are signed. The error is calculated as <theoretical mass> - <measured mass>.<br />
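A small Python sketch of this sign convention follows. Taking errppm relative to the theoretical mass is an assumption, and the masses are example values.

```python
def signed_errors(theoretical, measured):
    """Return (errda, errppm) as <theoretical mass> - <measured mass>.
    The ppm value is assumed to be relative to the theoretical mass."""
    err_da = theoretical - measured
    err_ppm = err_da / theoretical * 1e6
    return err_da, err_ppm


err_da, err_ppm = signed_errors(760.5851, 760.5874)
print("%2.2fppm" % err_ppm)  # -3.02ppm: the measured mass is too high
```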
<br />
== 2008-08-07 new function 'isStandard()' and new function for sample grouping ==<br />
<br />
- New function isStandard(<variable>, "<sample>", "<scope>"), to be placed in the SUCHTHAT section.<br />
LipidXplorer searches <sample> for <variable>. Once found, this peak is declared the standard, and the<br />
intensities of all other peaks in <scope> are recalculated as ratios to it.<br />
Example: <br />
isStandard(varStandard, "hilde01", "MS1+")<br />
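One way to read this: the standard's intensity in the named sample becomes the divisor for the peaks in the scope. The Python sketch below assumes that this single reference value is applied to every sample. The data layout, function name and values are hypothetical.

```python
def normalize_to_standard(peaks, standard, sample):
    """peaks: {peak_name: {sample_name: intensity}} for one scope (e.g. MS1+).
    Every intensity is divided by the standard's intensity in `sample`."""
    reference = peaks[standard][sample]
    return {name: {s: i / reference for s, i in by_sample.items()}
            for name, by_sample in peaks.items()}


peaks = {
    "standard": {"hilde01": 2.0e5, "hilde02": 1.0e5},
    "PC 34:1": {"hilde01": 4.0e5, "hilde02": 3.0e5},
}
ratios = normalize_to_standard(peaks, "standard", "hilde01")
print(ratios["PC 34:1"])  # {'hilde01': 2.0, 'hilde02': 1.5}
```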
<br />
- New function for addressing groups of samples with placeholders:<br />
<br />
Patterns use Unix shell-style wildcards:<br />
<br />
* '*' matches everything<br />
* '?' matches any single character<br />
* '[seq]' matches any character in seq<br />
* '[!seq]' matches any character not in seq<br />
<br />
Example: <br />
FA1.intensity["*hilde0[1-9]*"] > FA1.intensity["*hilde1[0-9]*"]<br />
<br />
Also possible:<br />
column(FA1, "*hilde0[1-9]*")<br />
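These four rules are the ones implemented by Python's fnmatch module, so the grouping can be sketched with fnmatch.fnmatchcase (case-sensitive on every platform). This entry does not state whether LipidXplorer uses fnmatch internally, or how it compares two groups of samples. The sketch therefore only shows which samples each pattern selects, using hypothetical sample names.

```python
from fnmatch import fnmatchcase


def select_samples(values, pattern):
    """Return the per-sample values whose sample name matches a shell-style pattern."""
    return {name: v for name, v in values.items() if fnmatchcase(name, pattern)}


fa1_intensity = {
    "run_hilde01.mzXML": 5.0, "run_hilde09.mzXML": 7.0,
    "run_hilde10.mzXML": 2.0, "run_hilde15.mzXML": 3.0,
}
group_a = select_samples(fa1_intensity, "*hilde0[1-9]*")
group_b = select_samples(fa1_intensity, "*hilde1[0-9]*")
print(sorted(group_a))  # ['run_hilde01.mzXML', 'run_hilde09.mzXML']
print(sorted(group_b))  # ['run_hilde10.mzXML', 'run_hilde15.mzXML']
```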
<br />
== 2008-08-13 syntax change for 'isStandard()' ==<br />
<br />
Changed syntax: isStandard(<variable>, "<scope>"). With this syntax, the standard is<br />
calculated separately for every sample.<br />
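In this form, each sample is normalized to the standard measured in that same sample. Below is a minimal Python sketch. It assumes intensities are stored per peak and per sample in nested dicts (a hypothetical layout) and that the standard is present in every sample; a missing sample raises a KeyError.

```python
def normalize_per_sample(peaks, standard):
    """peaks: {peak_name: {sample_name: intensity}} for one scope.
    Each intensity is divided by the standard's intensity in the same sample."""
    std = peaks[standard]
    return {name: {s: i / std[s] for s, i in by_sample.items()}
            for name, by_sample in peaks.items()}


peaks = {
    "standard": {"hilde01": 2.0e5, "hilde02": 1.0e5},
    "PC 34:1": {"hilde01": 4.0e5, "hilde02": 3.0e5},
}
print(normalize_per_sample(peaks, "standard")["PC 34:1"])  # {'hilde01': 2.0, 'hilde02': 3.0}
```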
<br />
== 2008-08-14 more peak information ==<br />
<br />
Every peak (MS or MS/MS) now additionally carries the following information:<br />
# peak mean<br />
# peak median<br />
# peak variance<br />
# peak standard deviation<br />
All of this information is included in the MasterScan dump.<br />
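This entry does not say which values these statistics are computed over. Assuming they describe the m/z values of the individual scan peaks that were merged into one peak, they correspond to the Python statistics functions below. The use of sample variance and standard deviation is also an assumption, and the m/z values are made up.

```python
import statistics


def peak_stats(mz_values):
    """Summary statistics of the m/z values merged into one peak."""
    return {
        "mean": statistics.mean(mz_values),
        "median": statistics.median(mz_values),
        "variance": statistics.variance(mz_values),  # sample variance (n - 1)
        "stdev": statistics.stdev(mz_values),
    }


stats = peak_stats([760.5849, 760.5853, 760.5851, 760.5856])
print(round(stats["median"], 4))  # 760.5852
```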
<br />
== 2008-08-22 bugfix for 'isStandard()' ==<br />
<br />
The isStandard() function now works for MS1+ and MS1- (not for MS2+/-). It should be placed in the SUCHTHAT<br />
section and takes two arguments: 1) a variable containing the marked standard and 2) the scope of<br />
the standard (MS1+, MS1-, MS2+ or MS2-). The following MFQL script identifies<br />
PC and calculates the standard:<br />
<br />
QUERYNAME = Phosphatidylcholine;<br />
DEFINE PR = 'C[36..50] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;<br />
DEFINE DietherPC = 'C44 H93 O6 N1 P1' WITH CHG = 1;<br />
DEFINE DietherPE = 'C45 H95 O6 N1 P1' WITH CHG = 1;<br />
<br />
IDENTIFY Phosphatidylcholine WHERE<br />
<br />
# marking<br />
PR IN MS1+ WITH TOLERANCE = 4 ppm OR<br />
DietherPC IN MS1+<br />
<br />
SUCHTHAT<br />
isEven(PR.chemsc[C]) AND<br />
isStandard(DietherPC, "MS1+")<br />
<br />
REPORT <br />
MASS = "%4.4f" % "(PR.mass)";<br />
NAME = "PC [%d:%d]" % "((PR.chemsc)[C] - 8, (PR.chemsc)[db] - 1.5)";<br />
PRECURINTENS = PR.intensity;;<br />
<br />
== 2008-09-02 implementation/bugfix for the de-isotopic algorithm ==<br />
<br />
De-isotoping was implemented for MS mode. The algorithm uses the sum compositions<br />
calculated by the MFQL scripts in use, so only molecular species of interest<br />
are considered for de-isotoping. The algorithm is as follows:<br />
# sort the MS spectrum by increasing mass.<br />
# for every mass m which has a chemical sum composition assigned:<br />
## check whether there are masses i1, i2, i3 or i4 that could be isotopes, i.e. whether there is a mass i1 = m + 1.0033, a mass i2 = m + 2 * 1.0033, and so on.<br />
## calculate the isotopic distribution of m for 13C only. Isotopes of other elements contribute so little for lipids that they are neglected. The distribution is binomial, with a probability of 0.01082 that a carbon atom is 13C <insert citation here>.<br />
## subtract the calculated isotopic percentage from i1 to i4.<br />
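The MS steps above can be sketched in Python as follows. The sketch reads 'isotopic percentage' as the intensity expected at m + k * 1.0033 given the monoisotopic intensity, i.e. I(m) * P(k)/P(0), with a binomial P over the number of carbons. That reading, the 0.002 Da matching window and clipping negative intensities to zero are all assumptions. The spectrum is made up.

```python
from math import comb

P13C = 0.01082        # probability that a carbon atom is 13C (value from the text)
ISO_SPACING = 1.0033  # m/z spacing of the isotope peaks (value from the text)


def isotope_ratios(n_carbons, max_k=4, p=P13C):
    """Binomial 13C distribution relative to the monoisotopic peak, P(k)/P(0)."""
    p0 = (1 - p) ** n_carbons
    return [comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k) / p0
            for k in range(1, max_k + 1)]


def deisotope(spectrum, carbons, tol=0.002):
    """spectrum: {mz: intensity}; carbons: {mz: number of C} for the peaks
    that have a sum composition assigned. Returns a corrected copy."""
    corrected = dict(spectrum)
    for mz in sorted(corrected):                  # increasing m/z
        if mz not in carbons:
            continue
        for k, ratio in enumerate(isotope_ratios(carbons[mz]), start=1):
            target = mz + k * ISO_SPACING         # candidate isotope i_k
            for cand in corrected:
                if abs(cand - target) <= tol:
                    expected = corrected[mz] * ratio
                    corrected[cand] = max(0.0, corrected[cand] - expected)
    return corrected


spectrum = {760.5851: 1000.0, 761.5884: 460.0, 762.5917: 110.0}
corrected = deisotope(spectrum, carbons={760.5851: 42})
```

Processing in increasing m/z order matters because a corrected lighter peak may itself be the monoisotopic peak of an overlapping heavier species.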
<br />
De-isotoping of MS/MS spectra was revised. The algorithm is as follows:<br />
# generate artificial precursor ion scan (PIS) spectra P1, ..., Pn for fragments f1, ..., fn by collecting the precursor masses which have f1 (f2, ..., fn) in their MS/MS spectrum.<br />
# for every PIS spectrum Pi:<br />
## for every mass m in Pi:<br />
### check whether there are masses i1, i2, i3 or i4 that could be isotopes<br />
### calculate the isotopic distribution of the neutral loss of m according to <insert citation here><br />
### subtract the calculated isotopic percentage from i1 to i4<br />
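The first MS/MS step, building the artificial precursor ion scans, can be sketched in Python as below. Summing the matching fragment intensities and the 0.01 Da window are assumptions. The isotope correction of the neutral loss is not sketched, because the text leaves its reference open. The data are made up.

```python
from collections import defaultdict


def build_pis(msms, fragments, tol=0.01):
    """msms: {precursor_mz: {fragment_mz: intensity}}.
    Returns {fragment_mz: {precursor_mz: intensity}}, i.e. one artificial
    precursor ion scan (PIS) per fragment of interest."""
    pis = defaultdict(dict)
    for prec, frags in msms.items():
        for f in fragments:
            hits = [i for mz, i in frags.items() if abs(mz - f) <= tol]
            if hits:
                pis[f][prec] = sum(hits)
    return dict(pis)


msms = {
    760.5851: {184.0733: 5000.0, 577.5190: 300.0},
    761.5884: {184.0733: 2300.0, 578.5223: 120.0},
    744.5538: {603.5347: 800.0},
}
pis = build_pis(msms, fragments=[184.0733])
print(pis)  # {184.0733: {760.5851: 5000.0, 761.5884: 2300.0}}
```

In the resulting scan for 184.0733, the precursor 761.5884 lies 1.0033 above 760.5851, so it is a candidate isotope i1 for the per-PIS correction steps.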
<br />
== 2008-10-01 GUI enhancement with a debugging window ==<br />
<br />
Implemented a debugging window in the GUI.<br />
<br />
The GUI layout is now more compact.<br />
<br />
Updated the merging algorithm (for *.mzXML import): average masses are now calculated as intensity-weighted averages, which yields more accurate spectra.<br />
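For peaks that are merged across scans, the intensity-weighted average m/z can be sketched in Python as follows. The input values are made up.

```python
def weighted_mz(peaks):
    """peaks: list of (mz, intensity) pairs merged into one peak."""
    total = sum(i for _, i in peaks)
    return sum(mz * i for mz, i in peaks) / total


merged = weighted_mz([(760.584, 1000.0), (760.586, 3000.0)])
print(round(merged, 4))  # 760.5855, pulled toward the more intense scan peak
```

A plain average would give 760.585 here. The weighted version lets more intense (and usually more precise) scan peaks dominate.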
<br />
== 2008-10-09 no 'cleaning' procedure for *.mzXML files any more ==<br />
<br />
Switched off the cleaning procedure for imported *.mzXML files.<br />
<br />
If more than one sum composition is found for a precursor mass, all of them are reported, ordered by identification error.<br />
<br />
Fixed a bug with *.mzXML files that contain only MS/MS spectra.<br />
<br />
== 2008-10-23 new feature: generate a complement MasterScan ==<br />
<br />
New function: complementMasterScan. Enable it with a checkbox on the Run panel. It produces the "complement MasterScan" of the current query, i.e. a MasterScan with all peaks that were not identified in the current run. It is saved as <original MasterScan name>-complement.sc.<br />
<br />
The purpose is to allow blind queries for unknown or unexpected sum compositions.<br />
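Conceptually, the complement is a set difference between all peaks of the MasterScan and the peaks identified in the run. Below is a Python sketch that uses a hypothetical peak representation (plain m/z values). It assumes that "original MasterScan name" means the file name without its extension.

```python
import os


def complement_peaks(all_peaks, identified):
    """Peaks of the MasterScan that were not identified in the current run."""
    identified = set(identified)
    return [p for p in all_peaks if p not in identified]


def complement_name(masterscan_path):
    """'<original MasterScan name>-complement.sc' (extension assumed to be dropped)."""
    root, _ = os.path.splitext(masterscan_path)
    return root + "-complement.sc"


print(complement_peaks([744.5538, 760.5851, 782.5694], identified=[760.5851]))
# [744.5538, 782.5694]
print(complement_name("hilde.sc"))  # hilde-complement.sc
```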
<br />
== 2008-11-18 new features for the DEFINE section ==<br />
<br />
New function: DEFINE now allows several variables with the same definition but different names to be declared in one line. Just write the names as a list:<br />
<br />
old:<br />
DEFINE FA1 = 'C[14..22] H[20..60] O[2] N[1]' WITH DBR = (0.0,6.0), CHG = 0;<br />
DEFINE FA2 = 'C[14..22] H[20..60] O[2] N[1]' WITH DBR = (0.0,6.0), CHG = 0;<br />
DEFINE FA3 = 'C[14..22] H[20..60] O[2] N[1]' WITH DBR = (0.0,6.0), CHG = 0;<br />
new:<br />
DEFINE (FA1, FA2, FA3) = 'C[14..22] H[20..60] O[2] N[1]' WITH DBR = (0.0,6.0), CHG = 0;<br />
<br />
New function: DEFINE allows the definition of a list of fragments.<br />
For example:<br />
DEFINE FA = ('C14 H27 O2 N1',<br />
'C14 H29 O2 N1',<br />
'C14 H31 O2 N1',<br />
'C15 H21 O2 N1',<br />
'C15 H31 O2 N1');<br />
<br />
== 2009-07-29 enhanced GUI concerning syntax errors on MFQL queries ==<br />
<br />
A new mechanism for controlling the debug output of LipidX was implemented. It offers many possibilities to improve error handling, both for the user and for me, the developer. For example, a SYNTAX ERROR is now shown more clearly and points the user more accurately to its location. If LipidX crashes, the debug output can easily be selected and copied into an e-mail to the developer to report the crash. This function will be improved in the future to make it even more convenient for the user and the developer, and new functions will be added.<br />
<br />
== 2010-06-25 Bugfix for isotopic correction ==<br />
Bug: isotope-corrected peaks were not discarded correctly according to the threshold<br />
and min occupation settings.<br />
<br />
== 2010-07-25 Bugfix for import of acquisitions in different ionization modes / polarities ==<br />
Spectra from *.mzXML files acquired in positive and negative ionization mode can now be imported simultaneously. Note: one *.mzXML file can only<br />
have one polarity, but *.mzXML files with different polarities can now<br />
be imported into the same MasterScan.<br />
<br />
== 2010-07-30 Bugfix: MS/MS scan counts ==<br />
The number of averaged MS/MS scans is now stored properly. This value is used for a correct<br />
calculation of the MS/MS threshold.<br />
<br />
== 2010-08-03 tolerance and min occupation settings in the Run pane ==<br />
MS and MS/MS tolerance values can now be set in the Run pane. This allows setting those<br />
two attributes for all queries at once. All tolerance settings given in the Import<br />
and in the MFQL files are overridden! The same goes for min occupation.</div>Schwudkehttps://wiki.mpi-cbg.de/lipidxscr/index.php?title=LipidXplorer_News&diff=149LipidXplorer News2010-08-03T15:26:18Z<p>Schwudke: /* 2010-07-25 Bugfix for acquisition in different ionization modes / polarity */</p>