LipidXplorer MFQL

From LipidXplorer

Revision as of 12:19, 21 January 2011

Introduction

MFQL is the first query language developed for the identification of molecules in complex shotgun mass spectrometry datasets. It formalizes the available or assumed knowledge of lipid fragmentation pathways into queries that are used to probe a MasterScan database.

Structural complexity of lipid species and sum composition constraints

Figure: Let us consider PC as a representative example. PC molecules consist of a phosphorylcholine head group attached to the glycerol backbone at the sn-3 position, while fatty acid moieties occupy the sn-1 and sn-2 positions (alternatively, a fatty alcohol moiety could be attached at the sn-1 position). Fatty acid moieties differ in the number of carbon atoms and double bonds, but also in their position on the glycerol backbone, so isomeric structures having exactly the same fatty acid moieties are possible. Note that isomeric structures are always isobaric, whereas isobaric molecules are not necessarily isomeric. The most generic constraints ("All lipids of PC class" or "All PC esters") encompass the sum compositions of species with all naturally occurring fatty acids. However, because of this fatty acid variability, some species of other lipid classes (such as PE) might meet the same constraint. Therefore, for the most common glycerophospholipid classes, the characterization of individual molecular species cannot rely solely on their intact masses, however accurately they are measured. MS/MS experiments that produce structure-specific ions contribute more specific constraints, such as the number of carbons and double bonds in individual moieties, a characteristic head group fragment, or the characteristic loss of a fatty acid moiety. Within an MFQL query, these constraints can be bundled by Boolean operations.

A short tutorial

Below we present an example of composing a MFQL query for identifying PC lipids in a typical shotgun dataset.

In MS/MS experiments, molecular cations of PC species produce a specific phosphorylcholine head group fragment with the sum composition 'C5 H15 O4 N1 P1' and m/z 184.07 (see #MFQL identification of phosphatidylcholines (PC)). The identification of PC species therefore starts with identifying probable precursors in the MS spectrum using accurately determined masses and proceeds with identifying the phosphorylcholine head group fragment in their MS/MS spectra.

A query for phosphatidylcholine (PC) lipids could be:

  • find all precursor masses that fit the following set of sum compositions: "C[30..48] H[30..200] O[8] P[1] N[1]", and
  • check whether the "C5 H15 O4 P1 N1" fragment (or m/z 184.07) is present in their MS/MS spectra;
  • if both conditions hold, a phosphatidylcholine has been identified and the lipid species can be reported.
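The two conditions above can be sketched in Python. This is not LipidXplorer code: it assumes a precursor is given as a dictionary of element counts and its MS/MS spectrum as a plain list of m/z values, and the helper names (`parse_sc`, `is_pc`) and the 0.01 Da fragment tolerance are illustrative choices.

```python
import re

# Monoisotopic element masses (Da) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946


def parse_sc(text):
    """Parse a fixed sum composition such as 'C5 H15 O4 P1 N1' into a dict."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", text)}


def cation_mz(sc):
    """m/z of a singly charged cation with this sum composition."""
    return sum(MONO[el] * n for el, n in sc.items()) - ELECTRON


HEAD_PC_MZ = cation_mz(parse_sc("C5 H15 O4 P1 N1"))  # about 184.0733


def fits_pc_precursor(sc):
    """Condition 1: sc lies inside 'C[30..48] H[30..200] O[8] P[1] N[1]'."""
    return (set(sc) == {"C", "H", "O", "P", "N"}
            and 30 <= sc["C"] <= 48 and 30 <= sc["H"] <= 200
            and sc["O"] == 8 and sc["P"] == 1 and sc["N"] == 1)


def has_head_fragment(ms2_mzs, tol_da=0.01):
    """Condition 2: the MS/MS spectrum has a peak at the head group m/z."""
    return any(abs(mz - HEAD_PC_MZ) <= tol_da for mz in ms2_mzs)


def is_pc(precursor_sc, ms2_mzs):
    # Both conditions must hold before the species is reported.
    return fits_pc_precursor(precursor_sc) and has_head_fragment(ms2_mzs)


# Hypothetical precursor C42 H83 N1 O8 P1 with a 184.07 fragment in MS/MS.
print(is_pc(parse_sc("C42 H83 N1 O8 P1"), [184.0735, 496.3398]))  # True
```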

MFQL identification of phosphatidylcholines (PC)

Figure: The chemical structure of PC is shown in the figure above. Upon collisional fragmentation, molecular cations of PC produce a specific head group fragment with m/z 184.07 and sum composition 'C5 H15 O4 P1 N1'. A: MS spectrum acquired by direct infusion of a total lipid extract into a QSTAR mass spectrometer (inset). All detectable peaks were subjected to MS/MS. The spectrum acquired from the precursor m/z 788.5 (designated by the arrow) is presented in the lower panel. The precursor ion was isolated within a 1 Da mass range, and therefore several isobaric lipid precursors were co-isolated for MS/MS and produced abundant fragment ions unrelated to PC. These ions were disregarded by the MFQL query and did not affect PC identification. B: MFQL query identifying PC species; details are provided in the text. C: screenshot of the output spreadsheet file; column annotation and content are determined by the REPORT section of the above MFQL query (see also the text for details).


To illustrate the structure of MFQL and the meaning of its commands, we now walk through the example script for the identification of PC species. First, let us assign a name to the query:

QUERYNAME = Phosphatidylcholine;

Next, we define the variables used for identifying the species. Our query should identify the singly charged PC head group fragment, so we define:

DEFINE
headPC = 'C5 H15 O4 N1 P1' WITH CHG = +1;

The keyword CHG states the charge of the ion.

In a shotgun experiment, not all fragmented peaks originate from PCs. For higher search specificity, we next define the precursors (prPC) that are expected to produce the headPC fragment in MS/MS spectra. We impose an sc-constraint on precursor masses: besides the sum composition requirements, it requires that precursors are singly charged and that their unsaturation (expressed as a double bond equivalent with the keyword DBR) lies within a certain range (here from 1.5 to 7.5):

DEFINE
prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH CHG = +1, DBR = (1.5, 7.5);
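To make DBR concrete, the sketch below computes a double bond equivalent from a sum composition. The formula C - H/2 + N/2 + P/2 + 1 is an assumption on our part (we have not checked LipidXplorer's source); it is chosen because it gives 2.5 for the [M+H]+ ion of PC 34:1 (C42 H83 N1 O8 P1), which matches the `prPC.chemsc[db] - 1.5` term used for naming in the REPORT section of this query.

```python
def dbe(sc):
    """Double bond equivalent of a sum composition (dict of element counts).

    Assumption: N and P count as trivalent, giving C - H/2 + N/2 + P/2 + 1.
    This reproduces the '- 1.5' offset the PC query uses for naming.
    """
    return (sc.get("C", 0) - sc.get("H", 0) / 2
            + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1)


def within_dbr(sc, dbr=(1.5, 7.5)):
    """True if the double bond equivalent lies inside the DBR interval."""
    low, high = dbr
    return low <= dbe(sc) <= high


pc_34_1 = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}  # [M+H]+ of PC 34:1
print(dbe(pc_34_1))  # 2.5
```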

Next, the IDENTIFY section specifies that prPC precursors should be identified in MS spectra and headPC fragments in MS/MS spectra, both acquired in positive mode. The logical operator AND requires that headPC is searched for only in the MS/MS spectra of prPC precursors.

IDENTIFY
   prPC IN MS1+ AND
   headPC IN MS2+

We further limit the search space by applying optional, project-specific compositional constraints formulated in the SUCHTHAT section. For example, it is generally assumed that mammals do not produce fatty acids with an odd number of carbon atoms. Therefore, an identified lipid comprising an odd-numbered fatty acid moiety is likely a false positive.

SUCHTHAT
    isEven(prPC.chemsc[C]);

In this case the operator isEven requires candidate PC precursors to contain an even number of carbon atoms. Since the head group and the glycerol backbone of PC together contain 8 carbon atoms (5 and 3, respectively), an even total means that the two fatty acid moieties together contain an even number of carbons, so a lipid cannot comprise one odd-numbered and one even-numbered fatty acid moiety.

By executing the DEFINE, IDENTIFY and SUCHTHAT sections, LipidXplorer recognizes spectra pertinent to PC species. The last section, REPORT, defines how these findings are reported. This includes the annotation of the recognized lipid species, the abundances of characteristic ions for subsequent quantification, and any additional information pertinent to the analysis, such as masses and mass differences (errors). LipidXplorer outputs the findings as a *.csv file in which identified species are in rows, while the content of the columns is user-defined. In this example we define six columns: NAME, the species name, along with five peak attributes: MASS, the species mass; CHEMSC, the chemical sum composition; ERROR, the difference to the calculated mass; INTENS, the precursor intensities reported for each individual acquisition; and FRAGINTENS, the corresponding intensities of the headPC fragment.

REPORT
   MASS = prPC.mass;
   NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";
   CHEMSC = prPC.chemsc;
   ERROR = "%dppm" % "(prPC.errppm)";
   INTENS = prPC.intensity;
   FRAGINTENS = headPC.intensity;;


It is also possible to apply mathematical terms or certain functions, such as text formatting, to these attributes. A text format consists of two strings separated by %: the first string contains placeholders and the second string their content. This formatting is used in the NAME string so that the annotation convention remains at the user's discretion. In this example, the two %d placeholders of the lipid class name PC [%d:%d] are filled with the number of carbon atoms and double bonds in the fatty acid moieties. The number of carbon atoms is calculated by subtracting the sum composition of headPC from that of the precursor prPC and then subtracting 3 for the carbons of the glycerol backbone (Figures 5 and 6); the number of double bonds is obtained by subtracting 1.5 from the precursor's double bond equivalent (prPC.chemsc[db]).
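The NAME expression can be reproduced with Python's own %-formatting. In this sketch, sum compositions are plain dictionaries, the precursor is a hypothetical [M+H]+ ion of PC 34:1, and the double bond equivalent uses the assumed formula C - H/2 + N/2 + P/2 + 1 (not confirmed against LipidXplorer's source).

```python
def sc_minus(a, b):
    """Element-wise difference a - b, like prPC.chemsc - headPC.chemsc."""
    return {el: a.get(el, 0) - b.get(el, 0) for el in set(a) | set(b)}


def dbe(sc):
    # Assumed double bond equivalent: C - H/2 + N/2 + P/2 + 1
    return (sc.get("C", 0) - sc.get("H", 0) / 2
            + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1)


pr_pc = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}    # hypothetical [PC 34:1 + H]+
head_pc = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}   # headPC

# Same arithmetic as the NAME line: carbons minus head group minus 3 glycerol
# carbons, and double bond equivalent minus 1.5 (%d truncates the float).
name = "PC [%d:%d]" % (sc_minus(pr_pc, head_pc)["C"] - 3, dbe(pr_pc) - 1.5)
print(name)  # PC [34:1]
```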

General rules in MFQL queries

  1. Everything written after # is ignored by the interpreter; use it to write comments in the code.
  2. Every line has to end with ;
  3. Every query has to end with an extra ;


The structure of an MFQL query

An MFQL query consists of three or four sections:

1. DEFINE: defines sum compositions, sc-constraints (see also #sc-constraints), masses or groups of masses and associates them with user-defined names.

2. IDENTIFY: determines where and how the DEFINE content is applied. It usually encompasses searches for precursor and/or fragment ions in MS and MS/MS spectra.

3. SUCHTHAT: is optional. It defines constraints that are formulated as mathematical expressions and inequalities, numerical values, peak attributes (see Supporting Information S-4), sum compositions and functions. Several individual constraints can be bundled by logical operations and applied together.

4. REPORT: establishes the content and format of the output.

After REPORT comes a list of variables (MASS, NAME, ...), each of which represents a column in the output file. The content of each column is defined after the =. More on REPORT can be found in Part 4: The REPORT section.

Sc-constraints

For dealing with sets of chemical sum compositions, LipidXplorer uses a special format called the sum composition constraint (sc-constraint). An sc-constraint specifies a class of lipids as a collection of chemical sum compositions: each element is given either with a fixed count, such as O[10], or with a range of counts, such as C[38..54]. Sc-constraints are used by several functions, especially for screening tasks or multiple scans. Here is an example:

'C[38..54] H[30..130] O[10] N[1] P[1]' WITH DBR=(2.5,9.5), CHG = -1;
  • DBR means 'Double Bond Range' and specifies the allowed range of double bond equivalents.
  • CHG states the charge. If the charge is set to zero, the sc-constraint is treated as a collection of neutral losses.
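An sc-constraint can be read as a generator of sum compositions. The following sketch expands the example above into all element combinations and keeps those whose double bond equivalent lies within DBR. The parsing regex, the function names and the double bond equivalent formula (C - H/2 + N/2 + P/2 + 1) are assumptions for illustration, and the CHG option is ignored.

```python
import itertools
import re


def parse_constraint(text):
    """'C[38..54] H[30..130] O[10]' -> {'C': range(38, 55), ...}."""
    ranges = {}
    for el, low, high in re.findall(r"([A-Z][a-z]?)\[(\d+)(?:\.\.(\d+))?\]", text):
        ranges[el] = range(int(low), int(high or low) + 1)
    return ranges


def dbe(sc):
    # Assumed double bond equivalent: C - H/2 + N/2 + P/2 + 1
    return (sc.get("C", 0) - sc.get("H", 0) / 2
            + sc.get("N", 0) / 2 + sc.get("P", 0) / 2 + 1)


def expand(text, dbr):
    """Yield every sum composition allowed by the sc-constraint and DBR."""
    ranges = parse_constraint(text)
    elements = list(ranges)
    for counts in itertools.product(*(ranges[el] for el in elements)):
        sc = dict(zip(elements, counts))
        if dbr[0] <= dbe(sc) <= dbr[1]:
            yield sc


compositions = list(expand("C[38..54] H[30..130] O[10] N[1] P[1]", dbr=(2.5, 9.5)))
print(len(compositions))  # 255
```

Enumerating the product is fine at this scale (a few thousand combinations); a real implementation would more likely prune the H range from the C count directly.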

The 4 sections of a MFQL query

Part 1: Definition of sum compositions, sc-constraints and masses

The first statement of any query is

QUERYNAME = <name of the query>

to give the query a unique name.

Next, variables are defined. Its syntax is

DEFINE <variable name> = (<chemical sum composition> | <sc-constraint> | <mass>) (WITH (<option> = <value>)+)?

After the keyword DEFINE comes the name of the variable, followed by an equals sign and its content. This can be a chemical sum composition, an sc-constraint or a list of sum compositions. Sum compositions and sc-constraints are written in single quotes. The content can be followed by WITH and one or more options:

  1. DBR is the double bond range of an sc-constraint. It is a 2-tuple with the minimum and the maximum double bond equivalent allowed for the sc-constraint.
  2. CHG states the charge

If the fragment should be a neutral loss, this can be stated by setting the charge to zero with CHG = 0 or by writing AS NEUTRALLOSS after the sum composition or sc-constraint.

NOTE: The neutral loss is always calculated between the precursor mass and the fragment, never between two fragments.
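Because the neutral loss is always taken between precursor and fragment, the expected fragment m/z for a neutral-loss variable follows from a simple subtraction. A sketch for singly charged precursors, using monoisotopic element masses; the precursor m/z 718.5381 is a hypothetical [M+H]+ value for PE 34:1 (C39 H77 N1 O8 P1) chosen for illustration.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}


def neutral_mass(sc):
    """Monoisotopic mass of an uncharged sum composition."""
    return sum(MONO[el] * n for el, n in sc.items())


def neutral_loss(precursor_mz, fragment_mz):
    # Always precursor minus fragment, never fragment minus fragment.
    return precursor_mz - fragment_mz


def expected_fragment_mz(precursor_mz, loss_sc):
    """Fragment m/z left after a singly charged precursor loses loss_sc."""
    return precursor_mz - neutral_mass(loss_sc)


pe_head = {"C": 2, "H": 8, "O": 4, "N": 1, "P": 1}  # peHead, AS NEUTRALLOSS
print(round(neutral_mass(pe_head), 4))                    # 141.0191
print(round(expected_fragment_mz(718.5381, pe_head), 4))  # 577.519
```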

Examples

Define the PC-O sc-constraint and the PC-O head group fragment associated with the precursor mass:

DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;

Define the PE sc-constraint and the PE head group neutral loss associated with the precursor mass:

DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE peHead = 'C2 H8 O4 N1 P1' AS NEUTRALLOSS;

Define the sc-constraints and fragments for PE plasmalogens:

DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;

Any number of variables can be defined, but they are valid only within the current query, i.e. not in other queries of the same Run.

Part 2: The IDENTIFY section

The previously defined variables are queried against the experiment database. The syntax is:

IDENTIFY

<identification 1> AND
<identification 2> AND
...
<identification n>

The headline 'IDENTIFY' is followed by identifications connected by 'AND'. The result of an identification can be a single mass or a set, i.e. more than one mass can be identified for a variable; this holds especially for sc-constraints. This section is the first filtering step: it returns True only if the Boolean expression is true, i.e. if every individual identification is true.

An identification looks like this:

<variable name> IN (MS1+ | MS1- | MS2+ | MS2-) (WITH (<option> = <value>,)+)?

Here LipidXplorer checks the existence of certain masses or fragment masses. The scope (MS level) is stated after 'IN': the 'MS1+', 'MS1-', 'MS2+' and 'MS2-' tags name the MS level in which to look for the sum composition ('MS1+' means positive MS, while 'MS2-' means negative MS/MS). Options can be specified after an optional 'WITH':

  1. 'TOLERANCE' states the tolerance within which a mass should be identified. It can be given in three units:
    1. 'ppm' - parts per million
    2. 'da' - Dalton and
    3. 'res' - resolution
  2. 'MASSRANGE' is a 2-tuple constraining the mass of interest.
  3. 'MINOCC' is a float between 0 and 1 that states the minimum occupation threshold for this mass across all samples, i.e. the minimum fraction of samples in which the mass must occur.

For example:

  • A tolerance of 10 ppm would be: "TOLERANCE = 10ppm".
  • "MASSRANGE = (700, 1000)" considers only masses from m/z 700 to m/z 1000.
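The three tolerance units and MASSRANGE can be sketched as simple predicates. The ppm and Dalton windows follow directly from their definitions; treating a resolution value R as a window of m/R Da is an assumption, not documented LipidXplorer behaviour.

```python
def within_tolerance(measured, theoretical, tol, unit):
    """Check a measured m/z against a theoretical one.

    unit: 'ppm', 'da' or 'res'. For 'res' we assume a resolution R
    corresponds to a window of theoretical / R Da (an assumption).
    """
    delta = abs(measured - theoretical)
    if unit == "ppm":
        return delta / theoretical * 1e6 <= tol
    if unit == "da":
        return delta <= tol
    if unit == "res":
        return delta <= theoretical / tol
    raise ValueError("unknown tolerance unit: %s" % unit)


def in_mass_range(mz, mass_range):
    """MASSRANGE = (low, high): keep only masses inside the interval."""
    low, high = mass_range
    return low <= mz <= high


print(within_tolerance(760.5881, 760.5851, 5, "ppm"))  # True (about 3.9 ppm)
print(in_mass_range(650.0, (700, 1000)))               # False
```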

Some examples:

# Phosphatidylcholine ether species
DEFINE PR = 'C[30..48] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE pcHead = 'C5 H15 O4 P1 N1' WITH CHG = 1;

IDENTIFY Phosphatidylcholineether WHERE

  # the MS mass should fit to 'PR' and it should have a MS/MS fragment mass fitting to 'pcHead'
  PR IN MS1+ WITH TOLERANCE = 5ppm AND
  # we are not so strict with the tolerance for the low resolution MS/MS spectra
  pcHead in MS2+ WITH TOLERANCE = 250ppm

################################################################################

# Phosphatidylethanolamine
DEFINE PR = 'C[30..46] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;
DEFINE peHead = 'C2 H8 O4 N1 P1' WITH CHG = 0;

IDENTIFY Phosphatidylethanolamine WHERE

  # marking 
  PR IN MS1+ WITH TOLERANCE = 5ppm AND
  peHead in MS2+ WITH TOLERANCE = 0.5Da

################################################################################

# PE Plasmalogen
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;

IDENTIFY PEplasmalogen WHERE

  # marking
  PR IN MS1+ WITH TOLERANCE = 5ppm AND
  FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND
  FRAG2 IN MS2+ WITH TOLERANCE = 500ppm

Part 3: The SUCHTHAT section

After the collection of specific masses, more constraints can be added to the query. For example, the identification of PE plasmalogens requires marking 'FRAG1' and 'FRAG2', each of which can match several masses since both are sc-constraints (see the example above), and a test of whether the two fragments together match the precursor mass, i.e. is "FRAG1 + FRAG2 == PR"? Such a constraint is formulated in the optional 'SUCHTHAT' section as equations, inequalities and functions connected by Boolean operators. The syntax is:

SUCHTHAT
(((NOT)? (<equation> | <unequation> | <function>)) |
((NOT)? (<equation> | <unequation> | <function>) (AND | OR))+) (WITH (<option> = <value>)+)?

Terms can be built up with the basic arithmetic operators +, -, * and /, and parentheses can be used. Terms are connected as equations by '==' and as inequalities by '<', '>', '<=', '>=' and '!=' (not equal). The values in the terms can be marked masses (given by their variable name), floating point numbers or chemical sum compositions. Attributes of marked masses can also be addressed by writing the attribute after the variable name, separated by a dot; for example, the intensity of the peak 'PR' is addressed as PR.intensity. A list of peak attributes can be found here: #List of peak attributes

Functions

In addition to attributes, SUCHTHAT supports functions. The list of all functions can be found here: #List of functions

Part 4: The REPORT section

All successful identifications are piped to the REPORT section, where the format of the output is specified. In general, REPORT consists of a list of variables, each of which represents a column; the content of the variable is the content of the column. For example, the following code generates a column named MASS containing the m/z values of PR's identified species:

REPORT
  MASS = PR.mass

The next example reports the sum of the intensities of two fragments:

REPORT
  INTENS = frag1.intensity + frag2.intensity

Often the two fragments can be the same peak (for example, with two fatty acid scans), so LipidXplorer provides a special function that does not sum the intensities of identical fragments:

REPORT
  INTENS = sumIntensity(frag1, frag2)

The syntax of REPORT is:

REPORT
<variable name> = (<variable> | <equation>)

The content of the variable can be any attribute and/or term as in the SUCHTHAT section. The REPORT section has an additional feature with which it is possible to generate lipid names or other formatted strings.

The syntax for this function is:

REPORT
(<variable name> = "<format string>" % "<list of variables for the format string>")*

The string format works as follows: two strings are given, separated by %. The first string contains the output format, i.e. a string with placeholders. Placeholders can be %d for integer values, %.nf for floating point values with n decimals and %s for string values. The second string contains the list of values for the placeholders, in order. For example:

REPORT
  LIPIDNAME = "PC [%d:%d]" % "(fa1PC.chemsc[C] + fa2PC.chemsc[C], fa1PC.chemsc[db] + fa2PC.chemsc[db])"

The variable LIPIDNAME contains the string "PC [... : ...]". The first %d is filled with the sum of the carbon atoms of both fatty acids (fa1PC, fa2PC) and the second with the sum of their double bonds. The output could be, for example, "PC [36:2]".
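Since MFQL borrows Python's %-formatting, the placeholders can be tried directly in Python. The fatty acid carbon and double bond counts below are hypothetical values standing in for `fa1PC` and `fa2PC`; note that the second operand is a real Python tuple here, whereas MFQL writes it as a quoted string.

```python
# Hypothetical fatty acid fragments standing in for fa1PC and fa2PC.
fa1 = {"C": 16, "db": 0}
fa2 = {"C": 18, "db": 1}

lipid_name = "PC [%d:%d]" % (fa1["C"] + fa2["C"], fa1["db"] + fa2["db"])
error = "%2.2fppm" % 2.5       # %.nf style: two decimals, as in the ERROR columns
label = "%s species" % "PC"    # %s: string placeholder

print(lipid_name, error, label)  # PC [34:1] 2.50ppm PC species
```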

The format string is borrowed from Python: MFQL uses Python's standard %-style string formatting (see the Python documentation for more information).

Notes

  • If a lipid was not found in a particular sample, its intensity is set to zero.
  • If the isotopic correction corrects an intensity to zero or less than zero, it is set to '-1'

List of peak attributes

error

The difference between the theoretical mass (according to the sum composition) and the tagged mass from the spectrum. The error can be given in three forms:

  1. errppm -> error in ppm
  2. errda -> error in dalton
  3. errres -> error as resolution value

mass

The m/z value of the peak

chemsc

The chemical sum composition. To address a particular element of the sum composition, write the element in brackets after .chemsc. For example, to get the number of C atoms in a formula:
PR.chemsc[C]
  1. frsc -> the chemical sum composition of the fragment. If the peak is a fragment, it is the same as chemsc; if it is a neutral loss, it returns the sum composition of the corresponding fragment.
  2. nlsc -> the chemical sum composition of the neutral loss. If the peak is a neutral loss, it is the same as chemsc; if it is a fragment, it returns the sum composition of the neutral loss from the precursor.
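The relation between chemsc, frsc and nlsc can be sketched with sum compositions stored as dictionaries. The helper names are hypothetical; the example uses the [M+H]+ ion of PC 34:1 (C42 H83 N1 O8 P1) with the PC head group fragment.

```python
def sc_minus(a, b):
    """a - b element-wise, dropping elements whose count becomes zero."""
    diff = {el: a.get(el, 0) - b.get(el, 0) for el in set(a) | set(b)}
    if any(n < 0 for n in diff.values()):
        raise ValueError("peak is not contained in the precursor")
    return {el: n for el, n in diff.items() if n}


def frsc_nlsc(precursor, peak, is_neutral_loss):
    """Return (frsc, nlsc) for a matched MS/MS peak."""
    if is_neutral_loss:
        # the peak's chemsc is the loss; the fragment is what remains
        return sc_minus(precursor, peak), peak
    # the peak's chemsc is the fragment; the loss is the difference
    return peak, sc_minus(precursor, peak)


pr = {"C": 42, "H": 83, "N": 1, "O": 8, "P": 1}     # PR.chemsc
head = {"C": 5, "H": 15, "N": 1, "O": 4, "P": 1}     # fragment peak
print(pr["C"])                                       # like PR.chemsc[C]: 42
nlsc = frsc_nlsc(pr, head, is_neutral_loss=False)[1]  # C37 H68 O4
```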

intensity

All the intensities of a mass across the samples. Note that intensity is usually not a single value but a list of intensities, with one entry per sample. If used in an equation or inequality, the whole list is considered, i.e. PR.intensity > 10000 is true if and only if all intensities are greater than 10000. It is possible to address only a subset of the samples by writing a sample name pattern as a string with wildcards (* and/or ?). For example, PR.intensity["*blank*"] returns only the samples with the string "blank" in their name, which could be all blank samples. This feature allows sample groups to be defined simply by naming the samples according to their group. In this way, many different constraints can be stated that increase the accuracy of the interpretation or even interpret the result directly. For example:

 avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100 

This statement asserts that the average intensity in the blank samples must be less than one percent of the average intensity in all experimental samples ("*exp*"). This simply discards every "lipid" that is obviously noise.
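A sketch of the list semantics and the wildcard sample selection, using Python's `fnmatch` module for `*`/`?` patterns. The sample names and intensities are made up, and whether LipidXplorer matches patterns case-sensitively is an assumption (this sketch does).

```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical per-sample intensities of one peak (PR.intensity).
intensity = {"blank_01": 120.0, "blank_02": 80.0,
             "exp_01": 25000.0, "exp_02": 31000.0}


def select(intens, pattern):
    """PR.intensity["*blank*"]: intensities of samples whose name matches."""
    return [v for name, v in intens.items() if fnmatchcase(name, pattern)]


def all_greater(values, threshold):
    # PR.intensity > 10000 holds only if every intensity exceeds the threshold.
    return all(v > threshold for v in values)


# avg(PR.intensity["*blank*"]) < avg(PR.intensity["*exp*"]) / 100
not_noise = mean(select(intensity, "*blank*")) < mean(select(intensity, "*exp*")) / 100
print(all_greater(intensity.values(), 10000), not_noise)  # False True
```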

binsize

The size of the bin of the peak coming from the averaging algorithm. The value is given in Dalton.

occ

The occupation of the peak: occupation = number of samples in which the peak occurs / total number of samples.
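Occupation and the MINOCC threshold from the IDENTIFY options reduce to a ratio. This sketch assumes that a sample counts as an occurrence whenever its intensity is non-zero, taking 0 to mean "not found" as stated in the Notes of the REPORT section.

```python
def occupation(intensities):
    """Fraction of samples in which the peak occurs.

    Assumption: a sample counts as an occurrence when its intensity is
    non-zero (0 marks 'not found', see the Notes of the REPORT section).
    """
    found = sum(1 for v in intensities if v != 0)
    return found / len(intensities)


def passes_minocc(intensities, minocc):
    # MINOCC in IDENTIFY: minimum occupation between 0 and 1
    return occupation(intensities) >= minocc


print(occupation([0, 1200.0, 800.0, 0]))  # 0.5
```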

List of functions

isEven(n)

where n is an integer value. The function returns True, if n is even. E.g.: isEven(PR.chemsc[C]).

isOdd(n)

where n is an integer value. The function returns True, if n is odd.

avg(v.intensity)

where v is a variable. The function returns the average of the intensities of v. E.g.:
avg(PR.intensity)

isStandard(v, scope)

where v is a variable and scope is "MS1+", "MS1-", "MS2+" or "MS2-". This function is special because it does not return anything. It enables the automatic calculation of standardized intensities according to the standard given in v, i.e. every intensity is calculated relative to v.
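Assuming "relative to v" means dividing each sample's intensity by the standard's intensity in the same sample, isStandard can be sketched as below. Returning NaN when the standard is missing (intensity of 0 or less) is our own choice, not documented behaviour.

```python
import math


def standardize(intensities, standard):
    """Per-sample intensity divided by the standard's intensity in that sample.

    Returning NaN where the standard is missing (<= 0) is an assumption.
    """
    return [v / s if s > 0 else math.nan for v, s in zip(intensities, standard)]


lipid = [2000.0, 3000.0, 500.0]   # hypothetical sample intensities
std = [1000.0, 1500.0, 0.0]       # hypothetical standard intensities
print(standardize(lipid, std))    # [2.0, 2.0, nan]
```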

sumIntensity(f1, f2, ...)

The function sumIntensity() sums up the intensities of different MS2 entries when multiple peaks are required for identification and quantification. For fragments carrying the isotope-correction placeholder '-1' (see Notes above), the following rules are implemented.

If all MasterScan entries in the MS2 for a particular molecule are placeholders (i.e. all are set to '-1'), those values are simply added, resulting in <math>n_i \times (-1) = -n_i</math>, where <math>n_i</math> is the number of attributes.

If at least one entry has an intensity greater than zero, all <math>-1</math> placeholders are treated as zero and are not added to the overall sum. In the following example, we assume that two MS2 entries were used in the sumIntensity() function:

F1    F2    sumIntensity(F1, F2)
-1    -1    -2
 0    -1    -1
 1    -1     1
 2    -1     2
 2     0     2

That has following consequences when such results have to be interpreted:

A) intensity = 0: none of the required fragments was present in this sample.

B) intensity < 0: some of the required fragments were found in the initial MasterScan but set to '-1', and no fragment above the threshold (1) was present.

C) intensity = -<math>n_i</math>: all fragments were below the threshold (1) after isotopic correction.

D) intensity > 0: at least one of the required fragments was above the threshold (1) after isotopic correction.
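The placeholder rules above fit in a few lines of Python, applied to the values of one sample. This sketch covers only the '-1' handling shown in the table; the de-duplication of identical fragments mentioned in the REPORT section is not modelled.

```python
def sum_intensity(*values):
    """Combine one sample's fragment intensities following the rules above.

    -1 is the placeholder for an intensity corrected to <= 0.
    """
    if any(v > 0 for v in values):
        # At least one fragment has real signal: placeholders count as zero.
        return sum(v for v in values if v > 0)
    # No fragment above zero: add everything, so n placeholders give -n.
    return sum(values)


for f1, f2 in [(-1, -1), (0, -1), (1, -1), (2, -1), (2, 0)]:
    print(f1, f2, sum_intensity(f1, f2))
```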

Some examples

SUCHTHAT
# the number of 'C' atoms in 'PR's chemical sum composition should be odd
isOdd(PR.chemsc[C])

SUCHTHAT
# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to
# the precursor mass ('PR') with a tolerance of 0.5 dalton and
# the intensity of 'FRAG2' should be greater than 3/10 of
# the intensity of 'FRAG1'
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
FRAG1.intensity * 3 < FRAG2.intensity * 10
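Read literally, this constraint tests every combination of the masses marked for FRAG1 and FRAG2. A sketch of that combinatorial check with hypothetical m/z values; it follows the MFQL expression as written (sum of the two fragment masses minus one H atom, 0.5 Da tolerance) and is not LipidXplorer's implementation.

```python
from itertools import product

H_MASS = 1.00782503207  # monoisotopic mass of one H atom


def matching_pairs(pr_mz, frag1_mzs, frag2_mzs, tol_da=0.5):
    """FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = tol_da, over all combinations."""
    return [(f1, f2) for f1, f2 in product(frag1_mzs, frag2_mzs)
            if abs(f1 + f2 - H_MASS - pr_mz) <= tol_da]


def intensity_ok(frag1_int, frag2_int):
    # FRAG1.intensity * 3 < FRAG2.intensity * 10, i.e. FRAG2 > 0.3 * FRAG1
    return frag1_int * 3 < frag2_int * 10


# Hypothetical marked masses for one precursor.
print(matching_pairs(750.54, [311.26, 400.0], [440.28, 500.0]))  # [(311.26, 440.28)]
```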

How LipidXplorer runs multiple MFQL queries

The principle of a LipidXplorer Run is the following: all queries run successively on the given MasterScan. For every query, LipidXplorer iterates through the list of MS masses of the MasterScan from the smallest to the largest and checks the conditions given in the DEFINE, IDENTIFY, SUCHTHAT and REPORT sections, i.e.

  • it loads an MS mass;
  • it checks whether the mass fits a given sum composition or sc-constraint (DEFINE and IDENTIFY sections);
  • it looks into the MS/MS spectrum of the mass (if provided) and does the same (DEFINE and IDENTIFY sections);
  • the Boolean constraints are checked (SUCHTHAT section) and, if the result is positive, the MS mass is accepted and sent to the REPORT section.
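The run order described above can be sketched as a nested loop. The `MSEntry` and `Query` types are hypothetical stand-ins for MasterScan entries and compiled queries; each section is passed as a plain function, and Python's short-circuiting `and` mirrors the order IDENTIFY (MS, then MS/MS), SUCHTHAT, REPORT.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MSEntry:
    """Hypothetical stand-in for one MS mass of the MasterScan."""
    mz: float
    ms2: List[float] = field(default_factory=list)


@dataclass
class Query:
    """Hypothetical compiled query: one callable per section."""
    name: str
    precursor_ok: Callable[[MSEntry], bool]          # DEFINE + IDENTIFY on MS
    fragments_ok: Callable[[MSEntry], bool]          # DEFINE + IDENTIFY on MS/MS
    suchthat: Callable[[MSEntry], bool]              # SUCHTHAT constraints
    report: Callable[[MSEntry], Dict[str, object]]   # REPORT columns


def run(queries, masterscan):
    rows = []
    for query in queries:  # queries run one after another
        for entry in sorted(masterscan, key=lambda e: e.mz):  # smallest m/z first
            # 'and' short-circuits, so later sections only see accepted masses
            if (query.precursor_ok(entry) and query.fragments_ok(entry)
                    and query.suchthat(entry)):
                rows.append({"QUERY": query.name, **query.report(entry)})
    return rows


pc = Query(
    name="Phosphatidylcholine",
    precursor_ok=lambda e: 700 <= e.mz <= 900,
    fragments_ok=lambda e: any(abs(m - 184.0733) <= 0.01 for m in e.ms2),
    suchthat=lambda e: True,
    report=lambda e: {"MASS": e.mz},
)
scan = [MSEntry(800.0, [184.07]), MSEntry(760.585, [184.074]), MSEntry(650.0, [184.07])]
print(run([pc], scan))
```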

(Multiple) Precursor Ion Scan / Neutral Loss Scan

The IDENTIFY section emulates precursor ion scans (PIS) and neutral loss scans (NLS). If the variable is an sc-constraint, it emulates multiple PIS/NLS. Switching from PIS to NLS is done in the DEFINE section: when a variable has charge zero (CHG = 0) or the keyword AS NEUTRALLOSS is given, it is treated as a neutral loss; otherwise it is treated as a (fragment) mass.
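A sketch of how a single variable behaves as a precursor ion scan or a neutral loss scan on one MS/MS spectrum, depending on its charge. The function names, the 0.01 Da tolerance and the PE values (a hypothetical [M+H]+ at m/z 718.5381 with fragments at 577.519 and 184.07) are illustrative assumptions; looping over several target masses emulates the multiple scans produced by an sc-constraint.

```python
def scan(precursor_mz, ms2_mzs, target, charge, tol_da=0.01):
    """Emulate one scan on one MS/MS spectrum.

    charge != 0: precursor ion scan, target is a fragment m/z.
    charge == 0: neutral loss scan, target is the neutral loss mass.
    """
    if charge == 0:
        return [mz for mz in ms2_mzs if abs((precursor_mz - mz) - target) <= tol_da]
    return [mz for mz in ms2_mzs if abs(mz - target) <= tol_da]


def multiple_scans(precursor_mz, ms2_mzs, targets, charge, tol_da=0.01):
    # An sc-constraint expands to several targets, i.e. several PIS/NLS at once.
    return {t: scan(precursor_mz, ms2_mzs, t, charge, tol_da) for t in targets}


ms2 = [577.519, 184.07]                   # hypothetical MS/MS peaks
print(scan(718.5381, ms2, 141.0191, 0))   # NLS: [577.519]
print(scan(718.5381, ms2, 184.0733, 1))   # PIS: [184.07]
```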

Examples

Screen (without MS/MS experiments) for Phosphatidylcholine species

A "screen" is a fast identification based on MS information only. Screening requires highly accurate masses; otherwise the identification error is too high.

The name of the query here is Phosphatidylcholine. Giving a name to a query is obligatory and has to be done for every query. We define the sc-constraint prPC (short for "precursor of PC") and state that it should be found in the positive MS spectra.

Names for variables are arbitrary, but meaningful names make a query easier to understand.

The IDENTIFY section instructs LipidXplorer to look for the precursor mass in the MS spectrum.

In SUCHTHAT, we use a function to restrict the result to lipids with an even total number of carbon atoms. Since the PC head group and glycerol backbone contribute 8 carbons, this means that the two fatty acids of the lipid must either both be even-numbered or both be odd-numbered. In this way, we can sort out lipids that we know should not occur in the organism we examine.

The REPORT section uses the following variables:

  • 'MASS' returns the m/z value of the MS mass
  • 'NAME' returns the lipid species' name, which consists of the number of carbon atoms and double bonds of the fatty acids. These numbers are obtained by taking the number of carbons/double bonds from the sum composition (prPC.chemsc[C]/prPC.chemsc[db]) and subtracting the carbons/double bonds belonging to the PC head group and glycerol backbone.
  • 'CHEMSC' returns the chemical sum composition
  • 'INTENS' returns the abundance of the identified lipid species for all samples
  • 'ERROR' returns the error of the finding in ppm.
##########################################################
# Identify PC with checking the precursor mass #
##########################################################

QUERYNAME = Phosphatidylcholine;
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (2.5,9), CHG = 1;

IDENTIFY

  # marking
  prPC IN MS1+

SUCHTHAT
  isEven(prPC.chemsc[C])

REPORT 
  MASS = prPC.mass;
  NAME = "PC [%d:%d]" % "((prPC.chemsc)[C] - 8, (prPC.chemsc)[db] - 1.5)";
  CHEMSC = prPC.chemsc;
  INTENS = prPC.intensity;
  ERROR = "%2.2fppm" % "(prPC.errppm)"; ;

################ end script ##################

The output of the query is the following:

(Figure: screenshot of the query output)

The screenshot shows the resulting data opened in spreadsheet software. The first row holds the variable names followed by the name of the query; the content follows below. Note that for 'INTENS' the name of the file from which the sample data was taken is also written. Every entry in the result fulfills the constraints given in the query. If an expected value is not found, the query or the import settings should be refined.

In-depth analysis for Phosphatidylcholine species in MS and MS/MS mode

In addition to the previous query, we define a variable 'headPC' containing the sum composition of the PC-specific head group, which is found in the fragment spectra after MS/MS of a PC species. This variable is added as a constraint in IDENTIFY, so a lipid is identified only if it fits the constraints of prPC AND has a headPC fragment in its MS/MS spectrum. Again, we test for an even number of carbons in SUCHTHAT, which excludes borderline masses that cannot actually be present in the sample. The output additionally contains the abundance of the head group fragment in FRAGINTENS.

##########################################################
# Identify PCs with checking  the precursor mass         #
# AND check for PIS 184 in MS2                           #
##########################################################

QUERYNAME = Phosphatidylcholine;
DEFINE prPC = 'C[30..48] H[30..200] N[1] O[8] P[1]' WITH DBR = (1.5,7.5), CHG = 1;
DEFINE headPC = 'C5 H15 O4 P1 N1' WITH CHG = 1;

IDENTIFY
        
        # marking
        prPC IN MS1+ AND
        headPC in MS2+

SUCHTHAT
        
        isEven(prPC.chemsc[C])
  
REPORT 
    MASS = prPC.mass;
    NAME = "PC [%d:%d]" % "((prPC.chemsc - headPC.chemsc)[C] - 3, prPC.chemsc[db] - 1.5)";
    CHEMSC = prPC.chemsc;
    ERROR = "%2.2fppm" % "(prPC.errppm)";
    INTENS = prPC.intensity;
    FRAGINTENS = headPC.intensity;;
        
################ end script ##################

A more complex example for PE-plasmalogen

An example for a whole script:

###########################################################
##### find PE-plasmalogens with MS2 in positive mode ######
###########################################################

# define sc-constraints and fragments for PE-Plasmalogen
DEFINE PR = 'C[30..46] H[30..200] N[1] O[7] P[1]' WITH DBR = (1.5,8), CHG = 1;
DEFINE FRAG1 = 'C[14..26] H[20..80] O[3]' WITH DBR = (1.5,9), CHG = 1;
DEFINE FRAG2 = 'C[14..26] H[20..80] N[1] O[4] P[1]' WITH DBR = (1.5,9), CHG = 1;

IDENTIFY PEplasmalogen WHERE

# marking
PR IN MS1+ AND
FRAG1 IN MS2+ WITH TOLERANCE = 500ppm AND
FRAG2 IN MS2+ WITH TOLERANCE = 500ppm

SUCHTHAT

# the sum of both fragments ('FRAG1', 'FRAG2') minus one 'H' should be equal to
# the precursor mass ('PR') with a tolerance of 0.5 dalton and
# the intensity of 'FRAG2' should be greater than 3/10 of
# the intensity of 'FRAG1'
FRAG1 + FRAG2 - 'H1' == PR WITH TOLERANCE = 0.5Da AND
FRAG1.intensity * 3 < FRAG2.intensity * 10

REPORT

# first column is the precursor mass
MASS = PR.mass,

# second is the lipid's name generated with Python's string formatting function
NAME = "PE-O [%d:%dp / %d:%d]" % "(FRAG1.frsc[C], FRAG1.frsc[db] - 2, FRAG2.frsc[C], FRAG2.frsc[db] - 2)",

# third is the precursor's chemical sum composition
CHEMSC = PR.chemsc,

# fourth the intensity
INTENS = PR.intensity,

# fifth the sum of the error of both fragments in ppm
ERROR = FRAG1.errppm + FRAG2.errppm;;

More Examples

More examples can be found in the MFQL collection provided in the LipidXplorer wiki.