LipidXplorer Reference

Revision as of 16:12, 4 March 2011 by Schwudke (talk | contribs) (*.mzXML)

LipidXplorer Import

Supported file formats

*.mzXML

mzXML is an XML (eXtensible Markup Language) based common file format for mass spectrometric data (Pedrioli PG et al., Nat. Biotechnol. 22 (11): 1459-66; Lin SM et al., Expert Review of Proteomics 2 (6): 839-45). Not all mass spectrometers directly produce mzXML files, but several tools are available that generate mzXML files from natively acquired files. An open source project known as Sashimi offers a collection of converter programs for some common mass spectrometric file formats. Converters are currently available:

  • for Thermo Scientific Xcalibur *.raw files: ReAdW,
  • for Waters MassLynx *.raw files: MassWolf, and
  • for Sciex/ABI Analyst *.wiff files: mzWiff.

LipidXplorer provides automatic conversion of data from ThermoFinnigan (Orbitrap) and Applied Biosystems (QStar) instruments, provided that the instrument's software is installed on the same computer as LipidXplorer.

Import of peak lists of MS/MS in *.dta format and MS in *.csv

An easy way to make the functionality of LipidXplorer available for a wide range of mass spectrometric platforms is to provide the ability to import pre-processed peak lists. The software of many vendors can create *.dta files of MS/MS spectra. In many instances one may also want to import the pre-processed peak list of the MS1 spectrum, which we support with the widely used *.csv file format. Both text file formats should be readily available as an alternative to *.mzXML. For the import of *.dta and *.csv files, some preconditions have to be met: the import files have to be arranged in a certain directory structure, which is:

                         MasterScan Dir/
                                |
                                |
          ------------------------------------------------------ 
          |                       |                            |
          |                       |                            |
       [neg_]Sample1/      [neg_]Sample2/ ... ... ... [neg_]SampleN/
          |                       |                            |
         /\                       |                           /\ 
       *.csv,                    /\                         *.csv,
[*.dta1, *.dta2, ...]          *.csv,                [*.dta1, *.dta2, ...]
                        [*.dta1, *.dta2, ...] 

The top level directory defines which samples go into the MasterScan database object: these are all samples occurring as subdirectories.
A sample directory can contain

 1. a .csv file with the MS data,

 2. a number of .dta files with the MS/MS data; in this case the MS precursor intensities are set to a) 1 when a *.dta with this precursor is present, b) 0 when no *.dta with this precursor m/z was found in a sample, or

 3. one .csv file with the MS data and a number of .dta files containing the MS/MS data.


IMPORTANT! The names of the sample sub-directories should encode whether the data was acquired in positive or in negative mode. This is done as follows:

  • if a directory has 'neg' at the beginning of its name, the corresponding sample is negative.
  • if a directory has 'pos' at the beginning of its name, the corresponding sample is positive.

The names of the samples occurring in LipidXplorer are the names of the sample directories.
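
The directory conventions above can be checked before starting an import. The following is a minimal sketch and not LipidXplorer's own code: the function name scan_masterscan_dir and the layout of the returned dictionary are assumptions made for illustration, and directories without a 'neg' or 'pos' prefix are simply reported as 'unknown'.

```python
import os

def scan_masterscan_dir(root):
    """Return {sample_name: {"polarity", "csv", "dta"}} for each sample sub-directory.

    Hypothetical helper: it only mirrors the naming rules described in the text.
    """
    samples = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # only sub-directories count as samples
        if name.startswith("neg"):
            polarity = "negative"
        elif name.startswith("pos"):
            polarity = "positive"
        else:
            polarity = "unknown"
        files = sorted(os.listdir(path))
        samples[name] = {
            "polarity": polarity,
            "csv": [f for f in files if f.lower().endswith(".csv")],
            "dta": [f for f in files if f.lower().endswith(".dta")],
        }
    return samples
```

The keys of the returned dictionary are the directory names, which are also the sample names shown in LipidXplorer.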

Import of MS1 information using *.csv file format

A *.csv file is a comma separated file, i.e. every line in the file contains data which is separated by commas. LipidXplorer only recognizes *.csv files for importing survey scan information (the MS experiment data) in the following format:

/precursor mass/, /intensity/

The *.csv file represents the (precursor) mass spectrum. For example, a section of a *.csv file:

701.4101,20952.3
701.5598,4284.7
702.4135,6333
702.5435,23323.7
703.547,7105.8
703.5752,218373.4
704.5786,81777.7
705.5009,253758
705.528,18535.5
705.5822,8314.5
705.5908,35523.1
706.5044,107847.3

Import of MS/MS spectra using *.dta

The software of many mass spectrometers can generate peak lists of MS/MS spectra and save them in the *.dta file format. A *.dta file contains a peak list table: the header line gives the precursor mass in m/z and its charge, and the remaining lines give the fragment masses with their corresponding intensities.

/mass/ /intensity/

For example - the content of a *.dta file of the precursor mass 585.9765 with charge +1:

585.9765 1
197.32957 33132.1
197.33095 12631.7
568.45007 241767.3
569.29065 14319.8
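
The two peak list formats described above are simple enough to read with a few lines of code. This is a minimal sketch for checking your own files, not the parser LipidXplorer uses; the function names parse_ms_csv and parse_dta are hypothetical, and the sample strings are taken from the examples above.

```python
def parse_ms_csv(text):
    """Parse MS1 lines of the form 'precursor mass,intensity' into (m/z, intensity) tuples."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        mz, intensity = line.split(",")
        peaks.append((float(mz), float(intensity)))
    return peaks

def parse_dta(text):
    """Parse a *.dta peak list: header 'precursor charge', then 'mass intensity' rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    precursor, charge = float(rows[0][0]), int(rows[0][1])
    fragments = [(float(mass), float(intensity)) for mass, intensity in rows[1:]]
    return precursor, charge, fragments

csv_text = "701.4101,20952.3\n701.5598,4284.7\n702.4135,6333\n"
dta_text = "585.9765 1\n197.32957 33132.1\n568.45007 241767.3\n"

ms1_peaks = parse_ms_csv(csv_text)
precursor, charge, fragments = parse_dta(dta_text)
```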

Importing mass spectra into LipidXplorer

LipidXplorer can import spectra acquired in profile mode and in centroid mode. Internally it only works with centroid data, which we also call peak lists. This means that data given as profiles is converted to centroided data.

If the spectra are given in mzXML file format, all files which should go into one MasterScan (see #The_MastersScan_database) should also be in one folder. This folder is what is given to LipidXplorer to import the spectra. IMPORTANT: *.mzXML files have to be centroided to achieve correct import with LipidXplorer.

If the spectra are given in *.csv/*.dta file format, follow the instructions given in #Import_.2A.dta_.2F_.2A.csv_files. Also here, the folder where all the peak lists are contained is the input for the LipidXplorer import.

Choose the folder with your mass spectral data by pressing the green 'Browse' button or drag the folder into the text field with your mouse. LipidXplorer will fill the fields for the target MasterScan file automatically. To change this press 'Browse' next to the file.

Select a machine specific configuration from the Select configuration list, edit the settings and store them in the configuration file.

The import starts when 'Start import' is pressed.

The tab offers various ways of specifying mass spectrometric attributes. The configurations are stored in an *.ini file. A standard *.ini file is provided, but by pressing 'Browse' next to the *.ini file, the user can select their own file.

Machine specific settings

For all settings, a value of '0' switches the setting off.

selection window: describes the size of the window which is used by the mass spectrometer to select the precursor for fragmentation. The size of a given selection window w of a peak p is <math>[p-\frac{w}{2}, p+\frac{w}{2}]</math>. The value w has to be given in Dalton.
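
As a small worked illustration of the interval above, the following sketch computes the selection window and lists the MS1 peaks that fall inside it. The function names are hypothetical and only restate the formula; they are not part of LipidXplorer.

```python
def selection_window(p, w):
    """Return the interval [p - w/2, p + w/2] in Da for precursor m/z p and window width w."""
    half = w / 2.0
    return (p - half, p + half)

def peaks_in_window(peaks_mz, p, w):
    """Return all m/z values from peaks_mz that lie inside the selection window of p."""
    low, high = selection_window(p, w)
    return [mz for mz in peaks_mz if low <= mz <= high]
```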

timerange: defines the time window for all spectra which should be imported. It is a tuple (start time, end time), with the times given in seconds.

calibration masses: a list of standard masses can be given here, which are used for a linear offset correction in MS and MS/MS spectra. The standard masses are searched in the spectra within an allowed error given in tolerance. If found, the mass error is used to calculate and apply a mass shift through the whole spectrum. If more than one mass is given, a linear function connects the shift values.
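
The calibration step can be sketched as follows. This is an illustration under stated assumptions, not LipidXplorer's implementation: the tolerance is taken in Da, the nearest peak within tolerance is used as the match, the shift values are connected piecewise linearly, and outside the calibrated range the nearest shift is held constant. All function names are hypothetical.

```python
def find_shifts(peaks_mz, calibration_masses, tolerance_da):
    """For each calibration mass found within tolerance_da, return (measured m/z, shift).

    Assumes peaks_mz is a non-empty list of m/z values.
    """
    shifts = []
    for ref in calibration_masses:
        nearest = min(peaks_mz, key=lambda mz: abs(mz - ref))
        if abs(nearest - ref) <= tolerance_da:
            shifts.append((nearest, ref - nearest))
    return sorted(shifts)

def shift_at(mz, shifts):
    """Shift to apply at mz: constant for one anchor, linear between neighbouring anchors."""
    if not shifts:
        return 0.0  # no calibration mass found: leave the spectrum unchanged
    if mz <= shifts[0][0]:
        return shifts[0][1]
    if mz >= shifts[-1][0]:
        return shifts[-1][1]
    for (x0, s0), (x1, s1) in zip(shifts, shifts[1:]):
        if x0 <= mz <= x1 and x1 > x0:
            return s0 + (s1 - s0) * (mz - x0) / (x1 - x0)
    return 0.0

def calibrate(peaks_mz, calibration_masses, tolerance_da):
    """Return the m/z list with the offset correction applied to every peak."""
    shifts = find_shifts(peaks_mz, calibration_masses, tolerance_da)
    return [mz + shift_at(mz, shifts) for mz in peaks_mz]
```

With a single calibration mass the whole spectrum is moved by one constant value; with two or more, peaks between the anchors receive an interpolated shift.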

massrange: restricts the imported masses. This helps to decrease import time and resource usage and to speed up lipid identification.

resolution: the resolution of the mass spectrometer in MS and MS/MS mode. This value is used in the import for the spectra averaging and alignment. Both algorithms consider m/z values as equal if they are closer than the resolution allows.

tolerance: The tolerance value is the error LipidXplorer allows for a lipid to be identified. The unit has to be given in parts per million (ppm) or Dalton (Da).

threshold: the minimum intensity a peak has to have to be in the MasterScan. Be aware that the intensity values in your mzXML file may differ from those in your mass spec software (like Analyst or Xcalibur)! Note that for the threshold value the peak intensity is read from the mzXML file and not from the original .wiff or .raw files. All peaks below the threshold are dismissed. LipidXplorer corrects the threshold value by dividing it by the square root of the number of scans in the given mass spectra time segment. This is due to the increase of information with more scans. The central limit theorem is used to model this.
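
The threshold correction described above amounts to a one-line calculation. The sketch below only restates that arithmetic; the function name is hypothetical, and treating a threshold of 0 as "switched off" follows the general rule for all settings.

```python
import math

def corrected_threshold(threshold, n_scans):
    """Divide the user threshold by sqrt(number of scans); 0 means the filter is off."""
    if threshold == 0 or n_scans < 1:
        return threshold
    return threshold / math.sqrt(n_scans)
```

For example, a threshold of 100 applied to a time segment containing 25 scans becomes 100 / 5 = 20.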

min occupation: states the minimum fraction of acquisitions in which a mass has to occur. For example, a min occupation of 0.5 states that each ion should be present in at least 50% of all samples.
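
A minimal sketch of the occupation check, assuming that a mass counts as present in a sample when its intensity there is non-zero. The function name and the input layout (one intensity per sample) are assumptions for illustration.

```python
def passes_min_occupation(intensities, min_occupation):
    """intensities: one value per sample, 0 (or None) where the mass was not detected."""
    if not intensities:
        return False
    present = sum(1 for value in intensities if value)
    return present / len(intensities) >= min_occupation
```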

resolution gradient: the gradient of the machine's resolution in MS and MS/MS mode. E.g., a value of -78.5 means that the resolution decreases by about 78.5 with every increase of 1 m/z. This models a typical behavior of mass spectrometers: the resolution decreases with higher masses. On Orbitrap machines we observed a decrease of 50,000 from m/z 300 to m/z 1200. Setting this gradient value increases the accuracy of the spectra alignment.
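
The gradient can be estimated from two resolution measurements and then used in a linear model. This is a sketch under assumptions: the text does not state at which m/z the configured resolution applies, so the reference point ref_mz is a hypothetical parameter, and the absolute resolution of 100,000 in the usage note is made up for illustration.

```python
def gradient_from_points(mz_low, res_low, mz_high, res_high):
    """Resolution change per 1 m/z between two measured points."""
    return (res_high - res_low) / (mz_high - mz_low)

def resolution_at(mz, base_resolution, gradient, ref_mz):
    """Linear model: base_resolution holds at ref_mz and changes by gradient per m/z."""
    return base_resolution + gradient * (mz - ref_mz)
```

For the Orbitrap observation quoted above, a drop of 50,000 over the 900 m/z between 300 and 1200 gives gradient_from_points(300, 100000, 1200, 50000), which is about -55.6 per m/z.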

MS1 offset: All MS1 m/z values will be shifted by this value. The value has to be given in Da.

PMO: the Precursor Offset Correction (PMO). This value shifts the m/z values of the precursors from the fragment spectra. The direction of the shift is given by a positive or negative sign. This offset does not shift the survey scan m/z values. It shifts the precursor masses before the fragment spectra are associated with their survey scan mass. If you import only *.dta files without a *.csv file, for example, the PMO will distort your data. In this case use MS1 offset instead.


Note that the tolerance settings in LipidXplorer are used as follows: a theoretical mass m measured with a given tolerance a fits to a peak p if <math>m \in [p-a,p+a]</math>.

The same holds for resolution R: two peaks p1 and p2 are considered equal if <math>p_1 \in [p_2-r, p_2+r]</math> where <math>r=\frac{p_1}{R}</math>
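
The two matching rules above translate directly into code. In this sketch the function names are hypothetical, and the conversion of a ppm tolerance into Da relative to the theoretical mass m is an assumption; the text only states that the tolerance can be given in ppm or Da.

```python
def tolerance_in_da(mass, tolerance, unit):
    """Convert a tolerance given in 'ppm' or 'Da' into an absolute value in Da."""
    if unit == "ppm":
        return mass * tolerance / 1e6
    if unit == "Da":
        return tolerance
    raise ValueError("unit must be 'ppm' or 'Da'")

def fits(m, p, a):
    """True if the theoretical mass m lies in [p - a, p + a]."""
    return p - a <= m <= p + a

def same_peak(p1, p2, R):
    """True if p1 lies in [p2 - r, p2 + r] with r = p1 / R."""
    r = p1 / R
    return p2 - r <= p1 <= p2 + r
```

For instance, a tolerance of 10 ppm at m/z 500 corresponds to an absolute window of 0.005 Da on either side of the peak.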

Store all settings in a configuration

All settings can be stored under a user specified name with Save As .... Save ... saves an already stored setting. Delete deletes a setting. All configurations are stored in the *.ini file which is stated under Select *.ini configurations file. With Browse one can choose another or a new file.

Run queries on the MasterScan

MFQL scripts are used for lipid identification after the spectra data has been imported. MFQL queries are written in so-called *.mfql files (with the ending *.mfql), where each file should contain just one query. The GUI panel Run is the place where *.mfql files are loaded and run on the MasterScan file.

The Run panel

The big window on the left contains all *.mfql scripts which are used for the lipid identification. This window is managed by the buttons on its right side:

  • Add MFQL File will add one file
  • Add MFQL Directory lets you choose a directory containing *.mfql files, which are all loaded.
  • Edit MFQL Entry opens an editor panel for the *.mfql entries selected in the left window. Select *.mfql scripts by clicking on them.
  • New MFQL Entry opens an editor panel with an empty *.mfql file. A prompt will open and ask you about the name of the file.
  • Remove MFQL Entry removes all entries which are selected in the left window.

After choosing your *.mfql files, the MasterScan has to be chosen. This is done by clicking on the green Browse button or by dragging the MasterScan file or the folder in which it is onto the text field. The output file is automatically filled, but can be changed by clicking on the grey Browse button.

Under Optional settings for this run you can change the tolerance settings for the particular run. This option overrides the tolerance settings you gave in the Import panel. The settings from the Import panel are stored in the MasterScan and used by default whenever you run this MasterScan. If you want to try out different settings, you can set them here. These values hold only for this particular run and do not permanently override the settings in the MasterScan.

Isotopic correction for MS and MS/MS can be switched on and off in the lower part of the panel. There is also the option to generate a complement MasterScan. This is a spectral database containing all entries from the originally chosen MasterScan except the identified entries and their isotopes.

The options No head and Compress change the format of the output slightly. No head removes the head of the output file and Compress removes the names of the queries in the output file. This can be helpful if you want to do some automatic post-processing. The option Tab limited changes the output format from comma separated file format to tab separated file format.

Dump MasterScan lets you write the content of the MasterScan experimental database to a comma separated file. The MasterScan has its own data format and cannot be viewed with other software. If you want to have a look into it, you need to dump its content into a readable file format. Dump MasterScan will do this in parallel with the run of your queries, i.e. if you check Dump MasterScan and press Run LipidXplorer, the MasterScan will be dumped into a text file (*.csv file format) which you can easily read with Excel, for example. MFQL queries are not necessary to dump the MasterScan.

With Run LipidXplorer the lipid identification is started. The result is saved in the output file. With the View button this file can be viewed on the spot. With View dump file the *.csv file of the MasterScan can be viewed.

The Editor panel

With the editor it is easy to write queries for LipidXplorer. Every query is opened in a separate tab. If a file is edited the Save button changes the color to red to remind the user to save to file before using the query. SaveAs will store the query under a certain file name and Close will close the tab.

The MS-Tools panel

The MS Tools tab contains a small collection of useful functions:

Mass vs. Sum Composition

Calculates either the sum composition out of a given m/z value or the other way round.

Mass-to-sum-composition

Enter an m/z value under m/z value and an sc-constraint under sc-constraint or sum composition. lDB is the lower bound and hDB the upper bound of the double bond equivalent. The charge has to be given in chg and the tolerance value in ppm in acc. Then press Mass-to-sum-composition and the result will be shown in the text window below.

Here is an example:

[Figure: MS-tools-example.PNG]

Sum-composition-to-mass

Enter a sum composition in sc-constraint or sum composition and a charge in chg. Then press Sum-composition-to-mass and the result will appear in the window below.
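
The calculation behind Sum-composition-to-mass can be approximated by hand. The sketch below is not the LipidXplorer routine: it assumes that the sum composition already describes the ion (including any added or removed atoms), covers only a handful of elements, and uses standard monoisotopic masses. The function names and the ppm_error helper are hypothetical.

```python
import re

# Monoisotopic masses in Da for a few common elements (subset, for illustration only)
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
ELECTRON = 0.00054857990946  # electron mass in Da

def parse_sum_composition(sc):
    """Parse a string such as 'C42 H83 N1 O8 P1' into {'C': 42, 'H': 83, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)\s*(\d*)", sc):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def ion_mz(sc, charge):
    """m/z of an ion with the given sum composition and a non-zero charge."""
    if charge == 0:
        raise ValueError("charge must not be 0")
    mass = sum(MONOISOTOPIC[el] * n for el, n in parse_sum_composition(sc).items())
    return (mass - charge * ELECTRON) / abs(charge)

def ppm_error(measured, theoretical):
    """Relative deviation of a measured m/z from the theoretical one in ppm."""
    return (measured - theoretical) / theoretical * 1e6
```

A call such as ion_mz("C42 H83 N1 O8 P1", 1) gives the m/z of a singly charged positive ion; ppm_error can then be compared against the acc value used in the Mass-to-sum-composition direction.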

Isotopes of molecules

Shows the abundances of the isotopes of a given sum composition. These values are the ones used in LipidXplorer for isotopic correction. Here the user can double check that everything is working properly.

Isotopic distribution of MS masses

Enter a sum composition under Ion sum composition and press Get Isotopic distribution. The masses in the list of isotopes are not exact; they are an estimation used in LipidXplorer. The abundance values, however, are accurate.
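
To get a feeling for the abundance values, the isotopic distribution can be approximated with a binomial model. This sketch is a deliberate simplification and not the calculation LipidXplorer performs: it considers only 13C, with an assumed natural abundance of 1.07%, and ignores the isotopes of all other elements. The function name is hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # assumed natural abundance of 13C

def carbon_isotope_probabilities(n_carbons, max_isotopes=2):
    """P(exactly k carbon atoms are 13C) for k = 0..max_isotopes (carbon-only binomial)."""
    p = C13_ABUNDANCE
    return [
        comb(n_carbons, k) * p ** k * (1 - p) ** (n_carbons - k)
        for k in range(max_isotopes + 1)
    ]
```

The first entry is the probability of the monoisotopic peak, the second that of the first isotopic peak, and so on.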

Isotopic distribution of MS/MS masses

[Figure: LipidXplorer Intrascan Isotopic Correction]

The above scheme depicts the values LipidXplorer uses to correct precursor and fragment masses. The isotopes for the fragments are calculated by multiplying the probabilities of fragments (F) having no, one or more than one isotopes with the probabilities of associated neutral losses (N).

For example, F0N0 means that there is no isotope in the fragment or in the neutral loss. F1N0 is the probability of the fragment having one isotope while the neutral loss has none. The opposite is F0N1, which is the probability of the fragment containing no isotope because the isotope is contained in the neutral loss.
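
The multiplication described above can be written down in a few lines. The sketch takes the isotope probabilities of the fragment and of the neutral loss as given lists (index = number of isotopes) and only forms the products; the function name and the example numbers in the usage note are made up for illustration.

```python
def fn_probabilities(fragment_probs, loss_probs):
    """Return {'FiNj': P_F(i) * P_N(j)} for all combinations of isotope counts i and j."""
    return {
        "F%dN%d" % (i, j): p_fragment * p_loss
        for i, p_fragment in enumerate(fragment_probs)
        for j, p_loss in enumerate(loss_probs)
    }
```

With fragment probabilities [0.8, 0.2] and neutral loss probabilities [0.9, 0.1], the entry F1N0 is 0.2 * 0.9 and the entry F0N1 is 0.8 * 0.1.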

LipidXplorer Intrascan Isotopic Correction in MS-Tools

In MS-Tools the probabilities of the isotopes as calculated by LipidXplorer can be viewed. If you enter a fragment sum composition in Fragment sum composition, the corresponding values are shown in the window below (after pressing Get Isotopic distribution). The mass can either be a real fragment or a neutral loss. This is denoted with the checkbox Neutral Loss.