Detection of Isotopic Peak Series in Low-Resolution Mass Spectra Using Clustering Algorithm and Chi-Square Test

Authors: Lebedev V.V., Pytskii I.S., Buryak A.K.  
Published: 29.05.2024  
Published in issue: #2(113)/2024  

Category: Chemistry | Chapter: Physical Chemistry  
Keywords: mass spectrometry, signal processing, isotopic peak series, DBSCAN clustering algorithm, chi-square test


This paper presents an algorithm for determining whether a peak detected in a mass spectrum during signal processing belongs to an isotopic peak series. The algorithm first groups the detected peaks into preliminary clusters, then checks whether the distribution of peak intensities in each cluster matches a selected pattern, and finally regroups the peaks taking their positions along the m/z axis into account. The features that make the algorithm robust to the adverse phenomena that hinder detection of isotopic peak series in low-resolution mass spectra by existing methods are described in detail. The algorithm was tested on experimental mass spectra of silver(I) chloride and silver(I) bromide, which exhibited various such phenomena. The proposed algorithm is shown to group peaks with a quality similar to that of existing linear models while avoiding empirical rules that are valid only for certain classes of chemical compounds. Because the algorithm requires a pattern to be selected to model the distribution of intensities in a candidate isotopic peak series, we suggest that its practical application is viable when multiple similar compounds with a known pattern of peak intensity distribution are examined using a low-resolution mass spectrometer.
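The first two stages described in the abstract (density-based grouping of peaks, then a chi-square check of intensities against a pattern) can be illustrated with a minimal sketch. It is not the authors' implementation: the peak list, the `eps` and `min_pts` values, the helper names, and the choice of a plain AgCl pattern built from rounded natural isotope abundances are all assumptions made for illustration. The abstract does not say how intensities are scaled before the test, so the sketch simply uses the raw intensity total, and it omits the final m/z-based regrouping step.

```python
from collections import defaultdict

# Natural isotope abundances by nominal mass (rounded literature values).
AG = {107: 0.51839, 109: 0.48161}
CL = {35: 0.7576, 37: 0.2424}


def convolve(a, b):
    """Combine two isotope distributions into one keyed by nominal mass."""
    out = defaultdict(float)
    for mass_a, p_a in a.items():
        for mass_b, p_b in b.items():
            out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def dbscan_1d(xs, eps, min_pts):
    """DBSCAN on a line: returns cluster ids 0, 1, ... or -1 for noise."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    # Neighbourhood of each point within eps, the point itself included.
    neigh = {i: [j for j in order if abs(xs[j] - xs[i]) <= eps] for i in order}
    core = {i for i in order if len(neigh[i]) >= min_pts}
    labels = [-1] * len(xs)
    cluster_id = 0
    for i in order:
        if i not in core or labels[i] != -1:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:  # grow the cluster outward from core points only
            p = stack.pop()
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if q in core:
                        stack.append(q)
        cluster_id += 1
    return labels


def chi_square(observed, fractions):
    """Pearson statistic for intensities against expected fractions."""
    total = sum(observed)
    return sum((o - total * f) ** 2 / (total * f)
               for o, f in zip(observed, fractions))


# Hypothetical peak list: (m/z, intensity in arbitrary units).
peaks = [(141.9, 40.0), (143.9, 48.0), (145.9, 12.0),
         (180.2, 5.0), (250.0, 7.0), (252.1, 6.0)]
labels = dbscan_1d([mz for mz, _ in peaks], eps=2.5, min_pts=3)

pattern = convolve(AG, CL)  # AgCl at nominal masses 142, 144, 146
fractions = [pattern[m] for m in sorted(pattern)]

CRITICAL = 5.991  # chi-square critical value for 2 degrees of freedom, alpha = 0.05
cluster = sorted(p for p, lab in zip(peaks, labels) if lab == 0)
stat = chi_square([inten for _, inten in cluster], fractions)
# Simplification: peaks are paired with pattern positions by m/z order.
matches = len(cluster) == len(fractions) and stat < CRITICAL
```

With these assumed inputs, the three peaks near m/z 142 to 146 form one cluster whose intensities are consistent with the pattern, and the remaining peaks are labelled as noise. Because the Pearson statistic grows with the intensity total, a real implementation would need an explicit normalization choice.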

This work was supported by a grant from the Russian Science Foundation (grant no. 22-13-00266) for the Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences.

Please cite this article as:

Lebedev V.V., Pytskii I.S., Buryak A.K. Detection of isotopic peak series in low-resolution mass spectra using clustering algorithm and chi-square test. Herald of the Bauman Moscow State Technical University, Series Natural Sciences, 2024, no. 2 (113), pp. 149--164. EDN: ODEQRN

