UROP Proceedings 2021-22

School of Engineering
Department of Chemical and Biological Engineering

Improving Data Analysis Methods for Shotgun Proteomics
Supervisor: LAM Henry Hei Ning / CBE
Student: CHUNG Chun Kit / BIEN
Course: UROP1000, Summer

Rapid developments in proteomics enable a more complete study of the roles of proteins in living organisms. Because database searching and spectral library searching have known limitations, spectral archives have emerged as an improved tool for handling and using the tremendous amount of data produced by mass spectrometry (MS). This report compares two common deep-learning-based tandem mass spectrum (MS/MS) prediction tools, PredFull and Prosit. It analyzes their underlying network architectures and performs real searches on datasets obtained from two species. The comparison shows that PredFull outperforms Prosit, with an average increase in accuracy of 1.42%. Two reasons are proposed to explain this result.

Improving Data Analysis Methods for Shotgun Proteomics
Supervisor: LAM Henry Hei Ning / CBE
Student: LI Ka Yan / CEEV
Course: UROP1000, Summer

Tandem mass spectrometry (MS2) is a common method in proteomics. Identifying peptide sequences in biological samples creates opportunities for advances in the medical industry, such as early diagnosis of diseases and the development of new drugs. Protein sequencing can be achieved through peptide-spectrum matches (PSMs), which are obtained by using automated algorithms to analyze the mass spectra. There are currently three major approaches to identifying mass spectra: database searching, spectral library searching, and spectral archive searching. This project focuses on spectral archive searching, a relatively new approach that is still under development.
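Spectral library and spectral archive searches both score a PSM by comparing a query spectrum with reference spectra. The sketch below is minimal and makes several assumptions. Spectra are taken to be lists of (m/z, intensity) peaks. The bin width of 1.0 m/z is hypothetical. The score is cosine similarity, a common spectral-matching measure that is not necessarily the one used in either project. The peptide identifiers in the toy data are invented.

```python
import math
from collections import defaultdict

# Hypothetical bin width in m/z units; real tools tune this to instrument resolution.
BIN_WIDTH = 1.0


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Sum peak intensities into fixed-width m/z bins; returns {bin_index: intensity}."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return dict(bins)


def cosine_similarity(peaks_a, peaks_b, bin_width=BIN_WIDTH):
    """Cosine similarity of two binned spectra (between 0 and 1 for non-negative intensities)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    shared = a.keys() & b.keys()  # only bins present in both spectra contribute to the dot product
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, references):
    """Return (reference_id, score) for the highest-scoring reference, i.e. the top PSM candidate."""
    return max(
        ((ref_id, cosine_similarity(query, peaks)) for ref_id, peaks in references.items()),
        key=lambda item: item[1],
    )


# Toy reference spectra keyed by hypothetical peptide identifiers.
references = {
    "PEPTIDEK": [(175.1, 50.0), (304.2, 100.0), (401.2, 30.0)],
    "SAMPLER": [(175.1, 20.0), (288.2, 80.0), (500.3, 60.0)],
}
query = [(175.2, 45.0), (304.1, 90.0), (401.3, 35.0)]
print(best_match(query, references)[0])  # prints PEPTIDEK
```

Binning makes small m/z measurement differences (175.1 vs 175.2) land in the same bin. The bin width therefore controls how tolerant the match is.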
Using Facebook AI Similarity Search (FAISS), this project selects three basic indexing methods for constructing spectral archives. It then investigates how identification time and accuracy vary for spectral archives built with 1) different numbers of indices and 2) different indexing methods.
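FAISS provides exact ("flat") indexes and approximate ones such as inverted-file (IVF) indexes. In FAISS these include, for example, `faiss.IndexFlatIP` and `faiss.IndexIVFFlat`. Approximate indexes trade accuracy for speed by scanning only part of the data. The abstract does not say which three methods were chosen, so the sketch below is a hypothetical, dependency-free stand-in for the flat-versus-IVF idea. It is not FAISS code and does not reproduce the project's setup. Vectors are unit-normalized so that inner product equals cosine similarity. The `nlist` cells come from a few iterations of spherical k-means, and each query scans only its `nprobe` nearest cells. The toy "spectrum vectors" are random.

```python
import math
import random

# Hypothetical pure-Python stand-in for FAISS flat and IVF indexes; not the FAISS API.


def normalize(v):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class FlatIndex:
    """Exact search: score the query against every stored vector."""

    def __init__(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(normalize(v) for v in vectors)

    def search(self, query, k=1):
        q = normalize(query)
        scored = sorted(((dot(q, v), i) for i, v in enumerate(self.vectors)), reverse=True)
        return [i for _, i in scored[:k]]


class IVFIndex:
    """Approximate search: partition vectors into nlist cells, then scan only nprobe cells per query."""

    def __init__(self, nlist, nprobe=1, iterations=10, seed=0):
        self.nlist, self.nprobe, self.iterations = nlist, nprobe, iterations
        self.rng = random.Random(seed)
        self.centroids = []
        self.lists = []  # lists[c] holds (vector_id, vector) pairs assigned to cell c
        self.count = 0

    def _nearest_cells(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: dot(v, self.centroids[c]), reverse=True)
        return order[:n]

    def train(self, vectors):
        """Spherical k-means: each centroid becomes the re-normalized mean of its members."""
        data = [normalize(v) for v in vectors]
        self.centroids = self.rng.sample(data, self.nlist)
        for _ in range(self.iterations):
            groups = [[] for _ in range(self.nlist)]
            for v in data:
                groups[self._nearest_cells(v, 1)[0]].append(v)
            for c, members in enumerate(groups):
                if members:  # an empty cell keeps its previous centroid
                    self.centroids[c] = normalize([sum(col) for col in zip(*members)])
        self.lists = [[] for _ in range(self.nlist)]

    def add(self, vectors):
        for v in vectors:
            v = normalize(v)
            self.lists[self._nearest_cells(v, 1)[0]].append((self.count, v))
            self.count += 1

    def search(self, query, k=1):
        q = normalize(query)
        candidates = [item for c in self._nearest_cells(q, self.nprobe) for item in self.lists[c]]
        scored = sorted(((dot(q, v), i) for i, v in candidates), reverse=True)
        return [i for _, i in scored[:k]]


# Toy data (hypothetical): 500 random 16-dimensional vectors and 20 noisy re-measurements of them.
rng = random.Random(42)
archive = [[rng.random() for _ in range(16)] for _ in range(500)]
queries = [[x + rng.gauss(0, 0.05) for x in archive[i]] for i in range(0, 500, 25)]

flat = FlatIndex()
flat.add(archive)
ivf = IVFIndex(nlist=8, nprobe=2)
ivf.train(archive)
ivf.add(archive)

exact = [flat.search(q)[0] for q in queries]
approx = [ivf.search(q)[0] for q in queries]
recall = sum(a == e for a, e in zip(approx, exact)) / len(exact)
print(f"IVF top-1 agreement with flat search: {recall:.2f}")
```

Raising `nprobe` toward `nlist` makes the IVF results converge to the flat results, at the cost of scanning more vectors per query. This is the kind of time-versus-accuracy trade-off the project investigates.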