UROP Proceedings 2022-23

School of Science
Department of Chemistry

Text Mining of Synthesis Methods of Metal Organic Framework
Supervisor: SU, Haibin / CHEM
Student: SIU, Chun Hey / CHEM-IRE
Course: UROP1100, Fall

Metal Organic Frameworks (MOFs), which are formed by combining metal ions with organic linkers (ligands), have attracted growing research interest in recent years because of their many potential practical applications. The Cambridge Structural Database records more than 80 usable metals, thousands of ligands, and about five synthesis methods for MOFs specifically (Moghadam et al., 2017). Significant gaps remain in the understanding of MOF synthesis. This project extracts and evaluates synthesis methods and reaction conditions from thousands of MOF experiments. Text mining and graph mining are applied extensively to selected studies, using custom automation software, to collect a wide range of data on MOFs.

Data Analytics of Homogeneous Transition Metal Catalyzed Reactions
Supervisor: SU, Haibin / CHEM
Student: LI, Changwen / CHEM
Course: UROP1000, Summer

This UROP project aims to build a citation network for the team's research paper on homogeneous nickel catalyzed (HoNiCa) reactions. The project involves analyzing keywords and their trends in the papers, determining the most influential papers, and capturing latent knowledge. Python code was developed to visualize keyword trends, generate word clouds, and provide graphical analysis of the citation network. Cytoscape was the primary tool for building and visualizing the citation network. ChemBioDraw was used for data extraction, including drawing chemical structures and converting them into SMILES notation for training a transformer model. With these methods, we obtained an overview of our team's reference papers and gained a deeper understanding of the research focus and popular topics.
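
The keyword-trend and influential-paper analysis described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the toy citation edges, and the keyword lists are all hypothetical, and in-degree (number of citations received within the network) is assumed here as one simple proxy for influence.

```python
from collections import Counter, defaultdict


def most_cited(edges, top_n=3):
    """Rank papers by in-degree: how many papers in the network cite them.

    `edges` is an iterable of (citing_paper, cited_paper) pairs.
    """
    in_degree = Counter(cited for _citing, cited in edges)
    return in_degree.most_common(top_n)


def keyword_trend(papers):
    """Count keyword occurrences per publication year.

    `papers` is an iterable of (year, list_of_keywords) pairs.
    """
    trend = defaultdict(Counter)
    for year, keywords in papers:
        trend[year].update(keywords)
    return trend


# Hypothetical toy data, for illustration only.
edges = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A")]
papers = [
    (2020, ["nickel", "cross-coupling"]),
    (2021, ["nickel", "ligand"]),
    (2021, ["nickel"]),
]

print(most_cited(edges, top_n=1))
print(dict(keyword_trend(papers)[2021]))
```

The per-year counts could then be passed to a plotting or word-cloud tool, and the edge list exported to a network tool such as Cytoscape.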

Data Analytics of Homogeneous Transition Metal Catalyzed Reactions
Supervisor: SU, Haibin / CHEM
Student: SHEK, Ching Yee / SSCI
Course: UROP1100, Summer

This study focuses on the evolution pattern of the Alpha strain of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, 2019-nCoV), an infectious virus that has caused significant disruption worldwide, by analyzing its mutation rate and applying machine learning. The report presents a workflow in which a consensus sequence is determined for each month, followed by graph plotting to analyze the mutation rate and the population of the virus over time. The peak of the mutation rate is found to lag behind the peak of the population; possible reasons for this lag are discussed, but none has been confirmed. Future directions are also discussed.
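
The monthly consensus-sequence step in the workflow above can be sketched as follows. This is an illustrative sketch only, not the report's implementation: it assumes the sequences for each month are already aligned and of equal length, takes the most common base at each position as the consensus, and assumes a simple definition of mutation rate as the fraction of positions that differ from a reference. The month labels and short sequences are hypothetical.

```python
from collections import Counter


def consensus(sequences):
    """Return the most common base at each position.

    Assumes all sequences are aligned and of equal length. Ties are
    resolved in favor of the base seen first in that column.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def mutation_rate(reference, sequence):
    """Fraction of positions at which `sequence` differs from `reference`."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(reference, sequence))
    return differences / len(reference)


# Hypothetical aligned reads grouped by month, for illustration only.
monthly_sequences = {
    "2020-12": ["ACGT", "ACGA", "ACGT"],
    "2021-01": ["ACTT", "ACTT", "ACGT"],
}

monthly_consensus = {
    month: consensus(seqs) for month, seqs in monthly_sequences.items()
}
reference = monthly_consensus["2020-12"]
for month, seq in monthly_consensus.items():
    print(month, seq, mutation_rate(reference, seq))
```

Plotting the resulting rate per month alongside the number of sequences per month would give the two curves whose peaks the report compares.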