UROP Proceedings 2022-23

School of Business and Management
Department of Accounting

Deep Learning in Natural Language Processing
Supervisor: HUANG, Allen / ACCT
Co-supervisor: YANG, Yi / ISOM
Student: YEUNG, Yat Cheung / QFIN
Course: UROP1100, Spring

The volume of financial textual data has increased exponentially over the past decades, making NLP an essential tool for analysing financial documents at scale. BERT, developed by Google, is a state-of-the-art language model that outperforms traditional machine learning methods in many NLP tasks. In this project, we use BERT to train and fine-tune a language model for the finance domain. More specifically, we perform entity-level sentiment analysis on financial texts, where an "entity" is defined as a company, a product, or a government body. Unlike a typical sentiment analysis model, which assigns a single sentiment label to an entire passage, our model tries to capture the sentiment towards each entity that appears in the given text.

Deep Learning in Natural Language Processing
Supervisor: HUANG, Allen / ACCT
Co-supervisor: YANG, Yi / ISOM
Student: ZHANG, Ruixuan / MAEC
Course: UROP1100, Spring

Large language models (LLMs), which are deep-learning-based NLP algorithms such as Google's BERT, have been developed in recent years by computer science researchers. When summarising a text, LLMs take into account word context, such as the other words in the same text, as well as word order. These researchers have demonstrated that LLMs perform noticeably better than more basic NLP models that depend on a bag-of-words structure in tasks including general text sentiment classification, language translation, and question answering. In this UROP project, "Deep Learning in Natural Language Processing", we label the sentiments of a dataset of financial texts that mention multiple entities (companies) and use the dataset to train and fine-tune several NLP algorithms, including LLMs.
One example of such an LLM is FinBERT, a pre-trained deep learning NLP algorithm specific to the finance domain. The applicant's role has three main parts: labelling texts for the NLP task, training and fine-tuning deep learning models for the task, and interpreting the training outcomes. Through this role, the project aims for applicants to gain experience in the full lifecycle of financial textual analysis, including data cleaning, Python programming, and natural language processing. In this UROP report, I will first give an overview of the tasks involved. Then, I will elaborate on the tasks step by step to describe the project's progress in detail. Lastly, I will analyse some of the less satisfactory outcomes and wrap up the project.
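
As a minimal illustration of why a bag-of-words structure is a poor fit for texts that mention multiple entities, the sketch below compares two sentences that contain exactly the same words but carry opposite sentiment for each company. The sentences, the company names (Alpha, Beta), and the labels are hypothetical and hand-written for illustration; they are not taken from the project dataset, and the tokeniser is a deliberate simplification.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase tokens, discarding word order (simplified tokeniser)."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return Counter(cleaned.split())


# Hypothetical sentences: same words, opposite meaning for each company.
sentence_a = "Alpha gained market share while Beta lost ground."
sentence_b = "Beta gained market share while Alpha lost ground."

# A bag-of-words representation cannot tell the two sentences apart.
same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)

# Hand-written entity-level labels (assumed for illustration only):
# one sentiment per entity instead of one label per passage.
labels_a = {"Alpha": "positive", "Beta": "negative"}
labels_b = {"Alpha": "negative", "Beta": "positive"}

print("Identical bag-of-words:", same_bag)
print("Identical entity-level labels:", labels_a == labels_b)
```

Because the two word-count vectors are identical, any model that sees only those counts must give both sentences the same output, whereas the entity-level labels differ. A model that uses word order and context is needed to distinguish them.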