UROP Proceedings 2020-21

School of Business and Management
Department of Accounting

Machine Learning and Sentiment Classification in Chinese Financial Text
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YOU Haifeng / ACCT
Student: HUNG Chak Lung / QFIN
Course: UROP1000, Summer

Market news and trends often affect the sentiment of both financial institutions and individual investors, changing their investment decisions and affecting stock performance. In recent years, advances in computer science research have led to breakthroughs in natural language processing (NLP), and different NLP models can now be used to analyze subjective information and predict the sentiment of readers. In this research, the team collected and labelled over 14,000 sentences from analyst reports and over 15,000 sentences from company annual reports from SHH and SHZ. After labelling those sentences, I fine-tuned a pre-trained NLP model, the Chinese BERT model, for sentiment analysis and for predicting readers' emotions. Among the different fine-tuned BERT models, the one fine-tuned on the three types of labelled sentences has the best statistical performance.

Machine Learning and Sentiment Classification in Chinese Financial Text
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YOU Haifeng / ACCT
Student: WANG Meihan / RMBI
Course: UROP1000, Summer

This project aims to build a sentiment classification model for Chinese financial text, given the shortage of Chinese financial NLP models. After preparation, which included text corpus collection and manual labeling, we chose three models to train and compare: Logistic Regression, Naïve Bayes, and BERT (including a base BERT model and a financial BERT model). We collected two corpora for training: one from the ‘Operation Analysis’ section of various corporate annual reports and the other from analyst reports.
The fine-tuned BERT model significantly outperforms the others on both annual reports and analyst reports in the three-class classification task. The code and related documents can be found at: https://github.com/mwangbk/UROP2021

Machine Learning and Sentiment Classification in Chinese Financial Text
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YOU Haifeng / ACCT
Student: LI Xiangyu / ECOF
Course: UROP1100, Summer

This study investigates the accuracy of the Bidirectional Encoder Representations from Transformers (BERT) model in classifying the sentiment of Chinese financial text. We first label 20,000 sentences from annual reports and analyst reports as positive, negative, or neutral, and then use them to fine-tune and test the Google Chinese BERT model. Out-of-sample results suggest that BERT consistently outperforms benchmark bag-of-words methods, with higher classification accuracy and better cross-sample generalizability. This performance is stable across years, yet fluctuates across industries. Alternative pretrained BERT models yield qualitatively similar results. Overall, the findings support using BERT to build sentiment factors and guide equity investment.
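
The abstracts above compare BERT against bag-of-words baselines such as Naïve Bayes. As a rough illustration of what such a baseline does, the sketch below trains a multinomial Naïve Bayes classifier on character-bigram counts for the three sentiment classes (positive, negative, neutral). It is not the authors' code: the class name `NaiveBayesSentiment`, the helper `char_ngrams`, the use of character bigrams, the add-one smoothing, and the six toy sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams; avoids needing a Chinese word segmenter."""
    text = text.strip()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes over bag-of-bigram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # Laplace smoothing constant
        self.class_counts = Counter()              # sentences seen per label
        self.feature_counts = defaultdict(Counter) # bigram counts per label
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for gram in char_ngrams(sentence):
                self.feature_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, sentence):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            total_grams = sum(self.feature_counts[label].values())
            for gram in char_ngrams(sentence):
                if gram not in self.vocab:
                    continue  # ignore bigrams never seen in training
                count = self.feature_counts[label][gram]
                score += math.log((count + self.alpha) /
                                  (total_grams + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences, NOT taken from the projects' labelled corpora.
TOY_TRAIN = [
    ("公司营业收入大幅增长", "positive"),
    ("净利润同比显著提升", "positive"),
    ("公司营业收入大幅下降", "negative"),
    ("净利润同比明显下滑", "negative"),
    ("公司主要从事设备制造", "neutral"),
    ("报告期内召开股东大会", "neutral"),
]

clf = NaiveBayesSentiment().fit(
    [sentence for sentence, _ in TOY_TRAIN],
    [label for _, label in TOY_TRAIN],
)
print(clf.predict("收入大幅增长"))
```

Because a model like this scores each bigram independently of its context, it cannot use word order beyond two characters, which is one plausible reason a fine-tuned contextual model such as BERT could generalize better across samples.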
