UROP Proceedings 2020-21

School of Engineering
Department of Computer Science and Engineering

Gaining Insights from Noisy Data: Designing Transparent System for Online Decision Making
Supervisor: MA Xiaojuan / CSE
Student: XU Boran / COMP
Course: UROP1100, Summer

Computational notebooks are widely regarded as the ideal environment for carrying out data science studies, but a slide deck is still necessary to present a study to others. Although many efforts have been made to guide and streamline the presentation of data science work, automated slide generation remains an unexplored field. We propose nb2slide, a system that converts a Jupyter notebook into a presentation slide deck. With the help of machine-learning models and a search module, the system essentially fills in an existing template with content from the notebook. The slide deck generated by the system is satisfactory.

Gaining Insights from Noisy Data: Designing Transparent System for Online Decision Making
Supervisor: MA Xiaojuan / CSE
Student: ZHOU Siyuan / DSCT
Course: UROP1100, Summer

Online medical crowdfunding platforms (OMCPs) such as GoFundMe, Qingsongchou, and Shuidichou serve as online donation platforms where people can seek financial support. Attracting users' attention so that they donate is therefore crucial for campaign sponsors, and first impressions play a very important role. In our project, we want to answer the following questions: RQ1) Can users form consistent first impressions, measured by consequent motivation and projected behavioral intention, given images in OMCPs? RQ2) How are impression and donation intention correlated, and how do people comment differently for low and high donation intention? RQ3) What are the visual features that contribute the most to the first impressions of images in OMCPs? RQ4) Can we evaluate medical images along the dimensions of impressions?
My work includes: 1) crawling information, including images, dates, titles, tags, locations, and text, from the GoFundMe website to form a dataset; 2) cleaning the collected data to reduce the effect of spatial and cultural differences when people later rate the pictures; 3) using the Microsoft Computer Vision API and Amazon Rekognition to extract features from the images, forming a dataset that helps answer RQ3; 4) automatically filtering noisy ratings out of the collected image ratings with Python, based on a set of criteria, to improve the regression results; and 5) running a regression analysis to answer RQ2.

Gaining Insights from Noisy Data: Designing Transparent System for Online Decision Making
Supervisor: MA Xiaojuan / CSE
Student: CHEN Zixin / DSCT
Course: UROP2100, Summer

AutoML is gaining increasing attention in academia and industry. It automates the selection, composition, and parameterization of machine learning models. In this project, we focus on data scientists’ opinions of the feature engineering process in AutoML and conduct a user study to analyze people’s attitudes toward both automated and human-based feature engineering. Based on the analysis results, our further work is to provide an approach that facilitates automated feature engineering in data scientists’ work and promotes human-AI synergy.
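
Steps 4) and 5) of ZHOU Siyuan's abstract above (filtering noisy ratings, then regressing donation intention on impression) could be sketched in Python roughly as follows. The abstract does not state the filtering criteria, the rating scale, or the regression model, so everything here is an assumption made for illustration: the sample ratings are made up, the filtering criterion (dropping raters who give every image the same impression score) is hypothetical, and a one-variable least-squares fit stands in for whatever model the project actually used.

```python
from collections import defaultdict
from statistics import mean

# Made-up ratings for illustration only:
# (rater_id, image_id, impression score, donation intention), both on a 1-7 scale.
RATINGS = [
    (0, 0, 6, 6), (0, 1, 2, 3), (0, 2, 5, 5),
    (1, 0, 4, 4), (1, 1, 4, 4), (1, 2, 4, 4),  # rater 1 never varies the score
    (2, 0, 7, 6), (2, 1, 3, 2), (2, 2, 5, 6),
]


def drop_straight_liners(ratings):
    """Hypothetical noise criterion: discard every rating from a rater
    whose impression scores are identical across all rated images."""
    by_rater = defaultdict(list)
    for row in ratings:
        by_rater[row[0]].append(row)
    kept = []
    for rows in by_rater.values():
        if len({row[2] for row in rows}) > 1:
            kept.extend(rows)
    return kept


def simple_ols(x, y):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


clean = drop_straight_liners(RATINGS)
impression = [row[2] for row in clean]
intention = [row[3] for row in clean]
slope, intercept = simple_ols(impression, intention)
print(f"kept {len(clean)} of {len(RATINGS)} ratings; "
      f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A positive slope on the filtered data would suggest that images rated as making a stronger impression also attract higher stated donation intention; with real ratings, the choice of filter and model would need to be justified against the study design.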
