UROP Proceedings 2022-23

School of Engineering — Department of Computer Science and Engineering

Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: AU, Kwan Wo / DSCT
Course: UROP1000, Summer

In this project, we were asked to set up a tutorial website introducing a generative AI tool for the general public. I built the website (https://hkust-aigc.github.io/videoCreators/tutorial6/) and provided a Google Colab notebook for setting up the environment and using the tool to process videos. My tutorial covers All-In-One-Deflicker, a blind deflickering model that removes flicker from all kinds of videos. This report includes the content of the webpage (an introduction to the tool, hyperparameter tuning, and the tool's significance) as well as my tutorial and setup instructions in the Google Colab notebook. Demonstrations are also available on the webpage.

Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: DING, Kangyuan / SENG
Course: UROP1000, Summer

VideoFusion is a new DPM-based video generation method proposed by Alibaba DAMO Academy. Compared with previous video generation methods (Imagen Video, Make-A-Video, etc.), VideoFusion abandons the common spatial/temporal super-resolution pipeline and relies entirely on DPMs to generate images and video sequences. This report presents the fundamental mechanisms of video generation with VideoFusion and discusses techniques such as parameter control. Additionally, in investigating video quality, the keywords of several prompt sets are compared and their effects are illustrated in detail. Despite positive progress, the intended distinctive advantage of employing two jointly learned networks for improved consistency in video content remains insignificant in practical implementation. Further deliberation is therefore needed to determine specific strategies or usages for optimal results.
Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: HO, Shao Ping / SENG
Course: UROP1000, Summer

Rerender A Video is a novel zero-shot, text-guided video-to-video translation framework that adapts image models to videos. The tool requires no re-training and aims to achieve temporal consistency across video frames, thereby avoiding the common issue of flickering.