UROP Proceedings 2022-23

School of Engineering
Department of Computer Science and Engineering

Neural Rendering
Supervisor: CHEN, Qifeng / CSE
Student: WANG, Yifan / SENG
Course: UROP1100, Summer

During the summer semester, I participated in the UROP1100 project on neural rendering, supervised by Professor Chen Qifeng. We were asked to build a studio to showcase the outcomes of several projects, study Stanford University's CS231n course on our own, and write a tutorial on using the tool developed in each project. I focused on the FateZero project, which uses fused attention mechanisms to explore zero-shot text-based video editing. Through studying image recognition and generation models and attending the WAIC conference, I developed insights that are discussed in this report. The experience allowed me to gain valuable research skills and deepen my understanding of artificial intelligence concepts.

Deep Video Super-resolution
Supervisor: CHEN, Qifeng / CSE
Student: CHEN, I Chieh / COMP
Course: UROP1100, Spring; UROP2100, Summer

Image generation technology has recently become one of the most discussed topics worldwide: by selecting prompts containing objects and adjectives of interest, we can generate distinctive images with diffusion models. Other work has applied diffusion models to more specific tasks, such as image editing, and all of these applications use text prompts as guidance when generating new images. In this report, we explore how text prompts affect the quality of image generation, and we investigate different ways to improve diffusion models' performance through prompt engineering.

Deep Video Super-resolution
Supervisor: CHEN, Qifeng / CSE
Student: FEI, Yang / SENG
Course: UROP1100, Summer

Text2Video-Zero is a recently proposed approach for generating videos from textual descriptions using pretrained diffusion models.
This paper reviews the end-to-end video generation pipeline of Text2Video-Zero and discusses techniques such as parametric control to enhance video quality. While Text2Video-Zero still shows limitations in the temporal coherence and narrative alignment of generated videos, it demonstrates a marked improvement over prior text-to-video generation methods. By investigating the model architecture, prompt engineering strategies for modulating aesthetic properties, and advanced controls for parametric guidance, this paper identifies opportunities to further optimize Text2Video-Zero. While progress has been made, ample prospects remain for addressing current shortcomings through tailored model selection, prompt formulations, and parametric control.