Chenfei WU (吴晨飞)
Google Scholar | GitHub | LinkedIn | cqwuchenfei@163.com
Dr. Chenfei Wu received his doctoral degree from Beijing University of Posts and Telecommunications in 2020 and is currently a senior researcher at Microsoft Research Asia. His research focuses on large-scale pre-training and on multimodal understanding and generation. His main work includes the NUWA series of multimodal generation models (NUWA, NUWA-LIP, NUWA-Infinity, NUWA-3D, NUWA-XL), a series of multimodal understanding models (KD-VLP, Bridge-Tower), and multimodal dialogue systems (Visual ChatGPT, TaskMatrix.AI). He has published papers at conferences including CVPR, NeurIPS, ACL, ECCV, AAAI, and MM, with more than 1,000 citations. His open-source GitHub projects have received more than 30,000 stars.
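Chenfei Wu holds a Ph.D. from Beijing University of Posts and Telecommunications and is a senior researcher at Microsoft Research Asia. His research interests are large-model pre-training and multimodal understanding and generation. His main work includes the NUWA (女娲) series of multimodal generation models (NUWA, NUWA-Infinity, NUWA-XL, DragNUWA), the Bridge Tower (桥塔) series of multimodal understanding models (KD-VLP, Bridge-Tower), and multimodal dialogue systems (Visual ChatGPT, TaskMatrix.AI). He has published multiple papers at conferences including CVPR, NeurIPS, ACL, ECCV, AAAI, and MM, with more than a thousand citations, and his open-source GitHub projects have received more than 30,000 stars.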
Highlights
- Multimodal Generation: GODIVA (Preprint, 2021), NUWA (女娲) (ECCV, 2022), NUWA-Infinity (NeurIPS, 2022), NUWA-LIP (CVPR, 2023), NUWA-3D (IJCAI, 2023), NUWA-XL (ACL, 2023), DragNUWA (Preprint, 2023), LayoutNUWA (ICLR, 2024), StrokeNUWA (Preprint, 2024).
- Multimodal Understanding: Bridge-Tower (AAAI, 2023), Manager-Tower (ACL, 2023).
- Multimodal System: Visual ChatGPT (Preprint, 2023), TaskMatrix.AI (Intelligent Computing, 2024), VL-InterpreT (CVPR, 2022).
Talks
- NUWA: Neural visual world creation with multimodal pretraining. Microsoft Research Summit 2021, October 2021.
- VLP for Text-to-Image Synthesis. VLP Tutorial @ CVPR 2022, Jun 2022.
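- Open registration | The first systematic workshop on frontier technologies of large AI models, jointly created by top experts. BAAI Community (智源社区), March 2023.
- A sea of stars, empowering the journey together: exploring the "AIGC + Metaverse" world, catching the industry trend, and gaining a head start. Microsoft Technology (微软科技), March 2023.
- Chinese Information Processing Society of China "Advanced Technology Tutorial": Large Model Series, Shenzhen session. Chinese Information Processing Society of China (中国中文信息学会), Jun 2023.
- A2M Summit concludes successfully: AI deployment practice, data intelligence, and infrastructure evolution in the AIGC era. msup, Jun 2023.
- MLNLP 2023 @ Multimodal and Multilingual Large Model Forum. MLNLP, Sep 2023.
- Chinese Information Processing Society of China "Advanced Technology Tutorial": Large Model Series, Chengdu session. Chinese Information Processing Society of China (中国中文信息学会), Nov 2023.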
Media Report
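- Microsoft drops another AI chat-and-draw bombshell! With visual models powering ChatGPT, Visual ChatGPT makes its debut. AI Era (新智元), March 2023.
- The visual version of ChatGPT is here! It absorbs the full range of AI drawing skills, built by an all-Chinese team at MSRA and led by a 16-year Microsoft veteran. QbitAI (量子位), March 2023.
- One AI driving millions of APIs! Microsoft proposes the multi-task model TaskMatrix, and robots and the Internet of Things are finally saved. QbitAI (量子位), March 2023.
- Microsoft Research Asia's multimodal model NÜWA: creating visual content with natural language. Microsoft Research Asia (微软亚洲研究院), March 2022.
- Whatever you do, don't let Yoshihiro Togashi see this: NUWA-Infinity. QbitAI (量子位), Jul 2022.
- A new member joins the NUWA series: NUWA-XL, a model for extremely long video generation. Microsoft Research Asia (微软亚洲研究院), Feb 2023.
- Step into "Along the River During the Qingming Festival"! DragNUWA makes a stunning debut: one drag turns a still image into video in seconds. AI Era (新智元), Sep 2023.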
Publications
Multimodal Generation
- GODIVA: Generating open-domain videos from natural descriptions.
  Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan.
  arXiv, 2021.
- NÜWA: Visual synthesis pre-training for neural visual world creation.
  Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan.
  ECCV, 2022.
- NUWA-LIP: Language-guided image inpainting with defect-free VQGAN.
  Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, Wangmeng Zuo, Nan Duan.
  CVPR, 2023.
- NUWA-Infinity: Autoregressive over autoregressive generation for infinite visual synthesis.
  Jian Liang, Chenfei Wu, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan.
  NeurIPS, 2022.
- NUWA-XL: Diffusion over diffusion for extremely long video generation.
  Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan.
  ACL, 2023.
- DragNUWA: Fine-grained control in video generation by integrating text, image, and trajectory.
  Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan.
  arXiv, 2023.
- StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis.
  Zecheng Tang, Chenfei Wu, Zekai Zhang, Minheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan.
  arXiv, 2024.
- LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models.
  Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan.
  ICLR, 2024.
- NUWA-3D: Learning 3D photography videos via self-supervised diffusion on single images.
  Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan.
  IJCAI, 2023.
- HORIZON: A High-Resolution Panorama Synthesis Framework.
  Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma.
  AAAI, 2024.
- Trace Controlled Text to Image Generation.
  Kun Yan, Lei Ji, Chenfei Wu, Jianmin Bao, Ming Zhou, Nan Duan, Shuai Ma.
  ECCV, 2022.
- ORES: Open-vocabulary Responsible Visual Synthesis.
  Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan.
  AAAI, 2024.
- ReCo: Region-controlled text-to-image generation.
  Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang.
  CVPR, 2023.
- DiVAE: Photorealistic images synthesis with denoising diffusion decoder.
  Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan.
  arXiv, 2022.
Multimodal Understanding
- Using Left and Right Brains Together: Towards Vision and Language Planning.
  Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang.
  arXiv, 2024.
- KD-VLP: Improving end-to-end vision-and-language pretraining with object knowledge distillation.
  Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan.
  Findings of NAACL, 2022.
- BridgeTower: Building bridges between encoders in vision-language representation learning.
  Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
  AAAI, 2023.
- ManagerTower: Aggregating the insights of uni-modal experts for vision-language representation learning.
  Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
  ACL, 2023.
- Learning temporal video procedure segmentation from an automatically collected large dataset.
  Lei Ji, Chenfei Wu, Daisy Zhou, Kun Yan, Edward Cui, Xilin Chen, Nan Duan.
  WACV, 2022.
- Deep Reason: A strong baseline for real-world visual reasoning.
  Chenfei Wu, Yanzhao Zhou, Gen Li, Nan Duan, Duyu Tang, Xiaojie Wang.
  CVPR VQA Workshop, 2019.
- Object-difference attention: A simple relational attention for visual question answering.
  Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong.
  ACM Multimedia, 2018.
- Chain of reasoning for visual question answering.
  Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong.
  NeurIPS, 2018.
- Differential networks for visual question answering.
  Chenfei Wu, Jinlai Liu, Xiaojie Wang, Ruifan Li.
  AAAI, 2019.
- Sequential visual reasoning for visual question answering.
  Jinlai Liu, Chenfei Wu, Xiaojie Wang, Xuan Dong.
  CCIS, 2018.
Multimodal Systems/Evaluations
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.
  Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan.
  arXiv, 2023.
- TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs.
  Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan.
  Intelligent Computing, 2024.
- VL-InterpreT: An interactive visualization tool for interpreting vision-language transformers.
  Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal.
  CVPR, 2022.
- Low-code LLM: Visual programming over LLMs.
  Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan.
  arXiv, 2023.
- Learning to program with natural language.
  Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan.
  arXiv, 2023.
- GEM: A general evaluation benchmark for multimodal tasks.
  Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti.
  Findings of ACL, 2021.
- EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation.
  Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan.
  arXiv, 2023.
- GameEval: Evaluating LLMs on Conversational Games.
  Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan.
  arXiv, 2023.