Fu-En Yang

I am a Research Scientist at NVIDIA Research working on Adaptive Physical Intelligence. I focus on developing efficient, adaptive AI systems for vision-language-action (VLA) models, world modeling, embodied reasoning, and physical AI.

I received my Ph.D. from National Taiwan University (NTU) in Jul. 2023, supervised by Prof. Yu-Chiang Frank Wang. Previously, I was a research intern at NVIDIA Research (Feb. 2023-Aug. 2023), focusing on efficient model personalization and vision-language models. I was also a Ph.D. program researcher at ASUS AICS (Sep. 2020-Oct. 2022), specializing in visual transfer learning.

Prior to my Ph.D., I received my Bachelor's degree from the Department of Electrical Engineering at National Taiwan University in 2018.

Email / CV / Google Scholar / LinkedIn / Twitter / GitHub

News

[Nov. 2025] Our papers "SANTA" (mitigating hallucinations in video LLMs), "TA-Prompting" (video temporal understanding), and "VADER" (video anomaly understanding) are accepted at WACV 2026.
[Sep. 2025] Our paper "ThinkAct" is accepted at NeurIPS 2025.
[Jun. 2025] One co-authored paper "LongSplat" is accepted at ICCV 2025.
[Feb. 2025] Our paper "VideoMage" is accepted at CVPR 2025.
[Jul. 2024] Our papers "Receler" and "Select and Distill" are accepted at ECCV 2024.
[Feb. 2024] Joined NVIDIA Research as a Research Scientist.
[Nov. 2023] Honored to receive an Honorable Mention in the 2023 Taiwanese Association for Artificial Intelligence (TAAI) Ph.D. Thesis Award.
[Sep. 2023] Honored to receive the 2023 Presidential Award for Graduate Students, National Taiwan University.
[Aug. 2023] Honored to receive the 2023 Chinese Image Processing and Pattern Recognition Society (IPPR) Best Doctoral Thesis Award.
[Jul. 2023] I officially obtained my Ph.D. degree from National Taiwan University (NTU).

Selected Publications

My research goal is to advance Adaptive Physical Intelligence: efficient, adaptive, and customized AI systems that integrate perception, reasoning, and action in physical environments. I focus on vision-language-action models that enable intelligent agents to understand and interact with the world through multimodal reasoning, world modeling for predictive understanding of dynamic environments, and embodied reasoning that bridges abstract cognition with physical reality. My work involves developing novel latent modeling approaches to capture the underlying structure of complex physical interactions, with applications spanning robotics and physical AI systems. I am driven by the vision that AI should not merely process information but should learn adaptively from, and respond intelligently to, the rich complexity of physical experience, ultimately creating more capable, personalized, and contextually aware artificial agents. Full list of publications here.

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang
Neural Information Processing Systems (NeurIPS), 2025
paper / arXiv / project

We introduce ThinkAct, a reasoning VLA framework capable of thinking before acting. Through reasoning reinforced by our action-aligned visual feedback, ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction in embodied tasks.

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu
IEEE International Conference on Computer Vision (ICCV), 2025
paper / arXiv / project / code

LongSplat reconstructs scenes from any casual long video without camera calibration and renders high-quality novel views from any point along your path.

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, Yu-Chiang Frank Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
paper / arXiv / project

VideoMage enables text-to-video diffusion models to generate coherent videos with multiple customized subjects and their distinct, controllable motion patterns through disentangled appearance–motion learning and spatial-temporal composition.

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
European Conference on Computer Vision (ECCV), 2024
paper / arXiv / project / code

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
European Conference on Computer Vision (ECCV), 2024
paper / arXiv / project / code

RAPPER: Reinforced Rationale-Prompted Paradigm for Natural Language Explanation in Visual Question Answering
Kai-Po Chang, Chi-Pin Huang, Wei-Yuan Cheng, Fu-En Yang, Chien-Yi Wang, Yung-Hsuan Lai, Yu-Chiang Frank Wang
International Conference on Learning Representations (ICLR), 2024
paper

Language-Guided Transformer for Federated Multi-Label Classification
I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang
AAAI Conference on Artificial Intelligence (AAAI), 2024
paper / arXiv / webpage / code

Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang
IEEE International Conference on Computer Vision (ICCV), 2023
paper / arXiv / poster

Semantics-Guided Intra-Category Knowledge Transfer for Generalized Zero-Shot Learning
Fu-En Yang, Yuan-Hao Lee, Chia-Ching Lin, Yu-Chiang Frank Wang
International Journal of Computer Vision (IJCV), 2023

Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond
Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2023
paper / arXiv / code

Adversarial Teacher-Student Representation Learning for Domain Generalization
Fu-En Yang, Yuan-Chia Cheng, Zu-Yun Shiau, Yu-Chiang Frank Wang
Advances in Neural Information Processing Systems (NeurIPS), 2021 (Spotlight Presentation)
paper / OpenReview / video / slides / poster

A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation
Yuan-Hao Lee, Fu-En Yang, Yu-Chiang Frank Wang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
paper / arXiv

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Cheng-Fu Yang*, Wan-Cyuan Fan*, Fu-En Yang, Yu-Chiang Frank Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
paper / code

Few-Shot Classification in Unseen Domains by Episodic Meta-Learning Across Visual Domains
Yuan-Chia Cheng, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang
IEEE International Conference on Image Processing (ICIP), 2021
paper / IEEE Xplore / arXiv

Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment
Po-Hsiang Huang, Fu-En Yang, Yu-Chiang Frank Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
paper / video

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis
Fu-En Yang*, Jing-Cheng Chang*, Yuan-Hao Lee, Yu-Chiang Frank Wang
IEEE International Conference on Pattern Recognition (ICPR), 2020
paper / IEEE Xplore / arXiv / video / slides

Semantics-Guided Representation Learning with Applications to Visual Synthesis
Jia-Wei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang
IEEE International Conference on Pattern Recognition (ICPR), 2020
paper / IEEE Xplore / arXiv

A Multi-Domain and Multi-Modal Representation Disentangler for Cross-Domain Image Manipulation and Classification
Fu-En Yang*, Jing-Cheng Chang*, Chung-Chi Tsai, Yu-Chiang Frank Wang
IEEE Transactions on Image Processing (TIP), 2020
paper / IEEE Xplore

Learning Hierarchical Self-Attention for Video Summarization
Yen-Ting Liu, Yu-Jhe Li, Fu-En Yang, Shang-Fu Chen, Yu-Chiang Frank Wang
IEEE International Conference on Image Processing (ICIP), 2019
IEEE Xplore

Adaptation and Re-Identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-Identification
Yu-Jhe Li, Fu-En Yang, Yen-Cheng Liu, Yu-Ying Yeh, Xiaofei Du, Yu-Chiang Frank Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018
paper / arXiv / code

Academic Services

Area Chair: NeurIPS 2025 Workshop GenProCC

Conference Program Committee/Reviewer: CVPR 2026, ICLR 2026, AAAI 2026, WACV 2026, NeurIPS 2025, ICCV 2025, ICML 2025, CVPR 2025, ICLR 2025, ICLR 2025 WS SCOPE, AAAI 2025, ACM MM 2025, NeurIPS 2024, ECCV 2024, ICML 2024, CVPR 2024, AAAI 2024, ACCV 2024, ICIP 2024, NeurIPS 2023, ICCV 2023, CVPR 2023, AAAI 2023, WACV 2023, ICIP 2023, ACCV 2022, CVPR 2022, AAAI 2022, WACV 2022, AAAI 2021, ICIP 2020, AAAI 2020

Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Computer Vision and Image Understanding (CVIU), ACM Computing Surveys (CSUR)

Awards

Honorable Mention at 2023 TAAI Ph.D. Thesis Award, Nov. 2023
NTU Presidential Award for Graduate Students, Sep. 2023
Merit Award at the 16th IPPR Doctoral Thesis Award, Aug. 2023

Teaching Assistant

Deep Learning for Computer Vision, Spring 2019
Computer Vision: From Recognition to Geometry, Fall 2018

The template is designed and shared by Dr. Jon Barron.