_version_ 1866909600628867072
author Seawead, Team
Yang, Ceyuan
Lin, Zhijie
Zhao, Yang
Lin, Shanchuan
Ma, Zhibei
Guo, Haoyuan
Chen, Hao
Qi, Lu
Wang, Sen
Cheng, Feng
Zuo, Feilong
Zeng, Xuejiao
Yang, Ziyan
Kong, Fangyuan
Wei, Meng
Qing, Zhiwu
Xiao, Fei
Hoang, Tuyen
Zhang, Siyu
Zhu, Peihao
Zhao, Qi
Yan, Jiangqiao
Gui, Liangke
Bi, Sheng
Li, Jiashi
Ren, Yuxi
Wang, Rui
Li, Huixia
Xiao, Xuefeng
Liu, Shu
Ling, Feng
Zhang, Heng
Wei, Houmin
Kuang, Huafeng
Duncan, Jerry
Zhang, Junda
Zheng, Junru
Sun, Li
Zhang, Manlin
Sun, Renfei
Zhuang, Xiaobin
Li, Xiaojie
Xia, Xin
Chi, Xuyan
Peng, Yanghua
Wang, Yuping
Wang, Yuxuan
Zhao, Zhongkai
Chen, Zhuo
Song, Zuquan
Yang, Zhenheng
Feng, Jiashi
Yang, Jianchao
Jiang, Lu
author_facet Seawead, Team
Yang, Ceyuan
Lin, Zhijie
Zhao, Yang
Lin, Shanchuan
Ma, Zhibei
Guo, Haoyuan
Chen, Hao
Qi, Lu
Wang, Sen
Cheng, Feng
Zuo, Feilong
Zeng, Xuejiao
Yang, Ziyan
Kong, Fangyuan
Wei, Meng
Qing, Zhiwu
Xiao, Fei
Hoang, Tuyen
Zhang, Siyu
Zhu, Peihao
Zhao, Qi
Yan, Jiangqiao
Gui, Liangke
Bi, Sheng
Li, Jiashi
Ren, Yuxi
Wang, Rui
Li, Huixia
Xiao, Xuefeng
Liu, Shu
Ling, Feng
Zhang, Heng
Wei, Houmin
Kuang, Huafeng
Duncan, Jerry
Zhang, Junda
Zheng, Junru
Sun, Li
Zhang, Manlin
Sun, Renfei
Zhuang, Xiaobin
Li, Xiaojie
Xia, Xin
Chi, Xuyan
Peng, Yanghua
Wang, Yuping
Wang, Yuxuan
Zhao, Zhongkai
Chen, Zhuo
Song, Zuquan
Yang, Zhenheng
Feng, Jiashi
Yang, Jianchao
Jiang, Lu
contents This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B), called Seaweed-7B, trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting. This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained on substantially greater GPU resources, and (2) our model, which exhibits strong generalization ability, can be effectively adapted across a wide range of downstream applications through either lightweight fine-tuning or continued training. See the project page at https://seaweed.video/
format Preprint
id arxiv_https___arxiv_org_abs_2504_08685
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Seawead, Team
Yang, Ceyuan
Lin, Zhijie
Zhao, Yang
Lin, Shanchuan
Ma, Zhibei
Guo, Haoyuan
Chen, Hao
Qi, Lu
Wang, Sen
Cheng, Feng
Zuo, Feilong
Zeng, Xuejiao
Yang, Ziyan
Kong, Fangyuan
Wei, Meng
Qing, Zhiwu
Xiao, Fei
Hoang, Tuyen
Zhang, Siyu
Zhu, Peihao
Zhao, Qi
Yan, Jiangqiao
Gui, Liangke
Bi, Sheng
Li, Jiashi
Ren, Yuxi
Wang, Rui
Li, Huixia
Xiao, Xuefeng
Liu, Shu
Ling, Feng
Zhang, Heng
Wei, Houmin
Kuang, Huafeng
Duncan, Jerry
Zhang, Junda
Zheng, Junru
Sun, Li
Zhang, Manlin
Sun, Renfei
Zhuang, Xiaobin
Li, Xiaojie
Xia, Xin
Chi, Xuyan
Peng, Yanghua
Wang, Yuping
Wang, Yuxuan
Zhao, Zhongkai
Chen, Zhuo
Song, Zuquan
Yang, Zhenheng
Feng, Jiashi
Yang, Jianchao
Jiang, Lu
Computer Vision and Pattern Recognition
Artificial Intelligence
This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B), called Seaweed-7B, trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting. This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained on substantially greater GPU resources, and (2) our model, which exhibits strong generalization ability, can be effectively adapted across a wide range of downstream applications through either lightweight fine-tuning or continued training. See the project page at https://seaweed.video/
title Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2504.08685