| Main Authors: | Zheng, Zangwei, Peng, Xiangyu, Yang, Tianji, Shen, Chenhui, Li, Shenggui, Liu, Hongxin, Zhou, Yukun, Li, Tianyi, You, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.20404 |
Similar Items
How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?
by: Luo, Yang, et al.
Published: (2024)
Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)
Sora Generates Videos with Stunning Geometrical Consistency
by: Li, Xuanyi, et al.
Published: (2024)
Omni-Video: Democratizing Unified Video Understanding and Generation
by: Tan, Zhiyu, et al.
Published: (2025)
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
by: Su, Zihan, et al.
Published: (2025)
Simple Visual Artifact Detection in Sora-Generated Videos
by: Sugiyama, Misora, et al.
Published: (2025)
Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
by: Yang, Deshun, et al.
Published: (2024)
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
by: Zhu, Zheng, et al.
Published: (2024)
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
by: Li, Wanyun, et al.
Published: (2024)
From Sora What We Can See: A Survey of Text-to-Video Generation
by: Sun, Rui, et al.
Published: (2024)
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
by: Dai, Josef, et al.
Published: (2024)
Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language
by: Fang, Xiang, et al.
Published: (2026)
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
by: Deng, Jinhong, et al.
Published: (2025)
More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis
by: Peng, Xiangyu, et al.
Published: (2024)
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices
by: Kim, Bosung, et al.
Published: (2025)
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)
OneThinker: All-in-one Reasoning Model for Image and Video
by: Feng, Kaituo, et al.
Published: (2025)
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
by: An, Xiang, et al.
Published: (2025)
Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models
by: Chu, Zhixuan, et al.
Published: (2024)
RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection
by: Wang, Zhuo, et al.
Published: (2025)
ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions
by: Liu, Zihao, et al.
Published: (2026)
FFA Sora, video generation as fundus fluorescein angiography simulator
by: Wu, Xinyuan, et al.
Published: (2024)
AllTracker: Efficient Dense Point Tracking at High Resolution
by: Harley, Adam W., et al.
Published: (2025)
MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
by: Jiang, Shiqi, et al.
Published: (2025)
Multi-Modal Fusion of In-Situ Video Data and Process Parameters for Online Forecasting of Cookie Drying Readiness
by: Li, Shichen, et al.
Published: (2025)
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
by: Shu, Yong, et al.
Published: (2024)
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)
Technical Report: Competition Solution For Modelscope-Sora
by: Chen, Shengfu, et al.
Published: (2024)
PRNet: Original Information Is All You Have
by: Zheng, PeiHuang, et al.
Published: (2025)
Interspatial Attention for Efficient 4D Human Video Generation
by: Shao, Ruizhi, et al.
Published: (2025)
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
by: Cheng, Zesen, et al.
Published: (2024)
Part-aware Prompted Segment Anything Model for Adaptive Segmentation
by: Zhao, Chenhui, et al.
Published: (2024)
Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance
by: Feng, Sicong, et al.
Published: (2025)
Open-Vocabulary Video Anomaly Detection
by: Wu, Peng, et al.
Published: (2023)
CeRF: Convolutional Neural Radiance Fields for New View Synthesis with Derivatives of Ray Modeling
by: Yang, Xiaoyan, et al.
Published: (2023)
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
by: Luo, Zhuoyan, et al.
Published: (2024)