:: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Zangwei; Peng, Xiangyu; Yang, Tianji; Shen, Chenhui; Li, Shenggui; Liu, Hongxin; Zhou, Yukun; Li, Tianyi; You, Yang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.20404

Similar Items

How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?
by: Luo, Yang, et al.
Published: (2024)

Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)

Sora Generates Videos with Stunning Geometrical Consistency
by: Li, Xuanyi, et al.
Published: (2024)

Omni-Video: Democratizing Unified Video Understanding and Generation
by: Tan, Zhiyu, et al.
Published: (2025)

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
by: Su, Zihan, et al.
Published: (2025)

Simple Visual Artifact Detection in Sora-Generated Videos
by: Sugiyama, Misora, et al.
Published: (2025)

Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
by: Yang, Deshun, et al.
Published: (2024)

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
by: Zhu, Zheng, et al.
Published: (2024)

OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
by: Li, Wanyun, et al.
Published: (2024)

From Sora What We Can See: A Survey of Text-to-Video Generation
by: Sun, Rui, et al.
Published: (2024)

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
by: Dai, Josef, et al.
Published: (2024)

Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language
by: Fang, Xiang, et al.
Published: (2026)

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
by: Deng, Jinhong, et al.
Published: (2025)

More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis
by: Peng, Xiangyu, et al.
Published: (2024)

On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices
by: Kim, Bosung, et al.
Published: (2025)

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)

OneThinker: All-in-one Reasoning Model for Image and Video
by: Feng, Kaituo, et al.
Published: (2025)

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
by: An, Xiang, et al.
Published: (2025)

Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models
by: Chu, Zhixuan, et al.
Published: (2024)

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection
by: Wang, Zhuo, et al.
Published: (2025)

ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions
by: Liu, Zihao, et al.
Published: (2026)

FFA Sora, video generation as fundus fluorescein angiography simulator
by: Wu, Xinyuan, et al.
Published: (2024)

AllTracker: Efficient Dense Point Tracking at High Resolution
by: Harley, Adam W., et al.
Published: (2025)

MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
by: Jiang, Shiqi, et al.
Published: (2025)

Multi-Modal Fusion of In-Situ Video Data and Process Parameters for Online Forecasting of Cookie Drying Readiness
by: Li, Shichen, et al.
Published: (2025)

Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
by: Shu, Yong, et al.
Published: (2024)

Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)

Technical Report: Competition Solution For Modelscope-Sora
by: Chen, Shengfu, et al.
Published: (2024)

PRNet: Original Information Is All You Have
by: Zheng, PeiHuang, et al.
Published: (2025)

Interspatial Attention for Efficient 4D Human Video Generation
by: Shao, Ruizhi, et al.
Published: (2025)

FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
by: Cheng, Zesen, et al.
Published: (2024)

Part-aware Prompted Segment Anything Model for Adaptive Segmentation
by: Zhao, Chenhui, et al.
Published: (2024)

Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance
by: Feng, Sicong, et al.
Published: (2025)

Open-Vocabulary Video Anomaly Detection
by: Wu, Peng, et al.
Published: (2023)

CeRF: Convolutional Neural Radiance Fields for New View Synthesis with Derivatives of Ray Modeling
by: Yang, Xiaoyan, et al.
Published: (2023)

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
by: Luo, Zhuoyan, et al.
Published: (2024)