Library Catalog

Bibliographic Details
Main Authors:	Chen, Shengfu; Liu, Hailong; Wei, Wenzhao
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.07194

Similar Items

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
by: Wang, Lening, et al.
Published: (2024)

FFA Sora, video generation as fundus fluorescein angiography simulator
by: Wu, Xinyuan, et al.
Published: (2024)

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection
by: Wang, Zhuo, et al.
Published: (2025)

Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)

From Sora What We Can See: A Survey of Text-to-Video Generation
by: Sun, Rui, et al.
Published: (2024)

Ovis-Image Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)

Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
by: Liu, Yixin, et al.
Published: (2024)

Ovis-U1 Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)

Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)

Seed1.5-VL Technical Report
by: Guo, Dong, et al.
Published: (2025)

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)

HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)

Dolphin v1.0 Technical Report
by: Weng, Taohan, et al.
Published: (2025)

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
by: Dai, Josef, et al.
Published: (2024)

GR-3 Technical Report
by: Cheang, Chilam, et al.
Published: (2025)

Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition
by: Yu, Jun, et al.
Published: (2025)

MASSeg: 2nd Technical Report for 4th PVUW MOSE Track
by: Cao, Xuqiang, et al.
Published: (2025)

AEMIM: Adversarial Examples Meet Masked Image Modeling
by: Xiang, Wenzhao, et al.
Published: (2024)

MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
by: Yang, Deshun, et al.
Published: (2024)

Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)

MARS: Technical Report for the CASTLE Challenge at EgoVis 2026
by: Zhang, Haoyu, et al.
Published: (2026)

Motif-Video 2B: Technical Report
by: Lim, Junghwan, et al.
Published: (2026)

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
by: Zhang, Jiarui, et al.
Published: (2025)

ZAYA1-VL-8B Technical Report
by: Shapourian, Hassan, et al.
Published: (2026)

PLaMo 2.1-VL Technical Report
by: Kerola, Tommi, et al.
Published: (2026)

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
by: Jin, Zhiwei, et al.
Published: (2025)

Gender Bias in Text-to-Video Generation Models: A case study of Sora
by: Nadeem, Mohammad, et al.
Published: (2024)

Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)

PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)

Phi-4-reasoning-vision-15B Technical Report
by: Aneja, Jyoti, et al.
Published: (2026)

Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025
by: Chu, Qiaohui, et al.
Published: (2025)

iFlyBot-VLA Technical Report
by: Zhang, Yuan, et al.
Published: (2025)

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
by: Dong, Jiajun, et al.
Published: (2025)

ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
by: Chu, Zedong, et al.
Published: (2026)

Phoenix-VL 1.5 Medium Technical Report
by: Phoenix Team, et al.
Published: (2026)

Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
by: Huang, Yuanhui, et al.
Published: (2024)

HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
by: Qiu, Wenzhao, et al.
Published: (2024)