| Main Authors: | Shi, Huafeng, Liang, Jianzhong, Xie, Rongchang, Wu, Xian, Chen, Cheng, Liu, Chang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.10584 |
Similar Items
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
by: Xie, Rongchang, et al.
Published: (2024)
Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios
by: Shi, Yu, et al.
Published: (2026)
Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios
by: Guo, Guangqian, et al.
Published: (2026)
DiTPainter: Efficient Video Inpainting with Diffusion Transformers
by: Wu, Xian, et al.
Published: (2025)
VideoMAC: Video Masked Autoencoders Meet ConvNets
by: Pei, Gensheng, et al.
Published: (2024)
TrajLoom: Dense Future Trajectory Generation from Video
by: Zhang, Zewei, et al.
Published: (2026)
Plenoptic Video Generation
by: Fu, Xiao, et al.
Published: (2026)
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
by: Guan, Yiran, et al.
Published: (2026)
Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding
by: Wang, Mengzhao, et al.
Published: (2024)
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
by: Guan, Yiran, et al.
Published: (2026)
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)
DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions
by: Wang, Guangrun, et al.
Published: (2024)
SWinMamba: Serpentine Window State Space Model for Vascular Segmentation
by: Zhao, Rongchang, et al.
Published: (2025)
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
by: Li, Yifan, et al.
Published: (2025)
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
by: Dang, Lingwei, et al.
Published: (2025)
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
by: Liu, Ye, et al.
Published: (2024)
ALL-PET: A Low-resource and Low-shot PET Foundation Model in Projection Domain
by: Huang, Bin, et al.
Published: (2025)
NegVSR: Augmenting Negatives for Generalized Noise Modeling in Real-World Video Super-Resolution
by: Song, Yexing, et al.
Published: (2023)
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)
PresentAgent: Multimodal Agent for Presentation Video Generation
by: Shi, Jingwei, et al.
Published: (2025)
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model
by: Fu, Yongjie, et al.
Published: (2024)
StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
by: Jiao, Guanlong, et al.
Published: (2026)
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
by: Chen, Harold Haodong, et al.
Published: (2025)
Generative Scenario Rollouts for End-to-End Autonomous Driving
by: Yasarla, Rajeev, et al.
Published: (2026)
Investigating Memorization in Video Diffusion Models
by: Chen, Chen, et al.
Published: (2024)
InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)
Exploring Spatiotemporal Feature Propagation for Video-Level Compressive Spectral Reconstruction: Dataset, Model and Benchmark
by: Cai, Lijing, et al.
Published: (2026)
[CLS] is Not Enough: Multi-Label Recognition via Patch-Level Inference and Adaptive Aggregation
by: Wang, Akang, et al.
Published: (2026)
Advancing Video Self-Supervised Learning via Image Foundation Models
by: Wu, Jingwei, et al.
Published: (2025)
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
by: Cheng, Junhao, et al.
Published: (2025)
SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
by: Peng, Liang, et al.
Published: (2023)
MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation
by: Shi, Haoyuan, et al.
Published: (2026)
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
by: Liu, Tongkun, et al.
Published: (2025)
VistaGEN: Consistent Driving Video Generation with Fine-Grained Control Using Multiview Visual-Language Reasoning
by: Chen, Li-Heng, et al.
Published: (2026)
MambaTrans: Multimodal Fusion Image Translation via Large Language Model Priors for Downstream Visual Tasks
by: Xu, Yushen, et al.
Published: (2025)
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding
by: Cheng, Tongtong, et al.
Published: (2025)
PanFlow: Decoupled Motion Control for Panoramic Video Generation
by: Zhang, Cheng, et al.
Published: (2025)
Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility
by: Hao, Yutong, et al.
Published: (2025)
CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
by: Duan, Zhizhao, et al.
Published: (2024)
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
by: Liu, Yunze, et al.
Published: (2025)