:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Dongping; Huang, Xuanao; Hu, Zhihan; Shi, Qingyuan; Li, Dianqi; Zhou, Tianyi
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2606.00579

Similar Items

RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)

Reinforced Visual Perception with Tools
by: Zhou, Zetong, et al.
Published: (2025)

e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings
by: Chen, Haonan, et al.
Published: (2026)

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
by: Wu, Xiyang, et al.
Published: (2024)

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
by: Zhang, Haoyu, et al.
Published: (2026)

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
by: Zhang, Lu, et al.
Published: (2024)

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
by: Yang, Qize, et al.
Published: (2025)

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
by: Zhang, Haonan, et al.
Published: (2025)

OmniCaptioner: One Captioner to Rule Them All
by: Lu, Yiting, et al.
Published: (2025)

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
by: Chen, Qian, et al.
Published: (2026)

OmniBench: Towards The Future of Universal Omni-Language Models
by: Li, Yizhi, et al.
Published: (2024)

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
by: Ma, Zixian, et al.
Published: (2024)

ChronusOmni: Improving Time Awareness of Omni Large Language Models
by: Chen, Yijing, et al.
Published: (2025)

OmniGAIA: Towards Native Omni-Modal AI Agents
by: Li, Xiaoxi, et al.
Published: (2026)

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
by: Li, Xuchen, et al.
Published: (2024)

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos
by: Goel, Arushi, et al.
Published: (2026)

Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)

Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)

Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
by: Jin, Zhuoran, et al.
Published: (2025)

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
by: Cheng, Zhili, et al.
Published: (2025)

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
by: Chen, Dongping, et al.
Published: (2024)

Seeking and Updating with Live Visual Knowledge
by: Fu, Mingyang, et al.
Published: (2025)

Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction
by: Hu, Juncheng, et al.
Published: (2026)

DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
by: Li, Xuchen, et al.
Published: (2024)

OMCAT: Omni Context Aware Transformer
by: Goel, Arushi, et al.
Published: (2024)

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
by: Ye, Hanrong, et al.
Published: (2025)

GroundingGPT: Language Enhanced Multi-modal Grounding Model
by: Li, Zhaowei, et al.
Published: (2024)

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
by: Ma, Ziyang, et al.
Published: (2025)

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
by: Wu, Jialin, et al.
Published: (2023)

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
by: Tong, Wenwen, et al.
Published: (2025)

Paper2Web: Let's Make Your Paper Alive!
by: Chen, Yuhang, et al.
Published: (2025)

MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization
by: Chaubey, Ashutosh, et al.
Published: (2026)

ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding
by: Tian, Xueyun, et al.
Published: (2026)

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
by: Wu, Chengyue, et al.
Published: (2024)

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
by: Wang, Zhenhailong, et al.
Published: (2025)

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
by: Wang, Yabing, et al.
Published: (2024)