Library Catalog

Bibliographic Details
Main Authors:	Liu, Jinming; Jia, Zhaoyang; Li, Jiahao; Li, Bin; Jin, Xin; Zeng, Wenjun; Lu, Yan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.24258

Similar Items

An Efficient Streaming Video Understanding Framework with Agentic Control
by: Liu, Jinming, et al.
Published: (2026)

Generative Latent Coding for Ultra-Low Bitrate Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression
by: Qi, Linfeng, et al.
Published: (2025)

Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
by: Liu, Jinming, et al.
Published: (2024)

Generative Latent Video Compression
by: Guo, Zongyu, et al.
Published: (2025)

Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs
by: Liu, Jinming, et al.
Published: (2024)

Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
by: Jin, Xin, et al.
Published: (2026)

FreeRet: MLLMs as Training-Free Retrievers
by: Zhu, Yuhan, et al.
Published: (2025)

One-Step Diffusion-Based Image Compression with Semantic Distillation
by: Xue, Naifu, et al.
Published: (2025)

DLF: Extreme Image Compression with Dual-generative Latent Fusion
by: Xue, Naifu, et al.
Published: (2025)

Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance
by: Xue, Naifu, et al.
Published: (2025)

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
by: Ye, Qilang, et al.
Published: (2025)

Towards Practical Real-Time Neural Video Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

Visual Jigsaw Post-Training Improves MLLMs
by: Wu, Penghao, et al.
Published: (2025)

Benchmarking Large and Small MLLMs
by: Feng, Xuelu, et al.
Published: (2025)

CoD: A Diffusion Foundation Model for Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
by: Li, Shuai, et al.
Published: (2025)

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling
by: Wang, Jiahao, et al.
Published: (2025)

Dense Connector for MLLMs
by: Yao, Huanjin, et al.
Published: (2024)

Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
by: Dalal, Dwip, et al.
Published: (2025)

VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
by: Qiang, Chenhui, et al.
Published: (2025)

CodePercept: Code-Grounded Visual STEM Perception for MLLMs
by: Guan, Tongkun, et al.
Published: (2026)

When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs
by: Cao, Fanpu, et al.
Published: (2026)

Generative Video Compression with One-Dimensional Latent Representation
by: Zheng, Zihan, et al.
Published: (2026)

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
by: Lu, Lidong, et al.
Published: (2025)

CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
by: Jia, Zhaoyang, et al.
Published: (2026)

Automated Multi-level Preference for MLLMs
by: Zhang, Mengxi, et al.
Published: (2024)

Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs
by: Li, Yuanshuai, et al.
Published: (2025)

Neural Video Compression with Feature Modulation
by: Li, Jiahao, et al.
Published: (2024)

Generation Navigator: A State-Aware Agentic Framework for Image Generation
by: Liu, Jinming, et al.
Published: (2026)

Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs
by: Zhu, Rui, et al.
Published: (2026)

Spatial Preference Rewarding for MLLMs Spatial Understanding
by: Qiu, Han, et al.
Published: (2025)

On the Generalization Capacities of MLLMs for Spatial Intelligence
by: Zhang, Gongjie, et al.
Published: (2026)

Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
by: Zeng, Fanhu, et al.
Published: (2025)

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
by: Chen, Feng, et al.
Published: (2025)

CrystaL: Spontaneous Emergence of Visual Latents in MLLMs
by: Zhang, Yang, et al.
Published: (2026)

SpatialTree: How Spatial Abilities Branch Out in MLLMs
by: Xiao, Yuxi, et al.
Published: (2025)

Seek-and-Solve: Benchmarking MLLMs for Visual Clue-Driven Reasoning in Daily Scenarios
by: Li, Xiaomin, et al.
Published: (2026)

RynnEC: Bringing MLLMs into Embodied World
by: Dang, Ronghao, et al.
Published: (2025)