:: Library Catalog

Bibliographic Details
Main Authors:	Ma, Yan; Chern, Steffi; Shen, Xuyang; Zhong, Yiran; Liu, Pengfei
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.02587

Similar Items

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
by: Chern, Steffi, et al.
Published: (2024)

Thinking with Generated Images
by: Chern, Ethan, et al.
Published: (2025)

One RL to See Them All: Visual Triple Unified Reinforcement Learning
by: Ma, Yan, et al.
Published: (2025)

Halu-J: Critique-Based Hallucination Judge
by: Wang, Binjie, et al.
Published: (2024)

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
by: Chern, Ethan, et al.
Published: (2024)

Scaling Laws for Linear Complexity Language Models
by: Shen, Xuyang, et al.
Published: (2024)

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
by: Qin, Zhen, et al.
Published: (2024)

BeHonest: Benchmarking Honesty in Large Language Models
by: Chern, Steffi, et al.
Published: (2024)

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
by: Wu, Xuyang, et al.
Published: (2024)

MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars
by: Wang, Shuoyuan, et al.
Published: (2026)

Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)

Elucidating the Design Space of Decay in Linear Attention
by: Qin, Zhen, et al.
Published: (2025)

Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation
by: Wang, Xintong, et al.
Published: (2025)

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

Enhancing Large Vision Language Models with Self-Training on Image Comprehension
by: Deng, Yihe, et al.
Published: (2024)

Combating Adversarial Attacks with Multi-Agent Debate
by: Chern, Steffi, et al.
Published: (2024)

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
by: Qin, Zhen, et al.
Published: (2024)

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
by: Chern, Ethan, et al.
Published: (2025)

Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
by: Laskar, Md Tahmid Rahman, et al.
Published: (2025)

EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
by: Wang, Zekun, et al.
Published: (2025)

Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
by: Ding, Yi, et al.
Published: (2025)

S-GRPO: Unified Post-Training for Large Vision-Language Models
by: Yan, Yuming, et al.
Published: (2026)

HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task
by: Tian, Yu, et al.
Published: (2024)

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
by: Zhou, Chenyu, et al.
Published: (2024)

Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations
by: Zhu, Kangyu, et al.
Published: (2025)

Evaluating Vision-Language Models as Evaluators in Path Planning
by: Aghzal, Mohamed, et al.
Published: (2024)

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
by: Fu, Rao, et al.
Published: (2024)

NPHardEval4V: Dynamic Evaluation of Large Vision-Language Models with Effects of Vision
by: Li, Xiang, et al.
Published: (2024)

Coordinated Robustness Evaluation Framework for Vision-Language Models
by: Babu, Ashwin Ramesh, et al.
Published: (2025)

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
by: Qin, Zhen, et al.
Published: (2024)

Evaluating Vision-Language Models for Emotion Recognition
by: Bhattacharyya, Sree, et al.
Published: (2025)

Evaluation of Cultural Competence of Vision-Language Models
by: Yadav, Srishti, et al.
Published: (2025)

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
by: Li, Baiqi, et al.
Published: (2024)

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)

Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)

A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
by: Jing, Liqiang, et al.
Published: (2025)

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
by: Deng, Yihe, et al.
Published: (2025)

From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
by: Bhatia, Mehar, et al.
Published: (2024)

OpenClaw-RL: Train Any Agent Simply by Talking
by: Wang, Yinjie, et al.
Published: (2026)

IRR: Image Review Ranking Framework for Evaluating Vision-Language Models
by: Hayashi, Kazuki, et al.
Published: (2024)