| Main authors: | Luo, Fuwen; Chen, Chi; Wan, Zihao; Kang, Zhaolu; Yan, Qidong; Li, Yingjie; Wang, Xiaolong; Wang, Siyu; Wang, Ziyue; Mi, Xiaoyue; Li, Peng; Ma, Ning; Sun, Maosong; Liu, Yang |
|---|---|
| Material type: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Links: | https://arxiv.org/abs/2402.13607 |
Similar items
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
by: Wang, Ziyue, et al.
Published: (2024)
Thinking with Visual Abstract: Enhancing Multimodal Reasoning via Visual Abstraction
by: Liu, Dairu, et al.
Published: (2025)
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
by: Wang, Xiaolong, et al.
Published: (2025)
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
by: Wang, Ziyue, et al.
Published: (2024)
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
by: Luo, Fuwen, et al.
Published: (2026)
Model Composition for Multimodal Large Language Models
by: Chen, Chi, et al.
Published: (2024)
EscapeCraft: A 3D Room Escape Environment for Benchmarking Complex Multimodal Reasoning Ability
by: Wang, Ziyue, et al.
Published: (2025)
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models
by: Wang, Xiaolong, et al.
Published: (2024)
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)
Penny Wise, Pixel Foolish: Bypassing Price Constraints in Multimodal Agents via Visual Adversarial Perturbations
by: Qian, Jiachen, et al.
Published: (2026)
Perspective Transition of Large Language Models for Solving Subjective Tasks
by: Wang, Xiaolong, et al.
Published: (2025)
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
by: Lou, Xinyue, et al.
Published: (2026)
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
by: Wang, Zhikai, et al.
Published: (2025)
Multimodal Generalized Category Discovery
by: Su, Yuchang, et al.
Published: (2024)
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
by: Xu, Yuzhuang, et al.
Published: (2023)
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal
by: Nie, Yiqi, et al.
Published: (2026)
DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms
by: Bi, Xiaojun, et al.
Published: (2025)
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
by: Xiao, Yijia, et al.
Published: (2024)
ReasonAct: Progressive Training for Fine-Grained Video Reasoning in Small Models
by: Liu, Jiaxin, et al.
Published: (2025)
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
by: Li, Jingyao, et al.
Published: (2025)
V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
by: Wang, Qidong, et al.
Published: (2025)
Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
by: Zeng, Fanhu, et al.
Published: (2026)
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
by: Luo, Fuwen, et al.
Published: (2025)
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
by: Kang, Zhaolu, et al.
Published: (2025)
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
by: Ye, Junyan, et al.
Published: (2024)
Evaluating Time Awareness and Cross-modal Active Perception of Large Models via 4D Escape Room Task
by: Dong, Yurui, et al.
Published: (2026)
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
by: Xia, Haotian, et al.
Published: (2024)
JurisCTC: Enhancing Legal Judgment Prediction via Cross-Domain Transfer and Contrastive Learning
by: Kang, Zhaolu, et al.
Published: (2025)
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
by: Mi, Hongze, et al.
Published: (2024)
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
by: Li, Yunxin, et al.
Published: (2024)
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
by: Wang, Ziyue, et al.
Published: (2023)
Personal Visual Context Learning in Large Multimodal Models
by: Xue, Zihui, et al.
Published: (2026)
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
by: Ji, Yifan, et al.
Published: (2026)
Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages
by: Zhang, Yuanchi, et al.
Published: (2024)
Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples
by: Tang, Weidong, et al.
Published: (2026)
Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)
Visual-Friendly Concept Protection via Selective Adversarial Perturbations
by: Mi, Xiaoyue, et al.
Published: (2024)
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
by: Li, You, et al.
Published: (2026)
EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models
by: Huang, Jiacheng, et al.
Published: (2025)
Publicerad: (2025)