Library Catalog

Bibliographic Details
Main Authors:	Verma, Shreyash; Kesari, Amit; Trivedi, Vinayak; Purwar, Anupam; Jamidar, Ratnesh
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2509.15241

Similar Items

VLMs-in-the-Wild: Bridging the Gap Between Academic Benchmarks and Enterprise Reality
by: Bandraupalli, Srihari, et al.
Published: (2025)

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
by: Darabi, Nastaran, et al.
Published: (2026)

VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
by: Kumar, Divake, et al.
Published: (2026)

MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
by: Jin, Yiqiao, et al.
Published: (2024)

Dynamic semantic VSLAM with known and unknown objects
by: Gu, Sanghyoup, et al.
Published: (2024)

M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
by: Wang, Hongyu, et al.
Published: (2024)

DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images
by: Dalal, Dwip, et al.
Published: (2025)

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
by: Fu, Rao, et al.
Published: (2024)

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space
by: Verma, Gaurav, et al.
Published: (2024)

See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation
by: Rizwan, Naquee, et al.
Published: (2026)

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
by: Xing, Shangyu, et al.
Published: (2024)

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2024)

VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools
by: Qi, Ji, et al.
Published: (2023)

Pose-Based Sign Language Appearance Transfer
by: Moryossef, Amit, et al.
Published: (2024)

Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment
by: Wang, Kangsheng, et al.
Published: (2025)

From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
by: Wang, Xiangfeng, et al.
Published: (2025)

Interpretable Multimodal Framework for Human-Centered Street Assessment: Integrating Visual-Language Models for Perceptual Urban Diagnostics
by: Lan, HaoTian
Published: (2025)

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention
by: Nguyen, Nhi Ngoc-Yen, et al.
Published: (2026)

Ham2Pose: Animating Sign Language Notation into Pose Sequences
by: Shalev-Arkushin, Rotem, et al.
Published: (2022)

Veagle: Advancements in Multimodal Representation Learning
by: Chawla, Rajat, et al.
Published: (2024)

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction
by: Gan, Chengguang, et al.
Published: (2025)

Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)

Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks
by: Pantazopoulos, Georgios, et al.
Published: (2024)

Advancing Toward Robust and Scalable Fingerprint Orientation Estimation: From Gradients to Deep Learning
by: Trivedi, Amit Kumar, et al.
Published: (2020)

VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
by: Lim, Qi Zhi, et al.
Published: (2025)

Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
by: Qin, Zhi, et al.
Published: (2025)

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
by: Belouadi, Jonas, et al.
Published: (2025)

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
by: Zhang, Xueqiao, et al.
Published: (2025)

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
by: Jiang, Chaoya, et al.
Published: (2024)

DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking
by: Song, Shezheng, et al.
Published: (2024)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)

LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)

Dual-branch Prompting for Multimodal Machine Translation
by: Wang, Jie, et al.
Published: (2025)

Open-Vocabulary Federated Learning with Multimodal Prototyping
by: Zeng, Huimin, et al.
Published: (2024)

Grounding Partially-Defined Events in Multimodal Data
by: Sanders, Kate, et al.
Published: (2024)

Cooperative Sentiment Agents for Multimodal Sentiment Analysis
by: Wang, Shanmin, et al.
Published: (2024)

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
by: Gao, Timin, et al.
Published: (2024)

Maya: An Instruction Finetuned Multilingual Multimodal Model
by: Alam, Nahid, et al.
Published: (2024)

Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)

UEval: A Benchmark for Unified Multimodal Generation
by: Li, Bo, et al.
Published: (2026)