:: Library Catalog

Bibliographic Details
Main Authors:	Meng, Ziyang, Dai, Yu, Gong, Zezheng, Guo, Shaoxiong, Tang, Minglong, Wei, Tongquan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition 68-04 68-04 I.2.7; I.2.10
Online Access:	https://arxiv.org/abs/2406.14056

Similar Items

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
by: Li, Kaixin, et al.
Published: (2025)

Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible
by: Zhao, Lepeng, et al.
Published: (2026)

Make Literature-Based Discovery Great Again through Reproducible Pipelines
by: Cestnik, Bojan, et al.
Published: (2025)

Knowledge-Guided Multi-Agent Framework for Automated Requirements Development: A Vision
by: Huang, Jiangping, et al.
Published: (2025)

TowerVision: Understanding and Improving Multilinguality in Vision-Language Models
by: Viveiros, André G., et al.
Published: (2025)

Cost-Effective Attention Mechanisms for Low Resource Settings: Necessity & Sufficiency of Linear Transformations
by: Hosseini, Peyman, et al.
Published: (2024)

Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
by: Sáez, Arnau Igualde, et al.
Published: (2025)

Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution
by: Cornebise, Julien, et al.
Published: (2022)

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
by: Fu, Tianyu, et al.
Published: (2024)

ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization
by: Fang, Chunrong, et al.
Published: (2024)

Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization
by: Sun, Weisong, et al.
Published: (2025)

Source Code Summarization in the Era of Large Language Models
by: Sun, Weisong, et al.
Published: (2024)

Progressive Cross Attention Network for Flood Segmentation using Multispectral Satellite Imagery
by: Feliren, Vicky, et al.
Published: (2025)

Unpacking Hateful Memes: Presupposed Context and False Claims
by: Cai, Weibin, et al.
Published: (2025)

Context-Dependent Affordance Computation in Vision-Language Models
by: Farzulla, Murad
Published: (2026)

ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents
by: Huo, Dongjie, et al.
Published: (2026)

MORQA: Benchmarking Evaluation Metrics for Medical Open-Ended Question Answering
by: Yim, Wen-wai, et al.
Published: (2025)

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
by: Kim, Soyeon, et al.
Published: (2026)

A Survey on Vision-Language-Action Models for Embodied AI
by: Ma, Yueen, et al.
Published: (2024)

Hateful Meme Detection through Context-Sensitive Prompting and Fine-Grained Labeling
by: Ouyang, Rongxin, et al.
Published: (2024)

torchsom: The Reference PyTorch Library for Self-Organizing Maps
by: Berthier, Louis, et al.
Published: (2025)

HuMoCon: Concept Discovery for Human Motion Understanding
by: Fang, Qihang, et al.
Published: (2025)

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)

TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding
by: Zhang, Junwen, et al.
Published: (2025)

Biomedical Visual Instruction Tuning with Clinician Preference Alignment
by: Cui, Hejie, et al.
Published: (2024)

Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
by: Menon, Anjali R., et al.
Published: (2025)

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation
by: Qi, Dekang, et al.
Published: (2026)

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
by: Haque, Md. Asraful, et al.
Published: (2026)

Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
by: Gkountouras, John, et al.
Published: (2025)

ROI-GS: Interest-based Local Quality 3D Gaussian Splatting
by: Bui, Quoc-Anh, et al.
Published: (2025)

ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition
by: Bui, Quoc-Anh, et al.
Published: (2025)

TensLoRA: Tensor Alternatives for Low-Rank Adaptation
by: Marmoret, Axel, et al.
Published: (2025)

ATAAT: Adaptive Threat-Aware Adversarial Tuning Framework against Backdoor Attacks on Vision-Language-Action Models
by: Chen, Kewei, et al.
Published: (2026)

SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
by: Dumpala, Sri Harsha, et al.
Published: (2024)

The MSR-Video to Text Dataset with Clean Annotations
by: Chen, Haoran, et al.
Published: (2021)

Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
by: Sharma, Aditya, et al.
Published: (2025)

A Prompt Learning Framework for Source Code Summarization
by: Xu, Tingting, et al.
Published: (2023)

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
by: Parcalabescu, Letitia, et al.
Published: (2021)

Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
by: Yanambakkam, Hemanth Teja, et al.
Published: (2025)

VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors
by: Lyu, Wenbo, et al.
Published: (2025)