Library Catalog

Bibliographic Details
Main authors:	Tran, Huyen T. T., Nguyen, Van-Quang, Alferro, Farros, Liu, Kang-Jun, Okatani, Takayuki
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online access:	https://arxiv.org/abs/2603.16179

Similar items

LAMS-Edit: Latent and Attention Mixing with Schedulers for Improved Content Preservation in Diffusion-Based Image and Style Editing
by: Fu, Wingwa, et al.
Published: (2026)

TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos
by: Charoenpitaks, Korawat, et al.
Published: (2025)

Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models
by: Sun, Li, et al.
Published: (2024)

Rethinking Unsupervised Domain Adaptation for Semantic Segmentation
by: Wang, Zhijie, et al.
Published: (2022)

Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
by: Charoenpitaks, Korawat, et al.
Published: (2023)

An Improved Method for Personalizing Diffusion Models
by: Zeng, Yan, et al.
Published: (2024)

Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos
by: Tiwari, Aditi, et al.
Published: (2025)

Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images
by: Lu, Xiangyong, et al.
Published: (2024)

MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval
by: Sogi, Naoya, et al.
Published: (2025)

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)

Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments
by: Nguyen, Van Quang
Published: (2026)

CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning
by: Nguyen, Van-Quang, et al.
Published: (2026)

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
by: Nguyen, Quang-Binh, et al.
Published: (2025)

RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution
by: Zou, Han, et al.
Published: (2023)

Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
by: Hosoya, Yusuke, et al.
Published: (2024)

Rethinking Annotation for Object Detection: Is Annotating Small-size Instances Worth Its Cost?
by: Hosoya, Yusuke, et al.
Published: (2024)

Rethinking Open-Set Object Detection: Issues, a New Formulation, and Taxonomy
by: Hosoya, Yusuke, et al.
Published: (2022)

360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation
by: Wang, Hai, et al.
Published: (2024)

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)

KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
by: Pham, Anh-Cuong, et al.
Published: (2024)

Action-Agnostic Point-Level Supervision for Temporal Action Detection
by: Yoshida, Shuhei M., et al.
Published: (2024)

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
by: Pham, Huy Quang, et al.
Published: (2024)

Explore the Hallucination on Low-level Perception for MLLMs
by: Sun, Yinan, et al.
Published: (2024)

MambaU-Lite: A Lightweight Model based on Mamba and Integrated Channel-Spatial Attention for Skin Lesion Segmentation
by: Nguyen, Thi-Nhu-Quynh, et al.
Published: (2024)

LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation
by: Tran, Ngoc-Du, et al.
Published: (2024)

From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information
by: Jiao, Qirui, et al.
Published: (2024)

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training
by: Liu, Anglin, et al.
Published: (2026)

Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
by: Zuo, Rui, et al.
Published: (2025)

TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking
by: Nguyen-Quang, Thuc, et al.
Published: (2024)

FreeRet: MLLMs as Training-Free Retrievers
by: Zhu, Yuhan, et al.
Published: (2025)

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
by: Kang, Caixin, et al.
Published: (2026)

Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach
by: La Quang, Hai, et al.
Published: (2026)

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
by: Tang, Yolo Y., et al.
Published: (2025)

Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image Classification
by: Vu, Anh Mai, et al.
Published: (2025)

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
by: Van Nguyen, Quan, et al.
Published: (2024)

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
by: Zhang, Jiarui, et al.
Published: (2025)

PAT: Pixel-wise Adaptive Training for Long-tailed Segmentation
by: Do, Khoi, et al.
Published: (2024)

Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
by: Xiao, Tong, et al.
Published: (2025)

Perception-based Image Denoising via Generative Compression
by: Nguyen, Nam, et al.
Published: (2026)

Aleatoric Uncertainty Medical Image Segmentation Estimation via Flow Matching
by: Van Nguyen, Phi, et al.
Published: (2025)