:: Library Catalog

Bibliographic Details
Main Authors:	Saito, Kuniaki; Shinoda, Risa; Tanaka, Shohei; Hirasawa, Tosho; Okura, Fumio; Ushiku, Yoshitaka
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.20515

Similar Items

HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2026)

SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
by: Shinoda, Risa, et al.
Published: (2024)

BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment
by: Shinoda, Risa, et al.
Published: (2026)

SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts
by: Inadumi, Shun, et al.
Published: (2025)

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)

AgroBench: Vision-Language Model Benchmark in Agriculture
by: Shinoda, Risa, et al.
Published: (2025)

GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants
by: Yang, Yang, et al.
Published: (2025)

COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
by: Maeda, Koki, et al.
Published: (2024)

SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2025)

SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)

Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space
by: Nakagawa, Ren, et al.
Published: (2025)

PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
by: Shinoda, Risa, et al.
Published: (2024)

Evaluating the Capability of Video Question Generation for Expert Knowledge Elicitation
by: Zhang, Huaying, et al.
Published: (2025)

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
by: Xing, Junhao, et al.
Published: (2025)

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)

HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
by: Lymperaiou, Maria, et al.
Published: (2025)

CAMOT: Camera Angle-aware Multi-Object Tracking
by: Limanta, Felix, et al.
Published: (2024)

OpenAnimalTracks: A Dataset for Animal Track Recognition
by: Shinoda, Risa, et al.
Published: (2024)

DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions
by: Wang, Xinran, et al.
Published: (2026)

EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors
by: Miyazato, Ryuhei, et al.
Published: (2026)

WarrantScore: Modeling Warrants between Claims and Evidence for Substantiation Evaluation in Peer Reviews
by: Mori, Kiyotada, et al.
Published: (2026)

Gaussian Mesh Renderer for Lightweight Differentiable Rendering
by: Liu, Xinpeng, et al.
Published: (2026)

HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
by: Ohkawa, Takehiko, et al.
Published: (2023)

Am I More Pointwise or Pairwise? Revealing Position Bias in Rubric-Based LLM-as-a-Judge
by: Xu, Yuzheng, et al.
Published: (2026)

Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
by: Gu, Yi, et al.
Published: (2025)

Towards Safer Mobile Agents: Scalable Generation and Evaluation of Diverse Scenarios for VLMs
by: Taniguchi, Takara, et al.
Published: (2026)

Unsupervised 3D Human Pose Estimation via Conditional Multi-view Ancestral Sampling
by: Goto, Ryohei, et al.
Published: (2026)

TreeFormer: Single-view Plant Skeleton Estimation via Tree-constrained Graph Generation
by: Liu, Xinpeng, et al.
Published: (2024)

PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation
by: Liu, Xinpeng, et al.
Published: (2026)

MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)

AnimalClue: Recognizing Animals by their Traces
by: Shinoda, Risa, et al.
Published: (2025)

LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models
by: Qiu, Han, et al.
Published: (2024)

HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
by: Liu, Xinpeng, et al.
Published: (2025)

Mitigating Image Captioning Hallucinations in Vision-Language Models
by: Zhao, Fei, et al.
Published: (2025)

Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
by: Wei, Yuancheng, et al.
Published: (2026)

DP-SfM: Dual-Pixel Structure-from-Motion without Scale Ambiguity
by: Makabe, Lilika, et al.
Published: (2026)

Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
by: Makabe, Lilika, et al.
Published: (2025)

Near-light Photometric Stereo with Symmetric Lights
by: Makabe, Lilika, et al.
Published: (2026)