| Main Authors: | Saito, Kuniaki, Shinoda, Risa, Tanaka, Shohei, Hirasawa, Tosho, Okura, Fumio, Ushiku, Yoshitaka |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.20515 |
Similar Items
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2026)
SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
by: Shinoda, Risa, et al.
Published: (2024)
BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment
by: Shinoda, Risa, et al.
Published: (2026)
SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts
by: Inadumi, Shun, et al.
Published: (2025)
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
AgroBench: Vision-Language Model Benchmark in Agriculture
by: Shinoda, Risa, et al.
Published: (2025)
GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants
by: Yang, Yang, et al.
Published: (2025)
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
by: Maeda, Koki, et al.
Published: (2024)
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2025)
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2024)
Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space
by: Nakagawa, Ren, et al.
Published: (2025)
PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
by: Shinoda, Risa, et al.
Published: (2024)
Evaluating the Capability of Video Question Generation for Expert Knowledge Elicitation
by: Zhang, Huaying, et al.
Published: (2025)
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
by: Xing, Junhao, et al.
Published: (2025)
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
by: Lymperaiou, Maria, et al.
Published: (2025)
CAMOT: Camera Angle-aware Multi-Object Tracking
by: Limanta, Felix, et al.
Published: (2024)
OpenAnimalTracks: A Dataset for Animal Track Recognition
by: Shinoda, Risa, et al.
Published: (2024)
DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions
by: Wang, Xinran, et al.
Published: (2026)
EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors
by: Miyazato, Ryuhei, et al.
Published: (2026)
WarrantScore: Modeling Warrants between Claims and Evidence for Substantiation Evaluation in Peer Reviews
by: Mori, Kiyotada, et al.
Published: (2026)
Gaussian Mesh Renderer for Lightweight Differentiable Rendering
by: Liu, Xinpeng, et al.
Published: (2026)
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
by: Ohkawa, Takehiko, et al.
Published: (2023)
Am I More Pointwise or Pairwise? Revealing Position Bias in Rubric-Based LLM-as-a-Judge
by: Xu, Yuzheng, et al.
Published: (2026)
Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
by: Gu, Yi, et al.
Published: (2025)
Towards Safer Mobile Agents: Scalable Generation and Evaluation of Diverse Scenarios for VLMs
by: Taniguchi, Takara, et al.
Published: (2026)
Unsupervised 3D Human Pose Estimation via Conditional Multi-view Ancestral Sampling
by: Goto, Ryohei, et al.
Published: (2026)
TreeFormer: Single-view Plant Skeleton Estimation via Tree-constrained Graph Generation
by: Liu, Xinpeng, et al.
Published: (2024)
PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation
by: Liu, Xinpeng, et al.
Published: (2026)
MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)
AnimalClue: Recognizing Animals by their Traces
by: Shinoda, Risa, et al.
Published: (2025)
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models
by: Qiu, Han, et al.
Published: (2024)
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
by: Liu, Xinpeng, et al.
Published: (2025)
Mitigating Image Captioning Hallucinations in Vision-Language Models
by: Zhao, Fei, et al.
Published: (2025)
Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)
DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
by: Wei, Yuancheng, et al.
Published: (2026)
DP-SfM: Dual-Pixel Structure-from-Motion without Scale Ambiguity
by: Makabe, Lilika, et al.
Published: (2026)
Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
by: Makabe, Lilika, et al.
Published: (2025)
Near-light Photometric Stereo with Symmetric Lights
by: Makabe, Lilika, et al.
Published: (2026)