:: Library Catalog

Bibliographic Details
Main Authors:	Nagashima, Shunya; Sugiura, Komei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.07847

Similar Items

FLARE-SSM: Deep State Space Models with Influence-Balanced Loss for 72-Hour Solar Flare Prediction
by: Takagi, Yusuke, et al.
Published: (2025)

Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding
by: Suzuki, Shuntaro, et al.
Published: (2025)

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)

ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)

Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)

IntrinsicWeather: Controllable Weather Editing in Intrinsic Space
by: Zhu, Yixin, et al.
Published: (2025)

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
by: Yashima, Daichi, et al.
Published: (2026)

When Color-Space Decoupling Meets Diffusion for Adverse-Weather Image Restoration
by: Fang, Wenxuan, et al.
Published: (2025)

MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection
by: Lu, Hui, et al.
Published: (2025)

An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation
by: Tan, Dayu, et al.
Published: (2025)

Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)

SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception
by: Foucard, Louis, et al.
Published: (2024)

About an Automating Annotation Method for Robot Markers
by: Uemura, Wataru, et al.
Published: (2026)

Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation
by: Kogure, Hina, et al.
Published: (2026)

Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification
by: Hamano, Shogo, et al.
Published: (2026)

MLLM-as-a-Judge Exhibits Model Preference Bias
by: Koyama, Shuitsu, et al.
Published: (2026)

Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach
by: Sakajo, Haruki, et al.
Published: (2025)

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning
by: Yao, Zhengjian, et al.
Published: (2026)

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation
by: He, Ruozhen, et al.
Published: (2026)

Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series
by: Salis, Matteo, et al.
Published: (2024)

AllWeatherNet: Unified Image Enhancement for Autonomous Driving under Adverse Weather and Lowlight-conditions
by: Qian, Chenghao, et al.
Published: (2024)

Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Model
by: Hirano, Shinnosuke, et al.
Published: (2025)

WeatherSeg: Weather-Robust Image Segmentation using Teacher-Student Dual Learning and Classifier-Updating Attention
by: Zhang, Zhang, et al.
Published: (2026)

MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction
by: Tang, Shuo, et al.
Published: (2025)

Electrolyzers-HSI: Close-Range Multi-Scene Hyperspectral Imaging Benchmark Dataset
by: Arbash, Elias, et al.
Published: (2025)

Gradient-Guided Parameter Mask for Multi-Scenario Image Restoration Under Adverse Weather
by: Guo, Jilong, et al.
Published: (2024)

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)

LLM-Free Image Captioning Evaluation in Reference-Flexible Settings
by: Hirano, Shinnosuke, et al.
Published: (2025)

WeatherReasonSeg: A Benchmark for Weather-Aware Reasoning Segmentation in Visual Language Models
by: Du, Wanjun, et al.
Published: (2026)

Improving the Spatial Resolution of GONG Solar Images to GST Quality Using Deep Learning
by: Li, Chenyang, et al.
Published: (2025)

A Multi-Level Hierarchical Framework for the Classification of Weather Conditions and Hazard Prediction
by: Neelam, Harish
Published: (2024)

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)

WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation
by: Qian, Chenghao, et al.
Published: (2024)

Deep Learning Models for Coral Bleaching Classification in Multi-Condition Underwater Image Datasets
by: Macrohon, Julio Jerison E., et al.
Published: (2025)

Enhanced Multi-Class Classification of Gastrointestinal Endoscopic Images with Interpretable Deep Learning Model
by: Kamble, Astitva, et al.
Published: (2025)

Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression
by: Tu, Hanyue, et al.
Published: (2025)