:: Library Catalog

Bibliographic Details
Main Authors:	Amemiya, Kanon; Yashima, Daichi; Katsumata, Kei; Komatsu, Takumi; Korekata, Ryosuke; Otsuki, Seitaro; Sugiura, Komei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.05446

Similar Items

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)

Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
by: Katsumata, Kei, et al.
Published: (2025)

ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
by: Yashima, Daichi, et al.
Published: (2026)

Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)

Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
by: Yashima, Daichi, et al.
Published: (2026)

MLLM-as-a-Judge Exhibits Model Preference Bias
by: Koyama, Shuitsu, et al.
Published: (2026)

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)

LLM-Free Image Captioning Evaluation in Reference-Flexible Settings
by: Hirano, Shinnosuke, et al.
Published: (2025)

Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation
by: Kogure, Hina, et al.
Published: (2026)

HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
by: Yashima, Daichi, et al.
Published: (2026)

Layer-Wise Relevance Propagation with Conservation Property for ResNet
by: Otsuki, Seitaro, et al.
Published: (2024)

AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation
by: Takagi, Yusuke, et al.
Published: (2026)

GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
by: Katsumata, Kei, et al.
Published: (2025)

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)

Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
by: Komatsu, Takumi, et al.
Published: (2024)

Pre-Manipulation Alignment Prediction with Parallel Deep State-Space and Transformer Models
by: Kambara, Motonari, et al.
Published: (2025)

Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)

Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images
by: Nagashima, Shunya, et al.
Published: (2025)

ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)

FLARE-SSM: Deep State Space Models with Influence-Balanced Loss for 72-Hour Solar Flare Prediction
by: Takagi, Yusuke, et al.
Published: (2025)

Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection
by: Matsuo, Haruka, et al.
Published: (2024)

Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Model
by: Hirano, Shinnosuke, et al.
Published: (2025)

Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding
by: Suzuki, Shuntaro, et al.
Published: (2025)

Fixed Very‐Low‐Dose Oral Immunotherapy in Infants and Toddlers With Low‐Threshold Egg, Milk or Wheat Allergy: A Prospective Cohort Study
by: Katsumasa Kitamura, et al.
Published: (2026)

Antigenicity of proteins in cooked egg powder and skim milk powder for children with egg and milk allergies
by: Michihiro Naito, et al.
Published: (2025)

MEGState: Phoneme Decoding from Magnetoencephalography Signals
by: Suzuki, Shuntaro, et al.
Published: (2025)

LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation
by: Kambara, Motonari, et al.
Published: (2026)

Leaving Berlin / Joseph Kanon
by: Kanon, Joseph
Published: (2015)

Toward a holistic tophus assessment in gout clinical trials: What lies beyond tophus count and size?
by: Kanon Jatuworapruk
Published: (2024)

Superprotonic Conduction in Donor Co‐Doped Perovskites
by: Kensei Umeda, et al.
Published: (2026)

3DFlowRenderer: One-shot Face Re-enactment via Dense 3D Facial Flow Estimation
by: Nijhawan, Siddharth, et al.
Published: (2024)

A bordo del Nai'a. Buceando en Fidji
Published: (1998)

NaiAD: Initiate Data-Driven Research for LLM Advertising
by: Zhang, Yihang, et al.
Published: (2026)

Zooplankton community in Thi Nai lagoon in the period of 2001-2020
by: Nguyen, Tam Vinh
Published: (2020)

Abundance of pteropods in the Aegean Sea during LIA07, LIA08, LIA09 and LIA10
by: Siokou-Frangou, Ioanna, et al.
Published: (2014)