| Main Authors: | Amemiya, Kanon, Yashima, Daichi, Katsumata, Kei, Komatsu, Takumi, Korekata, Ryosuke, Otsuki, Seitaro, Sugiura, Komei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.05446 |
Similar Items
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
by: Katsumata, Kei, et al.
Published: (2025)
ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
by: Yashima, Daichi, et al.
Published: (2026)
Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)
Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)
DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)
ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
by: Yashima, Daichi, et al.
Published: (2026)
MLLM-as-a-Judge Exhibits Model Preference Bias
by: Koyama, Shuitsu, et al.
Published: (2026)
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)
LLM-Free Image Captioning Evaluation in Reference-Flexible Settings
by: Hirano, Shinnosuke, et al.
Published: (2025)
Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation
by: Kogure, Hina, et al.
Published: (2026)
HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
by: Yashima, Daichi, et al.
Published: (2026)
Layer-Wise Relevance Propagation with Conservation Property for ResNet
by: Otsuki, Seitaro, et al.
Published: (2024)
AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation
by: Takagi, Yusuke, et al.
Published: (2026)
GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
by: Katsumata, Kei, et al.
Published: (2025)
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
by: Komatsu, Takumi, et al.
Published: (2024)
Pre-Manipulation Alignment Prediction with Parallel Deep State-Space and Transformer Models
by: Kambara, Motonari, et al.
Published: (2025)
Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)
Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images
by: Nagashima, Shunya, et al.
Published: (2025)
ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)
FLARE-SSM: Deep State Space Models with Influence-Balanced Loss for 72-Hour Solar Flare Prediction
by: Takagi, Yusuke, et al.
Published: (2025)
Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection
by: Matsuo, Haruka, et al.
Published: (2024)
Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Model
by: Hirano, Shinnosuke, et al.
Published: (2025)
Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding
by: Suzuki, Shuntaro, et al.
Published: (2025)
Fixed Very‐Low‐Dose Oral Immunotherapy in Infants and Toddlers With Low‐Threshold Egg, Milk or Wheat Allergy: A Prospective Cohort Study
by: Katsumasa Kitamura, et al.
Published: (2026)
Antigenicity of proteins in cooked egg powder and skim milk powder for children with egg and milk allergies
by: Michihiro Naito, et al.
Published: (2025)
MEGState: Phoneme Decoding from Magnetoencephalography Signals
by: Suzuki, Shuntaro, et al.
Published: (2025)
LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation
by: Kambara, Motonari, et al.
Published: (2026)
Leaving berlin / Joseph Kanon
by: Kanon, Joseph
Published: (2015)
Toward a holistic tophus assessment in gout clinical trials: What lies beyond tophus count and size?
by: Kanon Jatuworapruk
Published: (2024)
Superprotonic Conduction in Donor Co‐Doped Perovskites
by: Kensei Umeda, et al.
Published: (2026)
3DFlowRenderer: One-shot Face Re-enactment via Dense 3D Facial Flow Estimation
by: Nijhawan, Siddharth, et al.
Published: (2024)
A bordo del Nai'a. Buceando en Fidji
Published: (1998)
NaiAD: Initiate Data-Driven Research for LLM Advertising
by: Zhang, Yihang, et al.
Published: (2026)
Zooplankton community in Thi Nai lagoon in the period of 2001-2020
by: Nguyen, Tam Vinh
Published: (2020)
Abundance of pteropods in the Aegean Sea during LIA07, LIA08, LIA09 and LIA10
by: Siokou-Frangou, Ioanna, et al.
Published: (2014)