| Main Authors: | Han, Leekyeung, Min, Hyunji, Hwangbo, Gyeom, Choi, Jonghyun, Seo, Paul Hongsuck |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.12894 |
Similar Items
Robust Image Self-Recovery against Tampering using Watermark Generation with Pixel Shuffling
by: Kim, Minyoung, et al.
Published: (2025)
Attention Misses Visual Risk: Risk-Adaptive Steering for Multimodal Safety Alignment
by: Park, Jonghyun, et al.
Published: (2025)
Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos
by: Kim, Youngseo, et al.
Published: (2025)
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
by: Yu, Seonghoon, et al.
Published: (2024)
Multi-Granularity Video Object Segmentation
by: Lim, Sangbeom, et al.
Published: (2024)
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
by: Lee, Seung-jae, et al.
Published: (2025)
Learning Correlation Structures for Vision Transformers
by: Kim, Manjin, et al.
Published: (2024)
V$^2$Dial: Unification of Video and Visual Dialog via Multimodal Experts
by: Abdessaied, Adnen, et al.
Published: (2025)
Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision
by: Kim, Dohyun, et al.
Published: (2025)
GaussNav: Gaussian Splatting for Visual Navigation
by: Lei, Xiaohan, et al.
Published: (2024)
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
by: Huang, Minbin, et al.
Published: (2024)
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
by: Seo, Minhyuk, et al.
Published: (2024)
OASIS: Online Sample Selection for Continual Visual Instruction Tuning
by: Lee, Minjae, et al.
Published: (2025)
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
by: Cho, Seokju, et al.
Published: (2023)
Spectral-Adaptive Modulation Networks for Visual Perception
by: Yun, Guhnoo, et al.
Published: (2025)
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation
by: Hao, Haihong, et al.
Published: (2025)
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
by: Shin, Heeseong, et al.
Published: (2024)
GOAT: A Training Framework for Goal-Oriented Agent with Tools
by: Min, Hyunji, et al.
Published: (2025)
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
by: Lee, Young-Jun, et al.
Published: (2022)
UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories
by: Mei, Yanghong, et al.
Published: (2025)
Multi-Level Knowledge Distillation and Dynamic Self-Supervised Learning for Continual Learning
by: Kim, Taeheon, et al.
Published: (2025)
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
by: Kim, Chaehyun, et al.
Published: (2025)
Nav-R1: Reasoning and Navigation in Embodied Scenes
by: Liu, Qingxiang, et al.
Published: (2025)
MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning
by: Xu, Tianyu, et al.
Published: (2025)
PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
by: Long, Junlin, et al.
Published: (2026)
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
by: Qiao, Yanyuan, et al.
Published: (2025)
TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions
by: Jeon, Dongjae, et al.
Published: (2025)
GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers
by: Hwang, Hochul, et al.
Published: (2025)
ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness
by: Atha, Deegan, et al.
Published: (2024)
Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation
by: di Melendugno, Guido Maria D'Amely, et al.
Published: (2024)
CoNav: A Benchmark for Human-Centered Collaborative Navigation
by: Li, Changhao, et al.
Published: (2024)
FOM-Nav: Frontier-Object Maps for Object Goal Navigation
by: Chabal, Thomas, et al.
Published: (2025)
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
by: Ahn, Daechul, et al.
Published: (2024)
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
by: Soni, Sagar, et al.
Published: (2024)
OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)
LangNav: Language as a Perceptual Representation for Navigation
by: Pan, Bowen, et al.
Published: (2023)
SR-Nav: Spatial Relationships Matter for Zero-shot Object Goal Navigation
by: Fang, Leyuan, et al.
Published: (2026)
WarNav: An Autonomous Driving Benchmark for Segmentation of Navigable Zones in War Scenes
by: Graviers, Marc-Emmanuel Coupvent des, et al.
Published: (2025)
NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation
by: Xu, Peiran, et al.
Published: (2025)