:: Library Catalog

Bibliographic Details
Main Authors:	Ko, Dohwan; Kim, Sihyeon; Suh, Yumin; G, Vijay Kumar B.; Yoon, Minseo; Chandraker, Manmohan; Kim, Hyunwoo J.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.19355

Similar Items

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
by: Aich, Abhishek, et al.
Published: (2024)

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
by: Sharan, S P, et al.
Published: (2023)

Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)

Tuned Contrastive Learning
by: Animesh, Chaitanya, et al.
Published: (2023)

LLaMo: Large Language Model-based Molecular Graph Assistant
by: Park, Jinyoung, et al.
Published: (2024)

Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)

MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2026)

Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)

Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
by: Ko, Dohwan, et al.
Published: (2025)

ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting
by: Jeon, Hyotaek, et al.
Published: (2025)

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
by: Khan, Zaid, et al.
Published: (2024)

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
by: Ke, Fucai, et al.
Published: (2025)

DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
by: Park, Dogyun, et al.
Published: (2024)

DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
by: Choi, Joonmyung, et al.
Published: (2026)

Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
by: Kalluri, Tarun, et al.
Published: (2024)

What to Test Next: Interpretable Coverage Gap Discovery in Driving VLMs
by: Aich, Abhishek, et al.
Published: (2026)

Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling
by: Kwon, Minseo, et al.
Published: (2025)

UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
by: Kalluri, Tarun, et al.
Published: (2024)

Locally Orderless Images for Optimization in Differentiable Rendering
by: Mehta, Ishit, et al.
Published: (2025)

RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)

Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift
by: Kim, Hyunwoo, et al.
Published: (2026)

Latent Bayesian Optimization via Autoregressive Normalizing Flows
by: Lee, Seunghun, et al.
Published: (2025)

Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
by: Lim, Geuntaek, et al.
Published: (2024)

Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)

Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection
by: Kim, Jeehong, et al.
Published: (2025)

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
by: Park, Jinho, et al.
Published: (2026)

STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
by: Nguyen-Nhu, Tinh-Anh, et al.
Published: (2025)

Constant Acceleration Flow
by: Park, Dogyun, et al.
Published: (2024)

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)

NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code
by: Jain, Seemandhar, et al.
Published: (2026)

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
by: Li, Haodong, et al.
Published: (2026)

PhyCo: Learning Controllable Physical Priors for Generative Motion
by: Narayanan, Sriram, et al.
Published: (2026)

DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)

LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
by: Chang, Wei-Jer, et al.
Published: (2025)

VideoMamba: Spatio-Temporal Selective State Space Model
by: Park, Jinyoung, et al.
Published: (2024)

SLIP & ETHICS: Graduated Intervention for AI Emotional Companions
by: Kim, Minseo
Published: (2026)

Instantaneous Perception of Moving Objects in 3D
by: Liu, Di, et al.
Published: (2024)

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
by: Kim, Hyunjae, et al.
Published: (2024)

Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
by: He, Yun, et al.
Published: (2025)