| Main Authors: | Ko, Dohwan; Kim, Sihyeon; Suh, Yumin; G, Vijay Kumar B.; Yoon, Minseo; Chandraker, Manmohan; Kim, Hyunwoo J. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.19355 |
Similar Items
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
by: Aich, Abhishek, et al.
Published: (2024)
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
by: Sharan, S P, et al.
Published: (2023)
Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)
Tuned Contrastive Learning
by: Animesh, Chaitanya, et al.
Published: (2023)
LLaMo: Large Language Model-based Molecular Graph Assistant
by: Park, Jinyoung, et al.
Published: (2024)
Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)
MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2026)
Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
by: Ko, Dohwan, et al.
Published: (2025)
ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting
by: Jeon, Hyotaek, et al.
Published: (2025)
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
by: Khan, Zaid, et al.
Published: (2024)
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
by: Ke, Fucai, et al.
Published: (2025)
DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
by: Park, Dogyun, et al.
Published: (2024)
DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
by: Choi, Joonmyung, et al.
Published: (2026)
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
by: Kalluri, Tarun, et al.
Published: (2024)
What to Test Next: Interpretable Coverage Gap Discovery in Driving VLMs
by: Aich, Abhishek, et al.
Published: (2026)
Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling
by: Kwon, Minseo, et al.
Published: (2025)
UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
by: Kalluri, Tarun, et al.
Published: (2024)
Locally Orderless Images for Optimization in Differentiable Rendering
by: Mehta, Ishit, et al.
Published: (2025)
RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)
Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift
by: Kim, Hyunwoo, et al.
Published: (2026)
Latent Bayesian Optimization via Autoregressive Normalizing Flows
by: Lee, Seunghun, et al.
Published: (2025)
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
by: Lim, Geuntaek, et al.
Published: (2024)
Latent Preference Modeling for Cross-Session Personalized Tool Calling
by: Yoon, Yejin, et al.
Published: (2026)
Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection
by: Kim, Jeehong, et al.
Published: (2025)
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
by: Park, Jinho, et al.
Published: (2026)
STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
by: Nguyen-Nhu, Tinh-Anh, et al.
Published: (2025)
Constant Acceleration Flow
by: Park, Dogyun, et al.
Published: (2024)
iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)
NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code
by: Jain, Seemandhar, et al.
Published: (2026)
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
by: Li, Haodong, et al.
Published: (2026)
PhyCo: Learning Controllable Physical Priors for Generative Motion
by: Narayanan, Sriram, et al.
Published: (2026)
DISPATCH: Distilling Selective Patches for Speech Enhancement
by: Kim, Dohwan, et al.
Published: (2025)
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
by: Chang, Wei-Jer, et al.
Published: (2025)
VideoMamba: Spatio-Temporal Selective State Space Model
by: Park, Jinyoung, et al.
Published: (2024)
SLIP & ETHICS: Graduated Intervention for AI Emotional Companions
by: Kim, Minseo
Published: (2026)
Instantaneous Perception of Moving Objects in 3D
by: Liu, Di, et al.
Published: (2024)
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
by: Kim, Hyunjae, et al.
Published: (2024)
Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)
LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
by: He, Yun, et al.
Published: (2025)