| Main Authors: | He, Yuxin, Li, An, Xue, Cheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.06619 |
Similar Items
CauSight: Learning to Supersense for Visual Causal Discovery
by: Zhang, Yize, et al.
Published: (2025)
Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives
by: Park, Ji-jun, et al.
Published: (2024)
Bridging the Sim2Real Gap: Vision Encoder Pre-Training for Visuomotor Policy Transfer
by: Yardi, Yash, et al.
Published: (2025)
RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning
by: Zuo, Jiacheng, et al.
Published: (2025)
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
by: Bialer, Oded, et al.
Published: (2024)
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
by: Hu, Ming, et al.
Published: (2024)
Sim2Radar: Toward Bridging the Radar Sim-to-Real Gap with VLM-Guided Scene Reconstruction
by: Bejerano, Emily, et al.
Published: (2026)
CauSkelNet: Causal Representation Learning for Human Behaviour Analysis
by: Gu, Xingrui, et al.
Published: (2024)
Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
by: Liu, Kuanrong, et al.
Published: (2025)
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
by: Yi, Chao, et al.
Published: (2024)
PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
by: Li, Yuliang, et al.
Published: (2026)
MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training
by: Zhu, Lei, et al.
Published: (2025)
Driving with DINO: Vision Foundation Features as a Unified Bridge for Sim-to-Real Generation in Autonomous Driving
by: Chen, Xuyang, et al.
Published: (2026)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection
by: Tan, Kaiyuan, et al.
Published: (2025)
An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language Models
by: Maack, Lennart, et al.
Published: (2026)
Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation
by: Wang, Zihan, et al.
Published: (2024)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
by: Wang, Jiapeng, et al.
Published: (2024)
A Causality-Inspired Model for Intima-Media Thickening Assessment in Ultrasound Videos
by: Gao, Shuo, et al.
Published: (2025)
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
by: Jin, Juseong, et al.
Published: (2024)
The Abstraction Gap in Vision-Language Causal Reasoning
by: Hoang, Chinh, et al.
Published: (2026)
High-Fidelity Digital Twins for Bridging the Sim2Real Gap in LiDAR-Based ITS Perception
by: Shahbaz, Muhammad, et al.
Published: (2025)
Embodied Scene Understanding for Vision Language Models via MetaVQA
by: Wang, Weizhen, et al.
Published: (2025)
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
by: Wu, Haitao, et al.
Published: (2025)
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
by: Zhang, Mingkun, et al.
Published: (2024)
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
by: Jiang, Xixi, et al.
Published: (2025)
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
by: Hu, Xiaoxing, et al.
Published: (2025)
VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning
by: Li, Jingyao, et al.
Published: (2024)
Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment
by: Rawal, Parth, et al.
Published: (2023)
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
by: Zhu, Wencheng, et al.
Published: (2025)
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
by: Cheng, Ying, et al.
Published: (2025)
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
by: Xia, Chong, et al.
Published: (2026)
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
by: Du, Yiyang, et al.
Published: (2026)
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
by: Maaz, Muhammad, et al.
Published: (2023)
Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating
by: Cheng, Zhe, et al.
Published: (2026)
Causality Model for Semantic Understanding on Videos
by: Yicong, Li
Published: (2025)
InfoCLIP: Bridging Vision-Language Pretraining and Open-Vocabulary Semantic Segmentation via Information-Theoretic Alignment Transfer
by: Yuan, Muyao, et al.
Published: (2025)
Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion
by: Wang, Xiao, et al.
Published: (2023)
Q-CLIP: Unleashing the Power of Vision-Language Models for Video Quality Assessment through Unified Cross-Modal Adaptation
by: Mi, Yachun, et al.
Published: (2025)