| Main Authors: | Hu, Yangliu, Song, Zikai, Feng, Na, Luo, Yawei, Yu, Junqing, Chen, Yi-Ping Phoebe, Yang, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.07745 |
Similar Items
Smelly, dense, and spreaded: The Object Detection for Olfactory References (ODOR) dataset
by: Zinnen, Mathias, et al.
Published: (2025)
Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
by: Zhang, Junbin, et al.
Published: (2022)
VDPP: Video Depth Post-Processing for Speed and Scalability
by: Yoon, Daewon, et al.
Published: (2026)
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
by: Jin, Xiaofeng, et al.
Published: (2025)
VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding
by: Yang, Baoyao, et al.
Published: (2025)
A Multi-Camera Vision-Based Approach for Fine-Grained Assembly Quality Control
by: Nazeri, Ali, et al.
Published: (2025)
Unlocking UML Class Diagram Understanding in Vision Language Models
by: Naboichenko, Artem, et al.
Published: (2026)
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
by: Ko, Hyun-kyu, et al.
Published: (2026)
Rethinking Visual Intelligence: Insights from Video Pretraining
by: Acuaviva, Pablo, et al.
Published: (2025)
Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models
by: Gautam, Sushant, et al.
Published: (2025)
AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models
by: Zarei, Mohammad, et al.
Published: (2025)
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)
RailSafeNet: Visual Scene Understanding for Tram Safety
by: Valach, Ondřej, et al.
Published: (2025)
VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)
Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
by: Komurcu, Kursat, et al.
Published: (2026)
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
by: Tencent HY Team
Published: (2026)
Learning through Creation: A Hash-Free Framework for On-the-Fly Category Discovery
by: Zhang, Bohan, et al.
Published: (2026)
Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)
From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)
Event-ECC: Asynchronous Tracking of Events with Continuous Optimization
by: Zafeiri, Maria, et al.
Published: (2024)
Decoder Generates Manufacturable Structures: A Framework for 3D-Printable Object Synthesis
by: Kumar, Abhishek
Published: (2026)
DSER: Spectral Epipolar Representation for Efficient Light Field Depth Estimation
by: Mohammad, Noor Islam S., et al.
Published: (2025)
Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)
METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark
by: Yang, Xu, et al.
Published: (2025)
TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
by: Siddiqui, Yousuf Ahmed, et al.
Published: (2025)
ROI-GS: Interest-based Local Quality 3D Gaussian Splatting
by: Bui, Quoc-Anh, et al.
Published: (2025)
ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition
by: Bui, Quoc-Anh, et al.
Published: (2025)
VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations
by: Gautam, Sushant, et al.
Published: (2026)
Image-based Facial Rig Inversion
by: Yang, Tianxiang, et al.
Published: (2025)
The Impact of Image Resolution on Face Detection: A Comparative Analysis of MTCNN, YOLOv XI and YOLOv XII models
by: Ömercikoğlu, Ahmet Can, et al.
Published: (2025)
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
by: Yanambakkam, Hemanth Teja, et al.
Published: (2025)
Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)
VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors
by: Lyu, Wenbo, et al.
Published: (2025)
HEDGE: Hallucination Estimation via Dense Geometric Entropy for VQA with Vision-Language Models
by: Gautam, Sushant, et al.
Published: (2025)
LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs
by: Lu, Hongyu, et al.
Published: (2026)
Yolo-Key-6D: Single Stage Monocular 6D Pose Estimation with Keypoint Enhancements
by: Çetiner, Kemal Alperen, et al.
Published: (2026)
Detecting 3D Line Segments for 6DoF Pose Estimation with Limited Data
by: Mok, Matej, et al.
Published: (2026)
GazeD: Context-Aware Diffusion for Accurate 3D Gaze Estimation
by: Catalini, Riccardo, et al.
Published: (2026)
BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training
by: Fang, Qihang, et al.
Published: (2024)
Diffusion Features for Zero-Shot 6DoF Object Pose Estimation
by: Von Gimborn, Bernd, et al.
Published: (2024)