| Main Author: | Lyu, Shijie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.10016 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)
Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)
THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)
Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)
A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection
by: Le, Van-Truong, et al.
Published: (2026)
Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
by: Berlia, Anushree
Published: (2026)
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)
ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)
Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing
by: Sim, Sangcheol
Published: (2026)
GenMatter: Perceiving Physical Objects with Generative Matter Models
by: Li, Eric, et al.
Published: (2026)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS
by: Guo, Zhihao, et al.
Published: (2025)
From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)
Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)
Beyond Few-shot Object Detection: A Detailed Survey
by: Chudasama, Vishal, et al.
Published: (2024)
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)
IntrinsiX: High-Quality PBR Generation using Image Priors
by: Kocsis, Peter, et al.
Published: (2025)
HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
by: Jin, Haopeng, et al.
Published: (2026)
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)
A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video
by: Lucas, Andrea Filiberto, et al.
Published: (2026)
4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
by: Barhdadi, Mohamed Rayan, et al.
Published: (2026)
StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)
Transfer-learning for video classification: Video Swin Transformer on multiple domains
by: Oliveira, Daniel A. P., et al.
Published: (2022)
Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
by: Gurbindo, Unai, et al.
Published: (2025)
Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors
by: Grönquist, Peter, et al.
Published: (2023)
SelvaBox: A high-resolution dataset for tropical tree crown detection
by: Baudchon, Hugo, et al.
Published: (2025)
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
by: Wang, Zixing, et al.
Published: (2025)
Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)
SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
by: Oh, Sukju, et al.
Published: (2026)
Leveraging Color Channel Independence for Improved Unsupervised Object Detection
by: Jäckl, Bastian, et al.
Published: (2024)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)
IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)