:: Library Catalog

Bibliographic Details
Main Author:	Lyu, Shijie
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.4.8; I.2.10
Online Access:	https://arxiv.org/abs/2505.10016

Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework
by: Jung, Seoik, et al.
Published: (2025)

Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)

THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)

Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection
by: Le, Van-Truong, et al.
Published: (2026)

Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
by: Berlia, Anushree
Published: (2026)

RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)

ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)

Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing
by: Sim, Sangcheol
Published: (2026)

GenMatter: Perceiving Physical Objects with Generative Matter Models
by: Li, Eric, et al.
Published: (2026)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS
by: Guo, Zhihao, et al.
Published: (2025)

From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)

Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)

LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)

Beyond Few-shot Object Detection: A Detailed Survey
by: Chudasama, Vishal, et al.
Published: (2024)

Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)

IntrinsiX: High-Quality PBR Generation using Image Priors
by: Kocsis, Peter, et al.
Published: (2025)

HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
by: Jin, Haopeng, et al.
Published: (2026)

Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)

A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video
by: Lucas, Andrea Filiberto, et al.
Published: (2026)

4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
by: Barhdadi, Mohamed Rayan, et al.
Published: (2026)

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)

Transfer-learning for video classification: Video Swin Transformer on multiple domains
by: Oliveira, Daniel A. P., et al.
Published: (2022)

Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix
by: Gurbindo, Unai, et al.
Published: (2025)

Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors
by: Grönquist, Peter, et al.
Published: (2023)

SelvaBox: A high-resolution dataset for tropical tree crown detection
by: Baudchon, Hugo, et al.
Published: (2025)

FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
by: Wang, Zixing, et al.
Published: (2025)

Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)

SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
by: Oh, Sukju, et al.
Published: (2026)

Leveraging Color Channel Independence for Improved Unsupervised Object Detection
by: Jäckl, Bastian, et al.
Published: (2024)

From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)

IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)

NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)