:: Library Catalog

Bibliographic Details
Main Authors:	Zighem, Mohammed-En-Nadhir; Hadid, Abdenour
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.20188

Similar Items

Recent Advances in Medical Imaging Segmentation: A Survey
by: Bougourzi, Fares, et al.
Published: (2025)

Decoding Matters: Efficient Mamba-Based Decoder with Distribution-Aware Deep Supervision for Medical Image Segmentation
by: Bougourzi, Fares, et al.
Published: (2026)

C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Car Damage Detection
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)

FIDAVL: Fake Image Detection and Attribution using Vision-Language Model
by: Keita, Mamadou, et al.
Published: (2024)

Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)

VLM-PAR: A Vision Language Model for Pedestrian Attribute Recognition
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)

Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)

PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition
by: Saadi, Ibtissam, et al.
Published: (2025)

SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
by: Bekhouche, Salah Eddine, et al.
Published: (2025)

SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning
by: Eutamene, Hessen Bougueffa, et al.
Published: (2026)

Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)

VideoSAVi: Self-Aligned Video Language Models without Human Supervision
by: Kulkarni, Yogesh, et al.
Published: (2024)

Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition
by: Bekhouche, Salah Eddine, et al.
Published: (2026)

DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images
by: Keita, Mamadou, et al.
Published: (2025)

Face to Cartoon Incremental Super-Resolution using Knowledge Distillation
by: Devkatte, Trinetra, et al.
Published: (2024)

RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification
by: Keita, Mamadou, et al.
Published: (2025)

RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation
by: Djouama, Ahmed Marouane, et al.
Published: (2026)

Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation
by: Keita, Mamadou, et al.
Published: (2026)

Semantic-Aware Ship Detection with Vision-Language Integration
by: Li, Jiahao, et al.
Published: (2025)

SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection
by: Xu, Qing, et al.
Published: (2025)

VP-Hype: A Hybrid Mamba-Transformer Framework with Visual-Textual Prompting for Hyperspectral Image Classification
by: Sellam, Abdellah Zakaria, et al.
Published: (2026)

RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet
by: Orfaig, Eliraz, et al.
Published: (2025)

Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
by: Qi, Jianing, et al.
Published: (2025)

FMG-Det: Foundation Model Guided Robust Object Detection
by: Hannan, Darryl, et al.
Published: (2025)

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
by: Huang, Chenxi, et al.
Published: (2024)

BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model
by: Mohammed, Rawa, et al.
Published: (2025)

VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
by: Li, Wuyang, et al.
Published: (2025)

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
by: Seutin, Corentin, et al.
Published: (2026)

CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks
by: Qi, Yu, et al.
Published: (2025)

LMM-Det: Make Large Multimodal Models Excel in Object Detection
by: Li, Jincheng, et al.
Published: (2025)

RemDet: Rethinking Efficient Model Design for UAV Object Detection
by: Li, Chen, et al.
Published: (2024)

DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)

DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection
by: Gare, Gautam Rajendrakumar, et al.
Published: (2026)

UniDet3D: Multi-dataset Indoor 3D Object Detection
by: Kolodiazhnyi, Maksim, et al.
Published: (2024)

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
by: Li, Yuxuan, et al.
Published: (2024)

Detecting Text Manipulation in Images using Vision Language Models
by: Vidit, Vidit, et al.
Published: (2025)

Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
by: Yang, Zhongyu, et al.
Published: (2025)

GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
by: Song, Zixuan, et al.
Published: (2025)

Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation
by: Bai, Weimin, et al.
Published: (2025)

GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection
by: Zhang, Yi, et al.
Published: (2025)