| Main Authors: | Zighem, Mohammed-En-Nadhir, Hadid, Abdenour |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.20188 |
Similar Items
Recent Advances in Medical Imaging Segmentation: A Survey
by: Bougourzi, Fares, et al.
Published: (2025)
Decoding Matters: Efficient Mamba-Based Decoder with Distribution-Aware Deep Supervision for Medical Image Segmentation
by: Bougourzi, Fares, et al.
Published: (2026)
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Car Damage Detection
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)
FIDAVL: Fake Image Detection and Attribution using Vision-Language Model
by: Keita, Mamadou, et al.
Published: (2024)
Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)
VLM-PAR: A Vision Language Model for Pedestrian Attribute Recognition
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)
PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition
by: Saadi, Ibtissam, et al.
Published: (2025)
SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
by: Bekhouche, Salah Eddine, et al.
Published: (2025)
SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning
by: Eutamene, Hessen Bougueffa, et al.
Published: (2026)
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
by: Kulkarni, Yogesh, et al.
Published: (2024)
Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition
by: Bekhouche, Salah Eddine, et al.
Published: (2026)
DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images
by: Keita, Mamadou, et al.
Published: (2025)
Face to Cartoon Incremental Super-Resolution using Knowledge Distillation
by: Devkatte, Trinetra, et al.
Published: (2024)
RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification
by: Keita, Mamadou, et al.
Published: (2025)
RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation
by: Djouama, Ahmed Marouane, et al.
Published: (2026)
Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation
by: Keita, Mamadou, et al.
Published: (2026)
Semantic-Aware Ship Detection with Vision-Language Integration
by: Li, Jiahao, et al.
Published: (2025)
SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection
by: Xu, Qing, et al.
Published: (2025)
VP-Hype: A Hybrid Mamba-Transformer Framework with Visual-Textual Prompting for Hyperspectral Image Classification
by: Sellam, Abdellah Zakaria, et al.
Published: (2026)
RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet
by: Orfaig, Eliraz, et al.
Published: (2025)
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
by: Qi, Jianing, et al.
Published: (2025)
FMG-Det: Foundation Model Guided Robust Object Detection
by: Hannan, Darryl, et al.
Published: (2025)
NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
by: Huang, Chenxi, et al.
Published: (2024)
BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model
by: Mohammed, Rawa, et al.
Published: (2025)
VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
by: Li, Wuyang, et al.
Published: (2025)
Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
by: Seutin, Corentin, et al.
Published: (2026)
CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks
by: Qi, Yu, et al.
Published: (2025)
LMM-Det: Make Large Multimodal Models Excel in Object Detection
by: Li, Jincheng, et al.
Published: (2025)
RemDet: Rethinking Efficient Model Design for UAV Object Detection
by: Li, Chen, et al.
Published: (2024)
DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
by: Okazaki, Soichiro, et al.
Published: (2026)
DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection
by: Gare, Gautam Rajendrakumar, et al.
Published: (2026)
UniDet3D: Multi-dataset Indoor 3D Object Detection
by: Kolodiazhnyi, Maksim, et al.
Published: (2024)
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
by: Li, Yuxuan, et al.
Published: (2024)
Detecting Text Manipulation in Images using Vision Language Models
by: Vidit, Vidit, et al.
Published: (2025)
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
by: Yang, Zhongyu, et al.
Published: (2025)
GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
by: Song, Zixuan, et al.
Published: (2025)
Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation
by: Bai, Weimin, et al.
Published: (2025)
GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection
by: Zhang, Yi, et al.
Published: (2025)