:: Library Catalog

Bibliographic Details
Main Author:	Sim, Sangcheol
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.4.8; I.2.10
Online Access:	https://arxiv.org/abs/2604.03301

Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)

LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)

Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection
by: Le, Van-Truong, et al.
Published: (2026)

Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
by: Berlia, Anushree
Published: (2026)

ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)

GenMatter: Perceiving Physical Objects with Generative Matter Models
by: Li, Eric, et al.
Published: (2026)

Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)

Application of YOLOv8 in monocular downward multiple Car Target detection
by: Lyu, Shijie
Published: (2025)

THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)

DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)

From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)

Beyond Few-shot Object Detection: A Detailed Survey
by: Chudasama, Vishal, et al.
Published: (2024)

HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
by: Jin, Haopeng, et al.
Published: (2026)

Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)

IntrinsiX: High-Quality PBR Generation using Image Priors
by: Kocsis, Peter, et al.
Published: (2025)

A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video
by: Lucas, Andrea Filiberto, et al.
Published: (2026)

4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
by: Barhdadi, Mohamed Rayan, et al.
Published: (2026)

Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval
by: Quy, Nguyen Lam Phu, et al.
Published: (2025)

Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors
by: Grönquist, Peter, et al.
Published: (2023)

Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)

FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
by: Wang, Zixing, et al.
Published: (2025)

UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS
by: Guo, Zhihao, et al.
Published: (2025)

SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
by: Oh, Sukju, et al.
Published: (2026)

Leveraging Color Channel Independence for Improved Unsupervised Object Detection
by: Jäckl, Bastian, et al.
Published: (2024)

From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)

Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)

PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)

Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)

A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)

SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)

Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)

Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)

Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)