| Main Author: | Sim, Sangcheol |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03301 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)
Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
by: Chopra, Agamdeep S., et al.
Published: (2026)
A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection
by: Le, Van-Truong, et al.
Published: (2026)
Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
by: Berlia, Anushree
Published: (2026)
ChartComplete: A Taxonomy-based Inclusive Chart Dataset
by: Mustapha, Ahmad, et al.
Published: (2026)
GenMatter: Perceiving Physical Objects with Generative Matter Models
by: Li, Eric, et al.
Published: (2026)
Automated Plant Disease and Pest Detection System Using Hybrid Lightweight CNN-MobileViT Models for Diagnosis of Indigenous Crops
by: Gebremedhin, Tekleab G., et al.
Published: (2025)
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models
by: Foss, Aaron, et al.
Published: (2025)
Application of YOLOv8 in monocular downward multiple Car Target detection
by: Lyu, Shijie
Published: (2025)
THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)
From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)
Beyond Few-shot Object Detection: A Detailed Survey
by: Chudasama, Vishal, et al.
Published: (2024)
HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
by: Jin, Haopeng, et al.
Published: (2026)
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)
IntrinsiX: High-Quality PBR Generation using Image Priors
by: Kocsis, Peter, et al.
Published: (2025)
A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video
by: Lucas, Andrea Filiberto, et al.
Published: (2026)
4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
by: Barhdadi, Mohamed Rayan, et al.
Published: (2026)
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)
StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval
by: Quy, Nguyen Lam Phu, et al.
Published: (2025)
Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors
by: Grönquist, Peter, et al.
Published: (2023)
Beyond still images: Temporal features and input variance resilience
by: Fadaei, Amir Hosein, et al.
Published: (2023)
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
by: Wang, Zixing, et al.
Published: (2025)
UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS
by: Guo, Zhihao, et al.
Published: (2025)
SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
by: Oh, Sukju, et al.
Published: (2026)
Leveraging Color Channel Independence for Improved Unsupervised Object Detection
by: Jäckl, Bastian, et al.
Published: (2024)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
A Simple Baseline for Streaming Video Understanding
by: Shen, Yujiao, et al.
Published: (2026)
SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)
Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation
by: Zeng, Taowen
Published: (2026)
Pixel-Level Pavement Distress Assessment Using Instance Segmentation
by: Dewick, Logan, et al.
Published: (2026)
Context in object detection: a systematic literature review
by: Jamali, Mahtab, et al.
Published: (2025)