:: Library Catalog

Bibliographic Details
Main Authors:	Zeng, Ling-An; Zheng, Wei-Shi
Format:	Preprint
Published:	2024
Subjects:	Signal Processing; Artificial Intelligence; Computer Vision and Pattern Recognition; I.2.10
Online Access:	https://arxiv.org/abs/2402.09444

Similar Items

Automated Defect Detection for Mass-Produced Electronic Components Based on YOLO Object Detection Models
by: Mao, Wei-Lung, et al.
Published: (2025)

Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation
by: Wu, Qingyu, et al.
Published: (2026)

Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)

Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation
by: Jin, Jing, et al.
Published: (2025)

CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving
by: Wang, Zhaohui, et al.
Published: (2025)

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)

DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
by: Du, Chenhe, et al.
Published: (2024)

EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)

Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)

SAFformer: Improving Spiking Transformer via Active Predictive Filtering
by: Xie, Zequan, et al.
Published: (2026)

A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging
by: Sifnaios, Savvas, et al.
Published: (2024)

MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)

From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)

SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking
by: Mukhtar, Muhammad Taha, et al.
Published: (2026)

MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance
by: Dong, Yi, et al.
Published: (2025)

ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation
by: Bidulka, Luke, et al.
Published: (2024)

Radon Implicit Field Transform (RIFT): Learning Scenes from Radar Signals
by: Bao, Daqian, et al.
Published: (2024)

CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models
by: Liu, Zhi
Published: (2026)

MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
by: Yang, Shan
Published: (2024)

Unified Auto-Encoding with Masked Diffusion
by: Hansen-Estruch, Philippe, et al.
Published: (2024)

TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
by: Sanders, Kate, et al.
Published: (2024)

Demo-Pose: Depth-Monocular Modality Fusion For Object Pose Estimation
by: Agarwal, Rachit, et al.
Published: (2026)

SITUATE -- Synthetic Object Counting Dataset for VLM training
by: Peinl, René, et al.
Published: (2026)

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)

Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation
by: Tripathi, Prakriti
Published: (2025)

ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)

Siamese Networks for Cat Re-Identification: Exploring Neural Models for Cat Instance Recognition
by: Trein, Tobias, et al.
Published: (2025)

Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
by: Li, Vladislav, et al.
Published: (2023)

Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
by: Herashchenko, Dmytro, et al.
Published: (2023)

From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
by: Atighehchian, Parmida, et al.
Published: (2026)

Attentive VQ-VAE
by: Hoyos, Angello, et al.
Published: (2023)

TexTailor: Customized Text-aligned Texturing via Effective Resampling
by: Lee, Suin, et al.
Published: (2025)

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)

CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)

CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)

Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)

Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)