| Main Authors: | Zeng, Ling-An, Zheng, Wei-Shi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.09444 |
Similar Items
Automated Defect Detection for Mass-Produced Electronic Components Based on YOLO Object Detection Models
by: Mao, Wei-Lung, et al.
Published: (2025)
Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation
by: Wu, Qingyu, et al.
Published: (2026)
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)
Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation
by: Jin, Jing, et al.
Published: (2025)
CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving
by: Wang, Zhaohui, et al.
Published: (2025)
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)
DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
by: Du, Chenhe, et al.
Published: (2024)
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
by: Su, Qile, et al.
Published: (2025)
Sora as a World Model? A Complete Survey on Text-to-Video Generation
by: Puspitasari, Fachrina Dewi, et al.
Published: (2024)
SAFformer: Improving Spiking Transformer via Active Predictive Filtering
by: Xie, Zequan, et al.
Published: (2026)
A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging
by: Sifnaios, Savvas, et al.
Published: (2024)
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)
SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking
by: Mukhtar, Muhammad Taha, et al.
Published: (2026)
MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance
by: Dong, Yi, et al.
Published: (2025)
ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation
by: Bidulka, Luke, et al.
Published: (2024)
Radon Implicit Field Transform (RIFT): Learning Scenes from Radar Signals
by: Bao, Daqian, et al.
Published: (2024)
CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models
by: Liu, Zhi
Published: (2026)
MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
by: Yang, Shan
Published: (2024)
Unified Auto-Encoding with Masked Diffusion
by: Hansen-Estruch, Philippe, et al.
Published: (2024)
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
by: Sanders, Kate, et al.
Published: (2024)
Demo-Pose: Depth-Monocular Modality Fusion For Object Pose Estimation
by: Agarwal, Rachit, et al.
Published: (2026)
SITUATE -- Synthetic Object Counting Dataset for VLM training
by: Peinl, René, et al.
Published: (2026)
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
by: Chen, Zhangquan, et al.
Published: (2025)
Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation
by: Tripathi, Prakriti
Published: (2025)
ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)
Siamese Networks for Cat Re-Identification: Exploring Neural Models for Cat Instance Recognition
by: Trein, Tobias, et al.
Published: (2025)
Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
by: Li, Vladislav, et al.
Published: (2023)
Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
by: Herashchenko, Dmytro, et al.
Published: (2023)
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
by: Atighehchian, Parmida, et al.
Published: (2026)
Attentive VQ-VAE
by: Hoyos, Angello, et al.
Published: (2023)
TexTailor: Customized Text-aligned Texturing via Effective Resampling
by: Lee, Suin, et al.
Published: (2025)
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
by: Chen, Zhangquan, et al.
Published: (2026)
CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)
CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)
Next-Generation License Plate Detection and Recognition System using YOLOv8
by: Amin, Arslan, et al.
Published: (2025)
3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
by: Chen, Yiping, et al.
Published: (2026)
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)
Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)