Library Catalog

Bibliographic Details
Main Authors:	Xu, Lixiang; Cui, Qingzhe; Hong, Richang; Xu, Wei; Chen, Enhong; Yuan, Xin; Li, Chenglong; Tang, Yuanyan
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; 68; I.2.10
Online Access:	https://arxiv.org/abs/2312.16477

Similar Items

Rethinking VLMs for Image Forgery Detection and Localization
by: Guo, Shaofeng, et al.
Published: (2026)

PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation
by: Alanazi, Ahmed, et al.
Published: (2025)

Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
by: Lai, Guannan, et al.
Published: (2025)

MaP-AVR: A Meta-Action Planner for Agents Leveraging Vision Language Models and Retrieval-Augmented Generation
by: Guo, Zhenglong, et al.
Published: (2025)

Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)

Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2026)

VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
by: Meng, Ziyang, et al.
Published: (2024)

Hierarchical Point-Patch Fusion with Adaptive Patch Codebook for 3D Shape Anomaly Detection
by: Kang, Xueyang, et al.
Published: (2026)

Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
by: Moore, Alexander, et al.
Published: (2025)

Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)

Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
by: Zeng, Zhitao, et al.
Published: (2025)

Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction
by: Artru, Noé, et al.
Published: (2026)

RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion
by: Tyagi, Kavyansh, et al.
Published: (2026)

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)

VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)

GraphTEN: Graph Enhanced Texture Encoding Network
by: Peng, Bo, et al.
Published: (2025)

TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting
by: Xu, Zhiyuan, et al.
Published: (2025)

ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents
by: Huo, Dongjie, et al.
Published: (2026)

Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
by: Zhang, Junbin, et al.
Published: (2022)

GIIM: Graph-based Learning of Inter- and Intra-view Dependencies for Multi-view Medical Image Diagnosis
by: Sam, Tran Bao, et al.
Published: (2026)

Beyond RGB: Leveraging Vision Transformers for Thermal Weapon Segmentation
by: Kambhatla, Akhila, et al.
Published: (2025)

Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
by: Sáez, Arnau Igualde, et al.
Published: (2025)

Image Reconstruction as a Tool for Feature Analysis
by: Allakhverdov, Eduard, et al.
Published: (2025)

Multi-Scale Graph Learning for Anti-Sparse Downscaling
by: Fan, Yingda, et al.
Published: (2025)

Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)

Force-Aware 3D Contact Modeling for Stable Grasp Generation
by: Chen, Zhuo, et al.
Published: (2025)

MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
by: Lê, Eric-Tuan, et al.
Published: (2024)

Sequence Matters: Harnessing Video Models in 3D Super-Resolution
by: Ko, Hyun-kyu, et al.
Published: (2024)

Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation
by: Araya-Martinez, Jose Moises, et al.
Published: (2025)

Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
by: Da, Longchao, et al.
Published: (2025)

Fast 3D point clouds retrieval for Large-scale 3D Place Recognition
by: Zede, Chahine-Nicolas, et al.
Published: (2025)

Tricks and Plug-ins for Gradient Boosting in Image Classification
by: Fang, Biyi, et al.
Published: (2025)

Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
by: Wang, Wentao, et al.
Published: (2025)

FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models
by: Chen, Kewei, et al.
Published: (2025)

ATAAT: Adaptive Threat-Aware Adversarial Tuning Framework against Backdoor Attacks on Vision-Language-Action Models
by: Chen, Kewei, et al.
Published: (2026)

VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors
by: Lyu, Wenbo, et al.
Published: (2025)

A Plug-and-Play Temporal Normalization Module for Robust Remote Photoplethysmography
by: Wang, Kegang, et al.
Published: (2024)

HuMoCon: Concept Discovery for Human Motion Understanding
by: Fang, Qihang, et al.
Published: (2025)

IDOL: Instant Photorealistic 3D Human Creation from a Single Image
by: Zhuang, Yiyu, et al.
Published: (2024)

Video-CoE: Reinforcing Video Event Prediction via Chain of Events
by: Su, Qile, et al.
Published: (2026)