| Main Authors: | Xu, Lixiang, Cui, Qingzhe, Hong, Richang, Xu, Wei, Chen, Enhong, Yuan, Xin, Li, Chenglong, Tang, Yuanyan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.16477 |
Similar Items
Rethinking VLMs for Image Forgery Detection and Localization
by: Guo, Shaofeng, et al.
Published: (2026)
PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation
by: Alanazi, Ahmed, et al.
Published: (2025)
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
by: Lai, Guannan, et al.
Published: (2025)
MaP-AVR: A Meta-Action Planner for Agents Leveraging Vision Language Models and Retrieval-Augmented Generation
by: Guo, Zhenglong, et al.
Published: (2025)
Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)
Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2026)
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
by: Meng, Ziyang, et al.
Published: (2024)
Hierarchical Point-Patch Fusion with Adaptive Patch Codebook for 3D Shape Anomaly Detection
by: Kang, Xueyang, et al.
Published: (2026)
Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
by: Moore, Alexander, et al.
Published: (2025)
Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)
Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
by: Zeng, Zhitao, et al.
Published: (2025)
Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction
by: Artru, Noé, et al.
Published: (2026)
RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion
by: Tyagi, Kavyansh, et al.
Published: (2026)
VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)
VLM-NCD:Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)
GraphTEN: Graph Enhanced Texture Encoding Network
by: Peng, Bo, et al.
Published: (2025)
TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting
by: Xu, Zhiyuan, et al.
Published: (2025)
ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents
by: Huo, Dongjie, et al.
Published: (2026)
Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
by: Zhang, Junbin, et al.
Published: (2022)
GIIM: Graph-based Learning of Inter- and Intra-view Dependencies for Multi-view Medical Image Diagnosis
by: Sam, Tran Bao, et al.
Published: (2026)
Beyond RGB: Leveraging Vision Transformers for Thermal Weapon Segmentation
by: Kambhatla, Akhila, et al.
Published: (2025)
Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
by: Sáez, Arnau Igualde, et al.
Published: (2025)
Image Reconstruction as a Tool for Feature Analysis
by: Allakhverdov, Eduard, et al.
Published: (2025)
Multi-Scale Graph Learning for Anti-Sparse Downscaling
by: Fan, Yingda, et al.
Published: (2025)
Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)
Force-Aware 3D Contact Modeling for Stable Grasp Generation
by: Chen, Zhuo, et al.
Published: (2025)
MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
by: Lê, Eric-Tuan, et al.
Published: (2024)
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
by: Ko, Hyun-kyu, et al.
Published: (2024)
Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation
by: Araya-Martinez, Jose Moises, et al.
Published: (2025)
Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
by: Da, Longchao, et al.
Published: (2025)
Fast 3D point clouds retrieval for Large-scale 3D Place Recognition
by: Zede, Chahine-Nicolas, et al.
Published: (2025)
Tricks and Plug-ins for Gradient Boosting in Image Classification
by: Fang, Biyi, et al.
Published: (2025)
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
by: Wang, Wentao, et al.
Published: (2025)
FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models
by: Chen, Kewei, et al.
Published: (2025)
ATAAT: Adaptive Threat-Aware Adversarial Tuning Framework against Backdoor Attacks on Vision-Language-Action Models
by: Chen, Kewei, et al.
Published: (2026)
VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors
by: Lyu, Wenbo, et al.
Published: (2025)
A Plug-and-Play Temporal Normalization Module for Robust Remote Photoplethysmography
by: Wang, Kegang, et al.
Published: (2024)
HuMoCon: Concept Discovery for Human Motion Understanding
by: Fang, Qihang, et al.
Published: (2025)
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
by: Zhuang, Yiyu, et al.
Published: (2024)
Video-CoE: Reinforcing Video Event Prediction via Chain of Events
by: Su, Qile, et al.
Published: (2026)