:: Library Catalog

Bibliographic Details
Main Authors:	Yao, Jiawei; Hu, Juhua
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2402.05310

Similar Items

Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
by: Yao, Jiawei, et al.
Published: (2024)

Text-Guided Mixup Towards Long-Tailed Image Categorization
by: Franklin, Richard, et al.
Published: (2024)

Online Zero-Shot Classification with CLIP
by: Qian, Qi, et al.
Published: (2024)

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
by: Qian, Qi, et al.
Published: (2024)

SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
by: Qian, Qi, et al.
Published: (2024)

Dual Cluster Contrastive learning for Object Re-Identification
by: Yao, Hantao, et al.
Published: (2021)

Rethinking Model Efficiency: Multi-Agent Inference with Large Models
by: Dong, Sixun, et al.
Published: (2026)

Retrospective motion correction in MRI using disentangled embeddings
by: Wang, Qi, et al.
Published: (2025)

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting
by: Ye, Maoyuan, et al.
Published: (2023)

Multi-view Deep Subspace Clustering Networks
by: Zhu, Pengfei, et al.
Published: (2019)

MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
by: Dong, Sixun, et al.
Published: (2025)

ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis
by: Zhang, Xike, et al.
Published: (2026)

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement
by: Zhang, Xiaohan, et al.
Published: (2023)

Hierarchy-Aware Fine-Tuning of Vision-Language Models
by: Li, Jiayu, et al.
Published: (2025)

Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models
by: Li, Chao, et al.
Published: (2025)

YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection
by: Hu, Jiawei
Published: (2025)

Semi-disentangled spatiotemporal implicit neural representations of longitudinal neuroimaging data for trajectory classification
by: Aulakh, Agampreet, et al.
Published: (2025)

Attention-disentangled Uniform Orthogonal Feature Space Optimization for Few-shot Object Detection
by: Zhao, Taijin, et al.
Published: (2025)

RFL-CDNet: Towards Accurate Change Detection via Richer Feature Learning
by: Gan, Yuhang, et al.
Published: (2024)

VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
by: He, Haibin, et al.
Published: (2026)

PolarMAE: Efficient Fetal Ultrasound Pre-training via Semantic Screening and Polar-Guided Masking
by: Lv, Meng, et al.
Published: (2026)

Rethink Sparse Signals for Pose-guided Text-to-image Generation
by: Xuan, Wenjie, et al.
Published: (2025)

Deep Multiview Clustering by Contrasting Cluster Assignments
by: Chen, Jie, et al.
Published: (2023)

Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
by: Zhang, Yushuo, et al.
Published: (2026)

Dual-Level Cross-Modal Contrastive Clustering
by: Zhang, Haixin, et al.
Published: (2024)

Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance
by: Jin, Jiaqi, et al.
Published: (2025)

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
by: He, Haibin, et al.
Published: (2024)

Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training
by: Zhong, Qihuang, et al.
Published: (2026)

SFA: Scan, Focus, and Amplify toward Guidance-aware Answering for Video TextVQA
by: He, Haibin, et al.
Published: (2025)

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation
by: Chen, Hang, et al.
Published: (2025)

GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking
by: He, Haibin, et al.
Published: (2025)

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes
by: Liu, Weifeng, et al.
Published: (2024)

Towards Calibrated Deep Clustering Network
by: Jia, Yuheng, et al.
Published: (2024)

A deformation-based morphometry framework for disentangling Alzheimer's disease from normal aging using learned normal aging templates
by: Fu, Jingru, et al.
Published: (2023)

Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN
by: Yao, Jiawei, et al.
Published: (2023)

MonoPartNeRF: Human Reconstruction from Monocular Video via Part-Based Neural Radiance Fields
by: Lu, Yao, et al.
Published: (2025)

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
by: Ye, Maoyuan, et al.
Published: (2025)

When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability
by: Xuan, Wenjie, et al.
Published: (2024)

Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection
by: Gan, Yuhang, et al.
Published: (2024)

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
by: He, Haibin, et al.
Published: (2025)