:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Jiawei; Zhang, Shunchi; Hu, Kai; Ma, Chixiang; Zhong, Zhuoyao; Sun, Lei; Huo, Qiang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.09232

Similar Items

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
by: Wang, Jiawei, et al.
Published: (2024)

DLAFormer: An End-to-End Transformer For Document Layout Analysis
by: Wang, Jiawei, et al.
Published: (2024)

UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
by: Wang, Jiawei, et al.
Published: (2025)

Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
by: Liu, Yang, et al.
Published: (2025)

Explicit Relational Reasoning Network for Scene Text Detection
by: Su, Yuchen, et al.
Published: (2024)

Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
by: Zhong, Yunshan, et al.
Published: (2024)

Mr. DETR++: Instructive Multi-Route Training for Detection Transformers with Mixture-of-Experts
by: Zhang, Chang-Bin, et al.
Published: (2024)

Text-Guided Mixup Towards Long-Tailed Image Categorization
by: Franklin, Richard, et al.
Published: (2024)

A Dynamic Transformer Network for Vehicle Detection
by: Tian, Chunwei, et al.
Published: (2025)

PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning
by: Liu, Zhuoyao, et al.
Published: (2025)

Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning
by: Zhu, Qiang, et al.
Published: (2025)

YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection
by: Hu, Jiawei
Published: (2025)

Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models
by: Xia, Tao, et al.
Published: (2026)

Dynamic and Compressive Adaptation of Transformers From Images to Videos
by: Zhang, Guozhen, et al.
Published: (2024)

Aggregated Text Transformer for Scene Text Detection
by: Zhou, Zhao, et al.
Published: (2022)

Constrained Dynamic Gaussian Splatting
by: Zheng, Zihan, et al.
Published: (2026)

Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts
by: Sun, Yanguang, et al.
Published: (2025)

OCFER-Net: Recognizing Facial Expression in Online Learning System
by: Huo, Yi, et al.
Published: (2025)

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)

FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection
by: Zhang, Ruiqiang, et al.
Published: (2026)

Lane Departure Accident Prevention in Foggy Conditions: A Prior-Guided Dynamic Feature Fusion Transformer Framework for Real-Time Lane Detection
by: Zhang, Ronghui, et al.
Published: (2025)

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
by: Ma, Lichen, et al.
Published: (2024)

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
by: Lyu, Jiahao, et al.
Published: (2024)

VideoDirector: Precise Video Editing via Text-to-Video Models
by: Wang, Yukun, et al.
Published: (2024)

Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
by: Fan, Lei, et al.
Published: (2024)

SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
by: Cong, Peishan, et al.
Published: (2025)

MultiColor: Image Colorization by Learning from Multiple Color Spaces
by: Du, Xiangcheng, et al.
Published: (2024)

Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
by: Chen, Ruidong, et al.
Published: (2026)

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
by: Deng, Wenxiao, et al.
Published: (2024)

Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2025)

FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
by: Lan, Rui, et al.
Published: (2025)

Length Matters: Length-Aware Transformer for Temporal Sentence Grounding
by: Wang, Yifan, et al.
Published: (2025)

Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
by: Lin, Tiancheng, et al.
Published: (2024)

Block-level Text Spotting with LLMs
by: Bannur, Ganesh, et al.
Published: (2024)

Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution
by: Xu, Tianyi, et al.
Published: (2024)

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)

Dynamic Diffusion Transformer
by: Zhao, Wangbo, et al.
Published: (2024)

Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision
by: Lu, Qiang, et al.
Published: (2025)

PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture
by: Zheng, Qiang, et al.
Published: (2024)