Bibliographic Details
Main Authors:	Long, Rujiao, Xing, Hangdi, Yang, Zhibo, Zheng, Qi, Yu, Zhi, Yao, Cong, Huang, Fei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.01522

Similar Items

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
by: Long, Rujiao, et al.
Published: (2024)

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
by: Shao, Zirui, et al.
Published: (2024)

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
by: Wan, Jianqiang, et al.
Published: (2024)

A Simple yet Effective Layout Token in Large Language Models for Document Understanding
by: Zhu, Zhaoqing, et al.
Published: (2025)

Robust Fine-tuning for Pre-trained 3D Point Cloud Models
by: Zhang, Zhibo, et al.
Published: (2024)

LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
by: Khodabandeh, Borna, et al.
Published: (2025)

Revisiting Continual Semantic Segmentation with Pre-trained Vision Models
by: Zhang, Duzhen, et al.
Published: (2025)

Platypus: A Generalized Specialist Model for Reading Text in Various Forms
by: Wang, Peng, et al.
Published: (2024)

SepFormer: Coarse-to-fine Separator Regression Network for Table Structure Recognition
by: Nguyen, Nam Quan, et al.
Published: (2025)

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
by: Zhang, Yuyi, et al.
Published: (2024)

LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing
by: Ouyang, Liangyang, et al.
Published: (2025)

Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation
by: Tang, Jiaqi, et al.
Published: (2026)

Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)

Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model
by: Kim, Jihun, et al.
Published: (2024)

Micro-Expression Recognition by Motion Feature Extraction based on Pre-training
by: Li, Ruolin, et al.
Published: (2024)

MAA: Meticulous Adversarial Attack against Vision-Language Pre-trained Models
by: Zhang, Peng-Fei, et al.
Published: (2025)

Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps
by: Wei, Zihang, et al.
Published: (2023)

Self-Supervised Pre-Training for Table Structure Recognition Transformer
by: Peng, ShengYun, et al.
Published: (2024)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

Visual Text Generation in the Wild
by: Zhu, Yuanzhi, et al.
Published: (2024)

CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
by: Wang, Yeyuan, et al.
Published: (2024)

Universal Adversarial Perturbations for Vision-Language Pre-trained Models
by: Zhang, Peng-Fei, et al.
Published: (2024)

Stylized Structural Patterns for Improved Neural Network Pre-training
by: Salehi, Farnood, et al.
Published: (2025)

Pre-training for Action Recognition with Automatically Generated Fractal Datasets
by: Svyezhentsev, Davyd, et al.
Published: (2024)

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
by: Yang, Xiaomeng, et al.
Published: (2023)

A Foundation Model for DAS Signal Recognition and Visual Prompt Tuning of the Pre-trained Model for Downstream Tasks
by: Gui, Kun, et al.
Published: (2025)

Multi-modal Multi-task Pre-training for Improved Point Cloud Understanding
by: Liu, Liwen, et al.
Published: (2025)

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
by: Shen, Shufan, et al.
Published: (2025)

Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
by: Huang, He, et al.
Published: (2025)

ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
by: Wang, Yeyuan, et al.
Published: (2025)

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
by: Huang, Ziyuan, et al.
Published: (2024)

Logos as a Well-Tempered Pre-train for Sign Language Recognition
by: Ovodov, Ilya, et al.
Published: (2025)

AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
by: Yi, Chenlang, et al.
Published: (2025)

GLID: Pre-training a Generalist Encoder-Decoder Vision Model
by: Liu, Jihao, et al.
Published: (2024)

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
by: Zheng, Weijie, et al.
Published: (2024)

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
by: Shen, Yufan, et al.
Published: (2024)

SR-Stereo & DAPE: Stepwise Regression and Pre-trained Edges for Practical Stereo Matching
by: Xiao, Weiqing, et al.
Published: (2024)

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
by: Lu, Feng, et al.
Published: (2024)

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
by: Gao, Zuan, et al.
Published: (2024)

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
by: Haliassos, Alexandros, et al.
Published: (2024)