| Main Authors: | Xu, Jinglin, Yin, Sibo, Zhao, Guohao, Wang, Zishuo, Peng, Yuxin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.06887 |
Similar Items
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
by: Xu, Jinglin, et al.
Published: (2024)
Uni-Parser Technical Report
by: Fang, Xi, et al.
Published: (2025)
FineSkiing: A Fine-grained Benchmark for Skiing Action Quality Assessment
by: Zhang, Yongji, et al.
Published: (2025)
UniParser: Multi-Human Parsing with Unified Correlation Representation Learning
by: Chu, Jiaming, et al.
Published: (2023)
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
by: Li, Geng, et al.
Published: (2025)
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection
by: Wang, Zishuo, et al.
Published: (2024)
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
by: Wan, Jianqiang, et al.
Published: (2024)
Intra and Inter Parser-Prompted Transformers for Effective Image Restoration
by: Wang, Cong, et al.
Published: (2025)
SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding
by: Xu, Pengxin, et al.
Published: (2026)
CREPE: Coordinate-Aware End-to-End Document Parser
by: Okamoto, Yamato, et al.
Published: (2024)
CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective
by: Yang, Zhiwen, et al.
Published: (2025)
Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition
by: Sun, Baoli, et al.
Published: (2025)
FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment
by: Han, Ruisheng, et al.
Published: (2025)
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
by: Shao, Dian, et al.
Published: (2025)
MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment
by: Xu, Huangbiao, et al.
Published: (2025)
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
by: Fang, Xi, et al.
Published: (2024)
MA-Bench: Towards Fine-grained Micro-Action Understanding
by: Li, Kun, et al.
Published: (2026)
CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment
by: Zhou, Kanglei, et al.
Published: (2024)
Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
by: Zhao, Zhengqi, et al.
Published: (2024)
PFDM: Parser-Free Virtual Try-on via Diffusion Model
by: Niu, Yunfang, et al.
Published: (2024)
Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
by: Loiseau, Romain, et al.
Published: (2023)
Storyboard guided Alignment for Fine-grained Video Action Recognition
by: Liu, Enqi, et al.
Published: (2024)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
by: Yu, Wenwen, et al.
Published: (2025)
Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing
by: Liu, Fuyuan, et al.
Published: (2026)
Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior Recognition
by: Liu, Cheng, et al.
Published: (2024)
CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
by: Chen, Weidong, et al.
Published: (2026)
Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization
by: Wang, Tianyu, et al.
Published: (2026)
OmniParser for Pure Vision Based GUI Agent
by: Lu, Yadong, et al.
Published: (2024)
Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
by: Horn, Pius, et al.
Published: (2025)
SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing
by: Jing, Hongyi, et al.
Published: (2025)
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
by: He, Hulingxiao, et al.
Published: (2025)
Spatio-temporal Decoupled Knowledge Compensator for Few-Shot Action Recognition
by: Qu, Hongyu, et al.
Published: (2026)
Spatio-temporal Transformers for Action Unit Classification with Event Cameras
by: Cultrera, Luca, et al.
Published: (2024)
TiFRe: Text-guided Video Frame Reduction for Efficient Video Multi-modal Large Language Models
by: Zheng, Xiangtian, et al.
Published: (2026)
HieroAction: Hierarchically Guided VLM for Fine-Grained Action Analysis
by: Wu, Junhao, et al.
Published: (2025)
MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization
by: Chen, Shimin, et al.
Published: (2022)
Interpretable Long-term Action Quality Assessment
by: Dong, Xu, et al.
Published: (2024)
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
by: Tang, Yunlong, et al.
Published: (2025)