:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Kunpeng; Miyazaki, Asahi; Okita, Tsuyoshi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.20739

Similar Items

Thoughts on Objectives of Sparse and Hierarchical Masked Image Model
by: Miyazaki, Asahi, et al.
Published: (2025)

Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model
by: Matsuishi, Koki, et al.
Published: (2025)

Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet
by: Hirata, Kodai, et al.
Published: (2025)

Diffusion Model-based Activity Completion for AI Motion Capture from Videos
by: Huayu, Gao, et al.
Published: (2025)

Multimodal Foundation Model for Cross-Modal Retrieval and Activity Recognition Tasks
by: Matsuishi, Koki, et al.
Published: (2025)

Image Classification Using a Diffusion Model as a Pre-Training Model
by: Ukita, Kosuke, et al.
Published: (2025)

Window to Wall Ratio Detection using SegFormer
by: De Simone, Zoe, et al.
Published: (2024)

ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding
by: Setyawan, Novendra, et al.
Published: (2024)

AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
by: Shan, Jiquan, et al.
Published: (2025)

Towards Robust Nonlinear Subspace Clustering: A Kernel Learning Approach
by: Xu, Kunpeng, et al.
Published: (2025)

Uncertainty-Aware Global-View Reconstruction for Multi-View Multi-Label Feature Selection
by: Hao, Pingting, et al.
Published: (2025)

NavFormer: IGRF Forecasting in Moving Coordinate Frames
by: Hwang, Yoontae, et al.
Published: (2026)

Group Relative Augmentation for Data Efficient Action Detection
by: Patel, Deep Anil, et al.
Published: (2025)

StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance
by: Gong, Yunpeng, et al.
Published: (2025)

MatFormer: Nested Transformer for Elastic Inference
by: Devvrit, et al.
Published: (2023)

MetaFormer Baselines for Vision
by: Yu, Weihao, et al.
Published: (2022)

ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification
by: Li, Mingshi, et al.
Published: (2025)

BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration
by: Pertigkiozoglou, Stefanos, et al.
Published: (2024)

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
by: Swetha, Sirnam, et al.
Published: (2024)

PerFormer: A Permutation Based Vision Transformer for Remaining Useful Life Prediction
by: Fan, Zhengyang, et al.
Published: (2025)

FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification
by: Beckmann, Daniel, et al.
Published: (2026)

DA-SegFormer: Damage-Aware Semantic Segmentation for Fine-Grained Disaster Assessment
by: Zhu, Kevin, et al.
Published: (2026)

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
by: Hu, Xixu, et al.
Published: (2024)

GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring
by: Khirwar, Madhav, et al.
Published: (2024)

Soft-TransFormers for Continual Learning
by: Kang, Haeyong, et al.
Published: (2024)

P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos
by: Bian, Jiang, et al.
Published: (2022)

SegFormer Fine-Tuning with Dropout: Advancing Hair Artifact Removal in Skin Lesion Analysis
by: Saad, Asif Mohammed, et al.
Published: (2025)

Action-Agnostic Point-Level Supervision for Temporal Action Detection
by: Yoshida, Shuhei M., et al.
Published: (2024)

Future-Proofing Class-Incremental Learning
by: Jodelet, Quentin, et al.
Published: (2024)

A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
by: Cao, Chengtai, et al.
Published: (2022)

Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
by: Wang, Jeffrey, et al.
Published: (2026)

Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information
by: Safaai, Houman, et al.
Published: (2026)

Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
by: Burgert, Tom, et al.
Published: (2024)

JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
by: Lee, Seok Hwan, et al.
Published: (2024)

Channel-Aware Probing for Multi-Channel Imaging
by: Marikkar, Umar, et al.
Published: (2026)

Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
by: Ilic, Filip, et al.
Published: (2024)

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
by: Zeng, Chong, et al.
Published: (2025)

A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing
by: Ando, Yusaku, et al.
Published: (2024)

Action Dubber: Timing Audible Actions via Inflectional Flow
by: Wan, Wenlong, et al.
Published: (2025)

AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image
by: Amirkolaee, Hamed Amini, et al.
Published: (2024)