| Main Authors: | Zhao, Kunpeng; Miyazaki, Asahi; Okita, Tsuyoshi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.20739 |
Similar Items
Thoughts on Objectives of Sparse and Hierarchical Masked Image Model
by: Miyazaki, Asahi, et al.
Published: (2025)
Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model
by: Matsuishi, Koki, et al.
Published: (2025)
Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet
by: Hirata, Kodai, et al.
Published: (2025)
Diffusion Model-based Activity Completion for AI Motion Capture from Videos
by: Huayu, Gao, et al.
Published: (2025)
Multimodal Foundation Model for Cross-Modal Retrieval and Activity Recognition Tasks
by: Matsuishi, Koki, et al.
Published: (2025)
Image Classification Using a Diffusion Model as a Pre-Training Model
by: Ukita, Kosuke, et al.
Published: (2025)
Window to Wall Ratio Detection using SegFormer
by: De Simone, Zoe, et al.
Published: (2024)
ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding
by: Setyawan, Novendra, et al.
Published: (2024)
AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
by: Shan, Jiquan, et al.
Published: (2025)
Towards Robust Nonlinear Subspace Clustering: A Kernel Learning Approach
by: Xu, Kunpeng, et al.
Published: (2025)
Uncertainty-Aware Global-View Reconstruction for Multi-View Multi-Label Feature Selection
by: Hao, Pingting, et al.
Published: (2025)
NavFormer: IGRF Forecasting in Moving Coordinate Frames
by: Hwang, Yoontae, et al.
Published: (2026)
Group Relative Augmentation for Data Efficient Action Detection
by: Patel, Deep Anil, et al.
Published: (2025)
StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance
by: Gong, Yunpeng, et al.
Published: (2025)
MatFormer: Nested Transformer for Elastic Inference
by: Devvrit, et al.
Published: (2023)
MetaFormer Baselines for Vision
by: Yu, Weihao, et al.
Published: (2022)
ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification
by: Li, Mingshi, et al.
Published: (2025)
BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration
by: Pertigkiozoglou, Stefanos, et al.
Published: (2024)
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
by: Swetha, Sirnam, et al.
Published: (2024)
PerFormer: A Permutation Based Vision Transformer for Remaining Useful Life Prediction
by: Fan, Zhengyang, et al.
Published: (2025)
FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification
by: Beckmann, Daniel, et al.
Published: (2026)
DA-SegFormer: Damage-Aware Semantic Segmentation for Fine-Grained Disaster Assessment
by: Zhu, Kevin, et al.
Published: (2026)
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
by: Hu, Xixu, et al.
Published: (2024)
GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring
by: Khirwar, Madhav, et al.
Published: (2024)
Soft-TransFormers for Continual Learning
by: Kang, Haeyong, et al.
Published: (2024)
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos
by: Bian, Jiang, et al.
Published: (2022)
SegFormer Fine-Tuning with Dropout: Advancing Hair Artifact Removal in Skin Lesion Analysis
by: Saad, Asif Mohammed, et al.
Published: (2025)
Action-Agnostic Point-Level Supervision for Temporal Action Detection
by: Yoshida, Shuhei M., et al.
Published: (2024)
Future-Proofing Class-Incremental Learning
by: Jodelet, Quentin, et al.
Published: (2024)
A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
by: Cao, Chengtai, et al.
Published: (2022)
Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
by: Wang, Jeffrey, et al.
Published: (2026)
Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information
by: Safaai, Houman, et al.
Published: (2026)
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
by: Burgert, Tom, et al.
Published: (2024)
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
by: Lee, Seok Hwan, et al.
Published: (2024)
Channel-Aware Probing for Multi-Channel Imaging
by: Marikkar, Umar, et al.
Published: (2026)
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
by: Ilic, Filip, et al.
Published: (2024)
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
by: Zeng, Chong, et al.
Published: (2025)
A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing
by: Ando, Yusaku, et al.
Published: (2024)
Action Dubber: Timing Audible Actions via Inflectional Flow
by: Wan, Wenlong, et al.
Published: (2025)
AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image
by: Amirkolaee, Hamed Amini, et al.
Published: (2024)