| Main Authors: | Xia, Wanke; Peng, Ruoxin; Chu, Haoqi; Zhu, Xinlei; Yang, Zhiyu; Zhao, Yiting; Yang, Lili |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.13764 |
Similar Items
An Improved Pure Fully Connected Neural Network for Rice Grain Classification
by: Xia, Wanke, et al.
Published: (2025)
AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
by: Yang, Jialiang, et al.
Published: (2026)
Robust Lightweight Crack Classification for Real-Time UAV Bridge Inspection
by: Li, Wei, et al.
Published: (2026)
Decoupled Data Augmentation for Improving Image Classification
by: Chen, Ruoxin, et al.
Published: (2024)
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation
by: Li, Likun, et al.
Published: (2024)
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
by: Lin, Shanchuan, et al.
Published: (2025)
Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification
by: Xu, Qin, et al.
Published: (2025)
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
by: Zhang, Ziyi, et al.
Published: (2025)
Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture
by: Zhao, Zhiyuan, et al.
Published: (2025)
Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
by: Yang, Jiarui, et al.
Published: (2024)
An Enhancement of CNN Algorithm for Rice Leaf Disease Image Classification in Mobile Applications
by: Rodrigo, Kayne Uriel K., et al.
Published: (2024)
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
by: Yang, Cheng, et al.
Published: (2025)
Human-AI Collaboration Mechanism Study on AIGC Assisted Image Production for Special Coverage
by: Yang, Yajie, et al.
Published: (2025)
SDM: A Powerful Tool for Evaluating Model Robustness
by: Liu, Xinlei, et al.
Published: (2026)
How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM
by: Zha, Jirong, et al.
Published: (2025)
AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment
by: Chen, Ruoxin, et al.
Published: (2025)
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
by: Jiang, Jianwen, et al.
Published: (2024)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
by: Yang, Yiting, et al.
Published: (2025)
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
by: Dong, Wei, et al.
Published: (2024)
Den-TP: A Density-Balanced Data Curation and Evaluation Framework for Trajectory Prediction
by: Yang, Ruining, et al.
Published: (2024)
Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution
by: Wei, Timothy, et al.
Published: (2024)
Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification
by: Zhu, Armando, et al.
Published: (2024)
Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry
by: Xue, Duoduo, et al.
Published: (2026)
Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
by: Yu, Xinlei, et al.
Published: (2025)
Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment
by: Li, Yuan, et al.
Published: (2025)
A4-Unet: Deformable Multi-Scale Attention Network for Brain Tumor Segmentation
by: Wang, Ruoxin, et al.
Published: (2024)
Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos
by: Cui, Xiangxiang, et al.
Published: (2025)
Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation
by: Li, Frank, et al.
Published: (2025)
Real-Time Posture Monitoring and Risk Assessment for Manual Lifting Tasks Using MediaPipe and LSTM
by: Bagga, Ereena, et al.
Published: (2024)
Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries
by: Pranav, Tushar, et al.
Published: (2025)
Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization
by: Kang, Inha, et al.
Published: (2025)
DragonFruitQualityNet: A Lightweight Convolutional Neural Network for Real-Time Dragon Fruit Quality Inspection on Mobile Devices
by: Haquea, Md Zahurul, et al.
Published: (2025)
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
by: Hui, Mude, et al.
Published: (2024)
Real Time Multi Organ Classification on Computed Tomography Images
by: Yerebakan, Halid Ziya, et al.
Published: (2024)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning
by: Sun, Penglei, et al.
Published: (2025)
MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients
by: Tang, Wen, et al.
Published: (2022)
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
by: Yang, Xiaomeng, et al.
Published: (2025)
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
by: Zhu, Zhiyu, et al.
Published: (2025)
ECRTime: Ensemble Integration of Classification and Retrieval for Time Series Classification
by: Zhao, Fan, et al.
Published: (2024)
Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit
by: Zhao, Yang, et al.
Published: (2025)