:: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Hy; Thudumu, Srikanth; Du, Hung; Vasa, Rajesh; Mouzakis, Kon
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.16753

Similar Items

CSAOT: Cooperative Multi-Agent System for Active Object Tracking
by: Nguyen, Hy, et al.
Published: (2025)

Contextual Knowledge Sharing in Multi-Agent Reinforcement Learning with Decentralized Communication and Coordination
by: Du, Hung, et al.
Published: (2025)

Goal-Oriented Multi-Agent Reinforcement Learning for Decentralized Agent Teams
by: Du, Hung, et al.
Published: (2025)

A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions
by: Du, Hung, et al.
Published: (2024)

Local Control Networks (LCNs): Optimizing Flexibility in Neural Network Data Pattern Capture
by: Nguyen, Hy, et al.
Published: (2025)

Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization
by: Nguyen, Hy, et al.
Published: (2025)

The M-factor: A Novel Metric for Evaluating Neural Architecture Search in Resource-Constrained Environments
by: Thudumu, Srikanth, et al.
Published: (2025)

Pretraining Objective Matters in Extreme Low-Data FGVC: A Backbone-Controlled Study
by: Hackett, Alexander, et al.
Published: (2026)

Playing with Transformer at 30+ FPS via Next-Frame Diffusion
by: Cheng, Xinle, et al.
Published: (2025)

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
by: Ji, Longbin, et al.
Published: (2026)

Explanation-Driven Counterfactual Testing for Faithfulness in Vision-Language Model Explanations
by: Ding, Sihao, et al.
Published: (2025)

XAI-Enhanced Semantic Segmentation Models for Visual Quality Inspection
by: Clement, Tobias, et al.
Published: (2024)

Novel 3D Binary Indexed Tree for Volume Computation of 3D Reconstructed Models from Volumetric Data
by: Nguyen-Le, Quoc-Bao, et al.
Published: (2024)

RotCAtt-TransUNet++: Novel Deep Neural Network for Sophisticated Cardiac Segmentation
by: Nguyen-Le, Quoc-Bao, et al.
Published: (2024)

Fostering Video Reasoning via Next-Event Prediction
by: Wang, Haonan, et al.
Published: (2025)

Overcoming the Curvature Bottleneck in MeanFlow
by: Zhang, Xinxi, et al.
Published: (2025)

Enhancing the Fairness and Performance of Edge Cameras with Explainable AI
by: Nguyen, Truong Thanh Hung, et al.
Published: (2024)

SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
by: Nguyen, Hung, et al.
Published: (2024)

Predicting the Next Action by Modeling the Abstract Goal
by: Roy, Debaditya, et al.
Published: (2022)

GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction
by: Huang, Jia, et al.
Published: (2023)

DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
by: Nguyen, Ngoc-Son, et al.
Published: (2026)

Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting
by: Li, Yiqian, et al.
Published: (2025)

A Novel Framework for Automated Explain Vision Model Using Vision-Language Models
by: Nguyen, Phu-Vinh, et al.
Published: (2025)

Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer
by: Nguyen, Quoc Khanh, et al.
Published: (2024)

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
by: Chen, Dubing, et al.
Published: (2025)

DisBeaNet: A Deep Neural Network to augment Unmanned Surface Vessels for maritime situational awareness
by: Vemula, Srikanth, et al.
Published: (2024)

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
by: Duan, Zaipeng, et al.
Published: (2025)

Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence
by: Nguyen, Hung Huy, et al.
Published: (2025)

Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues
by: Vu, Tuan-Anh, et al.
Published: (2025)

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
by: Ren, Shuhuai, et al.
Published: (2025)

GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video Retrieval
by: Yang, Bowen, et al.
Published: (2025)

Representation Separation for Semantic Segmentation with Vision Transformers
by: Hong, Yuanduo, et al.
Published: (2022)

XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
by: Nguyen, Truong Thanh Hung, et al.
Published: (2024)

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
by: Huang, Yuanhui, et al.
Published: (2024)

FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
by: Ge, Haonan, et al.
Published: (2025)

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)

LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
by: Nguyen, Truong Thanh Hung, et al.
Published: (2024)

M-LLM Based Video Frame Selection for Efficient Video Understanding
by: Hu, Kai, et al.
Published: (2025)

ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network
by: Du, Feixiang, et al.
Published: (2025)