:: Library Catalog

Bibliographic Details
Main Author:	Nguyen, Vinh
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.16824

Similar Items

Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
by: Mahjourian, Nazanin, et al.
Published: (2025)

Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization
by: Nguyen, Quang Vinh, et al.
Published: (2024)

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts
by: Mahjourian, Nazanin, et al.
Published: (2024)

Abductive Ego-View Accident Video Understanding for Safe Driving Perception
by: Fang, Jianwu, et al.
Published: (2024)

Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs
by: Lee, Insu, et al.
Published: (2025)

View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis
by: Varghese, Subin, et al.
Published: (2024)

SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
by: Nguyen, Hung, et al.
Published: (2024)

Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
by: Luu, Vinh Quoc, et al.
Published: (2024)

City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning
by: Sun, Penglei, et al.
Published: (2025)

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting
by: Nguyen, Binh Long, et al.
Published: (2026)

RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation
by: Jo, Hae-Won, et al.
Published: (2025)

Understanding Multi-View Transformers
by: Stary, Michal, et al.
Published: (2025)

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
by: Park, Sung-Yeon, et al.
Published: (2025)

MPerS: Dynamic MLLM MixExperts Perception-Guided Remote Sensing Scene Segmentation
by: Wang, Ziyi, et al.
Published: (2026)

MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
by: Fang, Shaoheng, et al.
Published: (2025)

nuCarla: A nuScenes-Style Bird's-Eye View Perception Dataset for CARLA Simulation
by: Qiao, Zhijie, et al.
Published: (2025)

IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
by: Liang, Yujia, et al.
Published: (2025)

Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
by: Li, Nanxi, et al.
Published: (2026)

Wired Perspectives: Multi-View Wire Art Embraces Generative AI
by: Qu, Zhiyu, et al.
Published: (2023)

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
by: Zou, Yude, et al.
Published: (2026)

MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes
by: Tang, Xiaqiang, et al.
Published: (2024)

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
by: Lin, Baijiong, et al.
Published: (2024)

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
by: Lee, Dong In, et al.
Published: (2024)

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
by: Wang, Xingrui, et al.
Published: (2024)

SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)

Towards Holistic Surgical Scene Understanding
by: Valderrama, Natalia, et al.
Published: (2022)

MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
by: Deng, Xiwei, et al.
Published: (2024)

Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
by: Wang, Song, et al.
Published: (2026)

MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
by: Lin, Baijiong, et al.
Published: (2024)

ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
by: Khalil, Ahmad, et al.
Published: (2025)

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
by: Fu, Yuqian, et al.
Published: (2024)

MVTN: Learning Multi-View Transformations for 3D Understanding
by: Hamdi, Abdullah, et al.
Published: (2022)

LeafNet: A Large-Scale Dataset and Comprehensive Benchmark for Foundational Vision-Language Understanding of Plant Diseases
by: Quoc, Khang Nguyen, et al.
Published: (2026)

Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification
by: Huang, Haojian, et al.
Published: (2024)

PEAfowl: Perception-Enhanced Multi-View Vision-Language-Action for Bimanual Manipulation
by: Fan, Qingyu, et al.
Published: (2026)

Few-Shot VLM-Based G-Code and HMI Verification in CNC Machining
by: Pour, Yasaman Hashem, et al.
Published: (2025)

Planning with the Views via Scene Self-Exploration
by: Wang, Kangrui, et al.
Published: (2026)

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation
by: Wang, Jinghao, et al.
Published: (2023)

Perception-based Image Denoising via Generative Compression
by: Nguyen, Nam, et al.
Published: (2026)

KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection
by: La, Tuan-Vinh, et al.
Published: (2025)