:: Library Catalog

Bibliographic Details
Main Authors:	Ranasinghe, Pasindu; Ranasinghe, Pamudu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.26088

Similar Items

A Deep Learning Approach to Identify Rock Bolts in Complex 3D Point Clouds of Underground Mines Captured Using Mobile Laser Scanners
by: Patra, Dibyayan, et al.
Published: (2025)

LiDAR Point Cloud Colourisation Using Multi-Camera Fusion and Low-Light Image Enhancement
by: Ranasinghe, Pasindu, et al.
Published: (2025)

Automated Discontinuity Set Characterisation in Enclosed Rock Face Point Clouds Using Single-Shot Filtering and Cyclic Orientation Transformation
by: Patra, Dibyayan, et al.
Published: (2026)

Predicting Soccer Penalty Kick Direction Using Human Action Recognition
by: Freire-Obregón, David, et al.
Published: (2025)

Towards Integrated Rock Support Visualisation in 3D Point Cloud of Underground Mines
by: Patra, Dibyayan, et al.
Published: (2026)

MambaKick: Early Penalty Direction Prediction from HAR Embeddings
by: Velesaca, Henry O., et al.
Published: (2026)

Team-Aware Football Player Tracking with SAM: An Appearance-Based Approach to Occlusion Recovery
by: Ranasinghe, Chamath, et al.
Published: (2025)

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
by: Ranasinghe, Yasiru, et al.
Published: (2026)

Language Repository for Long Video Understanding
by: Kahatapitiya, Kumara, et al.
Published: (2024)

Understanding Long Videos with Multimodal Language Models
by: Ranasinghe, Kanchana, et al.
Published: (2024)

CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
by: Mata, Cristina, et al.
Published: (2025)

CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
by: Ranasinghe, Yasiru, et al.
Published: (2023)

Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models
by: Ranasinghe, Yasiru, et al.
Published: (2025)

Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception
by: Thushara, Rusiru, et al.
Published: (2026)

Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection
by: De la Jara, I. M., et al.
Published: (2025)

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
by: Ranasinghe, Kanchana, et al.
Published: (2024)

Multi-scale Attention Guided Pose Transfer
by: Roy, Prasun, et al.
Published: (2022)

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
by: Watawana, Hasindri, et al.
Published: (2024)

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
by: Park, Jongwoo, et al.
Published: (2024)

Pixel Motion Diffusion is What We Need for Robot Control
by: Nguyen, E-Ro, et al.
Published: (2025)

Multi-Modal Monocular Endoscopic Depth and Pose Estimation with Edge-Guided Self-Supervision
by: Ju, Xinwei, et al.
Published: (2026)

MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings
by: Jayakody, Dineth, et al.
Published: (2026)

BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment
by: Hosseinzadeh, Mehdi, et al.
Published: (2024)

Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening
by: Chen, Dong, et al.
Published: (2026)

Future Optical Flow Prediction Improves Robot Control & Video Generation
by: Ranasinghe, Kanchana, et al.
Published: (2026)

ACIT: Attention-Guided Cross-Modal Interaction Transformer for Pedestrian Crossing Intention Prediction
by: Li, Yuanzhe, et al.
Published: (2025)

Pixel Motion as Universal Representation for Robot Control
by: Ranasinghe, Kanchana, et al.
Published: (2025)

SINR: Sparsity Driven Compressed Implicit Neural Representations
by: Jayasundara, Dhananjaya, et al.
Published: (2025)

Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation
by: De Silva, Ulindu, et al.
Published: (2025)

COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
by: Thennakoon, Pandula, et al.
Published: (2025)

MUNIChus: Multilingual News Image Captioning Benchmark
by: Chen, Yuji, et al.
Published: (2026)

MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction
by: Cai, Luhui, et al.
Published: (2024)

Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living
by: Hashemifard, Kooshan, et al.
Published: (2026)

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
by: Huang, Qidong, et al.
Published: (2023)

LatentCRF: Continuous CRF for Efficient Latent Diffusion
by: Ranasinghe, Kanchana, et al.
Published: (2024)

Activating Self-Attention for Multi-Scene Absolute Pose Regression
by: Lee, Miso, et al.
Published: (2024)

Cross-Modal Attention Guided Unlearning in Vision-Language Models
by: Bhaila, Karuna, et al.
Published: (2025)

Prediction of Distant Metastasis in Head and Neck Cancer Patients Using Tumor and Peritumoral Multi-Modal Deep Learning
by: Tong, Nuo, et al.
Published: (2025)

GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions
by: Feng, Liang, et al.
Published: (2024)

Towards Balanced Multi-Modal Learning in 3D Human Pose Estimation
by: Qi, Mengshi, et al.
Published: (2025)