:: Library Catalog

Bibliographic Details
Main Authors:	Li, Kaixin; Meng, Ziyang; Lin, Hongzhan; Luo, Ziyang; Tian, Yuchen; Ma, Jing; Huang, Zhiyong; Chua, Tat-Seng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Human-Computer Interaction; Multimedia; 68-11; 68-04; I.2.7; I.2.10
Online Access:	https://arxiv.org/abs/2504.07981

Similar Items

VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
by: Meng, Ziyang, et al.
Published: (2024)

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)

Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
by: Zhang, Junbin, et al.
Published: (2022)

Does CLIP perceive art the same way we do?
by: Asperti, Andrea, et al.
Published: (2025)

Hateful Meme Detection through Context-Sensitive Prompting and Fine-Grained Labeling
by: Ouyang, Rongxin, et al.
Published: (2024)

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)

Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis
by: Korolkov, Vasilii
Published: (2025)

Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution
by: Cornebise, Julien, et al.
Published: (2022)

PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)

Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors
by: Zhang, Junbin, et al.
Published: (2026)

Meaning over Motion: A Semantic-First Approach to 360° Viewport Prediction
by: Khah, Arman Nik, et al.
Published: (2026)

Efficient and Privacy-Protecting Background Removal for 2D Video Streaming using iPhone 15 Pro Max LiDAR
by: Kinnevan, Jessica, et al.
Published: (2025)

ForensicFormer: Hierarchical Multi-Scale Reasoning for Cross-Domain Image Forgery Detection
by: Samson, Hema Hariharan
Published: (2026)

ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents
by: Huo, Dongjie, et al.
Published: (2026)

Unpacking Hateful Memes: Presupposed Context and False Claims
by: Cai, Weibin, et al.
Published: (2025)

CulinaryCut-VLAP: A Vision-Language-Action-Physics Framework for Food Cutting via a Force-Aware Material Point Method
by: Koh, Hyunseo, et al.
Published: (2026)

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation
by: Qi, Dekang, et al.
Published: (2026)

Context-Dependent Affordance Computation in Vision-Language Models
by: Farzulla, Murad
Published: (2026)

Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
by: Korolkov, Vasilii, et al.
Published: (2025)

Sequence Matters: Harnessing Video Models in 3D Super-Resolution
by: Ko, Hyun-kyu, et al.
Published: (2024)

Hierarchical Spatial Algorithms for High-Resolution Image Quantization and Feature Extraction
by: Mohammad, Noor Islam S.
Published: (2025)

MORQA: Benchmarking Evaluation Metrics for Medical Open-Ended Question Answering
by: Yim, Wen-wai, et al.
Published: (2025)

Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair
by: Rajput, Vishal
Published: (2026)

Motion Attribution for Video Generation
by: Wu, Xindi, et al.
Published: (2026)

Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
by: Han, Zhiyuan, et al.
Published: (2025)

A Survey on Vision-Language-Action Models for Embodied AI
by: Ma, Yueen, et al.
Published: (2024)

Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible
by: Zhao, Lepeng, et al.
Published: (2026)

Cost-Effective Attention Mechanisms for Low Resource Settings: Necessity & Sufficiency of Linear Transformations
by: Hosseini, Peyman, et al.
Published: (2024)

Progressive Cross Attention Network for Flood Segmentation using Multispectral Satellite Imagery
by: Feliren, Vicky, et al.
Published: (2025)

Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
by: Menon, Anjali R., et al.
Published: (2025)

Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
by: Yanambakkam, Hemanth Teja, et al.
Published: (2025)

DeepFusionNet: Autoencoder-Based Low-Light Image Enhancement and Super-Resolution
by: Çalışkan, Halil Hüseyin, et al.
Published: (2025)

Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning
by: Rovai, Fabio
Published: (2026)

A Landmark-Aware Visual Navigation Dataset
by: Johnson, Faith, et al.
Published: (2024)

Banana Ripeness Level Classification using a Simple CNN Model Trained with Real and Synthetic Datasets
by: Chuquimarca, Luis, et al.
Published: (2025)

Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)

Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
by: Sharma, Aditya, et al.
Published: (2025)

ROI-GS: Interest-based Local Quality 3D Gaussian Splatting
by: Bui, Quoc-Anh, et al.
Published: (2025)

ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition
by: Bui, Quoc-Anh, et al.
Published: (2025)

PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation
by: Alanazi, Ahmed, et al.
Published: (2025)