:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Baoyao; Chen, Junxiang; Li, Wanyun; Yao, Wenbin; Zhou, Yang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; H.3.3; I.2.10; I.2.7; H.5.1
Online Access:	https://arxiv.org/abs/2502.02885

Similar Items

VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding
by: Yang, Baoyao, et al.
Published: (2025)

Leveraging OpenFlamingo for Multimodal Embedding Analysis of C2C Car Parts Data
by: Rashid, Maisha Binte, et al.
Published: (2025)

Evaluating Perspectival Biases in Cross-Modal Retrieval
by: Saengsukhiran, Teerapol, et al.
Published: (2025)

A Grounded Memory System For Smart Personal Assistants
by: Ocker, Felix, et al.
Published: (2025)

TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation
by: Zeng, Yangchen, et al.
Published: (2026)

Large Language Model for Qualitative Research -- A Systematic Mapping Study
by: Barros, Cauã Ferreira, et al.
Published: (2024)

Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring "Tortured Phrases" in Scientific Literature
by: Maiti, Agniva, et al.
Published: (2025)

DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning
by: Guo, Jiaxin, et al.
Published: (2025)

An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems
by: Ghaffari, Shervin, et al.
Published: (2025)

ARTAI: An Evaluation Platform to Assess Societal Risk of Recommender Algorithms
by: Ruan, Qin, et al.
Published: (2024)

AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety
by: Levi, Adi, et al.
Published: (2025)

Tensor Manifold-Based Graph-Vector Fusion for AI-Native Academic Literature Retrieval
by: Wei, Xing, et al.
Published: (2026)

Visualizing the Evolution of Twitter (X.com) Conversations: A Comprehensive Methodology Applied to AI Training Discussions on ChatGPT
by: Jess, Nicole, et al.
Published: (2024)

Exploring Diagnostic Prompting Approach for Multimodal LLM-based Visual Complexity Assessment: A Case Study of Amazon Search Result Pages
by: Murtadak, Divendar, et al.
Published: (2025)

QoSGMAA: A Robust Multi-Order Graph Attention and Adversarial Framework for Sparse QoS Prediction
by: Du, Guanchen, et al.
Published: (2025)

Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation
by: Georgiou, Athos
Published: (2025)

Predicting When to Trust Vision-Language Models for Spatial Reasoning
by: Imran, Muhammad, et al.
Published: (2026)

ReCoVR: Closing the Loop in Interactive Composed Video Retrieval
by: Zhang, Bingqing, et al.
Published: (2026)

FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models
by: Yuan, Bithiah
Published: (2025)

MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
by: Xu, Tianyu, et al.
Published: (2026)

LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation
by: Zhang, Haichao, et al.
Published: (2025)

AVATAAR: Agentic Video Answering via Temporal Adaptive Alignment and Reasoning
by: Patel, Urjitkumar, et al.
Published: (2025)

From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms
by: Kai, Zhang, et al.
Published: (2026)

Dense Video Understanding with Gated Residual Tokenization
by: Zhang, Haichao, et al.
Published: (2025)

DALL-M: Context-Aware Clinical Data Augmentation with LLMs
by: Hsieh, Chihcheng, et al.
Published: (2024)

Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations
by: Rivas, Pablo, et al.
Published: (2024)

Higher education assessment practice in the era of generative AI tools
by: Ogunleye, Bayode, et al.
Published: (2024)

Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering
by: Xu, Tianyu, et al.
Published: (2025)

VOGUE: A Multimodal Dataset for Conversational Recommendation in Fashion
by: Guo, David, et al.
Published: (2025)

Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs
by: Asanuma, Haruka, et al.
Published: (2025)

Experimentation Accelerator: Interpretable Insights and Creative Recommendations for A/B Testing with Content-Aware ranking
by: Hu, Zhengmian, et al.
Published: (2026)

CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA
by: Kovalev, Vsevolod, et al.
Published: (2025)

Real-World En Call Center Transcripts Dataset with PII Redaction
by: Dao, Ha, et al.
Published: (2025)

ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation
by: Livieris, Ioannis E., et al.
Published: (2026)

HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis
by: Godinez, Alejandro
Published: (2025)

Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use
by: Ho, Justin, et al.
Published: (2025)

To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation
by: Dhole, Kaustubh D.
Published: (2025)

LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models
by: Cai, Peng, et al.
Published: (2025)

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
by: Zhang, Bingqing, et al.
Published: (2025)

LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
by: Wei, Hualiang, et al.
Published: (2026)