Library Catalog

Bibliographic Details
Main Authors:	Huang, Yuning, Hassan, Mohamed Abul, He, Jiangpeng, Higgins, Janine, McCrory, Megan, Eicher-Miller, Heather, Thomas, Graham, Sazonov, Edward O, Zhu, Fengqing Maggie
Format:	Preprint
Published:	2024
Subjects:	Multimedia; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.07827

Similar Items

Food Portion Estimation via 3D Object Scaling
by: Vinod, Gautham, et al.
Published: (2024)

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
by: Vinod, Gautham, et al.
Published: (2026)

Long-Tailed Continual Learning For Visual Food Recognition
by: He, Jiangpeng, et al.
Published: (2023)

Food Portion Estimation: From Pixels to Calories
by: Vinod, Gautham, et al.
Published: (2026)

A Survey on Multimodal Wearable Sensor-based Human Action Recognition
by: Ni, Jianyuan, et al.
Published: (2024)

Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition
by: Ni, Jianyuan, et al.
Published: (2023)

Leveraging Automatic Personalised Nutrition: Food Image Recognition Benchmark and Dataset based on Nutrition Taxonomy
by: Romero-Tapiador, Sergio, et al.
Published: (2022)

AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
by: Gao, Jun, et al.
Published: (2024)

Using a beginning history teacher’s consideration of students’ prior knowledge in a single lesson case study to reframe discussion of historical knowledge
by: McCrory, Catherine
Published: (2017)

BAROC: Concealing Packet Losses in LSNs with Bimodal Behavior Awareness for Livecast Ingestion
by: Zhao, Haoyuan, et al.
Published: (2025)

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition
by: Wang, Xinyu, et al.
Published: (2026)

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results
by: Smirnov, Maksim, et al.
Published: (2024)

AIM 2024 Challenge on Video Super-Resolution Quality Assessment: Methods and Results
by: Molodetskikh, Ivan, et al.
Published: (2024)

Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding
by: Pan, Zhaoyan, et al.
Published: (2026)

Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation
by: Phan, Nghia, et al.
Published: (2026)

AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content
by: Conde, Marcos V, et al.
Published: (2024)

Automatically Generating High-Precision Simulated Road Networking in Traffic Scenario
by: Xie, Liang, et al.
Published: (2025)

An Automatic Deep Learning Approach for Trailer Generation through Large Language Models
by: Balestri, Roberto, et al.
Published: (2026)

Fully Automatic Content-Aware Tiling Pipeline for Pathology Whole Slide Images
by: Jabar, Falah, et al.
Published: (2024)

Efficient Prompt Tuning for Hierarchical Ingredient Recognition
by: Gui, Yinxuan, et al.
Published: (2025)

Multimodal Emotion Recognition with Large Language Models
by: Zhang, Hongrui, et al.
Published: (2026)

Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition
by: Zhuang, Yan, et al.
Published: (2026)

DietDelta: A Vision-Language Approach for Dietary Assessment via Before-and-After Images
by: Vinod, Gautham, et al.
Published: (2026)

Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation
by: Che, Xinyi, et al.
Published: (2025)

Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2026)

Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation
by: Che, Xinyi, et al.
Published: (2025)

SpikEmo: Enhancing Emotion Recognition With Spiking Temporal Dynamics in Conversations
by: Yu, Xiaomin, et al.
Published: (2024)

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results
by: Moskalenko, Andrey, et al.
Published: (2024)

Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition
by: Zhang, Rui, et al.
Published: (2024)

Emotional Cues Extraction and Fusion for Multi-modal Emotion Prediction and Recognition in Conversation
by: Shi, Haoxiang, et al.
Published: (2024)

Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation
by: Yi, Zijian, et al.
Published: (2024)

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition
by: Pan, Zhaoyan, et al.
Published: (2026)

Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition
by: Wang, Yifan, et al.
Published: (2026)

Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition
by: Nguyen, Cam-Van Thi, et al.
Published: (2024)

Subjective Quality Assessment of Dynamic 3D Meshes in Virtual Reality Environment
by: Nguyen, Duc V., et al.
Published: (2026)

HADUA: Hierarchical Attention and Dynamic Uniform Alignment for Robust Cross-Subject Emotion Recognition
by: Tang, Jiahao, et al.
Published: (2026)

An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)

MAR3: Multi-Agent Recognition, Reasoning, and Reflection for Reference Audio-Visual Segmentation
by: Zhao, Yuan, et al.
Published: (2026)

Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
by: Yousefi, Niloofar, et al.
Published: (2024)

Towards Reproducible Learning-based Compression
by: Pang, Jiahao, et al.
Published: (2024)