:: Library Catalog

Bibliographic Details
Main Authors:	Ge, Mengying; Tang, Dongkai; Li, Mingyang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.11286

Similar Items

Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
by: Xuan, Shiyu, et al.
Published: (2026)

OpenVIS: Open-vocabulary Video Instance Segmentation
by: Guo, Pinxue, et al.
Published: (2023)

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
by: Wang, Dongkai, et al.
Published: (2024)

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
by: Cheng, Zesen, et al.
Published: (2024)

Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
by: Wu, Daiqing, et al.
Published: (2025)

EmoSign: A Multimodal Dataset for Understanding Emotions in American Sign Language
by: Chua, Phoebe, et al.
Published: (2025)

From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Grounded Open-vocabulary Situation Recognition
by: Cai, Chen, et al.
Published: (2025)

Contextual Emotion Recognition using Large Vision Language Models
by: Etesam, Yasaman, et al.
Published: (2024)

OpenTie: Open-vocabulary Sequential Rebar Tying System
by: Liu, Mingze, et al.
Published: (2025)

EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
by: Xing, Bohao, et al.
Published: (2025)

Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
by: Liu, Yajie, et al.
Published: (2024)

Multimodal Video Emotion Recognition with Reliable Reasoning Priors
by: Wang, Zhepeng, et al.
Published: (2025)

Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying
by: Yin, Hairong, et al.
Published: (2025)

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
by: Wang, Jiankang, et al.
Published: (2025)

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
by: Chen, Boyu, et al.
Published: (2024)

From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
by: Chen, Zuyao, et al.
Published: (2025)

ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data
by: Wu, Xuecheng, et al.
Published: (2022)

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
by: Wang, Yongqi, et al.
Published: (2024)

Decoupled Hierarchical Distillation for Multimodal Emotion Recognition
by: Li, Yong, et al.
Published: (2026)

AlignVTOFF: Texture-Spatial Feature Alignment for High-Fidelity Virtual Try-Off
by: Zhu, Yihan, et al.
Published: (2026)

PosSAM: Panoptic Open-vocabulary Segment Anything
by: VS, Vibashan, et al.
Published: (2024)

Weakly Supervised 3D Open-vocabulary Segmentation
by: Liu, Kunhao, et al.
Published: (2023)

Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)

SEED-Story: Multimodal Long Story Generation with Large Language Model
by: Yang, Shuai, et al.
Published: (2024)

Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition
by: Liu, Ran, et al.
Published: (2025)

FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
by: Hu, Zhuozhao, et al.
Published: (2025)

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
by: Luo, Run, et al.
Published: (2025)

TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition
by: Yin, Wen, et al.
Published: (2025)

EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
by: Yang, Qu, et al.
Published: (2024)

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding
by: Liao, Guibiao, et al.
Published: (2024)

DEEMO: De-identity Multimodal Emotion Recognition and Reasoning
by: Li, Deng, et al.
Published: (2025)

GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
by: Zheng, Yuhang, et al.
Published: (2024)

ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)

Evaluating Multimodal Large Language Models for Heterogeneous Face Recognition
by: Shahreza, Hatef Otroshi, et al.
Published: (2026)

IGLOSS: Image Generation for Lidar Open-vocabulary Semantic Segmentation
by: Samet, Nermin, et al.
Published: (2026)

Compositional Caching for Training-free Open-vocabulary Attribute Detection
by: Garosi, Marco, et al.
Published: (2025)

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems
by: Zhu, Zhen, et al.
Published: (2023)

Open-vocabulary 3D scene perception in industrial environments
by: Moenck, Keno, et al.
Published: (2026)

Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments
by: Yu, Meng, et al.
Published: (2024)

A Trustworthy Method for Multimodal Emotion Recognition
by: Xue, Junxiao, et al.
Published: (2025)