| Main Authors: | Ge, Mengying; Tang, Dongkai; Li, Mingyang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.11286 |
Similar Items
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
by: Xuan, Shiyu, et al.
Published: (2026)
OpenVIS: Open-vocabulary Video Instance Segmentation
by: Guo, Pinxue, et al.
Published: (2023)
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
by: Wang, Dongkai, et al.
Published: (2024)
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
by: Cheng, Zesen, et al.
Published: (2024)
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
by: Wu, Daiqing, et al.
Published: (2025)
EmoSign: A Multimodal Dataset for Understanding Emotions in American Sign Language
by: Chua, Phoebe, et al.
Published: (2025)
From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Grounded Open-vocabulary Situation Recognition
by: Cai, Chen, et al.
Published: (2025)
Contextual Emotion Recognition using Large Vision Language Models
by: Etesam, Yasaman, et al.
Published: (2024)
OpenTie: Open-vocabulary Sequential Rebar Tying System
by: Liu, Mingze, et al.
Published: (2025)
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
by: Xing, Bohao, et al.
Published: (2025)
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
by: Liu, Yajie, et al.
Published: (2024)
Multimodal Video Emotion Recognition with Reliable Reasoning Priors
by: Wang, Zhepeng, et al.
Published: (2025)
Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying
by: Yin, Hairong, et al.
Published: (2025)
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
by: Wang, Jiankang, et al.
Published: (2025)
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
by: Chen, Boyu, et al.
Published: (2024)
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
by: Chen, Zuyao, et al.
Published: (2025)
ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data
by: Wu, Xuecheng, et al.
Published: (2022)
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
by: Wang, Yongqi, et al.
Published: (2024)
Decoupled Hierarchical Distillation for Multimodal Emotion Recognition
by: Li, Yong, et al.
Published: (2026)
AlignVTOFF: Texture-Spatial Feature Alignment for High-Fidelity Virtual Try-Off
by: Zhu, Yihan, et al.
Published: (2026)
PosSAM: Panoptic Open-vocabulary Segment Anything
by: VS, Vibashan, et al.
Published: (2024)
Weakly Supervised 3D Open-vocabulary Segmentation
by: Liu, Kunhao, et al.
Published: (2023)
Open-vocabulary object 6D pose estimation
by: Corsetti, Jaime, et al.
Published: (2023)
SEED-Story: Multimodal Long Story Generation with Large Language Model
by: Yang, Shuai, et al.
Published: (2024)
Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition
by: Liu, Ran, et al.
Published: (2025)
FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
by: Hu, Zhuozhao, et al.
Published: (2025)
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
by: Luo, Run, et al.
Published: (2025)
TiCAL:Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition
by: Yin, Wen, et al.
Published: (2025)
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
by: Yang, Qu, et al.
Published: (2024)
OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding
by: Liao, Guibiao, et al.
Published: (2024)
DEEMO: De-identity Multimodal Emotion Recognition and Reasoning
by: Li, Deng, et al.
Published: (2025)
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
by: Zheng, Yuhang, et al.
Published: (2024)
ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)
Evaluating Multimodal Large Language Models for Heterogeneous Face Recognition
by: Shahreza, Hatef Otroshi, et al.
Published: (2026)
IGLOSS: Image Generation for Lidar Open-vocabulary Semantic Segmentation
by: Samet, Nermin, et al.
Published: (2026)
Compositional Caching for Training-free Open-vocabulary Attribute Detection
by: Garosi, Marco, et al.
Published: (2025)
Continual Learning in Open-vocabulary Classification with Complementary Memory Systems
by: Zhu, Zhen, et al.
Published: (2023)
Open-vocabulary 3D scene perception in industrial environments
by: Moenck, Keno, et al.
Published: (2026)
Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments
by: Yu, Meng, et al.
Published: (2024)
A Trustworthy Method for Multimodal Emotion Recognition
by: Xue, Junxiao, et al.
Published: (2025)