:: Library Catalog

Bibliographic Details
Main Authors:	Chivereanu, Radu; Cosma, Adrian; Catruna, Andy; Rughinis, Razvan; Radoi, Emilian
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.12192

Similar Items

The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models
by: Cătrună, Andy, et al.
Published: (2024)

CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
by: Cătrună, Andy, et al.
Published: (2024)

MoME: Estimating Psychological Traits from Gait with Multi-Stage Mixture of Movement Experts
by: Cătrună, Andy, et al.
Published: (2025)

On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
by: Cosma, Adrian, et al.
Published: (2025)

GaitPT: Skeletons Are All You Need For Gait Recognition
by: Catruna, Andy, et al.
Published: (2023)

Database-Agnostic Gait Enrollment using SetTransformers
by: Basoc, Nicoleta, et al.
Published: (2025)

Gait Recognition from Highly Compressed Videos
by: Niculae, Andrei, et al.
Published: (2024)

Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models
by: Basoc, Nicoleta-Nina, et al.
Published: (2026)

What Makes a Good Doctor Response? A Study on Text-Based Telemedicine
by: Cosma, Adrian, et al.
Published: (2026)

A Retrieval-Based Approach to Medical Procedure Matching in Romanian
by: Niculae, Andrei, et al.
Published: (2025)

RoMath: A Mathematical Reasoning Benchmark in Romanian
by: Cosma, Adrian, et al.
Published: (2024)

Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
by: Ntinou, Ioanna, et al.
Published: (2025)

Training Language Models with homotokens Leads to Delayed Overfitting
by: Cosma, Adrian, et al.
Published: (2026)

Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian
by: Niculae, Andrei, et al.
Published: (2025)

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models
by: Cosma, Adrian, et al.
Published: (2025)

MoReact: Generating Reactive Motion from Textual Descriptions
by: Xu, Xiyan, et al.
Published: (2025)

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)

Learning to Generate Human-Human-Object Interactions from Textual Descriptions
by: Na, Jeonghyeon, et al.
Published: (2025)

Contact-aware Human Motion Generation from Textual Descriptions
by: Ma, Sihan, et al.
Published: (2024)

Motion Generation from Fine-grained Textual Descriptions
by: Li, Kunhang, et al.
Published: (2024)

Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
by: Moon, Seokha, et al.
Published: (2024)

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
by: Shvetsova, Nina, et al.
Published: (2025)

Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
by: Benavent-Lledo, Manuel, et al.
Published: (2024)

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
by: Pi, Renjie, et al.
Published: (2024)

MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation
by: Chen, Huangwei, et al.
Published: (2025)

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
by: Sun, Zhengyang, et al.
Published: (2026)

Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
by: Dhakal, Aayush, et al.
Published: (2023)

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
by: Hashemi, Mohammad Abuzar, et al.
Published: (2021)

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
by: Zhang, Yue, et al.
Published: (2025)

Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition
by: Hu, Xiaodan, et al.
Published: (2025)

Distilling Textual Priors from LLM to Efficient Image Fusion
by: Zhang, Ran, et al.
Published: (2025)

Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
by: Gimeno-Gómez, David, et al.
Published: (2024)

Instance-aware Image Colorization with Controllable Textual Descriptions and Segmentation Masks
by: An, Yanru, et al.
Published: (2025)

Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives
by: Dong, Sixun, et al.
Published: (2025)

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)

Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
by: Yoshida, Tomoya, et al.
Published: (2025)

Streaming Neural Images
by: Conde, Marcos V., et al.
Published: (2024)

OrienText: Surface Oriented Textual Image Generation
by: Paliwal, Shubham Singh, et al.
Published: (2025)

Precise Parameter Localization for Textual Generation in Diffusion Models
by: Staniszewski, Łukasz, et al.
Published: (2025)