| Main Authors: | Chivereanu, Radu; Cosma, Adrian; Catruna, Andy; Rughinis, Razvan; Radoi, Emilian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.12192 |
Similar Items
The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models
by: Cătrună, Andy, et al.
Published: (2024)
CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
by: Cătrună, Andy, et al.
Published: (2024)
MoME: Estimating Psychological Traits from Gait with Multi-Stage Mixture of Movement Experts
by: Cǎtrunǎ, Andy, et al.
Published: (2025)
On Model and Data Scaling for Skeleton-based Self-Supervised Gait Recognition
by: Cosma, Adrian, et al.
Published: (2025)
GaitPT: Skeletons Are All You Need For Gait Recognition
by: Catruna, Andy, et al.
Published: (2023)
Database-Agnostic Gait Enrollment using SetTransformers
by: Basoc, Nicoleta, et al.
Published: (2025)
Gait Recognition from Highly Compressed Videos
by: Niculae, Andrei, et al.
Published: (2024)
Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models
by: Basoc, Nicoleta-Nina, et al.
Published: (2026)
What Makes a Good Doctor Response? A Study on Text-Based Telemedicine
by: Cosma, Adrian, et al.
Published: (2026)
A Retrieval-Based Approach to Medical Procedure Matching in Romanian
by: Niculae, Andrei, et al.
Published: (2025)
RoMath: A Mathematical Reasoning Benchmark in Romanian
by: Cosma, Adrian, et al.
Published: (2024)
Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
by: Ntinou, Ioanna, et al.
Published: (2025)
Training Language Models with homotokens Leads to Delayed Overfitting
by: Cosma, Adrian, et al.
Published: (2026)
Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian
by: Niculae, Andrei, et al.
Published: (2025)
The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models
by: Cosma, Adrian, et al.
Published: (2025)
MoReact: Generating Reactive Motion from Textual Descriptions
by: Xu, Xiyan, et al.
Published: (2025)
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)
Learning to Generate Human-Human-Object Interactions from Textual Descriptions
by: Na, Jeonghyeon, et al.
Published: (2025)
Contact-aware Human Motion Generation from Textual Descriptions
by: Ma, Sihan, et al.
Published: (2024)
Motion Generation from Fine-grained Textual Descriptions
by: Li, Kunhang, et al.
Published: (2024)
Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
by: Moon, Seokha, et al.
Published: (2024)
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
by: Shvetsova, Nina, et al.
Published: (2025)
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
by: Benavent-Lledo, Manuel, et al.
Published: (2024)
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
by: Pi, Renjie, et al.
Published: (2024)
MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation
by: Chen, Huangwei, et al.
Published: (2025)
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
by: Sun, Zhengyang, et al.
Published: (2026)
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
by: Dhakal, Aayush, et al.
Published: (2023)
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
by: Hashemi, Mohammad Abuzar, et al.
Published: (2021)
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
by: Zhang, Yue, et al.
Published: (2025)
Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition
by: Hu, Xiaodan, et al.
Published: (2025)
Distilling Textual Priors from LLM to Efficient Image Fusion
by: Zhang, Ran, et al.
Published: (2025)
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
by: Gimeno-Gómez, David, et al.
Published: (2024)
Instance-aware Image Colorization with Controllable Textual Descriptions and Segmentation Masks
by: An, Yanru, et al.
Published: (2025)
Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives
by: Dong, Sixun, et al.
Published: (2025)
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
by: Yoshida, Tomoya, et al.
Published: (2025)
Streaming Neural Images
by: Conde, Marcos V., et al.
Published: (2024)
OrienText: Surface Oriented Textual Image Generation
by: Paliwal, Shubham Singh, et al.
Published: (2025)
Precise Parameter Localization for Textual Generation in Diffusion Models
by: Staniszewski, Łukasz, et al.
Published: (2025)