| Main Authors: | Kalkhorani, Vahid Ahmadi, Zhang, Qingquan, Song, Guanqun, Zhu, Ting |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.10254 |
Similar Items
Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography
by: Böhi, Simon, et al.
Published: (2026)
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
by: Xia, Jingming, et al.
Published: (2025)
PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)
Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes
by: Davydov, Andrey, et al.
Published: (2024)
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
by: Ding, Xin, et al.
Published: (2025)
Revisiting Min-Max Optimization Problem in Adversarial Training
by: Ahmadi, Sina Hajer, et al.
Published: (2024)
Detection of retinal diseases using an accelerated reused convolutional network
by: Kasani, Amin Ahmadi, et al.
Published: (2025)
Hand bone age estimation using divide and conquer strategy and lightweight convolutional neural networks
by: Kasani, Amin Ahmadi, et al.
Published: (2024)
ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)
Enhancing Multi-Modal Video Sentiment Classification Through Semi-Supervised Clustering
by: Saadatinia, Mehrshad, et al.
Published: (2025)
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
by: Hoogeboom, Emiel, et al.
Published: (2026)
Urban Representation Learning for Fine-grained Economic Mapping: A Semi-supervised Graph-based Approach
by: Cao, Jinzhou, et al.
Published: (2025)
Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues
by: Zambare, Pallavi, et al.
Published: (2025)
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
by: Pai, Jonas, et al.
Published: (2025)
Supervised Contrastive Frame Aggregation for Video Representation Learning
by: Chowdhury, Shaif, et al.
Published: (2025)
NavFormer: IGRF Forecasting in Moving Coordinate Frames
by: Hwang, Yoontae, et al.
Published: (2026)
Beyond Instance Consistency: Investigating View Diversity in Self-supervised Learning
by: Qin, Huaiyuan, et al.
Published: (2025)
Cross-Modal Binary Attention: An Energy-Efficient Fusion Framework for Audio-Visual Learning
by: Saleh, Mohamed, et al.
Published: (2026)
Can Generative Models Improve Self-Supervised Representation Learning?
by: Ayromlou, Sana, et al.
Published: (2024)
Beyond a Single Signal: SPECTREG2, A Unified MultiExpert Anomaly Detector for Unknown Unknowns
by: Ray, Rahul D
Published: (2026)
Your Image is Secretly the Last Frame of a Pseudo Video
by: Chen, Wenlong, et al.
Published: (2024)
Detecting Neurodegenerative Diseases using Frame-Level Handwriting Embeddings
by: Laouedj, Sarah, et al.
Published: (2025)
The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA
by: Zhang, Hailiang, et al.
Published: (2024)
FrameBridge: Improving Image-to-Video Generation with Bridge Models
by: Wang, Yuji, et al.
Published: (2024)
MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
by: Sun, Hui, et al.
Published: (2025)
Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame Attention
by: Shao, Shuai, et al.
Published: (2024)
IPFed: Identity protected federated learning for user authentication
by: Kaga, Yosuke, et al.
Published: (2024)
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
by: Chung, Hyungjin, et al.
Published: (2025)
Look, Remember and Reason: Grounded reasoning in videos with language models
by: Bhattacharyya, Apratim, et al.
Published: (2023)
Foul prediction with estimated poses from soccer broadcast video
by: Fang, Jiale, et al.
Published: (2024)
CHAI: CacHe Attention Inference for text2video
by: Cherian, Joel Mathew, et al.
Published: (2026)
Does SpatioTemporal information benefit Two video summarization benchmarks?
by: Ganesh, Aashutosh, et al.
Published: (2024)
VisMin: Visual Minimal-Change Understanding
by: Awal, Rabiul, et al.
Published: (2024)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
by: Xu, Xiang, et al.
Published: (2025)
Continual Learning for Generative AI: From LLMs to MLLMs and Beyond
by: Guo, Haiyang, et al.
Published: (2025)
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
by: Yu, Sihyun, et al.
Published: (2024)
HGTS-Former: Hierarchical HyperGraph Transformer for Multivariate Time Series Analysis
by: Si, Hao, et al.
Published: (2025)
Parabolic Continual Learning
by: Yang, Haoming, et al.
Published: (2025)
SUTrack: Towards Simple and Unified Single Object Tracking
by: Chen, Xin, et al.
Published: (2024)
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
by: Wang, Alex N., et al.
Published: (2024)