:: Library Catalog

Bibliographic Details
Main Authors:	Kalkhorani, Vahid Ahmadi; Zhang, Qingquan; Song, Guanqun; Zhu, Ting
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2401.10254

Similar Items

Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography
by: Böhi, Simon, et al.
Published: (2026)

Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
by: Xia, Jingming, et al.
Published: (2025)

PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)

Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes
by: Davydov, Andrey, et al.
Published: (2024)

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
by: Ding, Xin, et al.
Published: (2025)

Revisiting Min-Max Optimization Problem in Adversarial Training
by: Ahmadi, Sina Hajer, et al.
Published: (2024)

Detection of retinal diseases using an accelerated reused convolutional network
by: Kasani, Amin Ahmadi, et al.
Published: (2025)

Hand bone age estimation using divide and conquer strategy and lightweight convolutional neural networks
by: Kasani, Amin Ahmadi, et al.
Published: (2024)

ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)

Enhancing Multi-Modal Video Sentiment Classification Through Semi-Supervised Clustering
by: Saadatinia, Mehrshad, et al.
Published: (2025)

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
by: Hoogeboom, Emiel, et al.
Published: (2026)

Urban Representation Learning for Fine-grained Economic Mapping: A Semi-supervised Graph-based Approach
by: Cao, Jinzhou, et al.
Published: (2025)

Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues
by: Zambare, Pallavi, et al.
Published: (2025)

mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
by: Pai, Jonas, et al.
Published: (2025)

Supervised Contrastive Frame Aggregation for Video Representation Learning
by: Chowdhury, Shaif, et al.
Published: (2025)

NavFormer: IGRF Forecasting in Moving Coordinate Frames
by: Hwang, Yoontae, et al.
Published: (2026)

Beyond Instance Consistency: Investigating View Diversity in Self-supervised Learning
by: Qin, Huaiyuan, et al.
Published: (2025)

Cross-Modal Binary Attention: An Energy-Efficient Fusion Framework for Audio-Visual Learning
by: Saleh, Mohamed, et al.
Published: (2026)

Can Generative Models Improve Self-Supervised Representation Learning?
by: Ayromlou, Sana, et al.
Published: (2024)

Beyond a Single Signal: SPECTREG2, A Unified MultiExpert Anomaly Detector for Unknown Unknowns
by: Ray, Rahul D
Published: (2026)

Your Image is Secretly the Last Frame of a Pseudo Video
by: Chen, Wenlong, et al.
Published: (2024)

Detecting Neurodegenerative Diseases using Frame-Level Handwriting Embeddings
by: Laouedj, Sarah, et al.
Published: (2025)

The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA
by: Zhang, Hailiang, et al.
Published: (2024)

FrameBridge: Improving Image-to-Video Generation with Bridge Models
by: Wang, Yuji, et al.
Published: (2024)

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
by: Sun, Hui, et al.
Published: (2025)

Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame Attention
by: Shao, Shuai, et al.
Published: (2024)

IPFed: Identity protected federated learning for user authentication
by: Kaga, Yosuke, et al.
Published: (2024)

Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
by: Chung, Hyungjin, et al.
Published: (2025)

Look, Remember and Reason: Grounded reasoning in videos with language models
by: Bhattacharyya, Apratim, et al.
Published: (2023)

Foul prediction with estimated poses from soccer broadcast video
by: Fang, Jiale, et al.
Published: (2024)

CHAI: CacHe Attention Inference for text2video
by: Cherian, Joel Mathew, et al.
Published: (2026)

Does SpatioTemporal information benefit Two video summarization benchmarks?
by: Ganesh, Aashutosh, et al.
Published: (2024)

VisMin: Visual Minimal-Change Understanding
by: Awal, Rabiul, et al.
Published: (2024)

Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
by: Xu, Xiang, et al.
Published: (2025)

Continual Learning for Generative AI: From LLMs to MLLMs and Beyond
by: Guo, Haiyang, et al.
Published: (2025)

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
by: Yu, Sihyun, et al.
Published: (2024)

HGTS-Former: Hierarchical HyperGraph Transformer for Multivariate Time Series Analysis
by: Si, Hao, et al.
Published: (2025)

Parabolic Continual Learning
by: Yang, Haoming, et al.
Published: (2025)

SUTrack: Towards Simple and Unified Single Object Tracking
by: Chen, Xin, et al.
Published: (2024)

PooDLe: Pooled and dense self-supervised learning from naturalistic videos
by: Wang, Alex N., et al.
Published: (2024)