:: Library Catalog

Bibliographic Details
Main Authors:	Alwazzan, Omnia; Patras, Ioannis; Slabaugh, Gregory
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.06339

Similar Items

MOAB: Multi-Modal Outer Arithmetic Block For Fusion Of Histopathological Images And Genetic Data For Brain Tumor Grading
by: Alwazzan, Omnia, et al.
Published: (2024)

Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology
by: Alwazzan, Omnia, et al.
Published: (2024)

FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
by: Singh, Abhishek Kumar, et al.
Published: (2024)

BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
by: Gallagher-Syed, Amaya, et al.
Published: (2025)

Self-Supervised Facial Representation Learning with Facial Region Awareness
by: Gao, Zheng, et al.
Published: (2024)

Prompting Visual-Language Models for Dynamic Facial Expression Recognition
by: Zhao, Zengqun, et al.
Published: (2023)

Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
by: Metaxas, Ioannis Maniadis, et al.
Published: (2024)

CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation
by: Khan, Abbas, et al.
Published: (2024)

CLIPCleaner: Cleaning Noisy Labels with CLIP
by: Feng, Chen, et al.
Published: (2024)

EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition
by: Foteinopoulou, Niki Maria, et al.
Published: (2023)

FairCoT: Enhancing Fairness in Text-to-Image Generation via Chain of Thought Reasoning with Multimodal Large Language Models
by: Sahili, Zahraa Al, et al.
Published: (2024)

CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection
by: Senadeera, Damith Chamalke, et al.
Published: (2026)

SSR: An Efficient and Robust Framework for Learning with Unknown Label Noise
by: Feng, Chen, et al.
Published: (2021)

RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
by: Gaintseva, Tatiana, et al.
Published: (2024)

CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition
by: Sun, Zhonglin, et al.
Published: (2024)

Are CLIP features all you need for Universal Synthetic Image Origin Attribution?
by: Cioni, Dario, et al.
Published: (2024)

Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer
by: Zhao, Zengqun, et al.
Published: (2024)

MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance
by: Meng, Debin, et al.
Published: (2024)

Behaviour4All: in-the-wild Facial Behaviour Analysis Toolkit
by: Kollias, Dimitrios, et al.
Published: (2024)

Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
by: Cao, Yu, et al.
Published: (2025)

SuperCap: Multi-resolution Superpixel-based Image Captioning
by: Senior, Henry, et al.
Published: (2025)

Aligned Unsupervised Pretraining of Object Detectors with Self-training
by: Metaxas, Ioannis Maniadis, et al.
Published: (2023)

FairJudge: Abstention-Aware Multimodal Judges for Fairness and Alignment Evaluation in Text-to-Image Models
by: Sahili, Zahraa Al, et al.
Published: (2025)

VidCtx: Context-aware Video Question Answering with Image Models
by: Goulas, Andreas, et al.
Published: (2024)

P-TAME: Explain Any Image Classifier with Trained Perturbations
by: Ntrougkas, Mariano V., et al.
Published: (2025)

CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
by: Senadeera, Damith Chamalke, et al.
Published: (2024)

A low complexity contextual stacked ensemble-learning approach for pedestrian intent prediction
by: Chiang, Chia-Yen, et al.
Published: (2024)

Flatten: Video Action Recognition is an Image Classification task
by: Chen, Junlin, et al.
Published: (2024)

UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification
by: Chen, Taixi, et al.
Published: (2025)

One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space
by: Bounareli, Stella, et al.
Published: (2024)

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
by: Sun, Zhonglin, et al.
Published: (2024)

XFMamba: Cross-Fusion Mamba for Multi-View Medical Image Classification
by: Zheng, Xiaoyu, et al.
Published: (2025)

RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
by: Catley-Chandar, Sibi, et al.
Published: (2024)

STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
by: Yang, Xiaohang, et al.
Published: (2025)

FlattenGPT: Depth Compression for Transformer with Layer Flattening
by: Xu, Ruihan, et al.
Published: (2026)

Graph Neural Networks in Vision-Language Image Understanding: A Survey
by: Senior, Henry, et al.
Published: (2023)

Rethinking the Zigzag Flattening for Image Reading
by: Zhao, Qingsong, et al.
Published: (2022)

Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation
by: Zhao, Zengqun, et al.
Published: (2026)

CycleCap: Improving VLMs Captioning Performance via Self-Supervised Cycle Consistency Fine-Tuning
by: Krestenitis, Marios, et al.
Published: (2026)

Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss
by: Fu, Qifan, et al.
Published: (2024)