:: Library Catalog

Bibliographic Details
Main Authors:	Hosseini, Hesam; Mighan, Ghazal Hosseini; Afzali, Amirabbas; Amini, Sajjad; Houmansadr, Amir
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2411.12589

Similar Items

Clustering Time Series Data with Gaussian Mixture Embeddings in a Graph Autoencoder Framework
by: Afzali, Amirabbas, et al.
Published: (2024)

LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
by: Khodabandeh, Borna, et al.
Published: (2025)

MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification
by: Amini, Sajjad, et al.
Published: (2024)

Real-Time Semantic Segmentation on FPGA for Autonomous Vehicles Using LMIINet with the CGRA4ML Framework
by: Hosseini, Amir Mohammad Khadem, et al.
Published: (2025)

GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
by: Parast, Aryan Yazdan, et al.
Published: (2025)

FairNVT: Improving Fairness via Noise Injection in Vision Transformers
by: Tang, Qiaoyue, et al.
Published: (2026)

Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation
by: Ghaznavi, Mahdi, et al.
Published: (2024)

SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability
by: Nasiri-Sarvi, Ali, et al.
Published: (2025)

Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data
by: Hosseini, Mohammad, et al.
Published: (2025)

Human-Centric Video Anomaly Detection Through Spatio-Temporal Pose Tokenization and Transformer
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)

CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery
by: Holm, Felix, et al.
Published: (2025)

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation
by: Yang, Zhiyuan, et al.
Published: (2026)

GeoPos: A Minimal Positional Encoding for Enhanced Fine-Grained Details in Image Synthesis Using Convolutional Neural Networks
by: Hosseini, Mehran, et al.
Published: (2024)

AI-Powered Intracranial Hemorrhage Detection: A Co-Scale Convolutional Attention Model with Uncertainty-Based Fuzzy Integral Operator and Feature Screening
by: Chagahi, Mehdi Hosseini, et al.
Published: (2024)

WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion
by: Pakdamansavoji, Sajjad, et al.
Published: (2025)

Box6D: Zero-shot Category-level 6D Pose Estimation of Warehouse Boxes
by: Ma, Yintao, et al.
Published: (2025)

Enhancing Interpretability of Sparse Latent Representations with Class Information
by: Abiz, Farshad Sangari, et al.
Published: (2025)

GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
by: Ghasemi, Narges, et al.
Published: (2025)

LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs
by: Krojer, Benno, et al.
Published: (2026)

Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
by: Jeong, Wooseong, et al.
Published: (2025)

The Missing Point in Vision Transformers for Universal Image Segmentation
by: Shahabodini, Sajjad, et al.
Published: (2025)

From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Spatiotemporal Dynamics in Brain Signal Analysis
by: Hojjati, Amirabbas, et al.
Published: (2025)

A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection
by: Bodaghi, Morteza, et al.
Published: (2024)

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
by: Li, Wenhao, et al.
Published: (2023)

Taming Outlier Tokens in Diffusion Transformers
by: Wu, Xiaoyu, et al.
Published: (2026)

End-to-End Training for Unified Tokenization and Latent Denoising
by: Duggal, Shivam, et al.
Published: (2026)

Robustness Tokens: Towards Adversarial Robustness of Transformers
by: Pulfer, Brian, et al.
Published: (2025)

TRACER: Persistent Regularization for Robust Multimodal Finetuning
by: Asadollahzadeh, Hesam, et al.
Published: (2026)

BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles
by: Miangoleh, Seyed Ahmad Hosseini, et al.
Published: (2025)

Improving Interpretation Faithfulness for Vision Transformers
by: Hu, Lijie, et al.
Published: (2023)

Accelerating Diffusion Transformers with Token-wise Feature Caching
by: Zou, Chang, et al.
Published: (2024)

Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation
by: Kienzle, Daniel, et al.
Published: (2024)

Accurate and Efficient World Modeling with Masked Latent Transformers
by: Burchi, Maxime, et al.
Published: (2025)

Understanding Multi-View Transformers
by: Stary, Michal, et al.
Published: (2025)

Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization
by: Vali, Mohammad Hassan, et al.
Published: (2024)

Aligning Visual Contrastive learning models via Preference Optimization
by: Afzali, Amirabbas, et al.
Published: (2024)

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
by: Sassoon, Jordan, et al.
Published: (2025)

TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration
by: Olszewski, Jan, et al.
Published: (2023)

Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds
by: Stone, Gunner, et al.
Published: (2025)

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
by: You, Haoran, et al.
Published: (2024)