Library Catalog

Bibliographic Details
Main Author:	Salgado, Alberto G. Rodriguez
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.26839

Similar Items

What Makes a Maze Look Like a Maze?
by: Hsu, Joy, et al.
Published: (2024)

From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding
by: Xiang, Wenzhao, et al.
Published: (2026)

Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training
by: Zhang, Jiacheng, et al.
Published: (2024)

From Pixels to Components: Eigenvector Masking for Visual Representation Learning
by: Bizeul, Alice, et al.
Published: (2025)

Weakly Supervised Pixel-Level Annotation with Visual Interpretability
by: Nasir, Basma, et al.
Published: (2025)

Structure over Pixels: Learning Variable-Length Visual Programs
by: Wyrwiński, Piotr, et al.
Published: (2026)

Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?
by: Asadi, Nader, et al.
Published: (2024)

From Pixels to Patches: Pooling Strategies for Earth Embeddings
by: Corley, Isaac, et al.
Published: (2026)

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
by: Vasconcelos, Cristina N., et al.
Published: (2024)

Improving Accuracy and Generalization for Efficient Visual Tracking
by: Zaveri, Ram, et al.
Published: (2024)

From Pixels to Graphs: Deep Graph-Level Anomaly Detection on Dermoscopic Images
by: Xu, Dehn, et al.
Published: (2025)

Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
by: Golovanevsky, Michal, et al.
Published: (2025)

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)

From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
by: Vandenhirtz, Moritz, et al.
Published: (2025)

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
by: NVIDIA, et al.
Published: (2024)

Don't Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs
by: Janjua, Muhammad Kamran, et al.
Published: (2026)

From Pixels to Prose: A Large Dataset of Dense Image Captions
by: Singla, Vasu, et al.
Published: (2024)

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
by: Chen, Yiming, et al.
Published: (2025)

Rethinking Generative Image Pretraining: How Far Are We From Scaling Up Next-Pixel Prediction?
by: Yan, Xinchen, et al.
Published: (2025)

Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers
by: Slack, Dean L, et al.
Published: (2025)

ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos
by: Rehman, Mohammad Zia Ur, et al.
Published: (2025)

Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?
by: Haruna, Yunusa, et al.
Published: (2026)

Identifying Important Group of Pixels using Interactions
by: Sumiyasu, Kosuke, et al.
Published: (2024)

Pixels to Prose: Understanding the art of Image Captioning
by: Singh, Hrishikesh, et al.
Published: (2024)

From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing
by: DeAndres-Tame, Ivan, et al.
Published: (2024)

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
by: You, Zuyao, et al.
Published: (2025)

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
by: Wozniak, Maciej K., et al.
Published: (2025)

When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective
by: Tsao, Hsi-Ai, et al.
Published: (2024)

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
by: Nguyen, Duy-Kien, et al.
Published: (2024)

Pixel-level Counterfactual Contrastive Learning for Medical Image Segmentation
by: Lafargue-Hauret, Marceau, et al.
Published: (2026)

From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing
by: Sun, Xintian, et al.
Published: (2024)

On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
by: Neuhaus, Yannic, et al.
Published: (2026)

From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping
by: Long, Judy, et al.
Published: (2025)

From pre-training to downstream performance: Does domain-specific pre-training make sense?
by: Krones, Felix
Published: (2026)

Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation
by: Baade, Alan, et al.
Published: (2026)

FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation
by: Lin, Mingfeng, et al.
Published: (2026)

Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation
by: De Vita, Michele, et al.
Published: (2024)

Improving Out-of-Domain Robustness with Targeted Augmentation in Frequency and Pixel Spaces
by: Wang, Ruoqi, et al.
Published: (2025)

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
by: Namekata, Koichi, et al.
Published: (2024)

CoordFlow: Coordinate Flow for Pixel-wise Neural Video Representation
by: Silver, Daniel, et al.
Published: (2025)