:: Library Catalog

Bibliographic Details
Main Author:	Fixelle, Joshua
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.08710

Similar Items

More than the Sum: Panorama-Language Models for Adverse Omni-Scenes
by: Fan, Weijia, et al.
Published: (2026)

More than a Moment: Towards Coherent Sequences of Audio Descriptions
by: Khandelwal, Eshika, et al.
Published: (2025)

Vision Transformers Need More Than Registers
by: Shi, Cheng, et al.
Published: (2026)

Opinion: Learning Intuitive Physics May Require More than Visual Data
by: Su, Ellen, et al.
Published: (2025)

More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation
by: Catalano, Nico, et al.
Published: (2024)

Nearly Solved? Robust Deepfake Detection Requires More than Visual Forensics
by: Levy, Guy, et al.
Published: (2024)

More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery
by: Dong, Wenzhen, et al.
Published: (2025)

There is More to Attention: Statistical Filtering Enhances Explanations in Vision Transformers
by: Ayyar, Meghna P, et al.
Published: (2025)

More than One Step at a Time: Designing Procedural Feedback for Non-visual Makeup Routines
by: Li, Franklin Mingzhe, et al.
Published: (2025)

Nodes Are Early, Edges Are Late: Probing Diagram Representations in Large Vision-Language Models
by: Yoshida, Haruto, et al.
Published: (2026)

More than Memes: A Multimodal Topic Modeling Approach to Conspiracy Theories on Telegram
by: Steffen, Elisabeth
Published: (2024)

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
by: Xu, Yitao, et al.
Published: (2024)

SVD-ViT: Does SVD Make Vision Transformers Attend More to the Foreground?
by: Murata, Haruhiko, et al.
Published: (2026)

More Images, More Problems? A Controlled Analysis of VLM Failure Modes
by: Das, Anurag, et al.
Published: (2026)

Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
by: Surikuchi, Aditya K, et al.
Published: (2024)

Representation Alignment for Just Image Transformers is not Easier than You Think
by: Shin, Jaeyo, et al.
Published: (2026)

From Edges to Depth: Probing the Spatial Hierarchy in Vision Transformers
by: Sanghavi, Jainum
Published: (2026)

Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
by: Hao, Tianxiang, et al.
Published: (2023)

More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV
by: Ye, Kai, et al.
Published: (2025)

Adapted Center and Scale Prediction: More Stable and More Accurate
by: Wang, Wenhao, et al.
Published: (2020)

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
by: Feng, Mingqian, et al.
Published: (2024)

Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?
by: Hoak, Blaine, et al.
Published: (2025)

Leaner Transformers: More Heads, Less Depth
by: Saratchandran, Hemanth, et al.
Published: (2025)

Less is More: Skim Transformer for Light Field Image Super-resolution
by: Hu, Zeke Zexi, et al.
Published: (2024)

The More You See in 2D, the More You Perceive in 3D
by: Han, Xinyang, et al.
Published: (2024)

Larger than memory image processing
by: Sporring, Jon, et al.
Published: (2026)

Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals
by: Lee, Messi H. J., et al.
Published: (2024)

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models
by: Lai, Yingxin, et al.
Published: (2026)

Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
by: Wang, Xiangchen, et al.
Published: (2025)

More Pictures Say More: Visual Intersection Network for Open Set Object Detection
by: Dong, Bingcheng, et al.
Published: (2024)

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
by: Garcia, Gonzalo Martin, et al.
Published: (2024)

Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
by: Eliopoulos, Nick John, et al.
Published: (2024)

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
by: Wang, Feng, et al.
Published: (2025)

Floating No More: Object-Ground Reconstruction from a Single Image
by: Man, Yunze, et al.
Published: (2024)

Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection
by: Li, Yiheng, et al.
Published: (2026)

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
by: Nguyen, Duy-Kien, et al.
Published: (2024)

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
by: Wu, Shaojin, et al.
Published: (2025)

A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback
by: Khaertdinov, Bulat, et al.
Published: (2025)

Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
by: Li, Mingxiao, et al.
Published: (2025)

How to Learn More? Exploring Kolmogorov-Arnold Networks for Hyperspectral Image Classification
by: Jamali, Ali, et al.
Published: (2024)