:: Library Catalog

Bibliographic Details
Main Authors:	Li, Tianqin; Zhao, Junru; Jiang, Dunhan; Wu, Shenghao; Ramirez, Alan; Lee, Tai Sing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.01201

Similar Items

Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision
by: Li, Tianqin, et al.
Published: (2025)

Does resistance to style-transfer equal Global Shape Bias? Measuring network sensitivity to global shape configuration
by: Wen, Ziqi, et al.
Published: (2023)

From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models
by: Li, Tianqin, et al.
Published: (2025)

In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding
by: Li, Shenghao
Published: (2024)

Modeling Rapid Contextual Learning in the Visual Cortex with Fast-Weight Deep Autoencoder Networks
by: Li, Yue, et al.
Published: (2025)

Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks
by: Zalcher, Amit, et al.
Published: (2025)

Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
by: Teney, Damien, et al.
Published: (2025)

Smart Feature is What You Need
by: Hu, Zhaoxin, et al.
Published: (2024)

Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
by: Kao, Shiu-hong, et al.
Published: (2025)

Take Only What You Need: Rank Minimization as an Implicit Forgetting Regularizer in Continual Learning
by: Lu, Haodong, et al.
Published: (2024)

SeTformer is What You Need for Vision and Language
by: Shamsolmoali, Pourya, et al.
Published: (2024)

Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
by: Zhang, Yunqi, et al.
Published: (2024)

Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
by: Liaqat, Muhammad Irzam, et al.
Published: (2024)

Think Bright, Diffuse Nice: Enhancing T2I-ICL via Inductive-Bias Hint Instruction and Query Contrastive Decoding
by: Ma, Zhiyong, et al.
Published: (2026)

Learning to See What You Need: Gaze Attention for Multimodal Large Language Models
by: Song, Junha, et al.
Published: (2026)

Masked Generative Transformer Is What You Need for Image Editing
by: Chow, Wei, et al.
Published: (2026)

Lite-SAM Is Actually What You Need for Segment Everything
by: Fu, Jianhai, et al.
Published: (2024)

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
by: Zhang, Boqiang, et al.
Published: (2024)

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning
by: Li, Yian, et al.
Published: (2024)

Perceptual Classifiers: Detecting Generative Images using Perceptual Features
by: Durbha, Krishna Srikar, et al.
Published: (2025)

What You Have is What You Track: Adaptive and Robust Multimodal Tracking
by: Tan, Yuedong, et al.
Published: (2025)

Multi-View Representation is What You Need for Point-Cloud Pre-Training
by: Yan, Siming, et al.
Published: (2023)

See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball
by: Roy, Arnau Barrera, et al.
Published: (2025)

Low-Resolution Editing is All You Need for High-Resolution Editing
by: Lee, Junsung, et al.
Published: (2025)

A Strong Inductive Bias: Gzip for binary image classification
by: Scilipoti, Marco, et al.
Published: (2024)

Face-Voice Association with Inductive Bias for Maximum Class Separation
by: Moscati, Marta, et al.
Published: (2026)

ParameterNet: Parameters Are All You Need
by: Han, Kai, et al.
Published: (2023)

Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images
by: Nguyen, Tai D., et al.
Published: (2025)

Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
by: Yang, Haobo, et al.
Published: (2025)

Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment
by: Hu, Yang, et al.
Published: (2025)

Generating 360° Video is What You Need For a 3D Scene
by: Zhang, Zhaoyang, et al.
Published: (2025)

Taxes Are All You Need: Integration of Taxonomical Hierarchy Relationships into the Contrastive Loss
by: Kokilepersaud, Kiran, et al.
Published: (2024)

Label Critic: Design Data Before Models
by: Bassi, Pedro R. A. S., et al.
Published: (2024)

Emu3: Next-Token Prediction is All You Need
by: Wang, Xinlong, et al.
Published: (2024)

BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression
by: Jiang, Wei, et al.
Published: (2025)

NijiGAN: Transform What You See into Anime with Contrastive Semi-Supervised Learning and Neural Ordinary Differential Equations
by: Santoso, Kevin Putra, et al.
Published: (2024)

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs
by: Lin, Jiaping, et al.
Published: (2026)

LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression
by: Jiang, Wei, et al.
Published: (2024)

AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement
by: Yang, Tao, et al.
Published: (2024)