Bibliographic Details
Main Authors:	Dutta, Siddhant; Singh, Hemant; Shankhdhar, Kalpita; Iyer, Sridhar
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2407.04708

Similar Items

Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words
by: Bao, Yujia, et al.
Published: (2023)

Is an Image Also Worth 16x16=256 Superpixels? A Framework for Attentional Image Classification
by: Avelar, Pedro Henrique da Costa, et al.
Published: (2026)

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
by: Nguyen, Duy-Kien, et al.
Published: (2024)

How much is a noisy image worth? Data Scaling Laws for Ambient Diffusion
by: Daras, Giannis, et al.
Published: (2024)

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
by: Cui, Zichen Jeff, et al.
Published: (2024)

MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks
by: Iyer, Ram S, et al.
Published: (2025)

Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
by: Shen, Junxiao, et al.
Published: (2024)

SHaSaM: Submodular Hard Sample Mining for Fair Facial Attribute Recognition
by: Majee, Anay, et al.
Published: (2026)

LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms
by: Iyer, Krithika, et al.
Published: (2024)

Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection
by: Majee, Anay, et al.
Published: (2025)

Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach
by: Malladi, Lakshmi Aishwarya, et al.
Published: (2025)

LinFusion: 1 GPU, 1 Minute, 16K Image
by: Liu, Songhua, et al.
Published: (2024)

On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks
by: Plebe, Alice, et al.
Published: (2020)

ArcGate: Adaptive Arctangent Gated Activation
by: Bhattacharya, Avik, et al.
Published: (2026)

Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data
by: Dutta, Parag, et al.
Published: (2025)

SCoRe: Submodular Combinatorial Representation Learning
by: Majee, Anay, et al.
Published: (2023)

Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification
by: Sridhar, Aditya
Published: (2024)

Words That Make Language Models Perceive
by: Wang, Sophie L., et al.
Published: (2025)

MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration
by: Karanam, Mokshagna Sai Teja, et al.
Published: (2025)

Learning Conditional Invariances through Non-Commutativity
by: Chaudhuri, Abhra, et al.
Published: (2024)

Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models
by: Qu, Helen, et al.
Published: (2025)

CHAI: CacHe Attention Inference for text2video
by: Cherian, Joel Mathew, et al.
Published: (2026)

Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
by: Haviv, Adi, et al.
Published: (2024)

Reconstructing facade details using MLS point clouds and Bag-of-Words approach
by: Froech, Thomas, et al.
Published: (2024)

Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
by: Kang, Minseok, et al.
Published: (2025)

TACO-Net: Topological Signatures Triumph in 3D Object Classification
by: Ghosh, Anirban, et al.
Published: (2025)

A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
by: Nagaraju, Sanath Budakegowdanadoddi, et al.
Published: (2024)

Interpreting Neurons in Deep Vision Networks with Language Models
by: Bai, Nicholas, et al.
Published: (2024)

Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation
by: Dutta, Tapas K., et al.
Published: (2025)

BodyGPS: Anatomical Positioning System
by: Yerebakan, Halid Ziya, et al.
Published: (2025)

Attention based End to end network for Offline Writer Identification on Word level data
by: Kumar, Vineet, et al.
Published: (2024)

Crossmodal Knowledge Distillation with WordNet-Relaxed Text Embeddings for Robust Image Classification
by: Guo, Chenqi, et al.
Published: (2025)

CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
by: Koishigarina, Darina, et al.
Published: (2025)

SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
by: Hirose, Noriaki, et al.
Published: (2024)

Visual Analysis of Prediction Uncertainty in Neural Networks for Deep Image Synthesis
by: Dutta, Soumya, et al.
Published: (2024)

Human Fall Detection using Transfer Learning-based 3D CNN
by: Alam, Ekram, et al.
Published: (2025)

De-biasing facial detection system using VAE
by: Kandge, Vedant V., et al.
Published: (2022)

Beyond Words: AuralLLM and SignMST-C for Sign Language Production and Bidirectional Accessibility
by: Li, Yulong, et al.
Published: (2025)

Farm-Level, In-Season Crop Identification for India
by: Deshpande, Ishan, et al.
Published: (2025)

A More Word-like Image Tokenization for MLLMs
by: Lee, Hyun, et al.
Published: (2026)