| Main Authors: | Dutta, Siddhant, Singh, Hemant, Shankhdhar, Kalpita, Iyer, Sridhar |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.04708 |
Similar Items
Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words
by: Bao, Yujia, et al.
Published: (2023)
Is an Image Also Worth 16x16=256 Superpixels? A Framework for Attentional Image Classification
by: Avelar, Pedro Henrique da Costa, et al.
Published: (2026)
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
by: Nguyen, Duy-Kien, et al.
Published: (2024)
How much is a noisy image worth? Data Scaling Laws for Ambient Diffusion
by: Daras, Giannis, et al.
Published: (2024)
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
by: Cui, Zichen Jeff, et al.
Published: (2024)
MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks
by: Iyer, Ram S, et al.
Published: (2025)
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
by: Shen, Junxiao, et al.
Published: (2024)
SHaSaM: Submodular Hard Sample Mining for Fair Facial Attribute Recognition
by: Majee, Anay, et al.
Published: (2026)
LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms
by: Iyer, Krithika, et al.
Published: (2024)
Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection
by: Majee, Anay, et al.
Published: (2025)
Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach
by: Malladi, Lakshmi Aishwarya, et al.
Published: (2025)
LinFusion: 1 GPU, 1 Minute, 16K Image
by: Liu, Songhua, et al.
Published: (2024)
On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks
by: Plebe, Alice, et al.
Published: (2020)
ArcGate: Adaptive Arctangent Gated Activation
by: Bhattacharya, Avik, et al.
Published: (2026)
Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data
by: Dutta, Parag, et al.
Published: (2025)
SCoRe: Submodular Combinatorial Representation Learning
by: Majee, Anay, et al.
Published: (2023)
Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification
by: Sridhar, Aditya
Published: (2024)
Words That Make Language Models Perceive
by: Wang, Sophie L., et al.
Published: (2025)
MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration
by: Karanam, Mokshagna Sai Teja, et al.
Published: (2025)
Learning Conditional Invariances through Non-Commutativity
by: Chaudhuri, Abhra, et al.
Published: (2024)
Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models
by: Qu, Helen, et al.
Published: (2025)
CHAI: CacHe Attention Inference for text2video
by: Cherian, Joel Mathew, et al.
Published: (2026)
Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
by: Haviv, Adi, et al.
Published: (2024)
Reconstructing facade details using MLS point clouds and Bag-of-Words approach
by: Froech, Thomas, et al.
Published: (2024)
Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
by: Kang, Minseok, et al.
Published: (2025)
TACO-Net: Topological Signatures Triumph in 3D Object Classification
by: Ghosh, Anirban, et al.
Published: (2025)
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
by: Nagaraju, Sanath Budakegowdanadoddi, et al.
Published: (2024)
Interpreting Neurons in Deep Vision Networks with Language Models
by: Bai, Nicholas, et al.
Published: (2024)
Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation
by: Dutta, Tapas K., et al.
Published: (2025)
BodyGPS: Anatomical Positioning System
by: Yerebakan, Halid Ziya, et al.
Published: (2025)
Attention based End to end network for Offline Writer Identification on Word level data
by: Kumar, Vineet, et al.
Published: (2024)
Crossmodal Knowledge Distillation with WordNet-Relaxed Text Embeddings for Robust Image Classification
by: Guo, Chenqi, et al.
Published: (2025)
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
by: Koishigarina, Darina, et al.
Published: (2025)
SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
by: Hirose, Noriaki, et al.
Published: (2024)
Visual Analysis of Prediction Uncertainty in Neural Networks for Deep Image Synthesis
by: Dutta, Soumya, et al.
Published: (2024)
Human Fall Detection using Transfer Learning-based 3D CNN
by: Alam, Ekram, et al.
Published: (2025)
De-biasing facial detection system using VAE
by: Kandge, Vedant V., et al.
Published: (2022)
Beyond Words: AuralLLM and SignMST-C for Sign Language Production and Bidirectional Accessibility
by: Li, Yulong, et al.
Published: (2025)
Farm-Level, In-Season Crop Identification for India
by: Deshpande, Ishan, et al.
Published: (2025)
A More Word-like Image Tokenization for MLLMs
by: Lee, Hyun, et al.
Published: (2026)