| Main Authors: | Sushko, Peter, Bharadwaj, Ayana, Lim, Zhi Yang, Ilin, Vasily, Caffee, Ben, Chen, Dongping, Salehi, Mohammadreza, Hsieh, Cheng-Yu, Krishna, Ranjay |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.03629 |
Similar Items
MultiRef: Controllable Image Generation with Multiple Visual References
by: Chen, Ruoxi, et al.
Published: (2025)
VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
by: Yadav, Tanush, et al.
Published: (2026)
Score-based deterministic density sampling
by: Ilin, Vasily, et al.
Published: (2025)
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
by: Wang, Chenlong, et al.
Published: (2025)
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
by: Bigverdi, Mahtab, et al.
Published: (2024)
The Hard Positive Truth about Vision-Language Compositionality
by: Kamath, Amita, et al.
Published: (2024)
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
by: Salehi, Mohammadreza, et al.
Published: (2024)
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
by: Zheng, Chenhao, et al.
Published: (2025)
The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation
by: Braslavski, Pavel, et al.
Published: (2026)
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
by: Chen, Dongping, et al.
Published: (2024)
Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning
by: Bandari, Abhinav, et al.
Published: (2024)
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
by: Chuang, Yung-Sung, et al.
Published: (2024)
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
by: Geng, Scott, et al.
Published: (2024)
Reinforced Visual Perception with Tools
by: Zhou, Zetong, et al.
Published: (2025)
Seeking and Updating with Live Visual Knowledge
by: Fu, Mingyang, et al.
Published: (2025)
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
by: Fan, Xiang, et al.
Published: (2025)
Iterated Learning Improves Compositionality in Large Vision-Language Models
by: Zheng, Chenhao, et al.
Published: (2024)
Semantic and Expressive Variation in Image Captions Across Languages
by: Ye, Andre, et al.
Published: (2023)
Agonistic Image Generation: Unsettling the Hegemony of Intention
by: Shaw, Andrew, et al.
Published: (2025)
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)
Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?
by: Zhang, Tianyi, et al.
Published: (2026)
PulseReddit: A Novel Reddit Dataset for Benchmarking MAS in High-Frequency Cryptocurrency Trading
by: Han, Qiuhan, et al.
Published: (2025)
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
by: Vani, Ankit, et al.
Published: (2024)
MolmoPoint: Better Pointing for VLMs with Grounding Tokens
by: Clark, Christopher, et al.
Published: (2026)
EditSleuth: A Dataset of Grounded Reasoning Chains for Image-Edit Forensics
by: Nguyen, Van-Loc, et al.
Published: (2026)
TECCI: Tricky Edits of Collected and Curated Images
by: Agrawal, Aishwarya, et al.
Published: (2026)
GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter
by: Bala, Aniruddha, et al.
Published: (2024)
From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
by: Kursuncu, Ugur, et al.
Published: (2025)
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
by: Fan, Xiang, et al.
Published: (2024)
Dynamic Template Selection for Output Token Generation Optimization: MLP-Based and Transformer Approaches
by: Yadavalli, Bharadwaj
Published: (2025)
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
by: Ma, Zixian, et al.
Published: (2024)
FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models
by: Samadi, Mohammadreza, et al.
Published: (2024)
ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)
DiT4Edit: Diffusion Transformer for Image Editing
by: Feng, Kunyu, et al.
Published: (2024)
TWeddit : A Dataset of Triggering Stories Predominantly Shared by Women on Reddit
by: Bandela, Shirlene Rose, et al.
Published: (2026)
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
by: Zhang, Zechuan, et al.
Published: (2025)
Throwaway Accounts and Moderation on Reddit
by: Guo, Cheng, et al.
Published: (2025)
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
by: Garg, Roopal, et al.
Published: (2024)
The Moral Foundations Reddit Corpus
by: Trager, Jackson, et al.
Published: (2022)
GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
by: Kamath, Amita, et al.
Published: (2025)