| Main Authors: | Gutierrez, Sebastian; Hou, Irene; Lee, Jihye; Angelikas, Kenneth; Man, Owen; Mettille, Sophia; Prather, James; Denny, Paul; MacNeil, Stephen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.11088 |
Similar Items
The Evolving Usage of GenAI by Computing Students
by: Hou, Irene, et al.
Published: (2024)
Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming
by: Denny, Paul, et al.
Published: (2025)
SpatialMath: Spatial Comprehension-Infused Symbolic Reasoning for Mathematical Problem-Solving
by: Bajpai, Ashutosh, et al.
Published: (2026)
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
by: Sanders, Kate, et al.
Published: (2024)
The Effects of Generative AI on Computing Students' Help-Seeking Preferences
by: Hou, Irene, et al.
Published: (2024)
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)
treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)
Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)
GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)
Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
by: Annese, Luca, et al.
Published: (2025)
WIP: Identifying Tutorial Affordances for Interdisciplinary Learning Environments
by: Kim, Hannah, et al.
Published: (2024)
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
by: Wang, Han, et al.
Published: (2024)
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
Data Organization Matters in Multimodal Instruction Tuning: A Controlled Study of Capability Trade-offs
by: Tang, Guowei
Published: (2026)
RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
by: Qian, Wenxu, et al.
Published: (2025)
BlindSight: Harnessing Sparsity for Efficient Vision-Language Models
by: Srikrishnan, Tharun Adithya, et al.
Published: (2025)
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition
by: Alishzade, Nigar, et al.
Published: (2025)
MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue
by: Deichler, Anna, et al.
Published: (2026)
See-through: Single-image Layer Decomposition for Anime Characters
by: Lin, Jian, et al.
Published: (2026)
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
by: Ramakrishnan, Aashish Anantha, et al.
Published: (2025)
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
by: Parcalabescu, Letitia, et al.
Published: (2022)
SelvaMask: Segmenting Trees in Tropical Forests and Beyond
by: Duguay, Simon-Olivier, et al.
Published: (2026)
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
by: Shaikh, Muhammad Bilal, et al.
Published: (2024)
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)
ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)
Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
by: Singh, Sahajpreet, et al.
Published: (2025)
PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model
by: Deng, Zhongchen, et al.
Published: (2024)
Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
by: Deichler, Anna, et al.
Published: (2025)
K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
by: Kim, Soyeon, et al.
Published: (2026)
EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique
by: Zhu, Chenglin, et al.
Published: (2025)
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
by: Koukounas, Andreas, et al.
Published: (2024)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
PC-SNN: Predictive Coding-based Local Hebbian Plasticity Learning in Spiking Neural Networks
by: Wang, Haidong, et al.
Published: (2022)
FlyMeThrough: Human-AI Collaborative 3D Indoor Mapping with Commodity Drones
by: Su, Xia, et al.
Published: (2025)
Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)