Bibliographic Details
Main Authors:	Gutierrez, Sebastian; Hou, Irene; Lee, Jihye; Angelikas, Kenneth; Man, Owen; Mettille, Sophia; Prather, James; Denny, Paul; MacNeil, Stephen
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Computers and Society; I.2.10; K.3.2
Online Access:	https://arxiv.org/abs/2412.11088

Similar Items

The Evolving Usage of GenAI by Computing Students
by: Hou, Irene, et al.
Published: (2024)

Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming
by: Denny, Paul, et al.
Published: (2025)

SpatialMath: Spatial Comprehension-Infused Symbolic Reasoning for Mathematical Problem-Solving
by: Bajpai, Ashutosh, et al.
Published: (2026)

TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
by: Sanders, Kate, et al.
Published: (2024)

The Effects of Generative AI on Computing Students' Help-Seeking Preferences
by: Hou, Irene, et al.
Published: (2024)

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)

PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)

treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
by: Burmeister, Josafat-Mattias, et al.
Published: (2025)

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning
by: Han, Yudong, et al.
Published: (2026)

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
by: Annese, Luca, et al.
Published: (2025)

WIP: Identifying Tutorial Affordances for Interdisciplinary Learning Environments
by: Kim, Hannah, et al.
Published: (2024)

MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
by: Wang, Han, et al.
Published: (2024)

MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

Data Organization Matters in Multimodal Instruction Tuning: A Controlled Study of Capability Trade-offs
by: Tang, Guowei
Published: (2026)

RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
by: Qian, Wenxu, et al.
Published: (2025)

BlindSight: Harnessing Sparsity for Efficient Vision-Language Models
by: Srikrishnan, Tharun Adithya, et al.
Published: (2025)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition
by: Alishzade, Nigar, et al.
Published: (2025)

MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue
by: Deichler, Anna, et al.
Published: (2026)

See-through: Single-image Layer Decomposition for Anime Characters
by: Lin, Jian, et al.
Published: (2026)

CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
by: Ramakrishnan, Aashish Anantha, et al.
Published: (2025)

MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
by: Parcalabescu, Letitia, et al.
Published: (2022)

SelvaMask: Segmenting Trees in Tropical Forests and Beyond
by: Duguay, Simon-Olivier, et al.
Published: (2026)

From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
by: Shaikh, Muhammad Bilal, et al.
Published: (2024)

Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)

Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)

Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
by: Singh, Sahajpreet, et al.
Published: (2025)

PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model
by: Deng, Zhongchen, et al.
Published: (2024)

Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
by: Deichler, Anna, et al.
Published: (2025)

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
by: Kim, Soyeon, et al.
Published: (2026)

EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique
by: Zhu, Chenglin, et al.
Published: (2025)

jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
by: Koukounas, Andreas, et al.
Published: (2024)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

PC-SNN: Predictive Coding-based Local Hebbian Plasticity Learning in Spiking Neural Networks
by: Wang, Haidong, et al.
Published: (2022)

FlyMeThrough: Human-AI Collaborative 3D Indoor Mapping with Commodity Drones
by: Su, Xia, et al.
Published: (2025)

Multimodal Action Quality Assessment
by: Zeng, Ling-An, et al.
Published: (2024)