Bibliographic Details
Main Authors:	Li, Wanhua; Meng, Zibin; Zhou, Jiawei; Wei, Donglai; Gan, Chuang; Pfister, Hanspeter
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.21411

Similar Items

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
by: He, Jixuan, et al.
Published: (2024)

LangSplat: 3D Language Gaussian Splatting
by: Qin, Minghan, et al.
Published: (2023)

CTRL-GS: Cascaded Temporal Residue Learning for 4D Gaussian Splatting
by: Hou, Karly, et al.
Published: (2025)

Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
by: Li, Wanhua, et al.
Published: (2025)

S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation
by: Xie, Kangxian, et al.
Published: (2023)

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
by: Li, Wanhua, et al.
Published: (2025)

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
by: Liu, Yilong, et al.
Published: (2026)

RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video
by: Wu, Chenyu, et al.
Published: (2026)

Joint-Task Regularization for Partially Labeled Multi-Task Learning
by: Nishi, Kento, et al.
Published: (2024)

TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images
by: Wan, Jia, et al.
Published: (2024)

RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
by: Liu, Yifan, et al.
Published: (2025)

MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
by: Shi, Zhiyi, et al.
Published: (2024)

Learning Gaze-aware Compositional GAN
by: Aranjuelo, Nerea, et al.
Published: (2024)

Frenet-Serret Frame-based Decomposition for Part Segmentation of 3D Curvilinear Structures
by: Gu, Leslie, et al.
Published: (2024)

Generalization of CNNs on Relational Reasoning with Bar Charts
by: Cui, Zhenxing, et al.
Published: (2025)

Understanding Graphical Perception in Data Visualization through Zero-shot Prompting of Vision-Language Models
by: Guo, Grace, et al.
Published: (2024)

Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
by: Yang, Fuming, et al.
Published: (2025)

AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)

Ella: Embodied Social Agents with Lifelong Memory
by: Zhang, Hongxin, et al.
Published: (2025)

GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure
by: Gu, Leslie, et al.
Published: (2025)

Multimodal Learning for Embryo Viability Prediction in Clinical IVF
by: Kim, Junsik, et al.
Published: (2024)

Improving generalization by mimicking the human visual diet
by: Madan, Spandan, et al.
Published: (2022)

In-distribution adversarial attacks on object recognition models using gradient-free search
by: Madan, Spandan, et al.
Published: (2021)

Skip and Skip: Segmenting Medical Images with Prompts
by: Chen, Jiawei, et al.
Published: (2024)

A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime
by: Jiang, Shuning, et al.
Published: (2025)

Sentinel: Embodied Cooperative Spatial Reasoning and Planning
by: Lin, Xiangye, et al.
Published: (2026)

Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning
by: Si, Chongjie, et al.
Published: (2024)

They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
by: Magid, Salma Abdel, et al.
Published: (2024)

When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations
by: Lalai, Harsh Nishant, et al.
Published: (2026)

DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
by: Shi, Zhiyi, et al.
Published: (2025)

3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
by: Liu, Dingning, et al.
Published: (2025)

Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models
by: Magid, Salma Abdel, et al.
Published: (2024)

Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
by: Kao, Shiu-hong, et al.
Published: (2025)

Medal S: Spatio-Textual Prompt Model for Medical Segmentation
by: Shi, Pengcheng, et al.
Published: (2025)

Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation
by: Wei, Zhikai, et al.
Published: (2024)

Bias at the End of the Score
by: Magid, Salma Abdel, et al.
Published: (2026)

EgoSocial: Benchmarking Proactive Intervention Ability of Omnimodal LLMs via Egocentric Social Interaction Perception
by: Wang, Xijun, et al.
Published: (2025)