| Main Authors: | Li, Wanhua, Meng, Zibin, Zhou, Jiawei, Wei, Donglai, Gan, Chuang, Pfister, Hanspeter |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.21411 |
Similar Items
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
by: He, Jixuan, et al.
Published: (2024)
LangSplat: 3D Language Gaussian Splatting
by: Qin, Minghan, et al.
Published: (2023)
CTRL-GS: Cascaded Temporal Residue Learning for 4D Gaussian Splatting
by: Hou, Karly, et al.
Published: (2025)
Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
by: Li, Wanhua, et al.
Published: (2025)
S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation
by: Xie, Kangxian, et al.
Published: (2023)
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
by: Li, Wanhua, et al.
Published: (2025)
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
by: Liu, Yilong, et al.
Published: (2026)
RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video
by: Wu, Chenyu, et al.
Published: (2026)
Joint-Task Regularization for Partially Labeled Multi-Task Learning
by: Nishi, Kento, et al.
Published: (2024)
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images
by: Wan, Jia, et al.
Published: (2024)
RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
by: Liu, Yifan, et al.
Published: (2025)
MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
by: Shi, Zhiyi, et al.
Published: (2024)
Learning Gaze-aware Compositional GAN
by: Aranjuelo, Nerea, et al.
Published: (2024)
Frenet-Serret Frame-based Decomposition for Part Segmentation of 3D Curvilinear Structures
by: Gu, Leslie, et al.
Published: (2024)
Generalization of CNNs on Relational Reasoning with Bar Charts
by: Cui, Zhenxing, et al.
Published: (2025)
Understanding Graphical Perception in Data Visualization through Zero-shot Prompting of Vision-Language Models
by: Guo, Grace, et al.
Published: (2024)
Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
by: Yang, Fuming, et al.
Published: (2025)
AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance
by: Xu, Tianling, et al.
Published: (2025)
Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
by: Liu, Yifan, et al.
Published: (2025)
Ella: Embodied Social Agents with Lifelong Memory
by: Zhang, Hongxin, et al.
Published: (2025)
GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure
by: Gu, Leslie, et al.
Published: (2025)
Multimodal Learning for Embryo Viability Prediction in Clinical IVF
by: Kim, Junsik, et al.
Published: (2024)
Improving generalization by mimicking the human visual diet
by: Madan, Spandan, et al.
Published: (2022)
In-distribution adversarial attacks on object recognition models using gradient-free search
by: Madan, Spandan, et al.
Published: (2021)
Skip and Skip: Segmenting Medical Images with Prompts
by: Chen, Jiawei, et al.
Published: (2024)
A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime
by: Jiang, Shuning, et al.
Published: (2025)
Sentinel: Embodied Cooperative Spatial Reasoning and Planning
by: Lin, Xiangye, et al.
Published: (2026)
Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning
by: Si, Chongjie, et al.
Published: (2024)
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
by: Magid, Salma Abdel, et al.
Published: (2024)
When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations
by: Lalai, Harsh Nishant, et al.
Published: (2026)
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
by: Shi, Zhiyi, et al.
Published: (2025)
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
by: Liu, Dingning, et al.
Published: (2025)
Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models
by: Magid, Salma Abdel, et al.
Published: (2024)
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
by: Kao, Shiu-hong, et al.
Published: (2025)
Medal S: Spatio-Textual Prompt Model for Medical Segmentation
by: Shi, Pengcheng, et al.
Published: (2025)
Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation
by: Wei, Zhikai, et al.
Published: (2024)
Bias at the End of the Score
by: Magid, Salma Abdel, et al.
Published: (2026)
EgoSocial: Benchmarking Proactive Intervention Ability of Omnimodal LLMs via Egocentric Social Interaction Perception
by: Wang, Xijun, et al.
Published: (2025)