| Main Authors: | Chakraborty, Souradeep, Wei, Zijun, Kelton, Conor, Ahn, Seoyoung, Balasubramanian, Aruna, Zelinsky, Gregory J., Samaras, Dimitris |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.02439 |
Similar Items
Look Hear: Gaze Prediction for Speech-directed Human Attention
by: Mondal, Sounak, et al.
Published: (2024)
Self-supervised co-salient object detection via feature correspondence at multiple scales
by: Chakraborty, Souradeep, et al.
Published: (2024)
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers
by: Yang, Zhibo, et al.
Published: (2023)
Generating metamers of human scene understanding
by: Raina, Ritik, et al.
Published: (2026)
Measuring and Predicting Where and When Pathologists Focus their Visual Attention while Grading Whole Slide Images of Cancer
by: Chakraborty, Souradeep, et al.
Published: (2025)
Few-shot Personalized Scanpath Prediction
by: Xue, Ruoyu, et al.
Published: (2025)
Personalized Image Descriptions from Attention Sequences
by: Xue, Ruoyu, et al.
Published: (2025)
Decoding the visual attention of pathologists to reveal their level of expertise
by: Chakraborty, Souradeep, et al.
Published: (2024)
Human-like Object Grouping in Self-supervised Vision Transformers
by: Adeli, Hossein, et al.
Published: (2026)
Talking Head Generation via AU-Guided Landmark Prediction
by: Chang, Shao-Yu, et al.
Published: (2025)
One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer
by: Wu, Haoyu, et al.
Published: (2025)
TopoDiffusionNet: A Topology-aware Diffusion Model
by: Gupta, Saumya, et al.
Published: (2024)
Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
by: Howlader, Prantik, et al.
Published: (2024)
Assessing Sample Quality via the Latent Space of Generative Models
by: Xu, Jingyi, et al.
Published: (2024)
Dual-Foundation Models for Unsupervised Domain Adaptation
by: Cheon, Yerin, et al.
Published: (2026)
MI-NeRF: Learning a Single Face NeRF from Multiple Identities
by: Chatziagapi, Aggelina, et al.
Published: (2024)
MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
by: Chatziagapi, Aggelina, et al.
Published: (2024)
Fast constrained sampling in pre-trained diffusion models
by: Graikos, Alexandros, et al.
Published: (2024)
Learning 3D Reconstruction with Priors in Test Time
by: Zhou, Lei, et al.
Published: (2026)
Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
by: Howlader, Prantik, et al.
Published: (2024)
Importance-Based Token Merging for Efficient Image and Video Generation
by: Wu, Haoyu, et al.
Published: (2024)
JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
by: Chakkera, Sai Tanmay Reddy, et al.
Published: (2024)
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
by: Chatziagapi, Aggelina, et al.
Published: (2025)
VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
by: Kang, Weitai, et al.
Published: (2025)
Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
by: Rivero, Alfredo, et al.
Published: (2024)
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
by: Le, Minh-Quan, et al.
Published: (2025)
PathSegDiff: Pathology Segmentation using Diffusion model representations
by: Danisetty, Sachin Kumar, et al.
Published: (2025)
LBMamba: Locally Bi-directional Mamba
by: Zhang, Jingwei, et al.
Published: (2025)
Embedding Physical Reasoning into Diffusion-Based Shadow Generation
by: Hu, Shilin, et al.
Published: (2025)
Cast and Attached Shadow Detection via Iterative Light and Geometry Reasoning
by: Hu, Shilin, et al.
Published: (2025)
Shadow Removal Refinement via Material-Consistent Shadow Edges
by: Hu, Shilin, et al.
Published: (2024)
GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation
by: Tomar, Snehal Singh, et al.
Published: (2025)
ClassifyViStA:WCE Classification with Visual understanding through Segmentation and Attention
by: Balasubramanian, S., et al.
Published: (2024)
Phrase-Instance Alignment for Generalized Referring Segmentation
by: Nguyen, E-Ro, et al.
Published: (2024)
Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields
by: Yang, Yixiong, et al.
Published: (2024)
MonoLoss: A Training Objective for Interpretable Monosemantic Representations
by: Nasiri-Sarvi, Ali, et al.
Published: (2026)
MLI-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields
by: Yang, Yixiong, et al.
Published: (2024)
Poppy: Polarization-based Plug-and-Play Guidance for Enhancing Monocular Normal Estimation
by: Kim, Irene, et al.
Published: (2026)
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
by: Miao, Qiaomu, et al.
Published: (2024)
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans
by: Chatziagapi, Aggelina, et al.
Published: (2024)