| Main Authors: | Roy, Rajarshi, Das, Devleena, Banerjee, Ankesh, Bhattacharjee, Arjya, Dasgupta, Kousik, Tripathi, Subarna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.08679 |
Similar Items
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
by: Bu, Jiazi, et al.
Published: (2024)
Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search
by: Liu, Sainan, et al.
Published: (2026)
VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis
by: Dipta, Shubhashis Roy, et al.
Published: (2025)
PALADIN: Robust Neural Fingerprinting for Text-to-Image Diffusion Models
by: L, Murthy, et al.
Published: (2025)
VideoSAGE: Video Summarization with Graph Representation Learning
by: Chaves, Jose M. Rojas, et al.
Published: (2024)
Waver: Wave Your Way to Lifelike Video Generation
by: Zhang, Yifu, et al.
Published: (2025)
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
by: Valdez, Hector A., et al.
Published: (2024)
Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models
by: Wu, Tz-Ying, et al.
Published: (2025)
Harnessing Object Grounding for Time-Sensitive Video Understanding
by: Wu, Tz-Ying, et al.
Published: (2025)
Customize Your Own Paired Data via Few-shot Way
by: Chen, Jinshu, et al.
Published: (2024)
Keystep Recognition using Graph Neural Networks
by: Romero, Julia Lee, et al.
Published: (2025)
Contrastive Language Video Time Pre-training
by: Liu, Hengyue, et al.
Published: (2024)
Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition
by: Romero, Julia Lee, et al.
Published: (2025)
TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models
by: Cheng, Jiajun, et al.
Published: (2026)
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
by: Wu, Tz-Ying, et al.
Published: (2024)
Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM
by: Kamboj, Payal, et al.
Published: (2025)
Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization
by: Su, Weijian, et al.
Published: (2026)
SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models
by: Roy, Arani, et al.
Published: (2025)
Bi-modal textual prompt learning for vision-language models in remote sensing
by: Kashyap, Pankhi, et al.
Published: (2026)
AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way
by: Huang, Sining, et al.
Published: (2024)
NOVO: Unlearning-Compliant Vision Transformers
by: Roy, Soumya, et al.
Published: (2025)
Seeing a Rose in Five Thousand Ways
by: Zhang, Yunzhi, et al.
Published: (2022)
Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)
Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation
by: Liu, Xueyu, et al.
Published: (2024)
KAN or MLP? Point Cloud Shows the Way Forward
by: Shi, Yan, et al.
Published: (2025)
HDR Reconstruction Boosting with Training-Free and Exposure-Consistent Diffusion
by: Lin, Yo-Tin, et al.
Published: (2026)
How to Design and Train Your Implicit Neural Representation for Video Compression
by: Gwilliam, Matthew, et al.
Published: (2025)
EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs
by: Rodin, Ivan, et al.
Published: (2025)
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
by: Gunduboina, Hariseetharam, et al.
Published: (2025)
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
by: Hajimiri, Sina, et al.
Published: (2024)
Are We on the Right Way for Evaluating Large Vision-Language Models?
by: Chen, Lin, et al.
Published: (2024)
Just Project! Multi-Channel Despeckling, the Easy Way
by: Denis, Loïc, et al.
Published: (2024)
Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
by: Bhattacharya, Abhigyan, et al.
Published: (2025)
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
by: Aydın, M. Arda, et al.
Published: (2024)
The Way Up: A Dataset for Hold Usage Detection in Sport Climbing
by: Maschek, Anna, et al.
Published: (2025)
BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?
by: Li, Zhenyu, et al.
Published: (2025)
Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way
by: Luo, Zhanpeng, et al.
Published: (2025)
DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation
by: Guo, Zilu, et al.
Published: (2024)
Computational Pathology: A Survey Review and The Way Forward
by: Hosseini, Mahdi S., et al.
Published: (2023)
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
by: Das, Deepayan, et al.
Published: (2025)