:: Library Catalog

Bibliographic Details
Main Authors:	Roy, Rajarshi, Das, Devleena, Banerjee, Ankesh, Bhattacharjee, Arjya, Dasgupta, Kousik, Tripathi, Subarna
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.08679

Similar Items

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
by: Bu, Jiazi, et al.
Published: (2024)

Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search
by: Liu, Sainan, et al.
Published: (2026)

VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis
by: Dipta, Shubhashis Roy, et al.
Published: (2025)

PALADIN: Robust Neural Fingerprinting for Text-to-Image Diffusion Models
by: L, Murthy, et al.
Published: (2025)

VideoSAGE: Video Summarization with Graph Representation Learning
by: Chaves, Jose M. Rojas, et al.
Published: (2024)

Waver: Wave Your Way to Lifelike Video Generation
by: Zhang, Yifu, et al.
Published: (2025)

SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
by: Valdez, Hector A., et al.
Published: (2024)

Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models
by: Wu, Tz-Ying, et al.
Published: (2025)

Harnessing Object Grounding for Time-Sensitive Video Understanding
by: Wu, Tz-Ying, et al.
Published: (2025)

Customize Your Own Paired Data via Few-shot Way
by: Chen, Jinshu, et al.
Published: (2024)

Keystep Recognition using Graph Neural Networks
by: Romero, Julia Lee, et al.
Published: (2025)

Contrastive Language Video Time Pre-training
by: Liu, Hengyue, et al.
Published: (2024)

Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition
by: Romero, Julia Lee, et al.
Published: (2025)

TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models
by: Cheng, Jiajun, et al.
Published: (2026)

Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
by: Wu, Tz-Ying, et al.
Published: (2024)

Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM
by: Kamboj, Payal, et al.
Published: (2025)

Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization
by: Su, Weijian, et al.
Published: (2026)

SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models
by: Roy, Arani, et al.
Published: (2025)

Bi-modal Textual Prompt Learning for Vision-Language Models in Remote Sensing
by: Kashyap, Pankhi, et al.
Published: (2026)

AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way
by: Huang, Sining, et al.
Published: (2024)

NOVO: Unlearning-Compliant Vision Transformers
by: Roy, Soumya, et al.
Published: (2025)

Seeing a Rose in Five Thousand Ways
by: Zhang, Yunzhi, et al.
Published: (2022)

Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)

Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation
by: Liu, Xueyu, et al.
Published: (2024)

KAN or MLP? Point Cloud Shows the Way Forward
by: Shi, Yan, et al.
Published: (2025)

HDR Reconstruction Boosting with Training-Free and Exposure-Consistent Diffusion
by: Lin, Yo-Tin, et al.
Published: (2026)

How to Design and Train Your Implicit Neural Representation for Video Compression
by: Gwilliam, Matthew, et al.
Published: (2025)

EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs
by: Rodin, Ivan, et al.
Published: (2025)

FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
by: Gunduboina, Hariseetharam, et al.
Published: (2025)

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
by: Hajimiri, Sina, et al.
Published: (2024)

Are We on the Right Way for Evaluating Large Vision-Language Models?
by: Chen, Lin, et al.
Published: (2024)

Just Project! Multi-Channel Despeckling, the Easy Way
by: Denis, Loïc, et al.
Published: (2024)

Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
by: Bhattacharya, Abhigyan, et al.
Published: (2025)

ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
by: Aydın, M. Arda, et al.
Published: (2024)

The Way Up: A Dataset for Hold Usage Detection in Sport Climbing
by: Maschek, Anna, et al.
Published: (2025)

BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?
by: Li, Zhenyu, et al.
Published: (2025)

Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way
by: Luo, Zhanpeng, et al.
Published: (2025)

DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation
by: Guo, Zilu, et al.
Published: (2024)

Computational Pathology: A Survey Review and The Way Forward
by: Hosseini, Mahdi S., et al.
Published: (2023)

Training-Free Personalization via Retrieval and Reasoning on Fingerprints
by: Das, Deepayan, et al.
Published: (2025)