| Main Authors: | Srivastava, Vedika; Singh, Hemant Kumar; Singh, Jaisal |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.21194 |
Similar Items
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
by: Berton, Gabriele, et al.
Published: (2024)
GeoRC: A Benchmark for Geolocation Reasoning Chains
by: Talreja, Mohit, et al.
Published: (2026)
CrossMed: A Multimodal Cross-Task Benchmark for Compositional Generalization in Medical Imaging
by: Singh, Pooja, et al.
Published: (2025)
EarthMatch: Iterative Coregistration for Fine-grained Localization of Astronaut Photography
by: Berton, Gabriele, et al.
Published: (2024)
Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing
by: Uppur, Suchetan G., et al.
Published: (2025)
GeoShield: Safeguarding Geolocation Privacy from Vision-Language Models via Adversarial Perturbations
by: Liu, Xinwei, et al.
Published: (2025)
Real-Time Feedback and Benchmark Dataset for Isometric Pose Evaluation
by: Jaiswal, Abhishek, et al.
Published: (2025)
Simple Unsupervised Knowledge Distillation With Space Similarity
by: Singh, Aditya, et al.
Published: (2024)
SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
by: Verma, Tushar, et al.
Published: (2024)
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
by: Rismanchian, Sina, et al.
Published: (2024)
GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
by: Ghasemi, Narges, et al.
Published: (2025)
Skill-Conditioned Visual Geolocation for Vision-Language Models
by: Yang, Chenjie, et al.
Published: (2026)
The Role of Generative Systems in Historical Photography Management: A Case Study on Catalan Archives
by: Sánchez, Èric, et al.
Published: (2024)
Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis
by: Singh, Amit Kumar, et al.
Published: (2024)
FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion
by: Singh, Abhishek Kumar, et al.
Published: (2024)
MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space
by: Singh, Anshul, et al.
Published: (2025)
Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms
by: Chen, Boyi, et al.
Published: (2025)
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
by: Gadi, Hari Krishna, et al.
Published: (2026)
Zero-shot Vision-Language Reranking for Cross-View Geolocalization
by: Erzurumlu, Yunus Talha, et al.
Published: (2026)
A Multimodal, Multitask System for Generating E Commerce Text Listings from Images
by: Singh, Nayan Kumar
Published: (2025)
Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)
A Hybrid Machine Learning Model for Cerebral Palsy Detection
by: Singh, Karan Kumar, et al.
Published: (2026)
HYPERPOSE: Hyperbolic Kinematic Phase-Space Attention for 3D Human Pose Estimation
by: Thekkath, Vinduja, et al.
Published: (2026)
LG-Traj: LLM Guided Pedestrian Trajectory Prediction
by: Chib, Pranav Singh, et al.
Published: (2024)
PhotoBot: Reference-Guided Interactive Photography via Natural Language
by: Limoyo, Oliver, et al.
Published: (2024)
Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning
by: Tadesse, Girmaw Abebe, et al.
Published: (2024)
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
by: Astruc, Guillaume, et al.
Published: (2024)
Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach
by: Li, Hao, et al.
Published: (2025)
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
by: Modi, Rajat, et al.
Published: (2024)
LocationAgent: A Hierarchical Agent for Image Geolocation via Decoupling Strategy and Evidence from Parametric Knowledge
by: Li, Qiujun, et al.
Published: (2026)
Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic
by: Srivastava, Sharvani, et al.
Published: (2024)
Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes
by: Jiang, Ruixiang, et al.
Published: (2026)
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
by: Ji, Yuxiang, et al.
Published: (2026)
Street-Level Geolocalization Using Multimodal Large Language Models and Retrieval-Augmented Generation
by: Bicakci, Yunus Serhat, et al.
Published: (2025)
AT-2FF: Adaptive Type-2 Fuzzy Filter for De-noising Images Corrupted with Salt-and-Pepper
by: Singh, Vikas
Published: (2023)
Designing a Robust Radiology Report Generation System
by: Singh, Sonit
Published: (2024)
Augmenting End-to-End Steering Angle Prediction with CAN Bus Data
by: Singh, Amit
Published: (2023)
PhotoFlow: Agentic 3D Virtual Photography Missions
by: Guo, Jiarui, et al.
Published: (2026)
Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants
by: Mioduski, Ryan
Published: (2025)
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
by: Jia, Pengyue, et al.
Published: (2024)