:: Library Catalog

Bibliographic Details
Main Authors:	Pattnayak, Priyaranjan; Patel, Hitesh Laxmichand; Kumar, Bhargava; Agarwal, Amit; Banerjee, Ishan; Panda, Srikant; Kumar, Tejaswini
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2412.17759

Similar Items

Clinical QA 2.0: Multi-Task Learning for Answer Extraction and Categorization
by: Pattnayak, Priyaranjan, et al.
Published: (2025)

Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts
by: Agarwal, Amit, et al.
Published: (2024)

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation
by: Pattnayak, Priyaranjan, et al.
Published: (2025)

Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems
by: Meghwani, Hansa, et al.
Published: (2025)

MVTamperBench: Evaluating Robustness of Vision-Language Models
by: Agarwal, Amit, et al.
Published: (2024)

LLM for Barcodes: Generating Diverse Synthetic Data for Identity Documents
by: Patel, Hitesh Laxmichand, et al.
Published: (2024)

Tokenization Matters: Improving Zero-Shot NER for Indic Languages
by: Pattnayak, Priyaranjan, et al.
Published: (2025)

SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)

AccessEval: Benchmarking Disability Bias in Large Language Models
by: Panda, Srikant, et al.
Published: (2025)

LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
by: Pattnayak, Priyaranjan, et al.
Published: (2026)

PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)

World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models
by: Kim, Eunsu, et al.
Published: (2025)

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)

Who's Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs
by: Hari, Vishnu, et al.
Published: (2025)

FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
by: Agarwal, Amit, et al.
Published: (2025)

Do Image-Text Metrics Respect Semantic Invariances?
by: Agarwal, Amit, et al.
Published: (2026)

When Better Eyes Lead to Blindness: A Diagnostic Study of the Information Bottleneck in CNN-LSTM Image Captioning Models
by: Gupta, Hitesh Kumar
Published: (2025)

RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
by: Gupta, Amit Kumar, et al.
Published: (2025)

A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
by: Mehta, Naval Kishore, et al.
Published: (2025)

Automatic Recognition of Learning Resource Category in a Digital Library
by: Banerjee, Soumya, et al.
Published: (2023)

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)

ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation
by: Burapacheep, Jirayu, et al.
Published: (2024)

HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count
by: Wiederhold, Noah, et al.
Published: (2023)

3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes
by: Medi, Tejaswini, et al.
Published: (2024)

DAIQ: Auditing Demographic Attribute Inference from Question in LLMs
by: Panda, Srikant, et al.
Published: (2025)

Skin Cancer Classification: Hybrid CNN-Transformer Models with KAN-Based Fusion
by: Agarwal, Shubhi, et al.
Published: (2025)

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation
by: Pattnayak, Priyaranjan
Published: (2026)

The Unseen Adversaries: Robust and Generalized Defense Against Adversarial Patches
by: Kumar, Vishesh, et al.
Published: (2026)

A Review on Large Language Models for Visual Analytics
by: Agarwal, Navya Sonal, et al.
Published: (2025)

A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects
by: Wang, Fei, et al.
Published: (2025)

Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models
by: He, Hulingxiao, et al.
Published: (2026)

Mobile-friendly Image de-noising: Hardware Conscious Optimization for Edge Application
by: Miriyala, Srinivas, et al.
Published: (2026)

Hallucination of Multimodal Large Language Models: A Survey
by: Bai, Zechen, et al.
Published: (2024)

Survey of Adversarial Robustness in Multimodal Large Language Models
by: Jiang, Chengze, et al.
Published: (2025)

MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories
by: Zhang, Yuyi, et al.
Published: (2025)

HIDISC: A Hyperbolic Framework for Domain Generalization with Generalized Category Discovery
by: Rathore, Vaibhav, et al.
Published: (2025)

Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and Challenges
by: Yang, Liling, et al.
Published: (2025)

FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis
by: Gavas, Ekta, et al.
Published: (2025)

Real-Time Cooked Food Image Synthesis and Visual Cooking Progress Monitoring on Edge Devices
by: Gupta, Jigyasa, et al.
Published: (2025)

FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training
by: Medi, Tejaswini, et al.
Published: (2024)