Vision-language models - Discuss Papers

Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed "safety alignment degradation ...

HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Gra ...
Adapter-based tuning methods have shown significant potential in transferring knowledge from pre-trained Vision-Language Models to the downstream tasks. However, after reviewing existing adapters, we find they generally fail to ...

A Unified Debiasing Approach for Vision-Language Models across Modalities and Ta ...
Recent advancements in Vision-Language Models (VLMs) have enabled complex multimodal tasks by processing text and image data simultaneously, significantly enhancing the field of artificial intelligence. However, these models oft ...

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language ...
Endoscopic submucosal dissection (ESD) enables rapid resection of large lesions, minimizing recurrence rates and improving long-term overall survival. Despite these advantages, ESD is technically challenging and carries high risks of compl ...

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality ...
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical rol ...

CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient I ...
Incremental object detection (IOD) is challenged by background shift, where background categories in sequential data may include previously learned or future classes. Inspired by the vision-language foundation models such as CLI ...

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering. Recently, there has been a surge of interest in developing universal text embedding mode ...

AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversaria ...
Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial at ...

Generating CAD Code with Vision-Language Models for 3D Designs
Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Comp ...

PANav: Toward Privacy-Aware Robot Navigation via Vision-Language Models
Navigating robots discreetly in human work environments while considering the possible privacy implications of robotic tasks presents significant challenges. Such scenarios are increasingly common, for instance, when robots tran ...
