LMVD-ID: 30473367
Published June 1, 2024

Code-Switching LLM Jailbreak

Research Paper

Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding


Description: Large Language Models (LLMs) are markedly more vulnerable to adversarial prompts that employ code-switching, i.e., interweaving multiple languages within a single query. The vulnerability stems from an unintended correlation between a language's resource availability and the strength of the LLM's safety alignment in that language: models trained on imbalanced multilingual data are more susceptible to attacks that route parts of the prompt through low-resource languages, yielding a higher rate of unsafe or undesirable responses than equivalent monolingual prompts. Intra-sentence code-switching (switching languages within a single sentence) is particularly effective.

Examples: See the paper for specific code-switching prompts that elicit undesirable behaviors from various LLMs. The examples mix English, Chinese, Italian, Vietnamese, Arabic, Korean, Thai, Bengali, Swahili, and Javanese, and demonstrate that the degree of vulnerability varies with each language's resource availability.

Impact: Successful exploitation of this vulnerability can lead to LLMs generating harmful, biased, or otherwise undesirable outputs, including hate speech, unsafe instructions, and the disclosure of private information. The severity of the impact depends on the specific LLM and the nature of the elicited response.

Affected Systems: Multiple state-of-the-art LLMs are affected, including (but not limited to) GPT-3.5-turbo, GPT-4, Claude-3, Llama-3, Mistral, and Qwen-1.5.

Mitigation Steps:

  • Improve the robustness of LLMs to code-switching attacks through enhanced safety training data that includes diverse code-switched examples and addresses resource imbalances.
  • Develop and implement more sophisticated detection mechanisms to identify and mitigate code-switching attempts in user inputs.
  • Implement input sanitization or filtering techniques to reduce the effectiveness of code-switching attacks.
  • Prefer LLMs with higher parameter counts, which show improved resistance to code-switching attacks; scale alone, however, is not a complete mitigation.

© 2025 Promptfoo. All rights reserved.