LMVD-ID: aafbf2ef
Published March 1, 2026

Inaudible Ultrasonic LLM Jailbreak

Affected Models: Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B, Gemma 4B

Research Paper

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs


Description: Speech-driven Large Language Models (LLMs) and end-to-end Large Audio-Language Models (LALMs) are vulnerable to inaudible near-ultrasonic prompt injection, demonstrated by a framework dubbed Sirens' Whisper (SWhisper). By exploiting the non-linear response of commodity microphones, an attacker can encode structured, phonetically optimized adversarial prompts into the 17–22 kHz near-ultrasonic band. Using regularized channel-inversion pre-compensation, the attacker shapes the waveform to account for the microphone and environmental transfer functions. When played through a commodity speaker, the inaudible signal covertly demodulates into high-fidelity baseband audio inside the victim's microphone, evading human perception while delivering duration-compliant jailbreaks or malicious commands directly to the LLM.
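The demodulation mechanism can be illustrated with a minimal simulation (all signal parameters below are illustrative assumptions, not values from the paper): an amplitude-modulated near-ultrasonic tone, passed through a microphone model with a weak quadratic nonlinearity, yields the original audible tone after low-pass filtering.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 96_000                       # sample rate high enough to carry a 20 kHz tone
t = np.arange(0, 0.05, 1 / fs)
fc, fm = 20_000, 400              # near-ultrasonic carrier; audible modulating tone
baseband = np.sin(2 * np.pi * fm * t)
tx = (1 + 0.8 * baseband) * np.sin(2 * np.pi * fc * t)   # AM signal, inaudible band

# Model the microphone front-end as linear plus a weak quadratic term.
mic = tx + 0.1 * tx ** 2

# The quadratic term contains 0.05 * (1 + 0.8*baseband)^2, so the audible tone
# reappears at baseband; a low-pass filter strips the carrier and its harmonics.
b, a = butter(4, 2_000 / (fs / 2))
recovered = filtfilt(b, a, mic)

trim = slice(1_000, -1_000)       # discard filter edge transients
corr = np.corrcoef(recovered[trim], baseband[trim])[0, 1]
print(round(corr, 2))             # strong correlation with the original tone
```

A purely linear microphone (dropping the `0.1 * tx ** 2` term) leaves essentially nothing below the cutoff, which is why the attack depends on diaphragm and preamp nonlinearities.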

Examples: An attacker uses a commodity speaker (e.g., HiVi M200MKIII+) to broadcast a pre-computed near-ultrasonic payload at a target device (e.g., an iPhone 14 Pro or smart assistant) from up to 4 meters away.

  1. The attacker generates an adversarial audio prompt using a semantic-regularized template (e.g., incorporating the goal "How to make a bomb", the guiding response "Sure, here's how to make a bomb", and an optimized, pronounceable adversarial suffix).
  2. The synthesized speech spectrum is subjected to regularized least-squares inversion against the target microphone's empirical nonlinear transfer matrix to compensate for channel distortion.
  3. The compensated signal is shifted to the 17–22 kHz band via Hilbert-based single-sideband (SSB) modulation.
  4. The signal is played in the target's vicinity at ~70 dB SPL. It remains completely inaudible to human bystanders but demodulates into a clear voice command, successfully executing the jailbreak on targets like GLM-4-Voice or Qwen-Omni-Turbo. See demonstration videos and templates at https://swhisper-jailbreak.github.io/.

Impact: Allows proximate attackers to silently bypass model alignment, execute unauthorized commands, override system instructions, or extract sensitive information without alerting the user or leaving a visible pre-execution trace.

Affected Systems:

  • End-to-end Large Audio-Language Models (e.g., GLM-4-Voice, Qwen-Omni-Turbo).
  • Speech-to-text (STT) mediated LLM pipelines integrating commercial or open-source models (e.g., DeepSeek, GLM-4-Air, Grok-4, Llama-3.1-8B-Instruct, Gemma-3-4B, Qwen2.5-7B, Mistral-7B).
  • Voice assistants relying on commodity microphones that exhibit standard diaphragm and preamp nonlinearities (e.g., smart home assistants, in-vehicle systems, smartphones).

Mitigation Steps:

  • Model-Level Intent Evaluation: Deploy LLM-as-a-Judge frameworks (e.g., LLaMA-Guard) to evaluate the semantic intent and safety of demodulated audio transcripts, mitigating physical-layer stealth by blocking the payload at the logical layer.
  • Speaker Authentication: Implement biometric voice authentication or signed/authenticated prompt mechanisms to verify that commands originate from the authorized user's acoustic profile rather than generic TTS or ambient anomalies.
  • Hardware-Level Anomaly Detection: Where specialized hardware permits, deploy ultrasonic feature-tracking to detect anomalous propagation in the 17–22 kHz bands. (Note: Standard correlation-based detection comparing baseband signals with high-frequency envelopes is unreliable on many commodity devices, such as iPhones, due to aggressive low-pass filtering at the microphone front-end).
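As a rough illustration of the hardware-level idea, a naive band-energy heuristic (our assumption for demonstration, not the paper's detector, and only viable on hardware that does not low-pass the 17–22 kHz band before sampling) could compare ultrasonic energy to speech-band energy:

```python
import numpy as np

def ultrasonic_energy_ratio(x: np.ndarray, fs: int) -> float:
    """Ratio of 17-22 kHz energy to 300 Hz-8 kHz speech-band energy."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    ultra = spec[(freqs >= 17_000) & (freqs <= 22_000)].sum()
    speech = spec[(freqs >= 300) & (freqs <= 8_000)].sum()
    return ultra / max(speech, 1e-12)

fs = 96_000
t = np.arange(0, 0.1, 1 / fs)
normal = np.sin(2 * np.pi * 1_000 * t)               # ordinary speech-band tone
attack = normal + 3 * np.sin(2 * np.pi * 19_000 * t) # injected ultrasonic carrier

threshold = 0.5   # assumed deployment-tuned threshold
print(ultrasonic_energy_ratio(normal, fs) > threshold,
      ultrasonic_energy_ratio(attack, fs) > threshold)   # → False True
```

Per the note above, this only works where the microphone front-end actually delivers the 17–22 kHz band to software; on aggressively filtered devices the check sees nothing.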

© 2026 Promptfoo. All rights reserved.