LMVD-ID: de9811e5
Published February 1, 2026

Inter-Agent Privacy Leakage

Affected Models: GPT-4, GPT-4o, Claude 3.5, Llama 3.1 8B, Llama 3.3 70B, Mistral Large, Qwen 2.5 7B

Research Paper

AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems

View Paper

Description: Multi-agent Large Language Model (LLM) architectures are vulnerable to internal-channel data leakage because inter-agent communication and shared memory lack access controls and data minimization. Frameworks such as LangChain, CrewAI, AutoGPT, and MetaGPT propagate complete task contexts, including unredacted sensitive data, between specialized agents during task delegation. Because traditional LLM security guardrails filter only the final user-facing output, attackers or even benign misconfigurations can expose sensitive data through unmonitored internal channels (inter-agent messages, shared memory), system logs, and external tool inputs. This bypasses output-centric audits and filters, leaving multi-agent systems highly susceptible to coordination attacks in which one agent is manipulated into delegating restricted data to another.
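The leakage path can be illustrated with a minimal sketch. The agent class, delegation method, and log structure below are hypothetical simplifications, not any framework's actual API; they show how a guardrail applied only to the final output leaves the inter-agent message and the operational log untouched.

```python
import re

# Output-centric guardrail: applied ONLY to the final user-facing reply.
OUTPUT_FILTER = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # naive US SSN pattern

def filter_output(text: str) -> str:
    return OUTPUT_FILTER.sub("[REDACTED]", text)

class Agent:
    """Hypothetical agent that delegates its full context to a peer."""

    def __init__(self, name: str, log: list):
        self.name = name
        self.log = log  # shared operational log (cf. Channel C6)

    def delegate(self, peer: "Agent", context: dict) -> dict:
        # The full task context is forwarded verbatim (cf. Channel C2)
        # and recorded in the log. No redaction happens on this path.
        message = {"from": self.name, "to": peer.name, "context": context}
        self.log.append(message)
        return message

log = []
scheduler = Agent("scheduler", log)
verifier = Agent("verifier", log)

# Sensitive field travels with the task context during delegation.
scheduler.delegate(verifier, {"task": "confirm appointment",
                              "ssn": "123-45-6789"})

# The user-facing output is sanitized; the internal channels are not.
final_output = filter_output("Appointment confirmed for patient SSN 123-45-6789")
```

After this runs, `final_output` contains `[REDACTED]` while the unredacted SSN still sits in `log[0]["context"]`, which is exactly the gap output-only filtering leaves open.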

Examples:

  • Healthcare Workflow Leakage: A user prompts a multi-agent system to schedule a follow-up appointment, providing a patient's name, preferences, and complete medical record. The scheduling agent generates a clean, privacy-compliant appointment confirmation as the final user output. However, during internal processing, the scheduling agent sends its delegation message to a verification agent. This inter-agent message (Channel C2) contains the patient's entire unredacted medical record and diagnosis history, which is subsequently exposed in the system's operational logs (Channel C6).
  • External Tool Exfiltration: In a CrewAI v0.83+ portfolio management deployment, an attacker injects a prompt requesting a compliance check on a client's account. The Senior Research Analyst agent passes the client's full financial profile and Personally Identifiable Information (PII) to an Investment Advisor agent. The Advisor agent then forwards the client's raw IBAN and tax bracket data as parameters into an external API calculator tool (Channel C3), leaking the PII to an unauthorized third-party endpoint while the final user-facing output remains sanitized.

Impact: Unauthorized disclosure of sensitive information (PII, PHI, financial records, corporate secrets). Data remains unprotected in system logs, shared memory states, and external API calls, resulting in severe compliance violations (e.g., HIPAA, GDPR, PCI-DSS). Output-only audits miss approximately 41.7% of these internal privacy violations, providing a false sense of security while expanding the system's total attack surface by 1.6× compared to single-agent deployments.
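The audit gap can be made concrete with a small sketch. The trace format and channel names below are assumptions for illustration, not the paper's schema; the point is that an output-only scan reports a clean system while a scan over all recorded channels flags the internal leak.

```python
import re

PII = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # naive US SSN pattern

# Hypothetical execution trace covering several channels.
trace = [
    {"channel": "user_output", "text": "Appointment confirmed."},
    {"channel": "inter_agent", "text": "Verify record for SSN 123-45-6789"},
    {"channel": "tool_input",  "text": "calc(amount=5000)"},
]

def output_only_audit(trace):
    """Checks only the final user-facing output, as traditional audits do."""
    return [e for e in trace
            if e["channel"] == "user_output" and PII.search(e["text"])]

def full_channel_audit(trace):
    """Checks every recorded channel: messages, tool inputs, and outputs."""
    return [e for e in trace if PII.search(e["text"])]
```

Here `output_only_audit(trace)` returns an empty list, while `full_channel_audit(trace)` flags the inter-agent message carrying the SSN.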

Affected Systems: Multi-agent LLM systems and frameworks that utilize inter-agent messaging and shared memory without internal sanitization, including but not limited to:

  • LangChain (v0.1.x)
  • CrewAI (v0.83.x)
  • AutoGPT (v0.5.x)
  • MetaGPT (v0.8.x)

Mitigation Steps:

  • Message-Level Sanitization: Implement internal interceptors that apply output-level redaction logic directly to inter-agent communication and shared memory writes.
  • Selective Disclosure Policies: Enforce explicit data-flow policies at the framework level, ensuring agents share only the minimum necessary fields required for a subtask rather than the full context window.
  • Memory Access Controls: Implement field-level permissions for shared memory reading and writing.
  • Full-Channel Auditing: Implement comprehensive trace logging and privacy impact assessments across all internal and external communication pathways (inter-agent messages, memory, tool inputs, and system logs), rather than relying solely on user-facing output audits.
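The first two mitigations above can be sketched as a single interceptor that every inter-agent message and shared-memory write passes through. The field allowlist, recipient name, and regex patterns below are illustrative assumptions, not a framework API; production systems would use a proper PII detection library and policy engine rather than ad hoc regexes.

```python
import re

# Naive PII patterns for the sketch (a real deployment needs far more).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # rough US SSN shape
    re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),  # rough IBAN shape
]

# Selective-disclosure policy: the minimum fields each recipient needs.
# Agent and field names are hypothetical.
ALLOWED_FIELDS = {
    "verification_agent": {"task", "appointment_time"},
}

def redact(text: str) -> str:
    """Message-level sanitization: apply redaction to internal payloads."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def sanitize_message(recipient: str, payload: dict) -> dict:
    """Interceptor for inter-agent messages and shared-memory writes:
    drop fields outside the recipient's allowlist, redact the rest."""
    allowed = ALLOWED_FIELDS.get(recipient, set())
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in payload.items()
        if key in allowed
    }

clean = sanitize_message("verification_agent", {
    "task": "verify appointment for SSN 123-45-6789",
    "appointment_time": "2026-02-03T10:00",
    "medical_record": "full unredacted diagnosis history ...",
})
```

The delegated message keeps only the allowlisted fields, with the SSN pattern redacted in `task` and the `medical_record` field dropped entirely before it reaches the peer agent or the log.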

© 2026 Promptfoo. All rights reserved.