LMVD-ID: ec3d989d
Published January 1, 2026

Evaluator Consensus Manipulation

Research Paper

Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks


Description: A vulnerability exists in the "Adaptive Trust Weighting" mechanism of the Cost-Aware Proof of Quality (PoQ) protocol for decentralized LLM inference. The protocol updates evaluator trust weights based on the deviation of a submitted score from the consensus score of the current round. Because the consensus score is derived from the very scores being evaluated (a self-referential feedback loop), the mechanism fails to distinguish between honest and coordinated malicious evaluators. Additionally, the multiplicative update rule causes trust weights to drift upward and saturate at the maximum bound ($w_{max}$) for any evaluator whose average deviation is less than 0.5. When weights saturate, the "Adaptive Weighted Mean" degrades into a "Simple Mean," bypassing the intended reputation defense and rendering the network susceptible to score manipulation, payout inflation, and sabotage.

Examples: The vulnerability stems from the trust update logic defined in Equation 21 combined with the consensus dependency.

  1. Trust Saturation: The protocol updates weights using the formula: $$w_{e} \leftarrow \text{clip}\Bigl(w_{e} \cdot \bigl(1+\lambda(0.5 - d_{t,e})\bigr),\, w_{\min},\, w_{\max}\Bigr)$$ where $d_{t,e}$ is the normalized deviation $|s_{t,e} - c_t|/10$. If an evaluator $e$ maintains a deviation $d_{t,e} < 0.5$ (which implies a raw score difference of less than 5.0 on a 0-10 scale), the term $(0.5 - d_{t,e})$ is positive, and the weight $w_e$ increases. Over time, both honest nodes and cautious adversaries saturate at $w_{\max}$, effectively neutralizing the weighting mechanism.
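The saturation dynamic can be sketched in a few lines. This is an illustrative simulation, not the reference implementation; the values for $\lambda$, $w_{\min}$, and $w_{\max}$ (0.1, 0.1, 2.0) are assumptions chosen for demonstration, not taken from the paper.

```python
import random

def update_weight(w, d, lam=0.1, w_min=0.1, w_max=2.0):
    # Multiplicative update per Eq. 21: w grows whenever deviation d < 0.5.
    # lam, w_min, w_max are assumed illustrative values.
    return max(w_min, min(w_max, w * (1 + lam * (0.5 - d))))

# An honest evaluator scoring within +/-1.0 of consensus on a 0-10 scale
# has d = |s - c| / 10 <= 0.1, so (0.5 - d) is always strongly positive.
w = 1.0
for _ in range(200):
    d = random.uniform(0.0, 0.1)  # typical honest per-round deviation
    w = update_weight(w, d)

print(w)  # the weight has climbed to and clipped at w_max
```

Because any deviation below 0.5 yields a growth factor of at least $1 + 0.04\lambda/\lambda$ per round under these parameters, every ordinarily accurate evaluator reaches $w_{\max}$ within a few dozen rounds, after which the weighted mean is indistinguishable from a simple mean.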

  2. Boosting Attack Exploitation: An adversary controls a subset of evaluators and applies a "Boosting" strategy (Equation 18): $$s^{\prime}_{t,e} = \min\{10,\, s_{t,e}+b\}$$ If the adversary controls a sufficient fraction of the pool (e.g., $\rho=0.3$), the consensus score $c_t$ shifts upward toward the boosted values. Consequently, the adversary's manipulated scores $s^{\prime}_{t,e}$ remain close to the shifted consensus $c_t$, resulting in a low deviation $d_{t,e}$. The system incorrectly increases the trust weight of the malicious evaluators, reinforcing their ability to inflate payouts in future rounds.

Impact:

  • Incentive Manipulation: Malicious evaluators can artificially inflate rewards (payout inflation) for specific inference nodes or sabotage competitors by coordinating score deviations.
  • Defense Bypass: The adaptive trust system degrades to a non-robust simple average, failing to filter out "Boosting" or "Sabotage" attacks.
  • Financial Loss: The network may overpay for low-quality inference or underpay high-quality providers, breaking the cost-aware incentive structure.

Affected Systems:

  • Decentralized LLM inference networks implementing Cost-Aware Proof of Quality (PoQ) with internal deviation-based trust updates (specifically adhering to the logic in Eq. 21 of the referenced paper).

Mitigation Steps:

  • Replace Consensus Rule: Abandon the "Adaptive Weighted Mean" and "Simple Mean" in favor of robust statistics that do not rely on stateful weights, specifically Median or Trimmed Mean aggregation.
  • Decouple Trust Updates: Do not base trust updates solely on consensus deviation. Incorporate external "anchor tasks" (tasks with known ground truth/Gold Standards) to calibrate weights.
  • Prevent Weight Saturation: Recalibrate the update logic to prevent monotonic growth; ensure typical variances do not lead to automatic saturation at $w_{max}$.
  • Dynamic Sampling: Increase the evaluator sample size ($K$) dynamically when high variance or signs of manipulation are detected, despite the increased cost.
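As a minimal sketch of the first mitigation, the aggregation below replaces the stateful weighted mean with a median or trimmed mean; the helper name `trimmed_mean` and the trimming fraction are illustrative assumptions. Note that the trimming fraction must exceed the expected adversarial fraction $\rho$ for the defense to hold.

```python
import statistics

def trimmed_mean(scores, trim_frac):
    """Drop the lowest and highest trim_frac of scores, then average the rest."""
    s = sorted(scores)
    k = int(len(s) * trim_frac)
    trimmed = s[k: len(s) - k] if k > 0 else s
    return sum(trimmed) / len(trimmed)

# Same boosting scenario as above: 30% colluders pushing 7.0 -> 9.0.
scores = [7.0] * 7 + [9.0] * 3

print(statistics.median(scores))          # 7.0 -- unmoved by the colluders
print(trimmed_mean(scores, trim_frac=0.3))  # 7.0 -- boosted scores trimmed away
```

Unlike the adaptive weighted mean, both estimators are stateless, so there is no trust value for an adversary to farm across rounds; the breakdown point is determined solely by the trimming fraction (or 50% for the median).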

© 2026 Promptfoo. All rights reserved.