FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents
Haoxuan Jia, Yang Liu, Bin Chong, Yingguang Yang, Yancheng Chen, Jiayu Liang, Qian Li, Hanning Lu, Kefu Xu, Hao Zheng, Chongyang Zhang, Hao Peng, Philip S. Yu
Key claim
FinHarness reduces unauthorized actions while preserving approvals.
In plain English
FinHarness is an inline safety mechanism for finance LLM agents that reduces unauthorized actions while preserving legitimate approvals. It lowers the attack success rate (ASR) from 38.3% to 15.0% and makes fewer calls to the advanced judge, which keeps its overhead down. Because the checks run inline at each step, risk evidence reaches the agent while it is still deciding rather than after the fact.
FinHarness introduces a novel inline safety mechanism for finance LLM agents, significantly enhancing their operational safety.
The paper provides solid experimental results with clear metrics, demonstrating the effectiveness of FinHarness.
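The headline numbers can be read two ways, as an absolute or a relative reduction. A minimal sketch of that arithmetic follows, assuming both percentages are measured over the same set of attack cases; the variable names are illustrative and not from the paper.

```python
# Reported rates, in percent. Assumption: both are measured on the same attack set.
asr_before = 38.3
asr_after = 15.0

# Absolute reduction in percentage points.
absolute_drop = asr_before - asr_after

# Relative reduction as a share of the original rate.
relative_drop = absolute_drop / asr_before * 100

print(f"{absolute_drop:.1f} percentage points, {relative_drop:.1f}% relative")
# prints "23.3 percentage points, 60.8% relative"
```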
Deep reliability assessment
The methodology supports the claim that inline, step-level monitoring with accumulated risk can improve the ASR/benign-approval trade-off on the fixed FINVAULT benchmark and an 856-trace synthetic stress set. The paper overclaims general adaptive robustness, however: the authors themselves note proprietary judges, frozen rule heads, single-run evaluation, benchmark-bound results, and unknown robustness if attackers know the risk heads, thresholds, window length, or evidence-injection format.
Reproducibility
No open-source code or repository is mentioned. The evaluation uses FINVAULT and an 856-trace synthetic stress set, but the excerpts do not state that FINHARNESS code, prompts, rule heads, judge configurations, or the synthetic traces are released; proprietary judge models are also a reproducibility blocker.
Discussion questions
- 1. Does the core assumption that cumulative weak risk signals indicate malicious intent hold in real finance workflows, where legitimate cases may naturally accumulate anomalies across multi-step processes?
- 2. For builders, is re-injecting fired risk factors into the agent context safer than enforcing hard policy gates, or can attackers learn to manipulate the agent’s response to those injected safety signals?
- 3. What evaluation would falsify the result: adaptive attacks with knowledge of the rule heads and cascade thresholds, variance across different agent models/prompts/tool simulators, or deployment logs showing benign approval collapse?
Key figure
Figure 1 shows a five-step finance-agent attack where each individual tool-call risk score stays below a per-step rejection threshold, but FINHARNESS accumulates the weak signals over the trajectory and routes the case to stronger verification before final approval.
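The accumulation idea in the figure can be illustrated with a small sketch. This is not the FINHARNESS implementation: the thresholds, window length, risk scores, and the `TrajectoryMonitor` class are all hypothetical. It assumes a plain windowed sum of per-step scores, whereas the paper's actual aggregation may differ.

```python
from collections import deque
from dataclasses import dataclass, field

# All values below are hypothetical, chosen only to illustrate the idea.
PER_STEP_REJECT = 0.60  # a single step at or above this is rejected outright
ESCALATE_SUM = 1.50     # a windowed sum at or above this triggers stronger verification
WINDOW = 5              # number of most recent steps that count toward the sum


@dataclass
class TrajectoryMonitor:
    """Tracks recent per-step risk scores for one agent trajectory."""
    window: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def observe(self, step_risk: float) -> str:
        """Return 'reject', 'escalate', or 'allow' for the current tool call."""
        if step_risk >= PER_STEP_REJECT:
            return "reject"
        self.window.append(step_risk)
        # Weak signals add up: no single step is alarming, but the sum can be.
        if sum(self.window) >= ESCALATE_SUM:
            return "escalate"
        return "allow"


def run(scores):
    """Feed a sequence of per-step risk scores through a fresh monitor."""
    monitor = TrajectoryMonitor()
    return [monitor.observe(s) for s in scores]


# Five steps, each below the per-step rejection threshold.
five_step_attack = [0.30, 0.35, 0.30, 0.40, 0.35]
decisions = run(five_step_attack)
print(decisions)
# prints ['allow', 'allow', 'allow', 'allow', 'escalate']
```

A per-step check alone would pass all five calls here. The windowed sum crosses the escalation threshold only at the final step, which is the point where the figure shows the case being routed to stronger verification.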