Zero False Positives: The Human-Validation Layer in Kavach
Every AppSec lead we’ve met has the same drawer in their head. It’s labelled “tickets the scanner found that weren’t real.” Nessus flagging a library version that’s actually patched via a distro backport. Burp marking a reflected parameter as XSS when the output is JSON-encoded three layers deep. A Nuclei template firing on a decoy endpoint. The drawer fills up faster than the team can empty it, and eventually the team just stops opening tickets from that source.
The industry’s answer for the last three years has been “more AI.” Train a bigger model, rank the findings, auto-close the noise. It helps. It doesn’t solve the problem, because the problem isn’t ranking — it’s judgment. Is this thing actually exploitable? What’s the business impact? How should we word it so the developer fixes the root cause and not the symptom? Those are decisions, not classifications. We built Kavach with the assumption that judgment belongs to a human, and we don’t ship a finding to a customer until one has signed it.
What “AI-driven” without review looks like in production
A scanner can find a thousand candidate issues an hour. A model can prune that to a hundred. Without a human gate, those hundred land in the customer’s Jira. Maybe fifteen are real. The AppSec lead triages, closes eighty-five as noise, and now trusts the feed less. The next batch arrives. Trust erodes. Six months in, the platform is muted or the contract isn’t renewed.
This is the failure we keep seeing with “fully autonomous” pentest platforms. The demo looks incredible — thousands of findings, pretty graphs. The production reality is that the customer’s team becomes the validator, uncompensated, under pressure, and they quit the tool.
What a Kavach validator actually does
When Sentinel — our agentic pentest engine — produces a candidate finding, it goes to the validator queue. A validator is a certified offensive engineer on our team. Minimum bar is OSCP; for complex chains we require OSEP. Here’s what happens next, in order:
- Reproduce on a separate instance. The validator replays the chain against the target in a clean session. No cached state, no leftover cookies. If it doesn’t reproduce, the candidate is dropped or sent back for more work. That step alone catches a surprising amount of flaky-scanner noise.
- Confirm business impact. Technical exploitability is not the same as risk. An IDOR on a staging endpoint with no real data is a finding; the same IDOR on a production billing endpoint is a Sev-1. The validator makes that call with the customer’s asset context.
- Write the customer-facing finding. Not a templated output. A real paragraph describing what’s broken, why it matters to this customer’s business, and what the developer should do. Repro steps that a dev can actually follow. Proof — screenshots, request/response pairs — that survives a skeptical reviewer.
- Sign it. The validator’s name goes on the finding. That signature is the accountability. If a finding is wrong, the customer knows who to call, and so do we.
Only after the signature does the finding flow into the customer’s Jira or GitHub project through the integration. The customer’s AppSec lead opens a ticket that’s already triaged, already contextualized, already actionable.
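The gate described above can be sketched in a few lines. This is an illustrative model, not Kavach's actual schema or integration code: the field names, the `Finding` class, and the `deliver_to_ticket_queue` helper are all hypothetical. The point it demonstrates is structural — an unsigned finding physically cannot become a customer-facing ticket.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Finding:
    title: str
    severity: str                        # first-draft severity from the engine
    repro_steps: list[str] = field(default_factory=list)
    signed_by: Optional[str] = None      # validator's name; None until signed

    def sign(self, validator: str) -> None:
        # The accountability step: a named engineer puts their name on it.
        self.signed_by = validator

def deliver_to_ticket_queue(finding: Finding) -> dict:
    """Refuse to build a customer-facing ticket for an unsigned finding."""
    if finding.signed_by is None:
        raise ValueError("unsigned finding: blocked at the human gate")
    return {
        "summary": finding.title,
        "severity": finding.severity,
        "steps": finding.repro_steps,
        "validated_by": finding.signed_by,   # the signature travels with the ticket
    }
```

The design choice worth noting: the signature is a required field of the delivery path, not a tag added afterwards, so "validated" can never be retrofitted onto a finding that skipped the queue.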
The split: AI handles breadth, humans handle judgment
We’re not anti-AI. Half of Kavach’s speed comes from AI doing what humans are bad at. The split we’ve settled on:
AI handles breadth
- Continuous asset discovery across the customer’s surface, 24/7.
- Initial probing — reachability, fingerprinting, low-risk enumeration.
- Candidate exploit-chain construction. The recon agent finds an exposed subdomain, the exploit agent hypothesizes a chain, the validate agent runs a first-pass check.
- Deduplication — if the same issue was already reported last month, don’t re-raise it.
- Severity scoring as a first draft, not a final answer.
Humans handle judgment
- Is this actually exploitable against this customer’s deployment?
- What’s the real business risk, given what we know about their assets?
- How do we word the finding so it gets fixed instead of deflected?
- Does this finding chain with something else we’ve seen this quarter?
- Is there a quieter way to verify that doesn’t disrupt production?
Those five questions are the difference between a tool that sends noise and a partner that sends findings.
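Of the breadth tasks listed above, deduplication is the easiest to make concrete. A minimal sketch, assuming a fingerprint built from the asset, the vulnerability class, and the location — the choice of key fields here is an assumption for illustration, not Kavach's actual dedup logic:

```python
import hashlib

def finding_fingerprint(asset: str, vuln_class: str, location: str) -> str:
    # Normalize the fields that make two candidates "the same issue",
    # so cosmetic differences (case, rescans) don't defeat the dedup.
    canonical = "|".join([asset.lower(), vuln_class.lower(), location.lower()])
    return hashlib.sha256(canonical.encode()).hexdigest()

def is_duplicate(candidate: tuple[str, str, str], seen: set[str]) -> bool:
    """Return True if this candidate was already reported; record it otherwise."""
    fp = finding_fingerprint(*candidate)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```

A rescan that rediscovers last month's issue produces the same fingerprint and is suppressed before it ever reaches the validator queue, let alone the customer.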
The SLA loop
A few AppSec leads have asked us, “Does the human step slow things down?” Fair question. Here’s the loop we run:
Candidate finding from Sentinel enters the validator queue. A validator picks it up, reproduces, confirms, writes, signs. For a straightforward finding — a missing security header, a classic IDOR, a known-version vuln with a clear PoC — this is typically same-day. For a complex chain involving multiple subsystems, it may be two or three working days because the validator is walking through business context with the customer.
Signed finding flows into the customer’s Jira or GitHub via the integration. The customer’s team sees a ticket with severity, impact, repro, and fix guidance. They don’t triage. They prioritize and assign.
The customer’s reviewer can reopen the finding if they disagree — we expect that, and the validator who signed it owns the response. No “support ticket disappears into a pool.” The same engineer who validated it handles the pushback.
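The loop above is a small state machine: candidate, validation, signature, delivery, and a dispute path that routes back to the same engineer. The state names below are illustrative, not Kavach's internal terminology — a sketch of the shape of the loop, including the rule that a reopened finding goes back through validation rather than into a generic support pool:

```python
# Legal transitions in the finding lifecycle (hypothetical state names).
ALLOWED = {
    "candidate":  {"validating"},
    "validating": {"signed", "dropped"},   # fails to reproduce -> dropped
    "signed":     {"delivered"},           # only signed findings reach the customer
    "delivered":  {"reopened", "closed"},  # customer can dispute
    "reopened":   {"validating"},          # same engineer owns the pushback
}

def transition(state: str, new_state: str) -> str:
    """Move a finding to a new state, rejecting any shortcut through the loop."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```

Note what the table forbids: there is no edge from `candidate` or `validating` straight to `delivered`, which is exactly the shortcut a fully autonomous pipeline takes.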
A gentle push-back on pure-autonomous vendors
We’ve watched a few platforms pitch “fully autonomous, AI-only pentesting” to Indian BFSI prospects. The demo is always impressive. The production feedback, when we get it, is the same: too much noise, too little context, the customer’s team becomes the de facto validator.
If a vendor tells you they don’t need human review because their model is good enough, ask three things:
- Who signs the finding? If the answer is “the platform,” you’re the reviewer.
- What’s the false-positive rate in production, not in the benchmark? Benchmarks are curated; production is messy.
- When a finding is disputed, who does the customer talk to? If the answer is a support queue, the accountability gap will show up the first time something is wrong.
The right answer is boring and obvious: a named engineer signs, the production noise rate is one you’d accept, and a specific person is reachable when there’s a dispute. Kavach runs that way because we’ve never seen the alternative work for the customers we talk to.
Why the validators are in India
We staff the validator role in Pune with certified offensive engineers. OSCP is the minimum; OSEP is required for anyone working on AD, binary, or complex chain work. A few reasons this matters:
- Time-zone alignment with Indian and APAC customers. A finding raised at 11 AM IST gets a validator’s eyes within hours, not the next business day in another hemisphere.
- Context. A validator familiar with UPI flows, RBI cyber guidance, and the quirks of Indian cloud deployments will make better judgment calls than someone parachuting in from a different market.
- Data residency. For BFSI and government customers who need work done inside Indian borders, the validator team is physically in India, working on infrastructure that meets the residency requirements.
- Career ladder. We want the best offensive engineers in the country building findings for our customers, not hunting in isolation. A validator role at ZynoSec comes with active research time, bounty participation, and path to red-team lead.
What “zero false positives” actually means
We use the phrase carefully. It means that every finding a customer receives has been reproduced and signed by a named engineer. It does not mean that our automated pre-validation catches everything perfectly — it means we put a human gate between the model and the customer’s ticket queue, and that gate is accountable.
The quiet consequence: customers who move to Kavach tend to stop muting security tickets. That’s the metric we actually watch. If an AppSec lead starts trusting their inbound security feed again, the platform is working.
If you’re evaluating automated pentest vendors and your current stack has a false-positive problem, the question to bring to every demo is “show me the signature on a real finding.” The answer tells you most of what you need to know.