SecLens evaluates LLMs on real-world vulnerability detection through five stakeholder lenses. For the same model, Decision Scores diverge by up to 31 points across these lenses.
Qwen3-Coder earns an A for Head of Engineering but a D for CISO. Claude Haiku 4.5, 8th on the leaderboard, comes 2nd for CISO. No single model dominates: six different models each lead at least one of the 8 vulnerability categories.
Model rankings shift with the stakeholder role: same evaluation data, different priorities.
F1 scores by model and OWASP-aligned vulnerability category. Six different models lead at least one category.
Models with conservative detection strategies earn top grades for Engineering but fail for CISO. Spending more on inference does not guarantee better results.
Each role weights the 7 dimension categories differently, so the same 35 dimensions are filtered through distinct organizational needs.
Tasks are sourced from confirmed CVEs in open-source projects across 10 languages and 8 OWASP-aligned categories.
Models are tested on 406 CVE tasks in two layers: Code-in-Prompt (single-turn reasoning) and Tool-Use (sandboxed codebase navigation).
Each task is scored on verdict (1 pt), CWE classification (+1 pt), and location accuracy (up to +1 pt, based on IoU). From these task scores, 35 aggregate dimensions are computed across 7 categories.
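A minimal Python sketch of the per-task scoring, under assumptions the text above does not state: the location is a single inclusive line range, the location point equals the line-range IoU, and the CWE and location points are only awarded when the verdict is correct and the sample is vulnerable. The names `line_iou`, `Prediction`, `GroundTruth` and `score_task` are hypothetical, not part of SecLens.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical type: an inclusive (start, end) line range in the target file.
LineRange = Tuple[int, int]


def line_iou(pred: Optional[LineRange], truth: LineRange) -> float:
    """Intersection-over-union of two inclusive line ranges (0.0 if no prediction)."""
    if pred is None:
        return 0.0
    overlap = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (truth[1] - truth[0] + 1) - overlap
    return overlap / union if union > 0 else 0.0


@dataclass
class Prediction:
    vulnerable: bool
    cwe: Optional[str] = None
    location: Optional[LineRange] = None


@dataclass
class GroundTruth:
    vulnerable: bool
    cwe: Optional[str] = None
    location: Optional[LineRange] = None


def score_task(pred: Prediction, truth: GroundTruth) -> float:
    """Hypothetical scorer: verdict (1) + CWE match (+1) + location IoU (+0..1)."""
    if pred.vulnerable != truth.vulnerable:
        return 0.0  # assumption: a wrong verdict earns nothing
    score = 1.0
    if truth.vulnerable:
        # Assumption: the bonus points only apply to vulnerable samples.
        if pred.cwe is not None and pred.cwe == truth.cwe:
            score += 1.0
        if truth.location is not None:
            score += line_iou(pred.location, truth.location)
    return score


# Correct verdict and CWE, partially overlapping location (6 shared lines out of 16).
print(score_task(Prediction(True, "CWE-89", (10, 20)),
                 GroundTruth(True, "CWE-89", (15, 25))))  # 2.375
```

Under these assumptions a task is worth at most 3 points, and a correct verdict with an exact CWE but a loosely placed location lands between 2 and 3.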
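Dimensions are normalized to [0, 1] using one of four strategies: ratio, MCC, lower-is-better, and higher-is-better. Fixed reference caps, rather than bounds derived from the evaluated models, keep a model's normalized value from shifting when other models are added to or removed from the cohort.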
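The four normalization strategies can be sketched as below. The exact formulas are assumptions: MCC is mapped linearly from [-1, 1] onto [0, 1], and the two capped strategies clamp raw values at a fixed reference cap. All function names and the cap value are hypothetical.

```python
def normalize_ratio(value: float) -> float:
    """Ratio metrics (already fractions): clamp into [0, 1]."""
    return min(max(value, 0.0), 1.0)


def normalize_mcc(mcc: float) -> float:
    """Matthews correlation lies in [-1, 1]; map it linearly onto [0, 1]."""
    return (min(max(mcc, -1.0), 1.0) + 1.0) / 2.0


def normalize_higher_is_better(value: float, cap: float) -> float:
    """Metric where larger is better; saturates at 1.0 once it reaches the fixed cap."""
    return min(max(value, 0.0), cap) / cap


def normalize_lower_is_better(value: float, cap: float) -> float:
    """Metric where smaller is better (e.g. a cost); 0 maps to 1.0, >= cap maps to 0.0."""
    return 1.0 - min(max(value, 0.0), cap) / cap


# Hypothetical fixed cap, chosen once and shared by every model in every run.
COST_CAP_USD = 2.0
print(normalize_lower_is_better(0.5, COST_CAP_USD))  # 0.75
print(normalize_mcc(0.0))  # 0.5
```

Because each cap is fixed in advance, a model's normalized value depends only on its own raw metric. Min-max scaling over the evaluated models would instead move every score whenever a model joins or leaves the cohort.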
Five YAML weight profiles, one per role, each select 12 to 16 dimensions. Decision Score = (weighted sum / available weight) × 100, which maps to a letter grade from A to F.
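A sketch of the Decision Score in Python. The dictionary stands in for one YAML weight profile (YAML loading is omitted); its dimension names and weights, and the grade cutoffs, are hypothetical. Dimensions that are missing for a model are dropped from both the numerator and the denominator, which is one reading of "available weight".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one YAML weight profile: dimension name -> weight.
CISO_PROFILE: Dict[str, float] = {
    "recall": 3.0,
    "mcc": 2.0,
    "cwe_accuracy": 1.0,
    "cost_per_task": 1.0,
}

# Hypothetical grade cutoffs (lower bound of each band, checked from the top).
GRADE_CUTOFFS: List[Tuple[float, str]] = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def decision_score(profile: Dict[str, float], dims: Dict[str, Optional[float]]) -> float:
    """100 * (weighted sum of available dimensions) / (weight of available dimensions)."""
    weighted_sum = 0.0
    available_weight = 0.0
    for name, weight in profile.items():
        value = dims.get(name)
        if value is None:
            continue  # dimension missing for this model: skip its weight entirely
        weighted_sum += weight * value
        available_weight += weight
    if available_weight == 0.0:
        raise ValueError("no profile dimension is available for this model")
    return 100.0 * weighted_sum / available_weight


def grade(score: float) -> str:
    """Map a 0-100 Decision Score to a letter using the hypothetical cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Normalized dimensions (already in [0, 1]); cwe_accuracy is unavailable for this model.
model_dims = {"recall": 0.75, "mcc": 0.5, "cwe_accuracy": None, "cost_per_task": 1.0}
score = decision_score(CISO_PROFILE, model_dims)
print(round(score, 1), grade(score))  # 70.8 C
```

Dividing by the available weight rather than the profile's total weight keeps a model from being penalized merely because one dimension could not be computed for it.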
| Category | Tasks | OWASP | Leader (F1) | Worst (F1) |
|---|---|---|---|---|
| Broken Access Control | 82 | A01:2021 | Kimi K2.5 (0.667) | Qwen3-Coder (0.128) |
| Cryptographic Failures | 64 | A02:2021 | Gemini 3 Flash (0.676) | Qwen3-Coder (0.118) |
| Injection | 62 | A03:2021 | Gemini 3.1 Pro (0.632) | Qwen3-Coder (0.062) |
| Improper Input Validation | 58 | Extended | Haiku 4.5 (0.675) | Qwen3-Coder (0.125) |
| SSRF | 46 | A10:2021 | Sonnet 4.6 (0.690) | Qwen3-Coder (0.512) |
| Authentication Failures | 38 | A07:2021 | Kimi K2.5 (0.585) | Opus 4.6 (0.000) |
| Data Integrity Failures | 36 | A08:2021 | Gemini 3 Flash (0.680) | Qwen3-Coder (0.200) |
| Memory Safety | 20 | Extended | Haiku 4.5 (0.690) | Qwen3-Coder (0.308) |
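The per-category F1 values in the table follow the standard definition F1 = 2·TP / (2·TP + FP + FN). This sketch assumes each category contains both vulnerable and non-vulnerable (for example, patched) samples and treats a "vulnerable" verdict as the positive class; the record layout and the function name `f1_by_category` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (category, truly_vulnerable, predicted_vulnerable).
Record = Tuple[str, bool, bool]


def f1_by_category(records: Iterable[Record]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each category."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn]
    for category, truth, pred in records:
        c = counts[category]  # accessing the key also registers categories with only true negatives
        if pred and truth:
            c[0] += 1  # true positive
        elif pred:
            c[1] += 1  # false positive
        elif truth:
            c[2] += 1  # false negative
    return {
        category: (2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
        for category, (tp, fp, fn) in counts.items()
    }


records = [
    ("Injection", True, True),    # TP
    ("Injection", True, False),   # FN
    ("Injection", False, True),   # FP
    ("Injection", False, False),  # TN (ignored by F1)
    ("SSRF", True, True),         # TP
]
print(f1_by_category(records))  # {'Injection': 0.5, 'SSRF': 1.0}
```

Under this definition, any model with no true positives in a category scores an F1 of 0 there, regardless of how many true negatives it produces.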