EU AI Footprint Scanner
AST-based static analysis that detects AI/ML library use across a Python codebase, classified into simplified EU AI Act risk tiers. The first product I'm shipping under Argus Intelligence.
- status: active
- started: 2026-03
- updated: 2026-04
- tags: AI/ML · Tools
What
A Python static analyser that walks the Abstract Syntax Tree of a codebase, finds every AI/ML library import and call, and classifies each finding into a three-tier risk model loosely inspired by the EU AI Act. Output is structured JSON listing every finding with file path, line number, library name, and risk tier — designed to feed straight into a compliance review.
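Here is a minimal sketch of one finding record serialised to JSON, to make the output format concrete. The `Finding` class, its field names (`file`, `line`, `library`, `risk_tier`) and the top-level `findings` key are assumptions for illustration, not the scanner's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """One detected AI/ML usage. Field names are illustrative, not the tool's schema."""
    file: str
    line: int
    library: str
    risk_tier: str


def findings_to_json(findings: list[Finding]) -> str:
    # Sort by location so repeated runs produce stable, diff-friendly reports.
    ordered = sorted(findings, key=lambda f: (f.file, f.line, f.library))
    return json.dumps({"findings": [asdict(f) for f in ordered]}, indent=2)


if __name__ == "__main__":
    print(findings_to_json([
        Finding("app/chat.py", 3, "openai", "HIGH"),
        Finding("app/model.py", 1, "sklearn", "MINIMAL"),
    ]))
```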
Currently shipped as a CLI; the next milestone is a GitHub App that runs the same analysis on every pull request and posts findings as a PR comment.
Why
Most EU SMEs have no idea what AI/ML code is running inside their products. The bulk of the EU AI Act’s obligations apply from 2 August 2026, and an audit asks “what AI are you using?” before anything else. Manual code audits are slow, error-prone, and require specialised knowledge. Existing compliance tools are aimed at large enterprises with dedicated GRC teams; there’s a real gap for a pragmatic, engineering-grade tool that fits in CI.
It’s also the first commercial product I’m shipping under Argus Intelligence.
How
- Pure-Python stack: the `ast` standard-library module for the tree walk, PyYAML for the risk-definition config, and pytest for the test harness. No heavy framework overhead, so the CLI starts quickly.
- AST-based detection rather than string matching. It catches `import openai` and chained calls like `openai.ChatCompletion.create()` without false positives from comments or docstrings. Recursive attribute traversal handles dotted imports (`google.generativeai`) and walks back to the base name for chained calls.
- Three-tier risk schema:
  - HIGH: generative LLMs and direct AI APIs (`openai`, `anthropic`, `cohere`, `gemini`, `mistral`, `ollama`)
  - LIMITED: foundation-model frameworks and orchestration (`langchain`, `transformers`, `tensorflow`, `pytorch`)
  - MINIMAL: classical ML and numerical computing (`scikit-learn`, `numpy`, `pandas`)
- Configurable. Risk definitions live in a YAML file. Customers can add internal libraries to a tier or move libraries between tiers without touching the scanner code.
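The detection approach above can be sketched with the standard `ast.NodeVisitor`. This is a minimal illustration, not the scanner's code: `RISK_TIERS`, `classify`, `dotted_name`, `AIUsageVisitor` and `scan_source` are hypothetical names, the tier table is hard-coded rather than loaded from YAML, and it keys libraries by their import name (so scikit-learn appears as `sklearn` and PyTorch as `torch`).

```python
from __future__ import annotations

import ast

# Hypothetical tier table keyed by *import* name (assumption: the real tool loads
# its table from YAML). scikit-learn imports as "sklearn", PyTorch as "torch".
RISK_TIERS = {
    "openai": "HIGH",
    "anthropic": "HIGH",
    "google.generativeai": "HIGH",
    "langchain": "LIMITED",
    "transformers": "LIMITED",
    "torch": "LIMITED",
    "sklearn": "MINIMAL",
    "numpy": "MINIMAL",
}


def classify(module: str) -> tuple[str, str] | None:
    """Longest-prefix match, so "google.generativeai.types" hits "google.generativeai"."""
    parts = module.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in RISK_TIERS:
            return prefix, RISK_TIERS[prefix]
    return None


def dotted_name(node: ast.expr) -> str | None:
    """Recursively rebuild "a.b.c" from nested Attribute nodes, down to the base Name."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None  # base is not a plain name, e.g. a call result like openai.OpenAI().chat


class AIUsageVisitor(ast.NodeVisitor):
    def __init__(self, path: str) -> None:
        self.path = path
        self.aliases: dict[str, str] = {}  # local name -> imported module path
        self.findings: list[tuple[str, int, str, str]] = []

    def _record(self, module: str, lineno: int) -> None:
        hit = classify(module)
        if hit is not None:
            self.findings.append((self.path, lineno, hit[0], hit[1]))

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            # "import a.b" binds "a"; "import a.b as x" binds "x" to "a.b".
            local = alias.asname or alias.name.split(".")[0]
            self.aliases[local] = alias.name if alias.asname else local
            self._record(alias.name, node.lineno)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module and node.level == 0:  # skip relative imports
            for alias in node.names:
                self.aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
            self._record(node.module, node.lineno)

    def visit_Call(self, node: ast.Call) -> None:
        name = dotted_name(node.func)
        if name is not None:
            base, _, rest = name.partition(".")
            if base in self.aliases:  # only names bound by an import in this file
                full = self.aliases[base] + (f".{rest}" if rest else "")
                self._record(full, node.lineno)
        self.generic_visit(node)  # also visit nested calls, e.g. inside arguments


def scan_source(source: str, path: str = "<string>") -> list[tuple[str, int, str, str]]:
    visitor = AIUsageVisitor(path)
    visitor.visit(ast.parse(source, filename=path))
    return visitor.findings


SAMPLE = '''
"""This docstring mentions openai but is just a string constant."""
# import anthropic  (comments never reach the AST)
import openai
import google.generativeai as genai
from sklearn.linear_model import LogisticRegression

openai.ChatCompletion.create(model="gpt-4", messages=[])
genai.configure(api_key="...")
LogisticRegression()
'''

if __name__ == "__main__":
    for finding in scan_source(SAMPLE):
        print(finding)
```

Longest-prefix matching is what lets a single `google.generativeai` entry cover both the import and calls through the `genai` alias. Calls on objects created at runtime (a client instance, say) are not traced, which is the library-centric limitation noted under Honest limitations.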
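Moving libraries between tiers amounts to overlaying a customer mapping on the shipped defaults. This is a minimal sketch, assuming the customer section has already been parsed (for example with PyYAML's `yaml.safe_load`) into a tier-to-libraries dict; `apply_overrides` and that dict shape are hypothetical, not the real `risk_definitions.yml` schema.

```python
VALID_TIERS = ("HIGH", "LIMITED", "MINIMAL")


def apply_overrides(defaults: dict[str, str], overrides: dict[str, list[str]]) -> dict[str, str]:
    """Overlay customer tier assignments on the shipped library -> tier defaults.

    `overrides` maps tier -> libraries (a hypothetical shape, e.g. what yaml.safe_load
    could return). Listing a library adds it, or moves it if it already has a tier.
    """
    merged = dict(defaults)  # copy: never mutate the shipped defaults
    seen: dict[str, str] = {}
    for tier, libraries in overrides.items():
        if tier not in VALID_TIERS:
            raise ValueError(f"unknown risk tier {tier!r}; expected one of {VALID_TIERS}")
        for lib in libraries:
            if lib in seen and seen[lib] != tier:
                raise ValueError(f"{lib!r} is listed under both {seen[lib]} and {tier}")
            seen[lib] = tier
            merged[lib] = tier
    return merged


if __name__ == "__main__":
    defaults = {"openai": "HIGH", "transformers": "LIMITED", "numpy": "MINIMAL"}
    # Add an in-house client to HIGH and move transformers down to MINIMAL.
    print(apply_overrides(defaults, {"HIGH": ["acme_llm"], "MINIMAL": ["transformers"]}))
```

Validating tier names and rejecting contradictory listings up front means a typo in the config fails loudly instead of quietly reclassifying a library.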
Honest limitations
- It’s a technical risk-discovery tool, not a legal compliance certification. The output is engineering input to a compliance review, not a substitute for one. This framing is on every page of the product site and matters more than all the features combined.
- Doesn’t detect dynamic imports like `importlib.import_module("openai")`. Known limitation; may add later.
- Library-centric, not data-flow. It tells you what’s imported, not what’s being done with it. A compliance review still needs human judgement about how the library is used.
- The three-tier model is engineering shorthand, not a literal mapping to the EU AI Act’s Annex I/III high-risk classification. Useful as a starting point, not a final classification.
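The dynamic-import gap is partly closable when the module name is a string literal. The sketch below is a hypothetical extension (the function name `literal_dynamic_imports` is made up) that handles that easy case and shows why the general case stays out of reach: a name computed at runtime never appears in the AST.

```python
import ast


def literal_dynamic_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, module) for importlib.import_module("x") and __import__("x")
    calls whose first argument is a string literal. Hypothetical extension: it does
    not follow aliases such as `from importlib import import_module`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_import_module = (
            isinstance(func, ast.Attribute)
            and func.attr == "import_module"
            and isinstance(func.value, ast.Name)
            and func.value.id == "importlib"
        )
        is_dunder_import = isinstance(func, ast.Name) and func.id == "__import__"
        first = node.args[0]
        # Only a literal string is knowable statically; a variable is resolved at runtime.
        if (
            (is_import_module or is_dunder_import)
            and isinstance(first, ast.Constant)
            and isinstance(first.value, str)
        ):
            hits.append((node.lineno, first.value))
    return sorted(hits)  # ast.walk is breadth-first, so sort back into source order


SAMPLE = '''
import importlib
llm = importlib.import_module("openai")
np = __import__("numpy")
name = "anthropic"
mod = importlib.import_module(name)  # missed: the argument is a variable
'''

if __name__ == "__main__":
    print(literal_dynamic_imports(SAMPLE))
```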
What’s next
- Build the GitHub App for CI integration, so findings are posted as PR comments on every pull request instead of depending on ad hoc runs on a developer’s laptop.
- Add GitLab CI support in v1.1.
- Continue expanding `risk_definitions.yml` to cover more libraries as the AI ecosystem moves.
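For the planned GitHub App, findings need rendering into a comment body. This is a minimal sketch, assuming each finding is a dict with `file`, `line`, `library` and `risk_tier` keys; `render_pr_comment` and the comment layout are hypothetical. Posting the result (for instance via GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues/{issue_number}/comments`, which also accepts pull request numbers) is left out.

```python
from collections import Counter

TIER_ORDER = ("HIGH", "LIMITED", "MINIMAL")  # most severe first


def render_pr_comment(findings: list[dict]) -> str:
    """Render findings as a Markdown PR comment body (hypothetical format)."""
    if not findings:
        return "No AI/ML library usage detected in this change."
    counts = Counter(item["risk_tier"] for item in findings)
    summary = ", ".join(f"{counts[tier]} {tier}" for tier in TIER_ORDER if counts[tier])
    lines = [
        f"**AI footprint:** {summary}",
        "",
        "| Tier | Library | Location |",
        "| --- | --- | --- |",
    ]
    # Most severe tier first, then by location, so reviewers see HIGH findings at the top.
    rank = {tier: i for i, tier in enumerate(TIER_ORDER)}
    for item in sorted(findings, key=lambda it: (rank[it["risk_tier"]], it["file"], it["line"])):
        lines.append(f"| {item['risk_tier']} | `{item['library']}` | `{item['file']}:{item['line']}` |")
    lines += ["", "_Technical risk discovery, not a legal compliance assessment._"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pr_comment([
        {"file": "app/chat.py", "line": 4, "library": "openai", "risk_tier": "HIGH"},
        {"file": "etl/clean.py", "line": 1, "library": "pandas", "risk_tier": "MINIMAL"},
    ]))
```

The footer line mirrors the framing the product already uses everywhere else, so the caveat travels with the findings into the review.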