Multi-layer API request security scanner with Regex, YARA rules, and ML anomaly detection (Isolation Forest). Validates payloads, detects SQLi/XSS abuse, rate-limits IPs, and logs threats.
The API Security Analyzer is a real-time threat detection system designed to protect API endpoints from common attack vectors including SQL Injection (SQLi), Cross-Site Scripting (XSS), and anomalous request patterns. The system employs a defense-in-depth approach by combining three complementary detection mechanisms:
- Regex-based Pattern Matching - Fast, deterministic detection of known attack signatures
- YARA Rule Engine - Advanced pattern recognition for complex threat detection
- Machine Learning Anomaly Detection - Statistical outlier identification using Isolation Forest
- Input Validation: Pydantic schemas + size limits (10KB max)
- Threat Detection: Regex + YARA rules for SQLi, XSS, command injection
- ML Anomaly Detection: Isolation Forest flags zero-day patterns
- Rate Limiting: 10 requests/minute per IP
- Live Dashboard: Real-time logs + anomaly visualization
- Production Logging: JSON-formatted anomalies.log
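
The per-IP rate limit listed above (10 requests/minute) could be enforced with a sliding window kept in memory. The following is a minimal sketch, not the project's actual code: the class name `SlidingWindowRateLimiter` and the injectable `clock` parameter are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Hypothetical in-memory limiter: at most `limit` hits per `window` seconds per IP."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Because the state lives in a process-local dictionary, this approach only works for a single instance; each additional worker would keep its own counters.
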
- POST /analyze → Pydantic validation
- Regex scan → "union select", "<script>", etc.
- YARA rules → Advanced pattern matching
- ML features → [length, params, entropy, rate]
- Isolation Forest → Anomaly score (-1 = threat)
- Rate limit check → 429 if abused
- Log + Return results
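
The flow above can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the project's implementation: it omits the YARA and ML stages, the function name `analyze`, the `rate_ok` flag, and every issue string except "XSS detected" are hypothetical, and the two patterns are the ones quoted in the regex section of this README.

```python
import json
import re

MAX_BYTES = 10 * 1024  # 10KB size limit from the feature list

# Patterns as quoted in the regex section of this README.
SQLI_PAT = re.compile(r"union.*select|drop.*table|exec.*sp", re.I)
XSS_PAT = re.compile(r"<script|javascript:|alert\(", re.I)


def analyze(payload, rate_ok=True):
    """Return (status_code, result) for one request payload given as a dict."""
    text = json.dumps(payload)
    issues = []
    # Step 1: validation (only the size limit is modelled here).
    if len(text.encode("utf-8")) > MAX_BYTES:
        issues.append("Payload too large")
    # Step 2: regex scan for known signatures.
    if SQLI_PAT.search(text):
        issues.append("SQLi detected")
    if XSS_PAT.search(text):
        issues.append("XSS detected")
    # Steps 3-5 (YARA, feature extraction, Isolation Forest) are left out.
    # Step 6: rate limit check; `rate_ok` stands in for a real limiter.
    if not rate_ok:
        return 429, {"valid": False, "issues": ["Rate limit exceeded"]}
    # Step 7: return results (logging omitted).
    return 200, {"valid": not issues, "issues": issues}
```

Broad patterns trade precision for coverage: `exec.*sp`, for instance, also matches the harmless phrase "executive sponsor", so real traffic may need tighter expressions.
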
| Component | Technology |
|---|---|
| Backend | FastAPI, Pydantic |
| ML | scikit-learn (Isolation Forest) |
| Rules | YARA, Regex |
| Frontend | HTML/CSS/JS |
| Logging | JSON + file rotation |
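
For the logging row, JSON output with file rotation can be built from Python's standard `logging` module. This is a sketch under assumptions: the formatter class, the `event` key passed through `extra`, the logger name, and the rotation sizes are all hypothetical; only the file name anomalies.log comes from this README.

```python
import json
import logging
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Extra fields arrive via logger.warning(..., extra={"event": {...}}).
        entry.update(getattr(record, "event", {}))
        return json.dumps(entry)


def make_logger(path="anomalies.log", max_bytes=1_000_000, backups=3):
    logger = logging.getLogger("api_security")
    logger.setLevel(logging.INFO)
    # Rotate once the file reaches max_bytes, keeping `backups` old files.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A threat could then be recorded with `logger.warning("threat", extra={"event": {"ip": "1.2.3.4", "issues": ["XSS detected"]}})`.
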
The first line of defense uses compiled regular expressions to detect well-known attack signatures:
import re

# Case-insensitive signatures for common SQLi and XSS payloads
sqli_pat = re.compile(r"union.*select|drop.*table|exec.*sp", re.I)
xss_pat = re.compile(r"<script|javascript:|alert\(", re.I)

YARA provides industry-standard pattern-matching for threat detection:
rule SQLi {
    strings:
        $sqli = /union.*select/i
    condition:
        $sqli
}
rule XSS {
    strings:
        $xss = /<script|javascript:|alert\(/i
    condition:
        $xss
}

The system extracts four key features from each request:
| Feature | Description |
|---|---|
| payload_length | Character count of request payload |
| num_parameters | Number of JSON fields or params |
| entropy | Shannon entropy of payload content |
| request_rate | Requests per minute from IP |
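
The four features in the table could be computed as in the sketch below. It assumes the payload is a flat JSON object serialized with `json.dumps`, and that entropy is measured per character in bits, H = -Σ p_i log2 p_i; the function names are hypothetical and the project's actual extraction may differ.

```python
import json
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for empty input."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def extract_features(payload, request_rate):
    """Return [payload_length, num_parameters, entropy, request_rate]."""
    text = json.dumps(payload)
    return [len(text), len(payload), shannon_entropy(text), request_rate]
```

A string of one repeated character has entropy 0, while a string of two equally frequent characters has entropy 1 bit, so encoded or obfuscated payloads tend to score higher than plain form fields.
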
Why Isolation Forest?
- Efficient: O(n) average case complexity
- No distance calculations required
- Handles high-dimensional data well
- Provides anomaly scores for severity ranking
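
A minimal sketch of how an Isolation Forest can score the four features with scikit-learn is shown here. The training data is synthetic and invented for this example (the distributions, sample count, and the two probe requests are assumptions), so the numbers say nothing about the project's real model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" traffic, one row per request:
# [payload_length, num_parameters, entropy, request_rate]
normal = np.column_stack([
    rng.normal(200, 40, 300),
    rng.integers(2, 6, 300),
    rng.normal(4.0, 0.3, 300),
    rng.integers(1, 5, 300),
])

model = IsolationForest(n_estimators=100, random_state=0).fit(normal)

typical = np.array([[200, 3, 4.0, 2]])
suspicious = np.array([[5000, 40, 6.5, 60]])


def score(x):
    """Return (label, decision score): label -1 marks an anomaly, 1 an inlier."""
    return int(model.predict(x)[0]), float(model.decision_function(x)[0])
```

In scikit-learn, `predict` returns the -1/1 label, while `decision_function` returns a continuous score that is negative for anomalies; the continuous score is the one that can be used for severity ranking.
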
The system was evaluated against a dataset of 40 API requests comprising:
| Category | Count | Description |
|---|---|---|
| Legitimate Requests | 10 | Normal API traffic patterns |
| SQLi Attacks | 15 | UNION-based, stacked queries, boolean-based |
| XSS Attacks | 15 | Reflected, stored, DOM-based vectors |

Regex detection alone:

| Metric | Value |
|---|---|
| True Positives | 20 |
| False Positives | 0 |
| Precision | 100.0% |
| Recall | 66.7% |
| F1-Score | 80.0% |

YARA detection alone:

| Metric | Value |
|---|---|
| True Positives | 18 |
| False Positives | 0 |
| Precision | 100.0% |
| Recall | 60.0% |
| F1-Score | 75.0% |

ML anomaly detection (Isolation Forest) alone:

| Metric | Value |
|---|---|
| True Positives | 30 |
| False Positives | 10 |
| Precision | 75.0% |
| Recall | 100.0% |
| F1-Score | 85.7% |
When all three mechanisms operate in ensemble:
| Metric | Value |
|---|---|
| True Positives | 30 |
| False Positives | 10 |
| Precision | 75.0% |
| Recall | 100.0% |
| F1-Score | 85.7% |
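
The precision, recall, and F1 values in the tables above follow directly from the reported counts. The sketch below recomputes them, assuming false negatives equal the 30 attack requests in the dataset minus the true positives; the helper name `prf` is hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


TOTAL_ATTACKS = 30  # 15 SQLi + 15 XSS requests in the evaluation set

for tp, fp in [(20, 0), (18, 0), (30, 10)]:
    p, r, f = prf(tp, fp, TOTAL_ATTACKS - tp)
    print(f"TP={tp} FP={fp}: precision={p:.1%} recall={r:.1%} F1={f:.1%}")
# TP=20 FP=0: precision=100.0% recall=66.7% F1=80.0%
# TP=18 FP=0: precision=100.0% recall=60.0% F1=75.0%
# TP=30 FP=10: precision=75.0% recall=100.0% F1=85.7%
```
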
| Component | P50 | P95 | P99 |
|---|---|---|---|
| Regex | 0.00ms | 0.01ms | 0.01ms |
| YARA | 0.00ms | 0.02ms | 0.02ms |
| ML Inference | 4.38ms | 4.53ms | 4.71ms |
| Combined | 4.38ms | 4.83ms | 4.92ms |
Test environment: Python 3.11, scikit-learn 1.5+
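
Latency percentiles like those in the table can be gathered with a simple timing loop. This sketch is an assumption about methodology, not a description of how evaluate.py measures them; the helper name `percentiles_ms` and the run count are hypothetical.

```python
import re
import statistics
import time


def percentiles_ms(fn, arg, runs=1000):
    """Call fn(arg) repeatedly and return P50/P95/P99 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[k-1] is Pk
    return {"P50": cuts[49], "P95": cuts[94], "P99": cuts[98]}


xss_pat = re.compile(r"<script|javascript:|alert\(", re.I)
stats = percentiles_ms(xss_pat.search, '{"data": "<script>alert(1)</script>"}')
print(stats)
```

Results vary by machine and load, so a warm-up run and several repetitions give more stable figures than a single pass.
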
- Ensemble Advantage: The combined system achieves 100% recall; on this dataset the ML layer supplies the detections that the rule-based layers miss
- Zero False Positives (Rules): Regex and YARA maintain 100% precision with no false alarms on legitimate traffic
- ML Trade-off: Flags all 10 legitimate requests in the test set as false positives but catches all attacks - suitable as a secondary layer
- Ultra-Low Latency: Sub-5ms P99 latency makes this suitable for production API gateways
Analyzes a single API request for security threats.
Request:

{
  "url": "/api/users",
  "method": "POST",
  "payload": {"username": "test", "data": "<script>alert(1)</script>"}
}

Response:

{
  "valid": false,
  "issues": ["XSS detected", "ML_Anomaly"],
  "anomaly_score": -0.15
}

Retrieves recent security events.

Serves the dashboard interface.
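
The analyze request shown above can be sent from Python with the standard library. This sketch assumes the endpoint path is POST /analyze and that the server is running locally on port 8000; the helper names `build_analyze_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(base_url="http://localhost:8000"):
    """Build (but do not send) the example request from this README."""
    body = {
        "url": "/api/users",
        "method": "POST",
        "payload": {"username": "test", "data": "<script>alert(1)</script>"},
    }
    return urllib.request.Request(
        base_url + "/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send the request and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())
```

Calling `send(build_analyze_request())` against a running instance should return a body shaped like the response above. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx replies, which would include a 429 from the rate limiter.
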
cd "api security analyser"
pip install -r requirements.txt
python main.py

Server runs at http://localhost:8000

python evaluate.py

npm i -g vercel
vercel --prod

fastapi>=0.100.0
uvicorn>=0.22.0
scikit-learn>=1.3.0
numpy>=1.24.0
jinja2>=3.1.0
pydantic>=2.0.0
yara-python>=4.3.0
- YARA Availability: Optional; gracefully degrades if unavailable
- ML Model: Currently trained on synthetic data; retrain with real traffic for production
- Rate Limiting: In-memory storage; use Redis for distributed deployments
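
One way to make YARA optional, as the first limitation describes, is to guard the import and fall back to equivalent regular expressions. This is a sketch under assumptions: the fallback strategy, the `scan` function, and the constant names are hypothetical, and the project may simply skip the YARA layer instead. The rules are the two shown earlier in this README.

```python
import re

try:
    import yara  # provided by the yara-python package
except ImportError:
    yara = None  # degrade gracefully when YARA is not installed

YARA_SOURCE = r"""
rule SQLi { strings: $sqli = /union.*select/i condition: $sqli }
rule XSS { strings: $xss = /<script|javascript:|alert\(/i condition: $xss }
"""

# Regex equivalents used only when yara-python is unavailable.
FALLBACK = {
    "SQLi": re.compile(r"union.*select", re.I),
    "XSS": re.compile(r"<script|javascript:|alert\(", re.I),
}

RULES = yara.compile(source=YARA_SOURCE) if yara else None


def scan(text):
    """Return the sorted names of the rules that match `text`."""
    if RULES is not None:
        matches = RULES.match(data=text.encode("utf-8"))
        return sorted(m.rule for m in matches)
    return sorted(name for name, pat in FALLBACK.items() if pat.search(text))
```

Either path returns the same rule names for these two simple rules, so callers do not need to know which engine ran.
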
- Model Retraining Pipeline - Continuous learning from verified attacks
- Redis Rate Limiting - Distributed rate limiting across instances
- Additional Attack Vectors - Command injection, LDAP injection, XXE
- SIEM Integration - Splunk, Elastic, QRadar webhook alerts
Kasmya Bhatia
This project demonstrates a defense-in-depth approach to API security, using complementary detection mechanisms that combine deterministic pattern matching with statistical machine learning.