Subgroup E2 : Data Engineering + AI/ML

Overview

E2 is responsible for the data and intelligence layer of the Energy Management System. We handle data ingestion from smart meters, real time stream processing, batch analytics, ML based load forecasting and anomaly detection, and serving insights via a REST API

Our Responsibilities

Kafka based data ingestion from E1 (smart meters via MQTT)
Real time stream processing with Apache Flink
Batch analytics pipelines with Apache Spark + Airflow
Load forecasting model (LSTM)
Anomaly detection for energy theft/leakage (Isolation Forest)
Data quality validation with Great Expectations
REST API (FastAPI) serving forecasts, anomalies and recommendations to E3 & E4

System Architecture

View Architecture Diagram

Tech Stack

Area	Tools
Ingestion	Apache Kafka, MQTT
Stream Processing	Apache Flink
Batch Processing	Apache Spark, Apache Airflow
ML Models	PyTorch, MLflow
API	FastAPI
Storage	InfluxDB, PostgreSQL
Validation	Great Expectations
Containerization	Docker, Docker Compose
CI/CD	GitHub Actions

Project Structure

data-intelligence/
├── .github/
│   └── workflows/
│       └── ci.yml                    # GitHub Actions CI pipeline
├── dags/                             # Airflow DAGs (batch pipelines)
│   ├── data-validation-dag.py
│   ├── db-retention-dag.py
│   ├── energy-batch-pipeline.py
│   └── model-retraining-dag.py
├── data/                             # Local sample/mock data
├── db/
│   ├── postgres/                     # Schema + migrations
│   └── influxdb/                     # Bucket configs
├── docker/                           # Per-service Dockerfiles
│   ├── Dockerfile.airflow
│   ├── Dockerfile.anomaly
│   ├── Dockerfile.api
│   ├── Dockerfile.batch-pipeline
│   ├── Dockerfile.forecasting
│   ├── Dockerfile.ingestion
│   ├── Dockerfile.storage
│   └── Dockerfile.streaming
├── mlflow/                           # MLflow experiment tracking config
│   └── config.yaml
├── src/
│   ├── ingestion/                    # Kafka consumers, MQTT bridge
│   ├── streaming/                    # Flink stream processors
│   ├── models/
│   │   ├── forecasting/
│   │   └── anomaly/
│   ├── optimization/                 # Energy optimization recommendations
│   │   └── recommendations.py
│   ├── spark/                        # Spark batch jobs
│   │   ├── batch-energy-analytics.py
│   │   └── feature-engineering.py
│   ├── api/                          # FastAPI app + routes
│   │   ├── routes/
│   │   ├── main.py
│   │   ├── schemas.py
│   │   └── dependencies.py
│   ├── validation/                   # Great Expectations
│   └── utils/
├── docs/                             # Integration guides + API contracts
│   ├── integration-e1.md
│   ├── integration-e3.md
│   └── integration-e4.md
├── tests/
│   ├── fixtures/
│   │   └── energy-readings.json
│   ├── integration/
│   │   ├── test-db-connections.py
│   │   └── test-kafka-pipeline.py
│   ├── system/                       # End-to-end system tests
│   │   ├── test-end-to-end-pipeline.py
│   │   └── test-api-e2e.py
│   └── unit/
│       ├── test-anomaly.py
│       ├── test-anomaly-pipeline.py
│       ├── test-api.py
│       ├── test-forecasting.py
│       ├── test-ingestion.py
│       ├── test-postgres-validator.py
│       ├── test-recommendations.py
│       ├── test-schemas.py
│       ├── test-streaming.py
│       └── test-validation.py
├── .env.example
├── docker-compose.yml
├── pyproject.toml
├── README.md
└── requirements.txt

Getting Started

Important: Before you start working, read the Contributing Guidelines.

Prerequisites

Docker + Docker Compose
Python 3.11+

Setup

git clone https://github.com/Prypiatos/data-intelligence.git
cd data-intelligence
cp .env.example .env
docker-compose up -d
pip install -r requirements.txt

Run Tests

pytest tests/ -v

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Subgroup E2 : Data Engineering + AI/ML

Overview

Our Responsibilities

System Architecture

Tech Stack

Project Structure

Getting Started

Prerequisites

Setup

Run Tests

Related Repositories

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 311 Commits
.githooks		.githooks
.github		.github
config/mosquitto		config/mosquitto
dags		dags
data		data
db		db
docker		docker
docs		docs
mlflow		mlflow
models		models
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements-api.txt		requirements-api.txt
requirements.airflow.txt		requirements.airflow.txt
requirements.anomaly.txt		requirements.anomaly.txt
requirements.spark.txt		requirements.spark.txt
requirements.streaming.txt		requirements.streaming.txt
requirements.txt		requirements.txt
requirements_batch_pipeline.txt		requirements_batch_pipeline.txt
requirements_forecasting.txt		requirements_forecasting.txt

Folders and files

Latest commit

History

Repository files navigation

Subgroup E2 : Data Engineering + AI/ML

Overview

Our Responsibilities

System Architecture

Tech Stack

Project Structure

Getting Started

Prerequisites

Setup

Run Tests

Related Repositories

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages