MEDfl Complete Tutorial (Simulation)

This tutorial shows how to use the MEDfl package to set up and run a complete federated learning experiment in simulation mode, from database configuration to automatic testing of the final model.

Starting from a realistic healthcare scenario, we will:

  • Configure the database used by MEDfl

  • Create a network and nodes with the NetManager

  • Generate a federated dataset

  • Define a dynamic model

  • Configure the aggregation strategy

  • Start a Flower-based FL server

  • Run the federated training pipeline

  • Plot accuracy and loss

  • Automatically test the final model and store results in the database

This tutorial is based on the accompanying Jupyter notebook. It is designed as a step-by-step guide you can follow and adapt to your own datasets and configurations.

Real-world motivation

Martin is an AI researcher whose main interest is applying AI to the healthcare domain. He is contacted by a prestigious institute to study the feasibility of a new project:

Designing and developing a federated learning system between several hospitals, using deep learning while preserving patient privacy.

After analyzing the requirements, Martin identifies that the project needs:

  • Federated Learning (FL) to keep data local to each hospital

  • Differential Privacy (DP) to protect model updates

  • A robust data and experiment management layer

Martin knows MEDfl has been designed for exactly these kinds of tasks. With its two main sub-packages, NetManager and LearningManager, MEDfl allows him to:

  • Design different federated learning architectures (setups)

  • Simulate real-world collaborations between hospitals

  • Integrate transfer learning and differential privacy

  • Store and compare results systematically in a database

0. Prerequisites

Before following this tutorial, make sure you have:

  • Installed MEDfl and its dependencies (see the installation page)

  • A Python environment (e.g. fl-env) with:

    • torch

    • flwr

    • pandas

    • sqlalchemy

  • A CSV dataset. In this tutorial we use a diabetes dataset located at:

    ../data/masterDataSet/diabetes_dataset.csv
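
A quick way to confirm that the packages listed above are importable in the active environment is sketched below. The helper name `missing_packages` is an illustrative assumption and not part of MEDfl; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Package names taken from the prerequisites list above
REQUIRED = ["torch", "flwr", "pandas", "sqlalchemy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable")
```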
    

Note

In production, MEDfl can be connected to a MySQL database (see the database_management page). For simplicity, this tutorial uses a local SQLite database.

1. Environment and imports

We start by making sure the project root is on the Python path and importing all the necessary modules.

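import os
import sys

# Make the project root importable in this interpreter
sys.path.append("../..")

# Also expose it to child processes, which inherit environment variables
os.environ["PYTHONPATH"] = "../.."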

# Database and data handling
import pandas as pd

# Torch imports
import torch
import torch.nn as nn
import torch.optim as optim

# Flower
import flwr as fl

# MEDfl imports - NetManager
from MEDfl.NetManager.node import Node
from MEDfl.NetManager.network import Network
from MEDfl.NetManager.flsetup import FLsetup
from MEDfl.NetManager.database_connector import DatabaseManager

# MEDfl imports - LearningManager
from MEDfl.LearningManager.dynamicModal import DynamicModel
from MEDfl.LearningManager.model import Model
from MEDfl.LearningManager.strategy import Strategy
from MEDfl.LearningManager.server import FlowerServer
from MEDfl.LearningManager.flpipeline import FLpipeline
from MEDfl.LearningManager.plot import AccuracyLossPlotter
from MEDfl.LearningManager.utils import set_db_config
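
Note that a relative entry such as "../.." is resolved against the current working directory, so the imports above only succeed when the notebook is started from the expected folder. One possible way to make this more robust is to resolve the path to an absolute one first. The helper `add_to_sys_path` below is hypothetical and not part of MEDfl.

```python
import sys
from pathlib import Path


def add_to_sys_path(root):
    """Append the absolute form of `root` to sys.path once and return it."""
    resolved = str(Path(root).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# "../.." mirrors the relative path used in this tutorial
project_root = add_to_sys_path("../..")
print("Project root on sys.path:", project_root)
```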

2. Database configuration

In MEDfl, all networks, nodes, datasets, setups, pipelines, and results are stored in a relational database.

In this tutorial we use a local SQLite database file named medfl_database.db:

# Configure the database path
set_db_config("./medfl_database.db")

# Create and connect the database manager
db_manager = DatabaseManager()
db_manager.connect()
connection = db_manager.get_connection()

print("Database connection OK")

Next, we generate the necessary MEDfl tables based on a master dataset CSV file. This file describes the global structure of the data that will later be partitioned across hospitals.

db_manager.create_MEDfl_db(
    path_to_csv="../data/masterDataSet/diabetes_dataset.csv"
)

Note

create_MEDfl_db:

  • infers dataset-related tables from the CSV structure,

  • creates the core MEDfl tables to manage networks, nodes, datasets and experiments.
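
To see which tables were actually created, you can query SQLite's built-in `sqlite_master` catalog directly. The sketch below uses only the standard-library `sqlite3` module. The helper `list_tables` and the demo table name are illustrative assumptions, and the in-memory database stands in for ./medfl_database.db.

```python
import sqlite3
from contextlib import closing


def list_tables(conn):
    """Return the names of all tables visible in an open SQLite connection."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    with closing(conn.cursor()) as cur:
        return [row[0] for row in cur.execute(query)]


# Demo on an in-memory database; pass "./medfl_database.db" to
# sqlite3.connect() to inspect the tutorial's database instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (id INTEGER PRIMARY KEY)")  # hypothetical table
print(list_tables(conn))
```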

3. Network creation (NetManager)

We now create a federated network that will hold all hospitals (nodes) and the corresponding datasets.

# Create a new network
net = Network("Net1")

# Register the network in the database
net.create_network()

print(net.name)  # "Net1"

We then register the master dataset associated with this network:

net.create_master_dataset(
    "../data/masterDataSet/diabetes_dataset.csv"
)

4. Federated Learning setup (FLsetup)

An FLsetup describes a federated learning configuration for a given network: which network it uses, how datasets are split, and how the federated dataset will be derived.

Here we create an automatic setup:

auto_fl = FLsetup(
    name="Flsetup_2",
    description="The second FL setup",
    network=net,
)
auto_fl.create()

auto_fl.list_allsetups()

This will show a table of FL setups stored in the database, including the one we just created.

5. Node creation and dataset upload

Now we add hospital nodes to the network. Each node receives a local dataset, representing that hospital’s data.

# Train node: hospital_1
hospital_1 = Node(name="hospital_1", train=1)
net.add_node(hospital_1)
hospital_1.upload_dataset(
    "hospital_1",
    "../data/masterDataSet/client_1_dataset.csv",
)
# Train node: hospital_2
hospital_2 = Node(name="hospital_2", train=1)
net.add_node(hospital_2)
hospital_2.upload_dataset(
    "hospital_2",
    "../data/masterDataSet/client_2_dataset.csv",
)
# Test node: hospital_3 (no local training)
hospital_3 = Node(name="hospital_3", train=0)
net.add_node(hospital_3)
hospital_3.upload_dataset(
    "hospital_3",
    "../data/masterDataSet/client_3_dataset.csv",
)

You can list all nodes registered in the network:

net.list_allnodes()
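
The three client CSV files used above are assumed to exist already. If you only have a single master file, one simple way to produce per-hospital partitions for a simulation is a shuffled round-robin split, sketched below with the standard library. The helper `split_rows` and the toy rows are hypothetical; real hospital data is rarely this evenly distributed, so treat this as a simulation convenience only.

```python
import random


def split_rows(rows, n_parts, seed=0):
    """Shuffle rows reproducibly and deal them into n_parts partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Partition i receives every n_parts-th row, starting at offset i
    return [shuffled[i::n_parts] for i in range(n_parts)]


rows = [{"patient_id": i} for i in range(10)]  # toy rows, not real data
parts = split_rows(rows, n_parts=3)
print([len(part) for part in parts])
```

Each partition could then be written to its own CSV file and uploaded to the corresponding node.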

6. Federated dataset creation

We now ask MEDfl to build a federated dataset from:

  • the FL setup,

  • the nodes,

  • and the master dataset.

In this example, we use the "diabetes" column as the target variable.

fl_dataset = auto_fl.create_federated_dataset(
    output="diabetes",   # target column
    fit_encode=[],       # columns to encode (if any)
    to_drop=["diabetes"] # columns to drop from the inputs
)

You can inspect the federated dataset object:

fl_dataset.size          # number of clients / partitions
auto_fl.get_flDataSet()  # summary table stored in the DB
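
The relationship between `output`, `to_drop`, and the resulting input features can be illustrated without MEDfl. The sketch below encodes an assumption about the intended semantics (the target is separated out and dropped columns are excluded from the inputs); it is not MEDfl's actual implementation, and `split_columns` and the column names are hypothetical.

```python
import csv
import io


def split_columns(header, target, to_drop=()):
    """Return (feature_columns, target) for a CSV header."""
    if target not in header:
        raise ValueError(f"target column {target!r} not found")
    excluded = set(to_drop) | {target}
    features = [col for col in header if col not in excluded]
    return features, target


# Tiny stand-in for the real CSV file (column names are made up)
sample_csv = "age,bmi,glucose,diabetes\n50,31.2,148,1\n"
header = next(csv.reader(io.StringIO(sample_csv)))
features, target = split_columns(header, target="diabetes", to_drop=["diabetes"])
print(features, target)
```

The number of feature columns obtained this way should match the input_dim given to the model (8 in this tutorial).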

7. Model definition (DynamicModel)

MEDfl provides a DynamicModel class to create models dynamically depending on the task (binary classification, multiclass, regression, etc.).

In this tutorial, we build a binary classifier with 8 input features, hidden layers of 16 and 32 units, and a single output:

# Create a DynamicModel helper
dynamic_model = DynamicModel()

# Build a specific model
specific_model = dynamic_model.create_model(
    model_type="Binary Classifier",
    params_dict={
        "input_dim": 8,
        "output_dim": 1,
        "hidden_dims": [16, 32],
    },
)

# Optimizer and loss
optimizer = optim.SGD(specific_model.parameters(), lr=0.001)
criterion = nn.BCELoss()

# Wrap everything into a MEDfl Model
global_model = Model(specific_model, optimizer, criterion)

# Initial parameters (to share with clients)
init_params = global_model.get_parameters()
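
`nn.BCELoss` expects the model output to be a probability in [0, 1], which is typically obtained by applying a sigmoid to the last layer's output. The plain-Python sketch below shows the quantity being averaged. The function names are illustrative, and the `eps` clamp is a simplification chosen here for numerical safety rather than a reproduction of PyTorch's internals.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(probs)


# A confident correct prediction is penalized less than a confident wrong one
good = binary_cross_entropy([sigmoid(3.0)], [1.0])
bad = binary_cross_entropy([sigmoid(-3.0)], [1.0])
print(good < bad)
```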

8. Aggregation strategy

The aggregation strategy specifies how local model updates are combined on the server side (e.g., FedAvg or FedAdam).
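
As a point of reference, FedAvg computes a weighted average of the client parameters, with weights proportional to each client's number of training examples. The sketch below works on flat lists of floats for clarity; the function `fedavg` is illustrative and is not the Flower or MEDfl implementation. FedAdam starts from the same averaged update but applies an Adam-style adaptive step on the server.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors.

    client_params: list of equal-length lists of floats, one per client
    client_sizes:  number of training examples held by each client
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two toy clients; the second holds three times as much data
averaged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(averaged)
```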

Here we use FedAdam as an example:

aggreg_algo = Strategy(
    "FedAdam",
    fraction_fit=1.0,
    fraction_evaluate=1.0,
    min_fit_clients=2,
    min_evaluate_clients=2,
    min_available_clients=2,
    initial_parameters=init_params,
)
aggreg_algo.create_strategy()
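
How `fraction_fit` and `min_fit_clients` interact can be unclear at first. In Flower's FedAvg-family strategies, to our understanding, the number of clients sampled for training in a round is the larger of `int(available * fraction_fit)` and `min_fit_clients`, and a round only starts once `min_available_clients` clients are connected. The helper below restates that rule as a sketch; `clients_per_round` is a hypothetical name, and we assume that MEDfl's Strategy passes these arguments through to Flower unchanged.

```python
def clients_per_round(num_available, fraction, min_clients):
    """Number of clients sampled in one round under the rule described above."""
    return max(int(num_available * fraction), min_clients)


# Settings from this tutorial: 2 training nodes, fraction_fit=1.0, min_fit_clients=2
print(clients_per_round(num_available=2, fraction=1.0, min_clients=2))

# With a small fraction, the minimum takes over
print(clients_per_round(num_available=10, fraction=0.1, min_clients=2))
```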

9. Federated learning server

We now create the Flower-based federated server that will orchestrate training across the clients (nodes) using the federated dataset.

server = FlowerServer(
    global_model,
    strategy=aggreg_algo,
    num_rounds=10,
    num_clients=len(fl_dataset.trainloaders),
    fed_dataset=fl_dataset,
    diff_privacy=False,  # set True to enable DP
    client_resources={
        "num_cpus": 1.0,
        "num_gpus": 0.0,
    },
)
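
For intuition about what enabling differential privacy usually involves, DP training typically combines two generic steps: clipping each update to a maximum L2 norm and adding Gaussian noise scaled to that norm. The sketch below illustrates those two steps in plain Python. It does not describe how MEDfl implements the `diff_privacy` option, and `clip_and_noise`, `max_norm`, and `noise_multiplier` are hypothetical names.

```python
import math
import random


def clip_and_noise(update, max_norm, noise_multiplier, rng=None):
    """Clip `update` to L2 norm <= max_norm, then add Gaussian noise."""
    if rng is None:
        rng = random.Random()
    norm = math.sqrt(sum(v * v for v in update))
    # Scale down only if the update exceeds the allowed norm
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * max_norm
    return [v * scale + rng.gauss(0.0, sigma) for v in update]


# With noise_multiplier=0 only the clipping step is visible
clipped = clip_and_noise([3.0, 4.0], max_norm=1.0, noise_multiplier=0.0)
print(clipped)
```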

10. FL pipeline creation and training

To make the experiment reproducible and easy to manage, MEDfl provides the FLpipeline class. It links the server, setup, and results together.

ppl_1 = FLpipeline(
    name="the first fl_pipeline",
    description="This is our first FL pipeline",
    server=server,
)

To start federated training:

history = ppl_1.server.run()

11. Plotting accuracy and loss

After training, we can visualize the evolution of global accuracy and loss across federated rounds.

global_accuracy = ppl_1.server.accuracies
global_loss = ppl_1.server.losses

results_dict = {
    ("LR: 0.001, Optimizer: SGD", "accuracy"): global_accuracy,
    ("LR: 0.001, Optimizer: SGD", "loss"): global_loss,
}

plotter = AccuracyLossPlotter(results_dict)
plotter.plot_accuracy_loss()

This produces a figure showing global accuracy and loss per federated round, which helps you compare different configurations or hyperparameters.
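
When several configurations are collected in the same dictionary, a small numeric summary can complement the plot. The sketch below assumes that each value is a plain list with one float per round, which is not confirmed here for server.accuracies and server.losses. The helper `summarize_results` is hypothetical and the numbers in the demo are made up for illustration.

```python
def summarize_results(results):
    """Collapse {(config, metric): per-round values} into one row per config."""
    summary = {}
    for (config, metric), values in results.items():
        row = summary.setdefault(config, {})
        if metric == "accuracy":
            best = max(values)
            row["best_accuracy"] = best
            row["best_round"] = values.index(best) + 1  # rounds counted from 1
        elif metric == "loss":
            row["final_loss"] = values[-1]
    return summary


# Made-up values, using the same key structure as results_dict
demo_results = {
    ("LR: 0.001, Optimizer: SGD", "accuracy"): [0.60, 0.70, 0.65],
    ("LR: 0.001, Optimizer: SGD", "loss"): [0.90, 0.70, 0.75],
}
demo_summary = summarize_results(demo_results)
print(demo_summary)
```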

12. Automatic testing and result storage

Finally, we can automatically test the global model on the test nodes (here hospital_3, which was created with train=0) and store the metrics in the database:

test_results = ppl_1.auto_test()
test_results

Each entry in test_results contains:

  • The node name

  • A classification report including:

    • Confusion matrix (TP, FP, FN, TN)

    • Accuracy

    • Sensitivity/Recall

    • Specificity

    • PPV/Precision

    • NPV

    • F1-score

    • False positive rate

    • True positive rate

    • AUC
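
Most of these metrics follow directly from the four confusion-matrix counts, as the sketch below shows using the standard definitions. The function `classification_metrics` is illustrative and not MEDfl's reporting code. AUC is omitted because it depends on the predicted scores rather than on a single confusion matrix.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, fn, tn):
    """Derive common binary-classification metrics from confusion counts."""
    sensitivity = _ratio(tp, tp + fn)  # recall, identical to the true positive rate
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),  # precision
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "fpr": _ratio(fp, fp + tn),
        "tpr": sensitivity,
    }


# Made-up counts for illustration
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```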

All these results are also saved in the MEDfl database, allowing you to:

  • Compare different FL setups

  • Track experiments over time

  • Reuse configurations in future studies