
Quantum-Software-Development/2-distributed-system-robotfault-ai-server-iot



[🇧🇷 Português] [🇬🇧 English]



Distributed TCP/IP Server with ML for Predictive Diagnostics




Institution: Pontifical Catholic University of São Paulo (PUC‑SP – Humanistic AI & Data Science • 5th Semester • 2026)
School: FACEI – Faculty of Interdisciplinary Studies
Course Repo: DISTRIBUTED SYSTEMS – 108 Hours
Professor: ⭐ Carlos Eduardo Paes
Extensionist Activities: Extension projects and workshops using open‑source software and data‑driven consulting to support the community, aligned with the 20 official extension hours of the course.
















Project overview

ROBOT Sentinel is a distributed systems project that simulates predictive diagnostics in a Factory 4.0 environment. In this solution, robot clients send telemetry to a central TCP/IP server, which processes the data with a trained Machine Learning model and returns a diagnostic result in real time.

The repository is intentionally organized into three evolutionary stages, showing the progression from exploratory experimentation to a cleaner socket architecture and finally to the integrated version delivered in class. The final version combines networking, concurrency and intelligent diagnosis in a centralized server model.



Scenario and motivation

The scenario represents an industrial environment in which dozens of robots operate with high precision and downtime is expensive. The main idea is to centralize intelligence in a more powerful server instead of making each robot process diagnostics locally.

This design creates three core technical challenges: handling many simultaneous network connections, performing accurate real-time anomaly detection, and protecting shared counters or robot state from race conditions. These three concerns are directly reflected in the project architecture and code evolution.



Repository structure

.
├── 1_Robot_Fabi_Exploratory/
│   ├── train_model.py
│   ├── servidor_central.py
│   └── no_sensor.py
│
├── 2_Robot_Pedro_Exploratory/
│   ├── robot_server.py
│   └── robot_client.py
│
└── 3-ROBOT_FINAL/
    ├── Bot Status Identificator.pkl
    ├── robot_server.py
    └── …



Project stages overview

| Stage / Folder | Focus | Key files | Main concepts |
| --- | --- | --- | --- |
| 1_Robot_Fabi_Exploratory | Full exploratory prototype | train_model.py, servidor_central.py, no_sensor.py | Model training, JSON communication, TCP server, Lock-protected global alert counter, simulated robot client with reconnection logic |
| 2_Robot_Pedro_Exploratory | Cleaner socket base | robot_server.py, robot_client.py | Simplified client–server separation, cleaner TCP skeleton, architectural refactoring for later expansion |
| 3-ROBOT_FINAL | Final integrated solution | robot_server.py, robot_client.py, Bot Status Identificator.pkl | Centralized ML inference, shared robot registry, safer concurrent access, operational command protocol, clean session termination |



Stage 1 – 1_Robot_Fabi_Exploratory

This stage is the first complete prototype of the project. It already demonstrates the full flow from data generation to server inference and diagnostic response, making it the most conceptually complete exploratory step.

train_model.py

The script creates a structured dataset with the features temperatura, vibracao and rpm, and the target falha. It then splits the data, trains a RandomForestClassifier, prints a classification report and exports both the CSV dataset and the serialized model file.

Generated artifacts:

  • exemplo_dados.csv
  • modelo_falha_rf.pkl

servidor_central.py

This server validates the model file before startup, loads it with pickle, listens on TCP, receives robot telemetry as JSON and classifies the data through a function that expects temperatura, vibracao and rpm. It returns either "FALHA" or "NORMAL" and also tracks the global number of alerts.

A strong point of this implementation is its robustness. It handles invalid UTF-8, invalid JSON, missing required fields, numeric conversion errors and internal exceptions, all while protecting the shared counter with threading.Lock().


no_sensor.py

This is a simulated robot client used for testing. It generates random sensor values periodically, sends them to the server in JSON format and logs the diagnosis returned by the server together with timestamps and the robot identifier.

It also implements reconnection behavior for timeouts, refused connections, connection reset and unexpected failures, which makes it useful for resilience testing during demonstrations.
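A minimal sketch of that reconnection behavior, using only the standard library, is shown here. The function name `send_with_retry`, its parameters and the retry policy (fixed delay, bounded attempts) are assumptions made for illustration; the actual `no_sensor.py` may loop indefinitely or use different timings.

```python
import json
import socket
import time


def send_with_retry(host, port, reading, retries=3, delay=0.5, timeout=2.0):
    """Send one JSON reading; return the server reply, or None if all attempts fail."""
    payload = json.dumps(reading).encode("utf-8")
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(payload)
                return conn.recv(1024).decode("utf-8")
        except socket.timeout:
            reason = "timeout"
        except ConnectionRefusedError:
            reason = "connection refused"
        except ConnectionResetError:
            reason = "connection reset"
        except OSError as exc:
            reason = f"unexpected failure: {exc}"
        print(f"attempt {attempt}/{retries} failed ({reason})")
        if attempt < retries:
            time.sleep(delay)    # wait before reconnecting
    return None
```

Catching the specific exceptions before the generic `OSError` lets the log state why each attempt failed, which is what makes a client like this useful when the server is stopped and restarted during a demonstration.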



Stage 2 – 2_Robot_Pedro_Exploratory

This stage focuses on simplification and architectural clarity. Instead of keeping the first prototype's heavier end-to-end structure, it isolates the socket communication into a cleaner server–client base that can be extended later with intelligence and synchronization logic.

Its main value is educational and architectural. It turns the project into a more understandable network skeleton, which makes the final integrated version easier to develop, test and explain.



Stage 3 – 3-ROBOT_FINAL

This is the final version delivered for the course presentation. It extends the cleaner socket structure with centralized ML inference, robot registration, shared state and a command-based interaction model.


The final version includes:

  • a pre-trained model file;
  • a central threaded server;
  • a client terminal for operational commands;
  • shared robot tracking;
  • support for clean disconnect behavior through a sentinel command.
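To make the shared robot tracking and the sentinel-based disconnect concrete, the sketch below shows one possible shape for them. The command names (`REGISTER`, `LIST`, `EXIT`), the reply strings, and the `RobotRegistry` and `handle_command` names are all hypothetical: the README does not list the real protocol, and telemetry handling is left out to keep the example short.

```python
import threading

SENTINEL = "EXIT"   # hypothetical; the README does not name the real sentinel


class RobotRegistry:
    """Thread-safe set of registered robot ids."""

    def __init__(self):
        self._lock = threading.RLock()
        self._robots = set()

    def register(self, robot_id):
        with self._lock:
            self._robots.add(robot_id)

    def unregister(self, robot_id):
        with self._lock:
            self._robots.discard(robot_id)

    def snapshot(self):
        with self._lock:
            return sorted(self._robots)   # copy, so callers never see live state


def handle_command(line, registry, session):
    """Process one text command; return (reply, keep_session_open)."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return "ERROR: empty command", True
    command = parts[0].upper()
    argument = parts[1] if len(parts) > 1 else ""
    if command == "REGISTER" and argument:
        registry.register(argument)
        session["robot_id"] = argument
        return f"OK {argument}", True
    if command == "LIST":
        return ",".join(registry.snapshot()), True
    if command == SENTINEL:
        if "robot_id" in session:
            registry.unregister(session["robot_id"])
        return "BYE", False               # False tells the caller to close the socket
    return "ERROR: unknown command", True
```

Each connection thread would keep its own `session` dictionary while sharing a single `RobotRegistry`, so the lock is the only point where threads contend.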



System architecture

The final system uses a centralized decision architecture. Robot clients act as distributed nodes that send operational data, while the server acts as the intelligence layer responsible for processing, diagnosis and state coordination. This directly matches the Factory 4.0 motivation in the project.



Server flow

  1. A robot client connects to the TCP server.
  2. The server creates a dedicated thread to handle that connection.
  3. The client may register, request the current robot list, send telemetry or terminate its session.
  4. The server preprocesses the received payload and invokes the Machine Learning model.
  5. The server returns a diagnosis and safely updates shared state whenever needed.
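Steps 1, 2 and 5 of this flow map onto a thread-per-connection accept loop, sketched below with the standard library. The names `serve` and `handle_client` and the `diagnose` callback are assumptions for illustration, and the sketch treats each `recv` call as one complete message, which real TCP streams do not guarantee.

```python
import socket
import threading


def handle_client(conn, addr, diagnose):
    """Serve one robot connection: read a payload, reply with a diagnosis."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # client closed the connection
                break
            conn.sendall(diagnose(data).encode("utf-8"))


def serve(server_sock, diagnose):
    """Accept connections until the listening socket is closed, one thread per client."""
    while True:
        try:
            conn, addr = server_sock.accept()
        except OSError:           # listening socket was closed
            break
        threading.Thread(
            target=handle_client, args=(conn, addr, diagnose), daemon=True
        ).start()
```

A caller would create a listening socket, for example `socket.create_server(("0.0.0.0", 5000))` with a port of its choosing, and pass it to `serve` together with a function that turns raw bytes into a diagnosis string.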



Architecture diagram


flowchart LR
    C1[Robot Client 1]
    C2[Robot Client 2]
    C3[Robot Client 3]

    S[Central TCP Server]

    T1[Thread 1]
    T2[Thread 2]
    T3[Thread 3]

    M[ML Model]
    R[Shared Robot Registry]
    L[Lock / RLock]
    D[Diagnosis Response]

    C1 --> S
    C2 --> S
    C3 --> S

    S --> T1
    S --> T2
    S --> T3

    T1 --> M
    T2 --> M
    T3 --> M

    T1 --> R
    T2 --> R
    T3 --> R

    R --> L
    M --> D

























Copyright 2026 Quantum Software Development. Code released under the MIT license.

About

Project 2: Distributed System. Python system using TCP/IP sockets, multithreading, and a Random Forest model for IoT-based predictive maintenance and failure detection in industrial robots (Industry 4.0).
