Data processing platform for a major European bus manufacturer

The client

Our client, a manufacturer of public transport vehicles, faces the challenge of exploiting vehicle operating data for activity analysis and predictive maintenance.

The challenge

Since 2018, Akkodis has set up three processing chains collecting data from 9,000 buses in operation throughout Europe. The challenge was to migrate these processing chains and the associated storage solutions to an AWS / Databricks environment.

The solution

We access Databricks through an SSO connection that was already set up, without any intervention on our part.

We use S3 for data storage, CloudWatch to monitor that storage, EC2 to run the virtual machines associated with Databricks jobs, and Databricks for Python and SQL processing.
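
For illustration, a minimal sketch of this S3-to-Databricks pattern might look like the following; the bucket, prefix, and column names are placeholders, not the client's actual resources:

```python
# Minimal sketch of the S3 -> Databricks pattern described above.
# Bucket name, prefix, and column names are illustrative placeholders.
from pyspark.sql import functions as F

RAW_PATH = "s3://example-bus-telemetry/raw/events/"   # hypothetical bucket/prefix

# In a Databricks notebook the `spark` session is already available.
events = (
    spark.read
    .json(RAW_PATH)                                   # raw vehicle events landed in S3
    .withColumn("event_date", F.to_date("event_timestamp"))
)

# Expose the data to the SQL-based processing steps.
events.createOrReplaceTempView("vehicle_events")
spark.sql(
    "SELECT event_date, COUNT(*) AS events FROM vehicle_events GROUP BY event_date"
).show()
```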

Security threat management is not handled directly by our team. We only ensure that Databricks runtimes are kept up to date with the latest improvements and fixes.

Performance monitoring is done through a Databricks job that tracks several metrics (a sketch of two of these checks follows the list):

  1. Cluster usage (in minutes) to process collected events.
  2. Number of files rejected per day.
  3. Number of unique VINs.
  4. Non-standard dates over the last two weeks.
  5. Event delays over the last 120 days.
  6. PCM and Intellibus events per day.
  7. Unknown VINs by date and distinct unknown VINs.
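
As an illustration, a minimal PySpark sketch of two of these checks might look like the following; the table names (processed_events, rejected_files) and column names (vin, event_date) are assumptions, not the actual schema:

```python
# Hedged sketch of two of the monitored metrics; table and column names
# (processed_events, rejected_files, vin, event_date) are assumptions.
from pyspark.sql import functions as F

events = spark.table("processed_events")
rejected = spark.table("rejected_files")

# Metric 3: number of unique VINs seen in the processed data.
unique_vins = events.select(F.countDistinct("vin").alias("unique_vins"))

# Metric 2: number of files rejected per day.
rejected_per_day = (
    rejected.groupBy("event_date")
    .count()
    .withColumnRenamed("count", "rejected_files")
    .orderBy("event_date")
)

unique_vins.show()
rejected_per_day.show()
```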

Solution architecture

KPIs currently being tracked include (a cost-estimation sketch follows the list):

  1. Cost of each cluster based on processing time.
  2. Cluster usage in minutes, depending on the types of machines used.
  3. Comparison of costs between different cluster types to optimize resources.
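
As a hedged illustration of how such KPIs can be derived, the sketch below computes an approximate cost per cluster from usage minutes; the cluster names, instance types, and per-minute rates are placeholder assumptions, not actual pricing:

```python
# Illustrative cost estimate from cluster usage minutes; the usage data,
# instance types, and per-minute rates are placeholder assumptions.
usage_minutes = {
    ("etl-cluster", "i3.xlarge"): 5400,
    ("etl-cluster", "i3.2xlarge"): 1200,
    ("reporting-cluster", "m5.xlarge"): 900,
}

# Assumed blended AWS + Databricks rate per instance-minute (USD).
rate_per_minute = {
    "i3.xlarge": 0.012,
    "i3.2xlarge": 0.024,
    "m5.xlarge": 0.009,
}

for (cluster, instance_type), minutes in usage_minutes.items():
    cost = minutes * rate_per_minute[instance_type]
    print(f"{cluster:20s} {instance_type:12s} {minutes:6d} min  ~${cost:,.2f}")
```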

Incidents are discussed during sprint review meetings or by email. If necessary, a ticket is created and tracked in Azure DevOps for processing by the Akkodis teams.

Before each update, we check:

  1. Correct configuration of the Databricks cluster (see the sketch after this list).
  2. The availability of the associated EC2 instances.
  3. That the notebooks are correctly versioned and pushed to GitLab.
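
As an example of the first check, the following sketch queries the Databricks Clusters REST API (GET /api/2.0/clusters/get); the workspace URL, token, and cluster id are placeholders:

```python
# Hedged sketch of the cluster-configuration check using the Databricks
# Clusters REST API. Workspace URL, token, and cluster id are placeholders.
import requests

WORKSPACE = "https://example.cloud.databricks.com"   # hypothetical workspace URL
TOKEN = "dapi-REDACTED"                              # personal access token (placeholder)
CLUSTER_ID = "0123-456789-abcde"                     # placeholder cluster id

resp = requests.get(
    f"{WORKSPACE}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"cluster_id": CLUSTER_ID},
    timeout=30,
)
resp.raise_for_status()
cluster = resp.json()

# Basic sanity checks before rolling out an update.
print("State:          ", cluster.get("state"))
print("Runtime version:", cluster.get("spark_version"))
print("Node type:      ", cluster.get("node_type_id"))
```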

Network security management is not directly our responsibility and is handled by a client-side team.

The management of in-flight data encryption and certificates (including certificate renewal and TLS protocol configuration) is not our responsibility and is handled by a client-side team.

Job deployment is primarily handled by Databricks, which provides the tools to schedule and execute them. We also integrated GitLab to version notebooks and allow jobs to be automatically triggered from there. However, this setup is not a typical CI/CD pipeline, as it does not include automated phases like testing or validation before deployment.
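
As a rough sketch of how a job run can be triggered once notebooks have been pushed to GitLab, the following calls the Databricks Jobs API (POST /api/2.1/jobs/run-now); the workspace URL, token, and job id are placeholders:

```python
# Hedged sketch of triggering a Databricks job from a GitLab pipeline step
# using the Jobs API. Workspace URL, token, and job id are placeholders.
import requests

WORKSPACE = "https://example.cloud.databricks.com"   # hypothetical workspace URL
TOKEN = "dapi-REDACTED"                              # personal access token (placeholder)
JOB_ID = 123456                                      # placeholder job id

resp = requests.post(
    f"{WORKSPACE}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID},
    timeout=30,
)
resp.raise_for_status()
print("Triggered run:", resp.json().get("run_id"))
```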

RTO and RPO are not formally defined in our current environment; they rely on standard Databricks and AWS configurations.

Our architecture is deployed in a single AWS region. We have no visibility into how Multi-AZ is used or configured.

TCO analysis focuses on costs related to:

  1. Storing data in S3, depending on volume and frequency of access (see the sketch after this list).
  2. Running Databricks clusters, including usage time and the size of the configured instances.
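
As a back-of-envelope illustration of the S3 part of this analysis, the sketch below sums object sizes under a prefix and applies an assumed per-GB-month rate; the bucket, prefix, and rate are placeholders, not actual pricing:

```python
# Rough TCO sketch for the S3 storage component: sum object sizes under a
# prefix and apply an assumed per-GB-month rate. Bucket, prefix, and rate
# are placeholder assumptions.
import boto3

BUCKET = "example-bus-telemetry"     # hypothetical bucket
PREFIX = "raw/events/"
RATE_PER_GB_MONTH = 0.023            # assumed S3 Standard rate (USD)

s3 = boto3.client("s3")
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

total_gb = total_bytes / (1024 ** 3)
print(f"Stored: {total_gb:,.1f} GB  ->  ~${total_gb * RATE_PER_GB_MONTH:,.2f} / month")
```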

The result

  • The data chains are functional, and we ensure their maintenance and operation.
  • Control of the scope.
  • Mastery of the data chains in an AWS environment.
  • Mastery of the migration processes.