Overview

The Imaging Platform is meant to provide powerful support to the developer throughout the typical stages of an AI project workflow. The following diagram is a simple representation of a common AI workflow:

```mermaid
graph LR;
  A(Data acquisition) --> B(Data pre-processing)
  B --> C{AI algorithms}
  C -->|Problem type| D(Identify, incorporate code)
  C -->|AI method| E(Configure/adapt coding)
  D --> F(Training & testing)
  E --> F
  F --> G(Evaluation)
```

The content of the Datasets is actually stored outside the platform. Several Storage Sites can be configured and connected to the Imaging Platform as external repositories. Within the Data acquisition stage, the role of the platform is to provide a common way of creating an image collection, to ensure it is managed under a version control tool (DVC), and to integrate it into the project workflow by means of GitLab.
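As a sketch of these steps, assuming a dataset repository already created on GitLab and an S3-compatible Storage Site (all repository, bucket, and directory names below are hypothetical), creating and uploading an image collection with DVC might look like this:

```shell
# Clone the (hypothetical) dataset repository created on GitLab
git clone https://gitlab.example.com/datasets/my-image-collection.git
cd my-image-collection

# Enable DVC version control in the repository
dvc init

# Register an external Storage Site as the default DVC remote
# (endpoint and bucket are placeholders)
dvc remote add -d storage-site s3://imaging-platform-bucket/my-image-collection

# Track the image collection with DVC instead of git
dvc add images/

# Commit the small .dvc metadata files to git...
git add images.dvc .gitignore .dvc/config
git commit -m "Add initial image collection"
git push

# ...and push the actual image content to the Storage Site
dvc push
```

With this split, GitLab versions only the lightweight `.dvc` pointer files, while the heavy image content lives on the Storage Site and is fetched on demand with `dvc pull`.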

The platform is also designed to define and promote a common structure for project development and experiment tracking, in such a way that the output components can be easily understandable and reusable.

More specifically, and according to the typical stages depicted above, the Imaging Platform provides support via the following workflow mappings:

```mermaid
graph TD;
  A(Data acquisition) -->|dataset definition| B(Dataset creation)
  B -->|Gitlab integration| C(Gitlab repo powered by DVC)
  C -->|DVC version control| D(Content uploading)
  D -->|data preparation| E(Code project creation)
  E -->|pre-processed dataset| F(Sync Gitlab repo)
  F -->|AI algorithms| G(AI project creation - fork)
  G -->|training/testing| H(Execution workflow)
  H -->|evaluation| I(Component/Model publication)
```

In the same way as Storage Sites, the platform may rely on external Execution endpoints, intended to perform resource-intensive tasks (training and testing).

Relationships

The platform differentiates between Datasets and Projects, but both are git repositories hosted on GitLab with DVC enabled. The relationship between them is tracked through fork relationships.

As an example, this diagram illustrates what a real structure might look like.

```mermaid
graph LR
  Dataset1([Dataset1])
  Dataset2([Dataset2])
  Project1(Project1)
  Project2(Project2)
  Project3(Project3)
  Dataset1 --> Project1
  Project1 --> Dataset2
  Dataset2 --> Project2
  Dataset1 --> Project3
```

The idea behind the pattern is to easily track which projects are based on a dataset and which projects generate different datasets.

This fork relationship offers a great advantage in code synchronization.

  • If a project creates a new version of a dataset, it can be easily integrated via a merge request.
  • If a project generates a totally new dataset, it can be created as a fork.
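The first case above can be sketched as follows, assuming a project fork that pre-processes an upstream dataset (repository and directory names are hypothetical):

```shell
# Inside the project fork of the upstream dataset repository
cd my-preprocessing-project

# Fetch the current dataset content from the Storage Site
dvc pull

# ... run the pre-processing code, updating images/ ...

# Record the new dataset version with DVC
dvc add images/
git add images.dvc
git commit -m "Add pre-processed dataset version"

# Push the metadata to GitLab and the content to the Storage Site
git push origin main
dvc push

# Finally, open a merge request from the fork back to the upstream
# dataset repository in the GitLab UI (or via the GitLab API)
```

Because only the `.dvc` pointer files change in git, the merge request stays small and reviewable even when the underlying image content is large.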