Usage

Imaging Platform projects are designed to track their relationship with a given dataset, so that experiments are easy to reproduce. Because they are DVC-powered git repositories, they can also version experiment results.

Project Creation

A project can be created from the web UI: go to the Dataset view, select the desired dataset, and click the “start new project” button. This creates a fork of the dataset within the given namespace.

The folder structure of the dataset is preserved in the project, so storing dataset files in a known, consistent structure makes code integration easier.

Project Templates

Project templates allow you to get started quickly. They aim to standardize conventions across developers and provide support for automated, programmable actions.

Templates are like building blocks that can be added to your project on demand. Once you identify the blocks that you are going to need in your application, the platform will add them to your project fork.

The current implementation copies the content of the template into the project and opens a merge request with the changes. This way you can review the output, modify it, and accept it using the regular git workflow.

Dataset download

Because the project is a fork of the dataset, you can download its files the same way you would for the dataset.

## Download project
PROJECT=http://jisap.tecnalia.com/namespace/project.git
git clone "$PROJECT" project
cd project

## Pull files
dvc remote modify --local jisap-basf password your-gitlab-token
dvc pull

Artifact versioning

A project can generate different models from different experiments. Thanks to the integration with DVC, the outputs are automatically tracked and uploaded to the default storage site.

The platform allows you to browse the tracked files directly from the details view. From there, you can navigate to any file and copy a link to share that version with other members. This is especially useful in combination with git tags. For example, if you generate the initial version of a model and want to import it elsewhere, you can create a git tag to identify the version and then use that tag to generate a download URL.

https://jisap.tecnalia.com/api/v1/projects/<id>/files/path/to/model.h5?ref=v0.1.0

As long as you do not modify the path, you can keep working on your model. Once the next version is ready, simply create another git tag and update the ref value in the URL.
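The tagging step described above can be sketched with plain git commands. This is a minimal sketch, not part of the platform tooling: the helper name tag_model_version is hypothetical, and it assumes the project remote is called origin.

```shell
# Hypothetical helper: tag the current commit as a model version and publish the tag.
# $1 is the version tag, for example v0.2.0.
# Assumption: the project remote is named "origin".
tag_model_version() {
    version=$1
    # Create an annotated tag on the current commit.
    git tag -a "$version" -m "Model version $version" || return 1
    # Publish the tag so it can be used as the ref value in a download URL.
    git push origin "$version"
}

# Usage, from inside the project repository:
#   tag_model_version v0.2.0
```

Once the tag is pushed, replacing the ref value in the URL with the new tag name points the link at the new version.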

For more advanced users, the ref can be any valid git reference. This means that you can use master as the reference to get a URL that always downloads the latest version. The drawback is that the content behind the URL changes outside your control, so it can be a better idea to use commit hashes as references to ensure that you always get the same model.

The jisap-cli getUrl command can help you generate valid URLs.
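For illustration, the URL pattern shown earlier can also be assembled by hand. The sketch below is not the jisap-cli implementation: the function name jisap_file_url and the project id 42 are hypothetical placeholders, and the path is assumed to need no URL encoding.

```shell
# Hypothetical helper: build a download URL following the documented pattern
#   https://jisap.tecnalia.com/api/v1/projects/<id>/files/<path>?ref=<ref>
# $1: project id, $2: file path inside the repository, $3: any valid git reference.
# Assumption: the path contains no characters that require URL encoding.
jisap_file_url() {
    project_id=$1
    file_path=$2
    ref=$3
    printf 'https://jisap.tecnalia.com/api/v1/projects/%s/files/%s?ref=%s\n' \
        "$project_id" "$file_path" "$ref"
}

# Pinned to a tag: the link always resolves to the same model.
jisap_file_url 42 path/to/model.h5 v0.1.0
# prints https://jisap.tecnalia.com/api/v1/projects/42/files/path/to/model.h5?ref=v0.1.0

# Following a branch: the link resolves to the latest version on master.
jisap_file_url 42 path/to/model.h5 master
```

Passing a tag or a commit hash as the third argument gives a stable link, while a branch name gives a link whose content moves with the branch.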

Publish versions

Although the previous method of sharing URLs works fine, it lacks proper documentation and a link to the source code. For this purpose, you can generate a release page that includes not only the relevant download URLs, but also a link to the source code and a markdown description with detailed information about the version.

This process is based on GitLab Releases, so you can refer to the GitLab documentation for details.

All releases are visible in the project details view, together with the description provided. The description can use any markdown formatting.

The jisap-cli release command can be used to create a release from your local computer, which is useful for building custom automation scripts. It can also be used in GitLab pipelines, but the release pipeline instruction is better suited for that purpose.

Publish environments

After publishing a version, you will likely want to deploy the model as a service somewhere. How you do this is up to you and depends heavily on your needs. The platform can help you register where your model is available and display this information using GitLab environments. Once the environment is registered, it is visible in the project details view.

Dataset Update

If the dataset is updated after the project is created, the changes are not automatically integrated into the project. This ensures reproducibility and prevents unwanted changes. The dataset must be updated manually by merging the original dataset repository into the project repository.

One of the easiest ways to perform the merge is to add the dataset as a new remote of the existing project repository. This can be done with the following git command.

git remote add dataset <dataset-url>

After running this command, you will be able to see all the branches from the dataset and your current project with any git client of your choice. All git commands will also work as usual, but you will need to specify which remote to use. For example, to fetch new updates from the dataset you can run git fetch dataset.

Assuming that you want to bring the master branch of the dataset into your local master branch, first check out the local branch with git checkout master and then merge the remote branch with git merge dataset/master.

Pay special attention to conflicts in DVC files. See the DVC documentation to understand how to resolve them.
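The update steps above can be combined into a single sketch. The helper name update_from_dataset is hypothetical, and the remote name dataset and default branch master simply follow the examples in this section; adapt both to your repository.

```shell
# Hypothetical helper: merge dataset updates into a branch of the project.
# $1: URL of the original dataset repository.
# $2: branch to update (defaults to master).
# Assumption: the dataset remote is named "dataset", as in the command above.
update_from_dataset() {
    dataset_url=$1
    branch=${2:-master}
    # Register the dataset remote only if it does not exist yet.
    git remote get-url dataset >/dev/null 2>&1 \
        || git remote add dataset "$dataset_url" \
        || return 1
    # Download the latest dataset commits without touching local branches.
    git fetch dataset || return 1
    # Switch to the local branch and merge the dataset branch into it.
    git checkout "$branch" || return 1
    git merge "dataset/$branch"
}

# Usage, from inside the project repository:
#   update_from_dataset <dataset-url> master
```

If the merge stops on a conflict, the function returns a non-zero status and leaves the repository in the merging state so the conflict can be resolved by hand. After a successful merge, running dvc pull fetches the data files referenced by the updated DVC files.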

Dataset generation

It is a common pattern to create a project that modifies the content of a dataset. The output of such a project can be treated either as a new version of the original dataset or as an entirely new dataset. The platform handles both cases.

Create a new version

Following the Merge Request pattern, you can create a Merge Request from the project to the original dataset thanks to the fork relationship. At this point you can choose to update an existing branch in the original dataset or to create a new one.

It is recommended to remove all the code from the project before the merge. A dataset should not contain project code.

Create a new dataset

Following the same process used to create a new project from a dataset, you can fork the project to create a dataset. The easiest way to do this is from the project details view.