DVC and Git are command-line tools by default, so working with datasets from the CLI is easy, comfortable, and powerful.
Before reading this section, it can help to read the Technical Details to understand how datasets work. You will also find more examples in the DVC documentation.
The Imaging Platform exposes a special endpoint that behaves as a DVC HTTP remote and uses a custom authentication method based on a GitLab token. The following snippet shows how DVC can be configured to use the platform storage site. For further details, see dvc-remote-repo.
[core]
remote = 1magingPlatform
['remote "1magingPlatform"']
url = https://jisap.tecnalia.com/remote/?remote=69
auth = custom
custom_auth_header = X-Token
ssl_verify = False
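The same settings can also be written with DVC commands instead of editing the file by hand. The following is a minimal sketch, assuming an already initialized DVC repository; configure_platform_remote is a hypothetical helper name, and the remote name and URL are passed as arguments so they can match whatever the configuration declares.

```shell
# Sketch: build the remote configuration shown above with DVC commands.
# configure_platform_remote is a hypothetical helper, not part of DVC.
configure_platform_remote() {
    local name="$1" url="$2"
    # -d marks the remote as the default one ([core] remote = <name>)
    dvc remote add -d "$name" "$url" &&
    dvc remote modify "$name" auth custom &&
    dvc remote modify "$name" custom_auth_header X-Token &&
    dvc remote modify "$name" ssl_verify false
}

# Example call, run inside the dataset repository:
# configure_platform_remote 1magingPlatform "https://jisap.tecnalia.com/remote/?remote=69"
```

Each command edits .dvc/config, so the result can be compared against the snippet above.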
To provide the GitLab token, run the following command inside the repository. It creates a config.local file holding the credentials. The remote name used in the command must match the name declared in the configuration.
dvc remote modify --local jisap-tecnalia password <your-gitlab-token>
After this setup, all the dvc operations that interact with the remote will work seamlessly.
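To keep the token out of the typed command line, it can be read from an environment variable. This is a minimal sketch, not a platform requirement: set_remote_token and GITLAB_TOKEN are hypothetical names chosen for illustration, and the remote name is passed as an argument.

```shell
# Sketch: store the GitLab token in .dvc/config.local from an environment variable.
# set_remote_token and GITLAB_TOKEN are hypothetical names.
set_remote_token() {
    local name="$1"
    if [ -z "${GITLAB_TOKEN:-}" ]; then
        echo "GITLAB_TOKEN is not set" >&2
        return 1
    fi
    # --local writes to .dvc/config.local instead of the shared .dvc/config
    dvc remote modify --local "$name" password "$GITLAB_TOKEN"
}

# Example call, run inside the dataset repository:
# GITLAB_TOKEN=... set_remote_token jisap-tecnalia
```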
This is the terminal flow to add image.png to a dataset.
## Download dataset
export DATASET=http://jisap.tecnalia.com/namespace/dataset.git
git clone $DATASET dataset
## Add data
cp image.png dataset/images/image.png
## Version it
cd dataset
dvc add images/image.png
git add images/image.png.dvc images/.gitignore
git commit -m "added image.png"
## Upload changes
dvc remote modify --local jisap-tecnalia password your-gitlab-token
dvc push
git push
Refer to the dvc add documentation for a detailed explanation.
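The add, version, and upload steps can be wrapped in one function for repeated use. The sketch below assumes it is run from the root of the cloned dataset, that the images directory exists, and that the remote credentials are already configured; add_to_dataset is a hypothetical helper name.

```shell
# Sketch: copy one file into images/, track it with DVC, and publish it.
# add_to_dataset is a hypothetical helper; run it from the dataset repository root.
add_to_dataset() {
    local src="$1"
    local dest="images/$(basename "$src")"
    cp "$src" "$dest" &&
    # dvc add creates <file>.dvc and lists the data file in images/.gitignore
    dvc add "$dest" &&
    git add "$dest.dvc" images/.gitignore &&
    git commit -m "added $(basename "$src")" &&
    # the data goes to the DVC remote, the pointer file goes to Git
    dvc push &&
    git push
}

# Example call:
# add_to_dataset ../image.png
```

The steps are chained with &&, so a failure at any point stops the flow before anything is pushed.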
Downloading files is straightforward with DVC.
## Download dataset
export DATASET=http://jisap.tecnalia.com/namespace/dataset.git
git clone $DATASET dataset
cd dataset
## Pull files
dvc remote modify --local jisap-tecnalia password your-gitlab-token
dvc pull
Switching between dataset versions is performed with the standard git checkout command followed by dvc checkout, which updates the data files to match the current branch.
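The two checkout commands can be combined as follows. This is a minimal sketch, assuming it is run inside the dataset repository; switch_dataset_version is a hypothetical helper name, and the revision can be any branch, tag, or commit that Git accepts.

```shell
# Sketch: switch the dataset to another Git revision and sync the data files.
# switch_dataset_version is a hypothetical helper, not part of DVC or Git.
switch_dataset_version() {
    local rev="$1"
    # git checkout updates the .dvc pointer files;
    # dvc checkout then updates the data files to match them
    git checkout "$rev" &&
    dvc checkout
}

# Example call:
# switch_dataset_version my-branch
```

dvc checkout restores files from the local cache, so if the data for that revision has never been downloaded, a dvc pull is needed to fetch it from the remote.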