Tutorial: Uploading files to Pre-ingest File Storage

Note

The following tutorial assumes you have experience working with command-line tools under Linux.

This tutorial does not cover IDA users, who have a different workflow that does not use the Pre-Ingest File Storage API.

Here, we go through an example workflow that uses upload-rest-api-client to upload files to Pre-Ingest File Storage. Files uploaded to Pre-Ingest File Storage cannot currently be described using Qvain, so metax-access is used to create a dataset.

Install tools

To get started, you need to install upload-rest-api-client and metax-access. If you’re running CentOS, you can install the required system packages (Python 3 and Git) using the following command:

$ sudo yum install -y python3 git

Clone the repositories and install them using pip:

$ # Create and activate virtualenv
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip==20.3.4 setuptools
$ # Install metax-access
$ git clone https://github.com/Digital-Preservation-Finland/metax-access.git
$ cd metax-access
$ pip install -r requirements_dev.txt
$ pip install .
$ cp include/etc/metax.cfg ~/.metax.cfg
$ cd ..
$ # Install upload-rest-api-client
$ git clone https://github.com/Digital-Preservation-Finland/upload-rest-api-client.git
$ cd upload-rest-api-client
$ pip install .
$ cp include/upload.cfg ~/.upload.cfg

Whenever you need to use the metax_access and upload-client commands, activate the virtualenv you created:

$ source venv/bin/activate

To deactivate the virtualenv, use

$ deactivate
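
If a command is later reported as not found, the virtualenv is most likely not active. The following is a minimal sketch of a check for that, assuming a POSIX shell; check_tool is a hypothetical helper name, not part of either tool.

```shell
# check_tool NAME: report whether NAME can be found on the current PATH.
# Hypothetical helper for illustration only.
check_tool() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found: $1 (is the virtualenv active?)"
        return 1
    fi
}

# Check both commands used in this tutorial; keep going if one is missing
check_tool upload-client || true
check_tool metax_access || true
```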

Configuration

During installation, two configuration files were created. Adjust them as necessary.

~/.upload.cfg should contain either the Pre-Ingest File Storage token or the username and password. A token is preferred and can be created using the web UI (https://manage.fairdata.fi/).

$ cat ~/.upload.cfg
[upload]
host=https://manage.fairdata.fi/filestorage/api
user=username
password=********************
token=********
default_project=

~/.metax.cfg should contain the Metax access token (https://metax.fairdata.fi/secure/).

$ cat ~/.metax.cfg
[metax]
host=https://metax.fairdata.fi
token=**************
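
Before moving on, it can be useful to confirm that both files exist and that a token was actually filled in. The sketch below assumes the simple key=value layout shown above, with no spaces around the equals sign; cfg_value is a hypothetical helper. A missing token in ~/.upload.cfg is not necessarily an error, since a username and password can be used instead.

```shell
# cfg_value FILE KEY: print the value of KEY from a simple key=value config file.
# Hypothetical helper; it ignores section headers and assumes no spaces around '='.
cfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Report config files that are missing or have an empty token
for cfg in "$HOME/.upload.cfg" "$HOME/.metax.cfg"; do
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg"
    elif [ -z "$(cfg_value "$cfg" token)" ]; then
        echo "no token set in $cfg"
    fi
done
```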

Upload files

Once you have activated the virtualenv and configured the tools, you can get started by creating a tar archive containing the files you want to upload. You can use an archive manager of your choice for this.

Note

Both tar and zip archives are supported. tar archives additionally support gzip and bz2 compression.

This tutorial uses tar since most Linux systems come with a pre-installed tar command.

This example uses sample_archive.tar with the following structure:

$ tar -tf sample_archive.tar
sample_dir/
sample_dir/sample1.tiff
sample_dir/sample2.tiff
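
An archive with this structure could be created roughly as follows. This is a sketch only: the two files are empty placeholders standing in for real TIFF images, and the order in which tar lists the entries may differ on your system.

```shell
# Create a directory with two placeholder files (stand-ins for real TIFF images)
mkdir -p sample_dir
touch sample_dir/sample1.tiff sample_dir/sample2.tiff

# Pack the directory into an uncompressed tar archive
tar -cf sample_archive.tar sample_dir

# List the archive contents to verify the structure
tar -tf sample_archive.tar
```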

List available projects:

$ upload-client list-projects

Project           Used quota    Quota
--------------  ------------  -------
test_project_a          1024  1024000
test_project_b          4096  4096000

Upload the tar archive to the selected project in Pre-Ingest File Storage:

$ upload-client upload --project test_project_a sample_archive.tar --target archive_dir
.
Uploaded 'sample_archive.tar'
.
Generated metadata for directory: /archive_dir (identifier: 172fa901cf5b3e2188766d0383a6d5fb)

The directory contains subdirectories:
sample_dir (identifier: 59879acead5c3f61ad7cdf504536af64)
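
The identifiers in this output are needed when creating the dataset. If you prefer not to copy them by hand, they can be pulled out of saved output with sed. The sketch below works on a copy of the sample output above and assumes the real output keeps the same line format; the file name upload.log is an arbitrary choice.

```shell
# Sample output copied from the upload command above. In practice you would
# save the real output of the upload command to this file instead.
cat > upload.log <<'EOF'
Uploaded 'sample_archive.tar'
Generated metadata for directory: /archive_dir (identifier: 172fa901cf5b3e2188766d0383a6d5fb)

The directory contains subdirectories:
sample_dir (identifier: 59879acead5c3f61ad7cdf504536af64)
EOF

# Extract the identifier of the target directory
root_id=$(sed -n 's/^Generated metadata for directory: .* (identifier: \([0-9a-f]*\))$/\1/p' upload.log)

# Extract "name identifier" pairs for the listed subdirectories
sub_ids=$(sed -n 's/^\([^ ]*\) (identifier: \([0-9a-f]*\))$/\1 \2/p' upload.log)

echo "root: $root_id"
echo "subdirectories: $sub_ids"
```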

Note

You can also set default_project in ~/.upload.cfg, in which case you don’t need to provide the --project parameter.

Create a dataset

Get a template dataset and save it to the file dataset.json:

$ metax_access get template dataset --output dataset.json

Edit the metadata. Set the data_catalog field to urn:nbn:fi:att:data-catalog-pas and add the uploaded directory to the directories list, using the directory identifier printed by the upload command.

$ cat dataset.json
{
    "research_dataset": {
        "publisher": {
            "member_of": {
                "name": {
                    "fi": "Testiorganisaatio"
                },
                "@type": "Organization"
            },
            "name": "Teppo Testaaja",
            "@type": "Person"
        },
        "description": {
            "en": "A descriptive description describing the contents of this dataset. Must be descriptive."
        },
        "creator": [
            {
                "member_of": {
                    "name": {
                        "fi": "Testiorganisaatio"
                    },
                    "@type": "Organization"
                },
                "name": "Teppo Testaaja",
                "@type": "Person"
            }
        ],
        "issued": "2019-01-01",
        "title": {
            "en": "Upload-rest-api test dataset"
        },
        "access_rights": {
            "access_type": {
                "identifier": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
            }
        },
        "directories": [
            {
                "identifier": "<replace with subdirectory identifier from previous command>",
                "title": "Title for directory",
                "use_category": {
                    "in_scheme": "http://uri.suomi.fi/codelist/fairdata/use_category",
                    "identifier": "http://uri.suomi.fi/codelist/fairdata/use_category/code/source",
                    "pref_label": {
                        "en": "Source material",
                        "fi": "Lähdeaineisto",
                        "und": "Lähdeaineisto"
                    }
                }
            }
        ]
    },
    "data_catalog": "urn:nbn:fi:att:data-catalog-pas"
}
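
Hand-edited JSON is easy to break, so it may be worth checking the file before posting it. The following sketch uses Python's standard json.tool module to check the syntax and grep to look for a leftover placeholder like the one shown above; check_dataset is a hypothetical helper and does not validate the metadata against Metax itself.

```shell
# check_dataset FILE: hypothetical helper that reports common editing mistakes
check_dataset() {
    # python3 -m json.tool exits with a non-zero status if FILE is not valid JSON
    if ! python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "invalid JSON: $1"
        return 1
    fi
    # A leftover placeholder means the directory identifier was not filled in
    if grep -q '<replace with' "$1"; then
        echo "placeholder identifier still present in $1"
        return 1
    fi
    echo "ok: $1"
}

# Usage, once dataset.json exists in the current directory:
# check_dataset dataset.json
```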

Finally, post the dataset to Metax:

$ metax_access post dataset dataset.json

You can now start managing your dataset using the Fairdata DPS web application!