Tutorial: Uploading files to Pre-Ingest File Storage¶
Note
The following tutorial assumes you have experience working with command-line tools under Linux.
This tutorial does not cover IDA users, who have a different workflow that does not use the Pre-Ingest File Storage API.
Here, we go through an example workflow that uses upload-rest-api-client to upload files to Pre-Ingest File Storage. Files uploaded to Pre-Ingest File Storage cannot currently be described using Qvain, so metax-access is used to create the dataset.
Install tools¶
To get started, you need to install upload-rest-api-client and metax-access. If you’re running CentOS, you can install the required Python dependencies using the following command.
$ sudo yum install -y python3 git
Clone the repositories and install them using pip:
$ # Create and activate virtualenv
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip==20.3.4 setuptools
$ # Install metax-access
$ git clone https://github.com/Digital-Preservation-Finland/metax-access.git
$ cd metax-access
$ pip install -r requirements_dev.txt
$ pip install .
$ cp include/etc/metax.cfg ~/.metax.cfg
$ cd ..
$ # Install upload-rest-api-client
$ git clone https://github.com/Digital-Preservation-Finland/upload-rest-api-client.git
$ cd upload-rest-api-client
$ pip install .
$ cp include/upload.cfg ~/.upload.cfg
Whenever you need to use the metax_access and upload-client commands, activate the virtualenv you created with the command:
$ source venv/bin/activate
To deactivate the virtualenv, use:
$ deactivate
Configuration¶
During installation, two configuration files were created. Adjust them as necessary.
~/.upload.cfg should contain either the Pre-Ingest File Storage token or the username and password. The token is preferred and can be created using the web UI (https://manage.fairdata.fi/).
$ cat ~/.upload.cfg
[upload]
host=https://manage.fairdata.fi/filestorage/api
user=username
password=********************
token=********
default_project=
~/.metax.cfg should contain the Metax access token (https://metax.fairdata.fi/secure/).
$ cat ~/.metax.cfg
[metax]
host=https://metax.fairdata.fi
token=**************
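Before running the tools, it can help to check that both configuration files contain the expected credentials. The following is a minimal sketch, not part of upload-rest-api-client or metax-access: the helper names are hypothetical, and the "token first, then username and password" order only mirrors the recommendation above. How the client itself chooses between them is an assumption.

```python
import configparser


def load_section(cfg_text, section):
    """Parse INI text and return one section as a plain dict (empty if missing)."""
    # interpolation=None so that '%' characters in passwords or tokens are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return dict(parser[section]) if parser.has_section(section) else {}


def upload_auth_method(cfg_text):
    """Return 'token', 'password' or None for the [upload] section.

    Assumption: a non-empty token takes precedence over user/password.
    """
    settings = load_section(cfg_text, "upload")
    if settings.get("token", "").strip():
        return "token"
    if settings.get("user", "").strip() and settings.get("password", "").strip():
        return "password"
    return None


def has_metax_token(cfg_text):
    """Return True if the [metax] section has a non-empty token."""
    return bool(load_section(cfg_text, "metax").get("token", "").strip())
```

To check your own files, read them with pathlib.Path("~/.upload.cfg").expanduser().read_text() and pass the result to upload_auth_method; do the same with ~/.metax.cfg and has_metax_token.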
Upload files¶
Once you have activated the virtualenv and configured the tools, you can get started by creating a tar archive with files. You can use an archive manager of your choice for this.
Note
Both tar and zip archives are supported. tar archives additionally support gzip and bz2 compression.
This tutorial uses tar since most Linux systems come with a pre-installed tar command.
This example uses sample_archive.tar with the following structure:
$ tar -tf sample_archive.tar
sample_dir/
sample_dir/sample1.tiff
sample_dir/sample2.tiff
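If you do not have suitable files at hand, an archive with the same layout can be produced with Python's standard tarfile module. This is only a sketch for trying out the workflow: the function names are hypothetical and the files contain placeholder bytes, not valid TIFF images.

```python
import tarfile
import tempfile
from pathlib import Path


def build_sample_archive(workdir):
    """Create sample_dir with two placeholder files and pack it into a tar archive."""
    workdir = Path(workdir)
    sample_dir = workdir / "sample_dir"
    sample_dir.mkdir(parents=True, exist_ok=True)
    for name in ("sample1.tiff", "sample2.tiff"):
        # Placeholder content only; these are not real TIFF images.
        (sample_dir / name).write_bytes(b"placeholder")

    archive_path = workdir / "sample_archive.tar"
    with tarfile.open(archive_path, "w") as archive:
        # arcname stores the members under "sample_dir/" instead of the full path.
        archive.add(sample_dir, arcname="sample_dir")
    return archive_path


def list_archive(archive_path):
    """Return the member names, similar to what `tar -tf` prints."""
    with tarfile.open(archive_path) as archive:
        return archive.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for member in list_archive(build_sample_archive(tmp)):
            print(member)
```

Unlike tar -tf, getnames() reports the directory without a trailing slash.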
List available projects:
$ upload-client list-projects
Project Used quota Quota
-------------- ------------ -------
test_project_a 1024 1024000
test_project_b 4096 4096000
Upload the tar archive to the selected project in the file storage:
$ upload-client upload --project test_project_a sample_archive.tar --target archive_dir
.
Uploaded 'sample_archive.tar'
.
Generated metadata for directory: /archive_dir (identifier: 172fa901cf5b3e2188766d0383a6d5fb)
The directory contains subdirectories:
sample_dir (identifier: 59879acead5c3f61ad7cdf504536af64)
Note
You can also specify a default project in ~/.upload.cfg, in which case you don’t need to provide the --project parameter.
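The directory identifiers printed by the upload command are needed when the dataset is created. If you capture the command's output, they can be pulled out with a small script. The sketch below assumes the output looks exactly like the example above ("<path> (identifier: <hex>)", optionally prefixed with "Generated metadata for directory:"). That format is taken from this tutorial, not from a documented interface, and the function name is hypothetical.

```python
import re

# Assumed line shape, based only on the example output in this tutorial.
# Paths containing spaces are not handled.
IDENTIFIER_LINE = re.compile(
    r"^(?:Generated metadata for directory:\s*)?"
    r"(?P<path>\S+)\s+\(identifier:\s*(?P<identifier>[0-9a-f]+)\)$"
)


def parse_identifiers(output):
    """Map each directory path found in the output to its identifier."""
    identifiers = {}
    for line in output.splitlines():
        match = IDENTIFIER_LINE.match(line.strip())
        if match:
            identifiers[match.group("path")] = match.group("identifier")
    return identifiers


SAMPLE_OUTPUT = """\
.
Uploaded 'sample_archive.tar'
.
Generated metadata for directory: /archive_dir (identifier: 172fa901cf5b3e2188766d0383a6d5fb)
The directory contains subdirectories:
sample_dir (identifier: 59879acead5c3f61ad7cdf504536af64)
"""

if __name__ == "__main__":
    for path, identifier in parse_identifiers(SAMPLE_OUTPUT).items():
        print(path, identifier)
```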
Create a dataset¶
Get a template dataset and save it to the file dataset.json:
$ metax_access get template dataset --output dataset.json
Edit the metadata. Use the value urn:nbn:fi:att:data-catalog-pas for the data_catalog field and add the uploaded directory to the directories list, using the directory identifier printed by the upload command in the previous section.
$ cat dataset.json
{
  "research_dataset": {
    "publisher": {
      "member_of": {
        "name": {
          "fi": "Testiorganisaatio"
        },
        "@type": "Organization"
      },
      "name": "Teppo Testaaja",
      "@type": "Person"
    },
    "description": {
      "en": "A descriptive description describing the contents of this dataset. Must be descriptive."
    },
    "creator": [
      {
        "member_of": {
          "name": {
            "fi": "Testiorganisaatio"
          },
          "@type": "Organization"
        },
        "name": "Teppo Testaaja",
        "@type": "Person"
      }
    ],
    "issued": "2019-01-01",
    "title": {
      "en": "Upload-rest-api test dataset"
    },
    "access_rights": {
      "access_type": {
        "identifier": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
      }
    },
    "directories": [
      {
        "identifier": "<replace with subdirectory identifier from previous command>",
        "title": "Title for directory",
        "use_category": {
          "in_scheme": "http://uri.suomi.fi/codelist/fairdata/use_category",
          "identifier": "http://uri.suomi.fi/codelist/fairdata/use_category/code/source",
          "pref_label": {
            "en": "Source material",
            "fi": "Lähdeaineisto",
            "und": "Lähdeaineisto"
          }
        }
      }
    ]
  },
  "data_catalog": "urn:nbn:fi:att:data-catalog-pas"
}
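Instead of editing the file by hand, the two required changes (the data_catalog value and the directories entry) can be applied with a short script. The sketch below is an illustration only: the helper names are hypothetical, the use_category block is copied from the example above, and the contents of the template returned by metax_access are not assumed beyond being a JSON object.

```python
import copy
import json

PAS_CATALOG = "urn:nbn:fi:att:data-catalog-pas"

# Copied from the example dataset in this tutorial.
SOURCE_USE_CATEGORY = {
    "in_scheme": "http://uri.suomi.fi/codelist/fairdata/use_category",
    "identifier": "http://uri.suomi.fi/codelist/fairdata/use_category/code/source",
    "pref_label": {
        "en": "Source material",
        "fi": "Lähdeaineisto",
        "und": "Lähdeaineisto",
    },
}


def fill_dataset(dataset, directory_identifier, title="Title for directory"):
    """Return a copy of the dataset with the catalog and directory entry set."""
    updated = copy.deepcopy(dataset)
    updated["data_catalog"] = PAS_CATALOG
    research = updated.setdefault("research_dataset", {})
    # Replace any existing directories list with a single entry.
    research["directories"] = [
        {
            "identifier": directory_identifier,
            "title": title,
            "use_category": copy.deepcopy(SOURCE_USE_CATEGORY),
        }
    ]
    return updated


def update_file(path, directory_identifier):
    """Read a dataset JSON file, fill it in, and write it back in place."""
    with open(path, encoding="utf-8") as handle:
        dataset = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(fill_dataset(dataset, directory_identifier), handle,
                  indent=2, ensure_ascii=False)
```

For example, update_file("dataset.json", "<identifier from the upload output>") rewrites the template in place; the remaining descriptive fields still need to be edited manually.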
Finally, post the dataset to Metax:
$ metax_access post dataset dataset.json
You can now start managing your dataset using the Fairdata DPS web application!