Tutorial: Uploading files to Pre-Ingest File Storage
====================================================

.. note::

    The following tutorial assumes you have experience working with
    command-line tools under Linux.

    This tutorial does not cover IDA users, who have a different workflow
    that does not use the Pre-Ingest File Storage API.

Here, we go through an example workflow of using
:ref:`upload-rest-api-client` to upload files to Pre-Ingest File Storage.
Files uploaded to Pre-Ingest File Storage currently cannot be described using
Qvain, so :doc:`metax-access` is used to create a dataset.

Install tools
-------------

To get started, you need to install upload-rest-api-client and metax-access.
If you're running CentOS, you can install the required Python dependencies
using the following command:

.. code-block:: console

    $ sudo yum install -y python3 git

Clone the repositories and install them using pip:

.. We're deviating from the "normal" installation instructions since running
   `make github` in metax-access and upload-rest-api-client would result in
   two different virtualenvs being created and activated, which the user would
   have to manage separately.

.. That makes running commands more complicated than it needs to be. The end
   goal could be that the following works instead:

   $ python3 -m venv venv
   $ source venv/bin/activate
   $ pip install --upgrade pip setuptools
   $ pip install git+https://github.com/Digital-Preservation-Finland/metax-access.git
   $ pip install git+https://github.com/Digital-Preservation-Finland/upload-rest-api-client.git

.. code-block:: console

    $ # Create and activate virtualenv
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip install --upgrade pip==20.3.4 setuptools

    $ # Install metax-access
    $ git clone https://github.com/Digital-Preservation-Finland/metax-access.git
    $ cd metax-access
    $ pip install -r requirements_dev.txt
    $ pip install .
    $ cp include/etc/metax.cfg ~/.metax.cfg
    $ cd ..

    $ # Install upload-rest-api-client
    $ git clone https://github.com/Digital-Preservation-Finland/upload-rest-api-client.git
    $ cd upload-rest-api-client
    $ pip install .
    $ cp include/upload.cfg ~/.upload.cfg

Whenever you need to use the ``metax_access`` and ``upload-client`` commands,
activate the virtualenv you created:

.. code-block:: console

    $ source venv/bin/activate

To deactivate the virtualenv, use:

.. code-block:: console

    $ deactivate

Configuration
-------------

During installation, two configuration files were created. Adjust them as
necessary.

``~/.upload.cfg`` should contain the Pre-Ingest File Storage token *or* the
username and password. The token is preferred and can be created in the web
UI (https://manage.fairdata.fi/).

.. code-block:: console

    $ cat ~/.upload.cfg
    [upload]
    host=https://manage.fairdata.fi/filestorage/api
    user=username
    password=********************
    token=********
    default_project=

``~/.metax.cfg`` should contain the Metax access token
(https://metax.fairdata.fi/secure/).

.. code-block:: console

    $ cat ~/.metax.cfg
    [metax]
    host=https://metax.fairdata.fi
    token=**************

Upload files
------------

Once you have activated the virtualenv and configured the tools, you can get
started by creating a ``tar`` archive containing the files you want to upload.
You can use an archive manager of your choice for this.

.. TODO: Python 3.6 also supports tar + lzma compression. Update that here
   once the migration is complete.

.. note::

    Both ``tar`` and ``zip`` archives are supported. ``tar`` archives
    additionally support ``gzip`` and ``bz2`` compression. This tutorial uses
    ``tar`` since most Linux systems come with a pre-installed ``tar``
    command.

This example uses ``sample_archive.tar`` with the following structure:

.. code-block:: console

    $ tar -tf sample_archive.tar
    sample_dir/
    sample_dir/sample1.tiff
    sample_dir/sample2.tiff

List available projects:

.. code-block:: console

    $ upload-client list-projects
    Project           Used quota    Quota
    --------------  ------------  -------
    test_project_a          1024  1024000
    test_project_b          4096  4096000

Upload the tar archive to the selected project in the file storage:

.. code-block:: console

    $ upload-client upload --project test_project_a sample_archive.tar --target archive_dir
    .
    Uploaded 'sample_archive.tar'
    .
    Generated metadata for directory: /archive_dir (identifier: 172fa901cf5b3e2188766d0383a6d5fb)
    The directory contains subdirectories:
    sample_dir (identifier: 59879acead5c3f61ad7cdf504536af64)

.. note::

    You can also specify a default project in ``~/.upload.cfg``, in which
    case you don't need to provide the ``--project`` parameter.

Create a dataset
----------------

Get a template dataset and save it to the file ``dataset.json``:

.. code-block:: console

    $ metax_access get template dataset --output dataset.json

Edit the metadata. Use value **urn:nbn:fi:att:data-catalog-pas** for the
**data_catalog** field and add the generated root directory to the
**directories** list. Use the ``parent_dir`` value returned by the previous
command.

.. Only the system user (not the end-user) requires that the
   `metadata_provider_org` and `metadata_provider_user` fields are included.
   The fields' omission from the template is deliberate and not a bug.

.. code-block:: console

    $ cat dataset.json
    {
        "research_dataset": {
            "publisher": {
                "member_of": {
                    "name": {
                        "fi": "Testiorganisaatio"
                    },
                    "@type": "Organization"
                },
                "name": "Teppo Testaaja",
                "@type": "Person"
            },
            "description": {
                "en": "A descriptive description describing the contents of this dataset. Must be descriptive."
            },
            "creator": [
                {
                    "member_of": {
                        "name": {
                            "fi": "Testiorganisaatio"
                        },
                        "@type": "Organization"
                    },
                    "name": "Teppo Testaaja",
                    "@type": "Person"
                }
            ],
            "issued": "2019-01-01",
            "title": {
                "en": "Upload-rest-api test dataset"
            },
            "access_rights": {
                "access_type": {
                    "identifier": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
                }
            },
            "directories": [
                {
                    "identifier": "",
                    "title": "Title for directory",
                    "use_category": {
                        "in_scheme": "http://uri.suomi.fi/codelist/fairdata/use_category",
                        "identifier": "http://uri.suomi.fi/codelist/fairdata/use_category/code/source",
                        "pref_label": {
                            "en": "Source material",
                            "fi": "Lähdeaineisto",
                            "und": "Lähdeaineisto"
                        }
                    }
                }
            ]
        },
        "data_catalog": "urn:nbn:fi:att:data-catalog-pas"
    }

Finally, post the dataset to Metax:

.. code-block:: console

    $ metax_access post dataset dataset.json

**You can now start managing your dataset using the Fairdata DPS web
application!**
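
The manual edit described under "Create a dataset" could also be scripted.
The following is a minimal sketch and not part of metax-access or
upload-rest-api-client: the helper names ``fill_dataset`` and
``update_dataset_file`` are hypothetical, and the code assumes that
``dataset.json`` has the structure of the template shown in this tutorial.
Pass in the directory identifier reported for your own upload.

```python
import json
from pathlib import Path

# Data catalog value named in this tutorial.
PAS_CATALOG = "urn:nbn:fi:att:data-catalog-pas"


def fill_dataset(dataset, directory_identifier, directory_title):
    """Return a copy of the template with the catalog and directory set.

    Hypothetical helper: it only touches the ``data_catalog`` field and the
    first entry of ``research_dataset.directories``.
    """
    # A JSON round trip gives a deep copy, so the input stays unchanged.
    result = json.loads(json.dumps(dataset))
    result["data_catalog"] = PAS_CATALOG

    directories = result["research_dataset"].setdefault("directories", [])
    if not directories:
        # Assumption: an empty list means no directory entry exists yet.
        directories.append({})
    directories[0]["identifier"] = directory_identifier
    directories[0]["title"] = directory_title
    return result


def update_dataset_file(path, directory_identifier, directory_title):
    """Rewrite the dataset file in place and return the updated content."""
    path = Path(path)
    dataset = json.loads(path.read_text(encoding="utf-8"))
    updated = fill_dataset(dataset, directory_identifier, directory_title)
    # ensure_ascii=False keeps labels such as "Lähdeaineisto" readable.
    path.write_text(
        json.dumps(updated, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return updated
```

For example, calling
``update_dataset_file("dataset.json", "172fa901cf5b3e2188766d0383a6d5fb", "archive_dir")``
would rewrite the file using the identifier from the sample upload output.
Run it before posting the dataset to Metax, and review the result by hand,
since the remaining descriptive fields still need your own values.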