How-To
Usage
Before using datalight to upload data, you need to provide some metadata, which the data repository uses to create the data record.
Zenodo Metadata
Zenodo requires a specific set of metadata to be included with every data record.
The following keys must always be present in the metadata:
- title
- description
- upload_type
- creators
- access_right
- license
Some additional keys may be required depending on the values of upload_type and access_right:
- If upload_type is publication, the publication_type keyword is required.
- If upload_type is image, the image_type keyword is required.
- If access_right is embargoed, the embargo_date keyword is required.
- If access_right is restricted, the access_conditions keyword is required.
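The rules above can be sketched as a small check in Python. This is not part of datalight: the function name ``missing_keys`` and the two lookup tables are hypothetical names used for illustration, and the metadata is assumed to be already loaded into a dictionary.

```python
# Keys the text above lists as always required.
REQUIRED_KEYS = ("title", "description", "upload_type", "creators",
                 "access_right", "license")

# (key, value) -> extra keyword required when metadata[key] == value.
CONDITIONAL_KEYS = {
    ("upload_type", "publication"): "publication_type",
    ("upload_type", "image"): "image_type",
    ("access_right", "embargoed"): "embargo_date",
    ("access_right", "restricted"): "access_conditions",
}


def missing_keys(metadata):
    """Return the sorted list of required keywords absent from metadata."""
    missing = {key for key in REQUIRED_KEYS if key not in metadata}
    for (key, value), extra in CONDITIONAL_KEYS.items():
        if metadata.get(key) == value and extra not in metadata:
            missing.add(extra)
    return sorted(missing)


# Hypothetical record: embargoed, but without an embargo date.
example = {
    "title": "A title",
    "description": "A description",
    "upload_type": "dataset",
    "creators": [{"name": "Jane Doe"}],
    "access_right": "embargoed",
    "license": "CC-BY-4.0",
}
print(missing_keys(example))  # ['embargo_date']
```

An empty list means that every keyword named in this section is present; it does not guarantee that Zenodo will accept the values themselves.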
The Zenodo API documentation gives all of the accepted metadata keywords.
These metadata must be provided in a YAML file. The format is described below.
YAML Format
Basics
YAML files can optionally begin with ``---`` and end with ``...``. These markers indicate the start and end of a document.
Comments
Comments can be added to a YAML file. Everything after a ``#`` character is treated as a comment (as in Python):

.. code-block:: yaml

    # This is a comment in yaml
    title: a title # <- The line is read up to this character.

Key-value pairs
YAML stores information using key-value pairs. The key describes what the data is and the value is the data. To describe the title of a record, a key-value pair might look like:

.. code-block:: yaml

    title: A title for our data

A string can be written over multiple lines in three different ways:

.. code-block:: yaml

    title: "A title for our data
      which extends over multiple lines using quotes"

.. code-block:: yaml

    title: |
      A title for our data
      which extends over multiple lines. A "Literal Block Scalar" |
      will include the newlines and any trailing spaces.

.. code-block:: yaml

    title: >
      A title for our data
      which extends over multiple lines. A "Folded Block Scalar" >
      will fold newlines to spaces; it's used to make what would
      otherwise be a very long line easier to read and edit.

In either case the indentation will be ignored.
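To make the difference between the three styles concrete, the Python strings below show the value a YAML parser is expected to produce for each one, using a shortened two-line title. The variable names are only for illustration, and the trailing newline on the two block styles assumes the default behaviour, with no ``-`` or ``+`` chomping indicator after ``|`` or ``>``.

```python
# Double-quoted scalar: the line break is folded into a single space.
quoted = "A title for our data which extends over multiple lines"

# Literal block scalar (|): line breaks are kept, plus one final newline.
literal = "A title for our data\nwhich extends over multiple lines\n"

# Folded block scalar (>): line breaks become spaces, plus one final newline.
folded = "A title for our data which extends over multiple lines\n"

# Folding the literal form by hand gives the folded form.
print(literal.rstrip("\n").replace("\n", " ") + "\n" == folded)  # True
```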
Values may be more complex than plain text. YAML allows lists, and combinations of lists and key-value pairs. Lines indented with more spaces than the parent key are considered to be contained inside the parent key. All child keys must be indented by the same number of spaces to belong to the same level.
Below are some examples of this.
A list in YAML where the key has the name alist and the value is a list of three elements:

.. code-block:: yaml

    alist:
      - first element
      - second element
      - third element

For datalight we can use it to list, for example, the creators of the data:

.. code-block:: yaml

    creators:
      - name: Jane Doe
      - name: Alan Smith

The value associated with a key can be a list, and the list items may themselves contain key-value pairs. In this example there are multiple creators, each of whom has an affiliation:

.. code-block:: yaml

    creators:
      - name: Jane Doe
        affiliation: University of Neverland
      - name: Alan Smith
        affiliation: University of Shire

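It may help to see the same data as Python objects. Assuming the file is read with a YAML parser such as PyYAML's ``yaml.safe_load`` (an assumption; this page does not say which parser datalight uses), the creators example would be loaded as a dictionary whose ``creators`` key holds a list of dictionaries:

```python
# Equivalent Python structure: the key maps to a list of dictionaries.
metadata = {
    "creators": [
        {"name": "Jane Doe", "affiliation": "University of Neverland"},
        {"name": "Alan Smith", "affiliation": "University of Shire"},
    ]
}

# Each list item is one creator; its keys are the indented child keys.
for creator in metadata["creators"]:
    print(f'{creator["name"]} ({creator["affiliation"]})')
```

If ``affiliation`` were not indented under its ``name`` line, it would no longer belong to that creator, which is why consistent indentation matters.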
This covers the parts of the YAML format needed to create a metadata file for upload to a data repository.
The following section shows different examples of valid metadata for the Zenodo repository.
Zenodo metadata examples
Minimal

.. code-block:: yaml

    title: A small title describing our data
    description: "Description of the dataset that
      is going to be uploaded"
    upload_type: dataset
    creators:
      - name: Jane Doe
        affiliation: University of Neverland
      - name: Alan Smith
        affiliation: University of Shire
    access_right: open
    license: CC-BY-4.0

This metadata is sufficient to successfully upload a dataset to Zenodo.
A more complete set of metadata

.. code-block:: yaml

    title: "A very long
      title"
    description: "Description of the data"
    creators:
      - name: John Doe
        affiliation: The University of Neverland
      - name: Jane Doe
        affiliation: The University of Shire
        orcid: 0000-0000-0000-0007
    upload_type: dataset
    access_right: restricted
    access_conditions: "Only available through contact to myproject project"
    communities:
      - identifier: mycommunity
    thesis_supervisors:
      - name: Jane Doe
        affiliation: The University of Shire
        orcid: 0000-0000-0000-0007
    contributors:
      - name: Alan Smith
        affiliation: The University of Mars
        orcid: 0000-0000-0000-0005
        type: ContactPerson
    license: CC-BY-4.0
    keywords:
      - MyProject
      - another_keyword
    notes: "If the grant number is not recognized by OpenAIRE, this is where you
      indicate the information related to the grant (as mentioned in
      the Zenodo documentation)."
    # If the project is known to OpenAIRE
    #grants:
    # id:
    language: eng
    subjects:
      - term: Fantasy and SF
        identifier: http://id.loc.gov/authorities/subjects/sh000000
        scheme: "url"

An example of a metadata file where the record is embargoed until a certain date:

.. code-block:: yaml

    title: "A very long
      title"
    description: "Description of the data"
    creators:
      - name: John Doe
        affiliation: The University of Neverland
      - name: Jane Doe
        affiliation: The University of Shire
        orcid: 0000-0000-0000-0007
    upload_type: dataset
    access_right: embargoed
    embargo_date: 2022-12-31
    communities:
      - identifier: mycommunity
    thesis_supervisors:
      - name: Jane Doe
        affiliation: The University of Shire
        orcid: 0000-0000-0000-0007
    contributors:
      - name: Alan Smith
        affiliation: The University of Mars
        orcid: 0000-0000-0000-0005
        type: ContactPerson
    license: CC-BY-4.0
    keywords:
      - MyProject
      - another_keyword
    notes: "If the grant number is not recognized by OpenAIRE, this is where you
      indicate the information related to the grant (as mentioned in
      the Zenodo documentation)."
    # If the project is known to OpenAIRE
    #grants:
    # id:
    language: eng
    subjects:
      - term: Fantasy and SF
        identifier: http://id.loc.gov/authorities/subjects/sh000000
        scheme: "url"

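Before uploading an embargoed record, it can be useful to check that embargo_date is a valid calendar date. The sketch below is not part of datalight; ``parse_embargo_date`` and ``is_still_embargoed`` are hypothetical helpers. It accepts either a string or a ``datetime.date``, since some YAML parsers (PyYAML, for example) load an unquoted value such as ``2022-12-31`` as a date object rather than a string.

```python
from datetime import date


def parse_embargo_date(value):
    """Return value as a datetime.date, accepting a date or a YYYY-MM-DD string."""
    if isinstance(value, date):
        return value
    # Raises ValueError if the text is not a valid ISO calendar date.
    return date.fromisoformat(str(value))


def is_still_embargoed(value, today=None):
    """True if the embargo date lies after today (or after the given day)."""
    today = today or date.today()
    return parse_embargo_date(value) > today


print(parse_embargo_date("2022-12-31"))  # 2022-12-31
print(is_still_embargoed("2022-12-31", today=date(2022, 1, 1)))  # True
```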
Datalight usage
Once you have a file or directory containing your data and an associated metadata file, you can upload your data to the Zenodo data repository.
$ python main.py file_name <name of the file which contains zenodo metadata>
$ python main.py directory <name of the file which contains zenodo metadata>
The second argument should point to the file which contains the Zenodo metadata as described above.
Publishing the data at upload time
By default the data will be uploaded to the data repository but will not be published. You can ask datalight to publish it using the argument publish=True:
$ python main.py file_to_upload.txt metadata.yaml publish=True
In this example file_to_upload.txt will be uploaded with the information found in metadata.yaml, and the record will be published on Zenodo.
Warning
Data which has been published cannot be removed. It will remain on the data repository forever.
Finalising and publishing the data can also be done through the Zenodo web interface. When you upload a file with datalight using your token, the upload is associated with your account, so you can see it by logging in to Zenodo with your username and password.
Testing the upload
If you prefer to test the upload of your data first, Zenodo provides a sandbox website. This is just like the real website except that data is regularly deleted, so you can use it for testing purposes. To upload data to the sandbox, use the sandbox=True argument:
$ python main.py file_to_upload.txt -m metadata.yaml sandbox=True
This will upload (but not publish) the data to the sandbox website.
Warning
- To use the sandbox you need to create an account and get a token from https://sandbox.zenodo.org. This is described in the prerequisites section of the documentation.
- The Zenodo sandbox is sometimes unreliable and the tests can fail with an error 500. That does not necessarily mean that the upload didn’t work, only that datalight did not get a valid response from the website.