Submission Workflow
The submission workflow follows four major steps:
Create
A submission consists of one or more Dataset protocol buffers containing user-defined Reaction messages. Datasets can be created programmatically or interactively with the ORD web editor.
To create your submission interactively, use the ORD web editor.
The ORD web editor validates entries automatically and catches errors as they are entered into the form, so there is no separate validation step.
If you create your submission programmatically (see the Python examples), be sure to run the validate_dataset.py script to identify any validation errors:
$ python validate_dataset.py --input="example_dataset.pbtxt"
When defining reactions and datasets programmatically, it is good practice to use the validation methods in ord-schema as part of your workflow.
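When a submission contains several dataset files, the validation command above can be wrapped in a small helper. The following is a minimal sketch, not part of ord-schema: the function names are hypothetical, it assumes validate_dataset.py is reachable at the given path, and it assumes the script exits with a non-zero status when validation fails.

```python
import subprocess
import sys


def build_validate_command(dataset_path, script="validate_dataset.py"):
    """Return the argument list for validating a single dataset file."""
    return [sys.executable, str(script), f"--input={dataset_path}"]


def validate_all(dataset_paths, script="validate_dataset.py"):
    """Run the validation script once per file and return the paths that failed.

    Assumption: a non-zero exit status signals a validation error.
    """
    failed = []
    for path in dataset_paths:
        result = subprocess.run(build_validate_command(path, script), check=False)
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling `validate_all(["example_dataset.pbtxt"])` before committing returns an empty list when every file passes under these assumptions.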
Prepare
Submissions are received as GitHub pull requests from a fork of the ORD repository. In essence, you are creating a personal copy of the repository, updating it with your data, and then requesting that your changes be merged into the main repository.
If you haven’t done so already, you will need to create a fork of the ord-data repository on GitHub.
Clone your forked repository to your workstation. You may want to use the --depth flag to create a shallow clone instead of fetching the entire commit history.
Important: Be sure to clone your forked repository and not the official repo.
Make sure your fork is up to date.
Create a new branch for your submission.
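$ git clone --depth=1 "https://github.com/${GITHUB_USERNAME}/${REPOSITORY}"
$ cd "${REPOSITORY}"
# Make sure your fork is up to date. The "upstream" remote must point at the
# official repository; add it once if it is not configured yet.
$ git remote add upstream "${OFFICIAL_REPOSITORY_URL}"
$ git checkout main
$ git pull --rebase upstream main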
# Create a new branch for your submission.
$ git checkout -b my_submission
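Because cloning the official repository instead of your fork is an easy mistake, a quick check of the remote URL can help. The sketch below is illustrative only: the helper names are hypothetical, and it assumes an HTTPS GitHub URL of the form used in the clone command above.

```python
from urllib.parse import urlparse


def repo_owner(remote_url):
    """Return the owner segment of an HTTPS GitHub repository URL."""
    parts = urlparse(remote_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0]:
        raise ValueError(f"unexpected repository URL: {remote_url!r}")
    return parts[0]


def is_personal_fork(remote_url, github_username):
    """Return True when the remote belongs to the given GitHub user.

    GitHub account names are case-insensitive, so compare in lower case.
    """
    return repo_owner(remote_url).lower() == github_username.lower()
```

The URL to check can be read from the clone with `git remote get-url origin`.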
Submit
Copy your dataset(s) into your submission branch, commit the result, and push the branch to your fork on GitHub.
# Copy your dataset(s) into your submission branch.
$ cp path/to/example_dataset.pbtxt .
# Commit your changes.
$ git add example_dataset.pbtxt
$ git commit -m "Example dataset submission"
# Push the submission to your fork.
$ git push origin my_submission
Next, log in to GitHub, navigate to the database repository, and create a pull request from your fork to the official repository.
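For submissions with more than one file, the copy, commit, and push commands above can be generated rather than typed by hand. This is a rough sketch under stated assumptions: `submission_commands` is a hypothetical helper, the default branch name and commit message are simply the ones used in the example above, and the function only builds the command strings without running them.

```python
import os
import shlex


def submission_commands(dataset_paths, branch="my_submission",
                        message="Example dataset submission"):
    """Return the shell commands for the submit step, in order."""
    if not dataset_paths:
        raise ValueError("at least one dataset file is required")
    # After copying into the repository root, files are added by base name.
    names = [os.path.basename(path) for path in dataset_paths]
    commands = [
        ["cp", *dataset_paths, "."],
        ["git", "add", *names],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return [shlex.join(command) for command in commands]
```

Printing the returned list gives a reviewable plan; running the commands is left to the user so that nothing is pushed by accident.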
Review
Your submission will be automatically validated and manually reviewed by one of the ORD reviewers. The reviewers may suggest additional changes and continue to iterate with you until they are satisfied with the submission. After your pull request is approved, it will be merged into a new branch in the official repository. This new branch is a staging point for the automated preprocessing that is required before merging into the official database.
After your submission has been accepted, a reviewer will trigger various automated preprocessing steps, such as renaming the dataset and assigning reaction and dataset IDs. Once these changes are verified by the reviewer, the dataset will be merged into the “main” branch and become part of the official database.