ord_schema

Generic helpers for ord_schema, including common message types.

Subpackages

ord_schema.frozen_message

Wrappers and utilities for handling protocol buffers in Python.

class ord_schema.frozen_message.FrozenMessage(_message: Any)

Bases: Mapping

Container for a protocol buffer that does not allow edits.

Notes

For standard scalar values, it is not possible to distinguish between default values and explicitly set values that match the default. If the default is a valid value, add the optional label to the field. See https://github.com/Open-Reaction-Database/ord-schema/pull/174.
For optional scalar values and all submessage fields, exceptions are raised if the user attempts to access an undefined attribute (AttributeError), access an undefined map key (KeyError), or set any attribute or map value (dataclasses.FrozenInstanceError).
I considered adding a raise_on_error option that would return None instead of raising AttributeError or KeyError when requesting unset values. However, this breaks the guarantee that hasattr returns False for unset optional scalar values and submessages.
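
The access rules in these notes can be illustrated with a minimal sketch. FrozenDict below is a hypothetical stand-in that wraps a plain dict rather than a protocol buffer; it is not the ord_schema implementation and only mirrors the documented exceptions.

```python
import dataclasses
from collections.abc import Mapping
from typing import Any, Iterator


class FrozenDict(Mapping):
    """Hypothetical read-only wrapper around a plain dict (not FrozenMessage)."""

    def __init__(self, data: dict[str, Any]) -> None:
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, "_data", dict(data))

    def __getitem__(self, key: str) -> Any:
        value = self._data[key]  # Raises KeyError for undefined keys.
        return FrozenDict(value) if isinstance(value, dict) else value

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        raise dataclasses.FrozenInstanceError(f"cannot assign to field {name!r}")


frozen = FrozenDict({"name": "benzene", "amount": {"value": 1.5}})
print(frozen.name, frozen.amount.value, hasattr(frozen, "vendor"))
```

Because unset attributes raise AttributeError, hasattr keeps returning False for them, which is the guarantee the last note describes.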

ord_schema.logging

Logging utilities.

ord_schema.logging.get_logger(name: str, level: int = 20) → Logger: Creates a Logger.

ord_schema.logging.silence_rdkit_logs(pattern: str = 'rdApp.*') → None: Disables noisy RDKit logs.

ord_schema.message_helpers

Helper functions for constructing Protocol Buffer messages.

class ord_schema.message_helpers.MessageFormat(value)

Bases: Enum

Input/output types for protocol buffer messages.

BINARY/BINPB and PBTXT/TXTPB pairs use the same wire format; the second of each pair is the newer canonical suffix recommended by protobuf.dev.

BINARY = '.pb'

BINPB = '.binpb'

JSON = '.json'

PBTXT = '.pbtxt'

TXTPB = '.txtpb'
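
As a rough illustration of how suffix-valued enum members can drive format selection, the sketch below re-declares the enum with the values listed above and adds a hypothetical helper, format_from_filename, which is not part of ord_schema.

```python
import enum
from pathlib import Path


class MessageFormat(enum.Enum):
    """Suffix values copied from the listing above."""

    BINARY = ".pb"
    BINPB = ".binpb"
    JSON = ".json"
    PBTXT = ".pbtxt"
    TXTPB = ".txtpb"


def format_from_filename(filename: str) -> MessageFormat:
    """Hypothetical helper: looks up the format by the file suffix."""
    suffix = Path(filename).suffix
    try:
        return MessageFormat(suffix)  # Enum lookup by value.
    except ValueError:
        raise ValueError(f"unsupported suffix: {suffix!r}") from None


print(format_from_filename("reaction.pbtxt"))
```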

Builds a Compound message with the most common fields.

Parameters:

smiles – Text compound SMILES.
name – Text compound name.
amount – Text amount string, e.g. ‘1.25 g’.
role – Text reaction role. Must match a value in ReactionRoleType.
is_limiting – Boolean whether this compound is limiting for the reaction.
prep – Text compound preparation type. Must match a value in PreparationType.
prep_details – Text compound preparation details. If provided, prep is required.
vendor – Text compound vendor/supplier.

Returns:

Compound message.

Raises:

KeyError – if role or prep does not match a supported enum value.
TypeError – if amount units are not supported.
ValueError – if prep_details is provided and prep is None.

ord_schema.message_helpers.build_data(filename: str, description: str) → Data

Reads raw data from a file and creates a Data message.

Parameters:

filename – Text filename.
description – Text description of the data.

Returns:

Data message.

ord_schema.message_helpers.check_compound_identifiers(compound: Compound | ProductCompound) → None

Verifies that structural compound identifiers are consistent.

Parameters: compound – reaction_pb2.Compound message.
Raises: ValueError – If structural identifiers are not consistent or are invalid.

ord_schema.message_helpers.create_message(message_name: str) → Message

Converts a message name into an instantiation of that class.

The message belongs to the reaction_pb2 module.

Parameters: message_name – Text name of a message field. For example, “Reaction” or “TemperatureConditions.Measurement”.
Returns: Initialized message of the requested type.
Raises: ValueError – if the name cannot be resolved.

ord_schema.message_helpers.find_submessages(message: Message, submessage_type: type[MessageType]) → list[MessageType]

Recursively finds all submessages of a specified type.

Parameters:

message – Protocol buffer.
submessage_type – Protocol buffer type.

Returns:

List of messages.

Raises:

TypeError – if submessage_type is not a protocol buffer type.

ord_schema.message_helpers.get_compound_identifier(compound: Compound | ProductCompound, identifier_type: CompoundIdentifier.CompoundIdentifierType) → str | None

Returns the value of a compound identifier if it exists.

If multiple identifiers of that type exist, only the first is returned.

Parameters:

compound – Compound message.
identifier_type – The CompoundIdentifier type to retrieve the value of.

Returns:

Identifier value or None if the identifier is not defined.

ord_schema.message_helpers.get_compound_molblock(compound: Compound | ProductCompound) → str | None

Returns the value of the compound’s MOLBLOCK identifier if it exists.

Parameters: compound – Compound message.
Returns: MOLBLOCK string or None if the compound has no MOLBLOCK identifier.

ord_schema.message_helpers.get_compound_name(compound: Compound) → str | None

Returns the value of the compound’s NAME identifier if it exists.

Parameters: compound – Compound message.
Returns: NAME string or None if the compound has no NAME identifier.

ord_schema.message_helpers.get_compound_smiles(compound: Compound | ProductCompound) → str | None

Returns the value of the compound’s SMILES identifier if it exists.

Parameters: compound – Compound message.
Returns: SMILES string or None if the compound has no SMILES identifier.

ord_schema.message_helpers.get_product_yield(product: ProductCompound, as_measurement: bool = False) → ProductMeasurement | float | None

Returns the value of a product’s yield if it is defined.

If multiple measurements of type YIELD exist, only the first is returned.

Parameters:

product – ProductCompound message.
as_measurement – Whether to return the full ProductMeasurement that corresponds to the yield measurement. Defaults to False.

Returns:

Yield value as a percentage, the ProductMeasurement message, or None.

ord_schema.message_helpers.get_reaction_smiles(message: Reaction, generate_if_missing: bool = False, allow_incomplete: bool = True, allow_unspecified_roles: bool = True, validate: bool = False, canonical: bool = True) → str | None

Fetches or generates a reaction SMILES.

Parameters:

message – reaction_pb2.Reaction message.
generate_if_missing – Whether to generate a reaction SMILES from the inputs and outputs if one is not defined explicitly.
allow_incomplete – Boolean whether to allow “incomplete” reaction SMILES that do not include all components (e.g. if a component does not have a structural identifier).
allow_unspecified_roles – If True, reactants and products with the UNSPECIFIED reaction role will be included when generating a reaction SMILES.
validate – Boolean whether to validate the reaction SMILES with rdkit. Only used if allow_incomplete is False.
canonical – Boolean whether to return a canonicalized reaction SMILES.

Returns:

Text reaction SMILES, or None.

Raises:

ValueError – If the reaction contains errors.

ord_schema.message_helpers.has_transition_metal(mol: Mol) → bool

Determines if a molecule contains a transition metal.

Parameters: mol – The molecule in question. Should be of type rdkit.Chem.rdchem.Mol.
Returns: Boolean for whether the molecule has a transition metal.

ord_schema.message_helpers.id_filename(filename: str) → str

Converts a filename into a relative path for the repository.

Parameters: filename – Text basename including an ID.
Returns: Text filename relative to the root of the repository.

ord_schema.message_helpers.is_transition_metal(atom: Atom) → bool

Determines if an atom is a transition metal.

Parameters: atom – The atom in question. Should be of type rdkit.Chem.rdchem.Atom.
Returns: Boolean for whether the atom is a transition metal.

ord_schema.message_helpers.load_message(filename: str | PathLike[str], message_type: type[MessageType]) → MessageType

Loads a protocol buffer message from a file.

Parameters:

filename – Filename (str or path-like) containing a serialized message.
message_type – Message subclass.

Returns:

Message object.

Raises:

ValueError – if the message cannot be parsed, or if input_format is not supported.

Converts a proto into a flat dictionary mapping fields to values.

The keys indicate any nesting; for instance a proto that looks like this:

value: {
  subvalue: 5
}

will show up as {‘value.subvalue’: 5} in the dict.

Parameters:

message – Proto to convert.
trace – Tuple of strings; the trace of nested field names.

Returns:

Dict mapping string field names to scalar value types.
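
The dotted-key convention can be sketched on plain nested dicts. The flatten function below is a hypothetical illustration that does not touch protocol buffers and ignores repeated fields; it only shows how the trace of field names becomes the key.

```python
from typing import Any


def flatten(nested: dict[str, Any], trace: tuple[str, ...] = ()) -> dict[str, Any]:
    """Flattens nested dicts into a single dict keyed by dotted paths."""
    row: dict[str, Any] = {}
    for key, value in nested.items():
        path = trace + (key,)
        if isinstance(value, dict):
            # Recurse, carrying the names of the enclosing fields.
            row.update(flatten(value, path))
        else:
            row[".".join(path)] = value
    return row


print(flatten({"value": {"subvalue": 5}}))
```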

ord_schema.message_helpers.messages_to_dataframe(messages: Iterable[Message], drop_constant_columns: bool = False) → DataFrame

Converts a list of protos to a pandas DataFrame.

Parameters:

messages – List of protos.
drop_constant_columns – Whether to drop columns that have the same value for all rows.

Returns:

DataFrame.

ord_schema.message_helpers.mol_from_compound(compound: Compound | ProductCompound, return_identifier: bool = False) → Mol | tuple[Mol, CompoundIdentifier]

Creates an RDKit Mol from a Compound message.

Parameters:

compound – reaction_pb2.Compound message.
return_identifier – If True, return the CompoundIdentifier used to create the Mol.

Returns:

mol – RDKit Mol.
identifier – The identifier that was used to create mol. Only returned if return_identifier is True.

Raises:

ValueError – If no structural identifier is available, or if the resulting Mol object is invalid.

ord_schema.message_helpers.molblock_from_compound(compound: Compound | ProductCompound) → str

Fetches or generates a MolBlock identifier for a compound.

Parameters: compound – reaction_pb2.Compound or ProductCompound message.
Returns: MolBlock identifier.
Raises: ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.parse_doi(doi: str) → str

Parses a DOI from e.g. a URL.

Parameters: doi – DOI string.
Returns: The (possibly trimmed) DOI.
Raises: ValueError – if the DOI cannot be parsed.
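
A minimal sketch of trimming a DOI out of a longer string such as a URL. The regular expression and the name extract_doi are assumptions for illustration; the pattern used by parse_doi itself may differ.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a non-space suffix.
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(text: str) -> str:
    """Returns the first DOI-like substring, or raises ValueError."""
    match = _DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"could not parse DOI from {text!r}")
    return match.group(0)


print(extract_doi("https://doi.org/10.1021/acscatal.0c02247"))
```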

ord_schema.message_helpers.reaction_from_smiles(reaction_smiles: str) → Reaction: Builds a Reaction by splitting a reaction SMILES.

ord_schema.message_helpers.safe_update(target: dict, update: Mapping) → None: Checks that update will not clobber any keys in target.

ord_schema.message_helpers.save_message(message: Message, filename: str | PathLike[str]) → None

Writes a protocol buffer message to disk.

Parameters:

message – Protocol buffer message.
filename – Output filename (str or path-like).

Raises:

ValueError – if filename does not have the expected suffix.

ord_schema.message_helpers.set_compound_identifier(compound: Compound, identifier_type: CompoundIdentifier.CompoundIdentifierType, value: str) → CompoundIdentifier

Sets the value of a compound identifier if it exists or creates one.

If multiple identifiers of that type exist, only the first is overwritten.

Parameters:

compound – Compound message.
identifier_type – The CompoundIdentifier type to set the value of.
value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_molblock(compound: Compound, value: str) → CompoundIdentifier

Sets the value of the compound’s MOLBLOCK identifier if it exists or creates one.

Parameters:

compound – Compound message.
value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_name(compound: Compound, value: str) → CompoundIdentifier

Sets the value of the compound’s NAME identifier if it exists or creates one.

Parameters:

compound – Compound message.
value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_smiles(compound: Compound, value: str) → CompoundIdentifier

Sets the value of the compound’s SMILES identifier if it exists or creates one.

Parameters:

compound – Compound message.
value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_dative_bonds(mol: Mol, from_atoms: tuple[str, ...] = ('N', 'P')) → Mol

Converts metal-ligand bonds to dative.

Replaces some single bonds between metals and atoms whose symbols are listed in from_atoms with dative bonds. For all atoms except carbon, the replacement is only done if the atom has “too many” bonds. To handle metal-carbene complexes, metal-carbon bonds are converted to dative if the sum of the explicit and implicit valence of the carbon atom does not equal its default valence, 4.

Parameters:

mol – The molecule to be converted.
from_atoms – Tuple of atomic symbols corresponding to atom types that should have atom-metal bonds converted to dative. Defaults to ('N', 'P').

Returns:

The modified molecule.

ord_schema.message_helpers.set_solute_moles(solute: Compound, solvents: Sequence[Compound], concentration: str, overwrite: bool = False) → list[Compound]

Helps define components for stock solution inputs.

Handles a single solute and one or more solvent compounds.

Parameters:

solute – Compound with identifiers, roles, etc.; this argument is modified in place to define an amount in moles.
solvents – list of Compounds each with defined volume.
concentration – string defining solute concentration.
overwrite – whether to overwrite an existing solute amount if defined. Defaults to False

Raises:

ValueError – if any solvent does not have a defined volume.
ValueError – if the solute has an existing amount field and overwrite is set to False.

Returns:

List of Compounds to assign to a repeated components field.
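
The underlying arithmetic can be sketched with plain floats. The example assumes the solute amount is the concentration multiplied by the summed solvent volumes (ignoring any volume change on mixing); solute_millimoles is a hypothetical name, and the real function works on Compound messages and unit strings instead.

```python
def solute_millimoles(concentration_molar: float, solvent_volumes_ml: list[float]) -> float:
    """Computes solute amount in mmol from molarity and solvent volumes in mL."""
    if not solvent_volumes_ml or any(volume <= 0 for volume in solvent_volumes_ml):
        # Mirrors the documented requirement that every solvent has a defined volume.
        raise ValueError("every solvent needs a defined, positive volume")
    total_volume_ml = sum(solvent_volumes_ml)
    # mol/L multiplied by mL gives mmol.
    return concentration_molar * total_volume_ml


print(solute_millimoles(0.5, [10.0, 10.0]))
```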

ord_schema.message_helpers.smiles_from_compound(compound: Compound | ProductCompound, canonical: bool = True) → str

Fetches or generates a SMILES identifier for a compound.

If a SMILES identifier already exists, it is simply returned.

Parameters:

compound – reaction_pb2.Compound or reaction_pb2.ProductCompound message.
canonical – If True, returns a canonicalized SMILES.

Returns:

Text SMILES.

Raises:

ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.validate_reaction_smiles(reaction_smiles: str) → None

Validates reaction SMILES.

Parameters: reaction_smiles – Text reaction SMILES.
Raises: ValueError – If the reaction contains errors.

ord_schema.message_helpers.write_dataset(dataset: Dataset, filename: str | PathLike[str]) → None: Deprecated alias for ord_schema.datasets.save_dataset().

ord_schema.message_helpers.write_message(message: Message, filename: str | PathLike[str]) → None: Deprecated alias for save_message().

ord_schema.resolvers

Name/string resolution to structured messages or identifiers.

ord_schema.resolvers.canonicalize_smiles(smiles: str) → str

Canonicalizes a SMILES string.

Parameters: smiles – SMILES string.
Returns: Canonicalized SMILES string.
Raises: ValueError – If the SMILES cannot be parsed by RDKit.

ord_schema.resolvers.name_resolve(*args: str, **kwargs: str) → tuple[str, str]: Deprecated alias for resolve_name().

ord_schema.resolvers.resolve_input(input_string: str) → ReactionInput

Resolves a text-based description of an input.

Supported formats:

[AMOUNT] of [NAME]
[AMOUNT] of [CONCENTRATION] [SOLUTE] in [SOLVENT]

Parameters: input_string – String describing the input.
Returns: ReactionInput message.
Raises: ValueError – if the string cannot be parsed properly.

ord_schema.resolvers.resolve_name(value_type: str, value: str) → tuple[str, str]: Resolves compound identifiers to SMILES via multiple APIs.

ord_schema.resolvers.resolve_names(message: Message) → bool

Attempts to resolve compound NAME identifiers to SMILES.

When a NAME identifier is resolved, a SMILES identifier is added to the list of identifiers for that compound. Note that this function moves on to the next Compound after the first successful name resolution.

Parameters: message – Protocol buffer tree containing Compound submessages (e.g. Reaction or ReactionInput).
Returns: Boolean whether message was modified.

ord_schema.templating

Functions for creating Datasets by enumerating a template with a spreadsheet.

The templating code has specific expectations for how the reaction pbtxt and spreadsheet are defined, namely that placeholder values in the pbtxt begin and end with a “$” (dollar sign) and that these match a unique column header in the spreadsheet file.

ord_schema.templating.generate_dataset(name: str, description: str, template_string: str, df: DataFrame, validate: bool = True) → Dataset

Generates a Dataset by enumerating a template reaction.

Parameters:

name – Dataset name.
description – Dataset description.
template_string – The contents of a Reaction pbtxt where placeholder values to be replaced are defined between dollar signs. For example, a SMILES identifier value could be “$product_smiles$”. Placeholders may only use letters, numbers, and underscores.
df – Pandas Dataframe where each row corresponds to one reaction and column names match placeholders in the template_string.
validate – Optional Boolean controlling whether Reaction messages should be validated as they are defined. Defaults to True.

Returns:

A Dataset message.

Raises:

ValueError – If there is no match for a placeholder string in df.
ValueError – If validate is True and there are validation errors when validating an enumerated Reaction message.

ord_schema.templating.load_spreadsheet(file_name_or_buffer: str | BinaryIO, suffix: str | None = None) → DataFrame

Reads a {csv, xls, xlsx} spreadsheet file.

Parameters:

file_name_or_buffer – Filename or buffer. Note that a buffer is only allowed if suffix is not None.
suffix – Filename suffix, used to determine the data encoding.

Returns:

DataFrame containing the reaction spreadsheet data.

ord_schema.templating.read_spreadsheet(file_name_or_buffer: str | BinaryIO, suffix: str | None = None) → DataFrame: Deprecated alias for load_spreadsheet().

ord_schema.units

Helpers for translating strings with units.

Bases: object

Resolver class for translating value+unit strings into messages.

Converts a united message into another of the same type with different units.

Parameters:

message – a message with units, e.g., Mass, Length.
new_units – the desired units of the new message, expressed either as a string or an integer (ENUM value). Use of a string is recommended due to the ambiguity of using ENUM values; for example, Mass.GRAM == Time.MINUTE.

Returns:

A new message with units, e.g., Mass, Length.

Resolves a string into a message containing a value with units.

Parameters:

string – The string to parse; must contain a numeric value and a string unit. For example: “1.25 h”.
allow_range – If True, ranges like “1-2 h” can be provided and the average value will be reported along with the standard deviation.

Returns:

Message containing a numeric value with units listed in the schema.

Raises:

ValueError – if string does not contain a value with units, or if the value is invalid.
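
A minimal sketch of parsing a value-plus-unit string, including the range case. It returns a plain tuple rather than a message, handles only non-negative numbers and alphabetic units, and assumes the reported standard deviation is the population standard deviation of the two endpoints; parse_quantity is a hypothetical name, and the resolver itself may compute precision differently.

```python
import re
import statistics
from typing import Optional

_NUMBER = r"(\d+(?:\.\d+)?)"
# A number, an optional "-number" range end, then an alphabetic unit.
_PATTERN = re.compile(rf"^\s*{_NUMBER}\s*(?:-\s*{_NUMBER})?\s*([A-Za-z]+)\s*$")


def parse_quantity(string: str, allow_range: bool = False) -> tuple[float, Optional[float], str]:
    """Returns (value, precision, unit); precision is None for single values."""
    match = _PATTERN.match(string)
    if match is None:
        raise ValueError(f"string does not contain a value with units: {string!r}")
    low, high, unit = match.groups()
    if high is None:
        return float(low), None, unit
    if not allow_range:
        raise ValueError(f"ranges are not allowed: {string!r}")
    endpoints = [float(low), float(high)]
    # Assumption: population standard deviation of the two endpoints.
    return statistics.mean(endpoints), statistics.pstdev(endpoints), unit


print(parse_quantity("1.25 h"))
print(parse_quantity("1-2 h", allow_range=True))
```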

Resolves a unit string into its message type and unit ENUM value.

Parameters:: string_unit – The string unit to parse; for example: “gram”.
Returns:: Tuple containing the message type and unit ENUM value.
Raises:: KeyError – if string unit cannot be parsed.

ord_schema.units.compute_solute_quantity(volume: Volume, concentration: Concentration) → Amount: Computes the quantity of a solute, given volume and concentration.

Formats a united message into a string.

Parameters:

message – a message with units, e.g., Mass, Length.

Returns:

A string describing the value, e.g., “5.0 (± 0.1) mL”, using the first unit synonym listed in _UNIT_SYNONYMS.

ord_schema.updates

Automated updates for Reaction messages.

ord_schema.updates.apply_cross_reference_substitutions(reaction: Reaction, id_substitutions: dict[str, str]) → None: Rewrites cross-referenced reaction_ids inside reaction using the substitution map.

ord_schema.updates.apply_reaction_updates(reaction: Reaction, *, new_id: str | None) → bool

Applies per-reaction updates in place using a pre-computed reaction ID.

Splitting ID generation out of this function lets a streaming caller allocate IDs in a cheap pre-pass (e.g. from a Parquet reaction_id column) and inject them here without re-deriving them.

Parameters:

reaction – Reaction message to mutate.
new_id – Pre-computed reaction_id to assign, or None to leave the existing ID untouched.

Returns:

True if the reaction was modified.

ord_schema.updates.assign_dataset_id(dataset: Dataset | DatasetView) → str

Assigns a canonical dataset_id if the existing one is missing or non-canonical.

Mutates dataset.dataset_id in place. Works for both Dataset and DatasetView (which exposes dataset_id as a writable attribute).

Returns: The (possibly newly-assigned) dataset_id.

ord_schema.updates.assign_id_substitutions(old_ids: Iterable[str]) → tuple[list[str | None], dict[str, str]]

Pre-allocates canonical reaction IDs for a sequence of old IDs.

A reaction’s ID is replaced when the existing one is missing or does not match the canonical ord-{32 hex} pattern. Cross-reference rewriting only applies to old IDs that were non-empty (i.e., user-supplied placeholders); reactions whose old ID was empty get a new ID but no substitution entry, since nothing else could have referenced them.

NOTE(kearnes): This does not check for the case where a Dataset is edited and reaction_id values are changed inappropriately. This will need to be either (1) caught in review or (2) found by a complex check of the diff.

Parameters:

old_ids – Reaction IDs in the order they appear in the dataset.

Returns:

new_ids – List parallel to old_ids; each entry is the new reaction_id to assign, or None if the old ID was already canonical.
id_substitutions – Map of old_id -> new_id for entries where the old ID was a non-empty placeholder. Used to rewrite cross-references.

ord_schema.updates.update_dataset(dataset: Dataset) → None

Updates a Dataset message.

Current updates:

Sets dataset_id if not already canonical.
Sets reaction_id on each Reaction if not already canonical, and appends a record_modified provenance event for any modified Reaction.
Rewrites reaction_id cross-references between Reactions in the dataset.

Parameters: dataset – dataset_pb2.Dataset message.
Raises: KeyError – if the dataset has not been validated and there exists a cross-referenced reaction_id in any Reaction that is not defined elsewhere in the Dataset.

ord_schema.updates.update_parquet_dataset(input_path: str | PathLike[str], output_path: str | PathLike[str], *, dataset_id: str) → None

Stream-applies update_dataset to a Parquet input, writing the result to output_path.

Two passes over input_path:

Pass 1 reads only the reaction_id column (no Reaction decode) to pre-allocate canonical reaction IDs and build the cross-reference map.
Pass 2 streams full Reactions, applies per-reaction updates and cross-reference rewrites, and writes them via DatasetWriter.

Peak memory is bounded by one row group plus the ID maps. The caller is responsible for choosing output_path based on the resolved dataset_id (call assign_dataset_id on the input header first to learn it) and for any atomic-rename / validation dance — keeping the rename outside lets the caller validate the written file before publishing it.

Parameters:

input_path – Path to the input Parquet dataset.
output_path – Path to write the updated Parquet dataset to.
dataset_id – Resolved dataset_id to write into the output footer.

ord_schema.validations

Helpers validating specific Message types.

class ord_schema.validations.DatasetCrossRefState(defined_ids: set[str] = <factory>, referenced_ids: set[str] = <factory>, duplicate_count: int = 0, self_reference_count: int = 0)

Bases: object

Aggregated cross-reference observations for a Dataset.

A worker validating a slice of reactions feeds each one into observe and returns the resulting state. The master process merges the per-slice states with merge and then emit_warnings raises a warning per duplicate occurrence, per self-reference, and one summary warning if any referenced reaction_ids are undefined. This keeps the streaming path behaviorally equivalent to the in-memory path.

defined_ids: set[str]

duplicate_count: int = 0

emit_warnings() → None: Emits warnings for duplicate IDs, self-references, and undefined references.

merge(other: DatasetCrossRefState) → None: Merges another state into this one, counting cross-slice duplicate IDs.

observe(reaction: Reaction) → None: Records one reaction’s defined ID, referenced IDs, and self-references.

referenced_ids: set[str]

self_reference_count: int = 0
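
The observe/merge pattern can be sketched with a small dataclass. CrossRefState below is a hypothetical stand-in: its observe takes a reaction ID and a set of referenced IDs instead of a Reaction message, and the exact counting rules of the real class may differ.

```python
import dataclasses


@dataclasses.dataclass
class CrossRefState:
    """Hypothetical per-slice cross-reference state (not the ord_schema class)."""

    defined_ids: set[str] = dataclasses.field(default_factory=set)
    referenced_ids: set[str] = dataclasses.field(default_factory=set)
    duplicate_count: int = 0
    self_reference_count: int = 0

    def observe(self, reaction_id: str, references: set[str]) -> None:
        if reaction_id:
            if reaction_id in self.defined_ids:
                self.duplicate_count += 1
            self.defined_ids.add(reaction_id)
            if reaction_id in references:
                self.self_reference_count += 1
        self.referenced_ids |= references

    def merge(self, other: "CrossRefState") -> None:
        # IDs defined in both slices are duplicates that neither slice could see.
        cross_slice = len(self.defined_ids & other.defined_ids)
        self.duplicate_count += other.duplicate_count + cross_slice
        self.self_reference_count += other.self_reference_count
        self.defined_ids |= other.defined_ids
        self.referenced_ids |= other.referenced_ids

    def undefined_ids(self) -> set[str]:
        return self.referenced_ids - self.defined_ids


first, second = CrossRefState(), CrossRefState()
first.observe("r1", {"r2"})
second.observe("r2", {"r9"})
first.merge(second)
print(first.undefined_ids())
```

Merging set-based states keeps each worker independent; only the final merged state is needed to report undefined references.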

exception ord_schema.validations.ValidationError

Bases: Warning

Warning category for validation failures that indicate invalid data.

class ord_schema.validations.ValidationOptions(validate_ids: bool = False, require_provenance: bool = True, allow_reaction_smiles_only: bool = True)

Bases: object

Options for message validation.

allow_reaction_smiles_only: bool = True

require_provenance: bool = True

validate_ids: bool = False

class ord_schema.validations.ValidationOutput(errors: list[str] = <factory>, warnings: list[str] = <factory>)

Bases: object

Validation output: errors and warnings.

errors: list[str]

extend(other: ValidationOutput) → None: Appends the errors and warnings from another output to this one.

warnings: list[str]

exception ord_schema.validations.ValidationWarning

Bases: Warning

Warning category for non-fatal validation concerns.

ord_schema.validations.check_type_and_details(message: Analysis | CompoundIdentifier | CompoundPreparation | ElectrochemistryConditions | ElectrochemistryCell | FlowConditions | Tubing | IlluminationConditions | Atmosphere | PressureMeasurement | PressureControl | ProductMeasurement | MassSpecMeasurementDetails | Selectivity | ReactionIdentifier | AdditionDevice | ReactionEnvironment | ReactionWorkup | StirringConditions | TemperatureMeasurement | TemperatureControl | Texture | UnmeasuredAmount | Vessel | VesselAttachment | VesselMaterial | VesselPreparation) → None: Checks that type/details messages are complete.

ord_schema.validations.check_value_and_units(message: Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength) → None: Checks that value/units messages are complete.

ord_schema.validations.ensure_float_nonnegative(message: Message, field: str) → None

Warns if the given numeric field of the message is negative.

Parameters:

message – The message whose field is checked.
field – Name of the numeric field to check.

ord_schema.validations.ensure_float_range(message: Message, field: str, min_value: float = -inf, max_value: float = inf) → None

Warns if the given numeric field of the message is outside [min_value, max_value].

Parameters:

message – The message whose field is checked.
field – Name of the numeric field to check.
min_value – Inclusive lower bound for the field value.
max_value – Inclusive upper bound for the field value.
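
A minimal sketch of an inclusive range check that warns rather than raises. RangeWarning and warn_if_outside are hypothetical names, and a SimpleNamespace stands in for a protocol buffer message.

```python
import math
import warnings
from types import SimpleNamespace
from typing import Any


class RangeWarning(Warning):
    """Hypothetical warning category for out-of-range values."""


def warn_if_outside(
    message: Any, field: str, min_value: float = -math.inf, max_value: float = math.inf
) -> None:
    """Warns if message.<field> is outside the inclusive [min_value, max_value]."""
    value = getattr(message, field)
    if value < min_value or value > max_value:
        warnings.warn(
            f"{field}={value} is outside [{min_value}, {max_value}]",
            RangeWarning,
            stacklevel=2,
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_outside(SimpleNamespace(ph=15.0), "ph", 0.0, 14.0)
print(len(caught))
```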

ord_schema.validations.get_referenced_reaction_ids(message: Reaction) → set[str]: Return the set of reaction IDs that are referenced in a Reaction.

ord_schema.validations.has_atom_mapping(smiles: str) → bool: Returns whether a SMILES string contains atom-map numbers.

ord_schema.validations.is_empty(message: Message) → bool: Returns whether the given message is empty.

ord_schema.validations.is_url(value: str) → bool: Returns whether a string looks like an http(s) URL with a host.
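
One way to express "an http(s) URL with a host" uses the standard library's urlparse. The function looks_like_url is a hypothetical sketch; is_url may apply different or additional rules.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Returns True for strings with an http(s) scheme and a non-empty host."""
    try:
        parsed = urlparse(value)
    except ValueError:
        # urlparse can reject some malformed inputs, e.g. bad IPv6 brackets.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_url("https://example.com/data.csv"))
```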

ord_schema.validations.is_valid_dataset_id(dataset_id: str) → bool: Returns whether a dataset ID matches the ord_dataset-<32 hex digits> format.

ord_schema.validations.is_valid_orcid(orcid: str) → bool

Returns whether an ORCID is well-formed, including its checksum.

The final character is an ISO 7064 MOD 11-2 check digit over the preceding 15 digits; see https://support.orcid.org/hc/en-us/articles/360006897674.

Parameters: orcid – ORCID string, expected as 0000-0000-0000-0000.
Returns: True if orcid is well-formed and the checksum is correct.
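
The ISO 7064 MOD 11-2 check can be sketched in a few lines. The function names below are hypothetical and the format pattern is an assumption based on the 0000-0000-0000-0000 layout above; the checksum steps follow the published algorithm.

```python
import re

# Four hyphen-separated groups of four; the last character may be "X".
_ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Computes the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Checks both the layout and the trailing check character."""
    if not _ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_well_formed_orcid("0000-0002-1825-0097"))
```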

ord_schema.validations.is_valid_reaction_id(reaction_id: str) → bool: Returns whether a reaction ID matches the ord-<32 hex digits> format.

ord_schema.validations.reaction_has_internal_standard(message: Reaction) → bool: Whether any reaction component uses the internal standard role.

ord_schema.validations.reaction_has_limiting_component(message: Reaction) → bool: Whether any reaction input compound is limiting.

ord_schema.validations.reaction_needs_internal_standard(message: Reaction) → bool: Whether any analysis uses an internal standard.

ord_schema.validations.skip_validation(message: Message) → None

No-op validator for message types that need no message-level checks.

Registered explicitly in _VALIDATOR_SWITCH so that every message type has a deliberate entry; see the note in validate_message about forcing a decision whenever a new message type is added.

Parameters: message – The message that requires no validation.

ord_schema.validations.validate_amount(message: Amount) → None: Validates that volume_includes_solutes is only set for volume amounts.

ord_schema.validations.validate_analysis(message: Analysis) → None: Validates an Analysis message’s type and details.

ord_schema.validations.validate_compound(message: Compound) → None: Validates that a Compound has usable identifiers.

ord_schema.validations.validate_compound_identifier(message: CompoundIdentifier) → None: Validates a CompoundIdentifier’s value and type-specific format.

ord_schema.validations.validate_compound_preparation(message: CompoundPreparation) → None: Validates CompoundPreparation type/details and reaction_id usage.

ord_schema.validations.validate_crude_component(message: CrudeComponent) → None: Validates that a CrudeComponent has a reaction_id and a consistent amount.

ord_schema.validations.validate_data(message: Data) → None: Validates that a Data message has a value and a valid URL/format.

ord_schema.validations.validate_dataset(message: Dataset | DatasetView, options: ValidationOptions | None = None) → None: Validates a Dataset’s scalar fields, reactions, and cross-references.

ord_schema.validations.validate_dataset_example(message: DatasetExample) → None: Validates that a DatasetExample has description, url, and created set.

ord_schema.validations.validate_dataset_streaming(*, name: str, description: str, dataset_id: str, reaction_ids: list[str], has_reactions: bool, state: DatasetCrossRefState, options: ValidationOptions | None = None) → None

Dataset-level validation for callers that have already streamed reactions.

Equivalent to validate_dataset for a Dataset whose reactions have been iterated in slices (e.g., per Parquet row group) by upstream workers, with each worker contributing a DatasetCrossRefState that the caller has merged. has_reactions should reflect the source’s row count (e.g., parquet.load_metadata plus num_row_groups for parquet); inferring it from state would misclassify reactions without reaction_ids or references as empty. Pass reaction_ids=[] for the typical streaming case (parquet does not persist Dataset.reaction_ids).

ord_schema.validations.validate_datasets(datasets: Mapping[str, Dataset | DatasetView], write_errors: bool = False, options: ValidationOptions | None = None) → None

Runs validation for a set of datasets.

Parameters:

datasets – Dict mapping text filenames to Dataset protos.
write_errors – If True, errors are written to disk.
options – ValidationOptions.

Raises:

ValidationError – if any Dataset does not pass validation.

ord_schema.validations.validate_date_time(message: DateTime) → None: Validates that a DateTime value is parseable.

ord_schema.validations.validate_electrochemistry_conditions(message: ElectrochemistryConditions) → None: Validates ElectrochemistryConditions type/field consistency.

ord_schema.validations.validate_float_value(message: FloatValue) → None: Validates that a FloatValue’s precision is non-negative.

ord_schema.validations.validate_illumination_conditions(message: IlluminationConditions) → None: Validates IlluminationConditions type and peak_wavelength usage.

ord_schema.validations.validate_mass_spec_measurement_type(message: MassSpecMeasurementDetails) → None: Validates mass spec m/z ranges and EIC/TIC mass usage.

ord_schema.validations.validate_message(message: Message, recurse: bool = True, raise_on_error: bool = True, options: ValidationOptions | None = None, trace: tuple[str, ...] | None = None) → ValidationOutput

Template function for validating custom messages in the reaction_pb2 module.

Messages are not validated to check enum values, since these are enforced by the schema. Instead, we only check for validity of items that cannot be enforced in the schema (e.g., non-negativity of certain measurements, consistency of cross-referenced keys).

Note that the message may be modified in-place with any unambiguous changes needed to ensure validity.

Parameters:

message – A message to validate.
recurse – A boolean that controls whether submessages of message (i.e., fields that are messages) should also be validated. Defaults to True.
raise_on_error – If True, raises a ValidationError exception when errors are encountered. If False, the user must manually check the return value to identify validation errors.
options – ValidationOptions.
trace – Tuple containing a string “stack trace” to track the position of the current message relative to the recursion root.

Returns:

ValidationOutput.

Raises:

ValidationError – If any fields are invalid.

ord_schema.validations.validate_percentage(message: Percentage) → None: Validates that a Percentage value is within 0-100 (and not a fraction).

ord_schema.validations.validate_person(message: Person) → None: Validates a Person’s ORCID and email formats.

ord_schema.validations.validate_product_compound(message: ProductCompound) → None: Validates a ProductCompound’s identifiers and desired-product role.

ord_schema.validations.validate_product_measurement(message: ProductMeasurement) → None: Validates a ProductMeasurement’s type-specific value fields.

ord_schema.validations.validate_reaction(message: Reaction, options: ValidationOptions | None = None) → None: Validates a Reaction’s inputs, outcomes, identifiers, and provenance.

ord_schema.validations.validate_reaction_conditions(message: ReactionConditions) → None: Validates ReactionConditions dynamic-details pairing and pH range.

ord_schema.validations.validate_reaction_identifier(message: ReactionIdentifier) → None: Validates a ReactionIdentifier’s SMILES and atom-mapping consistency.

ord_schema.validations.validate_reaction_input(message: ReactionInput) → None: Validates ReactionInput component counts and texture consistency.

ord_schema.validations.validate_reaction_outcome(message: ReactionOutcome) → None: Validates ReactionOutcome products, analysis keys, and conversion.

ord_schema.validations.validate_reaction_provenance(message: ReactionProvenance) → None: Validates ReactionProvenance timestamps, emails, DOI, and URL.

ord_schema.validations.validate_reaction_workup(message: ReactionWorkup) → None: Validates a ReactionWorkup’s type-specific required fields and pH range.

ord_schema.validations.validate_record_event(message: RecordEvent) → None: Validates that a RecordEvent has a time and an identifiable person.

ord_schema.validations.validate_stirring_rate(message: StirringRate) → None: Validates that the stirring rate (rpm) is non-negative.

ord_schema.validations.validate_temperature(message: Temperature) → None: Validates a Temperature, enforcing absolute-zero lower bounds per unit.

Validates a value/units measurement with non-negative value and precision.

Covers the unit message types that share this exact contract (Time, Mass, Volume, etc.); Temperature is validated separately because of its per-unit absolute-zero bounds.

Parameters: message – A unit message to validate.