ord_schema

Generic helpers for ord_schema, including common message types.

Subpackages

ord_schema.frozen_message

Wrappers and utilities for handling protocol buffers in Python.

class ord_schema.frozen_message.FrozenMessage(_message: Any)

Bases: Mapping

Container for a protocol buffer that does not allow edits.

Notes

  • For standard scalar values, it is not possible to distinguish between default values and explicitly set values that match the default. If the default is a valid value, add the optional label to the field. See https://github.com/Open-Reaction-Database/ord-schema/pull/174.

  • For optional scalar values and all submessage fields, exceptions are raised if the user attempts to access an undefined attribute (AttributeError), access an undefined map key (KeyError), or set any attribute or map value (dataclasses.FrozenInstanceError).

  • I considered adding a raise_on_error option that would return None instead of raising AttributeError or KeyError when requesting unset values. However, this breaks the guarantee that hasattr returns False for unset optional scalar values and submessages.
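The access rules in these notes can be illustrated with a small stand-in. This is only a sketch, assuming a plain dict as the backing store; FrozenView is a hypothetical name and is not the FrozenMessage implementation. It shows why hasattr returns False for unset fields: undefined attributes raise AttributeError instead of returning None.

```python
import dataclasses
from collections.abc import Mapping
from typing import Any, Iterator


class FrozenView(Mapping):
    """Hypothetical read-only view over a dict (not the ord_schema class)."""

    def __init__(self, data: dict[str, Any]) -> None:
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, "_data", data)

    def __getitem__(self, key: str) -> Any:
        return self._data[key]  # Undefined keys raise KeyError.

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails; unset fields raise AttributeError.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        raise dataclasses.FrozenInstanceError(f"cannot assign to field {name!r}")


view = FrozenView({"value": 1.5})
print(view.value, hasattr(view, "units"))
```

Because unset fields raise AttributeError, hasattr(view, "units") is False here, which is the guarantee the last note describes.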

ord_schema.logging

Logging utilities.

ord_schema.logging.get_logger(name: str, level: int = 20) Logger

Creates a Logger.
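The default level of 20 is the numeric value of logging.INFO in the standard library. The sketch below uses a hypothetical make_logger helper to show that correspondence; it assumes the logger is fetched by name and its level set, which may differ from what get_logger does internally.

```python
import logging


def make_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical stand-in: fetch a named logger and set its level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


logger = make_logger("ord_example")
print(logging.getLevelName(logger.level))
```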

ord_schema.logging.silence_rdkit_logs(pattern: str = 'rdApp.*') None

Disables noisy RDKit logs.

ord_schema.message_helpers

Helper functions for constructing Protocol Buffer messages.

class ord_schema.message_helpers.MessageFormat(value)

Bases: Enum

Input/output types for protocol buffer messages.

BINARY/BINPB and PBTXT/TXTPB pairs use the same wire format; the second of each pair is the newer canonical suffix recommended by protobuf.dev.

BINARY = '.pb'
BINPB = '.binpb'
JSON = '.json'
PBTXT = '.pbtxt'
TXTPB = '.txtpb'
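Because each member's value is a filename suffix, a format can be looked up directly from a filename. The sketch below copies the values listed above into a local enum so that it runs on its own; format_from_filename is a hypothetical helper, not part of ord_schema, and it does not handle compressed suffixes.

```python
import enum
import os


class MessageFormat(enum.Enum):
    """Local copy of the documented values, for illustration only."""

    BINARY = ".pb"
    BINPB = ".binpb"
    JSON = ".json"
    PBTXT = ".pbtxt"
    TXTPB = ".txtpb"


def format_from_filename(filename: str) -> MessageFormat:
    """Hypothetical helper: look up the format from the filename suffix."""
    _, suffix = os.path.splitext(filename)
    return MessageFormat(suffix)  # Unknown suffixes raise ValueError.


print(format_from_filename("reaction.pbtxt"))
```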
ord_schema.message_helpers.build_compound(smiles: str | None = None, name: str | None = None, amount: str | None = None, role: str | None = None, is_limiting: bool | None = None, prep: str | None = None, prep_details: str | None = None, vendor: str | None = None) Compound

Builds a Compound message with the most common fields.

Parameters:
  • smiles – Text compound SMILES.

  • name – Text compound name.

  • amount – Text amount string, e.g. ‘1.25 g’.

  • role – Text reaction role. Must match a value in ReactionRoleType.

  • is_limiting – Boolean whether this compound is limiting for the reaction.

  • prep – Text compound preparation type. Must match a value in PreparationType.

  • prep_details – Text compound preparation details. If provided, prep is required.

  • vendor – Text compound vendor/supplier.

Returns:

Compound message.

Raises:
  • KeyError – if role or prep does not match a supported enum value.

  • TypeError – if amount units are not supported.

  • ValueError – if prep_details is provided and prep is None.

ord_schema.message_helpers.build_data(filename: str, description: str) Data

Reads raw data from a file and creates a Data message.

Parameters:
  • filename – Text filename.

  • description – Text description of the data.

Returns:

Data message.

ord_schema.message_helpers.check_compound_identifiers(compound: Compound | ProductCompound)

Verifies that structural compound identifiers are consistent.

Parameters:

compound – reaction_pb2.Compound message.

Raises:

ValueError – If structural identifiers are not consistent or are invalid.

ord_schema.message_helpers.create_message(message_name: str) Message

Converts a message name into an instantiation of that class, where the message belongs to the reaction_pb2 module.

Parameters:

message_name – Text name of a message field. For example, “Reaction” or “TemperatureConditions.Measurement”.

Returns:

Initialized message of the requested type.

Raises:

ValueError – if the name cannot be resolved.

ord_schema.message_helpers.fetch_dataset(dataset_id: str, timeout: float = 10.0) Dataset

Loads a dataset from the ord-data repository.

Parameters:
  • dataset_id – Dataset ID.

  • timeout – Number of seconds to wait before timing out the request.

Returns:

Dataset message.

Raises:
  • RuntimeError – If the request fails.

  • ValueError – If the dataset ID is invalid.

ord_schema.message_helpers.find_submessages(message: Message, submessage_type: type[MessageType]) list[MessageType]

Recursively finds all submessages of a specified type.

Parameters:
  • message – Protocol buffer.

  • submessage_type – Protocol buffer type.

Returns:

List of messages.

Raises:

TypeError – if submessage_type is not a protocol buffer type.

ord_schema.message_helpers.get_compound_identifier(compound: Compound | ProductCompound, identifier_type: CompoundIdentifier.CompoundIdentifierType) str | None

Returns the value of a compound identifier if it exists. If multiple identifiers of that type exist, only the first is returned.

Parameters:
  • compound – Compound message.

  • identifier_type – The CompoundIdentifier type to retrieve the value of.

Returns:

Identifier value or None if the identifier is not defined.
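The "first match wins" behavior can be sketched with plain tuples. This is an illustration only: first_identifier is a hypothetical name, and (type, value) pairs stand in for CompoundIdentifier messages.

```python
from typing import Optional


def first_identifier(identifiers: list[tuple[str, str]], identifier_type: str) -> Optional[str]:
    """Returns the value of the first identifier with the requested type."""
    for kind, value in identifiers:
        if kind == identifier_type:
            return value
    return None  # No identifier of that type is defined.


identifiers = [("NAME", "ethanol"), ("SMILES", "CCO"), ("NAME", "EtOH")]
print(first_identifier(identifiers, "NAME"))
```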

ord_schema.message_helpers.get_compound_molblock(compound: Compound | ProductCompound) str | None

Returns the value of the compound’s MOLBLOCK identifier if it exists.

Parameters:

compound – Compound message.

Returns:

MOLBLOCK string or None if the compound has no MOLBLOCK identifier.

ord_schema.message_helpers.get_compound_name(compound: Compound) str | None

Returns the value of the compound’s NAME identifier if it exists.

Parameters:

compound – Compound message.

Returns:

NAME string or None if the compound has no NAME identifier.

ord_schema.message_helpers.get_compound_smiles(compound: Compound | ProductCompound) str | None

Returns the value of the compound’s SMILES identifier if it exists.

Parameters:

compound – Compound message.

Returns:

SMILES string or None if the compound has no SMILES identifier.

ord_schema.message_helpers.get_product_yield(product: ProductCompound, as_measurement: bool = False)

Returns the value of a product’s yield if it is defined. If multiple measurements of type YIELD exist, only the first is returned.

Parameters:
  • product – ProductCompound message.

  • as_measurement – Whether to return the full ProductMeasurement that corresponds to the yield measurement. Defaults to False.

Returns:

Yield value as a percentage, the ProductMeasurement message, or None.

ord_schema.message_helpers.get_reaction_smiles(message: Reaction, generate_if_missing: bool = False, allow_incomplete: bool = True, allow_unspecified_roles: bool = True, validate: bool = False, canonical: bool = True) str | None

Fetches or generates a reaction SMILES.

Parameters:
  • message – reaction_pb2.Reaction message.

  • generate_if_missing – Whether to generate a reaction SMILES from the inputs and outputs if one is not defined explicitly.

  • allow_incomplete – Boolean whether to allow “incomplete” reaction SMILES that do not include all components (e.g. if a component does not have a structural identifier).

  • allow_unspecified_roles – If True, reactants and products with the UNSPECIFIED reaction role will be included when generating a reaction SMILES.

  • validate – Boolean whether to validate the reaction SMILES with rdkit. Only used if allow_incomplete is False.

  • canonical – Boolean whether to return a canonicalized reaction SMILES.

Returns:

Text reaction SMILES, or None.

Raises:

ValueError – If the reaction contains errors.

ord_schema.message_helpers.has_transition_metal(mol: Mol) bool

Determines if a molecule contains a transition metal.

Parameters:

mol – The molecule in question. Should be of type rdkit.Chem.rdchem.Mol

Returns:

Boolean for whether the molecule has a transition metal.

ord_schema.message_helpers.id_filename(filename: str) str

Converts a filename into a relative path for the repository.

Parameters:

filename – Text basename including an ID.

Returns:

Text filename relative to the root of the repository.

ord_schema.message_helpers.is_transition_metal(atom: Atom) bool

Determines if an atom is a transition metal.

Parameters:

atom – The atom in question. Should be of type rdkit.Chem.rdchem.Atom

Returns:

Boolean for whether the atom is a transition metal.

ord_schema.message_helpers.load_message(filename: str, message_type: type[MessageType]) MessageType

Loads a protocol buffer message from a file.

Parameters:
  • filename – Text filename containing a serialized protocol buffer message.

  • message_type – Message subclass.

Returns:

Message object.

Raises:

ValueError – if the message cannot be parsed, or if the file format is not supported.

ord_schema.message_helpers.message_to_row(message: Message, trace: tuple[str, ...] | None = None) dict[str, str | bytes | float | int | bool]

Converts a proto into a flat dictionary mapping fields to values.

The keys indicate any nesting; for instance a proto that looks like this:

    value: {
      subvalue: 5
    }

will show up as {'value.subvalue': 5} in the dict.

Parameters:
  • message – Proto to convert.

  • trace – Tuple of strings; the trace of nested field names.

Returns:

Dict mapping string field names to scalar value types.
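The flattening rule can be sketched with nested dicts standing in for protos. This is a minimal illustration, assuming only nested messages; flatten is a hypothetical name, and the sketch ignores repeated and map fields, whose key format is not described here.

```python
from typing import Any, Optional


def flatten(message: dict[str, Any], trace: Optional[tuple[str, ...]] = None) -> dict[str, Any]:
    """Flattens nested dicts into a single dict with dotted keys."""
    trace = trace or ()
    row: dict[str, Any] = {}
    for field, value in message.items():
        path = trace + (field,)
        if isinstance(value, dict):
            # Recurse, carrying the trace of parent field names.
            row.update(flatten(value, path))
        else:
            row[".".join(path)] = value
    return row


print(flatten({"value": {"subvalue": 5}}))
```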

ord_schema.message_helpers.messages_to_dataframe(messages: Iterable[Message], drop_constant_columns: bool = False) DataFrame

Converts a list of protos to a pandas DataFrame.

Parameters:
  • messages – List of protos.

  • drop_constant_columns – Whether to drop columns that have the same value for all rows.

Returns:

DataFrame.
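The effect of drop_constant_columns can be sketched on a list of row dicts instead of a DataFrame. This is an illustration under that assumption; the function below is a hypothetical stand-in that compares values by their repr and treats missing keys as None.

```python
from typing import Any


def drop_constant(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drops columns whose value is the same in every row."""
    columns = sorted({key for row in rows for key in row})
    keep = [c for c in columns if len({repr(row.get(c)) for row in rows}) > 1]
    return [{c: row.get(c) for c in keep} for row in rows]


rows = [
    {"temperature": 25, "yield": 40.0},
    {"temperature": 25, "yield": 75.5},
]
print(drop_constant(rows))
```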

ord_schema.message_helpers.mol_from_compound(compound: Compound | ProductCompound, return_identifier: bool = False) Mol | tuple[Mol, CompoundIdentifier]

Creates an RDKit Mol from a Compound message.

Parameters:
  • compound – reaction_pb2.Compound message.

  • return_identifier – If True, return the CompoundIdentifier used to create the Mol.

Returns:

RDKit Mol. If return_identifier is True, a tuple of the Mol and the CompoundIdentifier that was used to create it is returned instead.

Raises:

ValueError – If no structural identifier is available, or if the resulting Mol object is invalid.

ord_schema.message_helpers.molblock_from_compound(compound: Compound | ProductCompound) str

Fetches or generates a MolBlock identifier for a compound.

Parameters:

compound – reaction_pb2.Compound or ProductCompound message.

Returns:

MolBlock identifier.

Raises:

ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.parse_doi(doi: str) str

Parses a DOI from e.g. a URL.

Parameters:

doi – DOI string.

Returns:

The (possibly trimmed) DOI.

Raises:

ValueError – if the DOI cannot be parsed.
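Trimming a DOI out of a longer string such as a URL can be sketched with a regular expression. The pattern below is an assumption chosen for illustration (a "10." prefix, a numeric registrant code, a slash, and a suffix); it is not necessarily the pattern parse_doi uses, and trim_doi is a hypothetical name.

```python
import re

# Assumed DOI shape: "10.<4-9 digits>/<suffix without whitespace>".
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def trim_doi(text: str) -> str:
    """Returns the DOI embedded in text, or raises ValueError."""
    match = _DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"could not parse DOI from {text!r}")
    return match.group(0)


print(trim_doi("https://doi.org/10.1021/acscentsci.7b00064"))
```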

ord_schema.message_helpers.reaction_from_smiles(reaction_smiles)

Builds a Reaction by splitting a reaction SMILES.

ord_schema.message_helpers.safe_update(target: dict, update: Mapping) None

Checks that update will not clobber any keys in target.
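A non-clobbering update can be sketched as follows. Two things here are assumptions rather than documented behavior: that a KeyError is raised on overlap, and that the update is applied once the check passes. checked_update is a hypothetical name.

```python
from collections.abc import Mapping


def checked_update(target: dict, update: Mapping) -> None:
    """Updates target in place, refusing to overwrite existing keys."""
    overlap = target.keys() & update.keys()
    if overlap:
        names = sorted(str(key) for key in overlap)
        raise KeyError(f"update would clobber existing keys: {names}")
    target.update(update)


row = {"reaction_id": "ord-123"}
checked_update(row, {"yield": 75.5})
print(row)
```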

ord_schema.message_helpers.set_compound_identifier(compound: Compound, identifier_type: CompoundIdentifier.CompoundIdentifierType, value: str) CompoundIdentifier

Sets the value of a compound identifier if it exists or creates one. If multiple identifiers of that type exist, only the first is overwritten.

Parameters:
  • compound – Compound message.

  • identifier_type – The CompoundIdentifier type to set the value of.

  • value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_molblock(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s MOLBLOCK identifier if it exists or creates one.

Parameters:
  • compound – Compound message.

  • value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_name(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s NAME identifier if it exists or creates one.

Parameters:
  • compound – Compound message.

  • value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_smiles(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s SMILES identifier if it exists or creates one.

Parameters:
  • compound – Compound message.

  • value – The value to set.

Returns:

The compound identifier that was modified or created.

ord_schema.message_helpers.set_dative_bonds(mol: Mol, from_atoms: tuple[str, ...] = ('N', 'P')) Mol

Converts metal-ligand bonds to dative.

Replaces some single bonds between metals and atoms whose symbols are listed in from_atoms with dative bonds. For all atoms except carbon, the replacement is only done if the atom has “too many” bonds. To handle metal-carbene complexes, metal-carbon bonds are converted to dative if the sum of the explicit and implicit valence of the carbon atom does not equal its default valence, 4.

Parameters:
  • mol – The molecule to be converted.

  • from_atoms – tuple of atomic symbols corresponding to atom types that should have atom-metal bonds converted to dative. Default is N and P

Returns:

The modified molecule.

ord_schema.message_helpers.set_solute_moles(solute: Compound, solvents: Sequence[Compound], concentration: str, overwrite: bool = False) list[Compound]

Helps define components for stock solution inputs with a single solute and one or more solvent compounds.

Parameters:
  • solute – Compound with identifiers, roles, etc.; this argument is modified in place to define an amount in moles.

  • solvents – list of Compounds each with defined volume.

  • concentration – string defining solute concentration.

  • overwrite – whether to overwrite an existing solute amount if defined. Defaults to False

Raises:
  • ValueError – if any solvent does not have a defined volume.

  • ValueError – if the solute has an existing amount field and overwrite is set to False.

Returns:

List of Compounds to assign to a repeated components field.

ord_schema.message_helpers.smiles_from_compound(compound: Compound | ProductCompound, canonical: bool = True) str

Fetches or generates a SMILES identifier for a compound.

If a SMILES identifier already exists, it is simply returned.

Parameters:
  • compound – reaction_pb2.Compound or reaction_pb2.ProductCompound message.

  • canonical – If True, returns a canonicalized SMILES.

Returns:

Text SMILES.

Raises:

ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.validate_reaction_smiles(reaction_smiles: str) None

Validates reaction SMILES.

Parameters:

reaction_smiles – Text reaction SMILES.

Raises:

ValueError – If the reaction contains errors.

ord_schema.message_helpers.write_dataset(dataset: Dataset, filename: str) None

Writes a Dataset to disk, dispatching on filename suffix.

.parquet routes to parquet_dataset.write_dataset; other suffixes go through write_message.

ord_schema.message_helpers.write_message(message: Message, filename: str)

Writes a protocol buffer message to disk.

Parameters:
  • message – Protocol buffer message.

  • filename – Text output filename.

Raises:

ValueError – if filename does not have the expected suffix.

ord_schema.resolvers

Name/string resolution to structured messages or identifiers.

ord_schema.resolvers.canonicalize_smiles(smiles: str) str

Canonicalizes a SMILES string.

Parameters:

smiles – SMILES string.

Returns:

Canonicalized SMILES string.

Raises:

ValueError – If the SMILES cannot be parsed by RDKit.

ord_schema.resolvers.name_resolve(value_type: str, value: str) tuple[str, str]

Resolves compound identifiers to SMILES via multiple APIs.

ord_schema.resolvers.resolve_input(input_string: str) ReactionInput

Resolve a text-based description of an input in one of the following formats:

  1. [AMOUNT] of [NAME]

  2. [AMOUNT] of [CONCENTRATION] [SOLUTE] in [SOLVENT]

Parameters:

input_string – String describing the input.

Returns:

ReactionInput message.

Raises:

ValueError – if the string cannot be parsed properly.
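The two input formats can be told apart with a pair of regular expressions. The sketch below only splits the string into its parts; it does not resolve names or build a ReactionInput. The patterns assume that amounts and concentrations are written as a number, whitespace, and a unit, which may be stricter than what resolve_input accepts, and split_input is a hypothetical name.

```python
import re

# Assumed shape of [AMOUNT] and [CONCENTRATION]: "<number> <unit>".
_QUANTITY = r"\d\S*\s+\S+"
_SOLUTION = re.compile(
    rf"^(?P<amount>{_QUANTITY}) of (?P<concentration>{_QUANTITY}) "
    r"(?P<solute>.+) in (?P<solvent>.+)$"
)
_NEAT = re.compile(rf"^(?P<amount>{_QUANTITY}) of (?P<name>.+)$")


def split_input(input_string: str) -> dict[str, str]:
    """Splits an input description into named parts."""
    text = input_string.strip()
    # Try the more specific solution format first.
    match = _SOLUTION.match(text) or _NEAT.match(text)
    if match is None:
        raise ValueError(f"cannot parse input: {input_string!r}")
    return match.groupdict()


print(split_input("10 mL of 5 M NaOH in water"))
print(split_input("1.25 g of sodium chloride"))
```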

ord_schema.resolvers.resolve_names(message: Message) bool

Attempts to resolve compound NAME identifiers to SMILES.

When a NAME identifier is resolved, a SMILES identifier is added to the list of identifiers for that compound. Note that this function moves on to the next Compound after the first successful name resolution.

Parameters:

message – Protocol buffer tree containing Compound submessages (e.g. Reaction or ReactionInput).

Returns:

Boolean whether message was modified.

ord_schema.templating

Functions for creating Datasets by enumerating a template with a spreadsheet.

The templating code has specific expectations for how the reaction pbtxt and spreadsheet are defined, namely that placeholder values in the pbtxt begin and end with a “$” (dollar sign) and that these match a unique column header in the spreadsheet file.

ord_schema.templating.generate_dataset(name: str, description: str, template_string: str, df: DataFrame, validate: bool = True) Dataset

Generates a Dataset by enumerating a template reaction.

Parameters:
  • name – Dataset name.

  • description – Dataset description.

  • template_string – The contents of a Reaction pbtxt where placeholder values to be replaced are defined between dollar signs. For example, a SMILES identifier value could be “$product_smiles$”. Placeholders may only use letters, numbers, and underscores.

  • df – Pandas DataFrame where each row corresponds to one reaction and column names match placeholders in the template_string.

  • validate – Optional Boolean controlling whether Reaction messages should be validated as they are defined. Defaults to True.

Returns:

A Dataset message.

Raises:
  • ValueError – If there is no match for a placeholder string in df.

  • ValueError – If validate is True and there are validation errors when validating an enumerated Reaction message.
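Placeholder substitution for a single row can be sketched with a regular expression. This is a simplified illustration: a dict stands in for one DataFrame row, column names are assumed to be written without the dollar signs, and fill_template is a hypothetical name. It does not parse or validate the resulting pbtxt.

```python
import re

# Placeholders are "$name$", where name uses letters, numbers, and underscores.
_PLACEHOLDER = re.compile(r"\$([A-Za-z0-9_]+)\$")


def fill_template(template: str, row: dict[str, object]) -> str:
    """Replaces each $placeholder$ with the matching column value."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in row:
            raise ValueError(f"no column matches placeholder ${name}$")
        return str(row[name])

    return _PLACEHOLDER.sub(lookup, template)


template = 'identifiers { type: SMILES value: "$product_smiles$" }'
print(fill_template(template, {"product_smiles": "CCO"}))
```

Enumerating a dataset then amounts to calling the function once per spreadsheet row.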

ord_schema.templating.read_spreadsheet(file_name_or_buffer: str | BinaryIO, suffix: str | None = None) DataFrame

Reads a {csv, xls, xlsx} spreadsheet file.

Parameters:
  • file_name_or_buffer – Filename or buffer. Note that a buffer is only allowed if suffix is not None.

  • suffix – Filename suffix, used to determine the data encoding.

Returns:

DataFrame containing the reaction spreadsheet data.

ord_schema.units

Helpers for translating strings with units.

class ord_schema.units.UnitResolver(unit_synonyms: dict[type[Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength], dict[int, list[str]]] | None = None, forbidden_units: dict[str, str] | None = None)

Bases: object

Resolver class for translating value+unit strings into messages.

convert(message: Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength, new_units: str | int) Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength

Converts a message with units into another message of the same type, but with different units.

Parameters:
  • message – a message with units, e.g., Mass, Length.

  • new_units – the desired units of the new message, expressed either as a string or an integer (ENUM value). Use of a string is recommended due to the ambiguity of using ENUM values; for example, Mass.GRAM == Time.MINUTE.

Returns:

A new message with units, e.g., Mass, Length.

resolve(string: str, allow_range: bool = False) Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength

Resolves a string into a message containing a value with units.

Parameters:
  • string – The string to parse; must contain a numeric value and a string unit. For example: “1.25 h”.

  • allow_range – If True, ranges like “1-2 h” can be provided and the average value will be reported along with the standard deviation.

Returns:

Message containing a numeric value with units listed in the schema.

Raises:

ValueError – if string does not contain a value with units, or if the value is invalid.
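The parsing step can be sketched without the schema. The function below returns a (value, spread, unit) tuple rather than a message, and it does not map the unit string to an enum value. For ranges it reports the mean of the two endpoints and their population standard deviation (half the width of the range); that choice is an assumption made for illustration, as is the name parse_quantity.

```python
import re
import statistics
from typing import Optional

_NUMBER = r"\d+(?:\.\d+)?"
# "<value> <unit>" or "<low>-<high> <unit>"; the unit must not start with a digit.
_PATTERN = re.compile(
    rf"^\s*(-?{_NUMBER})(?:\s*-\s*({_NUMBER}))?\s*([^\d\s.\-]\S*)\s*$"
)


def parse_quantity(string: str, allow_range: bool = False) -> tuple[float, Optional[float], str]:
    """Parses a value (or range) with units into (value, spread, unit)."""
    match = _PATTERN.match(string)
    if match is None:
        raise ValueError(f"string does not contain a value with units: {string!r}")
    low, high, unit = match.groups()
    if high is None:
        return float(low), None, unit
    if not allow_range:
        raise ValueError("ranges require allow_range=True")
    endpoints = [float(low), float(high)]
    return statistics.mean(endpoints), statistics.pstdev(endpoints), unit


print(parse_quantity("1.25 h"))
print(parse_quantity("1-2 h", allow_range=True))
```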

resolve_unit(string_unit: str) tuple[type[Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength], int]

Resolves a unit string into its message type and unit ENUM value.

Parameters:

string_unit – The string unit to parse; for example: “gram”.

Returns:

Tuple containing the message type and unit ENUM value.

Raises:

KeyError – if string unit cannot be parsed.

ord_schema.units.compute_solute_quantity(volume: Volume, concentration: Concentration) Amount

Computes the quantity of a solute, given volume and concentration.
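The underlying arithmetic is amount = concentration × volume. The sketch below works on plain floats that are assumed to be already expressed in liters and moles per liter; the documented function instead takes Volume and Concentration messages and returns an Amount, so unit handling is left out here. solute_moles is a hypothetical name.

```python
def solute_moles(volume_liters: float, concentration_molar: float) -> float:
    """Returns moles of solute for a volume (L) and concentration (mol/L)."""
    if volume_liters < 0 or concentration_molar < 0:
        raise ValueError("volume and concentration must be non-negative")
    return volume_liters * concentration_molar


# 10 mL of a 0.5 M solution contains 0.005 mol of solute.
print(solute_moles(0.010, 0.5))
```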

ord_schema.units.format_message(message: Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength) str | None

Formats a message with units into a string.

Parameters:

message – a message with units, e.g., Mass, Length.

Returns:

A string describing the value, e.g., “5.0 (p/m 0.1) mL”, using the first unit synonym listed in _UNIT_SYNONYMS.

ord_schema.updates

Automated updates for Reaction messages.

ord_schema.updates.apply_cross_reference_substitutions(reaction: Reaction, id_substitutions: dict[str, str]) None

Rewrites cross-referenced reaction_ids inside reaction using the substitution map.

ord_schema.updates.apply_reaction_updates(reaction: Reaction, *, new_id: str | None) bool

Applies per-reaction updates in place using a pre-computed reaction ID.

Splitting ID generation out of this function lets a streaming caller allocate IDs in a cheap pre-pass (e.g. from a Parquet reaction_id column) and inject them here without re-deriving them.

Parameters:
  • reaction – Reaction message to mutate.

  • new_id – Pre-computed reaction_id to assign, or None to leave the existing ID untouched.

Returns:

True if the reaction was modified.

ord_schema.updates.assign_dataset_id(dataset: Dataset | DatasetView) str

Assigns a canonical dataset_id if the existing one is missing or non-canonical.

Mutates dataset.dataset_id in place. Works for both Dataset and DatasetView (which exposes dataset_id as a writable attribute).

Returns:

The (possibly newly-assigned) dataset_id.

ord_schema.updates.assign_id_substitutions(old_ids: Iterable[str]) tuple[list[str | None], dict[str, str]]

Pre-allocates canonical reaction IDs for a sequence of old IDs.

A reaction’s ID is replaced when the existing one is missing or does not match the canonical ord-{32 hex} pattern. Cross-reference rewriting only applies to old IDs that were non-empty (i.e., user-supplied placeholders); reactions whose old ID was empty get a new ID but no substitution entry, since nothing else could have referenced them.

NOTE(kearnes): This does not check for the case where a Dataset is edited and reaction_id values are changed inappropriately. This will need to be either (1) caught in review or (2) found by a complex check of the diff.

Parameters:

old_ids – Reaction IDs in the order they appear in the dataset.

Returns:

new_ids: List parallel to old_ids; each entry is the new reaction_id to assign, or None if the old ID was already canonical.

id_substitutions: Map of old_id -> new_id for entries where the old ID was a non-empty placeholder. Used to rewrite cross-references.

ord_schema.updates.update_dataset(dataset: Dataset)

Updates a Dataset message.

Current updates:
  • Sets dataset_id if not already canonical.

  • Sets reaction_id on each Reaction if not already canonical, and appends a record_modified provenance event for any modified Reaction.

  • Rewrites reaction_id cross-references between Reactions in the dataset.

Parameters:

dataset – dataset_pb2.Dataset message.

Raises:

KeyError – if the dataset has not been validated and there exists a cross-referenced reaction_id in any Reaction that is not defined elsewhere in the Dataset.

ord_schema.updates.update_parquet_dataset(input_path: str, output_path: str, *, dataset_id: str) None

Stream-applies update_dataset to a Parquet input, writing the result to output_path.

Two passes over input_path:

  • Pass 1 reads only the reaction_id column (no Reaction decode) to pre-allocate canonical reaction IDs and build the cross-reference map.

  • Pass 2 streams full Reactions, applies per-reaction updates and cross-reference rewrites, and writes them via DatasetWriter.

Peak memory is bounded by one row group plus the ID maps. The caller is responsible for choosing output_path based on the resolved dataset_id (call assign_dataset_id on the input header first to learn it) and for any atomic-rename / validation dance — keeping the rename outside lets the caller validate the written file before publishing it.

Parameters:
  • input_path – Path to the input Parquet dataset.

  • output_path – Path to write the updated Parquet dataset to.

  • dataset_id – Resolved dataset_id to write into the output footer.

ord_schema.validations

Helpers validating specific Message types.

class ord_schema.validations.DatasetCrossRefState(defined_ids: set[str] = <factory>, referenced_ids: set[str] = <factory>, duplicate_count: int = 0, self_reference_count: int = 0)

Bases: object

Aggregated cross-reference observations for a Dataset.

A worker validating a slice of reactions feeds each one into observe and returns the resulting state. The master process merges the per-slice states with merge and then emit_warnings raises a warning per duplicate occurrence, per self-reference, and one summary warning if any referenced reaction_ids are undefined. This keeps the streaming path behaviorally equivalent to the in-memory path.

defined_ids: set[str]
duplicate_count: int = 0
emit_warnings() None
merge(other: DatasetCrossRefState) None
observe(reaction: Reaction) None
referenced_ids: set[str]
self_reference_count: int = 0
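The observe and merge pattern can be sketched with a small dataclass. This is an illustration under assumptions: reactions are reduced to a reaction ID plus the set of IDs it references, IDs defined in more than one slice are counted as duplicates during merge, and emit_warnings is replaced by an undefined_ids helper. CrossRefState and its method bodies are hypothetical, not the ord_schema implementation.

```python
import dataclasses


@dataclasses.dataclass
class CrossRefState:
    """Hypothetical stand-in that tracks defined and referenced IDs."""

    defined_ids: set[str] = dataclasses.field(default_factory=set)
    referenced_ids: set[str] = dataclasses.field(default_factory=set)
    duplicate_count: int = 0
    self_reference_count: int = 0

    def observe(self, reaction_id: str, references: set[str]) -> None:
        if reaction_id:
            if reaction_id in self.defined_ids:
                self.duplicate_count += 1
            self.defined_ids.add(reaction_id)
            if reaction_id in references:
                self.self_reference_count += 1
        self.referenced_ids |= references

    def merge(self, other: "CrossRefState") -> None:
        # Assumption: an ID defined in both slices counts as a duplicate.
        shared = len(self.defined_ids & other.defined_ids)
        self.duplicate_count += other.duplicate_count + shared
        self.self_reference_count += other.self_reference_count
        self.defined_ids |= other.defined_ids
        self.referenced_ids |= other.referenced_ids

    def undefined_ids(self) -> set[str]:
        return self.referenced_ids - self.defined_ids


# Two workers each observe a slice; the caller merges the results.
first, second = CrossRefState(), CrossRefState()
first.observe("r1", {"r2"})
second.observe("r2", set())
second.observe("r3", {"r3", "r9"})
first.merge(second)
print(first.undefined_ids())
```

Merging per-slice states gives the same undefined-reference set as observing every reaction in a single pass, which is what allows the checks to run per slice.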
exception ord_schema.validations.ValidationError

Bases: Warning

class ord_schema.validations.ValidationOptions(validate_ids: bool = False, require_provenance: bool = True, allow_reaction_smiles_only: bool = True)

Bases: object

Options for message validation.

allow_reaction_smiles_only: bool = True
require_provenance: bool = True
validate_ids: bool = False
class ord_schema.validations.ValidationOutput(errors: list[str] = <factory>, warnings: list[str] = <factory>)

Bases: object

Validation output: errors and warnings.

errors: list[str]
extend(other)
warnings: list[str]
exception ord_schema.validations.ValidationWarning

Bases: Warning

ord_schema.validations.check_type_and_details(message: Analysis | CompoundIdentifier | CompoundPreparation | ElectrochemistryConditions | ElectrochemistryCell | FlowConditions | Tubing | IlluminationConditions | Atmosphere | PressureMeasurement | PressureControl | ProductMeasurement | MassSpecMeasurementDetails | Selectivity | ReactionIdentifier | AdditionDevice | ReactionEnvironment | ReactionWorkup | StirringConditions | TemperatureMeasurement | TemperatureControl | Texture | UnmeasuredAmount | Vessel | VesselAttachment | VesselMaterial | VesselPreparation)

Checks that type/details messages are complete.

ord_schema.validations.check_value_and_units(message: Concentration | Current | FlowRate | Length | Mass | Moles | Pressure | Temperature | Time | Voltage | Volume | Wavelength)

Checks that value/units messages are complete.

ord_schema.validations.ensure_float_nonnegative(message: Message, field: str)
ord_schema.validations.ensure_float_range(message: Message, field: str, min_value: float = -inf, max_value: float = inf)
ord_schema.validations.get_referenced_reaction_ids(message: Reaction) set[str]

Return the set of reaction IDs that are referenced in a Reaction.

ord_schema.validations.is_empty(message: Message)

Returns whether the given message is empty.

ord_schema.validations.is_valid_dataset_id(dataset_id: str) bool
ord_schema.validations.is_valid_reaction_id(reaction_id: str) bool
ord_schema.validations.reaction_has_internal_standard(message: Reaction) bool

Whether any reaction component uses the internal standard role.

ord_schema.validations.reaction_has_limiting_component(message: Reaction) bool

Whether any reaction input compound is limiting.

ord_schema.validations.reaction_needs_internal_standard(message: Reaction) bool

Whether any analysis uses an internal standard.

ord_schema.validations.validate_addition_device(message: AdditionDevice)
ord_schema.validations.validate_addition_speed(message: AdditionSpeed)
ord_schema.validations.validate_amount(message: Amount)
ord_schema.validations.validate_analysis(message: Analysis)
ord_schema.validations.validate_atmosphere(message: Atmosphere)
ord_schema.validations.validate_compound(message: Compound)
ord_schema.validations.validate_compound_identifier(message: CompoundIdentifier)
ord_schema.validations.validate_compound_preparation(message: CompoundPreparation)
ord_schema.validations.validate_concentration(message: Concentration)
ord_schema.validations.validate_crude_component(message: CrudeComponent)
ord_schema.validations.validate_current(message: Current)
ord_schema.validations.validate_data(message: Data)
ord_schema.validations.validate_dataset(message: Dataset | DatasetView, options: ValidationOptions | None = None)
ord_schema.validations.validate_dataset_example(message: DatasetExample)
ord_schema.validations.validate_dataset_streaming(*, name: str, description: str, dataset_id: str, reaction_ids: list[str], has_reactions: bool, state: DatasetCrossRefState, options: ValidationOptions | None = None) None

Dataset-level validation for callers that have already streamed reactions.

Equivalent to validate_dataset for a Dataset whose reactions have been iterated in slices (e.g., per Parquet row group) by upstream workers, with each worker contributing a DatasetCrossRefState that the caller has merged. has_reactions should reflect the source’s row count (e.g., parquet_dataset.read_metadata plus num_row_groups for parquet); inferring it from state would misclassify reactions without reaction_ids or references as empty. Pass reaction_ids=[] for the typical streaming case (parquet does not persist Dataset.reaction_ids).

ord_schema.validations.validate_datasets(datasets: Mapping[str, Dataset | DatasetView], write_errors: bool = False, options: ValidationOptions | None = None) None

Runs validation for a set of datasets.

Parameters:
  • datasets – Dict mapping text filenames to Dataset protos.

  • write_errors – If True, errors are written to disk.

  • options – ValidationOptions.

Raises:

ValidationError – if any Dataset does not pass validation.

ord_schema.validations.validate_date_time(message: DateTime)
ord_schema.validations.validate_electrochemistry_cell(message: ElectrochemistryCell)
ord_schema.validations.validate_electrochemistry_conditions(message: ElectrochemistryConditions)
ord_schema.validations.validate_electrochemistry_measurement(message: ElectrochemistryMeasurement)
ord_schema.validations.validate_float_value(message: Message)
ord_schema.validations.validate_flow_conditions(message: FlowConditions)
ord_schema.validations.validate_flow_rate(message: FlowRate)
ord_schema.validations.validate_illumination_conditions(message: IlluminationConditions)
ord_schema.validations.validate_length(message: Length)
ord_schema.validations.validate_mass(message: Mass)
ord_schema.validations.validate_mass_spec_measurement_type(message: MassSpecMeasurementDetails)
ord_schema.validations.validate_message(message: Message, recurse: bool = True, raise_on_error: bool = True, options: ValidationOptions | None = None, trace: tuple[str, ...] | None = None) ValidationOutput

Template function for validating custom messages in the reaction_pb2 module.

Messages are not validated to check enum values, since these are enforced by the schema. Instead, we only check for validity of items that cannot be enforced in the schema (e.g., non-negativity of certain measurements, consistency of cross-referenced keys).

Note that the message may be modified in-place with any unambiguous changes needed to ensure validity.

Parameters:
  • message – A message to validate.

  • recurse – A boolean that controls whether submessages of message (i.e., fields that are messages) should also be validated. Defaults to True.

  • raise_on_error – If True, raises a ValidationError exception when errors are encountered. If False, the user must manually check the return value to identify validation errors.

  • options – ValidationOptions.

  • trace – Tuple containing a string “stack trace” to track the position of the current message relative to the recursion root.

Returns:

ValidationOutput.

Raises:

ValidationError – If any fields are invalid.

ord_schema.validations.validate_moles(message: Moles)
ord_schema.validations.validate_percentage(message: Percentage)
ord_schema.validations.validate_person(message: Person)
ord_schema.validations.validate_pressure(message: Pressure)
ord_schema.validations.validate_pressure_conditions(message: PressureConditions)
ord_schema.validations.validate_pressure_control(message: PressureControl)
ord_schema.validations.validate_pressure_measurement(message: PressureMeasurement)
ord_schema.validations.validate_product_compound(message: ProductCompound)
ord_schema.validations.validate_product_measurement(message: ProductMeasurement)
ord_schema.validations.validate_reaction(message: Reaction, options: ValidationOptions | None = None)
ord_schema.validations.validate_reaction_conditions(message: ReactionConditions)
ord_schema.validations.validate_reaction_environment(message: ReactionEnvironment)
ord_schema.validations.validate_reaction_identifier(message: ReactionIdentifier)
ord_schema.validations.validate_reaction_input(message: ReactionInput)
ord_schema.validations.validate_reaction_notes(message: ReactionNotes)
ord_schema.validations.validate_reaction_observation(message: ReactionObservation)
ord_schema.validations.validate_reaction_outcome(message: ReactionOutcome)
ord_schema.validations.validate_reaction_provenance(message: ReactionProvenance)
ord_schema.validations.validate_reaction_setup(message: ReactionSetup)
ord_schema.validations.validate_reaction_workup(message: ReactionWorkup)
ord_schema.validations.validate_record_event(message: RecordEvent)
ord_schema.validations.validate_selectivity(message: Selectivity)
ord_schema.validations.validate_source(message: Source)
ord_schema.validations.validate_stirring_conditions(message: StirringConditions)
ord_schema.validations.validate_stirring_rate(message: StirringRate)
ord_schema.validations.validate_temperature(message: Temperature)
ord_schema.validations.validate_temperature_conditions(message: TemperatureConditions)
ord_schema.validations.validate_temperature_control(message: TemperatureControl)
ord_schema.validations.validate_temperature_measurement(message: TemperatureMeasurement)
ord_schema.validations.validate_texture(message: Texture)
ord_schema.validations.validate_time(message: Time)
ord_schema.validations.validate_tubing(message: Tubing)
ord_schema.validations.validate_unmeasured_amount(message: UnmeasuredAmount)
ord_schema.validations.validate_vessel(message: Vessel)
ord_schema.validations.validate_vessel_attachment(message: VesselAttachment)
ord_schema.validations.validate_vessel_material(message: VesselMaterial)
ord_schema.validations.validate_vessel_preparation(message: VesselPreparation)
ord_schema.validations.validate_voltage(message: Voltage)
ord_schema.validations.validate_volume(message: Volume)
ord_schema.validations.validate_wavelength(message: Wavelength)