Generic helpers for ord_schema, including common message types.



Wrappers and utilities for handling protocol buffers in Python.

class ord_schema.frozen_message.FrozenMessage(_message: Union[MutableMapping, Message])

Bases: Mapping

Container for a protocol buffer that does not allow edits.


  • For standard scalar values, it is not possible to distinguish between default values and explicitly set values that match the default. If the default is a valid value, add the optional label to the field. See https://github.com/Open-Reaction-Database/ord-schema/pull/174.

  • For optional scalar values and all submessage fields, exceptions are raised if the user attempts to access an undefined attribute (AttributeError), access an undefined map key (KeyError), or set any attribute or map value (dataclasses.FrozenInstanceError).

  • I considered adding a raise_on_error option that would return None instead of raising AttributeError or KeyError when requesting unset values. However, this breaks the guarantee that hasattr returns False for unset optional scalar values and submessages.
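
The access rules above can be illustrated without protocol buffers. The following is a minimal sketch, not the ord_schema implementation: FrozenDict is a hypothetical stand-in that wraps a plain dict, raises KeyError or AttributeError for undefined entries, and raises dataclasses.FrozenInstanceError on any attempt to set an attribute.

```python
import dataclasses
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Hypothetical read-only wrapper around a dict (illustration only)."""

    def __init__(self, data: dict):
        # Bypass our own __setattr__, which always raises.
        object.__setattr__(self, "_data", dict(data))

    def __getitem__(self, key):
        return self._data[key]  # Raises KeyError for undefined keys.

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "_data":
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise dataclasses.FrozenInstanceError(f"cannot assign to field {name!r}")


frozen = FrozenDict({"name": "benzene"})
print(frozen.name)                # benzene
print(hasattr(frozen, "smiles"))  # False
```

Because unset entries raise AttributeError rather than returning None, hasattr keeps working as a presence check, which is the guarantee the last bullet refers to.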


Helper functions for constructing Protocol Buffer messages.

class ord_schema.message_helpers.MessageFormat(value)

Bases: Enum

Input/output types for protocol buffer messages.

BINARY = '.pb'
JSON = '.json'
PBTXT = '.pbtxt'
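
Because each member's value is a filename suffix, a format can be looked up directly from a path. The sketch below redefines the enum locally so that it runs on its own; format_from_filename is a hypothetical helper, and whether load_message and write_message use exactly this lookup is an assumption.

```python
import enum
import pathlib


class MessageFormat(enum.Enum):
    """Local copy of the enum documented above, for illustration."""

    BINARY = ".pb"
    JSON = ".json"
    PBTXT = ".pbtxt"


def format_from_filename(filename: str) -> MessageFormat:
    """Hypothetical helper: maps a filename suffix to a MessageFormat."""
    suffix = pathlib.Path(filename).suffix
    try:
        return MessageFormat(suffix)  # Enum lookup by value.
    except ValueError:
        raise ValueError(f"unsupported suffix: {suffix!r}") from None


print(format_from_filename("reaction.pbtxt"))  # MessageFormat.PBTXT
```
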
ord_schema.message_helpers.build_compound(smiles: Optional[str] = None, name: Optional[str] = None, amount: Optional[str] = None, role: Optional[str] = None, is_limiting: Optional[bool] = None, prep: Optional[str] = None, prep_details: Optional[str] = None, vendor: Optional[str] = None) Compound

Builds a Compound message with the most common fields.

  • smiles – Text compound SMILES.

  • name – Text compound name.

  • amount – Text amount string, e.g. ‘1.25 g’.

  • role – Text reaction role. Must match a value in ReactionRoleType.

  • is_limiting – Boolean whether this compound is limiting for the reaction.

  • prep – Text compound preparation type. Must match a value in PreparationType.

  • prep_details – Text compound preparation details. If provided, prep is required.

  • vendor – Text compound vendor/supplier.


Compound message.

  • KeyError – if role or prep does not match a supported enum value.

  • TypeError – if amount units are not supported.

  • ValueError – if prep_details is provided and prep is None.

ord_schema.message_helpers.build_data(filename: str, description: str) Data

Reads raw data from a file and creates a Data message.

  • filename – Text filename.

  • description – Text description of the data.


Data message.

ord_schema.message_helpers.check_compound_identifiers(compound: Compound)

Verifies that structural compound identifiers are consistent.


compound – reaction_pb2.Compound message.


ValueError – If structural identifiers are not consistent or are invalid.

ord_schema.message_helpers.create_message(message_name: str) Message

Converts a message name into an instantiation of that class, where the message belongs to the reaction_pb2 module.


message_name – Text name of a message field. For example, “Reaction” or “TemperatureConditions.Measurement”.


Initialized message of the requested type.


ValueError – if the name cannot be resolved.

ord_schema.message_helpers.find_submessages(message: Message, submessage_type: Type[MessageType]) list[MessageType]

Recursively finds all submessages of a specified type.

  • message – Protocol buffer.

  • submessage_type – Protocol buffer type.


List of messages.


TypeError – if submessage_type is not a protocol buffer type.

ord_schema.message_helpers.get_compound_identifier(compound: Compound, identifier_type: EnumTypeWrapper) Optional[str]

Returns the value of a compound identifier if it exists. If multiple identifiers of that type exist, only the first is returned.

  • compound – Compound message.

  • identifier_type – The CompoundIdentifier type to retrieve the value of.


Identifier value or None if the identifier is not defined.
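
The "first match wins" behavior can be sketched with plain tuples. This is an illustration only: the list of (type, value) pairs is a hypothetical stand-in for a compound's repeated identifiers field, and first_identifier is not part of ord_schema.

```python
from typing import Optional

# Hypothetical stand-in for a compound's identifiers, in insertion order.
identifiers = [("NAME", "benzene"), ("SMILES", "c1ccccc1"), ("NAME", "benzol")]


def first_identifier(pairs: list[tuple[str, str]], identifier_type: str) -> Optional[str]:
    """Returns the first value with a matching type, or None."""
    for kind, value in pairs:
        if kind == identifier_type:
            return value
    return None


print(first_identifier(identifiers, "NAME"))      # benzene
print(first_identifier(identifiers, "MOLBLOCK"))  # None
```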

ord_schema.message_helpers.get_compound_molblock(compound: Compound) Optional[str]

Returns the value of the compound’s MOLBLOCK identifier if it exists.


compound – Compound message.


MOLBLOCK string or None if the compound has no MOLBLOCK identifier.

ord_schema.message_helpers.get_compound_name(compound: Compound) Optional[str]

Returns the value of the compound’s NAME identifier if it exists.


compound – Compound message.


NAME string or None if the compound has no NAME identifier.

ord_schema.message_helpers.get_compound_smiles(compound: Compound) Optional[str]

Returns the value of the compound’s SMILES identifier if it exists.


compound – Compound message.


SMILES string or None if the compound has no SMILES identifier.

ord_schema.message_helpers.get_product_yield(product: ProductCompound, as_measurement: bool = False)

Returns the value of a product’s yield if it is defined. If multiple measurements of type YIELD exist, only the first is returned.

  • product – ProductCompound message.

  • as_measurement – Whether to return the full ProductMeasurement that corresponds to the yield measurement. Defaults to False.


Yield value as a percentage, the ProductMeasurement message, or None.

ord_schema.message_helpers.get_reaction_smiles(message: Reaction, generate_if_missing: bool = False, allow_incomplete: bool = True, validate: bool = True) Optional[str]

Fetches or generates a reaction SMILES.

  • message – reaction_pb2.Reaction message.

  • generate_if_missing – Whether to generate a reaction SMILES from the inputs and outputs if one is not defined explicitly.

  • allow_incomplete – Boolean whether to allow “incomplete” reaction SMILES that do not include all components (e.g. if a component does not have a structural identifier).

  • validate – Boolean whether to validate the reaction SMILES with RDKit. Only used if allow_incomplete is False.


Text reaction SMILES, or None.


ValueError – If the reaction contains errors.

ord_schema.message_helpers.has_transition_metal(mol: Mol) bool

Determines if a molecule contains a transition metal.


mol – The molecule in question. Should be of type rdkit.Chem.rdchem.Mol.


Boolean for whether the molecule has a transition metal.

ord_schema.message_helpers.id_filename(filename: str) str

Converts a filename into a relative path for the repository.


filename – Text basename including an ID.


Text filename relative to the root of the repository.

ord_schema.message_helpers.is_transition_metal(atom: Atom) bool

Determines if an atom is a transition metal.


atom – The atom in question. Should be of type rdkit.Chem.rdchem.Atom.


Boolean for whether the atom is a transition metal.

ord_schema.message_helpers.load_message(filename: str, message_type: Type[MessageType]) MessageType

Loads a protocol buffer message from a file.

  • filename – Text filename containing a serialized protocol buffer message.

  • message_type – Message subclass.


Message object.


ValueError – if the message cannot be parsed, or if input_format is not supported.

ord_schema.message_helpers.message_to_row(message: Message, trace: Optional[tuple[str]] = None) dict[str, Union[str, bytes, float, int, bool]]

Converts a proto into a flat dictionary mapping fields to values.

The keys indicate any nesting; for instance a proto that looks like this:

value: {
  subvalue: 5
}

will show up as {'value.subvalue': 5} in the dict.

  • message – Proto to convert.

  • trace – Tuple of strings; the trace of nested field names.


Dict mapping string field names to scalar value types.
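
The dotted-key flattening can be sketched with nested dicts standing in for protos. This is a minimal illustration under that assumption: flatten is a hypothetical name, and repeated fields and maps are not handled because their key format is not described here.

```python
from typing import Any, Optional


def flatten(message: dict[str, Any], trace: Optional[tuple[str, ...]] = None) -> dict[str, Any]:
    """Flattens nested dicts into a single dict with dotted keys."""
    trace = trace or ()
    row = {}
    for field, value in message.items():
        path = trace + (field,)
        if isinstance(value, dict):
            row.update(flatten(value, path))  # Recurse into the submessage.
        else:
            row[".".join(path)] = value
    return row


print(flatten({"value": {"subvalue": 5}}))  # {'value.subvalue': 5}
```

The trace tuple carries the names of the enclosing fields down the recursion, so callers normally leave it unset.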

ord_schema.message_helpers.messages_to_dataframe(messages: Iterable[Message], drop_constant_columns: bool = False) DataFrame

Converts a list of protos to a pandas DataFrame.

  • messages – List of protos.

  • drop_constant_columns – Whether to drop columns that have the same value for all rows.



ord_schema.message_helpers.mol_from_compound(compound: Compound, return_identifier: bool = False) Union[Mol, tuple[rdkit.Chem.rdchem.Mol, str]]

Creates an RDKit Mol from a Compound message.

  • compound – reaction_pb2.Compound message.

  • return_identifier – If True, return the CompoundIdentifier used to create the Mol.


RDKit Mol. identifier: The identifier that was used to create mol. Only returned if return_identifier is True.



ValueError – If no structural identifier is available, or if the resulting Mol object is invalid.

ord_schema.message_helpers.molblock_from_compound(compound: Compound) str

Fetches or generates a MolBlock identifier for a compound.


compound – reaction_pb2.Compound message.


MolBlock identifier.




ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.parse_doi(doi: str) str

Parses a DOI from e.g. a URL.


doi – DOI string.


The (possibly trimmed) DOI.


ValueError – if the DOI cannot be parsed.
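
One way to trim a DOI out of a longer string is a regular expression. The sketch below is not the ord_schema implementation: extract_doi is a hypothetical name, and the pattern (the "10." prefix, a registrant code of 4 to 9 digits, a slash, then a suffix without whitespace) is an assumption about what counts as a DOI.

```python
import re

# Assumed pattern: "10.", 4-9 digits, "/", then a non-whitespace suffix.
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(text: str) -> str:
    """Returns the first DOI-like substring, or raises ValueError."""
    match = _DOI_PATTERN.search(text)
    if not match:
        raise ValueError(f"could not parse DOI from {text!r}")
    return match.group(0)


print(extract_doi("https://doi.org/10.1000/xyz123"))  # 10.1000/xyz123
```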


Builds a Reaction by splitting a reaction SMILES.

ord_schema.message_helpers.safe_update(target: dict, update: Mapping) None

Checks that update will not clobber any keys in target.
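
A minimal sketch of this kind of check follows. checked_update is a hypothetical name; the exception type, and the choice to apply the update once the check passes, are assumptions rather than documented behavior of safe_update.

```python
from collections.abc import Mapping


def checked_update(target: dict, update: Mapping) -> None:
    """Updates target in place, refusing to overwrite existing keys."""
    overlap = set(target) & set(update)
    if overlap:
        # Assumption: the exception raised by the real helper is not documented.
        raise KeyError(f"update would overwrite keys: {sorted(overlap, key=repr)}")
    target.update(update)


row = {"a": 1}
checked_update(row, {"b": 2})
print(row)  # {'a': 1, 'b': 2}
```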

ord_schema.message_helpers.set_compound_identifier(compound: Compound, identifier_type: EnumTypeWrapper, value: str) CompoundIdentifier

Sets the value of a compound identifier if it exists or creates one. If multiple identifiers of that type exist, only the first is overwritten.

  • compound – Compound message.

  • identifier_type – The CompoundIdentifier type to set the value of.

  • value – The value to set.


The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_molblock(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s MOLBLOCK identifier if it exists or creates one.

  • compound – Compound message.

  • value – The value to set.


The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_name(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s NAME identifier if it exists or creates one.

  • compound – Compound message.

  • value – The value to set.


The compound identifier that was modified or created.

ord_schema.message_helpers.set_compound_smiles(compound: Compound, value: str) CompoundIdentifier

Sets the value of the compound’s SMILES identifier if it exists or creates one.

  • compound – Compound message.

  • value – The value to set.


The compound identifier that was modified or created.

ord_schema.message_helpers.set_dative_bonds(mol: Mol, from_atoms: tuple[str, ...] = ('N', 'P')) Mol

Converts metal-ligand bonds to dative.

Replaces some single bonds between metals and atoms whose symbols are in from_atoms with dative bonds. For all atoms except carbon, the replacement is only done if the atom has “too many” bonds. To handle metal-carbene complexes, metal-carbon bonds are converted to dative if the sum of the explicit and implicit valence of the carbon atom does not equal its default valence, 4.

  • mol – The molecule to be converted.

  • from_atoms – Tuple of atomic symbols corresponding to atom types that should have atom-metal bonds converted to dative. Default is N and P.


The modified molecule.

ord_schema.message_helpers.set_solute_moles(solute: Compound, solvents: Sequence[Compound], concentration: str, overwrite: bool = False) list[ord_schema.proto.reaction_pb2.Compound]

Helps define components for stock solution inputs with a single solute and one or more solvent compounds.

  • solute – Compound with identifiers, roles, etc.; this argument is modified in place to define an amount in moles.

  • solvents – list of Compounds each with defined volume.

  • concentration – string defining solute concentration.

  • overwrite – whether to overwrite an existing solute amount if defined. Defaults to False.

  • ValueError – if any solvent does not have a defined volume.

  • ValueError – if the solute has an existing amount field and overwrite is set to False.


List of Compounds to assign to a repeated components field.

ord_schema.message_helpers.smiles_from_compound(compound: Compound) str

Fetches or generates a SMILES identifier for a compound.

If a SMILES identifier already exists, it is simply returned.


compound – reaction_pb2.Compound message.




ValueError – if no structural identifiers are defined.

ord_schema.message_helpers.validate_reaction_smiles(reaction_smiles: str) str

Validates reaction SMILES.


reaction_smiles – Text reaction SMILES.


Updated reaction SMILES.


ValueError – If the reaction contains errors.

ord_schema.message_helpers.write_message(message: Message, filename: str)

Writes a protocol buffer message to disk.

  • message – Protocol buffer message.

  • filename – Text output filename.


ValueError – if filename does not have the expected suffix.


Name/string resolution to structured messages or identifiers.

ord_schema.resolvers.canonicalize_smiles(smiles: str) str

Canonicalizes a SMILES string.


smiles – SMILES string.


Canonicalized SMILES string.


ValueError – If the SMILES cannot be parsed by RDKit.

ord_schema.resolvers.name_resolve(value_type: str, value: str) tuple[str, str]

Resolves compound identifiers to SMILES via multiple APIs.

ord_schema.resolvers.resolve_input(input_string: str) ReactionInput

Resolve a text-based description of an input in one of the following formats:

  1. [AMOUNT] of [NAME]



input_string – String describing the input.


ReactionInput message.


ValueError – if the string cannot be parsed properly.

ord_schema.resolvers.resolve_names(message: Reaction) bool

Attempts to resolve compound NAME identifiers to SMILES.

When a NAME identifier is resolved, a SMILES identifier is added to the list of identifiers for that compound. Note that this function moves on to the next Compound after the first successful name resolution.


message – Reaction proto.


Boolean whether message was modified.


Functions for creating Datasets by enumerating a template with a spreadsheet.

The templating code has specific expectations for how the reaction pbtxt and spreadsheet are defined, namely that placeholder values in the pbtxt begin and end with a “$” (dollar sign) and that these match a unique column header in the spreadsheet file.

ord_schema.templating.generate_dataset(template_string: str, df: DataFrame, validate: bool = True) Dataset

Generates a Dataset by enumerating a template reaction.

  • template_string – The contents of a Reaction pbtxt where placeholder values to be replaced are defined between dollar signs. For example, a SMILES identifier value could be “$product_smiles$”. Placeholders may only use letters, numbers, and underscores.

  • df – pandas DataFrame where each row corresponds to one reaction and column names match placeholders in the template_string.

  • validate – Optional Boolean controlling whether Reaction messages should be validated as they are defined. Defaults to True.


A Dataset message.

  • ValueError – If there is no match for a placeholder string in df.

  • ValueError – If validate is True and there are validation errors when validating an enumerated Reaction message.
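
The enumeration step can be sketched with the standard library alone. This is an illustration, not the ord_schema code: fill_template is a hypothetical name, a list of dicts stands in for the DataFrame rows, and it is assumed that column names are written without the surrounding dollar signs.

```python
import re

# Placeholders are delimited by dollar signs: letters, digits, underscores.
_PLACEHOLDER = re.compile(r"\$([A-Za-z0-9_]+)\$")


def fill_template(template: str, row: dict[str, str]) -> str:
    """Replaces each $name$ placeholder with the matching value from row."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in row:
            raise ValueError(f"no column matches placeholder ${name}$")
        return str(row[name])

    return _PLACEHOLDER.sub(substitute, template)


template = 'identifiers { type: SMILES value: "$product_smiles$" }'
rows = [{"product_smiles": "CCO"}, {"product_smiles": "c1ccccc1"}]
for row in rows:
    print(fill_template(template, row))
```

Each filled-in string would then be parsed as a Reaction and, if requested, validated before being added to the Dataset.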

ord_schema.templating.read_spreadsheet(file_name_or_buffer: Union[str, BinaryIO], suffix: Optional[str] = None) DataFrame

Reads a {csv, xls, xlsx} spreadsheet file.

  • file_name_or_buffer – Filename or buffer. Note that a buffer is only allowed if suffix is not None.

  • suffix – Filename suffix, used to determine the data encoding.


DataFrame containing the reaction spreadsheet data.


Helpers for translating strings with units.

class ord_schema.units.UnitResolver(unit_synonyms: Optional[dict[Type[Union[ord_schema.proto.reaction_pb2.Current, ord_schema.proto.reaction_pb2.FlowRate, ord_schema.proto.reaction_pb2.Length, ord_schema.proto.reaction_pb2.Mass, ord_schema.proto.reaction_pb2.Moles, ord_schema.proto.reaction_pb2.Pressure, ord_schema.proto.reaction_pb2.Temperature, ord_schema.proto.reaction_pb2.Time, ord_schema.proto.reaction_pb2.Voltage, ord_schema.proto.reaction_pb2.Volume, ord_schema.proto.reaction_pb2.Wavelength]], dict[google.protobuf.message.Message, list[str]]]] = None, forbidden_units: Optional[dict[str, str]] = None)

Bases: object

Resolver class for translating value+unit strings into messages.

resolve(string: str, allow_range: bool = False) Union[Current, FlowRate, Length, Mass, Moles, Pressure, Temperature, Time, Voltage, Volume, Wavelength]

Resolves a string into a message containing a value with units.

  • string – The string to parse; must contain a numeric value and a string unit. For example: “1.25 h”.

  • allow_range – If True, ranges like “1-2 h” can be provided and the average value will be reported along with the standard deviation.


Message containing a numeric value with units listed in the schema.


ValueError – if string does not contain a value with units, or if the value is invalid.
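
The parsing described above can be sketched as follows. This is not the UnitResolver implementation: parse_quantity is a hypothetical name, unit synonyms are not resolved, and the spread for a range is taken as half its width (which equals the population standard deviation of the two endpoints); whether ord_schema computes it this way is an assumption.

```python
import re

_SINGLE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_quantity(string: str, allow_range: bool = False) -> tuple[float, float, str]:
    """Returns (value, spread, unit); spread is 0.0 for a single value."""
    match = _SINGLE.match(string)
    if match:
        return float(match.group(1)), 0.0, match.group(2)
    match = _RANGE.match(string)
    if match and allow_range:
        low, high = float(match.group(1)), float(match.group(2))
        # Midpoint of the range, plus half the width as the spread.
        return (low + high) / 2, abs(high - low) / 2, match.group(3)
    raise ValueError(f"string does not contain a value with units: {string!r}")


print(parse_quantity("1.25 h"))                   # (1.25, 0.0, 'h')
print(parse_quantity("1-2 h", allow_range=True))  # (1.5, 0.5, 'h')
```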

ord_schema.units.compute_solute_quantity(volume: Volume, concentration: Concentration) Amount

Computes the quantity of a solute, given volume and concentration.
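
The underlying arithmetic is n = c × V. The sketch below uses bare floats in fixed units (liters and mol/L) rather than Volume and Concentration messages, so the unit conversion that the real function must perform is omitted; solute_moles is a hypothetical name.

```python
def solute_moles(volume_liters: float, concentration_molar: float) -> float:
    """Returns moles of solute: n = c * V."""
    if volume_liters < 0 or concentration_molar < 0:
        raise ValueError("volume and concentration must be non-negative")
    return concentration_molar * volume_liters


# 250 mL of a 0.2 M solution contains 0.05 mol of solute.
print(solute_moles(volume_liters=0.250, concentration_molar=0.2))
```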

ord_schema.units.format_message(message: Union[Current, FlowRate, Length, Mass, Moles, Pressure, Temperature, Time, Voltage, Volume, Wavelength]) Optional[str]

Formats a message with units into a string.


message – a message with units, e.g., Mass, Length.


A string describing the value, e.g., “5.0 (p/m 0.1) mL” using the first unit synonym listed in _UNIT_SYNONYMS.


Automated updates for Reaction messages.

ord_schema.updates.update_dataset(dataset: Dataset)

Updates a Dataset message.

Current updates:
  • All reaction-level updates in update_reaction.

  • reaction_id cross-references between Reactions in the dataset.


dataset – dataset_pb2.Dataset message.


KeyError – if the dataset has not been validated and there exists a cross-referenced reaction_id in any Reaction that is not defined elsewhere in the Dataset.

ord_schema.updates.update_reaction(reaction: Reaction) dict[str, str]

Updates a Reaction message.

Current updates:
  • Sets reaction_id if not already set.

  • Adds a record modification event to the provenance.


reaction – reaction_pb2.Reaction message.


A dictionary mapping placeholder reaction_ids to newly-assigned reaction_ids.



Helpers validating specific Message types.

exception ord_schema.validations.ValidationError

Bases: Warning

class ord_schema.validations.ValidationOptions(validate_ids: bool = False, require_provenance: bool = False, allow_reaction_smiles_only: bool = True)

Bases: object

Options for message validation.

allow_reaction_smiles_only: bool = True
require_provenance: bool = False
validate_ids: bool = False
class ord_schema.validations.ValidationOutput(errors: list[str] = <factory>, warnings: list[str] = <factory>)

Bases: object

Validation output: errors and warnings.

errors: list[str]
warnings: list[str]
exception ord_schema.validations.ValidationWarning

Bases: Warning

ord_schema.validations.check_type_and_details(message: Union[Analysis, CompoundIdentifier, CompoundPreparation, ElectrochemistryConditions, ElectrochemistryCell, FlowConditions, Tubing, IlluminationConditions, Atmosphere, Measurement, PressureControl, Texture, ProductMeasurement, MassSpecMeasurementDetails, Selectivity, ReactionIdentifier, AdditionDevice, ReactionEnvironment, ReactionWorkup, StirringConditions, Measurement, TemperatureControl, Vessel, VesselAttachment, VesselMaterial, VesselPreparation])

Checks that type/details messages are complete.

ord_schema.validations.check_value_and_units(message: Union[Current, FlowRate, Length, Mass, Moles, Pressure, Temperature, Time, Voltage, Volume, Wavelength])

Checks that value/units messages are complete.

ord_schema.validations.ensure_float_nonnegative(message: Message, field: str)
ord_schema.validations.ensure_float_range(message: Message, field: str, min_value: float = -inf, max_value: float = inf)
ord_schema.validations.get_referenced_reaction_ids(message: Reaction) set[str]

Return the set of reaction IDs that are referenced in a Reaction.

ord_schema.validations.is_empty(message: Message)

Returns whether the given message is empty.

ord_schema.validations.is_valid_dataset_id(dataset_id: str) bool
ord_schema.validations.is_valid_reaction_id(reaction_id: str) bool
ord_schema.validations.reaction_has_internal_standard(message: Reaction) bool

Whether any reaction component uses the internal standard role.

ord_schema.validations.reaction_has_limiting_component(message: Reaction) bool

Whether any reaction input compound is limiting.

ord_schema.validations.reaction_needs_internal_standard(message: Reaction) bool

Whether any analysis uses an internal standard.

ord_schema.validations.validate_addition_device(message: AdditionDevice)
ord_schema.validations.validate_addition_speed(message: AdditionSpeed)
ord_schema.validations.validate_amount(message: Amount)
ord_schema.validations.validate_analysis(message: Analysis)
ord_schema.validations.validate_atmosphere(message: Atmosphere)
ord_schema.validations.validate_compound(message: Compound)
ord_schema.validations.validate_compound_identifier(message: CompoundIdentifier)
ord_schema.validations.validate_compound_preparation(message: CompoundPreparation)
ord_schema.validations.validate_concentration(message: Concentration)
ord_schema.validations.validate_crude_component(message: CrudeComponent)
ord_schema.validations.validate_current(message: Current)
ord_schema.validations.validate_data(message: Data)
ord_schema.validations.validate_dataset(message: Dataset, options: Optional[ValidationOptions] = None)
ord_schema.validations.validate_dataset_example(message: DatasetExample)
ord_schema.validations.validate_datasets(datasets: Mapping[str, Dataset], write_errors: bool = False, options: Optional[ValidationOptions] = None) None

Runs validation for a set of datasets.

  • datasets – Dict mapping text filenames to Dataset protos.

  • write_errors – If True, errors are written to disk.

  • options – ValidationOptions.


ValidationError – if any Dataset does not pass validation.

ord_schema.validations.validate_date_time(message: DateTime)
ord_schema.validations.validate_electrochemistry_cell(message: ElectrochemistryCell)
ord_schema.validations.validate_electrochemistry_conditions(message: ElectrochemistryConditions)
ord_schema.validations.validate_electrochemistry_measurement(message: Measurement)
ord_schema.validations.validate_float_value(message: Message)
ord_schema.validations.validate_flow_conditions(message: FlowConditions)
ord_schema.validations.validate_flow_rate(message: FlowRate)
ord_schema.validations.validate_illumination_conditions(message: IlluminationConditions)
ord_schema.validations.validate_length(message: Length)
ord_schema.validations.validate_mass(message: Mass)
ord_schema.validations.validate_mass_spec_measurement_type(message: MassSpecMeasurementDetails)
ord_schema.validations.validate_message(message: Message, recurse: bool = True, raise_on_error: bool = True, options: Optional[ValidationOptions] = None, trace: Optional[tuple[str, ...]] = None) ValidationOutput

Template function for validating custom messages in the reaction_pb2 module.

Messages are not validated to check enum values, since these are enforced by the schema. Instead, we only check for validity of items that cannot be enforced in the schema (e.g., non-negativity of certain measurements, consistency of cross-referenced keys).

Note that the message may be modified in-place with any unambiguous changes needed to ensure validity.

  • message – A message to validate.

  • recurse – A boolean that controls whether submessages of message (i.e., fields that are messages) should also be validated. Defaults to True.

  • raise_on_error – If True, raises a ValidationError exception when errors are encountered. If False, the user must manually check the return value to identify validation errors.

  • options – ValidationOptions.

  • trace – Tuple containing a string “stack trace” to track the position of the current message relative to the recursion root.




ValidationError – If any fields are invalid.
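
The raise_on_error contract can be sketched with local stand-ins. Everything below is illustrative: the classes mirror the ValidationError and ValidationOutput signatures documented on this page, while check_range and validate are hypothetical helpers that check plain floats instead of protocol buffer fields.

```python
import dataclasses
import math


class ValidationError(Warning):
    """Local stand-in for the exception documented above."""


@dataclasses.dataclass
class ValidationOutput:
    errors: list[str] = dataclasses.field(default_factory=list)
    warnings: list[str] = dataclasses.field(default_factory=list)


def check_range(value: float, field: str, output: ValidationOutput,
                min_value: float = -math.inf, max_value: float = math.inf) -> None:
    """Records an error if value falls outside [min_value, max_value]."""
    if not min_value <= value <= max_value:
        output.errors.append(f"{field}={value} is outside [{min_value}, {max_value}]")


def validate(fields: dict[str, float], raise_on_error: bool = True) -> ValidationOutput:
    """Hypothetical validator: every field must be non-negative."""
    output = ValidationOutput()
    for field, value in fields.items():
        check_range(value, field, output, min_value=0.0)
    if raise_on_error and output.errors:
        raise ValidationError("; ".join(output.errors))
    return output


result = validate({"mass": 1.5, "volume": -2.0}, raise_on_error=False)
print(result.errors)
```

With raise_on_error=False the caller is responsible for inspecting the errors list, as the parameter description above notes.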

ord_schema.validations.validate_moles(message: Moles)
ord_schema.validations.validate_percentage(message: Percentage)
ord_schema.validations.validate_person(message: Person)
ord_schema.validations.validate_pressure(message: Pressure)
ord_schema.validations.validate_pressure_conditions(message: PressureConditions)
ord_schema.validations.validate_pressure_control(message: PressureControl)
ord_schema.validations.validate_pressure_measurement(message: Measurement)
ord_schema.validations.validate_product_compound(message: ProductCompound)
ord_schema.validations.validate_product_measurement(message: ProductMeasurement)
ord_schema.validations.validate_reaction(message: Reaction, options: Optional[ValidationOptions] = None)
ord_schema.validations.validate_reaction_conditions(message: ReactionConditions)
ord_schema.validations.validate_reaction_environment(message: ReactionEnvironment)
ord_schema.validations.validate_reaction_identifier(message: ReactionIdentifier)
ord_schema.validations.validate_reaction_input(message: ReactionInput)
ord_schema.validations.validate_reaction_notes(message: ReactionNotes)
ord_schema.validations.validate_reaction_observation(message: ReactionObservation)
ord_schema.validations.validate_reaction_outcome(message: ReactionOutcome)
ord_schema.validations.validate_reaction_provenance(message: ReactionProvenance)
ord_schema.validations.validate_reaction_setup(message: ReactionSetup)
ord_schema.validations.validate_reaction_workup(message: ReactionWorkup)
ord_schema.validations.validate_record_event(message: RecordEvent)
ord_schema.validations.validate_selectivity(message: Selectivity)
ord_schema.validations.validate_source(message: Source)
ord_schema.validations.validate_stirring_conditions(message: StirringConditions)
ord_schema.validations.validate_stirring_rate(message: StirringRate)
ord_schema.validations.validate_temperature(message: Temperature)
ord_schema.validations.validate_temperature_conditions(message: TemperatureConditions)
ord_schema.validations.validate_temperature_control(message: TemperatureControl)
ord_schema.validations.validate_temperature_measurement(message: Measurement)
ord_schema.validations.validate_texture(message: Texture)
ord_schema.validations.validate_time(message: Time)
ord_schema.validations.validate_tubing(message: Tubing)
ord_schema.validations.validate_unmeasured_amount(message: UnmeasuredAmount)
ord_schema.validations.validate_vessel(message: Vessel)
ord_schema.validations.validate_vessel_attachment(message: VesselAttachment)
ord_schema.validations.validate_vessel_material(message: VesselMaterial)
ord_schema.validations.validate_vessel_preparation(message: VesselPreparation)
ord_schema.validations.validate_voltage(message: Voltage)
ord_schema.validations.validate_volume(message: Volume)
ord_schema.validations.validate_wavelength(message: Wavelength)