ord_schema.orm

Base ORM objects.

ord_schema.orm.add_datasets

Adds datasets to the ORM database.

Usage:

add_datasets.py –pattern=<str> [options] add_datasets.py -h | –help

Options:
--pattern=<str>

Pattern for dataset filenames

--overwrite

Overwrite existing datasets

--url=<str>

Postgres connection string

--database=<str>

Database [default: orm]

--username=<str>

Database username [default: postgres]

--password=<str>

Database password

--host=<str>

Database host [default: localhost]

--port=<int>

Database port [default: 5432]

--n_jobs=<int>

Number of parallel workers [default: 1]

ord_schema.orm.add_datasets.main(**kwargs)

ord_schema.orm.conftest

ord_schema.orm.database

Functions for creating/managing the PostgreSQL database.

ord_schema.orm.database.add_dataset(dataset: Dataset, session: Session) None

Adds a dataset to the database.

ord_schema.orm.database.add_rdkit(session: Session) None

Adds RDKit PostgreSQL cartridge data to any null-valued cells in the structure table.

ord_schema.orm.database.delete_dataset(dataset_id: str, session: Session) None

Deletes a dataset from the database.

ord_schema.orm.database.get_connection_string(database: str, username: str, password: str, host: str = 'localhost', port: int = 5432) str

Creates an SQLAlchemy connection string.

ord_schema.orm.database.prepare_database(engine: Engine) bool

Prepares the database and creates the ORM table structure.

Parameters

engine – SQLAlchemy Engine.

Returns

Whether the RDKit PostgreSQL cartridge is installed.

ord_schema.orm.mappers

Table mappings for Reaction protos.

Notes

  • Foreign keys to the reaction table are done using the id column, not the ORD reaction ID (reaction_id). However, the reaction_id column is used when the reaction ID is specifically called for, as with crude inputs.

  • We use inheritance to handle messages that appear in more than one context; see https://docs.sqlalchemy.org/en/14/orm/inheritance.html. The possible constraints are:

    • Some message types are used more than once in a parent (such as Time in ReactionInput), which forces the use of either joined or single table inheritance.

    • Some messages can be repeated, while others are unique to their parent. These uniqueness constraints force the use of joined table inheritance since they are specific to their polymorphic type.

    For convenience and consistency, we use single table inheritance for all message types, regardless of whether they are used in one context or more than one context. This means that we do not enforce the second constraint in the database.

ord_schema.orm.mappers.build_mapper(message_type: Type[Message], parents: dict[Type[google.protobuf.message.Message], list[tuple[Type[google.protobuf.message.Message], str, bool]]]) Type

Creates a mapper class for a specific protocol buffer message type.

Parameters
  • message_type – Protocol buffer message type.

  • parents – Dict mapping message types to lists of (parent message type, field name, unique) tuples.

Returns

Generated mapper class.

ord_schema.orm.mappers.build_mappers() dict[Type[google.protobuf.message.Message], Type]

Creates ORM mapper classes for protocol buffer message types.

Returns

Dict mapping protocol buffer message types to mapper classes.

ord_schema.orm.mappers.from_proto(message: Message, mapper: Optional[Type[Base]] = None, key: Optional[str] = None) Base

Converts a protobuf message into an ORM object.

Parameters
  • message – Protobuf message.

  • mapper – ORM mapper class. For top-level protos like Dataset and Reaction this can be left as None; it must be provided for Child subclasses to properly handle polymorphism.

  • key – Map key (we store maps as rows of (key, value) tuples).

Returns

ORM object.

ord_schema.orm.mappers.get_message_type(full_name: str) Any

Fetches the class for a protocol buffer message type.

ord_schema.orm.mappers.get_parents(message_type: Type[Message]) dict[Type[google.protobuf.message.Message], list[tuple[Type[google.protobuf.message.Message], str, bool]]]

Returns the parent message types for each message type.

ord_schema.orm.mappers.to_proto(base: Base) Message

Converts an ORM object into a protobuf message.

Parameters

base – ORM object.

Returns

Protobuf message.

ord_schema.orm.rdkit_mappers

RDKit PostgreSQL cartridge functionality.

class ord_schema.orm.rdkit_mappers.CString

Bases: UserDefinedType

PostgreSQL cstring.

cache_ok: Optional[bool] = True

Indicate if statements using this ExternalType are “safe to cache”.

The default value None will emit a warning and then not allow caching of a statement which includes this type. Set to False to disable statements using this type from being cached at all without a warning. When set to True, the object’s class and selected elements from its state will be used as part of the cache key. For example, using a TypeDecorator:

class MyType(TypeDecorator):
    impl = String

    cache_ok = True

    def __init__(self, choices):
        self.choices = tuple(choices)
        self.internal_only = True

The cache key for the above type would be equivalent to:

>>> MyType(["a", "b", "c"])._static_cache_key
(<class '__main__.MyType'>, ('choices', ('a', 'b', 'c')))

The caching scheme will extract attributes from the type that correspond to the names of parameters in the __init__() method. Above, the “choices” attribute becomes part of the cache key but “internal_only” does not, because there is no parameter named “internal_only”.

The requirements for cacheable elements is that they are hashable and also that they indicate the same SQL rendered for expressions using this type every time for a given cache value.

To accommodate for datatypes that refer to unhashable structures such as dictionaries, sets and lists, these objects can be made “cacheable” by assigning hashable structures to the attributes whose names correspond with the names of the arguments. For example, a datatype which accepts a dictionary of lookup values may publish this as a sorted series of tuples. Given a previously un-cacheable type as:

class LookupType(UserDefinedType):
    '''a custom type that accepts a dictionary as a parameter.

    this is the non-cacheable version, as "self.lookup" is not
    hashable.

    '''

    def __init__(self, lookup):
        self.lookup = lookup

    def get_col_spec(self, **kw):
        return "VARCHAR(255)"

    def bind_processor(self, dialect):
        # ...  works with "self.lookup" ...

Where “lookup” is a dictionary. The type will not be able to generate a cache key:

>>> type_ = LookupType({"a": 10, "b": 20})
>>> type_._static_cache_key
<stdin>:1: SAWarning: UserDefinedType LookupType({'a': 10, 'b': 20}) will not
produce a cache key because the ``cache_ok`` flag is not set to True.
Set this flag to True if this type object's state is safe to use
in a cache key, or False to disable this warning.
symbol('no_cache')

If we did set up such a cache key, it wouldn’t be usable. We would get a tuple structure that contains a dictionary inside of it, which cannot itself be used as a key in a “cache dictionary” such as SQLAlchemy’s statement cache, since Python dictionaries aren’t hashable:

>>> # set cache_ok = True
>>> type_.cache_ok = True

>>> # this is the cache key it would generate
>>> key = type_._static_cache_key
>>> key
(<class '__main__.LookupType'>, ('lookup', {'a': 10, 'b': 20}))

>>> # however this key is not hashable, will fail when used with
>>> # SQLAlchemy statement cache
>>> some_cache = {key: "some sql value"}
Traceback (most recent call last): File "<stdin>", line 1,
in <module> TypeError: unhashable type: 'dict'

The type may be made cacheable by assigning a sorted tuple of tuples to the “.lookup” attribute:

class LookupType(UserDefinedType):
    '''a custom type that accepts a dictionary as a parameter.

    The dictionary is stored both as itself in a private variable,
    and published in a public variable as a sorted tuple of tuples,
    which is hashable and will also return the same value for any
    two equivalent dictionaries.  Note it assumes the keys and
    values of the dictionary are themselves hashable.

    '''

    cache_ok = True

    def __init__(self, lookup):
        self._lookup = lookup

        # assume keys/values of "lookup" are hashable; otherwise
        # they would also need to be converted in some way here
        self.lookup = tuple(
            (key, lookup[key]) for key in sorted(lookup)
        )

    def get_col_spec(self, **kw):
        return "VARCHAR(255)"

    def bind_processor(self, dialect):
        # ...  works with "self._lookup" ...

Where above, the cache key for LookupType({"a": 10, "b": 20}) will be:

>>> LookupType({"a": 10, "b": 20})._static_cache_key
(<class '__main__.LookupType'>, ('lookup', (('a', 10), ('b', 20))))

New in version 1.4.14: - added the cache_ok flag to allow some configurability of caching for TypeDecorator classes.

New in version 1.4.28: - added the ExternalType mixin which generalizes the cache_ok flag to both the TypeDecorator and UserDefinedType classes.

See also

sql_caching

get_col_spec(**kwargs)

Returns the column type.

property python_type

Return the Python type object expected to be returned by instances of this type, if known.

Basically, for those types which enforce a return type, or are known across the board to do such for all common DBAPIs (like int for example), will return that type.

If a return type is not defined, raises NotImplementedError.

Note that any type also accommodates NULL in SQL which means you can also get back None from any type in practice.

class ord_schema.orm.rdkit_mappers.FingerprintType(value)

Bases: Enum

RDKit PostgreSQL fingerprint types.

MORGAN_BFP = <sqlalchemy.sql.functions._FunctionGenerator object>
MORGAN_SFP = <sqlalchemy.sql.functions._FunctionGenerator object>
classmethod get_table_args() list

Returns a list of __table_args__ for _Structure.

Each fingerprint type is given a column (name.lower()) and a corresponding index.

Returns

List of Column and Index objects.

class ord_schema.orm.rdkit_mappers.RDKitMol(**kwargs)

Bases: Base

Table for storing compound structures and associated RDKit cartridge data.

Notes

  • This table lives in a separate “rdkit” schema to avoid name conflicts between tables and extension types.

  • The RDKit-specific columns are populated by ord_schema.orm.database.add_rdkit; this allows the ORM to function normally even if if the RDKit PostgreSQL cartridge is not installed (the smiles column will be populated and the other columns will be empty).

  • Objects with this type are added to the ORM in from_proto() using the structure field.

id
mol
morgan_bfp
morgan_sfp
smiles
classmethod tanimoto(other: str, fp_type: FingerprintType = FingerprintType.MORGAN_BFP)
class ord_schema.orm.rdkit_mappers.RDKitReaction(**kwargs)

Bases: Base

Table for storing reaction objects and associated RDKit cartridge data.

id
reaction
reaction_smiles
ord_schema.orm.rdkit_mappers.rdkit_cartridge() bool

Returns whether to use RDKit PostgreSQL cartridge functionality.