Utils
This page lists the utility functions for manipulating combinatorial complexes, graphs and molecules, making plots, creating logs, and loading configurations and datasets.
cc_utils.py: utility functions for combinatorial complex data (flag masking, conversions, etc.).
- ccsd.src.utils.cc_utils.get_cells(N: int, d_min: int, d_max: int) Tuple[List[FrozenSet[int]], Dict[FrozenSet[int], int], Dict[int, List[int]], List[FrozenSet[int]], Dict[FrozenSet[int], int], Dict[int, List[int]]] [source]#
Get all rank-2 cells of size d_min to d_max. Returns a list of all rank-2 cells, a dictionary mapping rank-2 cells to a column index in the incidence matrix, a dictionary mapping nodes to a list of column indices in the incidence matrix, a list of all edges, a dictionary mapping edges to a row index in the incidence matrix and a dictionary mapping nodes to a list of row indices in the incidence matrix.
- Parameters:
N (int) – maximum number of nodes
d_min (int) – minimum size of rank-2 cells
d_max (int) – maximum size of rank-2 cells
- Returns:
list of all rank-2 cells, dictionary mapping rank-2 cells to a column index in the incidence matrix, dictionary mapping nodes to a list of column indices in the incidence matrix, list of all edges, dictionary mapping edges to a row index in the incidence matrix, and dictionary mapping nodes to a list of row indices in the incidence matrix
- Return type:
Tuple[List[FrozenSet[int]], Dict[FrozenSet[int], int], Dict[int, List[int]], List[FrozenSet[int]], Dict[FrozenSet[int], int], Dict[int, List[int]]]
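The six return values can be illustrated with a minimal sketch. It assumes that candidate rank-2 cells are all node subsets of size d_min to d_max, that edges are all node pairs, and that both are listed in `itertools.combinations` order. The helper name `enumerate_cells` and the ordering are illustrative assumptions, not the library's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def enumerate_cells(
    N: int, d_min: int, d_max: int
) -> Tuple[
    List[FrozenSet[int]],
    Dict[FrozenSet[int], int],
    Dict[int, List[int]],
    List[FrozenSet[int]],
    Dict[FrozenSet[int], int],
    Dict[int, List[int]],
]:
    """Hypothetical stand-in for get_cells (the ordering is an assumption)."""
    # Candidate rank-2 cells: every subset of d_min..d_max nodes.
    cells = [
        frozenset(c)
        for d in range(d_min, d_max + 1)
        for c in combinations(range(N), d)
    ]
    cell_to_col = {cell: j for j, cell in enumerate(cells)}
    node_to_cols: Dict[int, List[int]] = {n: [] for n in range(N)}
    for cell, j in cell_to_col.items():
        for n in cell:
            node_to_cols[n].append(j)

    # Candidate edges: every unordered pair of nodes.
    edges = [frozenset(e) for e in combinations(range(N), 2)]
    edge_to_row = {edge: i for i, edge in enumerate(edges)}
    node_to_rows: Dict[int, List[int]] = {n: [] for n in range(N)}
    for edge, i in edge_to_row.items():
        for n in edge:
            node_to_rows[n].append(i)

    return cells, cell_to_col, node_to_cols, edges, edge_to_row, node_to_rows


cells, cell_to_col, node_to_cols, edges, edge_to_row, node_to_rows = enumerate_cells(4, 3, 4)
```

With N = 4, d_min = 3 and d_max = 4, this yields 5 candidate cells (columns) and 6 candidate edges (rows).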
- ccsd.src.utils.cc_utils.create_incidence_1_2(N: int, A: ndarray | Tensor, d_min: int, d_max: int, two_rank_cells: Dict[FrozenSet[int], Dict[str, Any]]) ndarray [source]#
Create the incidence matrix of rank-1 to rank-2 cells from an adjacency matrix and a list of the rank-2 cells of the CC.
- Parameters:
N (int) – maximum number of nodes
A (Union[np.ndarray, torch.Tensor]) – adjacency matrix
d_min (int) – minimum size of rank-2 cells
d_max (int) – maximum size of rank-2 cells
two_rank_cells (Dict[FrozenSet[int], Dict[str, Any]]) – dictionary mapping each rank-2 cell to its attributes
- Returns:
incidence matrix of rank-1 to rank-2 cells
- Return type:
np.ndarray
- ccsd.src.utils.cc_utils.cc_from_incidence(incidence_matrices_: List[ndarray | None] | List[Tensor | None] | None, d_min: int, d_max: int, is_molecule: bool = False) CombinatorialComplex [source]#
Convert (pseudo)-incidence matrices to a combinatorial complex (CC).
- Parameters:
incidence_matrices_ (Optional[Union[List[Optional[np.ndarray]], List[Optional[torch.Tensor]]]]) – list of incidence matrices [X, A, F]
d_min (int) – minimum size of rank-2 cells
d_max (int) – maximum size of rank-2 cells
is_molecule (bool, optional) – whether the CC is a molecule. Defaults to False.
- Raises:
NotImplementedError – raise an error if the CC is of dimension greater than 2 (if len(incidence_matrices_) > 3)
- Returns:
combinatorial complex (CC) object
- Return type:
CombinatorialComplex
- ccsd.src.utils.cc_utils.get_rank2_dim(N: int, d_min: int, d_max: int) int [source]#
Get the dimension of the rank-2 incidence matrix of a combinatorial complex with the given parameters.
- Parameters:
N (int) – maximum number of nodes
d_min (int) – minimum size of rank-2 cells
d_max (int) – maximum size of rank-2 cells
- Returns:
dimension of the rank-2 incidence matrix
- Return type:
int
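As a rough sketch of the counting involved, assume the rank-2 incidence matrix has one row per possible edge (written NC2 on this page) and one column per node subset of size d_min to d_max. The helper `rank2_shape` is hypothetical; which of the two numbers the documented function returns is not stated here.

```python
from math import comb
from typing import Tuple


def rank2_shape(N: int, d_min: int, d_max: int) -> Tuple[int, int]:
    """Hypothetical helper: (rows, columns) of a dense rank-2 incidence matrix."""
    rows = comb(N, 2)  # one row per possible edge
    cols = sum(comb(N, d) for d in range(d_min, d_max + 1))  # one column per candidate cell
    return rows, cols


shape = rank2_shape(9, 3, 6)  # (36, 420)
```

The column count grows combinatorially with N and d_max, which is why the range of cell sizes is kept narrow.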
- ccsd.src.utils.cc_utils.get_mol_from_x_adj(x: Tensor, adj: Tensor, dataset: str = 'QM9') Mol [source]#
Get a molecule from the node and adjacency matrices after they have been processed by get_transform_fn inside data_loader_mol.py.
Atoms: 0: C, 1: N, 2: O, 3: F, 4: P, 5: S, 6: Cl, 7: Br, 8: I
Bonds: 1: single, 2: double, 3: triple
- Parameters:
x (torch.Tensor) – node matrix
adj (torch.Tensor) – adjacency matrix
dataset (str, optional) – name of the dataset. Defaults to “QM9”.
- Returns:
molecule (RDKit Mol)
- Return type:
Chem.Mol
- ccsd.src.utils.cc_utils.get_all_mol_rings(mol: Mol) List[FrozenSet[int]] [source]#
Get all the rings of a molecule.
- Parameters:
mol (Chem.Mol) – molecule (RDKit Mol)
- Returns:
list of rings as frozensets of atom indices
- Return type:
List[FrozenSet[int]]
- ccsd.src.utils.cc_utils.mols_to_cc(mols: List[Mol]) List[CombinatorialComplex] [source]#
Convert a list of molecules to a list of combinatorial complexes where the rings are rank-2 cells. This is a lift operation.
This is a general function mostly used for testing. A more complete one is implemented in src/utils/data_loader_mol.py within the MolDataset class.
- Parameters:
mols (List[Chem.Mol]) – list of molecules (RDKit Mol)
- Returns:
molecules as combinatorial complexes where the cycles are rank-2 cells
- Return type:
List[CombinatorialComplex]
Example
>>> mols = [Chem.MolFromSmiles("Cc1ccccc1"), Chem.MolFromSmiles("c1cccc2c1CCCC2")]
>>> ccs = mols_to_cc(mols)
- ccsd.src.utils.cc_utils.CC_to_incidence_matrices(CC: CombinatorialComplex, d_min: int | None, d_max: int | None, N: int | None = None) List[ndarray] [source]#
Convert a combinatorial complex to a list of incidence matrices.
- Parameters:
CC (CombinatorialComplex) – combinatorial complex
d_min (Optional[int]) – minimum size of rank-2 cells. If not provided, calculated from the CC
d_max (Optional[int]) – maximum size of rank-2 cells. If not provided, calculated from the CC
N (Optional[int], optional) – maximum number of nodes. If not provided, calculated from the CC. Defaults to None. This parameter exists as a fallback; it is better to leave it unset and pad the matrices with the dedicated padding functions.
- Returns:
list of incidence matrices [X, A, F]
- Return type:
List[np.ndarray]
- ccsd.src.utils.cc_utils.ccs_to_mol(ccs: List[CombinatorialComplex]) List[Mol] [source]#
Convert a list of combinatorial complexes to a list of molecules.
- Parameters:
ccs (List[CombinatorialComplex]) – list of combinatorial complexes (that represent molecules) to convert
- Returns:
list of molecules
- Return type:
List[Chem.Mol]
- ccsd.src.utils.cc_utils.get_N_from_nb_edges(nb_edges: int) int [source]#
Get number of nodes from number of edges
- Parameters:
nb_edges (int) – number of edges
- Returns:
number of nodes
- Return type:
int
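A minimal sketch of one way to recover the node count, assuming nb_edges is the number of possible edges N(N-1)/2 (for example, the row count of a rank-2 incidence matrix). The helper name and the error handling are assumptions.

```python
from math import isqrt


def n_from_nb_edges(nb_edges: int) -> int:
    """Hypothetical helper: solve nb_edges = N * (N - 1) / 2 for N."""
    discriminant = 1 + 8 * nb_edges
    root = isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("nb_edges is not of the form N * (N - 1) / 2")
    return (1 + root) // 2


n = n_from_nb_edges(36)  # 9 nodes give 36 possible edges
```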
- ccsd.src.utils.cc_utils.get_N_from_rank2(rank2: Tensor) int [source]#
Get number of nodes from batch of rank2 incidence matrices
- Parameters:
rank2 (torch.Tensor) – rank2 incidence matrices (raw, batch, or batch and channel). (NC2) x K or B x (NC2) x K or B x C x (NC2) x K
- Returns:
number of nodes
- Return type:
int
- ccsd.src.utils.cc_utils.get_rank2_flags(rank2: Tensor, N: int, d_min: int, d_max: int, flags: Tensor) Tuple[Tensor, Tensor] [source]#
Get the left and right flags of rank2 incidence matrices. The left flag is 0 if an edge is not in the CC because one of its nodes is not. The right flag is 0 if a rank-2 cell is not in the CC because one of its nodes is not.
- Parameters:
rank2 (torch.Tensor) – batch of rank2 incidence matrices. B x (NC2) x K or B x C x (NC2) x K
N (int) – number of nodes
d_min (int) – minimum dimension of rank2 cells
d_max (int) – maximum dimension of rank2 cells
flags (torch.Tensor) – 0-1 flags tensor. B x N
- Returns:
flags for left and right nodes of rank2 cells
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- ccsd.src.utils.cc_utils.mask_rank2(rank2: Tensor, N: int, d_min: int, d_max: int, flags: Tensor | None = None) Tensor [source]#
Mask batch of rank2 incidence matrices with 0-1 flags tensor
- Parameters:
rank2 (torch.Tensor) – batch of rank2 incidence matrices. B x (NC2) x K or B x C x (NC2) x K
N (int) – number of nodes
d_min (int) – minimum dimension of rank2 cells
d_max (int) – maximum dimension of rank2 cells
flags (Optional[torch.Tensor], optional) – 0-1 flags tensor. Defaults to None. B x N
- Returns:
masked batch of rank2 incidence matrices
- Return type:
torch.Tensor
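The flag logic described for get_rank2_flags and mask_rank2 can be sketched for a single, unbatched matrix. This sketch uses NumPy rather than torch and assumes rows and columns follow `itertools.combinations` order; the function name and that ordering are assumptions.

```python
from itertools import combinations

import numpy as np


def mask_rank2_sketch(rank2, N, d_min, d_max, flags):
    """Hypothetical single-sample version: rank2 is (NC2) x K, flags has length N."""
    edges = list(combinations(range(N), 2))
    cells = [c for d in range(d_min, d_max + 1) for c in combinations(range(N), d)]
    # Left flag: 1 only if both endpoints of the edge are present.
    left = np.array([flags[list(e)].prod() for e in edges])
    # Right flag: 1 only if every node of the candidate cell is present.
    right = np.array([flags[list(c)].prod() for c in cells])
    # Zero out rows of absent edges and columns of absent cells.
    return rank2 * left[:, None] * right[None, :]


N, d_min, d_max = 4, 3, 3
flags = np.array([1.0, 1.0, 1.0, 0.0])  # node 3 is padding
rank2 = np.ones((6, 4))
masked = mask_rank2_sketch(rank2, N, d_min, d_max, flags)
```

Only entries whose edge and cell both avoid the padded node survive the mask.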
- ccsd.src.utils.cc_utils.gen_noise_rank2(x: Tensor, N: int, d_min: int, d_max: int, flags: Tensor | None = None) Tensor [source]#
Generate noise for the rank-2 incidence matrix
- Parameters:
x (torch.Tensor) – input tensor
N (int) – number of nodes
d_min (int) – minimum dimension of rank2 cells
d_max (int) – maximum dimension of rank2 cells
flags (Optional[torch.Tensor], optional) – optional flags. Defaults to None.
- Returns:
generated noisy tensor
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.pad_rank2(ori_rank2: ndarray, node_number: int, d_min: int, d_max: int) ndarray [source]#
Create padded rank2 incidence matrices
- Parameters:
ori_rank2 (np.ndarray) – original rank2 incidence matrix
node_number (int) – number of desired nodes
d_min (int) – minimum dimension of rank2 cells
d_max (int) – maximum dimension of rank2 cells
- Raises:
ValueError – if the original rank2 incidence matrix has more nodes than the desired number of nodes (we can’t pad)
- Returns:
padded rank2 incidence matrix
- Return type:
np.ndarray
- ccsd.src.utils.cc_utils.get_global_cc_properties(ccs: List[CombinatorialComplex]) Tuple[int, int, int] [source]#
Get the global properties of a list of combinatorial complexes: number of nodes, minimum dimension of rank2 cells and maximum dimension of rank2 cells
- Parameters:
ccs (List[CombinatorialComplex]) – list of combinatorial complexes
- Returns:
number of nodes, minimum dimension of rank2 cells and maximum dimension of rank2 cells
- Return type:
Tuple[int, int, int]
Example
>>> mols = [Chem.MolFromSmiles("Cc1ccccc1"), Chem.MolFromSmiles("c1cccc2c1CCCC2"), Chem.MolFromSmiles("C1CC1")]
>>> ccs = mols_to_cc(mols)
>>> get_global_cc_properties(ccs)
(10, 3, 6)
- ccsd.src.utils.cc_utils.ccs_to_tensors(cc_list: List[CombinatorialComplex], max_node_num: int | None = None, d_min: int | None = None, d_max: int | None = None) Tuple[Tensor, Tensor] [source]#
Convert a list of combinatorial complexes to two tensors, one for the adjacency matrices and one for the incidence matrices. If the combinatorial complexes have different numbers of nodes, the adjacency matrices and incidence matrices are padded to the maximum number of nodes. If the maximum number of nodes is not provided, it is calculated from the combinatorial complexes. The same holds for the minimum and maximum dimension of rank2 cells.
- Parameters:
cc_list (List[CombinatorialComplex]) – list of combinatorial complexes
max_node_num (Optional[int], optional) – max number of nodes in all the combinatorial complexes. Defaults to None.
d_min (Optional[int], optional) – minimum dimension of rank2 cells. Defaults to None.
d_max (Optional[int], optional) – maximum dimension of rank2 cells. Defaults to None.
- Returns:
adjacency matrices and rank2 incidence matrices
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- ccsd.src.utils.cc_utils.cc_to_tensor(cc: CombinatorialComplex, max_node_num: int | None = None, d_min: int | None = None, d_max: int | None = None) Tuple[Tensor, Tensor] [source]#
Convert a single combinatorial complex to a tuple of tensors, one for the adjacency matrix and one for the rank2 incidence matrix. If the maximum number of nodes is not provided, it is calculated from the combinatorial complex. The same holds for the minimum and maximum dimension of rank2 cells. Incidence matrices (A, F) are padded to the maximum number of nodes.
- Parameters:
cc (CombinatorialComplex) – combinatorial complex to convert
max_node_num (Optional[int], optional) – maximum number of nodes. Defaults to None.
d_min (Optional[int], optional) – minimum dimension of rank2 cells. Defaults to None.
d_max (Optional[int], optional) – maximum dimension of rank2 cells. Defaults to None.
- Returns:
adjacency matrix and rank2 incidence matrix
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- ccsd.src.utils.cc_utils.convert_CC_to_graphs(ccs: List[CombinatorialComplex], undirected: bool = True) List[Graph] [source]#
Convert a list of combinatorial complexes to a list of graphs
- Parameters:
ccs (List[CombinatorialComplex]) – list of combinatorial complexes
undirected (bool, optional) – whether to create an undirected graph. Defaults to True.
- Returns:
list of graphs
- Return type:
List[nx.Graph]
- ccsd.src.utils.cc_utils.convert_graphs_to_CCs(graphs: List[Graph], is_molecule: bool = False, lifting_procedure: str | None = None, lifting_procedure_kwargs: str | Dict[Any, Any] | None = None, **kwargs) List[CombinatorialComplex] [source]#
Convert a list of graphs to a list of combinatorial complexes (of dimension 1).
- Parameters:
graphs (List[nx.Graph]) – list of graphs
is_molecule (bool, optional) – whether the graphs are molecules. Defaults to False.
lifting_procedure (Optional[str], optional) – lifting procedure to use. Defaults to None.
lifting_procedure_kwargs (Optional[Union[str, Dict[Any, Any]]], optional) – kwargs for the lifting procedure. Defaults to None.
- Returns:
list of combinatorial complexes
- Return type:
List[CombinatorialComplex]
- ccsd.src.utils.cc_utils.init_flags(obj_list: List[Graph] | List[CombinatorialComplex], config: EasyDict, batch_size: int | None = None, is_cc: bool = False) Tensor [source]#
Sample initial flags tensor from the training graph set
- Parameters:
obj_list (Union[List[nx.Graph], List[CombinatorialComplex]]) – list of graphs or combinatorial complexes
config (EasyDict) – configuration
batch_size (Optional[int], optional) – batch size. Defaults to None.
is_cc (bool, optional) – whether the objects are combinatorial complexes. Defaults to False.
- Returns:
flag tensors
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.hodge_laplacian(rank2: Tensor) Tensor [source]#
Compute the Hodge Laplacian of a batch of rank2 incidence matrices. H = F @ F.T where F is the rank-2 incidence matrix of a combinatorial complex.
- Parameters:
rank2 (torch.Tensor) – batch of rank2 incidence matrices. B x (NC2) x K or B x C x (NC2) x K
- Returns:
Hodge Laplacian, of shape B x (NC2) x (NC2) or B x C x (NC2) x (NC2)
- Return type:
torch.Tensor
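The formula H = F @ F.T can be sketched with NumPy (used here instead of torch to keep the example dependency-light). Transposing only the last two axes lets batch and channel axes pass through unchanged; the function name is hypothetical.

```python
import numpy as np


def hodge_laplacian_sketch(rank2: np.ndarray) -> np.ndarray:
    """H = F @ F.T over the last two axes, so leading axes are preserved."""
    return rank2 @ np.swapaxes(rank2, -1, -2)


F = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 edges x 2 cells
H = hodge_laplacian_sketch(F)
batch = np.stack([F, 2 * F])  # B x (NC2) x K with B = 2
H_batch = hodge_laplacian_sketch(batch)
```

Entry (i, j) of H counts the rank-2 cells shared by edges i and j, so H is symmetric by construction.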
- ccsd.src.utils.cc_utils.default_mask(n: int, device: str = 'cpu') Tensor [source]#
Create default adjacency or Hodge Laplacian mask (no diagonal elements)
- Parameters:
n (int) – number of nodes or edges
device (str, optional) – device on which to create the mask. Defaults to “cpu”.
- Returns:
default adjacency or Hodge Laplacian mask
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.pow_tensor_cc(x: Tensor, cnum: int, hodge_mask: Tensor | None = None) Tensor [source]#
Create higher order rank-2 incidence matrices from a batch of rank-2 incidence matrices.
- Parameters:
x (torch.Tensor) – input tensor of shape B x (NC2) x K or B x C x (NC2) x K
cnum (int) – number of higher order matrices to create (made with consecutive multiplication of the Hodge Laplacian matrix of x)
hodge_mask (Optional[torch.Tensor], optional) – optional mask to apply to the Hodge Laplacian. Defaults to None. If None, no mask is applied. shape (NC2) x (NC2) or B x (NC2) x (NC2)
- Returns:
output higher order matrices of shape B x cnum x (NC2) x K
- Return type:
torch.Tensor
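One possible reading of "consecutive multiplication of the Hodge Laplacian" is sketched below with NumPy. It assumes the first output channel is x itself and that each further channel left-multiplies the previous one by the (optionally masked) Hodge Laplacian of x. Both choices, and the function name, are assumptions rather than the library's confirmed behavior.

```python
import numpy as np


def pow_tensor_cc_sketch(x, cnum, hodge_mask=None):
    """Hypothetical version for x of shape B x (NC2) x K."""
    H = x @ np.swapaxes(x, -1, -2)  # B x (NC2) x (NC2)
    if hodge_mask is not None:
        H = H * hodge_mask  # broadcasts over the batch axis
    channels = [x]
    current = x
    for _ in range(cnum - 1):
        current = H @ current  # one more multiplication by the Hodge Laplacian
        channels.append(current)
    return np.stack(channels, axis=1)  # B x cnum x (NC2) x K


x = np.ones((2, 3, 4))
no_diag = np.ones((3, 3)) - np.eye(3)  # mask without diagonal elements
out = pow_tensor_cc_sketch(x, cnum=3, hodge_mask=no_diag)
```

The `no_diag` array mirrors the kind of mask described for default_mask, with ones everywhere except on the diagonal.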
- ccsd.src.utils.cc_utils.is_empty_cc(cc: CombinatorialComplex) bool [source]#
Check if a combinatorial complex is empty
- Parameters:
cc (CombinatorialComplex) – combinatorial complex
- Returns:
whether the combinatorial complex is empty
- Return type:
bool
- ccsd.src.utils.cc_utils.hodge_laplacian_spectrum_worker(CC: CombinatorialComplex, d_min: int, d_max: int, N: int) ndarray [source]#
Function for computing the Hodge Laplacian spectrum (eigenvalue histogram) of a combinatorial complex.
- Parameters:
CC (CombinatorialComplex) – combinatorial complex
d_min (int) – minimum dimension of the rank-2 cells
d_max (int) – maximum dimension of the rank-2 cells
N (int) – maximum number of nodes
- Returns:
Hodge Laplacian eigenvalue histogram
- Return type:
np.ndarray
- ccsd.src.utils.cc_utils.hodge_laplacian_spectrum_stats(cc_ref_list: List[CombinatorialComplex], cc_pred_list: List[CombinatorialComplex], worker_kwargs: Dict[str, Any], kernel: Callable[[ndarray, ndarray], float] = <function gaussian_emd>, is_parallel: bool = True, debug_mode: bool = False) float [source]#
Compute the MMD distance between the Hodge Laplacian eigenvalue distributions of two unordered sets of combinatorial complexes.
- Parameters:
cc_ref_list (List[CombinatorialComplex]) – reference list of toponetx combinatorial complexes to be evaluated
cc_pred_list (List[CombinatorialComplex]) – target list of toponetx combinatorial complexes to be evaluated
worker_kwargs (Dict[str, Any]) – kwargs for the worker function
kernel (Callable[[np.ndarray, np.ndarray], float], optional) – kernel function. Defaults to gaussian_emd.
is_parallel (bool, optional) – if True, do parallel computing. Defaults to True.
debug_mode (bool, optional) – if True, print debug information when is_parallel is set to True. Defaults to False.
- Returns:
MMD distance
- Return type:
float
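The MMD computation shared by the `*_stats` functions can be sketched as follows. This is a generic biased estimator of squared MMD over per-complex histograms. The plain Gaussian kernel is a stand-in for gaussian_emd, and whether the library returns the squared or unsquared quantity is not stated here.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Plain Gaussian kernel on the L2 distance (a stand-in, not gaussian_emd)."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2)))


def mmd_squared(ref, pred, kernel=gaussian_kernel):
    """Biased estimate of squared MMD between two lists of histograms."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        return float(np.mean([kernel(u, v) for u in a for v in b]))

    return mean_kernel(ref, ref) + mean_kernel(pred, pred) - 2 * mean_kernel(ref, pred)


ref = [np.array([0.5, 0.5]), np.array([0.4, 0.6])]
pred = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
score = mmd_squared(ref, pred)
```

A score near zero indicates that the two sets of histograms are hard to tell apart under the chosen kernel.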
- ccsd.src.utils.cc_utils.rank0_distrib_worker(CC: CombinatorialComplex, min_node_val: int, max_node_val: int, node_label: str = 'label') ndarray [source]#
Function for computing the rank-0 cell value histogram of a combinatorial complex. Values are converted to integers.
- Parameters:
CC (CombinatorialComplex) – combinatorial complex
min_node_val (int) – minimum node value
max_node_val (int) – maximum node value
node_label (str, optional) – name of the node attribute under which the value is stored in the CC. Defaults to “label”.
- Returns:
rank-0 cell histogram
- Return type:
np.ndarray
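A value histogram of this kind can be sketched in plain Python. The helper below is hypothetical: it returns a list rather than an np.ndarray, and silently dropping out-of-range values is an assumption.

```python
from typing import Dict, List


def value_histogram(values: List[float], min_val: int, max_val: int) -> List[int]:
    """Hypothetical helper: count integer-cast values in [min_val, max_val]."""
    hist = [0] * (max_val - min_val + 1)
    for v in values:
        k = int(v)  # values are converted to integers
        if min_val <= k <= max_val:
            hist[k - min_val] += 1
    return hist


node_labels: Dict[int, float] = {0: 0.0, 1: 0.0, 2: 2.0, 3: 1.0}
hist = value_histogram(list(node_labels.values()), 0, 3)  # [2, 1, 1, 0]
```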
- ccsd.src.utils.cc_utils.rank0_distrib_stats(cc_ref_list: List[CombinatorialComplex], cc_pred_list: List[CombinatorialComplex], worker_kwargs: Dict[str, Any], kernel: Callable[[ndarray, ndarray], float] = <function gaussian_emd>, is_parallel: bool = True, debug_mode: bool = False) float [source]#
Compute the MMD distance between the distributions of rank-0 cell values of two unordered sets of combinatorial complexes.
- Parameters:
cc_ref_list (List[CombinatorialComplex]) – reference list of toponetx combinatorial complexes to be evaluated
cc_pred_list (List[CombinatorialComplex]) – target list of toponetx combinatorial complexes to be evaluated
worker_kwargs (Dict[str, Any]) – kwargs for the worker function
kernel (Callable[[np.ndarray, np.ndarray], float], optional) – kernel function. Defaults to gaussian_emd.
is_parallel (bool, optional) – if True, do parallel computing. Defaults to True.
debug_mode (bool, optional) – if True, print debug information when is_parallel is set to True. Defaults to False.
- Returns:
MMD distance
- Return type:
float
- ccsd.src.utils.cc_utils.rank1_distrib_worker(CC: CombinatorialComplex, min_edge_val: int, max_edge_val: int, edge_label: str = 'label') ndarray [source]#
Function for computing the rank-1 cell value histogram of a combinatorial complex. Values are converted to integers.
- Parameters:
CC (CombinatorialComplex) – combinatorial complex
min_edge_val (int) – minimum edge value
max_edge_val (int) – maximum edge value
edge_label (str, optional) – name of the edge attribute under which the value is stored in the CC. Defaults to “label”.
- Returns:
rank-1 cell histogram
- Return type:
np.ndarray
- ccsd.src.utils.cc_utils.rank1_distrib_stats(cc_ref_list: List[CombinatorialComplex], cc_pred_list: List[CombinatorialComplex], worker_kwargs: Dict[str, Any], kernel: Callable[[ndarray, ndarray], float] = <function gaussian_emd>, is_parallel: bool = True, debug_mode: bool = False) float [source]#
Compute the MMD distance between the distributions of rank-1 cell values of two unordered sets of combinatorial complexes.
- Parameters:
cc_ref_list (List[CombinatorialComplex]) – reference list of toponetx combinatorial complexes to be evaluated
cc_pred_list (List[CombinatorialComplex]) – target list of toponetx combinatorial complexes to be evaluated
worker_kwargs (Dict[str, Any]) – kwargs for the worker function
kernel (Callable[[np.ndarray, np.ndarray], float], optional) – kernel function. Defaults to gaussian_emd.
is_parallel (bool, optional) – if True, do parallel computing. Defaults to True.
debug_mode (bool, optional) – if True, print debug information when is_parallel is set to True. Defaults to False.
- Returns:
MMD distance
- Return type:
float
- ccsd.src.utils.cc_utils.rank2_distrib_worker(CC: CombinatorialComplex, d_min: int, d_max: int) ndarray [source]#
Function for computing the rank-2 cell histogram of a combinatorial complex.
- Parameters:
CC (CombinatorialComplex) – combinatorial complex
d_min (int) – minimum dimension of the rank-2 cells
d_max (int) – maximum dimension of the rank-2 cells
- Returns:
rank-2 cell histogram
- Return type:
np.ndarray
- ccsd.src.utils.cc_utils.rank2_distrib_stats(cc_ref_list: List[CombinatorialComplex], cc_pred_list: List[CombinatorialComplex], worker_kwargs: Dict[str, Any], kernel: Callable[[ndarray, ndarray], float] = <function gaussian_emd>, is_parallel: bool = True, debug_mode: bool = False) float [source]#
Compute the MMD distance between the distributions of the number of rank-2 cells of two unordered sets of combinatorial complexes.
- Parameters:
cc_ref_list (List[CombinatorialComplex]) – reference list of toponetx combinatorial complexes to be evaluated
cc_pred_list (List[CombinatorialComplex]) – target list of toponetx combinatorial complexes to be evaluated
worker_kwargs (Dict[str, Any]) – kwargs for the worker function
kernel (Callable[[np.ndarray, np.ndarray], float], optional) – kernel function. Defaults to gaussian_emd.
is_parallel (bool, optional) – if True, do parallel computing. Defaults to True.
debug_mode (bool, optional) – if True, print debug information when is_parallel is set to True. Defaults to False.
- Returns:
MMD distance
- Return type:
float
- ccsd.src.utils.cc_utils.eval_CC_list(cc_ref_list: List[CombinatorialComplex], cc_pred_list: List[CombinatorialComplex], worker_kwargs: Dict[str, Any], methods: List[str] | None = None, kernels: Dict[str, Callable[[ndarray, ndarray], float]] | None = None, cc_nb_eval: int | None = 1000) Dict[str, float] [source]#
Evaluate generated generic combinatorial complexes against a reference set of combinatorial complexes using a set of methods and their corresponding kernels.
- Parameters:
cc_ref_list (List[CombinatorialComplex]) – reference list of toponetx combinatorial complexes to be evaluated
cc_pred_list (List[CombinatorialComplex]) – target list of toponetx combinatorial complexes to be evaluated
worker_kwargs (Dict[str, Any]) – kwargs for the worker functions
methods (Optional[List[str]], optional) – methods to be evaluated. Defaults to None.
kernels (Optional[Dict[str, Callable[[np.ndarray, np.ndarray], float]]], optional) – kernels to be used for each method. Defaults to None.
cc_nb_eval (Optional[int], optional) – number of reference and predicted combinatorial complexes to be evaluated. If set to None, evaluate on the entire dataset. Defaults to 1000.
- Returns:
dictionary mapping method names to their corresponding scores
- Return type:
Dict[str, float]
- ccsd.src.utils.cc_utils.load_cc_eval_settings() Tuple[List[str], Dict[str, Callable[[ndarray, ndarray], float]]] [source]#
Load the methods and kernels to be used for evaluating combinatorial complexes.
- Returns:
methods and kernels to be used for evaluating combinatorial complexes
- Return type:
Tuple[List[str], Dict[str, Callable[[np.ndarray, np.ndarray], float]]]
- ccsd.src.utils.cc_utils.adj_to_hodgedual(adj: Tensor) Tensor [source]#
Convert adjacency matrices to Hodge dual adjacency matrices. Matrices are assumed to be symmetric and can be batched and/or have channels.
- Parameters:
adj (torch.Tensor) – adjacency matrices (B x C x N x N) or (B x N x N) or (N x N)
- Returns:
Hodge dual adjacency matrices (B x C x (NC2) x (NC2)) or (B x (NC2) x (NC2)) or ((NC2) x (NC2))
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.hodgedual_to_adj(hodgedual: Tensor) Tensor [source]#
Convert Hodge dual adjacency matrices to adjacency matrices. Matrices can be batched and/or have channels.
- Parameters:
hodgedual (torch.Tensor) – Hodge dual adjacency matrices (B x C x (NC2) x (NC2)) or (B x (NC2) x (NC2)) or ((NC2) x (NC2))
- Returns:
adjacency matrices (B x C x N x N) or (B x N x N) or (N x N)
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.get_hodge_adj_flags(hodge_adj: Tensor, flags: Tensor) Tuple[Tensor, Tensor] [source]#
Get flags for the Hodge adjacency matrices. The flag is 0 if an edge is not in the CC because one of its nodes is not.
- Parameters:
hodge_adj (torch.Tensor) – batch of hodge adjacency matrices. B x (NC2) x (NC2) or B x C x (NC2) x (NC2)
flags (torch.Tensor) – 0-1 flags tensor. B x N
- Returns:
flags for the adjacency matrices B x (NC2)
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- ccsd.src.utils.cc_utils.mask_hodge_adjs(hodge_adjs: Tensor, flags: Tensor | None = None) Tensor [source]#
Mask batch of hodge adjacency matrices with 0-1 flags tensor
- Parameters:
hodge_adjs (torch.Tensor) – batch of hodge adjacency matrices. B x (NC2) x (NC2) or B x C x (NC2) x (NC2)
flags (Optional[torch.Tensor], optional) – 0-1 flags tensor. Defaults to None. B x N
- Returns:
masked batch of hodge adjacency matrices
- Return type:
torch.Tensor
- ccsd.src.utils.cc_utils.get_all_paths_from_single_node(n: int, g: Dict[int, List[int]], path_length: int) Set[FrozenSet[int]] [source]#
Get all paths of a given length that start from a single node, in a graph given as a dictionary of edges
- Parameters:
n (int) – node to start the paths from
g (Dict[int, List[int]]) – graph
path_length (int) – length of the paths
- Returns:
set of paths
- Return type:
Set[FrozenSet[int]]
- ccsd.src.utils.cc_utils.get_all_paths_from_nodes(nodes: List[int], g: Dict[int, List[int]], path_length: int) Set[FrozenSet[int]] [source]#
Get all paths of a given length that start from any node of a list, in a graph given as a dictionary of edges
- Parameters:
nodes (List[int]) – list of nodes to start the paths from
g (Dict[int, List[int]]) – graph
path_length (int) – length of the paths
- Returns:
set of paths
- Return type:
Set[FrozenSet[int]]
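The two path functions can be sketched with a depth-first search. The sketch assumes that g maps each node to its neighbors, that path_length counts edges, and that paths are simple (no repeated node). These conventions and the helper names are assumptions.

```python
from typing import Dict, FrozenSet, List, Set


def paths_from_node(n: int, g: Dict[int, List[int]], path_length: int) -> Set[FrozenSet[int]]:
    """Hypothetical helper: node sets of simple paths with path_length edges starting at n."""
    found: Set[FrozenSet[int]] = set()

    def extend(path: List[int]) -> None:
        if len(path) == path_length + 1:
            found.add(frozenset(path))
            return
        for nxt in g.get(path[-1], []):
            if nxt not in path:  # keep the path simple
                extend(path + [nxt])

    extend([n])
    return found


def paths_from_nodes(nodes: List[int], g: Dict[int, List[int]], path_length: int) -> Set[FrozenSet[int]]:
    """Union of the paths found from each starting node."""
    result: Set[FrozenSet[int]] = set()
    for n in nodes:
        result |= paths_from_node(n, g, path_length)
    return result


g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
two_step = paths_from_nodes([0, 3], g, 2)
```

Because each path is stored as a frozenset, two traversals that visit the same nodes collapse into a single candidate cell.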
- ccsd.src.utils.cc_utils.path_based_lift_CC(input_cc: CombinatorialComplex, sources_nodes: List[int], path_length: int) CombinatorialComplex [source]#
Lift a 1-dimensional CC to a 2-dimensional CC by lifting the paths to rank-2 cells. Rank-2 cells must be edges.
- Parameters:
input_cc (CombinatorialComplex) – original combinatorial complex
sources_nodes (List[int]) – list of source nodes to start the paths from
path_length (int) – length of the paths to lift
- Returns:
lifted combinatorial complex
- Return type:
CombinatorialComplex
- ccsd.src.utils.cc_utils.cycles_lift_CC(input_cc: CombinatorialComplex) CombinatorialComplex [source]#
Lift a 1-dimensional CC to a 2-dimensional CC by lifting the cycles to rank-2 cells.
- Parameters:
input_cc (CombinatorialComplex) – original combinatorial complex
- Returns:
lifted combinatorial complex
- Return type:
CombinatorialComplex
data_loader_mol.py: utility functions for loading the graph data (molecular ones).
Only dataloader_mol is left untouched from Jo, J. et al. (2022).
- ccsd.src.utils.data_loader_mol.load_mol(filepath: str) List[Tuple[Any, Any]] [source]#
Load molecular dataset from filepath.
Adapted from GraphEBM
- Parameters:
filepath (str) – filepath to the dataset
- Raises:
ValueError – raise an error if the filepath is invalid
- Returns:
list of tuples of (node features, adjacency matrix)
- Return type:
List[Tuple[Any, Any]]
- class ccsd.src.utils.data_loader_mol.MolDataset(mols: List[Tuple[ndarray, ndarray]], transform: Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor]] | Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor, Tensor]])[source]#
Bases: Dataset
Dataset object for molecular dataset.
- __init__(mols: List[Tuple[ndarray, ndarray]], transform: Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor]] | Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor, Tensor]]) None [source]#
Initialize the dataset.
- Parameters:
mols (List[Tuple[np.ndarray, np.ndarray]]) – list of tuples of (node features, adjacency matrix)
transform (Union[Callable[[Tuple[np.ndarray, np.ndarray]], Tuple[torch.Tensor, torch.Tensor]], Callable[[Tuple[np.ndarray, np.ndarray]], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]]) – transform function that converts the data into tensors with some preprocessing. It returns two tensors for graph-based modeling and three tensors for combinatorial complex-based modeling.
- ccsd.src.utils.data_loader_mol.get_transform_fn(dataset: str, is_cc: bool = False, **kwargs: Any) Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor]] | Callable[[Tuple[ndarray, ndarray]], Tuple[Tensor, Tensor, Tensor]] [source]#
Get the transform function for the given dataset.
- Parameters:
dataset (str) – name of the dataset
is_cc (bool, optional) – if True, the transform function returns three tensors for combinatorial complex-based modeling. Defaults to False.
- Raises:
ValueError – raise an error if the dataset is invalid/unsupported
- Returns:
transform function that converts the data into tensors with some preprocessing. It returns two tensors for graph-based modeling and three tensors for combinatorial complex-based modeling.
- Return type:
Union[Callable[[Tuple[np.ndarray, np.ndarray]], Tuple[torch.Tensor, torch.Tensor]], Callable[[Tuple[np.ndarray, np.ndarray]], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]]
- ccsd.src.utils.data_loader_mol.dataloader_mol(config: EasyDict, get_graph_list: bool = False) Tuple[DataLoader, DataLoader] | Tuple[List[Graph], List[Graph]] [source]#
Load the dataset and return the train and test dataloader for the given molecular dataset.
- Parameters:
config (EasyDict) – configuration to use
get_graph_list (bool, optional) – if True, return lists of graphs instead of dataloaders. Defaults to False.
- Returns:
train and test dataloader (tensors or lists of graphs)
- Return type:
Union[Tuple[DataLoader, DataLoader], Tuple[List[nx.Graph], List[nx.Graph]]]
- ccsd.src.utils.data_loader_mol.dataloader_mol_cc(config: EasyDict, get_cc_list: bool = False) Tuple[DataLoader, DataLoader] | Tuple[List[CombinatorialComplex], List[CombinatorialComplex]] [source]#
Load the dataset and return the train and test dataloader for the given molecular dataset.
- Parameters:
config (EasyDict) – configuration to use
get_cc_list (bool, optional) – if True, return lists of combinatorial complexes instead of dataloaders. Defaults to False.
- Returns:
train and test dataloader (tensors or lists of combinatorial complexes)
- Return type:
Union[Tuple[DataLoader, DataLoader], Tuple[List[CombinatorialComplex], List[CombinatorialComplex]]]
data_loader.py: utility functions for loading the graph data (not molecular ones).
Only dataloader is left untouched from Jo, J. et al. (2022).
- ccsd.src.utils.data_loader.graphs_to_dataloader(config: EasyDict, graph_list: List[Graph]) DataLoader [source]#
Convert a list of graphs to a dataloader.
- Parameters:
config (EasyDict) – configuration to use
graph_list (List[nx.Graph]) – list of graphs to convert
- Returns:
DataLoader object for the graphs
- Return type:
DataLoader
- ccsd.src.utils.data_loader.ccs_to_dataloader(config: EasyDict, cc_list: List[CombinatorialComplex]) DataLoader [source]#
Convert a list of combinatorial complexes to a dataloader.
- Parameters:
config (EasyDict) – configuration to use
cc_list (List[CombinatorialComplex]) – list of combinatorial complexes to convert
- Returns:
DataLoader object for the combinatorial complexes
- Return type:
DataLoader
- ccsd.src.utils.data_loader.dataloader(config: EasyDict, get_graph_list: bool = False) Tuple[DataLoader, DataLoader] | Tuple[List[Graph], List[Graph]] [source]#
Load the dataset and return the train and test dataloader for the given non-molecular dataset.
- Parameters:
config (EasyDict) – configuration to use
get_graph_list (bool, optional) – if True, return lists of graphs instead of dataloaders. Defaults to False.
- Returns:
train and test dataloader (tensors or lists of graphs)
- Return type:
Union[Tuple[DataLoader, DataLoader], Tuple[List[nx.Graph], List[nx.Graph]]]
- ccsd.src.utils.data_loader.dataloader_cc(config: EasyDict, get_cc_list: bool = False) Tuple[DataLoader, DataLoader] | Tuple[List[CombinatorialComplex], List[CombinatorialComplex]] [source]#
Load the dataset and return the train and test dataloader for the given non-molecular dataset.
- Parameters:
config (EasyDict) – configuration to use
get_cc_list (bool, optional) – if True, return lists of combinatorial complexes instead of dataloaders. Defaults to False.
- Returns:
train and test dataloader (tensors or lists of combinatorial complexes)
- Return type:
Union[Tuple[DataLoader, DataLoader], Tuple[List[CombinatorialComplex], List[CombinatorialComplex]]]
ema.py: code for the exponential moving average class for the parameters.
Adapted from Jo, J. & al (2022), almost left untouched.
- class ccsd.src.utils.ema.ExponentialMovingAverage(parameters: Parameter, decay: float, use_num_updates: bool = True)[source]#
Bases:
object
Maintains (exponential) moving average of a set of parameters.
- __init__(parameters: Parameter, decay: float, use_num_updates: bool = True) None [source]#
Initialize the EMA class.
- Parameters:
parameters (torch.nn.parameter.Parameter) – Iterable of torch.nn.Parameter, initial parameters to use for EMA.
decay (float) – Decay rate for exponential moving average.
use_num_updates (bool, optional) – if True, initialize the number of updates to 0. Defaults to True.
- Raises:
ValueError – raise an error if decay is not between 0 and 1.
- update(parameters: Parameter) None [source]#
Update currently maintained parameters. Call this every time the parameters are updated, such as the result of the optimizer.step() call.
- Parameters:
parameters (torch.nn.parameter.Parameter) – Iterable of torch.nn.Parameter; usually the same set of parameters used to initialize this object.
- copy_to(parameters: Parameter) None [source]#
Copy current parameters into given collection of parameters.
- Parameters:
parameters (torch.nn.parameter.Parameter) – Iterable of torch.nn.Parameter; the parameters to be updated with the stored moving averages.
- store(parameters: Parameter) None [source]#
Save the current parameters for restoring later.
- Parameters:
parameters (torch.nn.parameter.Parameter) – Iterable of torch.nn.Parameter; the parameters to be temporarily stored.
- restore(parameters: Parameter) None [source]#
Restore the parameters stored with the store method. Useful to validate the model with EMA parameters without affecting the original optimization process. Store the parameters before the copy_to method. After validation (or model saving), use this to restore the former parameters.
- Parameters:
parameters (torch.nn.parameter.Parameter) – Iterable of torch.nn.Parameter; the parameters to be updated with the stored parameters.
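The store / copy_to / restore cycle described above can be illustrated with a toy version that averages a plain list of floats instead of torch parameters. This is only a sketch: the class name SimpleEMA is hypothetical, and the warm-up formula applied when use_num_updates is True is an assumption borrowed from common EMA implementations, not something this page confirms.

```python
from typing import List


class SimpleEMA:
    """Toy EMA over a list of floats (hypothetical stand-in for torch parameters)."""

    def __init__(self, parameters: List[float], decay: float, use_num_updates: bool = True) -> None:
        if decay < 0.0 or decay > 1.0:
            raise ValueError("Decay must be between 0 and 1")
        self.decay = decay
        self.num_updates = 0 if use_num_updates else None
        self.shadow = list(parameters)  # averaged copy of the parameters
        self.backup: List[float] = []   # filled by store()

    def update(self, parameters: List[float]) -> None:
        decay = self.decay
        if self.num_updates is not None:
            self.num_updates += 1
            # Assumed warm-up: use a smaller decay during the first updates.
            decay = min(decay, (1 + self.num_updates) / (10 + self.num_updates))
        self.shadow = [s - (1.0 - decay) * (s - p) for s, p in zip(self.shadow, parameters)]

    def copy_to(self, parameters: List[float]) -> None:
        parameters[:] = self.shadow

    def store(self, parameters: List[float]) -> None:
        self.backup = list(parameters)

    def restore(self, parameters: List[float]) -> None:
        parameters[:] = self.backup


params = [1.0, 2.0]
ema = SimpleEMA(params, decay=0.5, use_num_updates=False)
params[:] = [3.0, 4.0]  # pretend optimizer.step() changed the weights
ema.update(params)      # the average moves halfway towards the new weights

ema.store(params)       # keep the raw weights
ema.copy_to(params)     # evaluate with the averaged weights
averaged = list(params)
ema.restore(params)     # resume training with the raw weights
```

After the cycle, params holds the raw weights again while averaged holds the values that were used for evaluation.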
errors.py: contains custom exceptions.
- exception ccsd.src.utils.errors.SymmetryError(message: str = '')[source]#
Bases:
Exception
Exception raised when a matrix is not symmetric.
- message -- more detailed explanation of the error
graph_utils.py: utility functions for graph data (flag masking, quantization, etc.).
Adapted from Jo, J. & al (2022), almost left untouched.
- ccsd.src.utils.graph_utils.mask_x(x: Tensor, flags: Tensor | None = None) Tensor [source]#
Mask batch of node features with 0-1 flags tensor
- Parameters:
x (torch.Tensor) – batch of node features
flags (Optional[torch.Tensor], optional) – 0-1 flags tensor. Defaults to None.
- Returns:
Masked batch of node features
- Return type:
torch.Tensor
- ccsd.src.utils.graph_utils.mask_adjs(adjs: Tensor, flags: Tensor | None = None) Tensor [source]#
Mask batch of adjacency matrices with 0-1 flags tensor
- Parameters:
adjs (torch.Tensor) – batch of adjacency matrices. B x N x N or B x C x N x N
flags (Optional[torch.Tensor], optional) – 0-1 flags tensor of shape B x N. Defaults to None.
- Returns:
Masked batch of adjacency matrices
- Return type:
torch.Tensor
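To make the effect of the 0-1 flags concrete, here is a minimal sketch on a single graph using nested Python lists rather than batched tensors. The helper names mask_x_sketch and mask_adj_sketch are hypothetical, and the exact masking rule (multiply by the row flag and the column flag) is an assumption about how the library applies the flags.

```python
from typing import List

Matrix = List[List[float]]


def mask_x_sketch(x: Matrix, flags: List[int]) -> Matrix:
    """Zero the feature row of every node whose flag is 0 (single graph, N x F)."""
    return [[v * f for v in row] for row, f in zip(x, flags)]


def mask_adj_sketch(adj: Matrix, flags: List[int]) -> Matrix:
    """Zero row i and column i of an N x N matrix whenever flags[i] is 0."""
    n = len(adj)
    return [[adj[i][j] * flags[i] * flags[j] for j in range(n)] for i in range(n)]


flags = [1, 1, 0]  # the third node is padding
x = [[0.5, 1.0], [2.0, 3.0], [9.0, 9.0]]
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

masked_x = mask_x_sketch(x, flags)
masked_adj = mask_adj_sketch(adj, flags)
```

Only the entries that involve the padded node are zeroed; everything else is left unchanged.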
- ccsd.src.utils.graph_utils.node_flags(adj: Tensor, eps: float = 1e-05) Tensor [source]#
Create flags tensor from graph dataset
- Parameters:
adj (torch.Tensor) – adjacency matrix
eps (float, optional) – threshold. Defaults to 1e-5.
- Returns:
flags tensor
- Return type:
torch.Tensor
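One plausible way to derive such flags, shown here as a sketch only, is to mark a node as present when the absolute values in its adjacency row sum to more than eps. Both the rule and the name node_flags_sketch are assumptions; the page only states that eps is a threshold.

```python
from typing import List


def node_flags_sketch(adj: List[List[float]], eps: float = 1e-5) -> List[float]:
    """Return 1.0 for nodes with at least one non-negligible edge, else 0.0."""
    return [1.0 if sum(abs(v) for v in row) > eps else 0.0 for row in adj]


# Two connected nodes followed by one padded (all-zero) node.
padded_adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
example_flags = node_flags_sketch(padded_adj)
```

Under this rule an isolated real node would also be flagged as absent, which is worth keeping in mind when interpreting the flags.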
- ccsd.src.utils.graph_utils.init_features(init: str, adjs: Tensor, nfeat: int = 10) Tensor [source]#
Create initial node features by initializing the adjacency matrix, creating a node flag matrix based on the initialization, and masking the node features with the node flag matrix
- Parameters:
init (str) – node feature initialization method
adjs (torch.Tensor) – adjacency matrix.
nfeat (int, optional) – number of different features. Defaults to 10.
- Raises:
ValueError – If number of features is larger than number of classes
NotImplementedError – initialization method not implemented
- Returns:
node features tensor
- Return type:
torch.Tensor
- ccsd.src.utils.graph_utils.init_flags(graph_list: List[Graph], config: EasyDict, batch_size: int | None = None) Tensor [source]#
Sample initial flags tensor from the training graph set
- Parameters:
graph_list (List[nx.Graph]) – list of graphs
config (EasyDict) – configuration to use
batch_size (Optional[int], optional) – batch size. Defaults to None.
- Returns:
flag tensors
- Return type:
torch.Tensor
- ccsd.src.utils.graph_utils.gen_noise(x: Tensor, flags: Tensor | None = None, sym: bool = True) Tensor [source]#
Generate noise
- Parameters:
x (torch.Tensor) – input tensor
flags (Optional[torch.Tensor], optional) – optional flags. Defaults to None.
sym (bool, optional) – symmetric noise (for adjacency matrix). Defaults to True.
- Returns:
generated noisy tensor
- Return type:
torch.Tensor
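The symmetric case can be sketched as follows: draw Gaussian values for the strict upper triangle, mirror them, and optionally zero the rows and columns of absent nodes. The function name sym_noise_sketch, the zero diagonal, and the seed argument are assumptions made for this illustration; the sketch takes a size n rather than an input tensor.

```python
import random
from typing import List, Optional


def sym_noise_sketch(n: int, flags: Optional[List[int]] = None, seed: int = 0) -> List[List[float]]:
    """Symmetric N x N Gaussian noise with a zero diagonal (assumed convention)."""
    rng = random.Random(seed)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Sample once and mirror so that z[i][j] == z[j][i].
            z[i][j] = z[j][i] = rng.gauss(0.0, 1.0)
    if flags is not None:
        # Remove noise on entries that involve an absent node.
        z = [[z[i][j] * flags[i] * flags[j] for j in range(n)] for i in range(n)]
    return z


noise = sym_noise_sketch(3, flags=[1, 1, 0])
```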
- ccsd.src.utils.graph_utils.quantize(t: Tensor, thr: float = 0.5) Tensor [source]#
Quantize (clip) generated graphs regarding a threshold
- Parameters:
t (torch.Tensor) – original adjacency or rank2 incidence matrix
thr (float, optional) – threshold. Defaults to 0.5.
- Returns:
quantized (clipped) adjacency or rank2 incidence matrix
- Return type:
torch.Tensor
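Thresholding of this kind can be sketched in a few lines on a nested list. The name quantize_sketch is hypothetical, and treating values exactly equal to thr as 1 is an assumption.

```python
from typing import List


def quantize_sketch(t: List[List[float]], thr: float = 0.5) -> List[List[float]]:
    """Map every entry to 0.0 if it is below thr, and to 1.0 otherwise."""
    return [[0.0 if v < thr else 1.0 for v in row] for row in t]


soft_adj = [[0.1, 0.7], [0.7, 0.49]]
hard_adj = quantize_sketch(soft_adj)
```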
- ccsd.src.utils.graph_utils.quantize_mol(adjs: Tensor | ndarray) ndarray [source]#
Quantize generated molecules
- Parameters:
adjs (Union[torch.Tensor, np.ndarray]) – batch of adjacency matrices (e.g. of shape 32 x 9 x 9)
- Returns:
quantized array for molecules
- Return type:
np.ndarray
- ccsd.src.utils.graph_utils.adjs_to_graphs(adjs: Tensor | List[Tensor] | List[ndarray] | List[List[List[int | float]]], is_cuda: bool = False) List[Graph] [source]#
Convert generated adjacency matrices to networkx graphs
- Parameters:
adjs (Union[torch.Tensor, List[torch.Tensor], List[np.ndarray], List[List[List[Union[int, float]]]]]) – Adjacency matrices
is_cuda (bool, optional) – whether the tensors are on a CUDA device. Defaults to False.
- Returns:
list of graph representations
- Return type:
List[nx.Graph]
- ccsd.src.utils.graph_utils.check_sym(adjs: Tensor, print_val: bool = False, epsilon: float = 0.01) None [source]#
Check if the adjacency matrices are symmetric
- Parameters:
adjs (torch.Tensor) – adjacency matrices
print_val (bool, optional) – whether or not we print the symmetry error. Defaults to False.
epsilon (float, optional) – threshold for the sum of the absolute errors. Defaults to 1e-2.
- Raises:
SymmetryError – If the sum of the absolute errors is greater than epsilon
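As an illustration of the check, the sketch below sums the absolute differences between a matrix and its transpose and raises when the total exceeds epsilon. It works on one matrix rather than a batch, and SymmetryErrorSketch and check_sym_sketch are hypothetical stand-ins, not the library's own objects.

```python
from typing import List


class SymmetryErrorSketch(Exception):
    """Stand-in for the custom SymmetryError exception."""


def check_sym_sketch(adj: List[List[float]], epsilon: float = 1e-2) -> None:
    """Raise if the summed absolute asymmetry of adj exceeds epsilon."""
    n = len(adj)
    sym_error = sum(abs(adj[i][j] - adj[j][i]) for i in range(n) for j in range(n))
    if sym_error > epsilon:
        raise SymmetryErrorSketch(f"Matrix is not symmetric (error: {sym_error})")


check_sym_sketch([[0.0, 1.0], [1.0, 0.0]])  # passes silently

try:
    check_sym_sketch([[0.0, 1.0], [0.0, 0.0]])
    asymmetric_detected = False
except SymmetryErrorSketch:
    asymmetric_detected = True
```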
- ccsd.src.utils.graph_utils.pow_tensor(x: Tensor, cnum: int) Tensor [source]#
Create higher order adjacency matrices
- Parameters:
x (torch.Tensor) – input tensor of shape B x N x N
cnum (int) – number of higher order matrices to create (made with powers of x)
- Returns:
output higher order matrices of shape B x cnum x N x N
- Return type:
torch.Tensor
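For a single N x N matrix, stacking successive powers can be sketched as below. The helper names are hypothetical, and the assumption is that the first channel is x itself followed by x squared, and so on up to the power cnum.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pow_matrix_sketch(x: Matrix, cnum: int) -> List[Matrix]:
    """Return [x, x^2, ..., x^cnum] as a list of cnum matrices."""
    powers = [x]
    for _ in range(cnum - 1):
        powers.append(matmul(powers[-1], x))
    return powers


# Path graph on three nodes: 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
channels = pow_matrix_sketch(path, cnum=2)
```

The entry (i, j) of the second channel counts the walks of length two between nodes i and j, which is why such stacks are used as higher-order adjacency information.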
- ccsd.src.utils.graph_utils.pad_adjs(ori_adj: ndarray, node_number: int) ndarray [source]#
Create padded adjacency matrices
- Parameters:
ori_adj (np.ndarray) – original adjacency matrix
node_number (int) – number of desired nodes
- Raises:
ValueError – if the original adjacency matrix is larger than the desired number of nodes (we can’t pad)
- Returns:
Padded adjacency matrix
- Return type:
np.ndarray
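Padding can be pictured as appending zero columns and zero rows until the matrix reaches the requested size. The following is a minimal sketch on nested lists; pad_adj_sketch is a hypothetical name and the error message is illustrative.

```python
from typing import List


def pad_adj_sketch(ori_adj: List[List[int]], node_number: int) -> List[List[int]]:
    """Pad a square matrix with zeros up to node_number x node_number."""
    n = len(ori_adj)
    if n > node_number:
        raise ValueError(f"Cannot pad: matrix has {n} nodes, expected at most {node_number}")
    extra = node_number - n
    padded = [row + [0] * extra for row in ori_adj]        # add zero columns
    padded += [[0] * node_number for _ in range(extra)]    # add zero rows
    return padded


padded_example = pad_adj_sketch([[0, 1], [1, 0]], node_number=3)
```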
- ccsd.src.utils.graph_utils.graphs_to_tensor(graph_list: List[Graph], max_node_num: int) Tensor [source]#
Convert a list of graphs to a tensor
- Parameters:
graph_list (List[nx.Graph]) – List of graphs to convert to adjacency matrices tensors
max_node_num (int) – max number of nodes in all the graphs
- Returns:
Tensor of adjacency matrices
- Return type:
torch.Tensor
- ccsd.src.utils.graph_utils.graphs_to_adj(graph: Graph, max_node_num: int) Tensor [source]#
Convert a graph to an adjacency matrix
- Parameters:
graph (nx.Graph) – graph to convert to an adjacency matrix tensor
max_node_num (int) – maximum number of nodes
- Returns:
Adjacency matrix as a tensor
- Return type:
torch.Tensor
- ccsd.src.utils.graph_utils.node_feature_to_matrix(x: Tensor) Tensor [source]#
Convert a node feature matrix to a node pair feature matrix. The entry (i, j) of the resulting square matrix is the concatenation of the features of node i and the features of node j.
- Parameters:
x (torch.Tensor) – B x N x F (F feature space)
- Returns:
node pair feature matrix of shape B x N x N x 2F
- Return type:
torch.Tensor
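For one graph (no batch dimension), the pairwise concatenation can be sketched with list concatenation. The name node_pairs_sketch is hypothetical, and placing the features of node i before those of node j is an assumption about the ordering.

```python
from typing import List


def node_pairs_sketch(x: List[List[float]]) -> List[List[List[float]]]:
    """Turn N x F node features into N x N x 2F pair features."""
    return [[xi + xj for xj in x] for xi in x]


features = [[1.0], [2.0]]           # N = 2 nodes, F = 1 feature
pairs = node_pairs_sketch(features)  # shape 2 x 2 x 2
```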
- ccsd.src.utils.graph_utils.nxs_to_mols(graphs: List[Graph]) List[Mol] [source]#
Convert a list of nx graphs to a list of rdkit molecules
- Parameters:
graphs (List[nx.Graph]) – list of nx graphs
- Returns:
list of rdkit molecules
- Return type:
List[Chem.Mol]
loader.py: code for loading the model, the optimizer, the scheduler, the loss function, etc
Adapted from Jo, J. & al (2022)
- ccsd.src.utils.loader.load_seed(seed: int) int [source]#
Apply the random seed to all libraries (torch, numpy, random) and make sure that the results are reproducible.
- Parameters:
seed (int) – seed to use
- Returns:
return the seed
- Return type:
int
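A seeding helper of this kind typically looks like the sketch below. The name load_seed_sketch is hypothetical, numpy and torch are imported lazily so that the snippet also runs where they are not installed, and any extra determinism settings the library may apply are not shown.

```python
import random


def load_seed_sketch(seed: int) -> int:
    """Seed the available random number generators and return the seed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch is optional in this sketch
    return seed


load_seed_sketch(42)
first = random.random()
load_seed_sketch(42)
second = random.random()  # same value as first, since the seed was re-applied
```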
- ccsd.src.utils.loader.load_device() str | List[int] [source]#
Check if cuda is available and then return the device(s) to use
- Returns:
device(s) to use
- Return type:
Union[str, List[int]]
- ccsd.src.utils.loader.load_model(params: Dict[str, Any]) Module [source]#
Load the Score Network model from the parameters
- Parameters:
params (dict) – parameters to use
- Raises:
ValueError – raise an error if the model is unknown
- Returns:
Score Network model to use
- Return type:
torch.nn.Module
- ccsd.src.utils.loader.load_model_optimizer(params: Dict[str, Any], config_train: EasyDict, device: str | List[str] | List[int]) Tuple[Module | DataParallel, Optimizer, LRScheduler] [source]#
Return the model, the optimizer and the scheduler according to the parameters
- Parameters:
params (Dict[str, Any]) – model parameters
config_train (EasyDict) – configuration for training
device (Union[str, List[str], List[int]]) – device to use
- Returns:
return the model, the optimizer and the scheduler
- Return type:
Tuple[Union[torch.nn.Module, torch.nn.DataParallel], torch.optim.Optimizer, torch.optim.lr_scheduler.LRScheduler]
- ccsd.src.utils.loader.load_ema(model: Module, decay: float = 0.999) ExponentialMovingAverage [source]#
Create an exponential moving average object for the model’s parameters
- Parameters:
model (torch.nn.Module) – model whose parameters are tracked
decay (float, optional) – decay parameter. Defaults to 0.999.
- Returns:
exponential moving average object for the model’s parameters
- Return type:
ExponentialMovingAverage
- ccsd.src.utils.loader.load_ema_from_ckpt(model: Module, ema_state_dict: Dict[str, Any], decay: float = 0.999) ExponentialMovingAverage [source]#
Load the exponential moving average object for the model’s parameters from a checkpoint
- Parameters:
model (torch.nn.Module) – model whose parameters are tracked
ema_state_dict (Dict[str, Any]) – parameters of the exponential moving average
decay (float, optional) – decay parameter. Defaults to 0.999.
- Returns:
exponential moving average object for the model’s parameters
- Return type:
ExponentialMovingAverage
- ccsd.src.utils.loader.load_data(config: EasyDict, get_list: bool = False, is_cc: bool = False) Tuple[DataLoader, DataLoader] | Tuple[List[Graph], List[Graph]] | Tuple[List[CombinatorialComplex], List[CombinatorialComplex]] [source]#
Return a DataLoader object for training based on the configuration
- Parameters:
config (EasyDict) – configuration for training
get_list (bool, optional) – if True, returns lists of graph or combinatorial complexes instead of dataloaders. Defaults to False.
is_cc (bool, optional) – if True, the dataset is made of combinatorial complexes. Defaults to False.
- Returns:
DataLoader object or list of objects for training
- Return type:
Union[Tuple[DataLoader, DataLoader], Union[Tuple[List[nx.Graph], List[nx.Graph]], Tuple[List[CombinatorialComplex], List[CombinatorialComplex]]]]
- ccsd.src.utils.loader.load_batch(batch: List[Tensor], device: str | List[str], is_cc: bool = False) Tuple[Tensor, Tensor] | Tuple[Tensor, Tensor, Tensor] [source]#
Load the batch on the device
- Parameters:
batch (List[torch.Tensor]) – input batch
device (Union[str, List[str]]) – device to use
is_cc (bool, optional) – if True, the elements of the input batch are combinatorial complexes. Defaults to False.
- Returns:
input batch on the device
- Return type:
Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
- ccsd.src.utils.loader.load_sde(config_sde: EasyDict) SDE [source]#
Load the stochastic differential equation (SDE) from the configuration
- Parameters:
config_sde (EasyDict) – configuration for the SDE
- Raises:
NotImplementedError – raise an error if the SDE is unknown
- Returns:
SDE to use
- Return type:
SDE
- ccsd.src.utils.loader.load_loss_fn(config: EasyDict, is_cc: bool = False) Callable[[Module, Module, Tensor, Tensor], Tuple[Tensor, Tensor]] | Callable[[Module, Module, Module, Tensor, Tensor, Tensor], Tuple[Tensor, Tensor, Tensor]] [source]#
Load the loss function from the configuration
- Parameters:
config (EasyDict) – configuration to use
is_cc (bool, optional) – if True, loss function for combinatorial complexes. Defaults to False.
- Returns:
loss function that returns 2 or 3 losses, for x, adj and rank2 if cc
- Return type:
Union[Callable[[torch.nn.Module, torch.nn.Module, torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]], Callable[[torch.nn.Module, torch.nn.Module, torch.nn.Module, torch.Tensor, torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]]
- ccsd.src.utils.loader.load_sampling_fn(config_train: EasyDict, config_module: EasyDict, config_sample: EasyDict, device: str | List[str], is_cc: bool = False, d_min: int | None = None, d_max: int | None = None, divide_batch: int | None = None) Callable[[Module, Module, Tensor], Tuple[Tensor, Tensor, float]] | Callable[[Module, Module, Module, Tensor], Tuple[Tensor, Tensor, Tensor, float]] [source]#
Load the sampling function from the configuration
- Parameters:
config_train (EasyDict) – configuration for training
config_module (EasyDict) – configuration for the module
config_sample (EasyDict) – configuration for the sampling
device (Union[str, List[str]]) – device to use
is_cc (bool, optional) – if True, we sample combinatorial complexes. Defaults to False.
d_min (Optional[int], optional) – minimum size of rank2 cells (for cc). Defaults to None.
d_max (Optional[int], optional) – maximum size of rank2 cells (for cc). Defaults to None.
divide_batch (Optional[int], optional) – if not None, divide the samples by this number to bypass RAM saturation. Defaults to None.
- Returns:
sampling function
- Return type:
Union[Callable[[torch.nn.Module, torch.nn.Module, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, float]], Callable[[torch.nn.Module, torch.nn.Module, torch.nn.Module, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor, float]]]
- ccsd.src.utils.loader.load_model_params(config: EasyDict, is_cc: bool = False) Tuple[Dict[str, Any], Dict[str, Any]] | Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]] [source]#
Load the model parameters from the configuration
- Parameters:
config (EasyDict) – configuration to use
is_cc (bool, optional) – whether to model using combinatorial complexes. Defaults to False.
- Returns:
parameters for x, adj, and rank-2 cells if cc
- Return type:
Union[Tuple[Dict[str, Any], Dict[str, Any]], Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]]
- ccsd.src.utils.loader.load_ckpt(config: EasyDict, device: str | List[str], ts: str | None = None, return_ckpt: bool = False, is_cc: bool = False) Dict[str, Any] [source]#
Load the checkpoint from the configuration
- Parameters:
config (EasyDict) – configuration to use
device (Union[str, List[str]]) – device to use
ts (Optional[str], optional) – timestamp (checkpoint name). Defaults to None.
return_ckpt (bool, optional) – if True, add the checkpoint in the resulting dictionary (key: “ckpt”). Defaults to False.
is_cc (bool, optional) – whether to model using combinatorial complexes. Defaults to False.
- Returns:
loaded checkpoint parameters and configuration
- Return type:
Dict[str, Any]
- ccsd.src.utils.loader.load_model_from_ckpt(params: Dict[str, Any], state_dict: Dict[str, Any], device: str | List[device] | List[int]) Module | DataParallel [source]#
Load the model from the checkpoint
- Parameters:
params (Dict[str, Any]) – parameters of the model
state_dict (Dict[str, Any]) – state dictionary of the model
device (Union[str, List[str], List[int]]) – device to use
- Returns:
loaded model
- Return type:
Union[torch.nn.Module, torch.nn.DataParallel]
- ccsd.src.utils.loader.load_eval_settings(data: str, orbit_on: bool = True) Tuple[List[str], Dict[str, Callable[[ndarray, ndarray], float]]] [source]#
Load the evaluation settings from the configuration
- Parameters:
data (str) – dataset to use. UNUSED HERE.
orbit_on (bool, optional) – whether to use orbit distance. UNUSED HERE. Defaults to True.
- Returns:
methods and kernels, used for generic graph generation
- Return type:
Tuple[List[str], Dict[str, Callable[[np.ndarray, np.ndarray], float]]]
logger.py: utility functions for logging.
Adapted from Jo, J. & al (2022), almost left untouched.
- class ccsd.src.utils.logger.Logger(filepath: str, mode: str, lock: Any | None = None)[source]#
Bases:
object
Logger class for logging to a file.
- ccsd.src.utils.logger.set_log(config: EasyDict, is_train: bool = True, folder: str = './') Tuple[str, str, str] [source]#
Set the log folder name, log directory and checkpoint directory
- Parameters:
config (EasyDict) – the config object
is_train (bool, optional) – True if we are training, False if we are sampling. Defaults to True.
folder (str, optional) – the general saving folder. Defaults to “./”.
- Returns:
the name of the folder, the log directory and the checkpoint directory of the log
- Return type:
Tuple[str, str, str]
- ccsd.src.utils.logger.check_log(log_folder_name: str, log_name: str) bool [source]#
Check if a log file exists
- Parameters:
log_folder_name (str) – given log folder name
log_name (str) – given log name
- Returns:
True if the log file exists, False otherwise
- Return type:
bool
- ccsd.src.utils.logger.data_log(logger: Logger, config: EasyDict) None [source]#
Log the current configuration
- Parameters:
logger (Logger) – Logger object
config (EasyDict) – current configuration used
- ccsd.src.utils.logger.sde_log(logger: Logger, config_sde: EasyDict, is_cc: bool = False) None [source]#
Log the current SDE configuration
- Parameters:
logger (Logger) – Logger object
config_sde (EasyDict) – sde configuration
is_cc (bool, optional) – True if we are modelling with combinatorial complexes. Defaults to False.
- ccsd.src.utils.logger.model_log(logger: Logger, config: EasyDict, is_cc: bool = False) None [source]#
Log the current model configuration
- Parameters:
logger (Logger) – Logger object
config (EasyDict) – current configuration used
is_cc (bool, optional) – True if we are modelling with combinatorial complexes. Defaults to False.
- ccsd.src.utils.logger.device_log(logger: Logger, device: str | List[int] | List[str] | List[device]) None [source]#
Log the device(s) that will be used as detected by PyTorch
- Parameters:
logger (Logger) – Logger object
device (Union[str, List[int], List[str], List[torch.device]]) – device(s) used as detected
- ccsd.src.utils.logger.start_log(logger: Logger, config: EasyDict) None [source]#
Log initial message with the configuration
- Parameters:
logger (Logger) – Logger object
config (EasyDict) – configuration used
- ccsd.src.utils.logger.train_log(logger: Logger, config: EasyDict) None [source]#
Log configuration used for training
- Parameters:
logger (Logger) – Logger object
config (EasyDict) – configuration used
- ccsd.src.utils.logger.sample_log(logger: Logger, config: EasyDict) None [source]#
Log configuration used for sampling
- Parameters:
logger (Logger) – Logger object
config (EasyDict) – configuration used
- ccsd.src.utils.logger.model_parameters_log(logger: Logger, models: List[Module]) None [source]#
Print the number of parameters of the models and the total number of parameters.
- Parameters:
logger (Logger) – Logger object
models (List[torch.nn.Module]) – list of models.
- ccsd.src.utils.logger.time_log(logger: Logger, time_type: str, elapsed_time: float) None [source]#
Log the time elapsed since the start of the training/sampling
- Parameters:
logger (Logger) – Logger object
time_type (str) – type of time. Must be in [“train”, “sample”].
elapsed_time (float) – elapsed time since the start of the training/sampling
- Raises:
ValueError – raise an error if time_type is not in [“train”, “sample”]
models_utils.py: utility functions related to the models.
- ccsd.src.utils.models_utils.get_model_device(model: Module | DataParallel) str [source]#
Get the device on which the model is loaded (“cpu”, “cuda”, etc.)
- Parameters:
model (Union[torch.nn.Module, torch.nn.DataParallel]) – Pytorch model
- Returns:
device on which the model is loaded
- Return type:
str
- ccsd.src.utils.models_utils.get_nb_parameters(model: Module) int [source]#
Get the number of parameters of the model.
- Parameters:
model (torch.nn.Module) – model.
- Returns:
number of parameters of the model.
- Return type:
int
- ccsd.src.utils.models_utils.get_ones_cache(shape: Sequence[int], device: str) Tensor [source]#
Cached function to get a tensor of ones of the given shape and device.
- Parameters:
shape (Sequence[int]) – shape of the tensor
device (str) – device on which the tensor should be allocated
- Returns:
tensor of ones of the given shape and device
- Return type:
torch.Tensor
- ccsd.src.utils.models_utils.get_ones(shape: Sequence[int], device: str) Tensor [source]#
Function to get a tensor of ones of the given shape and device. Call the cached version of the function and clone it.
- Parameters:
shape (Sequence[int]) – shape of the tensor
device (str) – device on which the tensor should be allocated
- Returns:
tensor of ones of the given shape and device
- Return type:
torch.Tensor
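The cache-then-clone pattern described for get_ones_cache and get_ones can be sketched with functools.lru_cache and nested lists. Whether the library uses lru_cache is an assumption, and ones_cached and ones are hypothetical names. The clone step matters because the cached object is shared between callers.

```python
import copy
from functools import lru_cache
from typing import List, Sequence, Tuple


@lru_cache(maxsize=None)
def ones_cached(shape: Tuple[int, int], device: str) -> List[List[float]]:
    """Build the matrix of ones once per (shape, device) pair."""
    rows, cols = shape
    return [[1.0] * cols for _ in range(rows)]


def ones(shape: Sequence[int], device: str) -> List[List[float]]:
    """Return a private copy so callers cannot corrupt the cached object."""
    return copy.deepcopy(ones_cached(tuple(shape), device))


a = ones((2, 2), "cpu")
a[0][0] = 5.0            # modifying the copy leaves the cache intact
b = ones((2, 2), "cpu")  # served from the cache, still all ones
```

The shape is converted to a tuple before the cached call because lru_cache requires hashable arguments.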
mol_utils.py: utility functions for loading the molecular data, checking the validity of the molecules, converting them, saving them, etc.
Adapted from Jo, J. & al (2022), almost left untouched.
- ccsd.src.utils.mol_utils.is_molecular_config(config: EasyDict) bool [source]#
Checks if the config is for a molecular dataset. Right now, it only checks if the dataset is QM9 or ZINC250k.
- Parameters:
config (EasyDict) – config to check
- Returns:
whether or not the config is for a molecular dataset
- Return type:
bool
- ccsd.src.utils.mol_utils.mols_to_smiles(mols: List[Mol]) List[str] [source]#
Converts a list of RDKit molecules to a list of SMILES strings.
- Parameters:
mols (List[Chem.Mol]) – molecules to convert
- Returns:
SMILES strings
- Return type:
List[str]
- ccsd.src.utils.mol_utils.smiles_to_mols(smiles: List[str]) List[Mol] [source]#
Converts a list of SMILES strings to a list of RDKit molecules.
- Parameters:
smiles (List[str]) – SMILES strings to convert
- Returns:
molecules
- Return type:
List[Chem.Mol]
- ccsd.src.utils.mol_utils.canonicalize_smiles(smiles: List[str]) List[str] [source]#
Canonicalizes a list of SMILES strings.
- Parameters:
smiles (List[str]) – SMILES strings to canonicalize
- Returns:
canonicalized SMILES strings
- Return type:
List[str]
- ccsd.src.utils.mol_utils.load_smiles(dataset: str = 'QM9', folder: str = './') Tuple[List[str], List[str]] [source]#
Loads SMILES strings from a dataset and return train and test splits.
- Parameters:
dataset (str, optional) – smiles dataset to load. Defaults to “QM9”.
folder (str, optional) – folder where the data folder is located. Defaults to “./”.
- Raises:
ValueError – raise an error if dataset is not supported
- Returns:
train and test splits
- Return type:
Tuple[List[str], List[str]]
- ccsd.src.utils.mol_utils.construct_mol(x: ndarray, adj: ndarray, atomic_num_list: List[int]) Mol [source]#
Constructs molecule(s) from the model output.
- Parameters:
x (np.ndarray) – node features
adj (np.ndarray) – adjacency matrix
atomic_num_list (List[int]) – atomic number list
- Returns:
molecule
- Return type:
Chem.Mol
- ccsd.src.utils.mol_utils.gen_mol(x: Tensor, adj: Tensor, dataset: str, largest_connected_comp: bool = True) Tuple[List[Mol], int] [source]#
Generates molecules from the model output and returns valid molecules and the number of molecules that are not corrected.
- Parameters:
x (torch.Tensor) – node features
adj (torch.Tensor) – adjacency matrix
dataset (str) – dataset name
largest_connected_comp (bool, optional) – whether or not we keep only the largest connected component. Defaults to True.
- Returns:
valid molecules and the number of molecules that are not corrected
- Return type:
Tuple[List[Chem.Mol], int]
- ccsd.src.utils.mol_utils.check_valency(mol: Mol | RWMol) Tuple[bool, List[int] | None] [source]#
Checks the valency of the molecule.
- Parameters:
mol (Union[Chem.Mol, Chem.RWMol]) – molecule
- Returns:
whether or not the molecule is valid and the atom id and valency of the atom that is not valid
- Return type:
Tuple[bool, Optional[List[int]]]
- ccsd.src.utils.mol_utils.correct_mol(m: RWMol) Tuple[RWMol, bool] [source]#
Corrects the molecule.
- Parameters:
m (Chem.RWMol) – molecule
- Returns:
corrected molecule and whether or not the molecule is corrected
- Return type:
Tuple[Chem.RWMol, bool]
- ccsd.src.utils.mol_utils.valid_mol_can_with_seg(m: Mol | None, largest_connected_comp: bool = True) Mol | None [source]#
Returns a valid molecule, optionally restricted to its largest connected component.
- Parameters:
m (Optional[Chem.Mol]) – molecule
largest_connected_comp (bool, optional) – whether or not we keep only the largest connected component. Defaults to True.
- Returns:
valid molecule
- Return type:
Optional[Chem.Mol]
- ccsd.src.utils.mol_utils.mols_to_nx(mols: List[Mol]) List[Graph] [source]#
Converts a list of molecules to a list of networkx graphs.
- Parameters:
mols (List[Chem.Mol]) – list of molecules
- Returns:
list of networkx graphs
- Return type:
List[nx.Graph]
plot.py: utility functions for plotting.
- ccsd.src.utils.plot.save_fig(config: EasyDict, save_dir: str | None = None, title: str = 'fig', dpi: int = 300, is_sample: bool = True) None [source]#
Function to adjust the figure and save it.
Adapted from Jo, J. & al (2022)
- Parameters:
config (EasyDict) – configuration file
save_dir (Optional[str], optional) – directory to save the figures. Defaults to None.
title (str, optional) – name of the file. Defaults to “fig”.
dpi (int, optional) – DPI (Dots per Inch). Defaults to 300.
is_sample (bool, optional) – whether the figure is generated during the sample phase. Defaults to True.
- ccsd.src.utils.plot.plot_graphs_list(config: EasyDict, graphs: List[Graph | Dict[str, Any]], title: str = 'title', max_num: int = 16, save_dir: str | None = None, N: int = 0) None [source]#
Plot a list of graphs.
Adapted from Jo, J. & al (2022)
- Parameters:
config (EasyDict) – configuration file
graphs (List[Union[nx.Graph, Dict[str, Any]]]) – graphs to plot
title (str, optional) – title of the plot. Defaults to “title”.
max_num (int, optional) – number of graphs to plot (must be lower than or equal to the batch size). Defaults to 16.
save_dir (Optional[str], optional) – directory to save the figures. Defaults to None.
N (int, optional) – parameter to skip the first graphs of the list. Defaults to 0.
- ccsd.src.utils.plot.save_graph_list(config: EasyDict, log_folder_name: str, exp_name: str, gen_graph_list: List[Graph]) str [source]#
Save the generated graphs in a pickle file.
Adapted from Jo, J. & al (2022)
- Parameters:
config (EasyDict) – configuration file
log_folder_name (str) – name of the folder where the pickle file will be saved
exp_name (str) – name of the experiment
gen_graph_list (List[nx.Graph]) – list of generated graphs
- Returns:
path to the pickle file
- Return type:
str
- ccsd.src.utils.plot.plot_cc_list(config: EasyDict, ccs: List[CombinatorialComplex | Dict[str, Any]], title: str = 'title', max_num: int = 16, save_dir: str | None = None, N: int = 0) None [source]#
Plot a list of combinatorial complexes (represented here as hypergraphs), using hypernetx, for complexes of dimension 2.
- Parameters:
config (EasyDict) – configuration file
ccs (List[Union[CombinatorialComplex, Dict[str, Any]]]) – combinatorial complexes to plot
title (str, optional) – title of the plot. Defaults to “title”.
max_num (int, optional) – number of combinatorial complexes to plot (must be lower than or equal to the batch size). Defaults to 16.
save_dir (Optional[str], optional) – directory to save the figures. Defaults to None.
N (int, optional) – parameter to skip the first combinatorial complexes of the list. Defaults to 0.
- ccsd.src.utils.plot.save_cc_list(config: EasyDict, log_folder_name: str, exp_name: str, gen_cc_list: List[CombinatorialComplex]) str [source]#
Save the generated combinatorial complexes in a pickle file.
- Parameters:
config (EasyDict) – configuration file
log_folder_name (str) – name of the folder where the pickle file will be saved
exp_name (str) – name of the experiment
gen_cc_list (List[CombinatorialComplex]) – list of generated ccs
- Returns:
path to the pickle file
- Return type:
str
- ccsd.src.utils.plot.plot_molecule_list(config: EasyDict, mols: List[Mol], title: str = 'title', max_num: int = 16, save_dir: str | None = None, N: int = 0) None [source]#
Plot a list of molecules, using rdkit.
- Parameters:
config (EasyDict) – configuration file
mols (List[Chem.Mol]) – molecules to plot
title (str, optional) – title of the plot. Defaults to “title”.
max_num (int, optional) – number of molecules to plot (must be lower than or equal to the batch size). Defaults to 16.
save_dir (Optional[str], optional) – directory to save the figures. Defaults to None.
N (int, optional) – parameter to skip the first molecules of the list. Defaults to 0.
- ccsd.src.utils.plot.save_molecule_list(config: EasyDict, log_folder_name: str, exp_name: str, gen_mol_list: List[Mol]) str [source]#
Save the generated molecules in a pickle file.
- Parameters:
config (EasyDict) – configuration file
log_folder_name (str) – name of the folder where the pickle file will be saved
exp_name (str) – name of the experiment
gen_mol_list (List[Chem.Mol]) – list of generated molecules
- Returns:
path to the pickle file
- Return type:
str
- ccsd.src.utils.plot.plot_lc(config: EasyDict, learning_curves: Dict[str, List[float]], f_dir: str = './', filename: str = 'learning_curves', cols: int = 3) None [source]#
Plot the learning curves.
- Parameters:
config (EasyDict) – configuration file
learning_curves (Dict[str, List[float]]) – dictionary containing the learning curves
f_dir (str, optional) – directory to save the figure. Defaults to “./”.
filename (str, optional) – name of the figure. Defaults to “learning_curves”.
cols (int, optional) – number of columns in the figure. Defaults to 3.
- ccsd.src.utils.plot.plot_3D_molecule(molecule: Mol, atomic_radii: Dict[str, float] | None = None, cpk_colors: Dict[str, str] | None = None) Figure [source]#
Creates a 3D plot of the molecule.
- Parameters:
molecule (Chem.Mol) – The RDKit molecule to plot.
atomic_radii (Optional[Dict[str, float]], optional) – Dictionary mapping atomic symbols to atomic radii. Defaults to None.
cpk_colors (Optional[Dict[str, str]], optional) – Dictionary mapping atomic symbols to CPK colors. Defaults to None.
- Returns:
The 3D plotly figure of the molecule.
- Return type:
plotly.graph_objs.Figure
- ccsd.src.utils.plot.rotate_molecule_animation(figure: Figure, filedir: str, filename: str, duration: float = 1.0, frames: int = 30, rotations_per_sec: float = 1.0, overwrite: bool = False, engine: str = 'kaleido') None [source]#
Creates an animated GIF of the molecule rotating.
- Parameters:
figure (plotly.graph_objs.Figure) – The 3D plotly figure of the molecule.
filedir (str) – The directory to save the animated GIF.
filename (str) – The filename of the output animated GIF.
duration (float, optional) – Duration of the animation in seconds. Defaults to 1.0.
frames (int, optional) – Number of frames in the animation. Defaults to 30.
rotations_per_sec (float, optional) – Number of rotations per second. Defaults to 1.0.
overwrite (bool, optional) – If True, overwrite the file if it already exists. Defaults to False.
engine (str, optional) – engine to use for the .write_image plotly method. Defaults to “kaleido”.
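The relationship between `duration`, `frames` and `rotations_per_sec` can be sketched as a list of camera positions on a circle around the molecule. This is an assumption about how such an animation may be built, not the package's implementation; `camera_eyes`, `radius` and `height` are hypothetical names introduced for the example.

```python
import math
from typing import Dict, List


def camera_eyes(
    duration: float = 1.0,
    frames: int = 30,
    rotations_per_sec: float = 1.0,
    radius: float = 2.0,  # hypothetical distance of the camera from the z-axis
    height: float = 0.8,  # hypothetical fixed camera height
) -> List[Dict[str, float]]:
    """Return one camera eye position per frame, rotating around the z-axis."""
    # Total angle swept over the whole animation, in radians.
    total_angle = 2.0 * math.pi * rotations_per_sec * duration
    eyes = []
    for i in range(frames):
        theta = total_angle * i / frames
        eyes.append(
            {"x": radius * math.cos(theta), "y": radius * math.sin(theta), "z": height}
        )
    return eyes
```

Each dictionary has the shape plotly expects for a 3D camera eye, so a frame could be rendered after `figure.update_layout(scene_camera=dict(eye=eye))` and written with `figure.write_image(...)` before the frames are assembled into a GIF.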
- ccsd.src.utils.plot.plot_diffusion_trajectory(gen_obj: List[Tensor], is_molecule: bool = False, dataset: str = 'QM9', largest_connected_comp: bool = True) Figure | Figure [source]#
Return the figure of one generated object as part of a diffusion trajectory.
- Parameters:
gen_obj (List[torch.Tensor]) – The generated object: node features (x), adjacency matrix (adj) and, if combinatorial complexes were generated, the rank-2 incidence matrix (rank2).
is_molecule (bool, optional) – if True, we plot a molecule, otherwise a graph. Defaults to False.
dataset (str, optional) – The dataset from which the object was generated. Defaults to “QM9” (only used if is_molecule=True).
largest_connected_comp (bool, optional) – whether or not we keep only the largest connected component. Defaults to True.
- Returns:
The figure of the generated object.
- Return type:
Union[plotly.graph_objs.Figure, matplotlib.figure.Figure]
- ccsd.src.utils.plot.diffusion_animation(diff_traj: List[List[Tensor]], is_molecule: bool = False, filedir: str = './', filename: str = 'diffusion_animation', fps: int = 25, overwrite: bool = True, engine: str = 'kaleido', duration: float = 4.0, cropped: bool = False) None [source]#
Creates an animated GIF of the diffusion trajectory.
- Parameters:
diff_traj (List[List[torch.Tensor]]) – The diffusion trajectory: a list of generated objects, each made of node features (x), an adjacency matrix (adj) and, if combinatorial complexes were generated, a rank-2 incidence matrix (rank2).
is_molecule (bool, optional) – If True, the frames are molecules, not graphs. Defaults to False.
filedir (str, optional) – The directory to save the animated GIF. Defaults to “./”.
filename (str, optional) – The filename of the output animated GIF. Defaults to “diffusion_animation”.
fps (int, optional) – Number of frames per second. Defaults to 25.
overwrite (bool, optional) – If True, overwrite the file if it already exists. Defaults to True.
engine (str, optional) – engine to use for the .write_image plotly method if plotly is used. Defaults to “kaleido”.
duration (float, optional) – duration of the animation (in seconds). Defaults to 4.0.
cropped (bool, optional) – if True, only the first frames of the trajectory are used. Otherwise, some frames are skipped to build the animation. Defaults to False.
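The interplay of `fps`, `duration` and `cropped` can be sketched as a frame selection step. The sketch assumes the animation needs about `fps * duration` frames and that skipping means sampling the trajectory at evenly spaced steps; `select_frame_indices` is a hypothetical helper, and the actual selection rule in `diffusion_animation` may differ.

```python
from typing import List


def select_frame_indices(
    n_steps: int, fps: int = 25, duration: float = 4.0, cropped: bool = False
) -> List[int]:
    """Pick which trajectory steps become animation frames (assumed rule)."""
    if n_steps <= 0:
        return []
    # Number of frames the GIF can show, capped by the trajectory length.
    n_frames = min(n_steps, max(1, int(round(fps * duration))))
    if cropped:
        # Keep only the first steps of the trajectory.
        return list(range(n_frames))
    if n_frames == 1:
        return [0]
    # Otherwise sample evenly, always keeping the first and last steps.
    return [round(i * (n_steps - 1) / (n_frames - 1)) for i in range(n_frames)]
```

Under these assumptions, a 1000-step trajectory with the default `fps=25` and `duration=4.0` yields 100 frames: the first 100 steps when `cropped=True`, or 100 evenly spaced steps ending at the final sample otherwise.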
print.py: utility functions for printing to the console.
- ccsd.src.utils.print.get_ascii_logo(ascii_logo_path: str = 'ascii_logo.txt') str [source]#
Get the ascii logo.
- Parameters:
ascii_logo_path (str, optional) – path of the logo. Defaults to “ascii_logo.txt”.
- Returns:
the ascii logo.
- Return type:
str
- ccsd.src.utils.print.get_experiment_desc(args: Namespace | Dict[str, Any]) str [source]#
Get the experiment description.
- Parameters:
args (Union[argparse.Namespace, Dict[str, Any]]) – parsed arguments for the experiment.
- Returns:
the experiment description.
- Return type:
str
- ccsd.src.utils.print.initial_print(args: Namespace | Dict[str, Any], ascii_logo_path: str = 'ascii_logo.txt') None [source]#
Print the initial message to the console.
- Parameters:
args (Union[argparse.Namespace, Dict[str, Any]]) – parsed arguments for the experiment.
ascii_logo_path (str, optional) – path of the logo. Defaults to “ascii_logo.txt”.
time_utils.py: utility functions for time operations.