Protocol Configuration Guide

The protocol file is the core configuration for Ensemble Analyzer. It is structured as a JSON dictionary where each key (e.g., "0", "1") represents a sequential computational step.
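
As a minimal sketch of this layout, the snippet below parses a two-step protocol and walks the steps in numeric order. The step contents are illustrative, and `ordered_steps` is a hypothetical helper, not part of Ensemble Analyzer. Sorting with `key=int` matters once a protocol has ten or more steps, because the string "10" sorts before "2".

```python
import json

# Illustrative two-step protocol; the values are examples, not recommendations.
protocol_text = """
{
  "0": {"functional": "xtb", "opt": true},
  "1": {"functional": "B97-3c", "opt": true, "freq": true}
}
"""

protocol = json.loads(protocol_text)


def ordered_steps(protocol):
    """Hypothetical helper: return (index, settings) pairs in numeric key order."""
    return [(int(key), protocol[key]) for key in sorted(protocol, key=int)]


for index, settings in ordered_steps(protocol):
    print(index, settings["functional"])
```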

Core Computational Keywords

These parameters define the level of theory and the type of calculation to be performed by the QM engine.

  • functional (str, Required) The DFT functional or semi-empirical method to use (e.g., "B97-3c", "wB97X-D4", "xtb").

  • basis (str) The basis set definition. Composite methods (e.g., r2SCAN-3c) already define their own basis set, so this key can be omitted for them.

  • opt (bool) If true, performs a geometry optimization for the current step.

  • freq (bool) If true, performs a frequency calculation. This enables vibrational analysis and qRRHO thermochemical corrections.

  • charge (int) The total charge of the system (default: 0).

  • mult (int) The spin multiplicity of the system (default: 1).

  • solvent (dict) Configuration for implicit solvation models.

    • solvent (str): Name of the solvent (e.g., "water", "chcl3").

    • smd (bool): If true, uses the SMD solvation model; otherwise, uses CPCM.
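
The keywords above can be combined into a single step, sketched here as a Python dictionary that is then dumped to JSON. Note that the inner key of the solvent dictionary is also called "solvent". `normalize_step` and `DEFAULTS` are hypothetical names used only to illustrate the required key and the documented defaults; the program's own validation may differ.

```python
import json

# Defaults stated in this guide: charge 0, multiplicity 1.
DEFAULTS = {"charge": 0, "mult": 1}


def normalize_step(step):
    """Hypothetical helper: check the required key and fill documented defaults."""
    if "functional" not in step:
        raise ValueError("every step needs a 'functional' entry")
    return {**DEFAULTS, **step}


step = {
    "functional": "r2SCAN-3c",  # composite method, so "basis" is left out
    "opt": True,
    "freq": True,
    "solvent": {"solvent": "water", "smd": True},  # SMD instead of CPCM
}

normalized = normalize_step(step)
print(json.dumps({"0": normalized}, indent=2))
```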

Advanced Calculation Control

  • add_input (str) Additional keywords or blocks passed verbatim to the input file of the external QM engine (ORCA/Gaussian). No sanity check is performed on this string.

  • read_orbitals (int) The index of a previous step from which to read the orbitals as the initial guess (e.g., 0 for step "0"). Useful when SCF convergence is difficult.

  • skip_opt_fail (bool) If true, conformers that fail to converge during optimization are automatically deactivated instead of crashing the workflow.

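  • monitor_internals (list) A list of atom-index lists, each defining one internal coordinate to track in the log output. Two indices define a bond length; three define an angle.

    • Example: [[0, 1], [2, 3, 4]] monitors the bond length between atoms 0 and 1 and the angle formed by atoms 2, 3, and 4.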

  • block_on_retention_rate (bool) If true, the program halts execution if the fraction of surviving conformers drops below a safety threshold (default: 20%), preventing total loss of the ensemble.
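
To make monitor_internals and block_on_retention_rate more concrete, here is a small sketch of what they describe. It assumes zero-based atom indices, that the middle index of a triple is the angle vertex, and that the retention rate is the ratio of conformers after a step to conformers before it. None of these details are confirmed by this guide, and `measure_internal` and `retention_ok` are hypothetical names.

```python
import math


def measure_internal(coords, indices):
    """Return a bond length (2 indices) or an angle in degrees (3 indices)."""
    points = [coords[i] for i in indices]
    if len(points) == 2:
        return math.dist(points[0], points[1])
    if len(points) == 3:
        a, b, c = points  # assumption: b is the vertex of the angle
        u = [x - y for x, y in zip(a, b)]
        v = [x - y for x, y in zip(c, b)]
        cos_angle = sum(p * q for p, q in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        return math.degrees(math.acos(cos_angle))
    raise ValueError("expected 2 or 3 atom indices")


def retention_ok(n_before, n_after, threshold=0.20):
    """Assumed check: True while the surviving fraction is at least the threshold."""
    return n_after / n_before >= threshold


# Toy geometry: three atoms forming a right angle at atom 0.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
for internal in [[0, 1], [1, 0, 2]]:
    print(internal, round(measure_internal(coords, internal), 3))
```

With this check, a step that keeps 5 of 50 conformers (10%) would trigger the halt, while one that keeps 25 of 50 would not.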

Refinement & Pruning Settings

  • cluster (int | bool) Controls the unsupervised clustering of conformers.

    • If an integer > 1: Performs K-Means clustering to reduce the ensemble to that exact number of structures.

    • If true: Performs clustering with automatic detection of the optimal number of clusters (\(k\)).

  • no_prune (bool) If true, completely disables energy and geometric pruning for this specific step.

  • thrG / thrB (float) Overrides the default thresholds for identifying duplicates:

    • thrG: Maximum energy difference (\(\Delta E\)) [kcal/mol].

    • thrB: Maximum difference in Rotational Constants (\(\Delta B\)) [cm⁻¹].

  • thrGMAX (float) Overrides the maximum energy window cut-off. Conformers with \(\Delta E > \text{thrGMAX}\) (relative to the global minimum) are discarded.
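
The interplay of the three thresholds can be sketched as follows. This is one plausible reading, not the program's actual implementation. It assumes that a conformer is a duplicate only when both \(\Delta E\) and \(\Delta B\) fall below their thresholds, that a single scalar rotational constant is compared per conformer, and that the lowest-energy member of a duplicate pair is kept. The threshold values and the `prune` helper are illustrative.

```python
def prune(conformers, thrG, thrB, thrGMAX):
    """Sketch: apply the energy window, then drop duplicates of kept conformers."""
    ordered = sorted(conformers, key=lambda c: c["energy"])
    if not ordered:
        return []
    e_min = ordered[0]["energy"]  # global minimum of this step
    kept = []
    for conf in ordered:
        # Energy window: discard if too far above the global minimum.
        if conf["energy"] - e_min > thrGMAX:
            continue
        # Duplicate test (assumed): both differences must be below threshold.
        is_duplicate = any(
            abs(conf["energy"] - k["energy"]) < thrG
            and abs(conf["B"] - k["B"]) < thrB
            for k in kept
        )
        if not is_duplicate:
            kept.append(conf)
    return kept


# Energies in kcal/mol, rotational constants in cm^-1 (made-up numbers).
ensemble = [
    {"name": "a", "energy": 0.00, "B": 0.1000},
    {"name": "b", "energy": 0.05, "B": 0.1001},  # near-copy of "a"
    {"name": "c", "energy": 1.00, "B": 0.1200},
    {"name": "d", "energy": 10.0, "B": 0.0900},  # outside the energy window
]

survivors = prune(ensemble, thrG=0.1, thrB=0.001, thrGMAX=5.0)
print([c["name"] for c in survivors])
```

Setting no_prune to true would correspond to skipping this function entirely for the step.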