# Theoretical Background

Ensemble Analyzer (EnAn) streamlines the processing of conformational ensembles by integrating geometric filtering, unsupervised clustering, and advanced thermochemical corrections. This section details the physical and mathematical models implemented in the software.

## 1. Conformational Pruning Strategy

A critical bottleneck in conformational analysis is the redundancy of structures generated by stochastic search algorithms. EnAn implements a computationally efficient **dual filter** that removes duplicates without requiring an expensive RMSD (Root-Mean-Square Deviation) superposition for every pair.

### 1.1 Energy Filtering (The "Window")

The first tier of filtering acts on the relative electronic energy ($\Delta E$). Given the global minimum found so far ($E_\text{min}$), any conformer $i$ is discarded if:

$$
(E_i - E_\text{min}) > \text{thrG}_\text{max}
$$

This ensures that only chemically relevant conformers (those accessible at thermal equilibrium) are processed.

### 1.2 Geometric Filtering: Rotational Constants

To identify duplicates, EnAn uses **rotational constants** as geometric descriptors. Rotational constants depend on the principal moments of inertia ($I$) and are invariant to translation and rotation of the molecular frame.

Two conformers $i$ (check) and $j$ (reference) are considered **identical** if they satisfy *both* conditions:

1. **Energy Equivalence**:
   $$|E_i - E_j| < \text{thrG}$$
2. **Geometric Equivalence**:
   $$|B_i - B_j| < \text{thrB}$$

where $B$ is the scalar norm of the rotational constant vector. This replaces the $O(N^2)$ pairwise RMSD alignments with inexpensive scalar comparisons.

> **Note**: An invariant RMSD based on EDM eigenvalues is calculated for logged duplicates.

---

## 2. Invariant Clustering Algorithm

For structural classification and ensemble reduction, EnAn moves away from Cartesian coordinates to avoid alignment issues. It employs the **Spectrum of the Euclidean Distance Matrix (EDM)**.

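
As a minimal illustration of why this descriptor sidesteps alignment, the sketch below builds the matrix of squared interatomic distances for a hypothetical four-atom geometry and checks that its sorted eigenvalues are unchanged after a rotation, a translation, and a renumbering of the atoms. The function name `edm_spectrum`, the coordinates, and the use of NumPy are assumptions made for illustration and are not taken from the EnAn source.

```python
import numpy as np


def edm_spectrum(coords: np.ndarray) -> np.ndarray:
    """Sorted eigenvalues of the squared-distance matrix of an (N, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]  # (N, N, 3) displacement vectors
    edm = np.einsum("ijk,ijk->ij", diff, diff)      # squared distances D_ij
    return np.sort(np.linalg.eigvalsh(edm))         # D is symmetric, so eigvalsh applies


# Hypothetical 4-atom geometry (arbitrary units), used only for illustration.
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.1, 0.0, 0.0],
    [1.6, 1.2, 0.0],
    [2.9, 1.3, 0.7],
])

# Rotate by 40 degrees about z, translate, then renumber the atoms.
theta = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = (coords @ rot_z.T + np.array([3.0, -2.0, 5.0]))[[2, 0, 3, 1]]

same = np.allclose(edm_spectrum(coords), edm_spectrum(moved))
print(same)
```

Because the distance matrix has a zero diagonal, its eigenvalues sum to zero, which is a convenient sanity check on any implementation.
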
### 2.1 The Euclidean Distance Matrix

For a molecule with $N$ atoms, the EDM ($\mathbf{D}$) is an $N \times N$ symmetric matrix in which each element is the squared Euclidean distance between atom $i$ and atom $j$:

$$
D_{ij} = ||\mathbf{r}_i - \mathbf{r}_j||^2
$$

### 2.2 Eigenvalues as Descriptors

The set of eigenvalues $\{\lambda_1, \lambda_2, \dots, \lambda_N\}$ of the matrix $\mathbf{D}$ is invariant to:

* Translation
* Rotation
* Atom numbering (permutations, provided the eigenvalues are sorted)

This eigenvalue vector serves as the input feature vector for **Principal Component Analysis (PCA)**. The resulting Principal Components (PCs) are then fed into a **K-Means** clustering algorithm. The optimal number of clusters ($k$) is determined automatically using the **Silhouette Score**.

---

## 3. Thermochemistry: The qRRHO Approximation

Standard Rigid-Rotor Harmonic Oscillator (RRHO) approximations often fail for flexible molecules with low-frequency vibrational modes ($< 100 \text{ cm}^{-1}$): the harmonic-oscillator entropy diverges as the frequency approaches zero, so these modes introduce large errors in the computed entropy.

EnAn implements the **Quasi-Rigid Rotor Harmonic Oscillator (qRRHO)** approximation described by [Grimme (2012)](https://doi.org/10.1002/chem.201200497). This method interpolates between the free-rotor and harmonic-oscillator limits using a damping function $w(\nu)$:

$$
w(\nu) = \frac{1}{1 + (\omega_0 / \nu)^\alpha}
$$

where $\omega_0$ is the cut-off frequency (default: $100 \text{ cm}^{-1}$) and $\alpha$ is the damping power (default: 4).

Thermodynamic properties are calculated assuming a non-interacting particle ensemble (ideal gas approximation). The total internal energy ($U$) and entropy ($S$) are obtained by summing the contributions from translational, rotational, electronic, and vibrational degrees of freedom.

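
The damping function $w(\nu)$ introduced above can be evaluated directly. The following is a minimal sketch using the default cut-off and power quoted in the text; the function name `damping` and the choice to reject non-positive frequencies are illustrative assumptions, not EnAn's actual interface.

```python
def damping(nu: float, omega0: float = 100.0, alpha: float = 4.0) -> float:
    """w(nu) = 1 / (1 + (omega0 / nu)**alpha), with nu and omega0 in cm^-1."""
    if nu <= 0.0:
        # Assumption: imaginary or zero frequencies are handled elsewhere.
        raise ValueError("expects a real, positive frequency in cm^-1")
    return 1.0 / (1.0 + (omega0 / nu) ** alpha)


for nu in (10.0, 50.0, 100.0, 300.0, 1000.0):
    print(f"{nu:7.1f} cm^-1 -> w = {damping(nu):.4f}")
```

At $\nu = \omega_0$ the weight is exactly 0.5. Modes well below the cut-off are treated almost entirely as rotors, and modes well above it almost entirely as harmonic oscillators.
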
### 3.1 Internal Energy ($U$)

The thermal contribution to the internal energy is calculated as:

$$
U_\text{tot} = U_\text{trans} + U_\text{rot} + U_\text{vib}
$$

* **Translational Energy ($U_\text{trans}$)**: Derived from the equipartition theorem for 3 degrees of freedom:
    $$U_\text{trans} = \frac{3}{2} k_B T$$
* **Rotational Energy ($U_\text{rot}$)**: Depends on the molecular geometry (linearity):
    * **Non-linear**: $\frac{3}{2} k_B T$ (3 rotational degrees of freedom).
    * **Linear**: $k_B T$ (2 rotational degrees of freedom).
* **Vibrational Energy ($U_\text{vib}$)**: EnAn employs the **qRRHO** (Quasi-Rigid Rotor Harmonic Oscillator) scheme to correct the unphysical behavior of low-frequency modes. The energy is interpolated between a Rigid-Rotor-like term (for low frequencies) and the Harmonic Oscillator term (for high frequencies):
    $$
    U_\text{vib} = \sum_{i}^{N_\text{modes}} \left( w(\nu_i) U_\text{HO}(\nu_i) + [1 - w(\nu_i)] \frac{1}{2} k_B T \right)
    $$
    Where:
    - $U_\text{HO}(\nu) = \frac{h \nu c}{e^{\frac{h \nu c}{k_B T}} - 1}$ is the standard harmonic oscillator thermal energy.
    - $w(\nu)$ is the damping function.

### 3.2 Entropy ($S$)

Total entropy is defined as $S_\text{tot} = S_\text{trans} + S_\text{rot} + S_\text{el} + S_\text{vib}$.

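
The vibrational entropy term uses the same damping-weighted interpolation as $U_\text{vib}$ in Section 3.1. As a minimal sketch of that pattern for the energy case, the code below evaluates $U_\text{tot}$ for a list of wavenumbers. All function names, the example frequencies, and the per-molecule SI units are assumptions chosen for illustration; they are not taken from the EnAn source.

```python
import math

# Exact SI values of the constants; the speed of light is given in cm/s
# so that wavenumbers in cm^-1 can be used directly in h * nu * c.
H = 6.62607015e-34    # Planck constant, J s
KB = 1.380649e-23     # Boltzmann constant, J/K
C_CM = 2.99792458e10  # speed of light, cm/s
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def damping(nu, omega0=100.0, alpha=4.0):
    """Damping function w(nu) with nu and omega0 in cm^-1."""
    return 1.0 / (1.0 + (omega0 / nu) ** alpha)


def u_ho(nu, temp):
    """Harmonic-oscillator thermal energy (no zero-point term), J per molecule."""
    e = H * nu * C_CM
    return e / math.expm1(e / (KB * temp))


def u_vib(freqs, temp, omega0=100.0, alpha=4.0):
    """Damped sum over modes: w * U_HO + (1 - w) * kB*T/2."""
    total = 0.0
    for nu in freqs:
        w = damping(nu, omega0, alpha)
        total += w * u_ho(nu, temp) + (1.0 - w) * 0.5 * KB * temp
    return total


def u_tot(freqs, temp, linear=False):
    """Translational + rotational + vibrational thermal energy, J per molecule."""
    u_trans = 1.5 * KB * temp
    u_rot = (1.0 if linear else 1.5) * KB * temp
    return u_trans + u_rot + u_vib(freqs, temp)


freqs = [25.0, 180.0, 1650.0, 3000.0]  # hypothetical wavenumbers, cm^-1
u = u_tot(freqs, temp=298.15)
print(f"U_tot = {u * N_A / 1000.0:.2f} kJ/mol")
```

In the low-frequency limit each mode contributes $\frac{1}{2} k_B T$ instead of the harmonic value of nearly $k_B T$, which is the intended effect of the damping.
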
* **Translational Entropy ($S_\text{trans}$)**: Calculated using the **Sackur-Tetrode equation**:
    $$S_\text{trans} = k_B \left( \ln \left( \left( \frac{2 \pi MW k_B T}{N_A h^2} \right)^{3/2} \frac{k_B T}{P} \right) + \frac{5}{2} \right)$$
    where $MW$ is the molar mass and $P$ is the pressure.
* **Rotational Entropy ($S_\text{rot}$)**: Calculated within the **Rigid Rotor (RR)** approximation:
    * **Non-linear**:
        $$S_\text{rot} = k_B \left( \ln \left( \frac{\sqrt{\pi} T^{3/2}}{\sigma \sqrt{\theta_A \theta_B \theta_C}} \right) + \frac{3}{2} \right)$$
    * **Linear**:
        $$S_\text{rot} = k_B \left( \ln \left( \frac{T}{\sigma \theta} \right) + 1 \right)$$

    Here $\theta_x = \frac{h c B_x}{k_B}$ are the characteristic rotational temperatures derived from the rotational constants $B_x$, and $\sigma$ is the symmetry number. EnAn sets $\sigma = 1$ for all molecules, i.e. every structure is treated as belonging to the $C_1$ point group.
* **Electronic Entropy ($S_\text{el}$)**: Determined solely by the spin multiplicity ($m$):
    $$S_\text{el} = k_B \ln(m)$$
* **Vibrational Entropy ($S_\text{vib}$)**: Following the method by Grimme, the entropy is interpolated between the Harmonic Oscillator limit ($S_\text{HO}$) and a free-rotor entropy term ($S_\text{rot}^\text{Grimme}$) using the damping function $w(\nu)$:
    $$
    S_\text{vib} = \sum_{i}^{N_\text{modes}} \left( w(\nu_i) S_\text{HO}(\nu_i) + [1 - w(\nu_i)] S_\text{rot}^\text{Grimme}(\nu_i) \right)
    $$
    The rotor contribution $S_\text{rot}^\text{Grimme}$ considers the moment of inertia associated with the vibrational mode ($\mu$) and an average molecular moment of inertia ($B_\text{av}$), which are combined into the effective moment of inertia $\mu B_\text{av} / (\mu + B_\text{av})$:
    $$S_\text{rot}^\text{Grimme} = \frac{1}{2} k_B \left( 1 + \ln \left( \frac{8 \pi^3 \mu B_\text{av}}{\mu + B_\text{av}} \frac{k_B T}{h^2} \right) \right)$$
    In Grimme's formulation, $\mu = h / (8 \pi^2 c \, \nu)$ for a mode of wavenumber $\nu$.

### 3.3 Enthalpy ($H$) and Gibbs Free Energy ($G$)

The final thermodynamic potentials are assembled as follows:

$$
H = E_\text{SCF} + \text{ZPVE} + U_\text{tot} + k_B T
$$

$$
G = H - T S_\text{tot}
$$

Where:

* $E_\text{SCF}$ is the electronic energy from the QM calculation.
* $\text{ZPVE}$ (Zero Point Vibrational Energy) is the sum of $\frac{1}{2} h \nu c$ over all real frequencies.
* $k_B T$ is the $PV$ work term for an ideal gas ($H = U + PV$).

---

## 4. Boltzmann Weighting and Spectra

Final spectral properties are computed as a Boltzmann-weighted average over the individual active conformers.

### 4.1 Population Calculation

The probability $p_i$ of finding the system in conformer $i$ at temperature $T$ is derived from the relative Gibbs Free Energy ($\Delta G_i$):

$$
p_i = \frac{e^{-\Delta G_i / k_B T}}{\sum_j e^{-\Delta G_j / k_B T}}
$$

### 4.2 Spectral Convolution

EnAn distinguishes between vibrational and electronic spectra when choosing the line-shape function $f(\nu, \nu_0, \text{FWHM})$:

1. **Vibrational Spectra (IR, VCD)**: Use a **Lorentzian** line-shape function to model lifetime broadening:
   $$
   f_\text{Lorentz}(x; x_0, \text{FWHM}) = \frac{\text{FWHM}^2}{\text{FWHM}^2 + 4(x - x_0)^2}
   $$
2. **Electronic Spectra (UV-Vis, ECD)**: Use a **Gaussian** line-shape function to model inhomogeneous broadening:
   $$
   f_\text{Gauss}(x; x_0, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-x_0}{\sigma}\right)^2 }
   $$
   where the standard deviation is related to the full width at half maximum by $\sigma = \text{FWHM} / (2\sqrt{2 \ln 2})$.

The final intensity $I_\text{total}(\nu)$ is the weighted sum:

$$
I_\text{total}(\nu) = \sum_i^{N_\text{conf}} p_i \sum_k^{N_\text{modes}} I_{i,k} \cdot f(\nu, \nu_{i,k}, \text{FWHM})
$$

where $I_{i,k}$ and $\nu_{i,k}$ are the intensity and position of the $k$-th transition of conformer $i$.
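
The population and convolution steps can be combined in a few lines. The sketch below is illustrative only: the function names, the example energies and peak lists, and the use of kcal/mol (which requires the gas constant $R$ in place of $k_B$) are assumptions rather than EnAn's actual interface.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K); plays the role of k_B per mole


def boltzmann_populations(rel_g, temp=298.15):
    """Populations from relative Gibbs energies given in kcal/mol."""
    g_min = min(rel_g)  # shifting by the minimum avoids overflow in exp()
    weights = [math.exp(-(g - g_min) / (R_KCAL * temp)) for g in rel_g]
    z = sum(weights)
    return [w / z for w in weights]


def lorentzian(x, x0, fwhm):
    """Peak-normalised Lorentzian: 1 at x0 and 0.5 at x0 +/- fwhm / 2."""
    return fwhm**2 / (fwhm**2 + 4.0 * (x - x0) ** 2)


def gaussian(x, x0, fwhm):
    """Area-normalised Gaussian, with sigma derived from the FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - x0) / sigma) ** 2) / norm


def spectrum(grid, conformers, populations, lineshape, fwhm):
    """Weighted sum over conformers; each conformer is a list of (position, intensity)."""
    result = []
    for x in grid:
        total = 0.0
        for p, peaks in zip(populations, conformers):
            total += p * sum(i * lineshape(x, x0, fwhm) for x0, i in peaks)
        result.append(total)
    return result


rel_g = [0.0, 0.5, 2.0]  # hypothetical relative Gibbs energies, kcal/mol
peaks = [                # hypothetical (wavenumber, intensity) pairs per conformer
    [(1700.0, 1.0), (1450.0, 0.4)],
    [(1715.0, 0.9), (1440.0, 0.5)],
    [(1690.0, 1.2), (1460.0, 0.3)],
]
pops = boltzmann_populations(rel_g)
grid = [1400.0 + 2.0 * n for n in range(176)]  # 1400 to 1750 cm^-1
ir = spectrum(grid, peaks, pops, lorentzian, fwhm=10.0)
print([round(p, 3) for p in pops])
```

Passing `gaussian` instead of `lorentzian` gives the electronic-spectrum variant with the same FWHM argument. Note that the two line shapes are normalised differently (peak height versus area), so their absolute intensities are not directly comparable.
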