To meet your requirements for building QSAR equations, QSAR+ enables you to work with descriptors in a variety of ways.
You manage descriptors by using several control panels (described later in this chapter). Descriptor management includes activities such as identifying the descriptors with which you want to work, displaying and selecting only descriptors in a specific class, specifying preferences for the various descriptors, and adding descriptors to the study table.
When QSAR+ is installed, you can access a descriptor database that contains the equations used to calculate molecular descriptors. You can edit this database to modify the supplied descriptors, create new descriptors, specify which descriptors should be considered default descriptors, create new descriptor categories, and control the format in which the results of descriptor calculations are displayed in the study table.
In Start-up and Configuration:
You can see the descriptors in each set by selecting Descriptors/Databases from the study table menu bar. This opens the Descriptor Database control panel, which contains a list of descriptors.
The message at the top of the Descriptor Database control panel identifies the current default set.
|EPenalty||Conformational energy penalty.|
|LowEne||Lowest energy conformer.|
|Charge||Sum of partial charges.|
|Fcharge||Sum of formal charges.|
|Apol||Sum of atomic polarizabilities.|
|HOMO||Highest occupied molecular orbital.|
|LUMO||Lowest unoccupied molecular orbital.|
|InfoContent||Graph-theoretical Information-content indices.|
|Molecular shape analysis (MSA)|
|Fo||Common overlap volume (ratio).|
|NCOSV||Non-common overlap steric volume.|
|ShapeRMS||Rms to shape reference.|
|COSV||Common overlap steric volume.|
|SRVol||Shape reference volume.|
|LUMO_MOPAC||Lowest unoccupied molecular orbital from MOPAC.|
|DIPOLE_MOPAC||Dipole moment from MOPAC.|
|HF_MOPAC||Heat of formation from MOPAC.|
|HOMO_MOPAC||Highest occupied molecular orbital from MOPAC.|
|Receptor_energies||Molecule-receptor interaction energies.|
|Receptor_RSA||Molecule-receptor points interaction energies.|
|RadOfGyration||Radius of gyration.|
|Jurs descriptors||Jurs charged partial surface areas descriptors.|
|Shadow indices||Surface area projections descriptors.|
|Area||Molecular surface area.|
|PMI||Principal moment of inertia.|
|Rotlbonds||Number of rotatable bonds.|
|Hbond acceptor||Number of hydrogen-bond acceptor groups.|
|Hbond donor||Number of hydrogen-bond donor groups.|
|Chiral centers||Count of the number of chiral centers (R or S) present in a molecule.|
|AlogP||Ghose and Crippen logP.|
|AlogP98||Log of the partition coefficient, atom-type value.|
|Fh2o||Desolvation free energy for water.|
|Foct||Desolvation free energy for octanol.|
|Hf||Heat of formation.|
|MolRef||Ghose and Crippen molar refractivity.|
|Kappa indices||Molecular shape kappa indices.|
|PHI||Molecular flexibility index.|
|Chi indices||Kier & Hall chi connectivity indices.|
|log Z||Logarithm of Hosoya index.|
The Descriptors control panel contains a list of the descriptors in the current descriptor database. To select a descriptor, click its name in the first column. For example, clicking EPenalty highlights that row of the descriptor table, which means the descriptor will be added to the study table (see the next section for details). To unselect a descriptor, click any part of the table other than the first column, so that the highlight is turned off.
The Descriptors control panel contains controls that allow you to select groups of descriptors. The left popup controls whether the action that occurs when you click the associated action button is to Select, Deselect, or Display the selected descriptors. For example, if you want to select all the conformational descriptors, you can do so by choosing Select in the left popup and then setting the Descriptors in Family popup (far right) to Conformational. Now when you click the (unlabeled) action button (below ADD), the conformational descriptors are selected. To deselect them, change the Display popup to Deselect, then click the action button again.
If you find the display of all the descriptors at the same time distracting, you can display just the selected descriptors by setting the popup to Display.
Another way to select a subset of descriptors is to use the All/Default popup. To see the effect of this control, set the Descriptors in Family popup to Electronic, select Default from the All/Default popup, then click the action button.
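The selection logic described above can be pictured with a small model. The following is only an illustrative sketch in Python, not part of QSAR+: the `Descriptor` class, the `apply_action` function, and the choice of which descriptors are flagged as default are all hypothetical, and the family assignments are assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    name: str
    family: str
    default: bool = False   # hypothetical default flag
    selected: bool = False
    visible: bool = True


def apply_action(table, action, family, scope="All"):
    """Mimic the action button: Select, Deselect, or Display."""
    if action == "Display":
        # Show only the rows that are currently selected.
        for d in table:
            d.visible = d.selected
        return table
    if action not in ("Select", "Deselect"):
        raise ValueError(f"unknown action: {action}")
    for d in table:
        in_family = d.family == family
        in_scope = scope == "All" or d.default
        if in_family and in_scope:
            d.selected = action == "Select"
    return table


# Assumed example rows; the default flags are invented for illustration.
table = [
    Descriptor("EPenalty", "Conformational"),
    Descriptor("LowEne", "Conformational"),
    Descriptor("Charge", "Electronic", default=True),
    Descriptor("HOMO", "Electronic"),
]
apply_action(table, "Select", "Electronic", scope="Default")
selected_names = [d.name for d in table if d.selected]
```

In this sketch, choosing Default restricts the action to rows whose default flag is set, and Display hides every row that is not selected.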
When the Descriptors in Family popup is set to Electronic, for example, the Preferences button is labelled Electronic. When you click this newly active pushbutton, a control panel appears, which allows you to customize certain aspects of the way the electronic descriptors are calculated. For example, if you decide that only the total dipole moment is needed, uncheck the XYZ Components checkbox. Now only the total dipole moment (calculated from atomic partial charges) is added to the study table.
Preferences for the calculation of other types of descriptors can be set in the same way.
If Edge-based is checked, the four buttons below apply to information indices based on the edge adjacency and edge distance matrices, specifically,
For a detailed explanation of this descriptor, see Chapter 5, Theory: QSAR+ descriptors.
Receptor descriptor preferences
Setting the family popup in the Descriptors control panel to Receptor and clicking the Receptor pushbutton opens two control panels: Receptor-Model Interactions and RSA Preferences (receptor surface analysis).
Open the Spatial Descriptors control panel by setting the family popup in the Descriptors control panel to Spatial and then selecting the Spatial button.
Jurs charged partial surface area parameters
The definition of polar atoms and the probe radius for the solvent-accessible surface area calculation can also be customized with the Spatial Descriptors control panel.
|Checkbox||Toggles calculation of descriptors|
|Solvent Accessible Surface Area||SAS area descriptor Jurs-SASA|
|Partial Charged Surface Areas||Jurs-PPSA-1, Jurs-PNSA-1, Jurs-DPSA-1|
|Total Charge Weighted Surface Areas||Jurs-PPSA-2, Jurs-PNSA-2 and Jurs-DPSA-2|
|Atomic Charge Weighted Surface Areas||Jurs-PPSA-3, Jurs-PNSA-3, and Jurs-DPSA-3|
|Fractional Charged Partial Surface Areas||Jurs-FPSA-1, Jurs-FPSA-2, Jurs-FPSA-3, Jurs-FNSA-1, Jurs-FNSA-2, Jurs-FNSA-3|
|Surface Weighted Charged Partial Surface Areas||Jurs-WPSA-1, Jurs-WPSA-2, Jurs-WPSA-3, Jurs-WNSA-1, Jurs-WNSA-2, Jurs-WNSA-3|
|Relative Positive and Negative Charges||Jurs-RPCG, Jurs-RNCG, Jurs-RPCS, Jurs-RNCS|
|Relative Polar and Apolar Surface Areas||Jurs-TPSA, Jurs-TASA, Jurs-RPSA, and Jurs-RASA|
For an explanation of the shadow indices see the Shadow indices section on page 97 under Theory. The correlation between the Shadow Parameters checkboxes and the descriptor names is:
|Checkbox||Toggles calculation of descriptors|
|Areas of Molecular Shadows||Shadow-XY, Shadow-XZ, and Shadow-YZ|
|Fractional Areas of Molecular Shadows||Shadow-XYfr, Shadow-XZfr, and |
|Extents of Molecular Shadows||Shadow-nu, Shadow-Xleng, Shadow-Yleng, and Shadow-Zleng|
Defining hydrogen-bond acceptors and donors and rotatable bonds
The definitions of hydrogen-bond acceptors, hydrogen-bond donors, and rotatable bonds can be customized with the Structural Descriptors control panel.
Thermodynamic descriptors preferences
The 115 atom types defined in the calculation of AlogP98 are now available as descriptors. To calculate them, select the entry AlogP_atypes in the Thermodynamic family in the descriptor table. Each AlogP98 atom-type value represents the number of atoms of that type in the molecule. An additional atom type called Unkown_Type can also be added to the table, together with the other AlogP98 atom types. A value greater than zero for this descriptor indicates the presence of atoms that couldn't be classified as any of the defined AlogP98 atom types. The AlogP Atom Types control panel allows you to select the elements to be taken into account.
Topological descriptors preferences
For an explanation of the topological descriptors see the discussion of graph-theoretical (page 73) and information-content descriptors (page 91).
To change preferences for topological descriptors, set the family popup in the Descriptors control panel to Topological and select the Topological pushbutton. The correlation between the checkboxes in the Topological Descriptors control panel and the descriptors is:
|Checkbox||Toggles calculation of descriptors|
|Unmodified||Molecular connectivity Indices CHI-0, CHI-1, and CHI-2|
|Valence-modified||Valence-modified connectivity index, a refinement which takes into account the atomic number and order of connected bonds.|
|Subgraph Order From and To||Range of allowable orders in subgraphs: 0 through M, where M is the number of edges in the graph.|
|Subgraph Type||Checkboxes Path, Cluster, Path/Cluster, and Ring specify the subgraph types used with the molecular and valence-modified connectivity indices.|
|Kier & Hall Kappa Shape Indices||Shapes of molecules in terms of the count of atoms (One), count of branchings (Two), and count of paths of length 3 (Three).|
|Subgraph Counts||Path, Cluster, Path/Cluster, and Ring subgraphs found in the model.|
|Balaban Indices||Characterize the shape of a molecule, which can take account of the covalent radii (JX) and electronegativity (JY) of the atoms of the model.|
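To make the unmodified connectivity indices concrete, the sketch below computes CHI-0 and CHI-1 for a hydrogen-suppressed graph using the commonly published Kier & Hall definitions (sum of 1/sqrt(degree) over atoms, and sum of 1/sqrt(degree_i * degree_j) over bonds). It is an illustration in Python under those assumptions; the function name `chi_indices` is hypothetical, and this section does not confirm that QSAR+ evaluates the indices in exactly this way.

```python
import math


def chi_indices(bonds):
    """Return (CHI-0, CHI-1) for a hydrogen-suppressed graph given as a bond list."""
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # CHI-0: one term per atom, based on its number of heavy-atom neighbors.
    chi0 = sum(1.0 / math.sqrt(d) for d in degree.values())
    # CHI-1: one term per bond, based on the degrees of both bonded atoms.
    chi1 = sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)
    return chi0, chi1


# Carbon skeleton of n-butane: C0-C1-C2-C3
chi0, chi1 = chi_indices([(0, 1), (1, 2), (2, 3)])
```

The valence-modified variants replace the simple degree with a valence-based value, which is not shown here.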
Adding descriptors to the study table
When you have selected the set of descriptors that you want to use, you add them to the study table by clicking the ADD button in the Descriptors control panel.
Using ISIS keys and Daylight fingerprints
To work with ISIS keys, select Descriptors/Fingerprints/Isis Keys from the study table to open the 2D Fingerprints Isis Keys control panel. With this control panel, you can:
The first control panel (Receptor-Model Interactions) is concerned with addition of the receptor energy descriptors to the study table. To learn more about the receptor energy descriptors, see Receptor descriptors under Theory.
The second control panel (RSA Preferences) controls the addition of interaction energies at each vertex of the surface. You may add only the van der Waals (steric) component of the interaction energy or only the electrostatic component or both, by checking the VDW, ELE, and TOT (total) checkboxes.
a. Add all surface points
b. Add every Nth surface point
a. Add points with variance higher than threshold
b. Add percentage of points with highest variance
a. Add points with correlation higher than threshold
b. Add percentage of points with highest correlation^2
Next, click the action button on the extreme left side of the Descriptors control panel (underneath the ADD button). This displays the receptor descriptors Receptor_energies and Receptor_RSA. To select the Receptor_RSA descriptor, click the cell containing the label Receptor_RSA. To add the receptor surface data to the study table, click the ADD pushbutton. The receptor surface points are added to the study table.
These points may be displayed with the Manage Independent Columns control panel, which is accessed by selecting the Variables/Manage Independent menu item in the study table. Set the 3D-QSAR Labels popup to RSA and click the Label Independent Variables action button.
Surface points in the study table are displayed on the receptor surface model as a label, for example, TOT/123. The first part of the label refers to the type of energy term specified in the RSA Preferences control panel under Include Molecule-Surface Point Interaction Energies. The second part is the number of the surface point and is the same index as the Surface point index in the first column of the output of the Receptor List function.
Typically, the next stage is to calculate a QSAR that relates the receptor surface energy at each surface point to experimental activity data. For a guide to calculating QSARs, see Chapter 15, Using the equation viewer, and Chapter 3, QSAR+ QuickStart.
Using pKa descriptors
For the pKa program to be found by Cerius2, it must be listed in the applcomm.db file in $C2DIR/libraries/applcomm.db. The form of the entry is:
A unix pKa pathname
where pathname is replaced by the pathname of your pKa application.
1. Open the appropriate descriptor database
2. Set the pKa descriptor preferences:
3. Add the pKa descriptors to the study table
A count of pKa columns begins with the string n_pKa_. This is followed by the range of values being counted. For example, n_pKa_0.00_14.00 is a count of pKas with values between 0.00 and 14.00.
A list of pKa columns begins with the string pKa_. The first number tells which pKa value among the selected pKas is held in this column. The second number gives the maximum number of pKas to be listed. The third number specifies whether the pKas are listed from low to high (number = 0) or from high to low (number = 1). The fourth number specifies whether a range (number = 0), a lower bound (number = 1), or an upper bound (number = 2) is used to select the pKas to list. If a range is used, it is followed by two numbers specifying the range. If a lower or upper bound is used, it is followed by the number specifying the bound. For example, pKa_1_2_0_2_14.00 is the lowest pKa of a maximum of two pKas under the bound of 14.00.
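If you post-process exported study tables, the naming scheme above can be decoded mechanically. The following Python sketch is an illustration only: the function name `parse_pka_column` and the dictionary keys are hypothetical, and it assumes the fields are always separated by underscores exactly as in the two examples given.

```python
def parse_pka_column(name):
    """Decode a pKa study-table column name into a dictionary."""
    parts = name.split("_")
    if name.startswith("n_pKa_"):
        # Count column: n_pKa_<low>_<high>
        return {"kind": "count", "low": float(parts[2]), "high": float(parts[3])}
    if name.startswith("pKa_"):
        # List column: pKa_<index>_<max>_<order>_<bound>_<limit(s)>
        index, max_count, order, bound = (int(p) for p in parts[1:5])
        limits = [float(p) for p in parts[5:]]
        expected = 2 if bound == 0 else 1  # a range needs two limits
        if bound not in (0, 1, 2) or order not in (0, 1) or len(limits) != expected:
            raise ValueError(f"malformed pKa column name: {name}")
        return {
            "kind": "list",
            "index": index,
            "max_count": max_count,
            "order": "low_to_high" if order == 0 else "high_to_low",
            "bound": ("range", "lower", "upper")[bound],
            "limits": limits,
        }
    raise ValueError(f"not a pKa column name: {name}")


count_info = parse_pka_column("n_pKa_0.00_14.00")
list_info = parse_pka_column("pKa_1_2_0_2_14.00")
```

For the second example, the decoded fields correspond to the first pKa of at most two, listed from low to high, selected with an upper bound of 14.00.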
The panel is divided into three sections, one for each of the ADME models (Egan et al. 2000). Each is described in the following sections.
First select a model type:
|0||Very High||Brain-Blood ratio greater than 5:1|
|1||High||Brain-Blood ratio between 1:1 and 5:1|
|2||Medium||Brain-Blood ratio between 0.3:1 and 1:1|
|3||Low||Brain-Blood ratio less than 0.3:1|
|4||Undefined||Outside 99% confidence ellipse|
|5||Alogp98||Warning: molecules with one or more unknown Alogp98 types|
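The level definitions in the table can be expressed as a simple lookup. The Python sketch below is an illustration, not the QSAR+ implementation: the function name `bbb_level` and its parameters are hypothetical, the handling of values that fall exactly on a boundary is an assumption, and so is the precedence given to the unknown-atom-type warning over the ellipse check.

```python
def bbb_level(ratio, inside_99_ellipse=True, unknown_atom_types=False):
    """Map a brain-blood concentration ratio to a level from 0 to 5."""
    if unknown_atom_types:
        return 5  # warning: unknown Alogp98 atom types
    if not inside_99_ellipse:
        return 4  # undefined: outside the 99% confidence ellipse
    if ratio > 5.0:
        return 0  # very high
    if ratio >= 1.0:
        return 1  # high
    if ratio >= 0.3:
        return 2  # medium
    return 3      # low


levels = [bbb_level(r) for r in (6.0, 2.0, 0.5, 0.1)]
```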
Report Solubility Level Values: Check this to include a column of solubility levels corresponding to the logarithm of the water solubility.
|< -8.0||0||extremely low solubility, lower than 95% of drugs|
|-8.0 to -6.0||1||very low solubility, at border line of 95% of drugs|
|-6.0 to -4.0||2||low solubility, at lower end of 95% of drugs|
|-4.0 to -2.0||3||good, slightly soluble to soluble|
|-2.0 to 0.0||4||optimal solubility|
|> 0.0||5||very soluble, perhaps too soluble|
|1000||6||Warning: molecules with one or more unknown Alogp98 types|
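The mapping from the logarithm of the water solubility to a level can be sketched as follows. This Python fragment is illustrative only: the function name `solubility_level` is hypothetical, the treatment of values exactly on a bin edge is an assumption based on the strict inequalities in the first and sixth rows, and the warning level is modeled here with a separate flag.

```python
def solubility_level(log_sw, unknown_atom_types=False):
    """Map log water solubility to a solubility level from 0 to 6."""
    if unknown_atom_types:
        return 6  # warning: unknown Alogp98 atom types
    if log_sw < -8.0:
        return 0  # extremely low
    if log_sw < -6.0:
        return 1  # very low
    if log_sw < -4.0:
        return 2  # low
    if log_sw < -2.0:
        return 3  # good
    if log_sw <= 0.0:
        return 4  # optimal
    return 5      # very soluble, perhaps too soluble


levels = [solubility_level(x) for x in (-9.0, -7.0, -5.0, -3.0, -1.0, 0.5)]
```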
Rule of five
Reports the number of violations of Lipinski's Rule of 5 (Lipinski et al. 1997):
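As the rule is commonly stated, a violation is counted for each of the following: molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, and more than 10 hydrogen-bond acceptors. The Python sketch below counts violations under that statement of the rule. The function name `rule_of_five_violations` is hypothetical, and which logP estimate and which donor and acceptor definitions QSAR+ applies is not specified in this section.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count violations of the rule of five as it is commonly stated."""
    checks = [
        mol_weight > 500,   # molecular weight limit
        log_p > 5,          # lipophilicity limit
        h_donors > 5,       # hydrogen-bond donor limit
        h_acceptors > 10,   # hydrogen-bond acceptor limit
    ]
    return sum(checks)


n_violations = rule_of_five_violations(612.7, 5.8, 2, 9)
```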
|400 dipeptides||Fast Descriptors||1127||1080||1004|
|625 benzodiazepines||Fast Descriptors||1306||1130||1083|
|1000 ACD molecules||Fast Descriptors||1987||1674||1720|
|400 dipeptides||Study Table||16||16||17|
|625 benzodiazepines||Study Table||17||17||18|
|1000 ACD molecules||Study Table||21||20||24|
Once the ADME descriptors have been calculated and saved in either the Study Table or BDF files, the results can be analyzed using tools accessible from the menu bar in the Study Table (under Descriptors/ADME...) or from the new menu bar in the Select BDF panel (Analysis/ADME Models...).
Analyzing ADME descriptors
Intestinal Absorption Model
You can analyze the results of either or both of the following models:
The PLOT button generates a plot of PSA vs. AlogP98, such as the one shown below.
Two check boxes below specify the display of the 95% and 99% confidence limit ellipses obtained in the development of the model (Lipinski et al. 1997).
There are also options to display BBB Penetration model ellipses, which occupy a slightly different position in the plot. The Absorption level is calculated based on the position of each molecule in the PSA vs. AlogP98 plot:
|0||Good||Inside 95% Ellipse|
|1||Moderate||Inside 99% Ellipse|
|2||Poor||Inside box defined by PSA between 0 and 150 and AlogP98 between -2 and 7|
|3||Very Poor||Outside box|
|4||Undefined||Molecules with unknown AlogP98 atom types|
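The table can be read as an ordered series of checks. The Python sketch below illustrates that reading. It is not the QSAR+ implementation: the function name `absorption_level` is hypothetical, the ellipse membership tests are passed in as flags because the ellipse parameters are not given here, and treating the box limits as inclusive is an assumption.

```python
def absorption_level(psa, alogp98, inside_95, inside_99, unknown_atom_types=False):
    """Map a molecule's position in the PSA vs. AlogP98 plot to a level from 0 to 4."""
    if unknown_atom_types:
        return 4  # undefined
    if inside_95:
        return 0  # good
    if inside_99:
        return 1  # moderate
    if 0 <= psa <= 150 and -2 <= alogp98 <= 7:
        return 2  # poor: outside both ellipses but inside the box
    return 3      # very poor: outside the box


level = absorption_level(psa=140.0, alogp98=6.5, inside_95=False, inside_99=False)
```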
By default the plot is centered on the good absorption areas (around the 95% and 99% ellipses) and the points are color-coded according to Absorption level.
BBB Penetration Model
The ADME BBB Penetration model control panel works in a similar way to the absorption model control panel.
Try pushing the PRINT button to generate sample output.
Water Solubility Model
The ADME Solubility control panel is relatively simple and self-explanatory.
A descriptor database is a Cerius2 table containing equations and equation coefficients used to calculate molecular descriptors. When QSAR+ is installed, you can access a database that contains over 100 spatial, electronic, thermodynamic, conformational, and other descriptors.
Editing a descriptor database
Because the descriptor database is accessed as a Cerius2 table, you should be familiar with Cerius2 tables before performing any activities described in this section. For information about tables and basic table operations, see Cerius2 Modeling Environment.
Opening a descriptor database
You must select a descriptor database and open it in a descriptor database table before you can edit it. The default database name is listed in the text window when you open QSAR+.
If you have only a single database or if you want to use the currently selected database, select Descriptors/Databases in the study table or on the QSAR card. The Descriptor Database control panel appears.
The descriptor database table contains one row for each descriptor. Each row contains columns, some of which are described below (to see all columns, use the horizontal scroll bar).
You can change the set of default descriptors by editing the Default column.
1. Select the cell in the Default column for that descriptor.
2. Clear the edit window and enter 1.
3. Press <Return> or click any other cell in the table.
1. Select a cell in the Default column.
2. Press <Return> or clear the edit window and enter 0.
3. Click any other cell in the table.
1. Insert a new row in the Descriptor Database table using the Insert tool.
2. In the Family column of the new row, enter a family name.
3. Enter a descriptor equation in the Value column using valid math and molecular operators. For example, enter:
ecount(col "Structure", "Cl") + ecount(col "Structure", "Br")
4. In the Description column, enter a short description of the descriptor. For example, enter:
Number of halogen atoms
5. In the 3D column, enter 0 if your descriptor is not a 3D descriptor. Enter 1 if the descriptor is 3D.
6. In the Default column, enter 1 if you want the descriptor to be part of the default set. Enter 0 if the descriptor is not to be a default descriptor (see Identifying default descriptors on page 168).
7. In the Format column, enter the format for descriptor values to be displayed in the study table. The choices are float, integer, or scientific.
8. In the Decimal column, enter the number of decimal places to be displayed in a descriptor value. If you entered integer in the Format column, enter 0.
9. In the Units column, enter the units (for example, kcal/mol) to be applied to the descriptor value. If no units are to be applied, leave the cell blank.
10. If the descriptor can be modified from a Cerius2 control panel, enter the name of the control panel in the Panel column. Otherwise, leave this column blank.
11. To name the descriptor, click the first column in the row, then click the Prop (properties) tool. The Table Properties control panel appears. Select Row from the Properties popup.
12. Enter a name (for example, Halogens) in the Row Name entry box.
13. Click APPLY TO. The row name is entered in the first column of the selected row. QSAR+ sorts the descriptor list as it performs calculations, so the position of a descriptor in the list may change.
14. Save the database containing the new descriptor. You can save the descriptor to the current database, to another existing database, or to a new database. For more information, see Saving a descriptor database on page 171.
|To activate a new descriptor, you must first save the descriptor database with the descriptor in it.|
When you finish creating a descriptor, you can check to see that it is correctly entered by adding it to the study table and inspecting the generated data (see Adding descriptors to the study table on page 154).
You can modify an existing descriptor in a database by editing the entry for the descriptor in the Value column of the descriptor database table. For example, to modify the Halogens descriptor defined above so that it counts fluorine as well as chlorine and bromine atoms, enter:
ecount(col "Structure", "Cl") + ecount(col "Structure", "Br") + ecount(col "Structure", "F")
in the Value column for the descriptor.
Save the database to activate the edited descriptor (see Saving a descriptor database on page 171).
When you finish modifying a descriptor, you can check to see that the modifications are correct by adding it to the study table and inspecting the generated data (see Adding descriptors to the study table on page 154).
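One way to check the values generated for a descriptor such as Halogens is to recompute them independently. The Python sketch below mimics the arithmetic of the halogen-count expression on a plain list of element symbols. It is an illustration only: the Python `ecount` here is a hypothetical stand-in that takes a list of symbols, whereas the Cerius2 function of the same name operates on the Structure column of the study table.

```python
from collections import Counter


def ecount(elements, symbol):
    """Hypothetical stand-in: count atoms of one element in a list of symbols."""
    return Counter(elements)[symbol]


def halogens(elements, symbols=("Cl", "Br", "F")):
    """Sum the element counts, mirroring the edited Halogens equation."""
    return sum(ecount(elements, s) for s in symbols)


# Assumed example molecule, given as element symbols only.
atoms = ["C", "C", "Cl", "F", "H", "Br", "Cl"]
original_count = halogens(atoms, symbols=("Cl", "Br"))
modified_count = halogens(atoms)
```

Comparing the two counts shows the effect of adding the fluorine term to the equation.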
Controlling the descriptor display format
You can control the numerical format of a descriptor value using one of the following options: floating decimal (float), integer (integer), or scientific notation (scientific).
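The effect of the three formats, combined with the number of decimal places, can be pictured with ordinary string formatting. The Python sketch below is an analogy rather than a description of how Cerius2 renders values: the function name `format_descriptor` is hypothetical, and rounding to the nearest integer for the integer format is an assumption.

```python
def format_descriptor(value, fmt="float", decimals=2):
    """Render a descriptor value as float, integer, or scientific text."""
    if fmt == "integer":
        return f"{round(value):d}"
    if fmt == "float":
        return f"{value:.{decimals}f}"
    if fmt == "scientific":
        return f"{value:.{decimals}e}"
    raise ValueError(f"unknown format: {fmt}")


as_float = format_descriptor(12.3456, "float", 2)
as_integer = format_descriptor(3.7, "integer", 0)
as_scientific = format_descriptor(12345.678, "scientific", 3)
```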
Creating new descriptor categories
The entry in the Family column of the descriptor database table categorizes descriptors and determines the list of choices in the family popup in the Descriptors control panel.
You can create new categories of descriptors by placing new entries in the Family column. For example, if investigator Jones wants to place all saved equations in a category named Jones-QSARs, Jones simply enters this designation in the Family column for the rows containing QSARs and saves the modified table. The value Jones-QSARs now appears as a choice in the family popup on the Descriptors control panel.
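The relationship between the Family column and the popup can be sketched as collecting the distinct Family entries of the table. The following Python fragment is illustrative only: the function name `family_choices` and the row dictionaries are hypothetical, and the order in which the real popup lists its choices is not specified here.

```python
def family_choices(rows):
    """Return the distinct Family entries in first-seen order."""
    seen = []
    for row in rows:
        family = row.get("Family", "")
        if family and family not in seen:
            seen.append(family)
    return seen


# Assumed example rows; only the Family column matters for this sketch.
rows = [
    {"Name": "Charge", "Family": "Electronic"},
    {"Name": "HOMO", "Family": "Electronic"},
    {"Name": "Area", "Family": "Spatial"},
    {"Name": "Eq1", "Family": "Jones-QSARs"},
]
choices = family_choices(rows)
```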
Saving a descriptor database
If you make a change in the descriptor database table, that change is not activated until the table is saved and then read back into Cerius2 again with OPEN DATABASE.