Researchers have an increasingly powerful set of cheminformatics tools for constructing and refining compound libraries that match complex, multi-parameter sets of data and property values. Similarity searching, clustering, and filtering can all help explore chemical space and reduce an overwhelmingly large set of potential structures to a manageable subset of representative compounds that can be synthesized or acquired.

Time and budgets are key factors in research projects, so if a team decides to acquire the compounds it needs rather than synthesize them, up-to-date, reliable, and searchable information on compound availability and price becomes crucial.

Suppliers’ catalogs list “available” compounds, but knowing whether a listed compound is ready to ship from stock, is on back order, or has to be synthesized from scratch helps a scientist decide whether to order that specific compound or look for an alternative. At the same time, the current price of each compound must be factored in to ensure that the complete library can be acquired within the available budget. Sample availability and price both change rapidly in the compound supply market, and keeping up to date requires constant attention.

With these challenges in mind, MolPort has developed a Cluster Search tool that takes pre-clustered lists of compounds and searches for the optimal set of readily available compounds with the best pricing within a specified budget cap.

Cluster search

Creating Representative Subsets of Natural Product Chemical Space

The Technical University of Denmark (DTU) has an interest in natural product chemistry, and its researchers wanted to explore the known set of ca. 200K natural product structures to determine whether they could create and acquire a reduced, diverse subset of representative compounds covering a substantial part of natural product chemical space.

They clustered and filtered the 200K natural products into smaller groups using the cheminformatics tools and techniques described above, and applied additional filters on physicochemical properties such as SLogP to further refine the clusters. The next step was to select, for each cluster, a structure close to the cluster centroid that was also commercially available and reasonably cheap. For this study, they focused on the smallest library that could be purchased in reasonable quantities for less than 100 US$ per 5 mg.
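The clustering and filtering step can be illustrated with a small sketch. The source does not say which clustering method or SLogP window DTU used, so this example swaps in a simple leader (sphere-exclusion) clustering on binary fingerprints stored as sets of on-bit indices, with a Tanimoto similarity threshold. The function names, threshold, SLogP range, and toy data are all hypothetical; a real workflow over ca. 200K structures would typically compute fingerprints and descriptors with a cheminformatics toolkit such as RDKit.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record: compound id -> (fingerprint as a set of on-bit indices, SLogP)
Compound = Tuple[FrozenSet[int], float]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_by_slogp(compounds: Dict[str, Compound], lo: float, hi: float) -> Dict[str, Compound]:
    """Keep compounds whose SLogP lies in [lo, hi]; the window is illustrative."""
    return {cid: rec for cid, rec in compounds.items() if lo <= rec[1] <= hi}


def leader_cluster(compounds: Dict[str, Compound], threshold: float) -> List[List[str]]:
    """Leader (sphere-exclusion) clustering.

    Each compound joins the first cluster whose leader has Tanimoto similarity
    >= threshold to it; otherwise it becomes the leader of a new cluster.
    """
    clusters: List[List[str]] = []
    leaders: List[FrozenSet[int]] = []
    for cid, (fp, _slogp) in compounds.items():
        for idx, leader_fp in enumerate(leaders):
            if tanimoto(fp, leader_fp) >= threshold:
                clusters[idx].append(cid)
                break
        else:  # no existing leader was similar enough
            leaders.append(fp)
            clusters.append([cid])
    return clusters


if __name__ == "__main__":
    library: Dict[str, Compound] = {
        "NP1": (frozenset({1, 2, 3, 4}), 2.1),
        "NP2": (frozenset({1, 2, 3, 5}), 1.8),
        "NP3": (frozenset({10, 11, 12}), 3.0),
        "NP4": (frozenset({10, 11, 13}), 6.5),  # dropped by the SLogP window below
    }
    kept = filter_by_slogp(library, lo=-1.0, hi=5.0)
    print(leader_cluster(kept, threshold=0.5))
```

Because the leader algorithm assigns each compound to the first sufficiently similar leader, its output depends on input order; methods such as Taylor-Butina clustering reduce this by choosing cluster centres according to how many neighbours they have within the threshold.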

This final step, searching clusters for related structures that also meet current availability and pricing requirements, can be cumbersome and repetitive, requiring successive searches over catalog data and repeated hit-list manipulation. This is where the MolPort Cluster Search tool proved to be an invaluable time-saver.

MolPort Cluster Search Tool in Action

The Cluster Search tool starts with a list of representative structures (as SMILES, molfiles, or SD files), which can be scaffold-based analogs related to a cluster centroid or Tanimoto-ranked similar compounds. Researchers can specify the centroid and the number of example structures to search for in each cluster, optionally ranking the example structures in order of preference. Availability and cost are two key additions to the structure search parameters: they are factored in by specifying the amount required and a maximum price per mg as part of the standard List Search capabilities.
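The idea of ranking example structures within a cluster by Tanimoto similarity can be sketched as follows. This is not how the Cluster Search tool works internally, which is not described here; it is a hypothetical illustration that takes the cluster medoid (the member with the highest mean similarity to the other members) as a stand-in for the centroid structure, then orders the remaining members by their similarity to it. All names and data are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def medoid(cluster: Dict[str, FrozenSet[int]]) -> str:
    """Return the member with the highest mean Tanimoto similarity to the others."""

    def mean_similarity(cid: str) -> float:
        others = [fp for other, fp in cluster.items() if other != cid]
        if not others:  # single-member cluster
            return 1.0
        return sum(tanimoto(cluster[cid], fp) for fp in others) / len(others)

    return max(cluster, key=mean_similarity)


def ranked_preferences(cluster: Dict[str, FrozenSet[int]], top_n: int) -> List[Tuple[str, float]]:
    """Centroid stand-in first, then up to top_n members ranked by similarity to it."""
    centre = medoid(cluster)
    ranked = sorted(
        ((cid, tanimoto(cluster[centre], fp)) for cid, fp in cluster.items() if cid != centre),
        key=lambda pair: pair[1],
        reverse=True,  # most similar first; ties keep their input order
    )
    return [(centre, 1.0)] + ranked[:top_n]


if __name__ == "__main__":
    cluster = {
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({1, 2, 3, 5}),
        "C": frozenset({1, 2, 6, 7}),
        "D": frozenset({1, 2, 3, 4, 5}),
    }
    for cid, sim in ranked_preferences(cluster, top_n=2):
        print(cid, round(sim, 2))
```

The ranked list is the kind of ordered preference list the text describes submitting per cluster: if the first choice is unavailable or too expensive, the next most similar structure can be tried.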

These structure, amount, and price parameters are used to rapidly search MolPort’s up-to-date, comprehensive database of more than 7.6M available stock compounds. The result is a list of structure matches that are available for delivery from stock in the required amounts and within the specified price cap. This hit list can then be used directly to generate a quote for acceptance and processing.
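The availability and price filtering can be sketched in the same spirit. Assuming hypothetical offer records that carry a stock amount and a price per mg (these fields are illustrative, not MolPort's data model), the function below keeps only offers with enough stock for the requested amount and a price strictly below the per-mg cap, then walks each cluster's preference list and takes the cheapest qualifying offer for the first compound that has one. A limit of less than 100 US$ per 5 mg, as in the DTU study, corresponds to a cap of 20 US$ per mg.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Offer:
    """One supplier offer for a compound (hypothetical fields, not a real catalog schema)."""
    compound_id: str
    supplier: str
    in_stock_mg: float
    price_per_mg_usd: float


def pick_per_cluster(
    preferences: Dict[str, List[str]],
    offers: List[Offer],
    amount_mg: float,
    max_price_per_mg: float,
) -> Dict[str, Optional[Offer]]:
    """For each cluster, pick the cheapest qualifying offer for its most preferred compound."""
    # Keep only offers with enough stock and a price strictly below the per-mg cap.
    qualifying: Dict[str, List[Offer]] = {}
    for offer in offers:
        if offer.in_stock_mg >= amount_mg and offer.price_per_mg_usd < max_price_per_mg:
            qualifying.setdefault(offer.compound_id, []).append(offer)

    picks: Dict[str, Optional[Offer]] = {}
    for cluster_id, ranked_ids in preferences.items():
        picks[cluster_id] = None  # stays None if no compound in the cluster qualifies
        for cid in ranked_ids:  # walk candidates in order of preference
            if cid in qualifying:
                picks[cluster_id] = min(qualifying[cid], key=lambda o: o.price_per_mg_usd)
                break
    return picks


def total_cost(picks: Dict[str, Optional[Offer]], amount_mg: float) -> float:
    """Total price of the selected offers for amount_mg of each compound."""
    return sum(o.price_per_mg_usd * amount_mg for o in picks.values() if o is not None)


if __name__ == "__main__":
    cap = 100.0 / 5.0  # "< 100 US$ per 5 mg" as a per-mg cap: 20 US$/mg
    preferences = {"c1": ["D", "A", "B"], "c2": ["X", "Y"], "c3": ["Z"]}
    offers = [
        Offer("D", "S1", in_stock_mg=2.0, price_per_mg_usd=10.0),   # not enough stock
        Offer("A", "S1", in_stock_mg=50.0, price_per_mg_usd=25.0),  # over the cap
        Offer("A", "S2", in_stock_mg=10.0, price_per_mg_usd=12.0),  # qualifies
        Offer("B", "S3", in_stock_mg=10.0, price_per_mg_usd=5.0),   # qualifies, lower preference
        Offer("Y", "S1", in_stock_mg=100.0, price_per_mg_usd=8.0),  # qualifies
    ]
    picks = pick_per_cluster(preferences, offers, amount_mg=5.0, max_price_per_mg=cap)
    for cluster_id, offer in picks.items():
        print(cluster_id, (offer.compound_id, offer.supplier) if offer else None)
    print("total US$:", total_cost(picks, amount_mg=5.0))
```

Summing the selected offers gives a quick check against an overall budget cap. This sketch favours structural preference over price within each cluster; a variant could instead take the cheapest qualifying compound in each cluster.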

The scientists used the MolPort Cluster Search tool in this way and identified 117 compounds, available from 15 suppliers, that met all the criteria and covered a large part of natural product chemical space. In the end they ordered 134 compounds through MolPort’s ordering service and were very pleased to receive 133 of them within the expected four weeks.


MolPort can apply the Cluster Search tool to your clustered structures on request. The tool helps researchers acquire smaller, focused subset libraries from larger clustered sets of structures while ensuring that samples are readily available and affordable. If achieving broad structural diversity with the fewest compounds on a limited budget is important in your research, please contact MolPort to learn more about the Cluster Search tool.