Abstract: A conceptual clustering system, CLUSMOL/S, has been developed to classify protein sequences from a user-defined point of view. Given a grouping of amino acids as a viewpoint, the system constructs taxonomic trees of sequences based on a minimum information criterion. Each tree node is expressed as a generic consensus sequence. This sequence consists of specific consensus amino acids, insertion/deletion points, and generic amino acids that share a specified character. The resulting trees and generic sequences show the similarity-based relationships among the sequences and their characteristics. When the system is applied to vertebrate cytochromes c, it yields an acceptable cladogram only if amino acids are grouped by the volume and length of their side chains. This result indicates that the steric factor is the most important constraint in the process of protein evolution. © Munksgaard 1995.
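The idea of a generic consensus sequence under a viewpoint can be illustrated with a minimal sketch. It makes several assumptions: the input sequences are already aligned to equal length, with `-` marking gaps. Any column that contains a gap is treated as an insertion/deletion point. A column whose residues all fall in one amino acid group becomes that group's generic symbol. The grouping `SIZE_GROUPS`, the digit labels, the `*` wildcard, and the functions `consensus_symbol` and `generic_consensus` are all hypothetical. They are not taken from CLUSMOL/S, and the sketch does not implement the minimum information criterion used to build the tree.

```python
from typing import Dict, List

# Hypothetical viewpoint: amino acids grouped roughly by side-chain size.
# Illustrative only; this is NOT the grouping used by CLUSMOL/S.
SIZE_GROUPS: Dict[str, str] = {
    "1": "GASCTP",      # small side chains
    "2": "DNVEQILMHK",  # intermediate side chains
    "3": "RFYW",        # large side chains
}

GAP = "-"

def consensus_symbol(column: List[str], groups: Dict[str, str]) -> str:
    """Return one consensus symbol for a single aligned column."""
    residues = set(column)
    if GAP in residues:
        return GAP  # assumed rule: any gap marks an insertion/deletion point
    if len(residues) == 1:
        return column[0]  # specific consensus amino acid
    for label, members in groups.items():
        if residues <= set(members):
            return label  # generic amino acid: all residues share one group
    return "*"  # residues share no group under this viewpoint

def generic_consensus(aligned: List[str], groups: Dict[str, str]) -> str:
    """Build a generic consensus sequence from equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return "".join(consensus_symbol(list(col), groups) for col in zip(*aligned))

print(generic_consensus(["GDVEK", "GNIE-", "GDLQK"], SIZE_GROUPS))  # G222-
```

A different viewpoint means passing a different `groups` mapping. The same aligned sequences can then produce different generic consensus sequences, which is what makes the resulting tree depend on how the amino acids are grouped.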