A small automaton for word recognition in DNA sequences and its application to consensus analysis of regulatory elements in DNA regions controlling gene expression.

C Lefèvre, JE Ikeda

Proc Int Conf Intell Syst Mol Biol | Published: 1993


A method for pattern analysis of DNA sequence data is considered. A space-economical automaton for word recognition was presented elsewhere, together with an algorithm for compiling it in linear time. An algorithm for locating words, including imperfect matches (motif search), was developed. A program was implemented on the Macintosh and used extensively to represent the word composition of DNA data. We explore different sets of regulatory sequences to illustrate the performance of this method. In mammalian DNA, this analysis reveals "consensus motifs" corresponding to functional (or putative) cis-acting elements that mediate the regulation of gene expression.
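The abstract does not describe the automaton or the motif-search algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain trie over all fixed-length words of a sequence (not the space-economical automaton, and not a linear-time construction) and treats an "imperfect match" as a word within a given number of substitutions of the motif. The names `build_word_trie` and `find_motif` are hypothetical.

```python
def build_word_trie(sequence, k):
    """Index every length-k word of `sequence` in a trie (nested dicts).

    Assumption for illustration: a plain trie, built in O(n * k) time.
    Nodes at depth k store the start positions of the word under "positions".
    """
    root = {}
    for start in range(len(sequence) - k + 1):
        node = root
        for base in sequence[start:start + k]:
            node = node.setdefault(base, {})
        node.setdefault("positions", []).append(start)
    return root


def find_motif(trie, motif, max_mismatches):
    """Return sorted (position, mismatches) pairs for indexed words that
    differ from `motif` by at most `max_mismatches` substitutions.

    `motif` is expected to have the same length k used to build the trie.
    """
    hits = []
    stack = [(trie, 0, 0)]  # (node, depth, mismatches so far)
    while stack:
        node, depth, mismatches = stack.pop()
        if depth == len(motif):
            hits.extend((pos, mismatches) for pos in node.get("positions", []))
            continue
        for base, child in node.items():
            if base == "positions":
                continue
            cost = mismatches + (base != motif[depth])
            if cost <= max_mismatches:  # prune branches that already differ too much
                stack.append((child, depth + 1, cost))
    return sorted(hits)


# Made-up example sequence, searched for a TATA-box-like word.
sequence = "ACGTTATAAAGCTATATAGC"
trie = build_word_trie(sequence, 6)
print(find_motif(trie, "TATAAA", 1))  # [(4, 0), (12, 1)]
```

Walking the trie lets the search abandon a whole branch as soon as its mismatch count exceeds the limit, which is the main reason to index words in an automaton-like structure rather than compare the motif against every position separately.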