I have investigated systems for on-line, cumulative learning of compositional hierarchies embedded within predictive probabilistic models. The hierarchies are learned unsupervised from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. These systems are examples of a rare combination: unsupervised, on-line structure learning (specifically structure growth). The system described here embeds a compositional hierarchy within an undirected graphical model based directly on Boltzmann machines, extended to handle categorical variables. A novel on-line chunking rule creates new nodes corresponding to frequently occurring patterns that are combinations of existing known patterns. This work can be viewed as a direct (and long overdue) attempt to explain how the hierarchical compositional structure of classic models, such as McClelland and Rumelhart's Interactive Activation model of context effects in letter perception, can be learned automatically.