MusicLM is a tool that generates high-quality music from written descriptions. The model treats conditional music generation as a hierarchical sequence-to-sequence modeling task and can produce continuous music at 24 kHz for several minutes. It can be conditioned on both text and a melody, so a whistled or hummed tune can be adapted to the genre specified in a text caption. Prompts can include descriptions of paintings, instruments, genres, musician experience levels, locations, and time periods. The tool can also generate multiple variations from the same text prompt or from the same semantic tokens.
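To give a sense of scale for "several minutes at 24 kHz", the short Python sketch below counts the raw audio samples in a clip. The five-minute duration, the single-channel (mono) assumption, and the helper name `num_samples` are illustrative choices and do not come from the description above.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate stated in the text


def num_samples(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of samples in a mono clip of the given duration."""
    return round(duration_seconds * sample_rate_hz)


# Hypothetical example: a five-minute clip (duration chosen for illustration).
five_minutes = num_samples(5 * 60)
print(five_minutes)  # 7200000
```

Under these assumptions, a five-minute clip contains 7.2 million samples, which suggests why long pieces are likely easier to model over more compact token sequences, such as the semantic tokens mentioned above, than sample by sample.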