Google AI Tool Creates Music from Written Descriptions

06:50 February 1, 2023


This week, Google researchers published a paper describing results from an artificial intelligence (AI) tool built to create music.

The tool, called MusicLM, is not the first AI music tool to launch. But the examples Google provides demonstrate musical creative ability based on a limited set of descriptive words.

The term AI describes complex computer systems that have been trained to behave in human-like ways.

Tools like ChatGPT can quickly produce, or generate, written documents that compare well with the work of humans. ChatGPT and similar systems require powerful computers to operate complex machine-learning models. The San Francisco-based company OpenAI launched ChatGPT late last year.

Developers train such systems on huge amounts of data to learn methods for recreating different forms of content. For example, computer-generated content could include written material, design elements, art or music.

ChatGPT has recently received a lot of attention for its ability to generate complex writings and other content from just a simple description in natural language.

Google’s MusicLM

Google engineers explain the MusicLM system this way:

First, a user comes up with a word or words that describe the kind of music they want the tool to create.

For example, a user could enter this short phrase into the system: “a continuous calming violin backed by a soft guitar sound.” The descriptions entered can include different music styles, instruments or other existing sounds.

Several different music examples produced by MusicLM were published online. Some of the generated music came from just one- or two-word descriptions, such as “jazz,” “rock” or “techno.” The system created other examples from more detailed descriptions containing whole sentences.

In one example, Google researchers include these instructions to MusicLM: “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds…”

In the resulting recording, the music stays very close to the description. The team said that the more detailed the description is, the better the system can attempt to produce it.

The MusicLM model operates similarly to the machine-learning systems used by ChatGPT. Such tools can produce human-like results because they are trained on huge amounts of data. Many different materials are fed into the systems to permit them to learn complex skills to create realistic works.

In addition to generating new music from written descriptions, the team said the system can also create examples based on a person’s own singing, humming, whistling or playing an instrument.

The researchers said the tool “produces high-quality music...over several minutes, while being faithful to the text conditioning signal.”

At this time, the Google team has not released the MusicLM models for public use. This differs from ChatGPT, which was made available online for users to experiment with in November.

However, Google announced it was releasing a “high-quality dataset,” called MusicCaps, of more than 5,500 music-text pairs prepared by professional musicians. The researchers took that step to assist in the development of other AI music generators.

The MusicLM researchers said they believe they have designed a new tool to help anyone quickly and easily create high-quality music selections. However, the team said it also recognizes some risks linked to the machine learning process.

One of the biggest issues the researchers identified was “biases present in the training data.” A bias means including too much of one side and not enough of the other. The researchers said this raises a question “about appropriateness for music generation for cultures underrepresented in the training data.”

The team said it plans to continue to study any system results that could be considered cultural appropriation. The goal would be to limit biases through more development and testing.

In addition, the researchers said they plan to keep improving the system to include lyrics generation, text conditioning and better voice and music quality.

I’m Bryan Lynn.

Bryan Lynn wrote this story for VOA Learning English, based on reports from Google.
