Google has recently built an AI system that can generate music from a text description. This follows the trend of AI being used to generate images and to help people compose written works.
Google, the online search and advertising behemoth, calls the system “MusicLM.” An academic paper released by Google’s researchers on January 26th describes MusicLM as “a model that produces high-quality music from text descriptions, such as ‘a soothing violin tune complemented by a distorted guitar riff.’”
The paper states that MusicLM can be conditioned on both a text caption and a melody, such as a hummed or whistled tune.
The paper adds that MusicLM can take user-written descriptions, for example “enchanting jazz track having a memorable sax solo and a vocalist” or “Berlin 90s techno accompanied by a deep bass and firm kick,” and still generate music that matches them.
One can hear some of the tunes created with MusicLM here.
The rapid rise of OpenAI’s ChatGPT, an AI-driven chatbot, has pushed Google to act: according to The New York Times, the company’s management has declared a “code red.” In response, Google is reportedly preparing to unveil more than 20 AI-based projects in 2023, including a version of Google Search that incorporates AI technology.
TechCrunch mentions that MusicLM is not the only AI music generator, as other projects such as Riffusion, Dance Diffusion, Google’s AudioLM and OpenAI’s Jukebox have been exploring the same field.
MusicLM is unlikely to be available to the public soon, due to concerns about programming biases, technical glitches, lack of representation, and misuse of creative content. For example, in one experiment, Google’s research team found that about one percent of the music produced by MusicLM was directly replicated from the songs it was trained on.
Seok Chen is a mass communication graduate from the City University of Hong Kong.