AI Music Production Is Getting Easier, But Is It Getting Better?


Ever since Neanderthals began making flutes out of bone about 60,000 years ago, technology has shaped how and why we make sounds for one another.

So much of the musical experience we take for granted today would have been unimaginable even 200 years ago: analog recording, electric amplification, digital distribution, and portable, streaming playback. Now, we have generative AI that can render text, still and moving images, and sounds from only a text prompt, whether that prompt comes from a skilled professional musician or a second grader.

New AI music generators combine a variety of extant technologies behind a text interface, promising to imbue casuals with the gift of musical creation and streamline the process for accomplished musicians. Whether these applications are transformative or a passing novelty, they create an unavoidable tension: AI-generated music can expand the potential of human creativity, but in doing so, it could choke off the livelihoods of the musicians who make it possible.

From those first Flintstones-y jam sessions until relatively recently, there was no good way to make money from music: Music appears so much earlier in human history than money, let alone the music industry. We’ve found all sorts of other reasons to make music anyway. Making, hearing, dancing to, and talking about music are fundamental human experiences, an essential means for relating to one another. Given this irrepressible human compulsion, AI won’t destroy music — but it can radically shift who is making it, what we hear, and why we create it in the first place.

Today, there are millions of people worldwide who depend on music to support themselves and their families, and music-generating machines threaten their livelihoods. The prospect of, say, AI-generated commercial jingles may seem benign, but that earworm is also a paycheck for whoever made it, income that might allow them to pay their rent and develop their talent at the same time. The next Barry Manilow or Sheryl Crow might be toiling away at royalty-free stock music as you read this. Their automated replacement might move some tires or cans of soup, but it’s not likely to make a timeless record or sell out a tour — and the people who could have had those jobs become less likely to reach those heights.

Our present-day music ecosystem — of which the music business is just one component — is chaotic and complex and inefficient, but that is all the more reason to be cautious about clear-cutting: We can’t just regrow a forest that has been growing for millennia.

As our means of creation, collaboration, and consumption evolve, new musical niches emerge to be filled, and the species that can’t adapt is vulnerable to extinction. From chart-topping pop songs to homemade obscurities to your insurance company’s hold music, much of what is familiar about music today is already undergoing a radical change.

Musicians, producers, and others have been using non-generative AI tools for years. Cher popularized Auto-Tune with “Believe” over a quarter of a century ago, and countless artists have used it since to “correct” their pitch. Record labels use AI to scan social media platforms for unlicensed use of songs they own, and Shazam works in roughly the same way when it recognizes audio. Engineers use AI to streamline the mixing and mastering process. More recently, Get Back director Peter Jackson employed the technology to isolate individual tracks from a mixed-down recording, recovering studio conversations and helping produce a lost Beatles song.

But there’s a categorical difference between these assistive tools and generative AI applications, like Suno and Udio, which can produce full songs from nothing but a few words.

The new musical AIs all work a little differently and continue to evolve, but they tend to operate similarly to other generative AI tools: They analyze a vast data set and apply the patterns found in that trove to make probabilistic predictions.

To do this for audio, developers compile a massive collection of songs (through agreements with license holders and/or by scraping publicly available data without permission) and their associated metadata (artist and song titles, genres, years, descriptions, liner notes, anything relevant and accessible). All of this is typically made possible by underpaid workers in the Global South, who annotate that data at a mammoth scale.

The developers then prepare that data set for a machine learning model, which is (reductively) a vast network of connections, each of which is assigned a numerical “weight.” People then “train” the model, teaching it to observe patterns within the data set and providing feedback to the model by evaluating its predictions. Based on those patterns, the model can take a short snippet of audio or a text prompt and predict what should come next, and then what comes after that, and so on.

Developers tweak the weights to generate more listenable and predictable outputs from the same inputs. AI music generators combine two buckets of technology: the musical tools professionals have been using in the studio for decades, and the large language models that allow casual users to summon their power.
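To make that pipeline a little more concrete, here is a deliberately tiny sketch of the prediction loop in Python. It illustrates the general approach described above, not how Suno, Udio, or any particular product actually works: it assumes the audio has already been converted into a small vocabulary of discrete tokens, and the hypothetical predict_next function stands in for a trained neural network whose weights encode the patterns found in the training data.

```python
import random

# Toy vocabulary of audio "tokens." Real systems learn tens of thousands of
# these by compressing raw audio; nothing is hand-labeled like this in practice.
VOCAB = ["C_chord", "G_chord", "Am_chord", "F_chord", "kick", "snare"]

def predict_next(context):
    """Stand-in for a trained model: return a probability for each token.

    In a real system, training adjusts billions of numerical weights so that
    these probabilities mirror the patterns in the training data.
    """
    weights = [1.0] * len(VOCAB)
    if context and context[-1] == "C_chord":
        # A "learned" pattern: C chords in the data are often followed by G.
        weights[VOCAB.index("G_chord")] = 5.0
    total = sum(weights)
    return [w / total for w in weights]

def generate(prompt_tokens, length=8):
    """Autoregressive generation: predict one token, append it, repeat."""
    sequence = list(prompt_tokens)
    for _ in range(length):
        probs = predict_next(sequence)
        sequence.append(random.choices(VOCAB, weights=probs, k=1)[0])
    return sequence

print(generate(["C_chord"]))
```

Real generators operate on vastly larger vocabularies and models, and they bolt a text interface onto the front, but the loop itself (predict, sample, append, repeat) is the same in spirit.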

Any music AI generator is only as good as the data it’s trained on. These systems require a vast amount of data, and a model trained on a biased data set will reproduce those biases in its output. Whose voices are included in that huge crate of music, and whose are left out? Today’s AI models are liable to exclude vast swaths of music, especially from musical traditions that predate recording technology and come from non-Western origins. As currently designed, they are more likely to produce stereotypical sounds within a genre or style than they are to produce anything peculiar, let alone innovative or interesting. Generative AI systems have a bias toward mediocrity, but transcendent music is found on the margins.

“What would be lost from human creativity and diversity if musicians come to rely on predictive models trained on selective data sets that exclude the majority of the world’s many cultures and languages?” Lauren M.E. Goodlad, chair of Rutgers University’s Critical AI initiative, told me.

Legally, the musicians watching as AI models are trained on their work have the same concerns as the New York Times, Getty, and other publishers and creators who are filing lawsuits against AI companies: data provenance. Though some companies are careful to train their models on only licensed data, others use whatever they can get their hands on, arguing that anything that’s publicly accessible falls under fair use for this purpose. The RIAA, the dominant music recording trade organization in the US, is now suing Suno and Udio for “copyright infringement … on a massive scale.” (Disclosure: Vox Media is one of several publishers that has signed partnership agreements with OpenAI. Our reporting remains editorially independent.)

Polls frequently find that most people disapprove of AI companies scraping public data without permission. But while a number of high-profile suits are on the docket, it’s not yet clear how the legal system will affect the companies scraping all this human creativity without permission, let alone compensation. If these practices are not forestalled soon, the least scrupulous actors will quickly accrue power and the fancy lobbyists and lawyers that come along with it. (Soullessness: It’s not just for machines!)

These issues are urgent now because they become less tractable with time, and some in the field are pushing back. Ed Newton-Rex was VP of audio at Stability AI when it launched Stable Audio, an AI music and sound generator, last fall. He quit the company just a couple of months later over its stance on data scraping: Newton-Rex’s team had trained Stable Audio on only licensed data, but company leadership issued a public comment to the US Copyright Office arguing that AI development is “an acceptable, transformative, and socially-beneficial use of existing content that is protected by fair use.”

To push back against unlicensed scraping, Newton-Rex founded Fairly Trained, which reviews and certifies the data sets used by AI companies. For now, the non-profit can only certify whether or not the content in a company’s data set has been properly licensed. Someday, it may be able to account for more granular detail (like whether the artist explicitly consented to this sort of use or merely failed to opt out) and other concerns like bias mitigation.

As a musician and composer of choral and piano music himself, he sees this as a pivotal moment for the field.

“Generative AI models generally compete with their training data,” Newton-Rex said. “There’s frankly a limited amount of time that people spend listening to music. There’s a limited royalty pool. And so the more of the music that is made with these systems, the less is going to human musicians.”

As FTC chair Lina Khan noted last month, if a person creates content or information that an AI company scrapes, and then the content or information the AI generator produces competes with the original producer “to dislodge them from the market and divert businesses … that could be an unfair method of competition” that would violate antitrust law.

Marc Ribot is one of more than 200 musicians who signed the Artist Rights Alliance’s statement against this practice earlier this year, and he is an active member of the Music Workers Alliance’s steering committee on AI. As a practicing guitarist since the 1970s, Ribot has witnessed firsthand how technology has shaped the industry, watching recording budgets steadily shrink for decades.

“I’m not in any way, shape, or form against the technology itself,” Ribot says. After losing original recordings he made in the ’90s, he used AI himself to isolate individual tracks from the final mix. But he sees the current moment as a critical opportunity to push back on this technology before the firms that own it get too big to regulate.

“The real dividing line between useful and disastrous is very simple,” Ribot said. “It’s whether the producers of the music or whatever else is being injected [as training data] have a real, functional right of consent. [AI music generators] regurgitate what they ingest, and oftentimes they produce things with large chunks of copyrighted material. That’s the output. But even if they haven’t, even if the output isn’t the violation, the ingestion itself is a violation.”

Ribot said musicians have long been apathetic about AI, but he’s observed a “sea change in attitude about digital exploitation issues” over the last few years, motivated by the SAG-AFTRA actors union and the Writers Guild of America strikes last year and ongoing lawsuits against AI companies, as well as a more sophisticated understanding of surveillance capitalism and civil liberties. Whereas musicians may have seen each other as competition just a couple of years ago — even if the pie is getting smaller, there are still a few artists who can strike it rich — AI poses a threat to the field as a whole that may not benefit even the luckiest among them.

One of the first examples of AI-generated music dates back to 1956: a string quartet piece composed by the ILLIAC I computer and programmed by University of Illinois at Urbana-Champaign professors Lejaren Hiller and Leonard Isaacson.

After the technological leaps of more recent years, artists including Holly Herndon, Arca, YACHT, Taryn Southern, and Brian Eno are now using generative AI to experiment with their creative practices. AI’s tendency to produce “hallucinations” and other nonsensical outputs, though dangerous in other contexts, can, in music, be a means of inspiration. Just as other sonic technologies have come to be defined by their dissonance — CD distortion, 8-bit compression, the cracking of a human voice too powerful for the throat that emits it, “events too momentous for the medium assigned to record them,” as Brian Eno writes in A Year with Swollen Appendices — AI-generated music may be most valuable when it’s most distinct.

Iván Paz, a musician with a PhD in computer science, develops AI systems for his own live performances. Starting with a blank screen, he writes the code in real time (displayed for the audience to read) and trains the model as he responds to the sounds it produces, which can be surprising or abrasive or outright disastrous. The result feels a bit like playing an instrument but also like improvising with another musician.

“If your algorithm is acting at a really low level, then you feel like playing an instrument, because you are actually tweaking, for example, the synthesis parameters,” Paz said. “But if the algorithm is defining the form of the musical piece, then it’s like playing with an agent that is defining what is going to happen next.”

For an exhibition at the Center for Contemporary Culture in Barcelona earlier this year, Paz worked with singer Maria Arnal to create a timbre transfer model for her voice. They asked visitors to sing short song passages; then, the model blended those voices with Arnal’s to create a new singing voice. In another project, Paz’s colleague Shelly Knotts trained a model on her own compositions as a means to avoid repetition in her work: It analyzes her music to discover patterns, but instead of suggesting her most probable next step, it suggests a more unlikely continuation.
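One plausible way to build that kind of anti-prediction tool is to invert the usual sampling step: ask the model what the composer would most likely do next, then suggest something from the bottom of that list instead. The snippet below is a hypothetical Python sketch of the idea, using made-up probabilities; it is not a description of Knotts’ actual system.

```python
import random

def suggest_unlikely(probs, vocab, k=3):
    """Pick among the model's LEAST probable next steps instead of its most probable.

    Whatever the model predicts the composer would typically do next is
    exactly what this helper steers her away from.
    """
    ranked = sorted(range(len(vocab)), key=lambda i: probs[i])  # ascending probability
    return vocab[random.choice(ranked[:k])]

# Hypothetical output of a model trained on a composer's own back catalog:
vocab = ["C_chord", "G_chord", "Am_chord", "F_chord", "rest"]
probs = [0.05, 0.55, 0.25, 0.10, 0.05]  # the model expects a G chord next

print(suggest_unlikely(probs, vocab))  # e.g. "rest" or "C_chord": a break from habit
```

The same kind of trained model can serve opposite goals: a commercial generator smooths its output toward the probable, while a tool like this nudges a composer away from her own habits.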

The next step in AI musical evolution could come down to processing speeds. Live-coding is possible with some types of models, but others take too long to render audio to be usable in a live show. Electronic instruments like synthesizers were initially created to emulate acoustic sounds and only later developed their own unique character. Paz sees generative AI’s ultimate potential in making new sounds we can’t currently imagine, let alone produce. In this context — one where artificial intelligence assists a performer — AI is no more likely to “replace” a musician than a digital tuner or a delay pedal.

Even so, other corners of the music industry are embracing AI toward more destructive ends. While AI may not (and may never) produce music better than a human can, it can currently make acceptable music at far greater speed and scale — and “acceptable” is often the only bar a track has to clear.

Most of the time you hear music, you don’t know who made it. The jingle you hear in a commercial. The ambient score in a movie or TV show or a podcast or video game. The loops a hip-hop producer samples in a beat. This is the part of the industry most likely to be upended by generative AI. Bloomberg reports teachers are using Suno to make musical teaching aids. Gizmodo notes the target audience for Adobe’s Project Music GenAI Control, another AI music generator, is people looking to make background music quickly and cheaply, like podcasters and YouTubers, with the ability to specify the mood, tone, and length of a track.

Whether you enjoy or even notice it, these types of music have historically been made by people. But automated AI music generation may cost those musicians their jobs — and many of them use that income to support their more creatively fulfilling but less financially viable pursuits. You might never see an AI musician take the stage, but you’re still likely to see fewer human musicians as a result of this technology.

For their part, the music industry’s power players already believe AI will become a pillar of their business — their concern is who will reap the rewards. Spotify won’t restrict AI-generated music absent outright imitation that risks litigation. Universal Music Group (UMG) and YouTube launched the YouTube Music AI Incubator to develop AI tools with UMG artists. At the same time, UMG is also one of more than 150 organizations — including ASCAP, BMI, RIAA, and AFL-CIO — in the Human Artistry Campaign coalition to establish an ethical framework for AI use in creative fields. They don’t want to ban the technology, but they want a stake in the outcomes.

More than 100,000 new tracks are uploaded to streaming services every single day. Digital streaming platforms have a major incentive to shrink the proportion of royalty-earning, human-made tracks their users play: Spotify alone paid out $9 billion in royalties last year, the majority of its $14 billion in revenue. In the past, the world’s largest music streaming company has increased the availability and prominence of royalty-free tracks, and it may still be doing so. AI music generators are an easy way to create royalty-free music that can edge real, royalty-earning artists off popular playlists, redistributing that per-stream income from artists to the platform itself.

For established artists, there is new power — and new danger. After a stroke, country star Randy Travis has trouble speaking, let alone singing, but with the help of AI trained on his existing catalog, he can reproduce his vocals digitally.

At the same time, an anonymous producer can create a credible-sounding Drake/The Weeknd collaboration and rack up millions of streams. In May, producer Metro Boomin caught some heat during the real Drake’s beef with Kendrick Lamar. Metro Boomin released a beat featuring AI-generated samples for anyone to use, which Drake himself then sampled and rapped over, publishing the new track on streaming services. King Willonius, who used Udio to create the original track Metro Boomin remixed, has hired an attorney to maintain the rights to his contributions.

These latter examples demonstrate how music made quickly can supplant music made well. In the streaming economy, volume and velocity are everything: Artists are incentivized to produce quantity over quality.

“[A future AI-generated hit] is not going to be something that people are going to come back to and study, the way that they continue to do with the great releases of the record era,” musician Jaime Brooks said. Brooks has released records under her own name and with bands Elite Gymnastics and Default Genders, and she blogs about the music industry in her newsletter The Seat of Loss. “But it still generates engagement, and so a world where everything that’s on the top Spotify chart is something that isn’t built to last, that is just sort of meant to entertain you that day and never be thought of, it would be just fine for all these companies. They don’t need it to be art for it to make them money.”

So much of modern technology exists primarily to simulate or simplify, which can foster amateurization. File-sharing made obsessive record-collecting accessible to anyone with a hard drive and a modem, cell phone cameras allowed everyone in the crowd to document the show, and now streaming audio gives us all dynamic playlists tailored to our mood and advertising cohorts. Generative AI can make music production easier for laypeople, too. This can radically change not just how much music we come across but our relationship to the form as a whole. If creating a popular song requires only as much effort as drafting a viral tweet, much of the creative energy currently contained within social media could divert into prompt-based music generation.

Brooks sees this as a regressive phenomenon, emphasizing to-the-minute topicality over timeless depth, charts topped by audio memes and novelty singles aimed at the lowest-browed listeners, just as shallow songs like “Take Me Out to the Ball Game” — written by two people who had never attended a baseball game — once dominated the airwaves.

“That’s the direction these services are going to push music,” Brooks said. “It’s not going to be about creativity at all. Between the way that these models work and the algorithmic feeds, it’s all just a big repository of the past. It’s not going to push the sound of recordings forward. It is going to accelerate the journey of recordings from the center of American pop culture to the dustbin.”

The current state of generative AI music is more mix-and-match than truly generative. It’s not exactly a tribute band — more like an expansive take on revivalism. It can only produce sounds from what’s in the training data, and while it can combine and blend and refract those elements in new ways, it can’t really experiment outside of that.

Musicians will tell you that there are only so many notes that can be played, or that all sound is just a matter of frequency and wavelength, and so there is only so much that can be done in purely musical terms. But there’s more to music than just arranging some chords or rhythms, just as there’s more to designing recipes than just picking from a finite list of ingredients and techniques.

Ribot is a guitarist renowned for his experimentation and his ability to draw from disparate influences and blend them into something new. On its face, that sounds a lot like the value proposition generative AI advocates put forth, but he says there are fundamental differences between a human and a machine doing the same thing.

“I can’t make it through a 12-bar blues solo without quoting somebody,” Ribot said. “We should privilege the human right to do that. I have a pretty good sense of when I’m crossing the line. I know I can quote this much of a Charlie Parker song without it being a Charlie Parker song, and I know I can mess it up this much and it’ll be cool.”

Ribot’s 1990 album Rootless Cosmopolitans includes a “cover” of Jimi Hendrix’s “The Wind Cries Mary.” Made with reverence for Hendrix, Ribot’s version is abstract: lyrics barked over skronky guitar, bearing little to no resemblance to the original beyond the guitar tone and omitting Hendrix’s melody, chords, and rhythm. Still, Ribot listed it as a cover on the album and pays a mechanical licensing fee on every sale or stream.

“That system should be preserved and is worth fighting for,” Ribot said. “We don’t get paid minimum wage when we sit in on a record. We don’t have anything guaranteed even when we perform. [Copyright] is literally the only economic right we have.”

Ribot’s discursive practice is part of a long tradition: Music as a medium is defined by an awareness of and respect for what came before, which can still grow and change rather than merely recycle. “What drives change in music are changes in people’s mood and needs and possibilities and things they love and things they are enraged at,” Ribot said. “Humans can learn to take the feelings and events and entirety of their life and represent it on their guitar or piano. That widens the field as experience widens, as history lengthens, as groups spring up that need expression and ideas.”

Historically, there’s been a sacred compact between musicians and audiences, one that involves authenticity and humanity. Of the millions of Taylor Swift fans who attended the Eras Tour, many can give you a detailed account of her personal life. The same is true of the audiences for Beyoncé, Harry Styles, Elton John, or any of the biggest touring artists. You need a real person to sell out arenas. Nobody would even watch The Masked Singer if they didn’t think they’d recognize the performers once they’re unmasked.

When we listen to music intentionally, we often listen hermeneutically, as though the song is a corridor to a larger space of understanding other people’s experiences and perspectives. Consider Nirvana. As the aesthetic deviance of grunge coalesced with modern studio techniques at just the right moment, Nevermind found a huge audience not only because of how it sounds but because Kurt Cobain’s personal arc — the meteoric rise and tragic early death of an angsty small-town kid who became a rock superstar in open defiance of (some) pop star conventions — resonated with people.

While the band acknowledged the musicians who inspired them — the Pixies, the Gap Band, et al. — Nirvana’s records are ultimately the unique result of choices Cobain, his bandmates, and their collaborators made, an expression and reflection of their experiences and ideals. Art is definitionally the product of human decision-making.

Some music generated by AI, just like other forms of process music, still retains this human element: Even though artists like Iván Paz and Shelly Knotts rely on largely automated models, they are still the ones creating the system, making countless decisions about how it works, and deciding what to do with whatever sounds it produces.

But the AI music that threatens human musicians, which takes little more than a few words and produces entire songs from them, is inherently limited because it can only look inward at its data and backward in time, not outward and thus never forward. The guitar was invented centuries ago, but an AI model trained on music from before Sister Rosetta Tharpe’s heyday in the 1940s would likely not produce anything resembling an electric one. Hip-hop is a music style based on samples and the repackaging of other artists’ work (sometimes in forms or contexts the original artist is not thrilled about), but a model trained on music from before 1973 would fail to produce anything like it.

There are countless reasons people listen to music, but there are just as many reasons people make it. Humans have been making sounds for one another for thousands of years, and for almost all of that time, it would have been foolish to imagine making a living at it — impossible to even consider amplifying it, let alone recording it. People made music anyway.

There is a tension here that predates AI. On one hand, record labels and digital streaming platforms believe, mostly correctly, that the music market wants familiarity more than anything, which is why most of the money is in the sales of established artists’ back catalogs, with one report indicating those sales accounted for 70 percent of the US music market in 2021. Chart-toppers sound increasingly similar. Streaming platform algorithms often serve up the same songs repeatedly.

On the other hand, there is an intrinsic human need for surprise, innovation, transgression. That’s different for each individual. The goals of a massive corporation — scale and surveillance thereof, mostly — differ from those of its users, both collectively and individually, and the larger its user base becomes, the more inclined that company is to automate. Neither AI music generators nor dynamically generated playlists nor any other algorithmically predictive system is inherently good or bad: The outcomes depend entirely on who controls them and to what end.

No matter what happens, though, no company will ever have a monopoly on music. No species even does. Birds do it. Bees do it. Whales in the sea do it. Some of it is, to a human ear, quite beautiful. But even with all that natural melody, all the music people have already made, and all the music AI will either help with or make itself, the human urge to create and to express persists. Music exists in our world for reasons besides commerce.

More often, the reason is fairly simple: A person or group of people decided it should exist and then made it so. That will continue, no matter how much sonic sludge the machines churn out.


