Stability.AI

Stability AI Launches Stable Audio Open – A Sound Design & Audio Generation Model

Stability AI, the company behind the Stable Diffusion text-to-image model, has launched Stable Audio Open, an open-source text-to-audio model for sound design.

Stable Audio Open has just rolled out, and it’s designed to generate short audio samples, sound effects, and production elements from text prompts alone. This is a big step toward making generative audio available to sound designers, musicians, and creative communities. While text-to-image models have made remarkable progress, the open-source AI community has lagged behind in audio generation: some audio models exist, but their output quality and documentation have been lacking. With the release of Stable Audio Open, the open-source community is finally catching up.

What Can It Do?

Stable Audio Open lets anyone generate up to 47 seconds of high-quality audio from a simple text prompt. It’s particularly good at creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples that can be used for music production and sound design.

One of the coolest things about this open-source release is that users can fine-tune the model on their own custom audio data. For example, a drummer could fine-tune the model on their own drum recordings to generate new beats. The output quality of Stable Audio Open is commendable, especially compared with other models on the market, which often struggle to produce usable results.

The commercial product, Stable Audio, on the other hand, can produce high-quality full tracks of up to three minutes with coherent musical structure, and it offers advanced capabilities such as audio-to-audio generation and coherent multi-part musical compositions. Stable Audio 2.0 came out in April 2024.

Stable Audio Open is specialized for audio samples, sound effects, and production elements. While it can generate short musical clips, it’s not optimized for full songs, melodies, or vocals. This open model provides a glimpse into generative AI for sound design while prioritizing responsible development alongside creative communities.

How to Access It?

The Stable Audio Open model weights are now available on Hugging Face.
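For readers who want to try the weights from Python, here is a minimal sketch rather than an official recipe. It assumes the `stable-audio-tools` package (plus `torch`, `torchaudio`, and `einops`) is installed, that the weights live under the Hugging Face model ID `stabilityai/stable-audio-open-1.0`, and that you are logged in to Hugging Face in case the download requires accepting the model license. None of those details come from this article, and the function names `build_conditioning` and `generate_clip`, along with the sampler settings, are illustrative choices. The 47-second cap is the limit mentioned above.

```python
MAX_SECONDS = 47.0  # upper bound on clip length stated in the article


def build_conditioning(prompt, seconds=10.0):
    """Build the text-and-timing conditioning for one clip (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Clamp the requested duration to the range the model is described as supporting.
    seconds = max(1.0, min(float(seconds), MAX_SECONDS))
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds}]


def generate_clip(prompt, seconds=10.0, out_path="output.wav",
                  model_id="stabilityai/stable-audio-open-1.0"):  # assumed model ID
    """Generate one clip from a text prompt and save it as a WAV file."""
    # Heavy imports stay local so build_conditioning works without them installed.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model(model_id)  # downloads the weights
    model = model.to(device)

    audio = generate_diffusion_cond(
        model,
        steps=100,        # assumed settings; tune for speed vs. quality
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Merge the batch into one stereo track, peak-normalise, convert to 16-bit PCM.
    audio = rearrange(audio, "b d n -> d (b n)").to(torch.float32)
    audio = audio.div(audio.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
    torchaudio.save(out_path, audio, model_config["sample_rate"])
    return out_path
```

Calling `generate_clip("128 BPM tech house drum loop", seconds=30)` should write `output.wav` to the current directory. The first call downloads the weights, and a GPU is strongly recommended for reasonable generation times.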

So, if you’re a sound designer, musician, or just someone who loves playing with audio, give Stable Audio Open a try and see what kind of cool sounds you can create!