What are the differences in the models?

Suno currently offers 3 models to choose from, each with different characteristics:

v3.5 is an ‘aesthetic’ model that is good at composition and singing. It is responsive to metatags, and favors mainstream genres.

v3 is a ‘versatile’ model that is better at sub-genres and genre-mashups. It accepts longer detailed Style Prompts, but is less responsive to metatags.

v2 is an older model that still has value in sound experiments, DJ samples, and ironic styles. Compositions are musically interesting but can be repetitive, and the sound quality is comparatively harsh.

v1/Chirp was a Discord bot, now discontinued.

All models are capable of making complete songs with vocals.

Time

The main difference between the models is how long a clip each can generate in one go:

v3.5 – up to 4:00
v3 – up to 2:00
v2 – up to 1:20

Each model can also extend an existing clip:

v3.5 – up to 2:00
v3 – up to 1:00
v2 – up to 0:40

Extensions are half the length of a first generation because the model uses part of its memory to listen to the earlier clip. Newer models remember more of the song, making them better at repeating the chorus and continuing the same composition.
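As a rough illustration of the arithmetic, the sketch below estimates how many extensions a song of a given length would need on each model. The limits are copied from the lists above; the function name and the assumption that every clip runs to its maximum length are ours, and real generations often stop earlier.

```python
import math

# Maximum lengths in seconds, taken from the lists above.
MAX_INITIAL = {"v3.5": 240, "v3": 120, "v2": 80}
MAX_EXTENSION = {"v3.5": 120, "v3": 60, "v2": 40}


def extensions_needed(model: str, target_seconds: int) -> int:
    """Estimate the fewest extensions needed to reach target_seconds.

    Assumes (hypothetically) that every clip reaches its maximum length.
    """
    remaining = target_seconds - MAX_INITIAL[model]
    if remaining <= 0:
        # The first generation alone can cover the target.
        return 0
    return math.ceil(remaining / MAX_EXTENSION[model])


if __name__ == "__main__":
    for model in MAX_INITIAL:
        print(model, extensions_needed(model, 240))
```

Under these assumptions, a 4:00 song needs no extensions in v3.5, two in v3, and four in v2.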

Audio Quality

Newer models don’t necessarily replace the older ones. However, each is limited by the state of the technology at the time it was created.

In just a few months, Suno’s models have become capable of generating longer audio, with more accurate and subtle voices, at better bitrates.

We can switch between all 3 models: each can extend clips made by the others, and the results can be stitched together into full songs. However, newer models are better at sounding like the older models than the reverse.

Style

Style words are broadly the same across models, but a favorite style word for v3 might not work quite the same in v3.5.

v3.5 seems to need more emphasis in the Style Prompt to override its aesthetic training. Try chaining the words in the Style Prompt with hyphens, or spamming the same words a few times.
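The two emphasis tricks can be sketched as small text helpers. This is only an illustration of how such a Style Prompt string might be built: the function names, the example style words, and the comma separator are our assumptions, and the text is still pasted into the Style Prompt field by hand.

```python
def chain_with_hyphens(words: list[str]) -> str:
    """Join style words into one hyphenated chain."""
    return "-".join(words)


def repeat_words(words: list[str], times: int = 3) -> str:
    """Repeat the same style words several times, comma-separated.

    The comma separator is an assumption, not a documented requirement.
    """
    return ", ".join(words * times)


if __name__ == "__main__":
    # Hypothetical style words, chosen only for illustration.
    print(chain_with_hyphens(["dark", "ambient", "techno"]))
    print(repeat_words(["shoegaze", "dream pop"], times=2))
```

Whether either form actually shifts the result is something to test by ear, since the same words may behave differently in each model.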

As we explore more of the latest model, we’ll be updating the information here.

Problems

Each model has weaknesses.

With v3.5, songs are difficult to end. Some users find it less creative, and harder to steer with Style Prompts.

v3 tends to layer voices, double the lead singer, and sing entire genres with a choir. Metatags aren’t as useful as in other models.

v2 voices are relatively flat and unsubtle. The dynamic range feels crushed.