How to Make Music with AI: A Comprehensive Guide

AI Music Generator FAQ

Song Structure Tags

There are other parts to a song than just the verse/chorus pattern. We can influence the song structure with metatags, although the AI tends to have a mind of its own and follow its own pattern.


The intro tag is notoriously unreliable. It's probably better to describe it like an instrumental break.

  • [Short Instrumental Intro]


A hook is a repetitive phrase or instrumental. Try repeating a short line 2 to 4 times, with or without the label.

  • [Catchy Hook]


A break is a few bars of the song where the lead instruments or singer go silent, and the accompanying instruments play. A [Break] can sometimes be used strategically to interrupt the current pattern.

  • [Break]
  • [Percussion Break]


Interlude is a useful tag to create an instrumental section within the lyrics.

  • [melodic interlude]


An Outro can help to prime the song to end, and may create a loop that can be faded out in post-editing. Refrain seems to get more 'creative' when wrapping up the end of the song, while Big Finish may change the melody or tempo to create a climax.

  • [Outro]
  • [Refrain]
  • [Big Finish]


An end tag in the lyrics may work best alone as its own clip. Clear the Style Prompt, or add 'End' to the style description.

  • [End]
  • [Fade Out]
  • [Fade to End]

As always, prompting an AI is not like paying someone to edit your music on Fiverr. The reliability of these tags can be influenced by the lyrics, the song cycle, and the AI just being random.
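The section tags above can be assembled into a lyrics prompt programmatically. The following is a minimal sketch, assuming each tag sits on its own line above its lyrics and sections are separated by a blank line; `tag` and `build_lyrics` are hypothetical helper names, not part of any tool's API, and nothing here makes the tags more reliable.

```python
def tag(label: str) -> str:
    """Wrap a section label in square brackets, e.g. 'Outro' -> '[Outro]'."""
    return f"[{label.strip()}]"


def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Join (label, lyrics) pairs into one lyrics prompt.

    A section with empty lyrics (a break, an outro) becomes a bare tag.
    Sections are separated by a blank line (an assumed layout).
    """
    blocks = []
    for label, lyrics in sections:
        text = lyrics.strip()
        blocks.append(tag(label) + ("\n" + text if text else ""))
    return "\n\n".join(blocks)


song = build_lyrics([
    ("Short Instrumental Intro", ""),
    ("Verse", "Cruisin' down the streets with nowhere to go"),
    ("Break", ""),
    ("Outro", ""),
])
print(song)
```

Keeping the sections as data makes it easy to swap one tag (for example [Outro] for [Big Finish]) and regenerate the prompt when a clip ignores it.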

Are duets possible?

Yes and no. Duets sometimes happen spontaneously, and might continue as long as both voices are heard near the end of the previous clip. However, the voices will be random, and often the wrong gender. Duets are more of an exploit, definitely not a feature.

Add Duet to the Style Prompt

We can try for a duet in the style prompt:

  • duet
  • male and female duet
  • romantic duet

Some genres may be easier than others. Try 'Broadway' or 'Musical Theater' – genres where duets are common.

Tagging each section

Metatags within the lyrics might assign multiple voices, but the gender and consistency are not reliable:

[Mary] This is my line…

[John] But now I am singing…

[Mary and John] There is a chance both of us are singing together…

Similarly, rap battles and spoken announcers can be formatted like scripted dialog.
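As an illustration of the scripted-dialog layout, here is a small sketch that turns (singer, line) pairs into tagged lyrics. The `format_duet` helper is hypothetical, and the output only mirrors the formatting shown above; it does not guarantee that the voices or genders will be respected.

```python
def format_duet(lines: list[tuple[str, str]]) -> str:
    """Prefix each lyric line with its singer tag, one blank line between parts."""
    return "\n\n".join(f"[{singer}] {text}" for singer, text in lines)


script = format_duet([
    ("Mary", "This is my line…"),
    ("John", "But now I am singing…"),
    ("Mary and John", "There is a chance both of us are singing together…"),
])
print(script)
```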

Rapped Verse

Genres can be used as style words, and the AI might create a new voice just to sing that part, especially if the genre has a favorite gender:

  • [rapped verse] – typically male
  • [powerpop chorus] – typically female

Not very reliable

Duets are not reliable. The voices will swap roles, you might get two voices that are too similar, or the song with the 'good' voices is not the best version. Creating a duet is fun, but it can also soak up your credits.

Pre-chorus and Bridge

Pre-chorus and bridge are for stray lyrics outside the main pattern. They build anticipation as the song transitions, and often don't rhyme or appear to fit the meter.

  • [Pre-Chorus]
  • [Bridge]

AI-generated lyrics sometimes include a pre-chorus without labeling it, causing Chirp to sing it as an awkward extra line that doesn't fit the meter.

Adding a metatag label tells Chirp this break in the song pattern is intentional, and it should be sung as its own pattern.

Example: Lyrics generated by AI

[Verse]
Cruisin' down the streets with nowhere to go
Miles of cars, it's a never-ending show
Round and round, it's like a crazy maze
Every parking space, a mirage, a haze
Anxiety's building, it's drivin' me insane

[Chorus]
Driving in circles, looking for a spot
I'm runnin' out of gas, I'm losing all my shots

Example: Edited to add a [Pre-chorus]

[Verse]
Cruisin' down the streets with nowhere to go
Miles of cars, it's a never-ending show
Round and round, it's like a crazy maze
Every parking space, a mirage, a haze

[Pre-chorus]
Anxiety's building, it's drivin' me insane

[Chorus]
Driving in circles, looking for a spot
I'm runnin' out of gas, I'm losing all my shots
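The same edit can be sketched in code: take the stray line(s) off the end of the verse and give them their own label. This is only an illustration under the assumption that the stray lines are always the last ones in the verse; `add_pre_chorus` and its `stray` parameter are hypothetical names.

```python
def add_pre_chorus(verse: list[str], chorus: list[str], stray: int = 1) -> str:
    """Move the last `stray` verse lines into their own [Pre-chorus] section."""
    if not 0 < stray < len(verse):
        raise ValueError("stray must leave at least one line in the verse")
    cut = len(verse) - stray
    sections = [
        ("Verse", verse[:cut]),
        ("Pre-chorus", verse[cut:]),
        ("Chorus", chorus),
    ]
    # Tag on its own line, lyrics below it, blank line between sections.
    return "\n\n".join(
        f"[{label}]\n" + "\n".join(lines) for label, lines in sections
    )


verse = [
    "Cruisin' down the streets with nowhere to go",
    "Miles of cars, it's a never-ending show",
    "Round and round, it's like a crazy maze",
    "Every parking space, a mirage, a haze",
    "Anxiety's building, it's drivin' me insane",
]
chorus = [
    "Driving in circles, looking for a spot",
    "I'm runnin' out of gas, I'm losing all my shots",
]
edited = add_pre_chorus(verse, chorus)
print(edited)
```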

Pre-chorus is a lead-in to a chorus. A bridge can go anywhere. It may be enough to simply set apart the lyrics with any descriptive tag:

  • [Shout]
  • [Whimsical]
  • [Melancholy]


  • [Syncopated Bass]

Overloading the metatag can cause the instruction to be ignored, or it might be sung as part of the lyrics.

  • [call and response between percussion and bass]
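A quick way to spot overloaded metatags before spending credits is to count the words inside each pair of brackets. The sketch below assumes a limit of three words per tag; that threshold is a guess for illustration, not a documented rule, and `overloaded_tags` is a hypothetical helper.

```python
import re

# Matches the text inside one pair of square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

MAX_WORDS = 3  # assumed threshold, not a documented limit


def overloaded_tags(lyrics: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the tag texts that contain more than `max_words` words."""
    return [
        text for text in TAG_PATTERN.findall(lyrics)
        if len(text.split()) > max_words
    ]


draft = "[Break]\n[Syncopated Bass]\n[call and response between percussion and bass]"
print(overloaded_tags(draft))
```

Flagged tags are candidates for shortening, or for moving the description into the Style Prompt instead.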

Verse and Chorus

A pattern that works well for songs is to have a verse and chorus in each clip. The tags are not necessary; the AI will build a song pattern from the lyrics with or without them.

Verses are usually rhythmic and restrained, while the chorus has more melody and energy. A chorus is usually the 'hook' of the song. When it repeats, it makes the song feel intentional and emotional.

AI-generated lyrics often use 1 verse and 1 chorus per clip.

  • [Verse]
  • [Chorus]

Add descriptive Style Words to metatags to guide how the lyrics should be sung.

  • [Sad Verse]
  • [Happy Chorus]

Use musical terms to influence the genre.

  • [Rapped Verse]
  • [Powerpop Chorus]

Lyrics are Stronger than Metatags

Metatags can 'nudge' the AI within the lyrics, but the lyric structure, the current song pattern, and the Style Prompt are stronger influences than the tags.

Even when they work, they don't always work. When they do work, it can still feel like a casino.

  • A fast rap song needs more words per line than a slow ballad.
  • Verse and Chorus need different syllable-per-line counts and phrasing, or they will sound the same and blur together.
  • It's possible to have a song that's just verses, when all the lyrics follow the same pattern and rhyme scheme.
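To check whether a verse and chorus actually differ in syllables per line, a rough estimate is enough. The sketch below counts runs of vowels in each word, which is a crude heuristic (it overcounts silent 'e', for example) and an assumption of this example, not how the AI measures meter; `estimate_syllables` and `average_syllables` are hypothetical helpers.

```python
import re

VOWEL_RUN = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def estimate_syllables(line: str) -> int:
    """Rough syllable estimate: one per vowel run, at least one per word."""
    words = WORD.findall(line.lower())
    return sum(max(1, len(VOWEL_RUN.findall(word))) for word in words)


def average_syllables(section: str) -> float:
    """Average estimated syllables per non-empty line of a section."""
    lines = [ln for ln in section.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(estimate_syllables(ln) for ln in lines) / len(lines)


verse_text = "Round and round, it's like a crazy maze\nEvery parking space, a mirage, a haze"
chorus_text = "Driving in circles, looking for a spot"
print(average_syllables(verse_text), average_syllables(chorus_text))
```

If the two averages come out nearly equal, the sections may blur together; rewriting one of them with shorter or longer lines is worth a try.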

Instrumental Tags

Songs can have instrumental sections which can be prompted the same as [Verse] and [Chorus], but without lyrics the landmarks aren't as clear.

An instrumental 'break' can replace a verse, standing as its own section, or might be a short bridge in the music. These tags seem to work best when only one is used at a time, but adding commas inside the prompt may work. Experiment!

Prompt examples include:

  • [Break]
  • [Instrumental Interlude]
  • [Melodic Bass]
  • [Percussion Break]
  • [Syncopated Bass]
  • [Fingerstyle Guitar Solo]
  • [Build]
  • [Bass Drop]

Stay in the genre

The genre is important! You may need to describe the instrument within the Style Prompt if you want to manipulate it with metatags.

A [Bass Drop] is a common feature of EDM-genre, but it makes no sense in an acoustic guitar solo.

A [Bluegrass Banjo Interlude] will be easier to conjure within a Country-genre song, but might not work at all within an Orchestral Symphony...

Then again, it might work if a 'banjo' is added to the style prompt.

Experiment with 'instrumental lyrics'

The AI will sometimes respond to un-singable text as a musical instrument. A few lines of punctuation-only text might help to force a short instrumental solo.

A less reliable, and sometimes hilarious, option is to try onomatopoeic words that mimic the sounds of the musical instruments. Often they are sung as lyrics, but sometimes they trigger the intended instrument.

[Percussion Break] . .! .. .! !! ... ! ! !

[sad trombone] waah-Waaah-WAAH

[chugging guitar] chuka-chuka-chuka-chuka

Voice Tags

A singing voice is generated randomly for each song, but we can influence the voice in both the style prompt and in lyrics prompts.


The Style and Lyrics prompts will influence the type of voice chosen for your song:

  • HipHop may default to an urban male
  • Country will often sing with a western accent
  • Jazz may feature a soulful female vocal
  • Pop vocals are often female


Chirp can sing natively in many languages, even switching languages inside the lyrics. No special prompt is needed; the language is auto-detected.


Voice and gender can be described within the Style/Genre Prompt, although it is not always reliable:

  • sultry male singer
  • female narrator
  • bass male vocal

Singing Style

Some known style-words to try, both in the style description and as lyric prompts:

  • Gregorian chant
  • Melismatic
  • Narration
  • Spoken Word
  • Sprechgesang
  • Emotional
  • Sultry
  • Resonant
  • Ethereal
  • Lounge Singer
  • Vocaloid


Vocal styles can also be used as song-section metatags:

  • [Female Narrator]
  • [Diva Solo]
  • [Gospel Choir]
  • [Primal Scream]
  • [Rap Verse]