Setting up the SoundID VoiceAI plugin

Get started with SoundID VoiceAI: from downloading the plugin and setting up a trial to loading it in your DAW and exploring its features, this step-by-step guide covers the entire process.

 


 

SoundID VoiceAI

SoundID VoiceAI is a voice and instrument AI transformation plugin for DAWs. It uses AI to change a recorded singing voice into that of another human being or an instrument:

  • Voice model library: transform your vocal track into a realistic singing voice from a studio-grade AI library of 23 voice models
  • Instrument model library: transform your melodic humming or beatboxing to sound like drums, guitar, violin, or other instruments from a studio-grade AI library of 21 instrument models

 

Transform singing voice tracks, generate backing vocals from a single voice track, or transform speaking voice tracks. Mimic instruments with your voice to quickly transfer melodic ideas into your DAW or generate creative sounds, turn beatboxing into drums, and more.

 

 

Learn more about the use cases and advantages here: What is SoundID VoiceAI?

 

 

Download and install

The SoundID VoiceAI voice and instrument transformation plugin can be used in a DAW (e.g. Cubase, Logic Pro X, Pro Tools), and the AI audio processing is cloud-based. Here are the basic system requirements for using SoundID VoiceAI:

  • macOS 11 Big Sur, 12 Monterey, 13 Ventura, 14 Sonoma
  • Windows 10, 11
  • DAW or other plugin host app that supports AU, AAX, or VST3 plugin formats
  • SoundID VoiceAI processing tokens available in your Sonarworks Account
  • Stable internet connection, as cloud processing is used (offline use not supported)

 

SoundID VoiceAI installer (download here) will install the plugins in the default plugin install directories on macOS and Windows:

 

macOS:
Macintosh HD/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin
Macintosh HD/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3

 

Windows:
C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3
C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin
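
If a DAW does not list the plugin after installation, it can help to confirm that the files exist in the default locations above. The following is a minimal Python sketch, not an official Sonarworks tool. It assumes "Macintosh HD" is the boot volume (so the macOS paths start at /Library) and that the installer used the default directories. The function names are hypothetical.

```python
import sys
from pathlib import Path

# Default install locations as listed in this article.
# Assumption: "Macintosh HD" is the boot volume, i.e. the filesystem root "/".
MAC_PATHS = [
    "/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component",
    "/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin",
    "/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3",
]
WINDOWS_PATHS = [
    r"C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3",
    r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin",
]


def default_plugin_paths(platform=sys.platform):
    """Return the default install paths for a given sys.platform value."""
    if platform == "darwin":
        return MAC_PATHS
    if platform.startswith("win"):
        return WINDOWS_PATHS
    return []  # other platforms are not listed in the system requirements


def find_installed(paths):
    """Map each path to True if it exists on disk."""
    return {p: Path(p).exists() for p in paths}


if __name__ == "__main__":
    for path, present in find_installed(default_plugin_paths()).items():
        print(("found   " if present else "missing ") + path)
```

A "missing" result only means the file is not at the default location; a custom plugin directory chosen in the DAW would not be detected by this check.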

 

 

Load and trial/activate the plugin in your DAW

To start working with SoundID VoiceAI, load the plugin on any voice or instrument track in your DAW project:

  1. Download and install the SoundID VoiceAI plugin
  2. Launch your DAW and load the SoundID VoiceAI plugin on an audio track
  3. Log in to your Sonarworks Account, or Sign up to create a new account
    • Click on Start trial to start a 7-day trial with 9000 free processing tokens
    • Click on Activate on this device if you already have unused tokens available in your account
  4. Return to your DAW - the plugin will be activated

 

VoiceAI - Logic Pro.png

 

 

Capture audio

Before the target voice or instrument model AI processing can be applied, the input audio of the DAW project track must be captured:

  1. Click on Capture to Arm the plugin
  2. Select your DAW playback position and start playback
  3. Click on Stop to complete the capture
  4. Click on Remove to delete the last capture and start over

 

Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.

 

Audio capture - VoiceAI.png

 

Captured audio - SoundID VoiceAI.png

 

 

Important to know when capturing audio

  • The audio capture mechanics depend on smooth continuous playback. Don't change the playback position while an audio capture is in progress.
  • The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
  • The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
    • Use two plugin instances on the same track
    • Capture a single (longer) clip with both fragments
  • If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
  • There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.

 

Learn more about the plugin mechanics here: How to use SoundID VoiceAI in your DAW

 

 

Select your preset and apply AI processing

  1. Click on Voices or Creative to select the target voice or instrument preset
  2. Click on '▶' ("play") to preview how the preset sounds at its best vocal range
  3. If your source pitch is similar to the preset preview, proceed to Start processing
  4. If the results sound too high or low, use Transpose to adjust the output pitch by semitones, and process again
  5. Use the AI voice button to Enable/Disable the transformation on the track

 

Before committing to process the entire track, it's a good idea to highlight and process a smaller section of the track first and ensure the results sound good. Processing takes approximately 2.5x the time of the captured audio duration.
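
To get a feel for the waiting time, the 2.5x figure above can be turned into a quick estimate. This is a rough Python sketch based only on the approximate ratio stated in this article; actual processing time may vary, and the function name is hypothetical.

```python
PROCESSING_TIME_FACTOR = 2.5  # "approximately 2.5x", per this article


def estimated_processing_seconds(capture_seconds):
    """Rough processing-time estimate for a capture of the given length."""
    if capture_seconds < 0:
        raise ValueError("capture duration cannot be negative")
    return capture_seconds * PROCESSING_TIME_FACTOR


if __name__ == "__main__":
    # A 12-second test section versus a 2.5-minute (150-second) full track.
    for seconds in (12, 150):
        print(seconds, "s captured ->", estimated_processing_seconds(seconds), "s processing")
```

By this estimate a 12-second test section takes about 30 seconds to process, which is one reason to audition presets on a short section first.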

 

1 minute of audio processing costs 600 tokens. The token amount needed for processing will always be displayed on the Start processing button. You can check your balance in the plugin, or in your Sonarworks Account. Learn more about tokens below.

 

Note: Learn more about optimal preset selection and Transpose use below.

 

Model selection - SoundID VoiceAI.png

 

Audio processing complete - SoundID Voice.png

 

 

Important to know when processing

  • Processing requires a minimum of 70 tokens (7 seconds), followed by increments of 10 tokens (1 second).
  • Tokens will still be deducted if processing is Canceled while in progress.
  • Repeated AI processing of the same audio source will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
  • The positioning of the AI replacement audio relies on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
  • The plugin supports a single audio capture and processing per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
    • Use two plugin instances on the same track
    • Capture a single (longer) clip with both fragments
  • If loop mode is enabled in your DAW, the capture might become corrupt when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
  • There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the Cache folder.

 

Optimal preset selection and Transpose use for voice transformation

Transpose

The primary use case for SoundID VoiceAI is transforming a singing voice into a realistic singing voice of another human being. Ideally, the original input should match the best input pitch - see the preset descriptions for what recorded audio pitch will generate the best results. If the natural vocal range difference is significant between the input audio and the applied preset, pitch adjustments can be made with the Transpose feature. 

 

Transpose allows pitch adjustments by semitones (half steps) for the generated audio. 12 steps of the Transpose parameter value correspond to an octave. Transpose can be adjusted by up to +/- 4 octaves (48 steps up or down). If the Transpose value is unaltered, the pitch will remain the same.
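
For readers who think in frequencies rather than steps, the relationship between semitones and pitch can be sketched as follows. This assumes standard equal temperament, where each semitone multiplies the frequency by the twelfth root of 2. It illustrates the arithmetic only and is not how the plugin itself computes pitch; the function names are hypothetical.

```python
SEMITONES_PER_OCTAVE = 12
MAX_TRANSPOSE_STEPS = 48  # +/- 4 octaves, per this article


def transpose_ratio(steps):
    """Frequency ratio for a transpose of `steps` semitones (equal temperament assumed)."""
    if not -MAX_TRANSPOSE_STEPS <= steps <= MAX_TRANSPOSE_STEPS:
        raise ValueError("Transpose is limited to +/- 48 steps")
    return 2.0 ** (steps / SEMITONES_PER_OCTAVE)


def transposed_frequency(freq_hz, steps):
    """Frequency of a note after transposing it by `steps` semitones."""
    return freq_hz * transpose_ratio(steps)


if __name__ == "__main__":
    # A4 (440 Hz) moved up one octave, left unchanged, and moved down one octave.
    for steps in (12, 0, -12):
        print(steps, "steps ->", transposed_frequency(440.0, steps), "Hz")
```

A value of +12 doubles the frequency (one octave up), 0 leaves it unchanged, and -12 halves it (one octave down).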

 

Transpose.png

 

Good results are easier to achieve when a few things are checked up front, particularly when a project is fixed in a specific key. Before processing a vocal track, we recommend taking the following steps:

  • Preview the preset by clicking on "▶" (play button).
  • Evaluate the best input pitch to find a suitable preset without Transposing the output pitch.
  • Use Transpose according to the preset model's vocal range:
    • If the target preset sings in a higher pitch than your input voice track, increase the value of the Transpose parameter.
    • If the target preset sings in a lower pitch than your input voice track, decrease the value of the Transpose parameter.
  • Process a small section and evaluate the results before committing to process the entire track.
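
The direction rule in the steps above can be sketched in a few lines of Python. This is only an illustration: the plugin does not report pitches as numbers, so the MIDI note numbers used here (60 = middle C, one unit per semitone) are an assumed way of describing where the input and the preset sit. The function names are hypothetical.

```python
MAX_TRANSPOSE_STEPS = 48  # hard limit of the Transpose control, per this article
RECOMMENDED_STEPS = 12    # larger shifts may produce unexpected results


def suggest_transpose(input_midi_note, preset_midi_note):
    """Semitone shift that moves the input pitch toward the preset's best pitch.

    Positive when the preset sings higher than the input, negative when lower.
    """
    steps = preset_midi_note - input_midi_note
    return max(-MAX_TRANSPOSE_STEPS, min(MAX_TRANSPOSE_STEPS, steps))


def within_recommended_range(steps):
    """True if the shift stays within one octave in either direction."""
    return abs(steps) <= RECOMMENDED_STEPS


if __name__ == "__main__":
    # Assumed example: input centered around A3 (57), preset sounds best around E4 (64).
    steps = suggest_transpose(57, 64)
    print("suggested Transpose:", steps, "| within one octave:", within_recommended_range(steps))
```

Treat the result as a starting point and confirm it by ear on a short section, as recommended above.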

 

Note: Transpose must be adjusted before processing - further Transpose adjustments will require re-processing.

Note: Transpose values below -12 or above +12 might produce unexpected results. Using Transpose with Drums has only a small impact on the overall sound and is not advised.

 

Creative (instrument) transformation

With the Creative presets you can transform humming and beatboxing into tracks that sound like instruments, discover new ways of generating sounds and melodies, and create demo songs quickly. Here are some ideas to consider:

  • Mimic instruments with your voice and transform vocal inputs into realistic instruments for quick transfer of melodic ideas into your DAW or for creative sound generation.
  • Turn beatboxing into drums. Record a few bars of beatboxing to create a drum track.
  • Transform existing instrument tracks. Convert your guitar solo into a saxophone solo, use your guitar to create a realistic bass guitar track, use a trumpet track to harmonize and create an entire brass section of various instruments, and much more.
  • Use virtual instruments for creative AI processing.

 

Input/output audio quality and properties

The SoundID VoiceAI plugin tolerates a relatively wide range of recording quality for the input track. Regular phone microphone recordings made in an ordinary room with reverb are perfectly okay to use - after processing, the output will have the properties of studio-quality audio captured with a great microphone.

 

This applies only to a certain degree. There are some limits to take into consideration:

  • Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
  • Excessive reverb on the input audio can lead to melodic artifacts in the output.
  • When applied to non-English singing, some amount of English accent might bleed over into the processed voice depending on the preset applied.
  • The AI models can sometimes introduce artifacts, such as clipped "s" sounds, into the processing results. This is typically resolved by re-processing or adjusting the Transpose setting to a value closer to the input track pitch.
  • The AI models work well for normal spoken voice tracks too. However, when applied to extreme emotional states of speech, such as whispering or shouting, artifacts are possible.
  • The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice (rough, raspy, strained, or breathy properties) can lead to artifacts in the processing results.

 

There is additional documentation available on the audio properties and the AI models used in SoundID VoiceAI (AI model training data, data protection of the processed audio, etc.). Learn more here: Input/output audio quality and properties

 

 

Tokens and minutes

SoundID VoiceAI uses a pay-as-you-go model, so you pay only for the token packs you need for audio processing. There are no subscription fees or other hidden charges involved, and the SoundID VoiceAI plugin itself is free to download and install. Here is what you need to know:

  • Processing cost: 600 tokens per 1 minute of audio processing.
  • A minimum charge of 70 tokens (7 seconds) applies for each processing instance, followed by increments of 10 tokens (1 second).
  • Transpose adjustments to an already processed audio capture will require re-processing.
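
The pricing rules above can be expressed as a small Python helper. This is a sketch of the stated rules (10 tokens per second, 70-token minimum), not the plugin's actual billing code. How partial seconds are rounded is not stated in this article, so rounding up is an assumption. The function name is hypothetical.

```python
import math

TOKENS_PER_SECOND = 10  # 600 tokens per minute, per this article
MIN_TOKENS = 70         # minimum charge, equal to 7 seconds


def token_cost(capture_seconds):
    """Estimated token cost for processing one capture of the given length."""
    if capture_seconds <= 0:
        raise ValueError("capture duration must be positive")
    # Assumption: a partial second is billed as a full 10-token increment.
    billed_seconds = math.ceil(capture_seconds)
    return max(MIN_TOKENS, billed_seconds * TOKENS_PER_SECOND)


if __name__ == "__main__":
    for seconds in (3, 7.2, 60, 150):
        print(seconds, "s ->", token_cost(seconds), "tokens")
```

The exact amount for any capture is always shown on the Start processing button, so use an estimate like this only for planning.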

 

 

Here's a realistic example of tokens spent in a specific scenario - vocal replacement for a full song:

  1. Capturing a 12-second audio sample of a voice track
  2. Processing the sample with 5 voice presets and trying 3 different Transpose settings on each preset to find the best fit: 12 x 5 x 3 = 180 seconds (3 minutes) = 1800 tokens
  3. Processing the entire vocal track of 2.5 minutes = 1500 tokens
  4. Total processing time and token cost: 5.5 minutes = 3300 tokens
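
The same scenario can be reproduced with a short Python sketch, which makes it easy to plug in your own numbers. It assumes every run is longer than the 7-second minimum, so the flat rate of 10 tokens per second applies. The function name is hypothetical.

```python
TOKENS_PER_SECOND = 10  # 600 tokens per minute, per this article


def scenario_cost(sample_seconds, presets, transpose_settings, full_track_seconds):
    """Return (total_seconds, total_tokens) for audition runs plus one full-track pass.

    Assumes each run exceeds the 7-second minimum charge.
    """
    audition_seconds = sample_seconds * presets * transpose_settings
    total_seconds = audition_seconds + full_track_seconds
    return total_seconds, total_seconds * TOKENS_PER_SECOND


if __name__ == "__main__":
    # 12-second sample, 5 presets, 3 Transpose settings each, then a 150-second track.
    total_seconds, total_tokens = scenario_cost(12, 5, 3, 150)
    print(total_seconds / 60, "minutes,", total_tokens, "tokens")
```

With the numbers from the example, this gives 5.5 minutes and 3300 tokens, which fits within the 9000 free trial tokens.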

 

 

SoundID VoiceAI token packs can be purchased from your Sonarworks Account:

  • Small token pack: 72 000 tokens (120 minutes of audio processing) - 19.99 EUR/USD
  • Medium token pack: 180 000 tokens (300 minutes of audio processing) - 39.99 EUR/USD
  • Large token pack: 360 000 tokens (600 minutes of audio processing) - 69.99 EUR/USD
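
To compare the packs, the listed token amounts and prices can be converted into minutes and a price per minute. This Python sketch uses only the figures listed above. Prices may change, and the helper names are hypothetical.

```python
TOKENS_PER_MINUTE = 600  # processing cost, per this article

# (name, tokens, price in EUR/USD), as listed in this article
TOKEN_PACKS = [
    ("Small", 72_000, 19.99),
    ("Medium", 180_000, 39.99),
    ("Large", 360_000, 69.99),
]


def pack_minutes(tokens):
    """Minutes of audio processing covered by a number of tokens."""
    return tokens / TOKENS_PER_MINUTE


def smallest_pack_for(tokens_needed):
    """Name of the smallest single pack covering the need, or None if no pack does."""
    for name, tokens, _price in TOKEN_PACKS:
        if tokens >= tokens_needed:
            return name
    return None


if __name__ == "__main__":
    for name, tokens, price in TOKEN_PACKS:
        minutes = pack_minutes(tokens)
        print(f"{name}: {minutes:.0f} min, {price / minutes:.3f} per minute")
    print("Pack for 3300 tokens:", smallest_pack_for(3300))
```

Larger packs cost less per minute, so the best choice depends on how much audio you expect to process.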

 

A 7-day trial with 9000 free tokens is available in your Sonarworks Account. If you haven't created a Sonarworks Account in the past, sign up here.

 

Note: The trial tokens will expire once the 7-day trial runs out, or once a token purchase is made. 

 

Buy tokens in Sonarworks Account.png

 

Learn more about the token system here: Tokens and minutes
