While watching Wall-E, I kept wondering: “How did Ben Burtt make all these sounds?” They were robotic but at the same time very lively and organic. So I did a little research and discovered the world of the Kyma sound design language. Since Wall-E brought me to Kyma, I thought I would dedicate my first tutorial to him. Anyway, let’s get started!

First we have to record someone saying “Wall-E”. Open the Tape Recorder Tool and give it a try. Don’t try to mimic how Wall-E says his name in the movie; we will deal with that later. Choose your favorite take and create a spectrum file from it. If you don’t know how to do this, there is a step-by-step guide in Kyma X Revealed! (pages 169-175).

Here’s my version:

Let’s start with the “SpectruminRAM” prototype. Search for it and double-click it to create a new Sound.

[Image: walle 1]

Click the disk icon next to the “Analysis” parameter field and load your spectrum file. Don’t play the Sound yet: the “SpectruminRAM” prototype doesn’t output usable audio (you can play it, but turn your speakers down first). Its output is a series of spectra indexed by TimeIndex. The left channel contains the amplitudes and the right channel contains the frequencies.
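To make the “series of spectra” idea concrete, here is a minimal Python sketch that models a spectrum as a table of frames × partials, with one amplitude table (the “left” output) and one frequency table (the “right” output). The `Spectrum` class and its methods are hypothetical illustrations, not Kyma’s file format or API, and the frequencies are assumed to be in Hz.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Spectrum:
    """Hypothetical in-memory model of an analysed spectrum (not Kyma's format).

    amps[f, p]  -> amplitude of partial p in frame f   (the "left" channel)
    freqs[f, p] -> frequency of partial p in frame f   (the "right" channel)
    """

    amps: np.ndarray
    freqs: np.ndarray

    def __post_init__(self) -> None:
        if self.amps.shape != self.freqs.shape:
            raise ValueError("amps and freqs must have the same (frames, partials) shape")

    @property
    def num_frames(self) -> int:
        return self.amps.shape[0]

    @property
    def num_partials(self) -> int:
        return self.amps.shape[1]

    def frame(self, index: int) -> tuple:
        """Return (left, right) = (amplitudes, frequencies) for one frame."""
        return self.amps[index], self.freqs[index]


# Toy example: 3 frames of a 4-partial harmonic tone at 220 Hz that fades out.
partials = np.arange(1, 5)
freqs = np.tile(220.0 * partials, (3, 1))
amps = np.outer([1.0, 0.5, 0.0], 1.0 / partials)
spec = Spectrum(amps, freqs)
left, right = spec.frame(1)
```

A real analysis file would have many more frames and partials (256 in my case), but the shape of the data is the same idea.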

What we need is an “OscillatorBank” to resynthesize the spectra. Drag it into the signal flow after the “SpectruminRAM” and play the Sound. You should hear a stretched version of your recording, and the VCS shows a single fader that controls the duration. Strictly speaking, you aren’t hearing your recording but a bank of oscillators (256 in my case; the number depends on the spectrum file) whose amplitude and frequency envelopes are controlled by the spectrum of your recording.
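The oscillator-bank idea is plain additive synthesis: one sine oscillator per partial, each driven by its own amplitude and frequency envelope, all summed together. The sketch below shows that general technique in Python, not Kyma’s actual OscillatorBank implementation. The function name `oscillator_bank`, the linear interpolation between frames, and frequencies in Hz are all assumptions for illustration. Spreading the same frames over a longer `duration` slows the sound down without changing its pitch, which is why you hear a stretched version of the recording.

```python
import numpy as np


def oscillator_bank(amps: np.ndarray, freqs: np.ndarray, duration: float,
                    sample_rate: int = 44100) -> np.ndarray:
    """Additive resynthesis sketch: one sine oscillator per partial.

    amps, freqs: arrays of shape (frames, partials); freqs assumed in Hz.
    duration: output length in seconds (longer = slower, same pitch).
    """
    num_frames, num_partials = amps.shape
    num_samples = int(duration * sample_rate)
    # Fractional frame position for every output sample (0 .. num_frames - 1).
    pos = np.linspace(0.0, num_frames - 1, num_samples)
    frame_axis = np.arange(num_frames)
    out = np.zeros(num_samples)
    for p in range(num_partials):
        # Linearly interpolate this partial's envelopes between analysis frames.
        a = np.interp(pos, frame_axis, amps[:, p])
        f = np.interp(pos, frame_axis, freqs[:, p])
        # Integrate frequency into phase so frequency changes stay click-free.
        phase = 2.0 * np.pi * np.cumsum(f) / sample_rate
        out += a * np.sin(phase)
    return out


# Two frames, two partials: a 220 Hz tone that fades to silence.
amps = np.array([[1.0, 0.5], [0.0, 0.0]])
freqs = np.array([[220.0, 440.0], [220.0, 440.0]])
short = oscillator_bank(amps, freqs, duration=0.5)
long = oscillator_bank(amps, freqs, duration=2.0)
```

Integrating the frequency envelope into a running phase (rather than computing `sin(2πft)` with a changing `f`) is what keeps glides smooth when the frequencies move between frames.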

What about the module called “linear time”? We don’t need it anymore. Search for the “constant” prototype and replace the “linear time” module with it. Now your Sound should look like this:

[Image: walle 2]

The “constant” prototype now feeds the TimeIndex input of the “SpectruminRAM”. The TimeIndex specifies the current position in the series of spectra, from -1 (the start) to 1 (the end). Let’s change the value of “constant” to an EventValue by typing “!TimeIndex” into the value field. Play the Sound and have some fun with the fader.
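Here is a small Python sketch of what a TimeIndex in the range -1 to 1 means: -1 selects the first analysis frame, 1 the last, and anything in between lands proportionally in the series. The function `frame_at_time_index` is hypothetical, and the linear interpolation between neighbouring frames is an assumption of this sketch; Kyma’s internal behaviour may differ. Holding the index constant “freezes” the sound on one spectrum, which is exactly what the fader lets you play with.

```python
import numpy as np


def frame_at_time_index(amps: np.ndarray, freqs: np.ndarray, time_index: float):
    """Return interpolated (amps, freqs) for a TimeIndex in [-1, 1].

    amps, freqs: arrays of shape (frames, partials).
    -1 -> first frame, 1 -> last frame, linear interpolation in between.
    """
    t = float(np.clip(time_index, -1.0, 1.0))
    num_frames = amps.shape[0]
    pos = (t + 1.0) / 2.0 * (num_frames - 1)  # map [-1, 1] -> [0, num_frames - 1]
    lo = int(np.floor(pos))
    hi = min(lo + 1, num_frames - 1)
    w = pos - lo  # weight of the upper neighbouring frame
    return ((1 - w) * amps[lo] + w * amps[hi],
            (1 - w) * freqs[lo] + w * freqs[hi])


# Three single-partial frames; TimeIndex 0 lands exactly on the middle frame.
amps = np.array([[0.0], [1.0], [2.0]])
freqs = np.array([[100.0], [200.0], [300.0]])
a_mid, f_mid = frame_at_time_index(amps, freqs, 0.0)
```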

So now we have control over the timing. Nice! Next we want control over the frequency as well. To achieve this, we have to split the signal after “SpectruminRAM” into its left and right channels (remember: left = amplitudes, right = frequencies). Insert a “ChannelJoin” prototype after “SpectruminRAM”, then delete the “delay left chan only” module. It should now look like this:

[Image: walle 3]

Now we can alter the frequencies of the spectrum with a prototype called “ScaleAndOffset”. Insert it after the “rightOnly” module. The EventValues for Scale and Offset are generated automatically. You should end up with something like this (I also added a level control and an annotation module):
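Because “ScaleAndOffset” sits only on the frequency (right) channel, it computes `frequency * Scale + Offset` for every partial while leaving the amplitudes untouched. The Python sketch below illustrates the effect, assuming frequencies in Hz (Kyma may represent frequency values differently internally); `scale_and_offset` is a hypothetical helper name. Scaling keeps the ratios between partials, so the voice stays harmonic and just moves in pitch, while an offset shifts every partial by the same amount, which changes the ratios and makes the result inharmonic.

```python
import numpy as np


def scale_and_offset(freqs: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Apply freqs * scale + offset to the frequency ("right") channel only."""
    return freqs * scale + offset


harmonic = np.array([220.0, 440.0, 660.0, 880.0])  # partials of a 220 Hz voice

octave_up = scale_and_offset(harmonic, scale=2.0, offset=0.0)
shifted = scale_and_offset(harmonic, scale=1.0, offset=100.0)

# Scaling keeps the ratios between partials (still harmonic, just higher);
# an offset moves every partial by the same amount, so the ratios change.
print(octave_up / octave_up[0])  # [1. 2. 3. 4.]
print(shifted / shifted[0])
```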

[Image: walle fin]

Play your Sound and have fun with the faders!

If you control your Sound with an iPad (as I do), I suggest using the following EventValues:

“!PenX” instead of “!TimeIndex”

“!PenY” instead of “!Scale”

“!PenZ” instead of “!Offset”
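As a rough illustration of what these substitutions do, here is a hypothetical Python mapping from three pen values to the three controls. It assumes the pen values arrive normalized to the range 0 to 1 (check what your controller actually sends), and the Scale and Offset ranges are illustrative assumptions, not values from this tutorial. Note that PenX has to be stretched to -1..1 if you want it to sweep the whole TimeIndex range.

```python
def pen_to_parameters(pen_x: float, pen_y: float, pen_z: float,
                      scale_range=(0.5, 2.0), offset_range=(-200.0, 200.0)):
    """Hypothetical mapping of normalized pen values (0..1) to the three controls.

    pen_x -> TimeIndex in [-1, 1]
    pen_y -> Scale within scale_range   (illustrative range, an assumption)
    pen_z -> Offset within offset_range (illustrative range, an assumption)
    """
    def lerp(lo: float, hi: float, x: float) -> float:
        # Clamp x to [0, 1], then interpolate linearly between lo and hi.
        return lo + (hi - lo) * min(max(x, 0.0), 1.0)

    return (lerp(-1.0, 1.0, pen_x),
            lerp(*scale_range, pen_y),
            lerp(*offset_range, pen_z))


# Pen at the left edge, bottom, half pressure.
time_index, scale, offset = pen_to_parameters(0.0, 0.0, 0.5)
```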

You can also aggregate PenX and PenY in the VCS into a single control so you can move both with your mouse. My VCS looks like this:

[Image: walle vcs]

Here’s what it sounds like:


Download .kym file

Visit the Kyma Forum Thread


Wall-E saying Wall-E

Making Of Wall-E (by Ben Burtt himself)

    • kymaguy
      kymaguy says:

      Hi Pete,
      Yes, we met at KISS 13. Of course you are allowed to know my real name (I wanted to add it to the About section anyway). I’m Gustav Scholda, the guy who had a lot of questions about the Rotary Phaser, and I was also there when you were explaining the Crossfilter on the first day (lucky me!). Remember?

  1. Alex diplock
    Alex diplock says:

    Well done on this fantastic site! I learned a lot. I’m still just finding my way with Kyma, so thanks, and please continue.

  2. soundmodel
    soundmodel says:

    So, judging from the graph, it’s basically just FFT-based pitch shifting of a voice?
    And its quality is lowered by not using enough partials (only e.g. 256), which strips the human character out of it?

    I thought it was some more complicated thing like formant modification.

    • kymaguy
      kymaguy says:

      It’s not only about pitch, but also about time: the x-axis of the tablet (!PenX) controls the TimeIndex. Also, 256 partials are plenty IMO; that alone doesn’t make it sound de-humanized. In fact, with 256 partials and no modifications, it sounds pretty much like the original sample.

