
Generative models for the translation of image domains

Last updated on May 10, 2022

The year 2020 marked the 250th anniversary of Beethoven's birth, and we collaborated with composer Alexander Schubert and Ensemble Resonanz on a piece for the PODIUM Esslingen festival #BeBeethoven. Alexander was one of the invited artists, so we were quite fortunate to take part in this fantastic project by creating a generative tool for the visualizations. During the creative process it is not possible to define the desired output in advance, so a fast iterative process is crucial to complete the task on time. As part of the project's technical documentation, Alexander made a very detailed video that, among other things, discusses the neural architectures we designed for controlling the generation of images, the tools we built for exploring their latent spaces, and the musical piece that such images inspired.

"During the creative process it is not possible to define in advance the desired output, so a fast iterative process is crucial to complete the task on time."

More concretely, our part was to design a neural architecture to generate reconstructions of input images, as well as to generate images from input noise. The reconstruction of input images allows the manipulation of incoming video streams, while noise can be used to generate completely new images. For the reconstruction of input images we used a VAE, and for the generation of images from noise we used a Wasserstein GAN. As VAEs tend to remove the high-frequency components of images, we connected the Encoder of the VAE to the Generator of the GAN and trained them to map between their latent spaces, allowing us to generate sharper images.
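The idea of bridging the two latent spaces can be sketched minimally. The snippet below is a simplified illustration, not the actual implementation: it assumes hypothetical latent dimensions and uses an ordinary least-squares fit as a stand-in for the learned mapping between paired VAE and GAN latent codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent dimensions for the VAE encoder and the GAN generator.
VAE_DIM, GAN_DIM = 128, 100

# Paired latent codes: for each training image, the code produced by the
# VAE encoder and a matching GAN latent (here synthetic, for illustration).
z_vae = rng.normal(size=(1000, VAE_DIM))
z_gan = (z_vae @ rng.normal(size=(VAE_DIM, GAN_DIM))) * 0.1

# Fit a linear map from the VAE latent space to the GAN latent space.
W, *_ = np.linalg.lstsq(z_vae, z_gan, rcond=None)

# At inference time: encode an image with the VAE, map its code into the
# GAN latent space, and let the GAN generator decode a sharper output.
z_new = rng.normal(size=(1, VAE_DIM))
z_mapped = z_new @ W  # this code would be fed to the GAN generator
```

In practice such a mapping would be learned jointly with (or after) training the two networks, but the shape of the pipeline is the same: encode, translate between latent spaces, then decode with the sharper generator.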


This architecture has simpler components than more sophisticated approaches like Pix2Pix or CycleGAN, but with the advantage that the networks need to be trained only once. Afterwards we can determine, in a much shorter amount of time, the prototypes that represent the features we wish to manipulate, such as opening the mouth, adding a beard or morphing into another person.

Outvise made an excellent post detailing the creative process behind the project. You can find it here as part of their blog series.


You can watch the complete 30-minute concert video on the following page.

GUI for exploring the latent space.

Furthermore, we also implemented a graphical user interface (GUI) for manipulating the generated images semantically, that is, in ways that make sense to people rather than through low-level visual features like pixels. This process was particularly enlightening for us because, unlike industry projects, the objective was not clearly pre-defined, and part of our job was to design an architecture that helped the composer explore the nature of the generated images while reducing training times by months. We defined the requirements of each dataset based on the output of the previous model and repeated this cycle a few times. Once we found a satisfactory output, Alexander composed the music and the videography.

The concert was first presented in October 2020 at #bebeethoven, and in September 2021 it was awarded the Golden Nica at the Prix Ars Electronica. You can watch the concert video here.







Alexander Schubert

String ensemble:

Ensemble Resonanz

Audio Deep Learning:
Antoine Caillon, Philippe Esling, Benjamin Levy (IRCAM, Paris)

Video Deep Learning:
Jorge Davila-Chacon (Heldenkombinat Technologies)

Convergence was developed as part of #bebeethoven, a project of PODIUM Esslingen.
Funded by Kulturstiftung des Bundes.
Digital version commissioned by Eclat Festival. 

Filed Under

#GenerativeModels #VAE #GAN #ComputerVision #Arts
