Thursday 30 November 2023

Stable Diffusion ControlNet

I've been ploughing on with ControlNet... I fed Google Colab some cash and installed it. Once I'd got through the initial pain of getting my head around the Colab notebook, adding ControlNet was pretty straightforward, although there's a minor gotcha: it won't warn you if you don't have any ControlNet models installed (you have to install them during the initial start-up of the notebook) - it will just silently generate the image without applying the controls :(
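
As an aside, if you'd rather drive this from code than from the notebook, the equivalent setup with Hugging Face's diffusers library looks roughly like this - a minimal sketch, assuming the standard SD 1.5 checkpoint and the OpenPose ControlNet model (the file names and prompt are illustrative, and this isn't the notebook I actually used):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# The ControlNet model has to be loaded explicitly - this is the bit the
# notebook silently skips if you forget to install the models.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = load_image("pose_reference.png")  # the control image (see below)
result = pipe("a worker, flat vector illustration", image=pose).images[0]
result.save("worker.png")
```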

Since I'm interested in pose, I started with that. What's cool is that the input is just an image - the pose is extracted for you. You can obviously find images with the pose you want, or, if you're trying to be an artist, you can use posable models on the web to set up the pose you actually want... or even pose a real-life person or doll and photograph that.
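
In code, the extraction step looks something like this with the controlnet_aux package - again just a sketch, where the input photo is whatever image happens to have the pose you want:

```python
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

# Extract a stick-figure pose map from any photo of a person
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
photo = load_image("person_in_pose.jpg")  # hypothetical input photo
pose_map = openpose(photo)
pose_map.save("pose_reference.png")       # feed this to the pipeline above
```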

I set up this reference pose:

And applied it to generate a picture of a "worker" in a vector style:

What's interesting is that it appears to act like any other constraint on the model - like image2image or the prompt - meaning the model isn't 100% mandated to follow it, but also that the model attempts to 'rationalise' (non-technical term) the pose: the more unlikely the pose, and the more at odds it is with the prompt, the weirder the results...
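
In diffusers terms this softness is exposed directly as a knob, controlnet_conditioning_scale, so you can choose how hard the pose should bite. Continuing the sketch above (pipe and pose already loaded; the 0.5/1.5 values are illustrative):

```python
# Lower values let the prompt win more of the tug-of-war; higher values
# push the model harder towards the pose map. 1.0 is the default.
prompt = "a worker, flat vector illustration"
loose = pipe(prompt, image=pose, controlnet_conditioning_scale=0.5).images[0]
tight = pipe(prompt, image=pose, controlnet_conditioning_scale=1.5).images[0]
```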

I then tried it with two people posed:

And got:

Again, it's interesting that the model attempts to 'make sense' of the pose - here by putting them on a beam. Notice also that the tension between the wacky poses and the prompt is resolved via a compromise rather than by applying the pose as a hard constraint. In practice, I suspect this means I'll have to break the image up when there's tension between what I want to see and how I want to constrain it - i.e. feed the model a set of requests that make 'sense' in the context of its training material, then stitch the results back together again...
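
If it comes to that, the stitching end is at least the easy part - something like this with PIL, as a purely hypothetical sketch: generate each figure against a prompt + pose pairing that makes sense on its own, then composite the outputs:

```python
from PIL import Image

# Composite two separately-generated 512x512 figures side by side.
# figure_left.png / figure_right.png are hypothetical outputs of two
# separate, individually-sensible prompt+pose runs.
canvas = Image.new("RGB", (1024, 512), "white")
canvas.paste(Image.open("figure_left.png"), (0, 0))
canvas.paste(Image.open("figure_right.png"), (512, 0))
canvas.save("stitched.png")
```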

