Avatars
Generative AI models can be customized for specific tasks or datasets without retraining the entire model from scratch. Two notable techniques for this are adapters and LoRA (Low-Rank Adaptation).
In this section, we describe how to save adapters that can be quickly and flexibly integrated into your image inference pipelines. These customizable additions can simply be added to your prompt using a special token format and will be applied to your image and video output.
The face and style customizations are based on IP-Adapter, a decoupled cross-attention mechanism for text and image features. For technical details on how IP-Adapters work, see the IP-Adapter paper.
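For intuition, the decoupled mechanism runs two attention operations over the same queries, one against the text features and one against the image features, and blends them with the adapter scale. Below is a minimal PyTorch sketch; the shapes and projections are simplified from the paper, and the function name is ours:

```python
import torch.nn.functional as F

def decoupled_cross_attention(q, text_kv, image_kv, scale=0.7):
    """Sketch of IP-Adapter-style decoupled cross-attention.

    q        -- queries from the diffusion UNet, shape (batch, tokens, dim)
    text_kv  -- (key, value) tensors projected from text-encoder features
    image_kv -- (key, value) tensors projected from image-encoder features
    scale    -- weight of the image branch; this is the number the avatar
                token controls, e.g. the 0.7 in <be-style:cartoon1:0.7>
    """
    k_t, v_t = text_kv
    k_i, v_i = image_kv
    text_out = F.scaled_dot_product_attention(q, k_t, v_t)    # standard text branch
    image_out = F.scaled_dot_product_attention(q, k_i, v_i)   # added image branch
    return text_out + scale * image_out
```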
We make it easy to utilize multiple adapters using special prompt tokens to generate customizable content. As an example, if we redo the multi-adapter tutorial from the diffusers library, we can use IP-Adapters to achieve consistency in style and person.
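For readers who want to see the equivalent setup outside our platform, diffusers can load several IP-Adapters into one SDXL pipeline. The sketch below follows the public diffusers multi IP-Adapter API; the checkpoint and weight-file names come from the public h94/IP-Adapter repository and are illustrative, and the reference image paths are placeholders:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

# ViT-H image encoder matching the vit-h SDXL adapter weights below.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")

# One style adapter and one face adapter, loaded side by side.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder=["sdxl_models", "sdxl_models"],
    weight_name=[
        "ip-adapter_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
    ],
)
pipeline.set_ip_adapter_scale([0.5, 0.9])  # style weight, face weight

style_image = load_image("cartoon1.png")  # placeholder reference images
face_image = load_image("lucy.png")

image = pipeline(
    prompt="A woman playing tennis, blue jacket",
    ip_adapter_image=[style_image, face_image],
    num_inference_steps=30,
).images[0]
```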
In the Block Entropy dashboard, you can create Avatars as seen in Figure 1.
Once you have a collection of Avatars, you can use them within your prompts via special tokens. For example, to use the first avatar, you would prompt with "<be-style:cartoon1:0.7>". This has the equivalent effect of applying an IP-Adapter with a scale of 0.7.
You can also combine IP-Adapters by chaining multiple special tokens, e.g. "<be-style:cartoon1:0.7> <be-face:face1:0.3>", with different weights and in different combinations.
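To make the token format concrete, here is a small, hypothetical parser for it. The grammar (kind, avatar name, and scale separated by colons) is inferred purely from the examples on this page, not from a published spec:

```python
import re

# Inferred token grammar: <be-KIND:NAME:SCALE>, e.g. <be-style:cartoon1:0.7>.
TOKEN = re.compile(r"<be-(style|face|pose|lora):([\w-]+):([0-9.]+)>")

def parse_adapter_tokens(prompt: str):
    """Split a prompt into adapter directives and the remaining text prompt."""
    adapters = [
        {"kind": kind, "name": name, "scale": float(scale)}
        for kind, name, scale in TOKEN.findall(prompt)
    ]
    text = " ".join(TOKEN.sub("", prompt).split())  # drop tokens, tidy whitespace
    return adapters, text

adapters, text = parse_adapter_tokens(
    "<be-style:cartoon1:0.7> <be-face:face1:0.3> A woman playing tennis, blue jacket"
)
# adapters == [{'kind': 'style', 'name': 'cartoon1', 'scale': 0.7},
#              {'kind': 'face', 'name': 'face1', 'scale': 0.3}]
# text == 'A woman playing tennis, blue jacket'
```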
ControlNet is a conditional diffusion method that uses a given spatial context, such as a pose, for controlled generation; the technical details can be found in the ControlNet paper. For our avatars, you have the ability to upload an image of a particular pose that you may want to mimic in your image generation. Given this particular set of Avatars, see Figure 2, we can generate a customized image.

The image prompt "<be-face:lucy:0.9> <be-style:cartoon1:0.5> <be-pose:pose1:0.7> A woman playing tennis, blue jacket" produces the resulting image below.
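Under the hood, a pose token corresponds to ControlNet conditioning. A rough diffusers equivalent is sketched below, assuming an SDXL OpenPose ControlNet; the checkpoint name is one public option, not necessarily the one we run, and the pose image path is a placeholder:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# A public SDXL OpenPose ControlNet checkpoint; swap in your own pose model.
controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_image = load_image("pose1.png")  # the pose image uploaded as an avatar

# controlnet_conditioning_scale plays the role of the 0.7 in "<be-pose:pose1:0.7>".
image = pipeline(
    prompt="A woman playing tennis, blue jacket",
    image=pose_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=30,
).images[0]
```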
LoRAs, or Low-Rank Adapters, alter the model's attention weights directly: each weight update is factored into two low-rank matrices that are a fraction of the size of a typical attention matrix, so they can be added to the original weights at runtime with minimal overhead. More technical details about LoRAs can be found in the LoRA paper.

At the moment, we only support LoRAs for the Stable Diffusion XL pipeline, with a maximum file size of 175 MB. You can upload LoRAs in the Avatars panel, Figure 4.

LoRAs can be invoked in the generation prompt in the same way as the adapters and ControlNet. Here, we use the prompt "<be-lora:toy:0.9> <be-pose:pose1:0.7> toy_face A woman playing tennis, blue jacket". Notice that, for the LoRA to work, you still need to include the keyword phrase the LoRA was trained on, "toy_face". Also note that the IP-Adapters were removed, as they interfere with the generation of the LoRA style.
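For comparison, the same effect can be approximated directly in diffusers. The toy-face LoRA and weight-file name below are the ones used in the diffusers LoRA tutorials and stand in for whatever LoRA you upload:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Internally, each adapted weight becomes W + (alpha/r) * B @ A, where B and A
# are low rank -- which is why LoRA files stay small enough to merge at runtime.
pipeline.load_lora_weights(
    "CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy"
)
# The 0.9 here corresponds to the scale in "<be-lora:toy:0.9>".
pipeline.set_adapters(["toy"], adapter_weights=[0.9])

# The trigger phrase "toy_face" must appear in the prompt, as noted above.
image = pipeline("toy_face A woman playing tennis, blue jacket").images[0]
```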