AI想象家

Stable Diffusion结构解析-以图像生成图像（图生图，img2img）

手把手教你入门绘图超强的AI绘画，用户只需要输入一段图片的文字描述，即可生成精美的绘画。给大家带来了全新保姆级教程资料包（文末可获取）

AIGC专栏3——Stable Diffusion结构解析-以图像生成图像（图生图，img2img）为例

学习前言
源码下载地址
网络构建
- 一、什么是Stable Diffusion（SD）
- 二、Stable Diffusion的组成
- 三、img2img生成流程
- - 1、输入图片编码
  - 2、文本编码
  - 3、采样流程
  - - a、生成初始噪声
    - b、对噪声进行N次采样
    - c、单次采样解析
    - - I、预测噪声
      - II、施加噪声
    - d、预测噪声过程中的网络结构解析
    - - I、apply_model方法解析
      - II、UNetModel模型解析
  - 4、隐空间解码生成图片
图像到图像预测过程代码

学习前言

用了很久的Stable Diffusion，但从来没有好好解析过它内部的结构，写个博客记录一下，嘿嘿。

源码下载地址

https://github.com/bubbliiiing/stable-diffusion

喜欢的可以点个star噢。

网络构建

一、什么是Stable Diffusion（SD）

Stable Diffusion是比较新的一个扩散模型，翻译过来是稳定扩散，虽然名字叫稳定扩散，但实际上换个seed生成的结果就完全不一样，非常不稳定哈。

Stable Diffusion最开始的应用应该是文本生成图像，即文生图，随着技术的发展Stable Diffusion不仅支持image2image图生图的生成，还支持ControlNet等各种控制方法来定制生成的图像。

Stable Diffusion基于扩散模型，所以不免包含不断去噪的过程，如果是图生图的话，还有不断加噪的过程，此时离不开DDPM那张老图，如下：

Stable Diffusion相比于DDPM，使用了DDIM采样器，使用了隐空间的扩散，另外使用了非常大的LAION-5B数据集进行预训练。

直接Finetune Stable Diffusion大多数同学应该是无法cover住成本的，不过Stable Diffusion有很多轻量Finetune的方案，比如Lora、Textual Inversion等，但这是后话。

本文主要是解析一下整个SD模型的结构组成，一次扩散，多次扩散的流程。

大模型、AIGC是当前行业的趋势，不会的话容易被淘汰，hh。

txt2img的原理如博文
Diffusion扩散模型学习2——Stable Diffusion结构解析-以文本生成图像（txt2img）为例
所示。

二、Stable Diffusion的组成

Stable Diffusion由四大部分组成。
1、Sampler采样器。
2、Variational Autoencoder (VAE) 变分自编码器。
3、UNet 主网络，噪声预测器。
4、CLIPEmbedder文本编码器。

每一部分都很重要，我们以图像生成图像为例进行解析。既然是图像生成图像，那么我们的输入有两个，一个是文本，另外一个是图片。

三、img2img生成流程

生成流程分为四个部分：
1、对图片进行VAE编码，根据denoise数值进行加噪声。
2、Prompt文本编码。
3、根据denoise数值进行若干次采样。
4、使用VAE进行解码。

相比于文生图，图生图的输入发生了变化，不再以Gaussian noise作为初始化，而是以加噪后的图像特征为初始化，这样便以图像的方式为模型注入了信息。

详细来讲，如上图所示：

第一步为对输入的图像利用VAE编码，获得输入图像的Latent特征；然后使用该Latent特征基于DDIM Sampler进行加噪，此时获得输入图片加噪后的特征。假设我们设置denoise数值为0.8，总步数为20步，那么第一步中，我们会对输入图片进行0.8x20次的加噪声，剩下4步不加，可理解为打乱了80%的特征，保留20%的特征。
第二步是对输入的文本进行编码，获得文本特征；
第三步是根据denoise数值对第一步中获得的 加噪后的特征 进行若干次采样。还是以第一步中denoise数值为0.8为例，我们只加了0.8x20次噪声，那么我们也只需要进行0.8x20次采样就可以恢复出图片了。
第四步是将采样后的图片利用VAE的Decoder进行恢复。

with torch.no_grad():
    if seed == -1:
        seed = random.randint(0, 65535)
    seed_everything(seed)

    # ----------------------- #
    #   对输入图片进行编码并加噪
    # ----------------------- #
    if image_path is not None:
        img = HWC3(np.array(img, np.uint8))
        img = torch.from_numpy(img.copy()).float().cuda() / 127.0 - 1.0
        img = torch.stack([img for _ in range(num_samples)], dim=0)
        img = einops.rearrange(img, 'b h w c -> b c h w').clone()

        ddim_sampler.make_schedule(ddim_steps, ddim_eta=eta, verbose=True)
        t_enc = min(int(denoise_strength * ddim_steps), ddim_steps - 1)
        z = model.get_first_stage_encoding(model.encode_first_stage(img))
        z_enc = ddim_sampler.stochastic_encode(z, torch.tensor([t_enc] * num_samples).to(model.device))

    # ----------------------- #
    #   获得编码后的prompt
    # ----------------------- #
    cond    = {"c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
    un_cond = {"c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
    H, W    = input_shape
    shape   = (4, H // 8, W // 8)

    if image_path is not None:
        samples = ddim_sampler.decode(z_enc, cond, t_enc, unconditional_guidance_scale=scale, unconditional_conditioning=un_cond)
    else:
        # ----------------------- #
        #   进行采样
        # ----------------------- #
        samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
                                                        shape, cond, verbose=False, eta=eta,
                                                        unconditional_guidance_scale=scale,
                                                        unconditional_conditioning=un_cond)

    # ----------------------- #
    #   进行解码
    # ----------------------- #
    x_samples = model.decode_first_stage(samples)
    x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)

1、输入图片编码

在图生图中，我们首先要指定一张参考的图像，然后在这个参考图像上开始工作：
1、利用VAE编码器对这张参考图像进行编码，使其进入隐空间，只有进入了隐空间，网络才知道这个图像是什么；
2、然后使用该Latent特征基于DDIM Sampler进行加噪，此时获得输入图片加噪后的特征。加噪的逻辑如下：

denoise可认为是重建的比例，1代表全部重建，0代表不重建；
假设我们设置denoise数值为0.8，总步数为20步；我们会对输入图片进行0.8x20次的加噪声，剩下4步不加，可理解为80%的特征，保留20%的特征；不过就算加完20步噪声，原始输入图片的信息还是有一点保留的，不是完全不保留。

此时我们便获得在隐空间加噪后的图像，后续会在这个 隐空间加噪后的图像 的基础上进行采样。

with torch.no_grad():
    if seed == -1:
        seed = random.randint(0, 65535)
    seed_everything(seed)

    # ----------------------- #
    #   对输入图片进行编码并加噪
    # ----------------------- #
    if image_path is not None:
        img = HWC3(np.array(img, np.uint8))
        img = torch.from_numpy(img.copy()).float().cuda() / 127.0 - 1.0
        img = torch.stack([img for _ in range(num_samples)], dim=0)
        img = einops.rearrange(img, 'b h w c -> b c h w').clone()

        ddim_sampler.make_schedule(ddim_steps, ddim_eta=eta, verbose=True)
        t_enc = min(int(denoise_strength * ddim_steps), ddim_steps - 1)
        z = model.get_first_stage_encoding(model.encode_first_stage(img))
        z_enc = ddim_sampler.stochastic_encode(z, torch.tensor([t_enc] * num_samples).to(model.device))

2、文本编码

文本编码的思路比较简单，直接使用CLIP的文本编码器进行编码就可以了，在代码中定义了一个FrozenCLIPEmbedder类别，使用了transformers库的CLIPTokenizer和CLIPTextModel。

在前传过程中，我们对输入进来的文本首先利用CLIPTokenizer进行编码，然后使用CLIPTextModel进行特征提取，通过FrozenCLIPEmbedder，我们可以获得一个[batch_size, 77, 768]的特征向量。

class FrozenCLIPEmbedder(AbstractEncoder):
    """Uses the CLIP transformer encoder for text (from huggingface)"""
    LAYERS = [
        "last",
        "pooled",
        "hidden"
    ]
    def __init__(self, version="openai/clip-vit-large-patch14", device="cuda", max_length=77,
                 freeze=True, layer="last", layer_idx=None):  # clip-vit-base-patch32
        super().__init__()
        assert layer in self.LAYERS
        # 定义文本的tokenizer和transformer
        self.tokenizer      = CLIPTokenizer.from_pretrained(version)
        self.transformer    = CLIPTextModel.from_pretrained(version)
        self.device         = device
        self.max_length     = max_length
        # 冻结模型参数
        if freeze:
            self.freeze()
        self.layer = layer
        self.layer_idx = layer_idx
        if layer == "hidden":
            assert layer_idx is not None
            assert 0 <= abs(layer_idx) <= 12

    def freeze(self):
        self.transformer = self.transformer.eval()
        # self.train = disabled_train
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, text):
        # 对输入的图片进行分词并编码，padding直接padding到77的长度。
        batch_encoding  = self.tokenizer(text, truncation=True, max_length=self.max_length, return_length=True,
                                        return_overflowing_tokens=False, padding="max_length", return_tensors="pt")
        # 拿出input_ids然后传入transformer进行特征提取。
        tokens          = batch_encoding["input_ids"].to(self.device)
        outputs         = self.transformer(input_ids=tokens, output_hidden_states=self.layer=="hidden")
        # 取出所有的token
        if self.layer == "last":
            z = outputs.last_hidden_state
        elif self.layer == "pooled":
            z = outputs.pooler_output[:, None, :]
        else:
            z = outputs.hidden_states[self.layer_idx]
        return z

    def encode(self, text):
        return self(text)

3、采样流程

a、生成初始噪声

在图生图中，我们的初始噪声获取于参考图片，所以参考第一步就可以获得图生图的噪声

b、对噪声进行N次采样

既然Stable Diffusion是一个不断扩散的过程，那么少不了不断的去噪声，那么怎么去噪声便是一个问题。

在上一步中，我们已经获得了一个图生图的噪声，它是一个符合正态分布的向量，我们便从它开始去噪声。

我们会对ddim_timesteps的时间步取反，因为我们现在是去噪声而非加噪声，然后对其进行一个循环，由于我们此时不再是txt2img中的采样流程，我们使用sampler的另外一个方法decode，循环的代码如下：

@torch.no_grad()
def decode(self, x_latent, cond, t_start, unconditional_guidance_scale=1.0, unconditional_conditioning=None,
           use_original_steps=False):

    # 使用ddim的时间步
    # 这里内容看起来很多，但是其实很少，本质上就是取了self.ddim_timesteps，然后把它reversed一下
    timesteps = np.arange(self.ddpm_num_timesteps) if use_original_steps else self.ddim_timesteps
    timesteps = timesteps[:t_start]

    time_range = np.flip(timesteps)
    total_steps = timesteps.shape[0]
    print(f"Running DDIM Sampling with {total_steps} timesteps")

    iterator = tqdm(time_range, desc='Decoding image', total=total_steps)
    x_dec = x_latent
    for i, step in enumerate(iterator):
        index = total_steps - i - 1
        ts = torch.full((x_latent.shape[0],), step, device=x_latent.device, dtype=torch.long)
        # 进行单次采样
        x_dec, _ = self.p_sample_ddim(x_dec, cond, ts, index=index, use_original_steps=use_original_steps,
                                      unconditional_guidance_scale=unconditional_guidance_scale,
                                      unconditional_conditioning=unconditional_conditioning)
    return x_dec

c、单次采样解析

I、预测噪声

在进行单词采样前，需要首先判断是否有neg prompt，如果有，我们需要同时处理neg prompt，否则仅仅需要处理pos prompt。实际使用的时候一般都有neg prompt（效果会好一些），所以默认进入对应的处理过程。

在处理neg prompt时，我们对输入进来的隐向量和步数进行复制，一个属于pos prompt，一个属于neg prompt。torch.cat默认堆叠维度为0，所以是在batch_size维度进行堆叠，二者不会互相影响。然后我们将pos prompt和neg prompt堆叠到一个batch中，也是在batch_size维度堆叠。

# 首先判断是否由neg prompt，unconditional_conditioning是由neg prompt获得的
if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
    e_t = self.model.apply_model(x, t, c)
else:
    # 一般都是有neg prompt的，所以进入到这里
    # 在这里我们对隐向量和步数进行复制，一个属于pos prompt，一个属于neg prompt
    # torch.cat默认堆叠维度为0，所以是在bs维度进行堆叠，二者不会互相影响
    x_in = torch.cat([x] * 2)
    t_in = torch.cat([t] * 2)
    # 然后我们将pos prompt和neg prompt堆叠到一个batch中
    if isinstance(c, dict):
        assert isinstance(unconditional_conditioning, dict)
        c_in = dict()
        for k in c:
            if isinstance(c[k], list):
                c_in[k] = [
                    torch.cat([unconditional_conditioning[k][i], c[k][i]])
                    for i in range(len(c[k]))
                ]
            else:
                c_in[k] = torch.cat([unconditional_conditioning[k], c[k]])
    else:
        c_in = torch.cat([unconditional_conditioning, c])

堆叠完后，我们将隐向量、步数和prompt条件一起传入网络中，将结果在bs维度进行使用chunk进行分割。

因为我们在堆叠时，neg prompt放在了前面。因此分割好后，前半部分e_t_uncond属于利用neg prompt得到的，后半部分e_t属于利用pos prompt得到的，我们本质上应该扩大pos prompt的影响，远离neg prompt的影响。因此，我们使用e_t-e_t_uncond计算二者的距离，使用scale扩大二者的距离。在e_t_uncond基础上，得到最后的隐向量。

# 堆叠完后，隐向量、步数和prompt条件一起传入网络中，将结果在bs维度进行使用chunk进行分割
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)

此时获得的e_t就是通过隐向量和prompt共同获得的预测噪声啦。

II、施加噪声

获得噪声就OK了吗？显然不是的，我们还要将获得的新噪声，按照一定的比例添加到原来的原始噪声上。

这个地方我们最好结合ddim中的公式来看，我们需要获得 α ˉ t \bar{\alpha}_t αˉt、 α ˉ t − 1 \bar{\alpha}_{t-1} αˉt−1、 σ t \sigma_t σt、 1 − α ˉ t \sqrt{1-\bar{\alpha}_t} 1−αˉt 。

代码中，我们其实已经预先计算好了这些参数。我们只需要直接取出即可，下方的a_t也就是公式中括号外的 α ˉ t \bar{\alpha}_t αˉt，a_prev 就是公式中的 α ˉ t − 1 \bar{\alpha}_{t-1} αˉt−1，sigma_t就是公式中的 σ t \sigma_t σt，sqrt_one_minus_at就是公式中的 1 − α ˉ t \sqrt{1-\bar{\alpha}_t} 1−αˉt 。

# 根据采样器选择参数
alphas      = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas
alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
sigmas      = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas

# 根据步数选择参数，
# 这里的index就是上面循环中的total_steps - i - 1
a_t         = torch.full((b, 1, 1, 1), alphas[index], device=device)
a_prev      = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)
sigma_t     = torch.full((b, 1, 1, 1), sigmas[index], device=device)
sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)

其实这一步我们只是把公式需要用到的系数全都拿了出来，方便后面的加减乘除。然后我们便在代码中实现上述的公式。

# current prediction for x_0
# 公式中的最左边
pred_x0             = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
if quantize_denoised:
    pred_x0, _, *_  = self.model.first_stage_model.quantize(pred_x0)
# direction pointing to x_t
# 公式的中间
dir_xt              = (1. - a_prev - sigma_t**2).sqrt() * e_t
# 公式最右边
noise               = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
if noise_dropout > 0.:
    noise           = torch.nn.functional.dropout(noise, p=noise_dropout)
x_prev              = a_prev.sqrt() * pred_x0 + dir_xt + noise
# 输出添加完公式的结果
return x_prev, pred_x0

d、预测噪声过程中的网络结构解析

I、apply_model方法解析

在3.a的预测噪声过程中，我们使用了model.apply_model方法进行噪声的预测，这个方法具体做了什么被隐掉了，我们看看具体做的工作。

apply_model方法在ldm.models.diffusion.ddpm.py文件中。在apply_model中，我们将x_noisy传入self.model中预测噪声。

x_recon = self.model(x_noisy, t, **cond)

self.model是一个预先构建好的类，定义在ldm.models.diffusion.ddpm.py文件的1416行，内部包含Stable Diffusion的Unet网络，self.model的功能有点类似于包装器，根据模型选择的特征融合方式，进行文本与上文生成的噪声的融合。

c_concat代表使用堆叠的方式进行融合，c_crossattn代表使用attention的方式融合。

class DiffusionWrapper(pl.LightningModule):
    def __init__(self, diff_model_config, conditioning_key):
        super().__init__()
        self.sequential_cross_attn = diff_model_config.pop("sequential_crossattn", False)
        # stable diffusion的unet网络
        self.diffusion_model = instantiate_from_config(diff_model_config)
        self.conditioning_key = conditioning_key
        assert self.conditioning_key in [None, 'concat', 'crossattn', 'hybrid', 'adm', 'hybrid-adm', 'crossattn-adm']

    def forward(self, x, t, c_concat: list = None, c_crossattn: list = None, c_adm=None):
        if self.conditioning_key is None:
            out = self.diffusion_model(x, t)
        elif self.conditioning_key == 'concat':
            xc = torch.cat([x] + c_concat, dim=1)
            out = self.diffusion_model(xc, t)
        elif self.conditioning_key == 'crossattn':
            if not self.sequential_cross_attn:
                cc = torch.cat(c_crossattn, 1)
            else:
                cc = c_crossattn
            out = self.diffusion_model(x, t, context=cc)
        elif self.conditioning_key == 'hybrid':
            xc = torch.cat([x] + c_concat, dim=1)
            cc = torch.cat(c_crossattn, 1)
            out = self.diffusion_model(xc, t, context=cc)
        elif self.conditioning_key == 'hybrid-adm':
            assert c_adm is not None
            xc = torch.cat([x] + c_concat, dim=1)
            cc = torch.cat(c_crossattn, 1)
            out = self.diffusion_model(xc, t, context=cc, y=c_adm)
        elif self.conditioning_key == 'crossattn-adm':
            assert c_adm is not None
            cc = torch.cat(c_crossattn, 1)
            out = self.diffusion_model(x, t, context=cc, y=c_adm)
        elif self.conditioning_key == 'adm':
            cc = c_crossattn[0]
            out = self.diffusion_model(x, t, y=cc)
        else:
            raise NotImplementedError()

        return out

代码中的self.diffusion_model便是Stable Diffusion的Unet网络，网络结构位于ldm.modules.diffusionmodules.openaimodel.py文件中的UNetModel类。

II、UNetModel模型解析

UNetModel主要做的工作是结合时间步t和文本Embedding计算这一时刻的噪声。尽管UNet的思路非常简单，但是在StableDiffusion中，UNetModel由ResBlock和Transformer模块组成，整体来讲相比于普通的UNet复杂一些。

Prompt通过Frozen CLIP Text Encoder获得Text Embedding，Timesteps通过全连接（MLP）获得Timesteps Embedding；

ResBlock用于结合时间步Timesteps Embedding，Transformer模块用于结合文本Text Embedding。

我在这里放一张大图，同学们可以看到内部shape的变化。

Unet代码如下所示：

class UNetModel(nn.Module):
    """
    The full UNet model with attention and timestep embedding.
    :param in_channels: channels in the input Tensor.
    :param model_channels: base channel count for the model.
    :param out_channels: channels in the output Tensor.
    :param num_res_blocks: number of residual blocks per downsample.
    :param attention_resolutions: a collection of downsample rates at which
        attention will take place. May be a set, list, or tuple.
        For example, if this contains 4, then at 4x downsampling, attention
        will be used.
    :param dropout: the dropout probability.
    :param channel_mult: channel multiplier for each level of the UNet.
    :param conv_resample: if True, use learned convolutions for upsampling and
        downsampling.
    :param dims: determines if the signal is 1D, 2D, or 3D.
    :param num_classes: if specified (as an int), then this model will be
        class-conditional with `num_classes` classes.
    :param use_checkpoint: use gradient checkpointing to reduce memory usage.
    :param num_heads: the number of attention heads in each attention layer.
    :param num_heads_channels: if specified, ignore num_heads and instead use
                               a fixed channel width per attention head.
    :param num_heads_upsample: works with num_heads to set a different number
                               of heads for upsampling. Deprecated.
    :param use_scale_shift_norm: use a FiLM-like conditioning mechanism.
    :param resblock_updown: use residual blocks for up/downsampling.
    :param use_new_attention_order: use a different attention pattern for potentially
                                    increased efficiency.
    """

    def __init__(
        self,
        image_size,
        in_channels,
        model_channels,
        out_channels,
        num_res_blocks,
        attention_resolutions,
        dropout=0,
        channel_mult=(1, 2, 4, 8),
        conv_resample=True,
        dims=2,
        num_classes=None,
        use_checkpoint=False,
        use_fp16=False,
        num_heads=-1,
        num_head_channels=-1,
        num_heads_upsample=-1,
        use_scale_shift_norm=False,
        resblock_updown=False,
        use_new_attention_order=False,
        use_spatial_transformer=False,    # custom transformer support
        transformer_depth=1,              # custom transformer support
        context_dim=None,                 # custom transformer support
        n_embed=None,                     # custom support for prediction of discrete ids into codebook of first stage vq model
        legacy=True,
    ):
        super().__init__()
        if use_spatial_transformer:
            assert context_dim is not None, 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'

        if context_dim is not None:
            assert use_spatial_transformer, 'Fool!! You forgot to use the spatial transformer for your cross-attention conditioning...'
            from omegaconf.listconfig import ListConfig
            if type(context_dim) == ListConfig:
                context_dim = list(context_dim)

        if num_heads_upsample == -1:
            num_heads_upsample = num_heads

        if num_heads == -1:
            assert num_head_channels != -1, 'Either num_heads or num_head_channels has to be set'

        if num_head_channels == -1:
            assert num_heads != -1, 'Either num_heads or num_head_channels has to be set'

        self.image_size = image_size
        self.in_channels = in_channels
        self.model_channels = model_channels
        self.out_channels = out_channels
        self.num_res_blocks = num_res_blocks
        self.attention_resolutions = attention_resolutions
        self.dropout = dropout
        self.channel_mult = channel_mult
        self.conv_resample = conv_resample
        self.num_classes = num_classes
        self.use_checkpoint = use_checkpoint
        self.dtype = th.float16 if use_fp16 else th.float32
        self.num_heads = num_heads
        self.num_head_channels = num_head_channels
        self.num_heads_upsample = num_heads_upsample
        self.predict_codebook_ids = n_embed is not None

        # 用于计算当前采样时间t的embedding
        time_embed_dim  = model_channels * 4
        self.time_embed = nn.Sequential(
            linear(model_channels, time_embed_dim),
            nn.SiLU(),
            linear(time_embed_dim, time_embed_dim),
        )

        if self.num_classes is not None:
            self.label_emb = nn.Embedding(num_classes, time_embed_dim)
        
        # 定义输入模块的第一个卷积
        # TimestepEmbedSequential也可以看作一个包装器，根据层的种类进行时间或者文本的融合。
        self.input_blocks = nn.ModuleList(
            [
                TimestepEmbedSequential(
                    conv_nd(dims, in_channels, model_channels, 3, padding=1)
                )
            ]
        )
        self._feature_size  = model_channels
        input_block_chans   = [model_channels]
        ch                  = model_channels
        ds                  = 1
        # 对channel_mult进行循环，channel_mult一共有四个值，代表unet四个部分通道的扩张比例
        # [1, 2, 4, 4]
        for level, mult in enumerate(channel_mult):
            # 每个部分循环两次
            # 添加一个ResBlock和一个AttentionBlock
            for _ in range(num_res_blocks):
                # 先添加一个ResBlock
                # 用于对输入的噪声进行通道数的调整，并且融合t的特征
                layers = [
                    ResBlock(
                        ch,
                        time_embed_dim,
                        dropout,
                        out_channels=mult * model_channels,
                        dims=dims,
                        use_checkpoint=use_checkpoint,
                        use_scale_shift_norm=use_scale_shift_norm,
                    )
                ]
                # ch便是上述ResBlock的输出通道数
                ch = mult * model_channels
                if ds in attention_resolutions:
                    # num_heads=8
                    if num_head_channels == -1:
                        dim_head = ch // num_heads
                    else:
                        num_heads = ch // num_head_channels
                        dim_head = num_head_channels
                    if legacy:
                        #num_heads = 1
                        dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
                    # 使用了SpatialTransformer自注意力，加强全局特征，融合文本的特征
                    layers.append(
                        AttentionBlock(
                            ch,
                            use_checkpoint=use_checkpoint,
                            num_heads=num_heads,
                            num_head_channels=dim_head,
                            use_new_attention_order=use_new_attention_order,
                        ) if not use_spatial_transformer else SpatialTransformer(
                            ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
                        )
                    )
                self.input_blocks.append(TimestepEmbedSequential(*layers))
                self._feature_size += ch
                input_block_chans.append(ch)
            # 如果不是四个部分中的最后一个部分，那么都要进行下采样。
            if level != len(channel_mult) - 1:
                out_ch = ch
                # 在此处进行下采样
                # 一般直接使用Downsample模块
                self.input_blocks.append(
                    TimestepEmbedSequential(
                        ResBlock(
                            ch,
                            time_embed_dim,
                            dropout,
                            out_channels=out_ch,
                            dims=dims,
                            use_checkpoint=use_checkpoint,
                            use_scale_shift_norm=use_scale_shift_norm,
                            down=True,
                        )
                        if resblock_updown
                        else Downsample(
                            ch, conv_resample, dims=dims, out_channels=out_ch
                        )
                    )
                )
                # 为下一阶段定义参数。
                ch = out_ch
                input_block_chans.append(ch)
                ds *= 2
                self._feature_size += ch

        if num_head_channels == -1:
            dim_head = ch // num_heads
        else:
            num_heads = ch // num_head_channels
            dim_head = num_head_channels
        if legacy:
            #num_heads = 1
            dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
        # 定义中间层
        # ResBlock + SpatialTransformer + ResBlock
        self.middle_block = TimestepEmbedSequential(
            ResBlock(
                ch,
                time_embed_dim,
                dropout,
                dims=dims,
                use_checkpoint=use_checkpoint,
                use_scale_shift_norm=use_scale_shift_norm,
            ),
            AttentionBlock(
                ch,
                use_checkpoint=use_checkpoint,
                num_heads=num_heads,
                num_head_channels=dim_head,
                use_new_attention_order=use_new_attention_order,
            ) if not use_spatial_transformer else SpatialTransformer(
                            ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
                        ),
            ResBlock(
                ch,
                time_embed_dim,
                dropout,
                dims=dims,
                use_checkpoint=use_checkpoint,
                use_scale_shift_norm=use_scale_shift_norm,
            ),
        )
        self._feature_size += ch

        # 定义Unet上采样过程
        self.output_blocks = nn.ModuleList([])
        # 循环把channel_mult反了过来
        for level, mult in list(enumerate(channel_mult))[::-1]:
            # 上采样时每个部分循环三次
            for i in range(num_res_blocks + 1):
                ich = input_block_chans.pop()
                # 首先添加ResBlock层
                layers = [
                    ResBlock(
                        ch + ich,
                        time_embed_dim,
                        dropout,
                        out_channels=model_channels * mult,
                        dims=dims,
                        use_checkpoint=use_checkpoint,
                        use_scale_shift_norm=use_scale_shift_norm,
                    )
                ]
                ch = model_channels * mult
                # 然后进行SpatialTransformer自注意力
                if ds in attention_resolutions:
                    if num_head_channels == -1:
                        dim_head = ch // num_heads
                    else:
                        num_heads = ch // num_head_channels
                        dim_head = num_head_channels
                    if legacy:
                        #num_heads = 1
                        dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
                    layers.append(
                        AttentionBlock(
                            ch,
                            use_checkpoint=use_checkpoint,
                            num_heads=num_heads_upsample,
                            num_head_channels=dim_head,
                            use_new_attention_order=use_new_attention_order,
                        ) if not use_spatial_transformer else SpatialTransformer(
                            ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
                        )
                    )
                # 如果不是channel_mult循环的第一个
                # 且
                # 是num_res_blocks循环的最后一次，则进行上采样
                if level and i == num_res_blocks:
                    out_ch = ch
                    layers.append(
                        ResBlock(
                            ch,
                            time_embed_dim,
                            dropout,
                            out_channels=out_ch,
                            dims=dims,
                            use_checkpoint=use_checkpoint,
                            use_scale_shift_norm=use_scale_shift_norm,
                            up=True,
                        )
                        if resblock_updown
                        else Upsample(ch, conv_resample, dims=dims, out_channels=out_ch)
                    )
                    ds //= 2
                self.output_blocks.append(TimestepEmbedSequential(*layers))
                self._feature_size += ch

        # 最后在输出部分进行一次卷积
        self.out = nn.Sequential(
            normalization(ch),
            nn.SiLU(),
            zero_module(conv_nd(dims, model_channels, out_channels, 3, padding=1)),
        )
        if self.predict_codebook_ids:
            self.id_predictor = nn.Sequential(
            normalization(ch),
            conv_nd(dims, model_channels, n_embed, 1),
            #nn.LogSoftmax(dim=1)  # change to cross_entropy and produce non-normalized logits
        )

    def convert_to_fp16(self):
        """
        Convert the torso of the model to float16.
        """
        self.input_blocks.apply(convert_module_to_f16)
        self.middle_block.apply(convert_module_to_f16)
        self.output_blocks.apply(convert_module_to_f16)

    def convert_to_fp32(self):
        """
        Convert the torso of the model to float32.
        """
        self.input_blocks.apply(convert_module_to_f32)
        self.middle_block.apply(convert_module_to_f32)
        self.output_blocks.apply(convert_module_to_f32)

    def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
        """
        Apply the model to an input batch.
        :param x: an [N x C x ...] Tensor of inputs.
        :param timesteps: a 1-D batch of timesteps.
        :param context: conditioning plugged in via crossattn
        :param y: an [N] Tensor of labels, if class-conditional.
        :return: an [N x C x ...] Tensor of outputs.
        """
        assert (y is not None) == (
            self.num_classes is not None
        ), "must specify y if and only if the model is class-conditional"
        hs      = []
        # 用于计算当前采样时间t的embedding
        t_emb   = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
        emb     = self.time_embed(t_emb)

        if self.num_classes is not None:
            assert y.shape == (x.shape[0],)
            emb = emb + self.label_emb(y)

        # 对输入模块进行循环，进行下采样并且融合时间特征与文本特征。
        h = x.type(self.dtype)
        for module in self.input_blocks:
            h = module(h, emb, context)
            hs.append(h)

        # 中间模块的特征提取
        h = self.middle_block(h, emb, context)

        # 上采样模块的特征提取
        for module in self.output_blocks:
            h = th.cat([h, hs.pop()], dim=1)
            h = module(h, emb, context)
        h = h.type(x.dtype)
        # 输出模块
        if self.predict_codebook_ids:
            return self.id_predictor(h)
        else:
            return self.out(h)

4、隐空间解码生成图片

通过上述步骤，已经可以多次采样获得结果，然后我们便可以通过隐空间解码生成图片。

隐空间解码生成图片的过程非常简单，将上文多次采样后的结果，使用decode_first_stage方法即可生成图片。

在decode_first_stage方法中，网络调用VAE对获取到的64x64x3的隐向量进行解码，获得512x512x3的图片。

@torch.no_grad()
def decode_first_stage(self, z, predict_cids=False, force_not_quantize=False):
    if predict_cids:
        if z.dim() == 4:
            z = torch.argmax(z.exp(), dim=1).long()
        z = self.first_stage_model.quantize.get_codebook_entry(z, shape=None)
        z = rearrange(z, 'b h w c -> b c h w').contiguous()

    z = 1. / self.scale_factor * z
	# 一般无需分割输入，所以直接将x_noisy传入self.model中，在下面else进行
    if hasattr(self, "split_input_params"):
    	......
    else:
        if isinstance(self.first_stage_model, VQModelInterface):
            return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)
        else:
            return self.first_stage_model.decode(z)

图像到图像预测过程代码

整体预测代码如下：

import os
import random

import cv2
import einops
import numpy as np
import torch
from PIL import Image
from pytorch_lightning import seed_everything

from ldm_hacked import *

# ----------------------- #
#   使用的参数
# ----------------------- #
# config的地址
config_path = "model_data/sd_v15.yaml"
# 模型的地址
model_path  = "model_data/v1-5-pruned-emaonly.safetensors"
# fp16，可以加速与节省显存
sd_fp16     = True
vae_fp16    = True

# ----------------------- #
#   生成图片的参数
# ----------------------- #
# 生成的图像大小为input_shape，对于img2img会进行Centter Crop
input_shape = [512, 512]
# 一次生成几张图像
num_samples = 1
# 采样的步数
ddim_steps  = 20
# 采样的种子，为-1的话则随机。
seed        = 12345
# eta
eta         = 0
# denoise强度，for img2img
denoise_strength = 1.0

# ----------------------- #
#   提示词相关参数
# ----------------------- #
# 提示词
prompt      = "a cute cat, with yellow leaf, trees"
# 正面提示词
a_prompt    = "best quality, extremely detailed"
# 负面提示词
n_prompt    = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"
# 正负扩大倍数
scale       = 9
# img2img使用，如果不想img2img这设置为None。
image_path  = None

# ----------------------- #
#   保存路径
# ----------------------- #
save_path   = "imgs/outputs_imgs"

# ----------------------- #
#   创建模型
# ----------------------- #
model   = create_model(config_path).cpu()
model.load_state_dict(load_state_dict(model_path, location='cuda'), strict=False)
model   = model.cuda()
ddim_sampler = DDIMSampler(model)
if sd_fp16:
    model = model.half()

if image_path is not None:
    img = Image.open(image_path)
    img = crop_and_resize(img, input_shape[0], input_shape[1])

with torch.no_grad():
    if seed == -1:
        seed = random.randint(0, 65535)
    seed_everything(seed)
    
    # ----------------------- #
    #   对输入图片进行编码并加噪
    # ----------------------- #
    if image_path is not None:
        img = HWC3(np.array(img, np.uint8))
        img = torch.from_numpy(img.copy()).float().cuda() / 127.0 - 1.0
        img = torch.stack([img for _ in range(num_samples)], dim=0)
        img = einops.rearrange(img, 'b h w c -> b c h w').clone()
        if vae_fp16:
            img = img.half()
            model.first_stage_model = model.first_stage_model.half()
        else:
            model.first_stage_model = model.first_stage_model.float()

        ddim_sampler.make_schedule(ddim_steps, ddim_eta=eta, verbose=True)
        t_enc = min(int(denoise_strength * ddim_steps), ddim_steps - 1)
        z = model.get_first_stage_encoding(model.encode_first_stage(img))
        z_enc = ddim_sampler.stochastic_encode(z, torch.tensor([t_enc] * num_samples).to(model.device))
        z_enc = z_enc.half() if sd_fp16 else z_enc.float()

    # ----------------------- #
    #   获得编码后的prompt
    # ----------------------- #
    cond    = {"c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
    un_cond = {"c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
    H, W    = input_shape
    shape   = (4, H // 8, W // 8)

    if image_path is not None:
        samples = ddim_sampler.decode(z_enc, cond, t_enc, unconditional_guidance_scale=scale, unconditional_conditioning=un_cond)
    else:
        # ----------------------- #
        #   进行采样
        # ----------------------- #
        samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
                                                        shape, cond, verbose=False, eta=eta,
                                                        unconditional_guidance_scale=scale,
                                                        unconditional_conditioning=un_cond)

    # ----------------------- #
    #   进行解码
    # ----------------------- #
    x_samples = model.decode_first_stage(samples.half() if vae_fp16 else samples.float())

    x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)

# ----------------------- #
#   保存图片
# ----------------------- #
if not os.path.exists(save_path):
    os.makedirs(save_path)
for index, image in enumerate(x_samples):
    cv2.imwrite(os.path.join(save_path, str(index) + ".jpg"), cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

AI绘画所有方向的学习路线思维导图

这里为大家提供了总的路线图。它的用处就在于，你可以按照上面的知识点去找对应的学习资源，保证自己学得较为全面。如果下面这个学习路线能帮助大家将AI利用到自身工作上去，那么我的使命也就完成了：

stable diffusion新手0基础入门PDF

AI绘画必备工具

温馨提示：篇幅有限，已打包文件夹，获取方式在：文末

AI绘画基础+速成+进阶使用教程

观看零基础学习视频，看视频学习是最快捷也是最有效果的方式，跟着视频中老师的思路，从基础到深入，还是很容易入门的。

12000+AI关键词大合集

这份完整版的AI绘画资料我已经打包好，戳下方蓝色字体，即可免费领取！

CSDN大礼包：《全套AI绘画基础学习资源包》免费分享

你可能感兴趣的:(stable,diffusion)

stable diffusion-系统课程：0基础系统性学习AI绘画，小白也能轻松上手顺心网创
本课程是AI绘画工具stablediffusion的系统课程，内容通俗且细致，让小白也能上手。课程大纲基础部分1.前置要求+整合包安装+启动器使用2.纯净原版安装+使用介绍3.文生图精讲4.图生图精讲5.涂鸦、局部重绘、涂鸦重绘6.上传蒙版、批量处理7.模型精讲8.提示词精讲9.插件的认识与安装10.脚本的安装及使用11controlnet基础讲解12.cn-线性控制类型13.cn-深度和法线进阶
Duckdb处理excel文件 __风__ duckdb excel
duckdb通过xlsx扩展读写excel文件，但是不支持xls格式。具体可以参考https://duckdb.org/docs/stable/guides/file_formats/excel_importhttps://duckdb.org/docs/stable/guides/file_formats/excel_export常用测试例子：使用duckdbcli工具将PG的数据导入到exce
银河麒麟V10离线安装Docker checkQQ 安装部署记录 Devops工具使用 Liunx运维工具 docker 容器运维
场景：内网环境，无法连接公网，需要在麒麟系统部署一个docker环境运行容器。一、准备docker离线安装包：Indexoflinux/static/stable/x86_64/https://download.docker.com/linux/static/stable/x86_64/选择合适的版本，这里个人选择的20.10.14二、上传压缩包到服务器后进行解压tar--strip-compon
扩散模型（Diffusion Model）简介
参考：Diffusionmodel—扩散模型-CSDN博客；由浅入深了解DiffusionModel-知乎；https://arxiv.org/abs/2308.093881.概述扩散模型是一种生成模型。可用在视觉生成任务上，如图像超分辨率、去模糊、JPEG伪影移除、阴影移除、去雾/霾/雨等等。扩散模型分为前向（扩散）过程和逆过程。前向过程逐步为图像增加逐像素噪声，直到图像满足高斯噪声；逆
鲲鹏麒麟离线安装Docker angushine docker
服务器信息[root@testinstall]#cat/etc/kylin-releaseKylinLinuxAdvancedServerreleaseV10(Tercel)下载安装包访问https://download.docker.com/linux/static/stable/aarch64/找到合适的版本，这里采用18.09.9这个版本访问如下链接下载安装包wgethttps://down
科研：diffusion生成MNIST程序实现 Menger_Wen 科研：diffusion 人工智能机器学习 stable diffusion python
科研：diffusion生成MNIST程序实现第一部分：填写部分的详细解释1.`diffusion.py`中的`batch_extend_like`方法2.`diffusion.py`中的`ode_reverse`方法3.`sde_schedule.py`中的`sde_forward`方法第二部分：逐行解释两个程序1.`diffusion.py`（Diffusion类）`__init__`方法`b
AI人工智能领域，Stable Diffusion掀起的技术风暴 AI大模型应用工坊人工智能 stable diffusion ai
AI人工智能领域，StableDiffusion掀起的技术风暴关键词：AI人工智能、StableDiffusion、技术风暴、图像生成、扩散模型摘要：本文深入探讨了AI人工智能领域中StableDiffusion所掀起的技术风暴。首先介绍了StableDiffusion的背景，包括其目的、预期读者和文档结构等。详细阐述了核心概念与联系，通过文本示意图和Mermaid流程图进行清晰展示。对核心算法原
【论文阅读】Few-Shot PPG Signal Generation via Guided Diffusion Models Bosenya12 论文阅读
从少量样本数据选择到后处理的整体框架。首先，扩散模型在N样本数据集和指导下的训练。接着，模型生成一个增强的数据集，并进一步优化以提高保真度。最后，这些合成数据与少量样本训练数据集结合，用于基准模型的训练和评估。数据分布从最初的红色变为保真度增强的蓝色，这表明模型与真实数据更加吻合，如简化后的数据分布示意图所示。这篇文章的核心内容是介绍了一种名为BG-Diff（Bi-GuidedDiffusion）
python笔记-Selenium谷歌浏览器驱动下载 hero.zhong python 笔记 selenium
Selenium谷歌浏览器驱动下载地址：https://googlechromelabs.github.io/chrome-for-testing/#stable下面是遇到的问题：python网络爬虫技术中使用谷歌浏览器代码，报错：OSError:[WinError193]%1不是有效的Win32应用程序：遇到错误OSError:[WinError193]%1不是有效的Win32应用程序通常意味着
【Linux学习】Linux安装并配置Redis
安装Redis在Linux系统上安装Redis可以通过包管理器或源码编译两种方式进行。以下是两种方法的详细步骤。使用包管理器安装Redis（以Ubuntu为例）：sudoaptupdatesudoaptinstallredis-server通过源码编译安装Redis：wgethttps://download.redis.io/redis-stable.tar.gztar-xzvfredis-sta
DPDK网卡PMD驱动风流网民 DPDK DPDK
以/home/user/dpdk-stable-18.11.11/drivers/net/i40e目录下的驱动为例源代码文件有#lsbasei40e_ethdev_vf.ci40e_logs.hi40e_regs.hi40e_rxtx_vec_altivec.ci40e_rxtx_vec_neon.ci40e_vf_representor.crte_pmd_i40e.ci40e_ethdev.ci
《从Backprop到Diffusion：深度学习的算法进化树全景图》 HeartException 学习人工智能
前言前些天发现了一个巨牛的人工智能免费学习网站，通俗易懂，风趣幽默，忍不住分享一下给大家。点击跳转到网站《从Backprop到Diffusion：深度学习的算法进化树全景图》**展开系统性解析。全文基于算法原理-技术突破-产业重塑的三层逻辑链，融合2025年最新研究成果与产业数据，呈现深度学习四十年的底层技术迁徙路径从Backprop到Diffusion：深度学习的算法进化树全景图副标题：一部算法
PHP接单涨薪系列（九）之计算机视觉实战：PHP+Stable Diffusion接单指南（2025高溢价秘籍）攻城狮凌霄 PHP PHP接单涨薪 AI php 计算机视觉 stable diffusion
案例场景某电商公司使用本方案后，产品图制作成本降低90%，广告转化率提升35%，单月节省设计费用超¥80,000。本文将彻底解密如何用PHP+AI视觉技术接取高单价设计外包，让你在竞争激烈的市场中脱颖而出！一、视觉设计市场的AI革命1.1传统设计vsAI设计设计任务传统流程AI流程需求沟通初稿设计反复修改最终交付AI生成微调即时交付2025年设计市场数据对比：指标传统设计AI设计提升幅度单图制作时
Stable Diffusion生成素描风格的技术要点 AI智能应用 stable diffusion 人工智能 ai
StableDiffusion生成素描风格的技术要点关键词：StableDiffusion、素描风格、图像生成、技术要点、AI绘画摘要：本文围绕StableDiffusion生成素描风格图像展开，深入探讨了其中的技术要点。先介绍了StableDiffusion和素描风格的背景知识，接着详细解释了核心概念，包括StableDiffusion的工作原理和素描风格的特征。然后阐述了生成素描风格图像的核心
NVIDIA Isaac GR00T N1.5 人形机器人强化学习入门教程（五）强化学习与机器人控制仿真机器人与具身智能人工智能机器人深度学习神经网络强化学习模仿学习具身智能
系列文章目录目录系列文章目录前言一、更深入的理解1.1实体化动作头微调1.1.1实体标签1.1.2工作原理1.1.3支持的实现1.2高级调优参数1.2.1模型组件1.2.1.1视觉编码器（tune_visual）1.2.1.2语言模型（tune_llm）1.2.1.3投影器（tune_projector）1.2.1.4扩散模型（tune_diffusion_model）1.2.2理解数据转换1.2
Flutter开发 -flutter1.22.x升级踩坑记 CodingFire Flutter实用开发技巧合集 Flutter升级 flutter1.22 升级 1.22.1 flutterSDK升级
1.22版本相关：flutterSDK：1.22.1（目前最新版为1.22.2）dart：2.10.1LHHdeMacBook-Pro:nextzcy$dart--versionDartSDKversion:2.10.1(stable)(TueOct610:54:
MIT 6.S184 Lec01 Flow and Diffusion Models 克斯维尔的明天_ 机器学习人工智能
MIT6.S184Lec01FlowandDiffusionModels本节中，我们将描述如何通过模拟一个适当构造的微分方程来获得所需的转换。例如，流匹配和扩散模型分别涉及模拟常微分方程（ODE）和随机微分方程（SDE）。因此，本节的目标是定义和构建这些生成模型。具体来说，我们首先定义ODE和SDE，并讨论它们的模拟。其次，我们描述如何使用深度神经网络对ODE/SDE进行参数化。从中推导出流模型和
WebRTC入门与提高2：WebRTC开发环境音视频开发老马音视频开发流媒体服务器音视频实时音视频视频编解码 webrtc c++
2.1安装vscode下载和安装vscodevscode官网：VisualStudioCode-CodeEditing.Redefined下载地址：https://vscode.cdn.azure.cn/stable/1b8e8302e405050205e69b59abb3559592bb9e60/VSCodeUserSetup-x64-1.31.1.exe下载完后按引导安装即可2.1.1配置vs
Step-by-Step Diffusion&Flow Model Notes 克斯维尔的明天_ 机器学习人工智能深度学习算法
Step-by-StepNotesFundamentalsofDiffusion生成模型的目标与扩散模型的基本思想生成模型的目标生成模型的目的是给定一组来自某个未知分布p∗(x)p^{*}(x)p∗(x)的独立同分布(i.i.d.)样本，构建一个采样器，能够近似地从相同的分布中生成新的样本。例如，假设我们有一组狗的图像训练集，这些图像来自某个潜在分布pdogp_{\text{dog}}pdog，我
离线配置vscode ssh信息 ma1096539894 vscode ide 编辑器
1、在有网环境下下载vscode**.exe；并配置ssh插件***;2、将.vscode路径下的extensions。替换到内网同路径下;3、获取内网的vscode唯一id;4、在外网中拼接id;下载对应的https://vscode.download.prss.microsoft.com/dbazure/download/stable/******************/vscode-ser
Dimba: Transformer-Mamba Diffusion Models————3 Methodology
图解图片中的每个模块详解1.文本输入(Text)描述：输入的文本描述了一个具有具体特征的场景。功能：提供关于要生成图像的详细信息。2.T5模型(TexttoFeature)描述：使用T5模型将文本转换为特征向量。功能：提取文本中的语义信息，为后续的图像生成提供条件。3.图像输入(Image)描述：输入图像通过变分自编码器(VAE)编码器处理。功能：将图像转换为潜在表示，用于添加噪声并进行扩散过程。
1、快速上手 [代码级手把手解析diffusers库] Yuezero_ AIGC 人工智能深度学习
快速上手Pipeline内部执行步骤后续更新计划diffusers是HuggingFace推出的一个diffusion库，它提供了简单方便的diffusion推理训练pipe，同时拥有一个模型和数据社区，代码可以像torchhub一样直接从指定的仓库去调用别人上传的数据集和pretraincheckpoint。除此之外，安装方便，代码结构清晰，注释齐全，二次开发会十分有效率。diffusers使用
linux深度学习问题汇总不想改代码备忘录 linux python 深度学习 pytorch 人工智能 1024程序员节
目录一、异常问题1.segementationfault(coredump)2.Illegalinstruction(coredumped)3.死锁4.掉卡二、通用方法1.查看重启记录2.系统性能监控3.后台执行命令4.异常日志三、深度学习技术1.普通网络改DDP训练，单机多卡，pytorch四、专业内容方法1.微调diffusion类模型本文记录一些在使用linux服务器进行深度学习时遇到的问题
离线安装 Docker 和 Docker Compose 教程海洋猿云原生 docker 运维 linux ubuntu
一、离线安装（一）安装Docker下载Docker安装包访问Docker官方静态安装包页面：https://download.docker.com/linux/static/stable/x86_64/Indexoflinux/static/stable/x86_64/解压安装包并移动文件tar-xvfdocker-27.1.0.tgzmvdocker/*/usr/bin/将Docker注册为sy
深入了解Stable Diffusion：解锁AI图像生成的神秘密码 ????? DTcode7 AI生产力 AI AIGC stable diffusion AI生产力前沿
深入了解StableDiffusion：解锁AI图像生成的神秘密码?????StableDiffusion：AI的像素炼金术士基础概念：从扩散到聚焦的魔法技术深潜：核心机制解析反向扩散算法代码实验室：动手实践StableDiffusion的魔法示例一：一句话，一个世界示例二：风格迁移的艺术实战技巧与最佳实践实际挑战与解决方案结语：艺术与科技的无限对话在这个数字洪流涌动的时代，AI图像生成技术正以前
利用Python驾驭Stable Diffusion：原理解析、扩展开发与高级应用
个人网站:【摸鱼游戏】【神级代码资源网站】【星海网址导航】摸鱼、技术交流群点此查看详情引言随着生成式AI的迅猛发展，StableDiffusion已成为图像生成领域最受欢迎的开源模型之一。其以开放性、高质量输出和广泛社区支持赢得了无数开发者的青睐。本文将从原理出发，结合Python工具链，深入剖析如何掌握StableDiffusion的本质，并基于其能力进行扩展开发与高级应用。一、StableDi
ClickHouse：在 CentOS7.4 中编译 ClickHouse
目录一、环境准备二、创建编译使用的脚本三、编译ClickHouse一、环境准备1.1、CentOS版本为7.4.17081.2、从githubcloneClickHouse源码，checkout到tagv21.2.6.1-stable。cloneClickHOuse代码的时候需要把依赖的子项目也都clone下来，命令如下：gitclone--recursivehttps://github.com/
AI绘画背后的技术：Stable Diffusion原理详解与实战 AI学长带你学AI ai
AI绘画背后的技术：StableDiffusion原理详解与实战关键词：StableDiffusion、扩散模型、AI绘画、潜在空间、文本生成图像摘要：本文将带你揭开AI绘画“魔法”背后的核心技术——StableDiffusion的神秘面纱。我们会用“给小学生讲故事”的方式，从生活中的例子出发，逐步解释扩散模型的底层逻辑、StableDiffusion的关键创新，并用Python代码实战演示如何生
Stable Diffusion 项目实战落地：从0到1 掌握ControlNet：打造光影字形的创意秘技第一篇 w风雨无阻w AI应用实践 stable diffusion AI作画人工智能 ai绘画 AIGC
大家好呀，欢迎来到AI造字工坊！在这篇文章中，我们将带领你走进一个神奇的世界——ControlNet。你可能听说过它，但可能还没摸清它的深奥之处。今天，我们就来揭开它神秘的面纱，轻松带你玩转字形设计！话说回来，相信大家对图片生成、提示词、放大操作、抽卡这些基本操作已经不陌生了吧？从最初的“小白”，到如今的“AI图片小达人”，我们已经走过了不少路程。但今天，不同于以前的步骤，我们要接触到一个更加强大
【安装Stable Diffusion以及遇到问题和总结】岁月玲珑 AI stable diffusion AI编程 AI作画
在本地安装部署StableDiffusion，需要准备好硬件环境，安装相关依赖，然后配置模型。下面为你详细介绍安装部署的步骤：一、硬件要求显卡：需要NVIDIAGPU，显存至少6GB，推荐8GB及以上。系统：Windows10/11、Linux（Ubuntu等）或macOS（需要Rosetta2）。内存：至少16GBRAM。存储空间：准备10GB以上的可用空间。二、软件准备首先要安装Python和
Linux的Initrd机制被触发 linux
Linux 的 initrd 技术是一个非常普遍使用的机制，linux2.6 内核的 initrd 的文件格式由原来的文件系统镜像文件转变成了 cpio 格式，变化不仅反映在文件格式上， linux 内核对这两种格式的 initrd 的处理有着截然的不同。本文首先介绍了什么是 initrd 技术，然后分别介绍了 Linux2.4 内核和 2.6 内核的 initrd 的处理流程。最后通过对 Lin
maven本地仓库路径修改 bitcarter maven
默认maven本地仓库路径：C:\Users\Administrator\.m2 修改maven本地仓库路径方法： 1.打开E:\maven\apache-maven-2.2.1\conf\settings.xml 2.找到
XSD和XML中的命名空间 darrenzhu xml xsd schema namespace 命名空间
http://www.360doc.com/content/12/0418/10/9437165_204585479.shtml http://blog.csdn.net/wanghuan203/article/details/9203621 http://blog.csdn.net/wanghuan203/article/details/9204337 http://www.cn
Java 求素数运算周凡杨 java 算法素数
网络上对求素数之解数不胜数，我在此总结归纳一下，同时对一些编码，加以改进，效率有成倍热提高。第一种：原理: 6N(+-)1法任何一个自然数，总可以表示成为如下的形式之一： 6N，6N+1，6N+2，6N+3，6N+4，6N+5 (N=0，1，2，…)
java 单例模式 g21121 java
想必单例模式大家都不会陌生，有如下两种方式来实现单例模式： class Singleton { private static Singleton instance=new Singleton(); private Singleton(){} static Singleton getInstance() { return instance; }
Linux下Mysql源码安装 510888780 mysql
1.假设已经有mysql-5.6.23-linux-glibc2.5-x86_64.tar.gz (1)创建mysql的安装目录及数据库存放目录解压缩下载的源码包，目录结构，特殊指定的目录除外：
32位和64位操作系统墙头上一根草 32位和64位操作系统
32位和64位操作系统是指：CPU一次处理数据的能力是32位还是64位。现在市场上的CPU一般都是64位的，但是这些CPU并不是真正意义上的64 位CPU，里面依然保留了大部分32位的技术，只是进行了部分64位的改进。32位和64位的区别还涉及了内存的寻址方面，32位系统的最大寻址空间是2 的32次方= 4294967296（bit）= 4（GB）左右，而64位系统的最大寻址空间的寻址空间则达到了
我的spring学习笔记10-轻量级_Spring框架 aijuans Spring 3
一、问题提问： → 请简单介绍一下什么是轻量级？轻量级（Leightweight）是相对于一些重量级的容器来说的，比如Spring的核心是一个轻量级的容器，Spring的核心包在文件容量上只有不到1M大小，使用Spring核心包所需要的资源也是很少的，您甚至可以在小型设备中使用Spring。
mongodb 环境搭建及简单CURD antlove Web Install curd NoSQL mongo
一搭建mongodb环境 1. 在mongo官网下载mongodb 2. 在本地创建目录 "D:\Program Files\mongodb-win32-i386-2.6.4\data\db" 3. 运行mongodb服务 [mongod.exe --dbpath "D:\Program Files\mongodb-win32-i386-2.6.4\data\
数据字典和动态视图百合不是茶 oracle 数据字典动态视图系统和对象权限
数据字典（data dictionary）是 Oracle 数据库的一个重要组成部分，这是一组用于记录数据库信息的只读（read-only）表。随着数据库的启动而启动,数据库关闭时数据字典也关闭数据字典中包含数据库中所有方案对象（schema object）的定义(包括表，视图，索引，簇，同义词，序列，过程，函数，包，触发器等等) 数据库为一
多线程编程一般规则 bijian1013 java thread 多线程 java多线程
如果两个工两个以上的线程都修改一个对象，那么把执行修改的方法定义为被同步的，如果对象更新影响到只读方法，那么只读方法也要定义成同步的。不要滥用同步。如果在一个对象内的不同的方法访问的不是同一个数据，就不要将方法设置为synchronized的。
将文件或目录拷贝到另一个Linux系统的命令scp bijian1013 linux unix scp
一.功能说明 scp就是security copy，用于将文件或者目录从一个Linux系统拷贝到另一个Linux系统下。scp传输数据用的是SSH协议，保证了数据传输的安全，其格式如下： scp 远程用户名@IP地址：文件的绝对路径
【持久化框架MyBatis3五】MyBatis3一对多关联查询 bit1129 Mybatis3
以教员和课程为例介绍一对多关联关系，在这里认为一个教员可以叫多门课程，而一门课程只有1个教员教，这种关系在实际中不太常见，通过教员和课程是多对多的关系。示例数据：地址表： CREATE TABLE ADDRESSES ( ADDR_ID INT(11) NOT NULL AUTO_INCREMENT, STREET VAR
cookie状态判断引发的查找问题 bitcarter form cgi
先说一下我们的业务背景： 1.前台将图片和文本通过form表单提交到后台，图片我们都做了base64的编码，并且前台图片进行了压缩 2.form中action是一个cgi服务 3.后台cgi服务同时供PC，H5，APP 4.后台cgi中调用公共的cookie状态判断方法（公共的，大家都用，几年了没有问题）问题：（折腾两天。。。。） 1.PC端cgi服务正常调用，cookie判断没
通过Nginx,Tomcat访问日志(access log)记录请求耗时 ronin47
一、Nginx通过$upstream_response_time $request_time统计请求和后台服务响应时间 nginx.conf使用配置方式： log_format main '$remote_addr - $remote_user [$time_local] "$request" ''$status $body_bytes_sent "$http_r
java-67- n个骰子的点数。把n个骰子扔在地上，所有骰子朝上一面的点数之和为S。输入n，打印出S的所有可能的值出现的概率。 bylijinnan java
public class ProbabilityOfDice { /** * Q67 n个骰子的点数 * 把n个骰子扔在地上，所有骰子朝上一面的点数之和为S。输入n，打印出S的所有可能的值出现的概率。 * 在以下求解过程中，我们把骰子看作是有序的。 * 例如当n=2时，我们认为（1，2）和（2，1）是两种不同的情况 */ private stati
看别人的博客，觉得心情很好 Cb123456 博客心情
以为写博客，就是总结，就和日记一样吧，同时也在督促自己。今天看了好长时间博客: 职业规划: http://www.iteye.com/blogs/subjects/zhiyeguihua android学习: 1.http://byandby.i
[JWFD开源工作流]尝试用原生代码引擎实现循环反馈拓扑分析 comsci 工作流
我们已经不满足于仅仅跳跃一次，通过对引擎的升级，今天我测试了一下循环反馈模式，大概跑了200圈，引擎报一个溢出错误在一个流程图的结束节点中嵌入一段方程，每次引擎运行到这个节点的时候，通过实时编译器GM模块，计算这个方程，计算结果与预设值进行比较，符合条件则跳跃到开始节点，继续新一轮拓扑分析，直到遇到
JS常用的事件及方法 cwqcwqmax9 js
事件描述 onactivate 当对象设置为活动元素时触发。 onafterupdate 当成功更新数据源对象中的关联对象后在数据绑定对象上触发。 onbeforeactivate 对象要被设置为当前元素前立即触发。 onbeforecut 当选中区从文档中删除之前在源对象触发。 onbeforedeactivate 在 activeElement 从当前对象变为父文档其它对象之前立即
正则表达式验证日期格式 dashuaifu 正则表达式 IT其它 java其它
正则表达式验证日期格式 function isDate(d){ var v = d.match(/^(\d{4})-(\d{1,2})-(\d{1,2})$/i); if(!v) { this.focus(); return false; } } <input value="2000-8-8" onblu
Yii CModel.rules() 方法、validate预定义完整列表、以及说说验证 dcj3sjt126com yii
public array rules () {return} array 要调用 validate() 时应用的有效性规则。返回属性的有效性规则。声明验证规则，应重写此方法。每个规则是数组具有以下结构：array('attribute list', 'validator name', 'on'=>'scenario name', ...validation
UITextAttributeTextColor = deprecated in iOS 7.0 dcj3sjt126com ios
In this lesson we used the key "UITextAttributeTextColor" to change the color of the UINavigationBar appearance to white. This prompts a warning "first deprecated in iOS 7.0." Ins
判断一个数是质数的几种方法 EmmaZhao Math python
质数也叫素数，是只能被1和它本身整除的正整数，最小的质数是2，目前发现的最大的质数是p=2^57885161-1【注1】。判断一个数是质数的最简单的方法如下： def isPrime1(n): for i in range(2, n): if n % i == 0: return False return True 但是在上面的方法中有一些冗余的计算，所以
SpringSecurity工作原理小解读坏我一锅粥 SpringSecurity
SecurityContextPersistenceFilter ConcurrentSessionFilter WebAsyncManagerIntegrationFilter HeaderWriterFilter CsrfFilter LogoutFilter Use
JS实现自适应宽度的Tag切换 ini JavaScript html Web css html5
效果体验：http://hovertree.com/texiao/js/3.htm 该效果使用纯JavaScript代码，实现TAB页切换效果，TAB标签根据内容自适应宽度，点击TAB标签切换内容页。 HTML文件代码： <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"
Hbase Rest API : 数据查询 kane_xie REST hbase
hbase（hadoop）是用java编写的，有些语言（例如python）能够对它提供良好的支持，但也有很多语言使用起来并不是那么方便，比如c#只能通过thrift访问。Rest就能很好的解决这个问题。Hbase的org.apache.hadoop.hbase.rest包提供了rest接口，它内嵌了jetty作为servlet容器。启动命令：./bin/hbase rest s
JQuery实现鼠标拖动元素移动位置（源码+注释）明子健 jquery js 源码拖动鼠标
欢迎讨论指正！ print.html代码： <!DOCTYPE html> <html> <head> <meta http-equiv=Content-Type content="text/html;charset=utf-8"> <title>发票打印</title> &l
Postgresql 连表更新字段语法 update qifeifei PostgreSQL
下面这段sql本来目的是想更新条件下的数据，可是这段sql却更新了整个表的数据。sql如下： UPDATE tops_visa.visa_order SET op_audit_abort_pass_date = now() FROM tops_visa.visa_order as t1 INNER JOIN tops_visa.visa_visitor as t2 ON t1.
将redis,memcache结合使用的方案? tcrct redis cache
公司架构上使用了阿里云的服务，由于阿里的kvstore收费相当高，打算自建，自建后就需要自己维护，所以就有了一个想法，针对kvstore(redis)及ocs(memcache)的特点，想自己开发一个cache层，将需要用到list，set，map等redis方法的继续使用redis来完成，将整条记录放在memcache下，即findbyid，save等时就memcache，其它就对应使用redi
开发中遇到的诡异的bug wudixiaotie bug
今天我们服务器组遇到个问题：我们的服务是从Kafka里面取出数据，然后把offset存储到ssdb中，每个topic和partition都对应ssdb中不同的key，服务启动之后，每次kafka数据更新我们这边收到消息，然后存储之后就发现ssdb的值偶尔是-2,这就奇怪了，最开始我们是在代码中打印存储的日志，发现没什么问题，后来去查看ssdb的日志，才发现里面每次set的时候都会对同一个key