conditional GAN
It is just element-wise multiplication; the shapes must match:
[1,2,3] .* [2,2,2]
ans = [2,4,6]
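The same element-wise product written in NumPy (the `.*` above is MATLAB notation):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
print(a * b)  # element-wise product; shapes must match (or broadcast)
# → [2 4 6]
```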
φt refers to the text embedding (produced by a pre-trained model)
μ(φt): the mean
Σ(φt): diagonal covariance matrix
I0 : real image
t : the text description
pdata : true data distribution.
z is a noise vector randomly sampled from a given distribution pz (e.g., Gaussian distribution used in this paper).
G0
φt is fed into a fully connected layer to produce μ0 and σ0 (σ0 are the values on the diagonal of Σ0)
c0 = μ0 + σ0 ⊙ ε (where ⊙ is element-wise multiplication, ε ∼ N(0, I))
c0 is then concatenated with an Nz-dimensional noise vector
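The conditioning-augmentation steps above can be sketched in PyTorch as follows. The embedding size (1024), condition size (128), and Nz (100) are placeholder assumptions, and producing σ0 directly from the FC layer is a simplification:

```python
import torch
import torch.nn as nn

class CondAugment(nn.Module):
    """Conditioning augmentation: phi_t -> (mu0, sigma0) -> c0 via reparameterization."""
    def __init__(self, embed_dim=1024, c_dim=128):
        super().__init__()
        # a single FC layer outputs both mu0 and sigma0 (simplified parameterization)
        self.fc = nn.Linear(embed_dim, c_dim * 2)
        self.c_dim = c_dim

    def forward(self, phi_t):
        out = self.fc(phi_t)
        mu0, sigma0 = out[:, :self.c_dim], out[:, self.c_dim:]
        eps = torch.randn_like(sigma0)   # eps ~ N(0, I)
        return mu0 + sigma0 * eps        # c0 = mu0 + sigma0 ⊙ eps

ca = CondAugment()
phi_t = torch.randn(4, 1024)             # a batch of text embeddings (assumed size)
c0 = ca(phi_t)
z = torch.randn(4, 100)                  # Nz-dimensional noise; Nz = 100 is an assumption
g_input = torch.cat([c0, z], dim=1)      # c0 concatenated with the noise vector
```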
D0
Both real and fake images are fed through down-sampling layers
The text embedding is passed through a fully connected layer to become Md × Md × Nd (4 x 4 x 128)
The two are concatenated; a 1x1 convolution plus a fully connected layer with a single node outputs [ 0 or 1 ]
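A minimal sketch of that text-conditioning path in the discriminator, assuming a 1024-dim text embedding and 512 image-feature channels (both placeholder values); the 4 x 4 x 128 text block follows the note above:

```python
import torch
import torch.nn as nn

Md, Nd = 4, 128        # text features replicated to an Md x Md x Nd block
IMG_FEAT = 512         # channels after image down-sampling (assumption)

fc_text = nn.Linear(1024, Nd)                      # compress the text embedding
joint = nn.Sequential(
    nn.Conv2d(IMG_FEAT + Nd, 512, kernel_size=1), # 1x1 conv over the concatenated features
    nn.LeakyReLU(0.2),
)
logit = nn.Linear(512 * Md * Md, 1)                # single-node FC -> real/fake score

img_feat = torch.randn(4, IMG_FEAT, Md, Md)        # down-sampled image features
phi_t = torch.randn(4, 1024)
# replicate the text features spatially so they tile the 4x4 feature map
txt = fc_text(phi_t).view(4, Nd, 1, 1).expand(-1, -1, Md, Md)
h = joint(torch.cat([img_feat, txt], dim=1))
score = torch.sigmoid(logit(h.flatten(1)))         # value in [0, 1]
```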
G1
As shown in the figure, this stage needs the conditioning augmentation output and the Stage-I result:
c is obtained and concatenated with the down-sampled Stage-I image features,
passed through residual blocks, then up-sampled,
outputting a 256 x 256 image
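The Stage-II pipeline above (down-sample the Stage-I image, fuse with c, residual blocks, up-sample to 256 x 256) can be sketched as below. All channel counts and the number of residual blocks are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return x + self.block(x)   # residual connection

C_DIM, CH = 128, 64   # condition dim / base channels (assumptions)

down = nn.Sequential(  # 64x64 Stage-I image -> 16x16 feature map
    nn.Conv2d(3, CH, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(CH, CH * 2, 4, stride=2, padding=1), nn.ReLU(),
)
fuse = nn.Conv2d(CH * 2 + C_DIM, CH * 2, 3, padding=1)
res = nn.Sequential(ResBlock(CH * 2), ResBlock(CH * 2))
up = nn.Sequential(    # 16x16 -> 256x256 via four 2x up-sampling steps
    *[layer for _ in range(4) for layer in (
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2d(CH * 2, CH * 2, 3, padding=1), nn.ReLU())],
    nn.Conv2d(CH * 2, 3, 3, padding=1), nn.Tanh(),
)

stage1_img = torch.randn(2, 3, 64, 64)
c = torch.randn(2, C_DIM)
feat = down(stage1_img)
# replicate c spatially, concat with the down-sampled features, then res + upsample
c_spatial = c.view(2, C_DIM, 1, 1).expand(-1, -1, 16, 16)
out = up(res(fuse(torch.cat([feat, c_spatial], dim=1))))
```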
D1
input:
1. a real 256x256 image
2. a 256x256 image produced by G1
output:
[ 0 or 1]
Upsample
https://stackoverflow.com/questions/29587179/load-pickle-filecomes-from-python3-in-python2
python3:
import pickle

import pandas as pd

df = pd.read_pickle('text2ImgData.pkl')
with open("pkle.txt", "wb") as f:
    f.write(pickle.dumps(df, 2))  # protocol 2 so Python 2 can load it
python2:
import cPickle

with open("../dataset/pkle.txt", "rb") as f:
    a = cPickle.loads(f.read())
print(a)
np_array = df.values  # .values is an attribute, not a method — calling it raises TypeError
l = [['1', ' 1', ' 3'], ['2', ' 3', ' 5'], ['3'], ['4', ' 5'], ['5', ' 1'], ['6', ' 6'], ['7']]
result = [map(int, i) for i in l]  # Python 3: [list(map(int, i)) for i in l], since map returns an iterator
[[1, 1, 3], [2, 3, 5], [3], [4, 5], [5, 1], [6, 6], [7]]
======================================================
numpy:
np.array(i).astype(int)
pip install tensorboard-pytorch tensorboardX
RuntimeError: expected Double tensor (got Float tensor)
A fix would be to call .double() on your model
(or .float() on the input)
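A minimal reproduction of that dtype mismatch and its fix, using a hypothetical one-layer model:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                       # parameters are float32 by default
x64 = torch.randn(2, 3, dtype=torch.float64)  # double-precision input -> dtype mismatch

try:
    model(x64)                                # raises RuntimeError (Double vs Float)
except RuntimeError as e:
    print("mismatch:", e)

y = model(x64.float())                        # fix: cast the input to float32
# or cast the model instead: model.double()(x64)
```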
The folks online are really brilliant
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace name.ipynb
Fix:
pip install --upgrade pip
I chose StackGAN as my text-to-image model this time, because its results are better than most of the others.
This TensorFlow implementation was my first reference architecture, but I gave up because the code was so hard to understand and the PrettyTensor module annoyed me so much. So I took this PyTorch implementation as my final reference.
Since StackGAN is separated into two stages, the first output image size is 64 x 64 and the second output is a 256 x 256 image. This is the result after training for 100 epochs.
64 x 64 (batch_size = 40)
I used the same noise, so the result looks like this:
ID = 3296 [list([‘9’, ‘2’, ‘17’, ‘9’, ‘521’, ‘1’, ‘6’, ‘11’, ‘13’, ‘18’, ‘3’, ‘626’, ‘89’, ‘8’, ‘21’, ‘101’, ‘5427’, ‘5427’, ‘5427’, ‘5427’])]
ID = 3323 [list([‘4’, ‘1’, ‘15’, ‘22’, ‘3’, ‘44’, ‘13’, ‘18’, ‘7’, ‘2’, ‘10’, ‘6’, ‘141’, ‘3’, ‘113’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’])]
ID = 5187 [list([‘9’, ‘1’, ‘5’, ‘45’, ‘11’, ‘2’, ‘119’, ‘9’, ‘20’, ‘19’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’])]
ID = 8101 [list([‘4’, ‘1’, ‘15’, ‘12’, ‘3’, ‘11’, ‘13’, ‘18’, ‘7’, ‘2’, ‘10’, ‘6’, ‘40’, ‘3’, ‘26’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’])]
ID = 5682 [list([‘9’, ‘1’, ‘15’, ‘20’, ‘13’, ‘18’, ‘7’, ‘8’, ‘25’, ‘33’, ‘7’, ‘53’, ‘27’, ‘61’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’, ‘5427’])]
This PyTorch code is written for the COCO dataset, not the Oxford-Flowers dataset, so I changed the input shape to [data_length, sentences_num, word_embedding] and set the z dimension and condition dimension to 156. The word-embedding content was taken from the TA.
I set D_LearningRate and G_LearningRate to 0.0002, as the author does.
I didn't use a pre-trained model, because the available one is for the COCO dataset. So I trained the 256x256 model after the 64x64 model.