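PyTorch-day08-Advanced Model Training Techniques-checkpoint

Advanced PyTorch Model Training Techniques

  • Custom loss functions
  • Dynamically adjusting the learning rate

Typical case: the loss oscillates up and down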
[Figure: training loss curve oscillating up and down]

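1. Custom loss functions

  • 1. PyTorch already ships many common loss functions, but some less general ones are not provided, for example DiceLoss (HuberLoss was likewise missing in older releases; torch.nn.HuberLoss was added in PyTorch 1.9)
  • 2. If the loss still oscillates after the dataset or the hyperparameters have been adjusted, a non-generic or custom loss function may work better for that particular model

For example, DiceLoss is a loss function commonly used in medical image segmentation. It is defined as follows: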
Dice(X, Y) = 2∣X∩Y∣ / (∣X∣ + ∣Y∣),    DiceLoss = 1 - Dice(X, Y)

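  • The Dice coefficient is a set-similarity measure, typically used to compute the similarity of two samples; its value lies in [0, 1].
  • ∣X∩Y∣ is the size of the intersection of X and Y, and ∣X∣ and ∣Y∣ are the numbers of elements in X and Y. The factor 2 in the numerator is there because the denominator counts the elements shared by X and Y twice.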
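
As a quick check of the definition above, here is a minimal sketch that evaluates the Dice coefficient on small Python sets. The function name `dice_coefficient` and the convention of returning 1.0 for two empty sets are assumptions made for this illustration, not part of the original notebook.

```python
def dice_coefficient(x, y):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) of two finite sets."""
    x, y = set(x), set(y)
    if not x and not y:
        # Both sets empty: treat them as identical (a convention assumed here)
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Pixel indices marked as foreground by a prediction and by the ground truth
pred = {1, 2, 3, 4}
truth = {3, 4, 5, 6}

print(dice_coefficient(pred, truth))   # 2 * 2 / (4 + 4) = 0.5
print(dice_coefficient(pred, pred))    # identical sets give 1.0
print(dice_coefficient(pred, {7, 8}))  # disjoint sets give 0.0
```

The shared elements {3, 4} are counted once in ∣X∣ and once in ∣Y∣, which is why the numerator is doubled to keep the value in [0, 1].
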
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.lr_scheduler import StepLR
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np
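# DiceLoss: the loss function used by the V-Net medical image segmentation model
class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        # weight and size_average are accepted but not used by this implementation
        super().__init__()

    def forward(self, inputs, targets, smooth=1):
        # Map raw logits to probabilities in (0, 1)
        inputs = torch.sigmoid(inputs)
        # Flatten predictions and labels to 1-D (reshape also handles non-contiguous tensors)
        inputs = inputs.reshape(-1)
        targets = targets.reshape(-1)
        intersection = (inputs * targets).sum()
        # smooth keeps the ratio defined when both masks are empty
        dice_loss = 1 - (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)

        return dice_loss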
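
The arithmetic of the forward pass above can be checked by hand. The sketch below mirrors it in plain Python on lists of probabilities, that is, on values after the sigmoid. The name `soft_dice_loss` is hypothetical and the inputs are made-up toy masks. It is meant only to show what the `smooth` term does, not to replace the tensor implementation.

```python
def soft_dice_loss(probs, targets, smooth=1.0):
    """Plain-Python mirror of 1 - (2*I + smooth) / (sum(p) + sum(t) + smooth)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1 - (2 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


perfect = soft_dice_loss([1, 1, 0, 0], [1, 1, 0, 0])   # full overlap
disjoint = soft_dice_loss([1, 1, 0, 0], [0, 0, 1, 1])  # no overlap
empty = soft_dice_loss([0, 0, 0, 0], [0, 0, 0, 0])     # both masks empty

print(perfect)   # 1 - 5/5 = 0.0
print(disjoint)  # 1 - 1/5 = 0.8
print(empty)     # 1 - 1/1 = 0.0

# With smooth=0 the empty case is 0/0 and fails
try:
    soft_dice_loss([0, 0], [0, 0], smooth=0)
except ZeroDivisionError:
    print("undefined without smoothing")
```

With `smooth=1`, two empty masks are scored as a perfect match instead of producing a division by zero.
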
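# Custom loss function for multi-class classification
# cross_entropy + L2 regularization
class MyLoss(torch.nn.Module):
    def __init__(self, weight_decay=0.01, model=None):
        super().__init__()
        self.weight_decay = weight_decay
        # Model whose weights are penalized. It is kept in a plain list so that it
        # is not registered as a submodule of the loss. The loss module owns no
        # parameters itself, so with model=None the L2 term is zero.
        self._models = [model] if model is not None else []

    def forward(self, inputs, targets):
        ce_loss = F.cross_entropy(inputs, targets)
        # Accumulate out of place: an in-place += on a leaf tensor that
        # requires grad raises a RuntimeError
        l2_loss = torch.zeros((), device=inputs.device)
        for model in self._models:
            for name, param in model.named_parameters():
                if 'weight' in name:
                    l2_loss = l2_loss + torch.norm(param)
        loss = ce_loss + self.weight_decay * l2_loss
        return loss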
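
To make the "cross-entropy + L2" combination concrete, here is a small worked example in plain Python for a single sample and a single weight vector. The helper names (`cross_entropy`, `l2_norm`, `ce_plus_l2`) and the toy numbers are assumptions chosen for illustration; the class above sums the norm over every weight tensor rather than one vector.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def l2_norm(weights):
    """Euclidean norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))


def ce_plus_l2(logits, target, weights, weight_decay=0.01):
    """Cross-entropy plus weight_decay times the L2 norm of the weights."""
    return cross_entropy(logits, target) + weight_decay * l2_norm(weights)


# Two equal logits give CE = ln 2; weights (3, 4) have norm 5
loss = ce_plus_l2([0.0, 0.0], 0, [3.0, 4.0])
print(loss)  # ln 2 + 0.01 * 5
```

The penalty grows with the magnitude of the weights, so a larger `weight_decay` pushes the optimizer toward smaller weights at the cost of a higher cross-entropy term.
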

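Note:

  • When a custom loss function involves mathematical operations, it is best to use the tensor operations provided by PyTorch throughout
  • This lets the loss rely on the automatic differentiation built into PyTorch tensors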
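# Hyperparameter definitions
# Batch size
batch_size = 16  # alternatives: 32, 64, 128
# Learning rate of the optimizer
lr = 1e-4
# Number of epochs to run
max_epochs = 2
# Option 2: use "device"; afterwards call .to(device) on everything that should live on the GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  # selects GPU 1, i.e. the second GPU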
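# Data loading
# Building a Dataset, with CIFAR-10 as the example
from torchvision import datasets

# "data_transform" applies transformations to the images, such as flipping, cropping, or normalization; it can be customized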
data_transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
                   ])
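
As a small aside on the transform above: `ToTensor()` scales pixel values to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic for a single channel value in plain Python; the function name `normalize` is chosen for this illustration only.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel arithmetic applied by Normalize: (x - mean) / std."""
    return (x - mean) / std


# Darkest, middle, and brightest values after ToTensor()
for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize(x))
```

With mean 0.5 and std 0.5 on every channel, the input range [0, 1] is mapped to [-1, 1].
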


train_cifar_dataset = datasets.CIFAR10('cifar10',train=True, download=False,transform=data_transform)
test_cifar_dataset = datasets.CIFAR10('cifar10',train=False, download=False,transform=data_transform)

# Once the Dataset is built, DataLoader can read the data in batches
train_loader = torch.utils.data.DataLoader(train_cifar_dataset, 
                                           batch_size=batch_size, num_workers=4, 
                                           shuffle=True, drop_last=True)

test_loader = torch.utils.data.DataLoader(test_cifar_dataset, 
                                         batch_size=batch_size, num_workers=4, 
                                         shuffle=False)
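
The number of iterations per epoch follows directly from the dataset size, `batch_size`, and `drop_last`. The sketch below works this out in plain Python, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images; `batches_per_epoch` is a hypothetical helper, not a PyTorch function.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # An incomplete final batch is discarded
        return num_samples // batch_size
    # An incomplete final batch is kept
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(50_000, 16, drop_last=True))     # 3125
print(batches_per_epoch(10_000, 16, drop_last=False))    # 625
print(batches_per_epoch(50_000, 128, drop_last=True))    # 390
print(batches_per_epoch(50_000, 128, drop_last=False))   # 391
```

The first value, 3125, is the total that appears as `Iter [1/3125]` in the training log further down.
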


# ResNet-50 pretrained on ImageNet ("pretrained=True" is deprecated since torchvision 0.13 in favor of "weights")
Resnet50 = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
# Replace the 1000-class head with a new 10-class layer for CIFAR-10.
# Assigning to fc.out_features would only change the printed attribute, not the shape of the weights.
Resnet50.fc = nn.Linear(Resnet50.fc.in_features, 10)
print(Resnet50)


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=10, bias=True)
)
# Training & validation

# Define the loss function and the optimizer
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Loss function: the custom loss. The loss module owns no parameters itself,
# so its L2 term only takes effect if the model's parameters are made visible to it
criterion = MyLoss()
# Optimizer
optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)
epoch = max_epochs
Resnet50 = Resnet50.to(device)
total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    Resnet50.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    # "step" rather than "iter", to avoid shadowing the built-in iter()
    for step, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        outputs = Resnet50(images)
        loss = criterion(outputs, labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()

        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        # loss is already the batch mean, so the printed value is the batch mean divided by the batch size
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.6f}".format(i+1, epoch, step+1, total_step, loss.item()/labels.shape[0]))

    Resnet50.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    # No gradients are needed during evaluation
    with torch.no_grad():
        for step, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100, test_total_loss / test_total_num, test_total_correct / test_total_num * 100
    ))
    train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
    test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
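
A note on the scale of the reported losses: `F.cross_entropy` averages over the batch by default, so `loss.item()` is already a batch mean. The loop above sums these batch means and divides by the number of samples, which gives a value roughly `batch_size` times smaller than the true per-sample mean. The sketch below illustrates the difference with made-up numbers; both function names are hypothetical.

```python
def mean_of_batch_means_over_samples(batch_means, batch_sizes):
    """What the loop reports: sum of batch means divided by the sample count."""
    return sum(batch_means) / sum(batch_sizes)


def per_sample_mean(batch_means, batch_sizes):
    """True per-sample mean: weight each batch mean by its batch size."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Three batches; the last one is smaller, as happens without drop_last
batch_means = [2.0, 1.0, 3.0]
batch_sizes = [16, 16, 8]

print(mean_of_batch_means_over_samples(batch_means, batch_sizes))  # 6 / 40 = 0.15
print(per_sample_mean(batch_means, batch_sizes))                   # 72 / 40 = 1.8
```

If a true per-sample average is wanted, one option is to accumulate `loss.item() * labels.shape[0]` in the loop; the curves have the same shape either way, only the scale differs.
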

Epoch [1/10], Iter [1/3125], train_loss:0.710159
Epoch [1/10], Iter [2/3125], train_loss:0.761919
Epoch [1/10], Iter [3/3125], train_loss:0.748266
Epoch [1/10], Iter [4/3125], train_loss:0.777146
Epoch [1/10], Iter [5/3125], train_loss:0.699766
Epoch [1/10], Iter [6/3125], train_loss:0.741773
Epoch [1/10], Iter [7/3125], train_loss:0.687201
Epoch [1/10], Iter [8/3125], train_loss:0.618017
Epoch [1/10], Iter [9/3125], train_loss:0.653016
Epoch [1/10], Iter [10/3125], train_loss:0.690120
Epoch [1/10], Iter [11/3125], train_loss:0.648009
Epoch [1/10], Iter [12/3125], train_loss:0.694650
Epoch [1/10], Iter [13/3125], train_loss:0.502452
Epoch [1/10], Iter [14/3125], train_loss:0.538519
Epoch [1/10], Iter [15/3125], train_loss:0.596250
Epoch [1/10], Iter [16/3125], train_loss:0.607648
Epoch [1/10], Iter [17/3125], train_loss:0.574751
Epoch [1/10], Iter [18/3125], train_loss:0.584658
Epoch [1/10], Iter [19/3125], train_loss:0.428719
Epoch [1/10], Iter [20/3125], train_loss:0.530868
Epoch [1/10], Iter [21/3125], train_loss:0.496522
Epoch [1/10], Iter [22/3125], train_loss:0.463315
Epoch [1/10], Iter [23/3125], train_loss:0.453258
Epoch [1/10], Iter [24/3125], train_loss:0.409726
Epoch [1/10], Iter [25/3125], train_loss:0.422388
Epoch [1/10], Iter [26/3125], train_loss:0.414946
Epoch [1/10], Iter [27/3125], train_loss:0.512142
Epoch [1/10], Iter [28/3125], train_loss:0.400936
Epoch [1/10], Iter [29/3125], train_loss:0.405139
Epoch [1/10], Iter [30/3125], train_loss:0.346599
Epoch [1/10], Iter [31/3125], train_loss:0.388829
Epoch [1/10], Iter [32/3125], train_loss:0.389818
Epoch [1/10], Iter [33/3125], train_loss:0.420276
Epoch [1/10], Iter [34/3125], train_loss:0.376930
Epoch [1/10], Iter [35/3125], train_loss:0.385421
Epoch [1/10], Iter [36/3125], train_loss:0.308666
Epoch [1/10], Iter [37/3125], train_loss:0.287350
Epoch [1/10], Iter [38/3125], train_loss:0.235770
Epoch [1/10], Iter [39/3125], train_loss:0.238073
Epoch [1/10], Iter [40/3125], train_loss:0.255732
Epoch [1/10], Iter [41/3125], train_loss:0.351971
Epoch [1/10], Iter [42/3125], train_loss:0.255061
Epoch [1/10], Iter [43/3125], train_loss:0.372930
Epoch [1/10], Iter [44/3125], train_loss:0.294059
Epoch [1/10], Iter [45/3125], train_loss:0.291519
Epoch [1/10], Iter [46/3125], train_loss:0.293720
Epoch [1/10], Iter [47/3125], train_loss:0.313904
Epoch [1/10], Iter [48/3125], train_loss:0.468409
Epoch [1/10], Iter [49/3125], train_loss:0.289942
Epoch [1/10], Iter [50/3125], train_loss:0.314422
Epoch [1/10], Iter [51/3125], train_loss:0.193365
Epoch [1/10], Iter [52/3125], train_loss:0.280942
Epoch [1/10], Iter [53/3125], train_loss:0.194293
Epoch [1/10], Iter [54/3125], train_loss:0.271868
Epoch [1/10], Iter [55/3125], train_loss:0.244220
Epoch [1/10], Iter [56/3125], train_loss:0.203591
Epoch [1/10], Iter [57/3125], train_loss:0.253909
Epoch [1/10], Iter [58/3125], train_loss:0.189856
Epoch [1/10], Iter [59/3125], train_loss:0.251850
Epoch [1/10], Iter [60/3125], train_loss:0.231074
Epoch [1/10], Iter [61/3125], train_loss:0.226731
Epoch [1/10], Iter [62/3125], train_loss:0.175667
Epoch [1/10], Iter [63/3125], train_loss:0.184940
Epoch [1/10], Iter [64/3125], train_loss:0.210438
Epoch [1/10], Iter [65/3125], train_loss:0.190574
Epoch [1/10], Iter [66/3125], train_loss:0.238683
Epoch [1/10], Iter [67/3125], train_loss:0.195508
Epoch [1/10], Iter [68/3125], train_loss:0.152640
Epoch [1/10], Iter [69/3125], train_loss:0.240555
Epoch [1/10], Iter [70/3125], train_loss:0.134351
Epoch [1/10], Iter [71/3125], train_loss:0.183020
Epoch [1/10], Iter [72/3125], train_loss:0.211488
Epoch [1/10], Iter [73/3125], train_loss:0.140310
Epoch [1/10], Iter [74/3125], train_loss:0.162346
Epoch [1/10], Iter [75/3125], train_loss:0.175559
Epoch [1/10], Iter [76/3125], train_loss:0.165264
Epoch [1/10], Iter [77/3125], train_loss:0.232803
Epoch [1/10], Iter [78/3125], train_loss:0.175323
Epoch [1/10], Iter [79/3125], train_loss:0.215453
Epoch [1/10], Iter [80/3125], train_loss:0.229922
Epoch [1/10], Iter [81/3125], train_loss:0.166971
Epoch [1/10], Iter [82/3125], train_loss:0.252459
Epoch [1/10], Iter [83/3125], train_loss:0.175405
Epoch [1/10], Iter [84/3125], train_loss:0.174851
Epoch [1/10], Iter [85/3125], train_loss:0.219277
Epoch [1/10], Iter [86/3125], train_loss:0.200698
Epoch [1/10], Iter [87/3125], train_loss:0.164529
Epoch [1/10], Iter [88/3125], train_loss:0.223835
Epoch [1/10], Iter [89/3125], train_loss:0.132322
Epoch [1/10], Iter [90/3125], train_loss:0.185210
Epoch [1/10], Iter [91/3125], train_loss:0.125042
Epoch [1/10], Iter [92/3125], train_loss:0.127481
Epoch [1/10], Iter [93/3125], train_loss:0.213097
Epoch [1/10], Iter [94/3125], train_loss:0.191506
Epoch [1/10], Iter [95/3125], train_loss:0.169901
Epoch [1/10], Iter [96/3125], train_loss:0.177843
Epoch [1/10], Iter [97/3125], train_loss:0.192217
Epoch [1/10], Iter [98/3125], train_loss:0.186991
Epoch [1/10], Iter [99/3125], train_loss:0.127605
Epoch [1/10], Iter [100/3125], train_loss:0.130038
Epoch [1/10], Iter [101/3125], train_loss:0.139159
Epoch [1/10], Iter [102/3125], train_loss:0.152760
Epoch [1/10], Iter [103/3125], train_loss:0.152227
Epoch [1/10], Iter [104/3125], train_loss:0.128511
Epoch [1/10], Iter [105/3125], train_loss:0.126772
Epoch [1/10], Iter [106/3125], train_loss:0.220105
Epoch [1/10], Iter [107/3125], train_loss:0.163889
Epoch [1/10], Iter [108/3125], train_loss:0.205263
Epoch [1/10], Iter [109/3125], train_loss:0.181927
Epoch [1/10], Iter [110/3125], train_loss:0.126500
Epoch [1/10], Iter [111/3125], train_loss:0.154556
Epoch [1/10], Iter [112/3125], train_loss:0.169978
Epoch [1/10], Iter [113/3125], train_loss:0.166387
Epoch [1/10], Iter [114/3125], train_loss:0.160409
Epoch [1/10], Iter [115/3125], train_loss:0.123102
Epoch [1/10], Iter [116/3125], train_loss:0.133461
Epoch [1/10], Iter [117/3125], train_loss:0.136813
Epoch [1/10], Iter [118/3125], train_loss:0.100353
Epoch [1/10], Iter [119/3125], train_loss:0.126170
Epoch [1/10], Iter [120/3125], train_loss:0.141422
Epoch [1/10], Iter [121/3125], train_loss:0.157280
Epoch [1/10], Iter [122/3125], train_loss:0.113595
Epoch [1/10], Iter [123/3125], train_loss:0.159074
Epoch [1/10], Iter [124/3125], train_loss:0.108684
Epoch [1/10], Iter [125/3125], train_loss:0.175729
Epoch [1/10], Iter [126/3125], train_loss:0.071910
Epoch [1/10], Iter [127/3125], train_loss:0.124298
Epoch [1/10], Iter [128/3125], train_loss:0.115980
Epoch [1/10], Iter [129/3125], train_loss:0.132223
Epoch [1/10], Iter [130/3125], train_loss:0.114184
Epoch [1/10], Iter [131/3125], train_loss:0.123914
Epoch [1/10], Iter [132/3125], train_loss:0.150845
Epoch [1/10], Iter [133/3125], train_loss:0.208639
Epoch [1/10], Iter [134/3125], train_loss:0.106705
Epoch [1/10], Iter [135/3125], train_loss:0.177262
Epoch [1/10], Iter [136/3125], train_loss:0.157350
Epoch [1/10], Iter [137/3125], train_loss:0.149479
Epoch [1/10], Iter [138/3125], train_loss:0.096941
Epoch [1/10], Iter [139/3125], train_loss:0.174548
Epoch [1/10], Iter [140/3125], train_loss:0.156214
Epoch [1/10], Iter [141/3125], train_loss:0.135187
Epoch [1/10], Iter [142/3125], train_loss:0.136901
Epoch [1/10], Iter [143/3125], train_loss:0.122161
Epoch [1/10], Iter [144/3125], train_loss:0.139143
Epoch [1/10], Iter [145/3125], train_loss:0.119795
Epoch [1/10], Iter [146/3125], train_loss:0.122523
Epoch [1/10], Iter [147/3125], train_loss:0.136952
Epoch [1/10], Iter [148/3125], train_loss:0.175852
Epoch [1/10], Iter [149/3125], train_loss:0.107031
Epoch [1/10], Iter [150/3125], train_loss:0.175130
Epoch [1/10], Iter [151/3125], train_loss:0.159306
Epoch [1/10], Iter [152/3125], train_loss:0.149552
Epoch [1/10], Iter [153/3125], train_loss:0.166173
Epoch [1/10], Iter [154/3125], train_loss:0.165044
Epoch [1/10], Iter [155/3125], train_loss:0.116875
Epoch [1/10], Iter [156/3125], train_loss:0.104037
Epoch [1/10], Iter [157/3125], train_loss:0.129057
Epoch [1/10], Iter [158/3125], train_loss:0.141920
Epoch [1/10], Iter [159/3125], train_loss:0.102720
Epoch [1/10], Iter [160/3125], train_loss:0.097012
Epoch [1/10], Iter [161/3125], train_loss:0.157148
Epoch [1/10], Iter [162/3125], train_loss:0.117710
Epoch [1/10], Iter [163/3125], train_loss:0.112908
Epoch [1/10], Iter [164/3125], train_loss:0.096563
Epoch [1/10], Iter [165/3125], train_loss:0.076501
Epoch [1/10], Iter [166/3125], train_loss:0.147476
Epoch [1/10], Iter [167/3125], train_loss:0.177934
Epoch [1/10], Iter [168/3125], train_loss:0.121549
Epoch [1/10], Iter [169/3125], train_loss:0.124102
Epoch [1/10], Iter [170/3125], train_loss:0.097225
Epoch [1/10], Iter [171/3125], train_loss:0.104199
Epoch [1/10], Iter [172/3125], train_loss:0.150368
Epoch [1/10], Iter [173/3125], train_loss:0.098011
Epoch [1/10], Iter [174/3125], train_loss:0.131318
Epoch [1/10], Iter [175/3125], train_loss:0.120925
Epoch [1/10], Iter [176/3125], train_loss:0.120460
Epoch [1/10], Iter [177/3125], train_loss:0.106729
Epoch [1/10], Iter [178/3125], train_loss:0.161727
Epoch [1/10], Iter [179/3125], train_loss:0.169705
Epoch [1/10], Iter [180/3125], train_loss:0.142939
Epoch [1/10], Iter [181/3125], train_loss:0.120374
Epoch [1/10], Iter [182/3125], train_loss:0.120579
Epoch [1/10], Iter [183/3125], train_loss:0.093452
Epoch [1/10], Iter [184/3125], train_loss:0.102697
Epoch [1/10], Iter [185/3125], train_loss:0.129010
Epoch [1/10], Iter [186/3125], train_loss:0.127772
Epoch [1/10], Iter [187/3125], train_loss:0.121482
Epoch [1/10], Iter [188/3125], train_loss:0.153874
Epoch [1/10], Iter [189/3125], train_loss:0.122253
Epoch [1/10], Iter [190/3125], train_loss:0.135232
Epoch [1/10], Iter [191/3125], train_loss:0.095962
Epoch [1/10], Iter [192/3125], train_loss:0.159813
Epoch [1/10], Iter [193/3125], train_loss:0.110215
Epoch [1/10], Iter [194/3125], train_loss:0.103142
Epoch [1/10], Iter [195/3125], train_loss:0.106792
Epoch [1/10], Iter [196/3125], train_loss:0.108262
Epoch [1/10], Iter [197/3125], train_loss:0.109841
Epoch [1/10], Iter [198/3125], train_loss:0.141134
Epoch [1/10], Iter [199/3125], train_loss:0.104478
Epoch [1/10], Iter [200/3125], train_loss:0.119154
Epoch [1/10], Iter [201/3125], train_loss:0.143389
Epoch [1/10], Iter [202/3125], train_loss:0.106533
Epoch [1/10], Iter [203/3125], train_loss:0.104834
Epoch [1/10], Iter [204/3125], train_loss:0.096285
Epoch [1/10], Iter [205/3125], train_loss:0.192590
Epoch [1/10], Iter [206/3125], train_loss:0.131787
Epoch [1/10], Iter [207/3125], train_loss:0.093841
Epoch [1/10], Iter [208/3125], train_loss:0.093261
Epoch [1/10], Iter [209/3125], train_loss:0.090215
Epoch [1/10], Iter [210/3125], train_loss:0.062551
Epoch [1/10], Iter [211/3125], train_loss:0.103201
Epoch [1/10], Iter [212/3125], train_loss:0.101281
Epoch [1/10], Iter [213/3125], train_loss:0.112832
Epoch [1/10], Iter [214/3125], train_loss:0.109726
Epoch [1/10], Iter [215/3125], train_loss:0.193847
Epoch [1/10], Iter [216/3125], train_loss:0.114712
Epoch [1/10], Iter [217/3125], train_loss:0.096408
Epoch [1/10], Iter [218/3125], train_loss:0.104277
Epoch [1/10], Iter [219/3125], train_loss:0.101230
Epoch [1/10], Iter [220/3125], train_loss:0.088779
Epoch [1/10], Iter [221/3125], train_loss:0.122967
Epoch [1/10], Iter [222/3125], train_loss:0.132155
Epoch [1/10], Iter [223/3125], train_loss:0.106906
Epoch [1/10], Iter [224/3125], train_loss:0.101865
Epoch [1/10], Iter [225/3125], train_loss:0.094080
Epoch [1/10], Iter [226/3125], train_loss:0.117470
Epoch [1/10], Iter [227/3125], train_loss:0.107198
Epoch [1/10], Iter [228/3125], train_loss:0.113856
Epoch [1/10], Iter [229/3125], train_loss:0.113308
Epoch [1/10], Iter [230/3125], train_loss:0.136503
Epoch [1/10], Iter [231/3125], train_loss:0.096320
Epoch [1/10], Iter [232/3125], train_loss:0.131607
Epoch [1/10], Iter [233/3125], train_loss:0.140338
Epoch [1/10], Iter [234/3125], train_loss:0.125807
Epoch [1/10], Iter [235/3125], train_loss:0.109107
Epoch [1/10], Iter [236/3125], train_loss:0.104653
Epoch [1/10], Iter [237/3125], train_loss:0.112867
Epoch [1/10], Iter [238/3125], train_loss:0.096239
Epoch [1/10], Iter [239/3125], train_loss:0.113070
Epoch [1/10], Iter [240/3125], train_loss:0.138504
Epoch [1/10], Iter [241/3125], train_loss:0.116264
Epoch [1/10], Iter [242/3125], train_loss:0.140497
Epoch [1/10], Iter [243/3125], train_loss:0.111269
Epoch [1/10], Iter [244/3125], train_loss:0.126607
Epoch [1/10], Iter [245/3125], train_loss:0.166210
Epoch [1/10], Iter [246/3125], train_loss:0.114601
Epoch [1/10], Iter [247/3125], train_loss:0.086945
Epoch [1/10], Iter [248/3125], train_loss:0.117582
Epoch [1/10], Iter [249/3125], train_loss:0.103387
Epoch [1/10], Iter [250/3125], train_loss:0.105529
Epoch [1/10], Iter [251/3125], train_loss:0.095726
Epoch [1/10], Iter [252/3125], train_loss:0.099371
Epoch [1/10], Iter [253/3125], train_loss:0.086019
Epoch [1/10], Iter [254/3125], train_loss:0.117785
Epoch [1/10], Iter [255/3125], train_loss:0.095674
Epoch [1/10], Iter [256/3125], train_loss:0.107202
Epoch [1/10], Iter [257/3125], train_loss:0.106855
Epoch [1/10], Iter [258/3125], train_loss:0.089076
Epoch [1/10], Iter [259/3125], train_loss:0.085481
Epoch [1/10], Iter [260/3125], train_loss:0.105372
Epoch [1/10], Iter [261/3125], train_loss:0.135841
Epoch [1/10], Iter [262/3125], train_loss:0.091050
Epoch [1/10], Iter [263/3125], train_loss:0.104396
Epoch [1/10], Iter [264/3125], train_loss:0.085995
Epoch [1/10], Iter [265/3125], train_loss:0.082015
Epoch [1/10], Iter [266/3125], train_loss:0.101983
Epoch [1/10], Iter [267/3125], train_loss:0.082330
Epoch [1/10], Iter [268/3125], train_loss:0.096020
Epoch [1/10], Iter [269/3125], train_loss:0.107438
Epoch [1/10], Iter [270/3125], train_loss:0.108927
Epoch [1/10], Iter [271/3125], train_loss:0.090110
Epoch [1/10], Iter [272/3125], train_loss:0.082612
Epoch [1/10], Iter [273/3125], train_loss:0.124343
Epoch [1/10], Iter [274/3125], train_loss:0.134607
Epoch [1/10], Iter [275/3125], train_loss:0.103530
Epoch [1/10], Iter [276/3125], train_loss:0.088286
Epoch [1/10], Iter [277/3125], train_loss:0.120471
Epoch [1/10], Iter [278/3125], train_loss:0.090534
Epoch [1/10], Iter [279/3125], train_loss:0.098560
Epoch [1/10], Iter [280/3125], train_loss:0.093890
Epoch [1/10], Iter [281/3125], train_loss:0.114845
Epoch [1/10], Iter [282/3125], train_loss:0.155583
Epoch [1/10], Iter [283/3125], train_loss:0.084580
Epoch [1/10], Iter [284/3125], train_loss:0.078266
Epoch [1/10], Iter [285/3125], train_loss:0.089209
Epoch [1/10], Iter [286/3125], train_loss:0.129949
Epoch [1/10], Iter [287/3125], train_loss:0.068909
Epoch [1/10], Iter [288/3125], train_loss:0.120867
Epoch [1/10], Iter [289/3125], train_loss:0.107639
Epoch [1/10], Iter [290/3125], train_loss:0.099353
Epoch [1/10], Iter [291/3125], train_loss:0.132016
Epoch [1/10], Iter [292/3125], train_loss:0.090960
Epoch [1/10], Iter [293/3125], train_loss:0.101058
Epoch [1/10], Iter [294/3125], train_loss:0.096238
Epoch [1/10], Iter [295/3125], train_loss:0.084716
Epoch [1/10], Iter [296/3125], train_loss:0.079769
Epoch [1/10], Iter [297/3125], train_loss:0.124798
Epoch [1/10], Iter [298/3125], train_loss:0.096835
Epoch [1/10], Iter [299/3125], train_loss:0.089952
Epoch [1/10], Iter [300/3125], train_loss:0.095460
Epoch [1/10], Iter [301/3125], train_loss:0.086470
Epoch [1/10], Iter [302/3125], train_loss:0.105848
Epoch [1/10], Iter [303/3125], train_loss:0.130099
Epoch [1/10], Iter [304/3125], train_loss:0.131335
Epoch [1/10], Iter [305/3125], train_loss:0.103911
Epoch [1/10], Iter [306/3125], train_loss:0.092839
Epoch [1/10], Iter [307/3125], train_loss:0.128423
Epoch [1/10], Iter [308/3125], train_loss:0.101717
Epoch [1/10], Iter [309/3125], train_loss:0.102042
Epoch [1/10], Iter [310/3125], train_loss:0.108195
Epoch [1/10], Iter [311/3125], train_loss:0.116109
Epoch [1/10], Iter [312/3125], train_loss:0.107782
Epoch [1/10], Iter [313/3125], train_loss:0.102813
Epoch [1/10], Iter [314/3125], train_loss:0.095960
Epoch [1/10], Iter [315/3125], train_loss:0.086566
Epoch [1/10], Iter [316/3125], train_loss:0.081492
Epoch [1/10], Iter [317/3125], train_loss:0.077582
Epoch [1/10], Iter [318/3125], train_loss:0.053461
Epoch [1/10], Iter [319/3125], train_loss:0.084671
Epoch [1/10], Iter [320/3125], train_loss:0.088476
Epoch [1/10], Iter [321/3125], train_loss:0.105547
Epoch [1/10], Iter [322/3125], train_loss:0.079457
Epoch [1/10], Iter [323/3125], train_loss:0.080500
Epoch [1/10], Iter [324/3125], train_loss:0.116692
Epoch [1/10], Iter [325/3125], train_loss:0.095060
Epoch [1/10], Iter [326/3125], train_loss:0.090416
Epoch [1/10], Iter [327/3125], train_loss:0.068069
Epoch [1/10], Iter [328/3125], train_loss:0.110763
Epoch [1/10], Iter [329/3125], train_loss:0.060889
Epoch [1/10], Iter [330/3125], train_loss:0.110807
Epoch [1/10], Iter [331/3125], train_loss:0.122002
Epoch [1/10], Iter [332/3125], train_loss:0.115815
Epoch [1/10], Iter [333/3125], train_loss:0.067004
Epoch [1/10], Iter [334/3125], train_loss:0.063815
Epoch [1/10], Iter [335/3125], train_loss:0.120017
Epoch [1/10], Iter [336/3125], train_loss:0.104086
Epoch [1/10], Iter [337/3125], train_loss:0.091577
Epoch [1/10], Iter [338/3125], train_loss:0.084077
Epoch [1/10], Iter [339/3125], train_loss:0.113410
Epoch [1/10], Iter [340/3125], train_loss:0.061866
Epoch [1/10], Iter [341/3125], train_loss:0.101881
Epoch [1/10], Iter [342/3125], train_loss:0.107144
Epoch [1/10], Iter [343/3125], train_loss:0.142906
Epoch [1/10], Iter [344/3125], train_loss:0.072013
Epoch [1/10], Iter [345/3125], train_loss:0.088949
Epoch [1/10], Iter [346/3125], train_loss:0.067578
Epoch [1/10], Iter [347/3125], train_loss:0.086871
Epoch [1/10], Iter [348/3125], train_loss:0.068842
Epoch [1/10], Iter [349/3125], train_loss:0.086257
Epoch [1/10], Iter [350/3125], train_loss:0.112828
Epoch [1/10], Iter [351/3125], train_loss:0.090362
Epoch [1/10], Iter [352/3125], train_loss:0.092230
Epoch [1/10], Iter [353/3125], train_loss:0.058990
Epoch [1/10], Iter [354/3125], train_loss:0.114826
Epoch [1/10], Iter [355/3125], train_loss:0.076303
Epoch [1/10], Iter [356/3125], train_loss:0.115605
Epoch [1/10], Iter [357/3125], train_loss:0.083856
Epoch [1/10], Iter [358/3125], train_loss:0.114196
Epoch [1/10], Iter [359/3125], train_loss:0.154424
Epoch [1/10], Iter [360/3125], train_loss:0.103248
Epoch [1/10], Iter [361/3125], train_loss:0.093536
Epoch [1/10], Iter [362/3125], train_loss:0.064217
Epoch [1/10], Iter [363/3125], train_loss:0.103777
Epoch [1/10], Iter [364/3125], train_loss:0.049145
Epoch [1/10], Iter [365/3125], train_loss:0.085676
Epoch [1/10], Iter [366/3125], train_loss:0.095860
Epoch [1/10], Iter [367/3125], train_loss:0.045282
Epoch [1/10], Iter [368/3125], train_loss:0.102015
Epoch [1/10], Iter [369/3125], train_loss:0.073394
Epoch [1/10], Iter [370/3125], train_loss:0.080284
Epoch [1/10], Iter [371/3125], train_loss:0.094347
Epoch [1/10], Iter [372/3125], train_loss:0.085500
Epoch [1/10], Iter [373/3125], train_loss:0.119371
Epoch [1/10], Iter [374/3125], train_loss:0.095046
Epoch [1/10], Iter [375/3125], train_loss:0.118757
Epoch [1/10], Iter [376/3125], train_loss:0.107976
Epoch [1/10], Iter [377/3125], train_loss:0.090448
Epoch [1/10], Iter [378/3125], train_loss:0.085898
Epoch [1/10], Iter [379/3125], train_loss:0.110092
Epoch [1/10], Iter [380/3125], train_loss:0.093738
Epoch [1/10], Iter [381/3125], train_loss:0.094126
Epoch [1/10], Iter [382/3125], train_loss:0.087205
Epoch [1/10], Iter [383/3125], train_loss:0.083657
Epoch [1/10], Iter [384/3125], train_loss:0.080641
Epoch [1/10], Iter [385/3125], train_loss:0.101648
Epoch [1/10], Iter [386/3125], train_loss:0.102539
Epoch [1/10], Iter [387/3125], train_loss:0.090064
Epoch [1/10], Iter [388/3125], train_loss:0.140402
Epoch [1/10], Iter [389/3125], train_loss:0.100177
Epoch [1/10], Iter [390/3125], train_loss:0.106683
Epoch [1/10], Iter [391/3125], train_loss:0.072911
Epoch [1/10], Iter [392/3125], train_loss:0.094680
Epoch [1/10], Iter [393/3125], train_loss:0.097260
Epoch [1/10], Iter [394/3125], train_loss:0.104942
Epoch [1/10], Iter [395/3125], train_loss:0.133387
Epoch [1/10], Iter [396/3125], train_loss:0.131581
Epoch [1/10], Iter [397/3125], train_loss:0.107176
Epoch [1/10], Iter [398/3125], train_loss:0.076420
Epoch [1/10], Iter [399/3125], train_loss:0.071057
Epoch [1/10], Iter [400/3125], train_loss:0.102585
Epoch [1/10], Iter [401/3125], train_loss:0.071347
Epoch [1/10], Iter [402/3125], train_loss:0.104381
Epoch [1/10], Iter [403/3125], train_loss:0.111743
Epoch [1/10], Iter [404/3125], train_loss:0.081141
Epoch [1/10], Iter [405/3125], train_loss:0.071977
Epoch [1/10], Iter [406/3125], train_loss:0.095490
Epoch [1/10], Iter [407/3125], train_loss:0.085300
Epoch [1/10], Iter [408/3125], train_loss:0.068072
Epoch [1/10], Iter [409/3125], train_loss:0.068445
Epoch [1/10], Iter [410/3125], train_loss:0.092671
Epoch [1/10], Iter [411/3125], train_loss:0.066765
Epoch [1/10], Iter [412/3125], train_loss:0.107009
Epoch [1/10], Iter [413/3125], train_loss:0.072693
Epoch [1/10], Iter [414/3125], train_loss:0.088150
Epoch [1/10], Iter [415/3125], train_loss:0.090847
Epoch [1/10], Iter [416/3125], train_loss:0.077029
Epoch [1/10], Iter [417/3125], train_loss:0.102404
Epoch [1/10], Iter [418/3125], train_loss:0.138703
Epoch [1/10], Iter [419/3125], train_loss:0.074720
Epoch [1/10], Iter [420/3125], train_loss:0.103256
Epoch [1/10], Iter [421/3125], train_loss:0.091416
Epoch [1/10], Iter [422/3125], train_loss:0.104568
Epoch [1/10], Iter [423/3125], train_loss:0.077688
Epoch [1/10], Iter [424/3125], train_loss:0.090047
Epoch [1/10], Iter [425/3125], train_loss:0.127545
Epoch [1/10], Iter [426/3125], train_loss:0.088344
Epoch [1/10], Iter [427/3125], train_loss:0.101759
Epoch [1/10], Iter [428/3125], train_loss:0.079185
Epoch [1/10], Iter [429/3125], train_loss:0.063097
Epoch [1/10], Iter [430/3125], train_loss:0.121180
Epoch [1/10], Iter [431/3125], train_loss:0.101340
Epoch [1/10], Iter [432/3125], train_loss:0.128714
Epoch [1/10], Iter [433/3125], train_loss:0.062577
Epoch [1/10], Iter [434/3125], train_loss:0.091420
Epoch [1/10], Iter [435/3125], train_loss:0.090504
Epoch [1/10], Iter [436/3125], train_loss:0.119372
Epoch [1/10], Iter [437/3125], train_loss:0.066290
Epoch [1/10], Iter [438/3125], train_loss:0.119662
Epoch [1/10], Iter [439/3125], train_loss:0.110264
Epoch [1/10], Iter [440/3125], train_loss:0.079450
Epoch [1/10], Iter [441/3125], train_loss:0.111833
Epoch [1/10], Iter [442/3125], train_loss:0.094980
Epoch [1/10], Iter [443/3125], train_loss:0.111621
Epoch [1/10], Iter [444/3125], train_loss:0.082750
Epoch [1/10], Iter [445/3125], train_loss:0.104502
Epoch [1/10], Iter [446/3125], train_loss:0.114041
Epoch [1/10], Iter [447/3125], train_loss:0.071238
Epoch [1/10], Iter [448/3125], train_loss:0.088294
Epoch [1/10], Iter [449/3125], train_loss:0.069142
Epoch [1/10], Iter [450/3125], train_loss:0.129054
Epoch [1/10], Iter [451/3125], train_loss:0.091864
Epoch [1/10], Iter [452/3125], train_loss:0.080189
Epoch [1/10], Iter [453/3125], train_loss:0.060313
Epoch [1/10], Iter [454/3125], train_loss:0.129373
Epoch [1/10], Iter [455/3125], train_loss:0.073149
Epoch [1/10], Iter [456/3125], train_loss:0.073206
Epoch [1/10], Iter [457/3125], train_loss:0.088790
Epoch [1/10], Iter [458/3125], train_loss:0.066144
Epoch [1/10], Iter [459/3125], train_loss:0.103504
Epoch [1/10], Iter [460/3125], train_loss:0.060709
Epoch [1/10], Iter [461/3125], train_loss:0.108793
Epoch [1/10], Iter [462/3125], train_loss:0.093702
Epoch [1/10], Iter [463/3125], train_loss:0.116326
Epoch [1/10], Iter [464/3125], train_loss:0.104743
Epoch [1/10], Iter [465/3125], train_loss:0.082492
Epoch [1/10], Iter [466/3125], train_loss:0.092319
Epoch [1/10], Iter [467/3125], train_loss:0.065833
Epoch [1/10], Iter [468/3125], train_loss:0.051208
Epoch [1/10], Iter [469/3125], train_loss:0.093229
Epoch [1/10], Iter [470/3125], train_loss:0.095329
Epoch [1/10], Iter [471/3125], train_loss:0.099470
Epoch [1/10], Iter [472/3125], train_loss:0.072319
Epoch [1/10], Iter [473/3125], train_loss:0.062743
Epoch [1/10], Iter [474/3125], train_loss:0.108008
Epoch [1/10], Iter [475/3125], train_loss:0.046297
Epoch [1/10], Iter [476/3125], train_loss:0.077335
Epoch [1/10], Iter [477/3125], train_loss:0.088254
Epoch [1/10], Iter [478/3125], train_loss:0.101036
Epoch [1/10], Iter [479/3125], train_loss:0.083029
Epoch [1/10], Iter [480/3125], train_loss:0.097751
Epoch [1/10], Iter [481/3125], train_loss:0.096469
Epoch [1/10], Iter [482/3125], train_loss:0.087993
Epoch [1/10], Iter [483/3125], train_loss:0.099732
Epoch [1/10], Iter [484/3125], train_loss:0.073528
Epoch [1/10], Iter [485/3125], train_loss:0.101679
Epoch [1/10], Iter [486/3125], train_loss:0.100552
Epoch [1/10], Iter [487/3125], train_loss:0.087380
Epoch [1/10], Iter [488/3125], train_loss:0.121468
Epoch [1/10], Iter [489/3125], train_loss:0.097617
Epoch [1/10], Iter [490/3125], train_loss:0.104743
Epoch [1/10], Iter [491/3125], train_loss:0.078716
Epoch [1/10], Iter [492/3125], train_loss:0.098265
Epoch [1/10], Iter [493/3125], train_loss:0.082094
Epoch [1/10], Iter [494/3125], train_loss:0.087327
Epoch [1/10], Iter [495/3125], train_loss:0.069399
Epoch [1/10], Iter [496/3125], train_loss:0.066200
Epoch [1/10], Iter [497/3125], train_loss:0.068601
Epoch [1/10], Iter [498/3125], train_loss:0.126001
Epoch [1/10], Iter [499/3125], train_loss:0.085090
Epoch [1/10], Iter [500/3125], train_loss:0.109014
Epoch [1/10], Iter [501/3125], train_loss:0.106699
Epoch [1/10], Iter [502/3125], train_loss:0.082973
Epoch [1/10], Iter [503/3125], train_loss:0.095683
Epoch [1/10], Iter [504/3125], train_loss:0.113937
Epoch [1/10], Iter [505/3125], train_loss:0.032092
Epoch [1/10], Iter [506/3125], train_loss:0.071751
Epoch [1/10], Iter [507/3125], train_loss:0.082614
Epoch [1/10], Iter [508/3125], train_loss:0.076657
Epoch [1/10], Iter [509/3125], train_loss:0.078356
Epoch [1/10], Iter [510/3125], train_loss:0.109523
Epoch [1/10], Iter [511/3125], train_loss:0.108152
Epoch [1/10], Iter [512/3125], train_loss:0.092030
Epoch [1/10], Iter [513/3125], train_loss:0.115947
Epoch [1/10], Iter [514/3125], train_loss:0.108748
Epoch [1/10], Iter [515/3125], train_loss:0.091761
Epoch [1/10], Iter [516/3125], train_loss:0.073188
Epoch [1/10], Iter [517/3125], train_loss:0.120827
Epoch [1/10], Iter [518/3125], train_loss:0.067271
Epoch [1/10], Iter [519/3125], train_loss:0.050369
Epoch [1/10], Iter [520/3125], train_loss:0.070868
Epoch [1/10], Iter [521/3125], train_loss:0.113249
Epoch [1/10], Iter [522/3125], train_loss:0.090670
Epoch [1/10], Iter [523/3125], train_loss:0.104130
Epoch [1/10], Iter [524/3125], train_loss:0.095427
Epoch [1/10], Iter [525/3125], train_loss:0.141192
Epoch [1/10], Iter [526/3125], train_loss:0.076236
Epoch [1/10], Iter [527/3125], train_loss:0.117406
Epoch [1/10], Iter [528/3125], train_loss:0.114006
Epoch [1/10], Iter [529/3125], train_loss:0.066016
Epoch [1/10], Iter [530/3125], train_loss:0.093731
Epoch [1/10], Iter [531/3125], train_loss:0.072306
Epoch [1/10], Iter [532/3125], train_loss:0.074725
Epoch [1/10], Iter [533/3125], train_loss:0.090788
Epoch [1/10], Iter [534/3125], train_loss:0.071732
Epoch [1/10], Iter [535/3125], train_loss:0.083744
Epoch [1/10], Iter [536/3125], train_loss:0.066183
Epoch [1/10], Iter [537/3125], train_loss:0.116836
Epoch [1/10], Iter [538/3125], train_loss:0.086225
Epoch [1/10], Iter [539/3125], train_loss:0.097140
Epoch [1/10], Iter [540/3125], train_loss:0.076652
Epoch [1/10], Iter [541/3125], train_loss:0.058895
Epoch [1/10], Iter [542/3125], train_loss:0.068447
Epoch [1/10], Iter [543/3125], train_loss:0.071758
Epoch [1/10], Iter [544/3125], train_loss:0.055181
Epoch [1/10], Iter [545/3125], train_loss:0.058409
Epoch [1/10], Iter [546/3125], train_loss:0.101034
Epoch [1/10], Iter [547/3125], train_loss:0.078014
Epoch [1/10], Iter [548/3125], train_loss:0.101554
Epoch [1/10], Iter [549/3125], train_loss:0.099358
Epoch [1/10], Iter [550/3125], train_loss:0.086353
Epoch [1/10], Iter [551/3125], train_loss:0.087590
Epoch [1/10], Iter [552/3125], train_loss:0.050383
Epoch [1/10], Iter [553/3125], train_loss:0.100233
Epoch [1/10], Iter [554/3125], train_loss:0.095480
Epoch [1/10], Iter [555/3125], train_loss:0.093082
Epoch [1/10], Iter [556/3125], train_loss:0.077300
Epoch [1/10], Iter [557/3125], train_loss:0.097098
Epoch [1/10], Iter [558/3125], train_loss:0.108629
Epoch [1/10], Iter [559/3125], train_loss:0.080039
Epoch [1/10], Iter [560/3125], train_loss:0.086488
Epoch [1/10], Iter [561/3125], train_loss:0.105568
Epoch [1/10], Iter [562/3125], train_loss:0.079867
Epoch [1/10], Iter [563/3125], train_loss:0.094058
Epoch [1/10], Iter [564/3125], train_loss:0.071488
Epoch [1/10], Iter [565/3125], train_loss:0.068944
Epoch [1/10], Iter [566/3125], train_loss:0.107989
Epoch [1/10], Iter [567/3125], train_loss:0.072702
Epoch [1/10], Iter [568/3125], train_loss:0.092457
Epoch [1/10], Iter [569/3125], train_loss:0.116950
Epoch [1/10], Iter [570/3125], train_loss:0.057468
Epoch [1/10], Iter [571/3125], train_loss:0.067517
Epoch [1/10], Iter [572/3125], train_loss:0.069241
Epoch [1/10], Iter [573/3125], train_loss:0.112788
Epoch [1/10], Iter [574/3125], train_loss:0.135044
Epoch [1/10], Iter [575/3125], train_loss:0.139375
Epoch [1/10], Iter [576/3125], train_loss:0.083855
Epoch [1/10], Iter [577/3125], train_loss:0.111794
Epoch [1/10], Iter [578/3125], train_loss:0.087120
Epoch [1/10], Iter [579/3125], train_loss:0.089663
Epoch [1/10], Iter [580/3125], train_loss:0.074575
Epoch [1/10], Iter [581/3125], train_loss:0.064921
Epoch [1/10], Iter [582/3125], train_loss:0.192595
Epoch [1/10], Iter [583/3125], train_loss:0.107797
Epoch [1/10], Iter [584/3125], train_loss:0.077203
Epoch [1/10], Iter [585/3125], train_loss:0.123417
Epoch [1/10], Iter [586/3125], train_loss:0.082694
Epoch [1/10], Iter [587/3125], train_loss:0.075541
Epoch [1/10], Iter [588/3125], train_loss:0.097291
Epoch [1/10], Iter [589/3125], train_loss:0.052539
Epoch [1/10], Iter [590/3125], train_loss:0.066947
Epoch [1/10], Iter [591/3125], train_loss:0.061442
Epoch [1/10], Iter [592/3125], train_loss:0.066907
Epoch [1/10], Iter [593/3125], train_loss:0.059535
Epoch [1/10], Iter [594/3125], train_loss:0.074935
Epoch [1/10], Iter [595/3125], train_loss:0.084690
Epoch [1/10], Iter [596/3125], train_loss:0.063918
Epoch [1/10], Iter [597/3125], train_loss:0.063785
Epoch [1/10], Iter [598/3125], train_loss:0.108638
Epoch [1/10], Iter [599/3125], train_loss:0.086835
Epoch [1/10], Iter [600/3125], train_loss:0.098556
Epoch [1/10], Iter [601/3125], train_loss:0.075705
Epoch [1/10], Iter [602/3125], train_loss:0.059754
Epoch [1/10], Iter [603/3125], train_loss:0.054489
Epoch [1/10], Iter [604/3125], train_loss:0.073924
Epoch [1/10], Iter [605/3125], train_loss:0.094530
Epoch [1/10], Iter [606/3125], train_loss:0.053714
Epoch [1/10], Iter [607/3125], train_loss:0.090675
Epoch [1/10], Iter [608/3125], train_loss:0.078084
Epoch [1/10], Iter [609/3125], train_loss:0.066804
Epoch [1/10], Iter [610/3125], train_loss:0.100219
Epoch [1/10], Iter [611/3125], train_loss:0.075962
Epoch [1/10], Iter [612/3125], train_loss:0.070294
Epoch [1/10], Iter [613/3125], train_loss:0.071478
Epoch [1/10], Iter [614/3125], train_loss:0.096717
Epoch [1/10], Iter [615/3125], train_loss:0.086769
Epoch [1/10], Iter [616/3125], train_loss:0.104664
Epoch [1/10], Iter [617/3125], train_loss:0.072344
Epoch [1/10], Iter [618/3125], train_loss:0.074144
Epoch [1/10], Iter [619/3125], train_loss:0.084967
Epoch [1/10], Iter [620/3125], train_loss:0.095983
Epoch [1/10], Iter [621/3125], train_loss:0.068011
Epoch [1/10], Iter [622/3125], train_loss:0.051430
Epoch [1/10], Iter [623/3125], train_loss:0.072359
Epoch [1/10], Iter [624/3125], train_loss:0.051836
Epoch [1/10], Iter [625/3125], train_loss:0.103024
Epoch [1/10], Iter [626/3125], train_loss:0.088216
Epoch [1/10], Iter [627/3125], train_loss:0.061990
Epoch [1/10], Iter [628/3125], train_loss:0.107665
Epoch [1/10], Iter [629/3125], train_loss:0.076811
Epoch [1/10], Iter [630/3125], train_loss:0.123782
Epoch [1/10], Iter [631/3125], train_loss:0.094078
Epoch [1/10], Iter [632/3125], train_loss:0.059769
Epoch [1/10], Iter [633/3125], train_loss:0.066241
Epoch [1/10], Iter [634/3125], train_loss:0.071580
Epoch [1/10], Iter [635/3125], train_loss:0.076411
Epoch [1/10], Iter [636/3125], train_loss:0.110754
Epoch [1/10], Iter [637/3125], train_loss:0.065504
Epoch [1/10], Iter [638/3125], train_loss:0.083259
Epoch [1/10], Iter [639/3125], train_loss:0.107182
Epoch [1/10], Iter [640/3125], train_loss:0.060376
Epoch [1/10], Iter [641/3125], train_loss:0.077829
Epoch [1/10], Iter [642/3125], train_loss:0.100774
Epoch [1/10], Iter [643/3125], train_loss:0.087143
Epoch [1/10], Iter [644/3125], train_loss:0.060597
Epoch [1/10], Iter [645/3125], train_loss:0.101928
Epoch [1/10], Iter [646/3125], train_loss:0.092720
Epoch [1/10], Iter [647/3125], train_loss:0.081452
Epoch [1/10], Iter [648/3125], train_loss:0.097151
Epoch [1/10], Iter [649/3125], train_loss:0.070104
Epoch [1/10], Iter [650/3125], train_loss:0.094944
Epoch [1/10], Iter [651/3125], train_loss:0.056059
Epoch [1/10], Iter [652/3125], train_loss:0.065773
Epoch [1/10], Iter [653/3125], train_loss:0.087860
Epoch [1/10], Iter [654/3125], train_loss:0.088647
Epoch [1/10], Iter [655/3125], train_loss:0.074508
Epoch [1/10], Iter [656/3125], train_loss:0.078260
Epoch [1/10], Iter [657/3125], train_loss:0.068859
Epoch [1/10], Iter [658/3125], train_loss:0.080638
Epoch [1/10], Iter [659/3125], train_loss:0.101420
Epoch [1/10], Iter [660/3125], train_loss:0.084931
Epoch [1/10], Iter [661/3125], train_loss:0.066806
Epoch [1/10], Iter [662/3125], train_loss:0.105629
Epoch [1/10], Iter [663/3125], train_loss:0.084870
Epoch [1/10], Iter [664/3125], train_loss:0.071970
Epoch [1/10], Iter [665/3125], train_loss:0.087836
Epoch [1/10], Iter [666/3125], train_loss:0.100669
Epoch [1/10], Iter [667/3125], train_loss:0.077280
Epoch [1/10], Iter [668/3125], train_loss:0.116738
Epoch [1/10], Iter [669/3125], train_loss:0.061395
Epoch [1/10], Iter [670/3125], train_loss:0.090685
Epoch [1/10], Iter [671/3125], train_loss:0.080947
Epoch [1/10], Iter [672/3125], train_loss:0.095348
Epoch [1/10], Iter [673/3125], train_loss:0.092972
Epoch [1/10], Iter [674/3125], train_loss:0.107024
Epoch [1/10], Iter [675/3125], train_loss:0.084352
Epoch [1/10], Iter [676/3125], train_loss:0.059006
Epoch [1/10], Iter [677/3125], train_loss:0.092779
Epoch [1/10], Iter [678/3125], train_loss:0.077512
Epoch [1/10], Iter [679/3125], train_loss:0.096963
Epoch [1/10], Iter [680/3125], train_loss:0.096011
Epoch [1/10], Iter [681/3125], train_loss:0.079866
Epoch [1/10], Iter [682/3125], train_loss:0.075723
Epoch [1/10], Iter [683/3125], train_loss:0.085611
Epoch [1/10], Iter [684/3125], train_loss:0.123355
Epoch [1/10], Iter [685/3125], train_loss:0.069978
Epoch [1/10], Iter [686/3125], train_loss:0.077491
Epoch [1/10], Iter [687/3125], train_loss:0.055490
Epoch [1/10], Iter [688/3125], train_loss:0.067270
Epoch [1/10], Iter [689/3125], train_loss:0.114452
Epoch [1/10], Iter [690/3125], train_loss:0.079901
Epoch [1/10], Iter [691/3125], train_loss:0.090492
Epoch [1/10], Iter [692/3125], train_loss:0.072870
Epoch [1/10], Iter [693/3125], train_loss:0.065780
Epoch [1/10], Iter [694/3125], train_loss:0.078856
Epoch [1/10], Iter [695/3125], train_loss:0.062660
Epoch [1/10], Iter [696/3125], train_loss:0.094964
Epoch [1/10], Iter [697/3125], train_loss:0.085245
Epoch [1/10], Iter [698/3125], train_loss:0.096854
Epoch [1/10], Iter [699/3125], train_loss:0.056521
Epoch [1/10], Iter [700/3125], train_loss:0.064707
Epoch [1/10], Iter [701/3125], train_loss:0.102361
Epoch [1/10], Iter [702/3125], train_loss:0.083936
Epoch [1/10], Iter [703/3125], train_loss:0.071545
Epoch [1/10], Iter [704/3125], train_loss:0.056376
Epoch [1/10], Iter [705/3125], train_loss:0.075224
Epoch [1/10], Iter [706/3125], train_loss:0.088155
Epoch [1/10], Iter [707/3125], train_loss:0.075692
Epoch [1/10], Iter [708/3125], train_loss:0.077199
Epoch [1/10], Iter [709/3125], train_loss:0.069121
Epoch [1/10], Iter [710/3125], train_loss:0.077576
Epoch [1/10], Iter [711/3125], train_loss:0.069567
Epoch [1/10], Iter [712/3125], train_loss:0.075430
Epoch [1/10], Iter [713/3125], train_loss:0.070002
Epoch [1/10], Iter [714/3125], train_loss:0.083099
Epoch [1/10], Iter [715/3125], train_loss:0.129424
Epoch [1/10], Iter [716/3125], train_loss:0.076017
Epoch [1/10], Iter [717/3125], train_loss:0.093424
Epoch [1/10], Iter [718/3125], train_loss:0.046105
Epoch [1/10], Iter [719/3125], train_loss:0.103817
Epoch [1/10], Iter [720/3125], train_loss:0.063443
Epoch [1/10], Iter [721/3125], train_loss:0.068008
Epoch [1/10], Iter [722/3125], train_loss:0.080830
Epoch [1/10], Iter [723/3125], train_loss:0.063206
Epoch [1/10], Iter [724/3125], train_loss:0.046125
Epoch [1/10], Iter [725/3125], train_loss:0.098638
Epoch [1/10], Iter [726/3125], train_loss:0.059091
Epoch [1/10], Iter [727/3125], train_loss:0.104707
Epoch [1/10], Iter [728/3125], train_loss:0.060244
Epoch [1/10], Iter [729/3125], train_loss:0.056369
Epoch [1/10], Iter [730/3125], train_loss:0.066725
Epoch [1/10], Iter [731/3125], train_loss:0.078067
Epoch [1/10], Iter [732/3125], train_loss:0.074055
Epoch [1/10], Iter [733/3125], train_loss:0.035916
Epoch [1/10], Iter [734/3125], train_loss:0.066059
Epoch [1/10], Iter [735/3125], train_loss:0.118576
Epoch [1/10], Iter [736/3125], train_loss:0.095265
Epoch [1/10], Iter [737/3125], train_loss:0.085072
Epoch [1/10], Iter [738/3125], train_loss:0.076775
Epoch [1/10], Iter [739/3125], train_loss:0.077835
Epoch [1/10], Iter [740/3125], train_loss:0.071196
Epoch [1/10], Iter [741/3125], train_loss:0.068851
Epoch [1/10], Iter [742/3125], train_loss:0.041999
Epoch [1/10], Iter [743/3125], train_loss:0.074546
Epoch [1/10], Iter [744/3125], train_loss:0.098691
Epoch [1/10], Iter [745/3125], train_loss:0.100539
Epoch [1/10], Iter [746/3125], train_loss:0.079695
Epoch [1/10], Iter [747/3125], train_loss:0.078971
Epoch [1/10], Iter [748/3125], train_loss:0.081766
Epoch [1/10], Iter [749/3125], train_loss:0.089490
Epoch [1/10], Iter [750/3125], train_loss:0.077093
Epoch [1/10], Iter [751/3125], train_loss:0.077361
Epoch [1/10], Iter [752/3125], train_loss:0.114653
Epoch [1/10], Iter [753/3125], train_loss:0.047497
Epoch [1/10], Iter [754/3125], train_loss:0.121098
Epoch [1/10], Iter [755/3125], train_loss:0.070111
Epoch [1/10], Iter [756/3125], train_loss:0.069042
Epoch [1/10], Iter [757/3125], train_loss:0.073422
Epoch [1/10], Iter [758/3125], train_loss:0.070171
Epoch [1/10], Iter [759/3125], train_loss:0.104445
Epoch [1/10], Iter [760/3125], train_loss:0.075994
Epoch [1/10], Iter [761/3125], train_loss:0.057151
Epoch [1/10], Iter [762/3125], train_loss:0.086842
Epoch [1/10], Iter [763/3125], train_loss:0.050175
Epoch [1/10], Iter [764/3125], train_loss:0.114565
Epoch [1/10], Iter [765/3125], train_loss:0.088730
Epoch [1/10], Iter [766/3125], train_loss:0.084020
Epoch [1/10], Iter [767/3125], train_loss:0.055446
Epoch [1/10], Iter [768/3125], train_loss:0.073858
Epoch [1/10], Iter [769/3125], train_loss:0.076490
Epoch [1/10], Iter [770/3125], train_loss:0.117408
Epoch [1/10], Iter [771/3125], train_loss:0.074123
Epoch [1/10], Iter [772/3125], train_loss:0.091184
Epoch [1/10], Iter [773/3125], train_loss:0.101151
Epoch [1/10], Iter [774/3125], train_loss:0.069927
Epoch [1/10], Iter [775/3125], train_loss:0.078611
Epoch [1/10], Iter [776/3125], train_loss:0.076168
Epoch [1/10], Iter [777/3125], train_loss:0.098598
Epoch [1/10], Iter [778/3125], train_loss:0.080934
Epoch [1/10], Iter [779/3125], train_loss:0.065147
Epoch [1/10], Iter [780/3125], train_loss:0.092266
Epoch [1/10], Iter [781/3125], train_loss:0.088162
Epoch [1/10], Iter [782/3125], train_loss:0.048683
Epoch [1/10], Iter [783/3125], train_loss:0.068024
Epoch [1/10], Iter [784/3125], train_loss:0.061430
Epoch [1/10], Iter [785/3125], train_loss:0.084588
Epoch [1/10], Iter [786/3125], train_loss:0.055528
Epoch [1/10], Iter [787/3125], train_loss:0.069858
Epoch [1/10], Iter [788/3125], train_loss:0.066797
Epoch [1/10], Iter [789/3125], train_loss:0.055900
Epoch [1/10], Iter [790/3125], train_loss:0.081083
Epoch [1/10], Iter [791/3125], train_loss:0.104611
Epoch [1/10], Iter [792/3125], train_loss:0.069633
Epoch [1/10], Iter [793/3125], train_loss:0.076716
Epoch [1/10], Iter [794/3125], train_loss:0.058692
Epoch [1/10], Iter [795/3125], train_loss:0.071644
Epoch [1/10], Iter [796/3125], train_loss:0.075141
Epoch [1/10], Iter [797/3125], train_loss:0.057095
Epoch [1/10], Iter [798/3125], train_loss:0.091708
Epoch [1/10], Iter [799/3125], train_loss:0.082720
Epoch [1/10], Iter [800/3125], train_loss:0.082454
Epoch [1/10], Iter [801/3125], train_loss:0.062604
Epoch [1/10], Iter [802/3125], train_loss:0.064724
Epoch [1/10], Iter [803/3125], train_loss:0.070556
Epoch [1/10], Iter [804/3125], train_loss:0.062924
Epoch [1/10], Iter [805/3125], train_loss:0.068634
Epoch [1/10], Iter [806/3125], train_loss:0.125406
Epoch [1/10], Iter [807/3125], train_loss:0.105064
Epoch [1/10], Iter [808/3125], train_loss:0.094673
Epoch [1/10], Iter [809/3125], train_loss:0.058413
Epoch [1/10], Iter [810/3125], train_loss:0.068775
Epoch [1/10], Iter [811/3125], train_loss:0.082067
Epoch [1/10], Iter [812/3125], train_loss:0.069499
Epoch [1/10], Iter [813/3125], train_loss:0.046804
Epoch [1/10], Iter [814/3125], train_loss:0.052497
Epoch [1/10], Iter [815/3125], train_loss:0.039903
Epoch [1/10], Iter [816/3125], train_loss:0.075335
Epoch [1/10], Iter [817/3125], train_loss:0.118900
Epoch [1/10], Iter [818/3125], train_loss:0.095827
Epoch [1/10], Iter [819/3125], train_loss:0.080276
Epoch [1/10], Iter [820/3125], train_loss:0.078976
Epoch [1/10], Iter [821/3125], train_loss:0.067389
Epoch [1/10], Iter [822/3125], train_loss:0.039839
Epoch [1/10], Iter [823/3125], train_loss:0.084257
Epoch [1/10], Iter [824/3125], train_loss:0.086442
Epoch [1/10], Iter [825/3125], train_loss:0.067308
Epoch [1/10], Iter [826/3125], train_loss:0.065607
Epoch [1/10], Iter [827/3125], train_loss:0.076576
Epoch [1/10], Iter [828/3125], train_loss:0.059056
Epoch [1/10], Iter [829/3125], train_loss:0.045432
Epoch [1/10], Iter [830/3125], train_loss:0.097930
Epoch [1/10], Iter [831/3125], train_loss:0.029969
Epoch [1/10], Iter [832/3125], train_loss:0.089879
Epoch [1/10], Iter [833/3125], train_loss:0.065557
Epoch [1/10], Iter [834/3125], train_loss:0.055370
Epoch [1/10], Iter [835/3125], train_loss:0.078189
Epoch [1/10], Iter [836/3125], train_loss:0.078902
Epoch [1/10], Iter [837/3125], train_loss:0.049187
Epoch [1/10], Iter [838/3125], train_loss:0.073233
Epoch [1/10], Iter [839/3125], train_loss:0.042756
Epoch [1/10], Iter [840/3125], train_loss:0.095991
Epoch [1/10], Iter [841/3125], train_loss:0.054647
Epoch [1/10], Iter [842/3125], train_loss:0.090404
Epoch [1/10], Iter [843/3125], train_loss:0.084048
Epoch [1/10], Iter [844/3125], train_loss:0.042351
Epoch [1/10], Iter [845/3125], train_loss:0.110720
Epoch [1/10], Iter [846/3125], train_loss:0.058698
Epoch [1/10], Iter [847/3125], train_loss:0.065574
Epoch [1/10], Iter [848/3125], train_loss:0.103704
Epoch [1/10], Iter [849/3125], train_loss:0.092518
Epoch [1/10], Iter [850/3125], train_loss:0.105825
Epoch [1/10], Iter [851/3125], train_loss:0.092112
Epoch [1/10], Iter [852/3125], train_loss:0.060410
Epoch [1/10], Iter [853/3125], train_loss:0.053077
Epoch [1/10], Iter [854/3125], train_loss:0.096419
Epoch [1/10], Iter [855/3125], train_loss:0.070295
Epoch [1/10], Iter [856/3125], train_loss:0.038191
Epoch [1/10], Iter [857/3125], train_loss:0.067107
Epoch [1/10], Iter [858/3125], train_loss:0.068591
Epoch [1/10], Iter [859/3125], train_loss:0.118834
Epoch [1/10], Iter [860/3125], train_loss:0.057502
Epoch [1/10], Iter [861/3125], train_loss:0.112667
Epoch [1/10], Iter [862/3125], train_loss:0.068514
Epoch [1/10], Iter [863/3125], train_loss:0.078345
Epoch [1/10], Iter [864/3125], train_loss:0.086322
Epoch [1/10], Iter [865/3125], train_loss:0.060227
Epoch [1/10], Iter [866/3125], train_loss:0.069537
Epoch [1/10], Iter [867/3125], train_loss:0.051423
Epoch [1/10], Iter [868/3125], train_loss:0.065481
Epoch [1/10], Iter [869/3125], train_loss:0.078509
Epoch [1/10], Iter [870/3125], train_loss:0.087949
Epoch [1/10], Iter [871/3125], train_loss:0.089137
Epoch [1/10], Iter [872/3125], train_loss:0.097406
Epoch [1/10], Iter [873/3125], train_loss:0.058960
Epoch [1/10], Iter [874/3125], train_loss:0.058738
Epoch [1/10], Iter [875/3125], train_loss:0.061488
Epoch [1/10], Iter [876/3125], train_loss:0.066018
Epoch [1/10], Iter [877/3125], train_loss:0.074891
Epoch [1/10], Iter [878/3125], train_loss:0.086487
Epoch [1/10], Iter [879/3125], train_loss:0.036267
Epoch [1/10], Iter [880/3125], train_loss:0.052825
Epoch [1/10], Iter [881/3125], train_loss:0.086232
Epoch [1/10], Iter [882/3125], train_loss:0.067304
Epoch [1/10], Iter [883/3125], train_loss:0.090174
Epoch [1/10], Iter [884/3125], train_loss:0.074173
Epoch [1/10], Iter [885/3125], train_loss:0.103388
Epoch [1/10], Iter [886/3125], train_loss:0.063061
Epoch [1/10], Iter [887/3125], train_loss:0.111390
Epoch [1/10], Iter [888/3125], train_loss:0.082873
Epoch [1/10], Iter [889/3125], train_loss:0.067860
Epoch [1/10], Iter [890/3125], train_loss:0.069580
Epoch [1/10], Iter [891/3125], train_loss:0.071146
Epoch [1/10], Iter [892/3125], train_loss:0.046750
Epoch [1/10], Iter [893/3125], train_loss:0.069989
Epoch [1/10], Iter [894/3125], train_loss:0.054033
Epoch [1/10], Iter [895/3125], train_loss:0.091311
Epoch [1/10], Iter [896/3125], train_loss:0.089567
Epoch [1/10], Iter [897/3125], train_loss:0.082130
Epoch [1/10], Iter [898/3125], train_loss:0.115708
Epoch [1/10], Iter [899/3125], train_loss:0.099699
Epoch [1/10], Iter [900/3125], train_loss:0.084736
Epoch [1/10], Iter [901/3125], train_loss:0.099145
Epoch [1/10], Iter [902/3125], train_loss:0.096519
Epoch [1/10], Iter [903/3125], train_loss:0.070268
Epoch [1/10], Iter [904/3125], train_loss:0.048972
Epoch [1/10], Iter [905/3125], train_loss:0.055735
Epoch [1/10], Iter [906/3125], train_loss:0.092406
Epoch [1/10], Iter [907/3125], train_loss:0.094186
Epoch [1/10], Iter [908/3125], train_loss:0.058645
Epoch [1/10], Iter [909/3125], train_loss:0.059716
Epoch [1/10], Iter [910/3125], train_loss:0.066300
Epoch [1/10], Iter [911/3125], train_loss:0.055384
Epoch [1/10], Iter [912/3125], train_loss:0.063149
Epoch [1/10], Iter [913/3125], train_loss:0.078833
Epoch [1/10], Iter [914/3125], train_loss:0.047108
Epoch [1/10], Iter [915/3125], train_loss:0.095854
Epoch [1/10], Iter [916/3125], train_loss:0.067950
Epoch [1/10], Iter [917/3125], train_loss:0.089043
Epoch [1/10], Iter [918/3125], train_loss:0.091433
Epoch [1/10], Iter [919/3125], train_loss:0.071309
Epoch [1/10], Iter [920/3125], train_loss:0.064289
Epoch [1/10], Iter [921/3125], train_loss:0.075466
Epoch [1/10], Iter [922/3125], train_loss:0.041136
Epoch [1/10], Iter [923/3125], train_loss:0.069332
Epoch [1/10], Iter [924/3125], train_loss:0.103374
Epoch [1/10], Iter [925/3125], train_loss:0.048819
Epoch [1/10], Iter [926/3125], train_loss:0.102714
Epoch [1/10], Iter [927/3125], train_loss:0.059707
Epoch [1/10], Iter [928/3125], train_loss:0.103872
Epoch [1/10], Iter [929/3125], train_loss:0.071671
Epoch [1/10], Iter [930/3125], train_loss:0.043527
Epoch [1/10], Iter [931/3125], train_loss:0.101342
Epoch [1/10], Iter [932/3125], train_loss:0.090892
Epoch [1/10], Iter [933/3125], train_loss:0.084326
Epoch [1/10], Iter [934/3125], train_loss:0.085523
Epoch [1/10], Iter [935/3125], train_loss:0.104836
Epoch [1/10], Iter [936/3125], train_loss:0.071485
Epoch [1/10], Iter [937/3125], train_loss:0.075505
Epoch [1/10], Iter [938/3125], train_loss:0.055048
Epoch [1/10], Iter [939/3125], train_loss:0.052603
Epoch [1/10], Iter [940/3125], train_loss:0.052872
Epoch [1/10], Iter [941/3125], train_loss:0.046744
Epoch [1/10], Iter [942/3125], train_loss:0.084774
Epoch [1/10], Iter [943/3125], train_loss:0.089809
Epoch [1/10], Iter [944/3125], train_loss:0.077171
Epoch [1/10], Iter [945/3125], train_loss:0.053297
Epoch [1/10], Iter [946/3125], train_loss:0.048126
Epoch [1/10], Iter [947/3125], train_loss:0.069072
Epoch [1/10], Iter [948/3125], train_loss:0.081771
Epoch [1/10], Iter [949/3125], train_loss:0.086464
Epoch [1/10], Iter [950/3125], train_loss:0.078226
Epoch [1/10], Iter [951/3125], train_loss:0.070242
Epoch [1/10], Iter [952/3125], train_loss:0.065498
Epoch [1/10], Iter [953/3125], train_loss:0.057135
Epoch [1/10], Iter [954/3125], train_loss:0.087012
Epoch [1/10], Iter [955/3125], train_loss:0.087501
Epoch [1/10], Iter [956/3125], train_loss:0.076051
Epoch [1/10], Iter [957/3125], train_loss:0.093375
Epoch [1/10], Iter [958/3125], train_loss:0.098896
Epoch [1/10], Iter [959/3125], train_loss:0.094898
Epoch [1/10], Iter [960/3125], train_loss:0.051544
Epoch [1/10], Iter [961/3125], train_loss:0.112901
Epoch [1/10], Iter [962/3125], train_loss:0.064911
Epoch [1/10], Iter [963/3125], train_loss:0.127530
Epoch [1/10], Iter [964/3125], train_loss:0.060438
Epoch [1/10], Iter [965/3125], train_loss:0.073689
Epoch [1/10], Iter [966/3125], train_loss:0.058125
Epoch [1/10], Iter [967/3125], train_loss:0.076736
Epoch [1/10], Iter [968/3125], train_loss:0.076557
Epoch [1/10], Iter [969/3125], train_loss:0.064269
Epoch [1/10], Iter [970/3125], train_loss:0.078429
Epoch [1/10], Iter [971/3125], train_loss:0.053220
Epoch [1/10], Iter [972/3125], train_loss:0.059810
Epoch [1/10], Iter [973/3125], train_loss:0.061482
Epoch [1/10], Iter [974/3125], train_loss:0.059918
Epoch [1/10], Iter [975/3125], train_loss:0.095541
Epoch [1/10], Iter [976/3125], train_loss:0.066343
Epoch [1/10], Iter [977/3125], train_loss:0.063362
Epoch [1/10], Iter [978/3125], train_loss:0.049746
Epoch [1/10], Iter [979/3125], train_loss:0.076230
Epoch [1/10], Iter [980/3125], train_loss:0.085253
Epoch [1/10], Iter [981/3125], train_loss:0.055329
Epoch [1/10], Iter [982/3125], train_loss:0.073866
Epoch [1/10], Iter [983/3125], train_loss:0.090456
Epoch [1/10], Iter [984/3125], train_loss:0.065264
Epoch [1/10], Iter [985/3125], train_loss:0.094808
Epoch [1/10], Iter [986/3125], train_loss:0.083755
Epoch [1/10], Iter [987/3125], train_loss:0.100000
Epoch [1/10], Iter [988/3125], train_loss:0.044194
Epoch [1/10], Iter [989/3125], train_loss:0.089688
Epoch [1/10], Iter [990/3125], train_loss:0.061354
Epoch [1/10], Iter [991/3125], train_loss:0.072798
Epoch [1/10], Iter [992/3125], train_loss:0.055077
Epoch [1/10], Iter [993/3125], train_loss:0.066739
Epoch [1/10], Iter [994/3125], train_loss:0.085635
Epoch [1/10], Iter [995/3125], train_loss:0.062349
Epoch [1/10], Iter [996/3125], train_loss:0.055486
Epoch [1/10], Iter [997/3125], train_loss:0.061249
Epoch [1/10], Iter [998/3125], train_loss:0.046875
Epoch [1/10], Iter [999/3125], train_loss:0.078696
Epoch [1/10], Iter [1000/3125], train_loss:0.071514
Epoch [1/10], Iter [1001/3125], train_loss:0.084848
Epoch [1/10], Iter [1002/3125], train_loss:0.051532
Epoch [1/10], Iter [1003/3125], train_loss:0.084807
Epoch [1/10], Iter [1004/3125], train_loss:0.088694
Epoch [1/10], Iter [1005/3125], train_loss:0.081654
Epoch [1/10], Iter [1006/3125], train_loss:0.067032
Epoch [1/10], Iter [1007/3125], train_loss:0.124414
Epoch [1/10], Iter [1008/3125], train_loss:0.080349
Epoch [1/10], Iter [1009/3125], train_loss:0.036862
Epoch [1/10], Iter [1010/3125], train_loss:0.076840
Epoch [1/10], Iter [1011/3125], train_loss:0.042844
Epoch [1/10], Iter [1012/3125], train_loss:0.078605
Epoch [1/10], Iter [1013/3125], train_loss:0.044502
Epoch [1/10], Iter [1014/3125], train_loss:0.080783
Epoch [1/10], Iter [1015/3125], train_loss:0.071481
Epoch [1/10], Iter [1016/3125], train_loss:0.085543
Epoch [1/10], Iter [1017/3125], train_loss:0.107438
Epoch [1/10], Iter [1018/3125], train_loss:0.076212
Epoch [1/10], Iter [1019/3125], train_loss:0.078109
Epoch [1/10], Iter [1020/3125], train_loss:0.047839
Epoch [1/10], Iter [1021/3125], train_loss:0.090297
Epoch [1/10], Iter [1022/3125], train_loss:0.060652
Epoch [1/10], Iter [1023/3125], train_loss:0.107761
Epoch [1/10], Iter [1024/3125], train_loss:0.075100
Epoch [1/10], Iter [1025/3125], train_loss:0.065084
Epoch [1/10], Iter [1026/3125], train_loss:0.086126
Epoch [1/10], Iter [1027/3125], train_loss:0.076870
Epoch [1/10], Iter [1028/3125], train_loss:0.090435
Epoch [1/10], Iter [1029/3125], train_loss:0.071291
Epoch [1/10], Iter [1030/3125], train_loss:0.072460
Epoch [1/10], Iter [1031/3125], train_loss:0.065093
Epoch [1/10], Iter [1032/3125], train_loss:0.046128
Epoch [1/10], Iter [1033/3125], train_loss:0.081843
Epoch [1/10], Iter [1034/3125], train_loss:0.098334
Epoch [1/10], Iter [1035/3125], train_loss:0.044121
Epoch [1/10], Iter [1036/3125], train_loss:0.067291
Epoch [1/10], Iter [1037/3125], train_loss:0.055147
Epoch [1/10], Iter [1038/3125], train_loss:0.075272
Epoch [1/10], Iter [1039/3125], train_loss:0.097143
Epoch [1/10], Iter [1040/3125], train_loss:0.083308
Epoch [1/10], Iter [1041/3125], train_loss:0.083002
Epoch [1/10], Iter [1042/3125], train_loss:0.074888
Epoch [1/10], Iter [1043/3125], train_loss:0.097697
Epoch [1/10], Iter [1044/3125], train_loss:0.049311
Epoch [1/10], Iter [1045/3125], train_loss:0.081692
Epoch [1/10], Iter [1046/3125], train_loss:0.064942
Epoch [1/10], Iter [1047/3125], train_loss:0.044580
Epoch [1/10], Iter [1048/3125], train_loss:0.085176
Epoch [1/10], Iter [1049/3125], train_loss:0.063269
Epoch [1/10], Iter [1050/3125], train_loss:0.077601
Epoch [1/10], Iter [1051/3125], train_loss:0.105948
Epoch [1/10], Iter [1052/3125], train_loss:0.059415
Epoch [1/10], Iter [1053/3125], train_loss:0.094063
Epoch [1/10], Iter [1054/3125], train_loss:0.092959
Epoch [1/10], Iter [1055/3125], train_loss:0.092067
Epoch [1/10], Iter [1056/3125], train_loss:0.067009
Epoch [1/10], Iter [1057/3125], train_loss:0.098917
Epoch [1/10], Iter [1058/3125], train_loss:0.057587
Epoch [1/10], Iter [1059/3125], train_loss:0.130291
Epoch [1/10], Iter [1060/3125], train_loss:0.067882
Epoch [1/10], Iter [1061/3125], train_loss:0.060654
Epoch [1/10], Iter [1062/3125], train_loss:0.055052
Epoch [1/10], Iter [1063/3125], train_loss:0.113558
Epoch [1/10], Iter [1064/3125], train_loss:0.092149
Epoch [1/10], Iter [1065/3125], train_loss:0.080471
Epoch [1/10], Iter [1066/3125], train_loss:0.077791
Epoch [1/10], Iter [1067/3125], train_loss:0.064857
Epoch [1/10], Iter [1068/3125], train_loss:0.061791
Epoch [1/10], Iter [1069/3125], train_loss:0.092346
Epoch [1/10], Iter [1070/3125], train_loss:0.061829
Epoch [1/10], Iter [1071/3125], train_loss:0.052066
Epoch [1/10], Iter [1072/3125], train_loss:0.060261
Epoch [1/10], Iter [1073/3125], train_loss:0.052576
Epoch [1/10], Iter [1074/3125], train_loss:0.091335
Epoch [1/10], Iter [1075/3125], train_loss:0.085970
Epoch [1/10], Iter [1076/3125], train_loss:0.051026
Epoch [1/10], Iter [1077/3125], train_loss:0.054480
Epoch [1/10], Iter [1078/3125], train_loss:0.076401
Epoch [1/10], Iter [1079/3125], train_loss:0.067915
Epoch [1/10], Iter [1080/3125], train_loss:0.080814
Epoch [1/10], Iter [1081/3125], train_loss:0.079265
Epoch [1/10], Iter [1082/3125], train_loss:0.064177
Epoch [1/10], Iter [1083/3125], train_loss:0.070294
Epoch [1/10], Iter [1084/3125], train_loss:0.076654
Epoch [1/10], Iter [1085/3125], train_loss:0.048900
Epoch [1/10], Iter [1086/3125], train_loss:0.080051
Epoch [1/10], Iter [1087/3125], train_loss:0.062221
Epoch [1/10], Iter [1088/3125], train_loss:0.053528
Epoch [1/10], Iter [1089/3125], train_loss:0.078500
Epoch [1/10], Iter [1090/3125], train_loss:0.054167
Epoch [1/10], Iter [1091/3125], train_loss:0.060830
Epoch [1/10], Iter [1092/3125], train_loss:0.070064
Epoch [1/10], Iter [1093/3125], train_loss:0.059513
Epoch [1/10], Iter [1094/3125], train_loss:0.064300
Epoch [1/10], Iter [1095/3125], train_loss:0.064953
Epoch [1/10], Iter [1096/3125], train_loss:0.098469
Epoch [1/10], Iter [1097/3125], train_loss:0.070608
Epoch [1/10], Iter [1098/3125], train_loss:0.063558
Epoch [1/10], Iter [1099/3125], train_loss:0.047807
Epoch [1/10], Iter [1100/3125], train_loss:0.040138
Epoch [1/10], Iter [1101/3125], train_loss:0.054244
Epoch [1/10], Iter [1102/3125], train_loss:0.094688
Epoch [1/10], Iter [1103/3125], train_loss:0.040553
Epoch [1/10], Iter [1104/3125], train_loss:0.054478
Epoch [1/10], Iter [1105/3125], train_loss:0.051893
Epoch [1/10], Iter [1106/3125], train_loss:0.063331
Epoch [1/10], Iter [1107/3125], train_loss:0.092488
Epoch [1/10], Iter [1108/3125], train_loss:0.079674
Epoch [1/10], Iter [1109/3125], train_loss:0.082050
Epoch [1/10], Iter [1110/3125], train_loss:0.053623
Epoch [1/10], Iter [1111/3125], train_loss:0.142942
Epoch [1/10], Iter [1112/3125], train_loss:0.071629
Epoch [1/10], Iter [1113/3125], train_loss:0.070982
Epoch [1/10], Iter [1114/3125], train_loss:0.096225
Epoch [1/10], Iter [1115/3125], train_loss:0.071539
Epoch [1/10], Iter [1116/3125], train_loss:0.058115
Epoch [1/10], Iter [1117/3125], train_loss:0.069117
Epoch [1/10], Iter [1118/3125], train_loss:0.048873
Epoch [1/10], Iter [1119/3125], train_loss:0.041571
Epoch [1/10], Iter [1120/3125], train_loss:0.062927
Epoch [1/10], Iter [1121/3125], train_loss:0.060754
Epoch [1/10], Iter [1122/3125], train_loss:0.072750
Epoch [1/10], Iter [1123/3125], train_loss:0.112615
Epoch [1/10], Iter [1124/3125], train_loss:0.051256
Epoch [1/10], Iter [1125/3125], train_loss:0.086577
Epoch [1/10], Iter [1126/3125], train_loss:0.058549
Epoch [1/10], Iter [1127/3125], train_loss:0.038518
Epoch [1/10], Iter [1128/3125], train_loss:0.080108
Epoch [1/10], Iter [1129/3125], train_loss:0.088471
Epoch [1/10], Iter [1130/3125], train_loss:0.062608
Epoch [1/10], Iter [1131/3125], train_loss:0.029030
Epoch [1/10], Iter [1132/3125], train_loss:0.102873
Epoch [1/10], Iter [1133/3125], train_loss:0.044108
Epoch [1/10], Iter [1134/3125], train_loss:0.062481
Epoch [1/10], Iter [1135/3125], train_loss:0.070823
Epoch [1/10], Iter [1136/3125], train_loss:0.056807
Epoch [1/10], Iter [1137/3125], train_loss:0.086398
Epoch [1/10], Iter [1138/3125], train_loss:0.070901
Epoch [1/10], Iter [1139/3125], train_loss:0.057244
Epoch [1/10], Iter [1140/3125], train_loss:0.084820
Epoch [1/10], Iter [1141/3125], train_loss:0.060651
Epoch [1/10], Iter [1142/3125], train_loss:0.050026
Epoch [1/10], Iter [1143/3125], train_loss:0.051782
Epoch [1/10], Iter [1144/3125], train_loss:0.078317
Epoch [1/10], Iter [1145/3125], train_loss:0.101919
Epoch [1/10], Iter [1146/3125], train_loss:0.066825
Epoch [1/10], Iter [1147/3125], train_loss:0.058590
Epoch [1/10], Iter [1148/3125], train_loss:0.065694
Epoch [1/10], Iter [1149/3125], train_loss:0.073218
Epoch [1/10], Iter [1150/3125], train_loss:0.055545
Epoch [1/10], Iter [1151/3125], train_loss:0.091100
Epoch [1/10], Iter [1152/3125], train_loss:0.064072
Epoch [1/10], Iter [1153/3125], train_loss:0.056346
Epoch [1/10], Iter [1154/3125], train_loss:0.051450
Epoch [1/10], Iter [1155/3125], train_loss:0.092154
Epoch [1/10], Iter [1156/3125], train_loss:0.042432
Epoch [1/10], Iter [1157/3125], train_loss:0.089265
Epoch [1/10], Iter [1158/3125], train_loss:0.060625
Epoch [1/10], Iter [1159/3125], train_loss:0.099431
Epoch [1/10], Iter [1160/3125], train_loss:0.083928
Epoch [1/10], Iter [1161/3125], train_loss:0.035615
Epoch [1/10], Iter [1162/3125], train_loss:0.085633
Epoch [1/10], Iter [1163/3125], train_loss:0.072629
Epoch [1/10], Iter [1164/3125], train_loss:0.025984
Epoch [1/10], Iter [1165/3125], train_loss:0.039261
Epoch [1/10], Iter [1166/3125], train_loss:0.069321
Epoch [1/10], Iter [1167/3125], train_loss:0.069004
Epoch [1/10], Iter [1168/3125], train_loss:0.089742
Epoch [1/10], Iter [1169/3125], train_loss:0.079844
Epoch [1/10], Iter [1170/3125], train_loss:0.072411
Epoch [1/10], Iter [1171/3125], train_loss:0.067221
Epoch [1/10], Iter [1172/3125], train_loss:0.042146
Epoch [1/10], Iter [1173/3125], train_loss:0.057201
Epoch [1/10], Iter [1174/3125], train_loss:0.080315
Epoch [1/10], Iter [1175/3125], train_loss:0.071066
Epoch [1/10], Iter [1176/3125], train_loss:0.052890
Epoch [1/10], Iter [1177/3125], train_loss:0.068389
Epoch [1/10], Iter [1178/3125], train_loss:0.064046
Epoch [1/10], Iter [1179/3125], train_loss:0.077891
Epoch [1/10], Iter [1180/3125], train_loss:0.048555
Epoch [1/10], Iter [1181/3125], train_loss:0.050501
Epoch [1/10], Iter [1182/3125], train_loss:0.048259
Epoch [1/10], Iter [1183/3125], train_loss:0.062327
Epoch [1/10], Iter [1184/3125], train_loss:0.109548
Epoch [1/10], Iter [1185/3125], train_loss:0.065658
Epoch [1/10], Iter [1186/3125], train_loss:0.093734
Epoch [1/10], Iter [1187/3125], train_loss:0.063664
Epoch [1/10], Iter [1188/3125], train_loss:0.037065
Epoch [1/10], Iter [1189/3125], train_loss:0.057139
Epoch [1/10], Iter [1190/3125], train_loss:0.036839
Epoch [1/10], Iter [1191/3125], train_loss:0.067464
Epoch [1/10], Iter [1192/3125], train_loss:0.066957
Epoch [1/10], Iter [1193/3125], train_loss:0.084686
Epoch [1/10], Iter [1194/3125], train_loss:0.052129
Epoch [1/10], Iter [1195/3125], train_loss:0.088091
Epoch [1/10], Iter [1196/3125], train_loss:0.108515
Epoch [1/10], Iter [1197/3125], train_loss:0.066917
Epoch [1/10], Iter [1198/3125], train_loss:0.081250
Epoch [1/10], Iter [1199/3125], train_loss:0.060395
Epoch [1/10], Iter [1200/3125], train_loss:0.111344
Epoch [1/10], Iter [1201/3125], train_loss:0.067042
Epoch [1/10], Iter [1202/3125], train_loss:0.056118
Epoch [1/10], Iter [1203/3125], train_loss:0.100409
Epoch [1/10], Iter [1204/3125], train_loss:0.079419
Epoch [1/10], Iter [1205/3125], train_loss:0.044308
Epoch [1/10], Iter [1206/3125], train_loss:0.053429
Epoch [1/10], Iter [1207/3125], train_loss:0.045393
Epoch [1/10], Iter [1208/3125], train_loss:0.056517
Epoch [1/10], Iter [1209/3125], train_loss:0.051357
Epoch [1/10], Iter [1210/3125], train_loss:0.074712
Epoch [1/10], Iter [1211/3125], train_loss:0.067255
Epoch [1/10], Iter [1212/3125], train_loss:0.066072
Epoch [1/10], Iter [1213/3125], train_loss:0.036946
Epoch [1/10], Iter [1214/3125], train_loss:0.074870
Epoch [1/10], Iter [1215/3125], train_loss:0.095798
Epoch [1/10], Iter [1216/3125], train_loss:0.058114
Epoch [1/10], Iter [1217/3125], train_loss:0.067285
Epoch [1/10], Iter [1218/3125], train_loss:0.076193
Epoch [1/10], Iter [1219/3125], train_loss:0.069693
Epoch [1/10], Iter [1220/3125], train_loss:0.072604
Epoch [1/10], Iter [1221/3125], train_loss:0.064588
Epoch [1/10], Iter [1222/3125], train_loss:0.070116
Epoch [1/10], Iter [1223/3125], train_loss:0.078694
Epoch [1/10], Iter [1224/3125], train_loss:0.073832
Epoch [1/10], Iter [1225/3125], train_loss:0.057916
Epoch [1/10], Iter [1226/3125], train_loss:0.074006
Epoch [1/10], Iter [1227/3125], train_loss:0.094362
Epoch [1/10], Iter [1228/3125], train_loss:0.052954
Epoch [1/10], Iter [1229/3125], train_loss:0.066249
Epoch [1/10], Iter [1230/3125], train_loss:0.037475
Epoch [1/10], Iter [1231/3125], train_loss:0.037161
Epoch [1/10], Iter [1232/3125], train_loss:0.080392
Epoch [1/10], Iter [1233/3125], train_loss:0.064337
Epoch [1/10], Iter [1234/3125], train_loss:0.036732
Epoch [1/10], Iter [1235/3125], train_loss:0.080269
Epoch [1/10], Iter [1236/3125], train_loss:0.073352
Epoch [1/10], Iter [1237/3125], train_loss:0.071526
Epoch [1/10], Iter [1238/3125], train_loss:0.064553
Epoch [1/10], Iter [1239/3125], train_loss:0.094893
Epoch [1/10], Iter [1240/3125], train_loss:0.061000
Epoch [1/10], Iter [1241/3125], train_loss:0.069262
Epoch [1/10], Iter [1242/3125], train_loss:0.079779
Epoch [1/10], Iter [1243/3125], train_loss:0.066429
Epoch [1/10], Iter [1244/3125], train_loss:0.046146
Epoch [1/10], Iter [1245/3125], train_loss:0.054782
Epoch [1/10], Iter [1246/3125], train_loss:0.080050
Epoch [1/10], Iter [1247/3125], train_loss:0.081471
Epoch [1/10], Iter [1248/3125], train_loss:0.065746
Epoch [1/10], Iter [1249/3125], train_loss:0.037090
Epoch [1/10], Iter [1250/3125], train_loss:0.076876
Epoch [1/10], Iter [1251/3125], train_loss:0.051030
Epoch [1/10], Iter [1252/3125], train_loss:0.042274
Epoch [1/10], Iter [1253/3125], train_loss:0.068953
Epoch [1/10], Iter [1254/3125], train_loss:0.077853
Epoch [1/10], Iter [1255/3125], train_loss:0.078600
Epoch [1/10], Iter [1256/3125], train_loss:0.029034
Epoch [1/10], Iter [1257/3125], train_loss:0.067805
Epoch [1/10], Iter [1258/3125], train_loss:0.105204
Epoch [1/10], Iter [1259/3125], train_loss:0.044573
Epoch [1/10], Iter [1260/3125], train_loss:0.098438
Epoch [1/10], Iter [1261/3125], train_loss:0.044922
Epoch [1/10], Iter [1262/3125], train_loss:0.077494
Epoch [1/10], Iter [1263/3125], train_loss:0.068515
Epoch [1/10], Iter [1264/3125], train_loss:0.082361
Epoch [1/10], Iter [1265/3125], train_loss:0.065620
Epoch [1/10], Iter [1266/3125], train_loss:0.061101
Epoch [1/10], Iter [1267/3125], train_loss:0.072236
Epoch [1/10], Iter [1268/3125], train_loss:0.057902
Epoch [1/10], Iter [1269/3125], train_loss:0.078264
Epoch [1/10], Iter [1270/3125], train_loss:0.053628
Epoch [1/10], Iter [1271/3125], train_loss:0.076903
Epoch [1/10], Iter [1272/3125], train_loss:0.055117
Epoch [1/10], Iter [1273/3125], train_loss:0.122055
Epoch [1/10], Iter [1274/3125], train_loss:0.041958
Epoch [1/10], Iter [1275/3125], train_loss:0.110160
Epoch [1/10], Iter [1276/3125], train_loss:0.080354
Epoch [1/10], Iter [1277/3125], train_loss:0.036007
Epoch [1/10], Iter [1278/3125], train_loss:0.051821
Epoch [1/10], Iter [1279/3125], train_loss:0.103632
Epoch [1/10], Iter [1280/3125], train_loss:0.105166
Epoch [1/10], Iter [1281/3125], train_loss:0.068429
Epoch [1/10], Iter [1282/3125], train_loss:0.072354
Epoch [1/10], Iter [1283/3125], train_loss:0.058038
Epoch [1/10], Iter [1284/3125], train_loss:0.071881
Epoch [1/10], Iter [1285/3125], train_loss:0.033587
Epoch [1/10], Iter [1286/3125], train_loss:0.041231
Epoch [1/10], Iter [1287/3125], train_loss:0.072158
Epoch [1/10], Iter [1288/3125], train_loss:0.037460
Epoch [1/10], Iter [1289/3125], train_loss:0.052904
Epoch [1/10], Iter [1290/3125], train_loss:0.051290
Epoch [1/10], Iter [1291/3125], train_loss:0.076521
Epoch [1/10], Iter [1292/3125], train_loss:0.045308
Epoch [1/10], Iter [1293/3125], train_loss:0.077797
Epoch [1/10], Iter [1294/3125], train_loss:0.050401
Epoch [1/10], Iter [1295/3125], train_loss:0.054285
Epoch [1/10], Iter [1296/3125], train_loss:0.071456
Epoch [1/10], Iter [1297/3125], train_loss:0.069530
Epoch [1/10], Iter [1298/3125], train_loss:0.063551
Epoch [1/10], Iter [1299/3125], train_loss:0.060730
Epoch [1/10], Iter [1300/3125], train_loss:0.054880
Epoch [1/10], Iter [1301/3125], train_loss:0.049532
Epoch [1/10], Iter [1302/3125], train_loss:0.069171
Epoch [1/10], Iter [1303/3125], train_loss:0.061904
Epoch [1/10], Iter [1304/3125], train_loss:0.047012
Epoch [1/10], Iter [1305/3125], train_loss:0.045866
Epoch [1/10], Iter [1306/3125], train_loss:0.042385
Epoch [1/10], Iter [1307/3125], train_loss:0.050176
Epoch [1/10], Iter [1308/3125], train_loss:0.082048
Epoch [1/10], Iter [1309/3125], train_loss:0.042563
Epoch [1/10], Iter [1310/3125], train_loss:0.078971
Epoch [1/10], Iter [1311/3125], train_loss:0.086524
Epoch [1/10], Iter [1312/3125], train_loss:0.056474
Epoch [1/10], Iter [1313/3125], train_loss:0.037732
Epoch [1/10], Iter [1314/3125], train_loss:0.078819
Epoch [1/10], Iter [1315/3125], train_loss:0.082700
Epoch [1/10], Iter [1316/3125], train_loss:0.092105
Epoch [1/10], Iter [1317/3125], train_loss:0.059939
Epoch [1/10], Iter [1318/3125], train_loss:0.073690
Epoch [1/10], Iter [1319/3125], train_loss:0.049467
Epoch [1/10], Iter [1320/3125], train_loss:0.086146
Epoch [1/10], Iter [1321/3125], train_loss:0.061879
Epoch [1/10], Iter [1322/3125], train_loss:0.093417
Epoch [1/10], Iter [1323/3125], train_loss:0.041446
Epoch [1/10], Iter [1324/3125], train_loss:0.055495
Epoch [1/10], Iter [1325/3125], train_loss:0.061338
Epoch [1/10], Iter [1326/3125], train_loss:0.057086
Epoch [1/10], Iter [1327/3125], train_loss:0.051174
Epoch [1/10], Iter [1328/3125], train_loss:0.054015
Epoch [1/10], Iter [1329/3125], train_loss:0.061765
Epoch [1/10], Iter [1330/3125], train_loss:0.066730
Epoch [1/10], Iter [1331/3125], train_loss:0.054490
Epoch [1/10], Iter [1332/3125], train_loss:0.057822
Epoch [1/10], Iter [1333/3125], train_loss:0.063132
Epoch [1/10], Iter [1334/3125], train_loss:0.069564
Epoch [1/10], Iter [1335/3125], train_loss:0.044150
Epoch [1/10], Iter [1336/3125], train_loss:0.080780
Epoch [1/10], Iter [1337/3125], train_loss:0.058406
Epoch [1/10], Iter [1338/3125], train_loss:0.049550
Epoch [1/10], Iter [1339/3125], train_loss:0.044474
Epoch [1/10], Iter [1340/3125], train_loss:0.055215
Epoch [1/10], Iter [1341/3125], train_loss:0.097746
Epoch [1/10], Iter [1342/3125], train_loss:0.071166
Epoch [1/10], Iter [1343/3125], train_loss:0.050535
Epoch [1/10], Iter [1344/3125], train_loss:0.065595
Epoch [1/10], Iter [1345/3125], train_loss:0.069312
Epoch [1/10], Iter [1346/3125], train_loss:0.068984
Epoch [1/10], Iter [1347/3125], train_loss:0.114133
Epoch [1/10], Iter [1348/3125], train_loss:0.053902
Epoch [1/10], Iter [1349/3125], train_loss:0.039486
Epoch [1/10], Iter [1350/3125], train_loss:0.077412
Epoch [1/10], Iter [1351/3125], train_loss:0.105866
Epoch [1/10], Iter [1352/3125], train_loss:0.036934
Epoch [1/10], Iter [1353/3125], train_loss:0.028790
Epoch [1/10], Iter [1354/3125], train_loss:0.044115
Epoch [1/10], Iter [1355/3125], train_loss:0.050180
Epoch [1/10], Iter [1356/3125], train_loss:0.035173
Epoch [1/10], Iter [1357/3125], train_loss:0.066359
Epoch [1/10], Iter [1358/3125], train_loss:0.061649
Epoch [1/10], Iter [1359/3125], train_loss:0.090383
Epoch [1/10], Iter [1360/3125], train_loss:0.094560
Epoch [1/10], Iter [1361/3125], train_loss:0.051187
Epoch [1/10], Iter [1362/3125], train_loss:0.051535
Epoch [1/10], Iter [1363/3125], train_loss:0.086489
Epoch [1/10], Iter [1364/3125], train_loss:0.064312
Epoch [1/10], Iter [1365/3125], train_loss:0.035589
Epoch [1/10], Iter [1366/3125], train_loss:0.074556
Epoch [1/10], Iter [1367/3125], train_loss:0.095972
Epoch [1/10], Iter [1368/3125], train_loss:0.079113
Epoch [1/10], Iter [1369/3125], train_loss:0.075476
Epoch [1/10], Iter [1370/3125], train_loss:0.055053
Epoch [1/10], Iter [1371/3125], train_loss:0.036419
Epoch [1/10], Iter [1372/3125], train_loss:0.082008
Epoch [1/10], Iter [1373/3125], train_loss:0.035035
Epoch [1/10], Iter [1374/3125], train_loss:0.061965
Epoch [1/10], Iter [1375/3125], train_loss:0.090616
Epoch [1/10], Iter [1376/3125], train_loss:0.071584
Epoch [1/10], Iter [1377/3125], train_loss:0.062969
Epoch [1/10], Iter [1378/3125], train_loss:0.049597
Epoch [1/10], Iter [1379/3125], train_loss:0.042371
Epoch [1/10], Iter [1380/3125], train_loss:0.058470
Epoch [1/10], Iter [1381/3125], train_loss:0.089132
Epoch [1/10], Iter [1382/3125], train_loss:0.042923
Epoch [1/10], Iter [1383/3125], train_loss:0.066922
Epoch [1/10], Iter [1384/3125], train_loss:0.055818
Epoch [1/10], Iter [1385/3125], train_loss:0.077349
Epoch [1/10], Iter [1386/3125], train_loss:0.034871
Epoch [1/10], Iter [1387/3125], train_loss:0.034735
Epoch [1/10], Iter [1388/3125], train_loss:0.041610
Epoch [1/10], Iter [1389/3125], train_loss:0.078672
Epoch [1/10], Iter [1390/3125], train_loss:0.079922
Epoch [1/10], Iter [1391/3125], train_loss:0.053695
Epoch [1/10], Iter [1392/3125], train_loss:0.094359
Epoch [1/10], Iter [1393/3125], train_loss:0.066231
Epoch [1/10], Iter [1394/3125], train_loss:0.053103
Epoch [1/10], Iter [1395/3125], train_loss:0.054961
Epoch [1/10], Iter [1396/3125], train_loss:0.069908
Epoch [1/10], Iter [1397/3125], train_loss:0.036498
Epoch [1/10], Iter [1398/3125], train_loss:0.070611
Epoch [1/10], Iter [1399/3125], train_loss:0.046233
Epoch [1/10], Iter [1400/3125], train_loss:0.045637
Epoch [1/10], Iter [1401/3125], train_loss:0.026635
Epoch [1/10], Iter [1402/3125], train_loss:0.051463
Epoch [1/10], Iter [1403/3125], train_loss:0.072863
Epoch [1/10], Iter [1404/3125], train_loss:0.039532
Epoch [1/10], Iter [1405/3125], train_loss:0.094029
Epoch [1/10], Iter [1406/3125], train_loss:0.107056
Epoch [1/10], Iter [1407/3125], train_loss:0.068884
Epoch [1/10], Iter [1408/3125], train_loss:0.045376
Epoch [1/10], Iter [1409/3125], train_loss:0.035768
Epoch [1/10], Iter [1410/3125], train_loss:0.058423
Epoch [1/10], Iter [1411/3125], train_loss:0.105580
Epoch [1/10], Iter [1412/3125], train_loss:0.059442
Epoch [1/10], Iter [1413/3125], train_loss:0.056727
Epoch [1/10], Iter [1414/3125], train_loss:0.046670
Epoch [1/10], Iter [1415/3125], train_loss:0.052132
Epoch [1/10], Iter [1416/3125], train_loss:0.086853
Epoch [1/10], Iter [1417/3125], train_loss:0.053923
Epoch [1/10], Iter [1418/3125], train_loss:0.043211
Epoch [1/10], Iter [1419/3125], train_loss:0.042907
Epoch [1/10], Iter [1420/3125], train_loss:0.044250
Epoch [1/10], Iter [1421/3125], train_loss:0.084763
Epoch [1/10], Iter [1422/3125], train_loss:0.063013
Epoch [1/10], Iter [1423/3125], train_loss:0.031712
Epoch [1/10], Iter [1424/3125], train_loss:0.066372
Epoch [1/10], Iter [1425/3125], train_loss:0.079808
Epoch [1/10], Iter [1426/3125], train_loss:0.070664
Epoch [1/10], Iter [1427/3125], train_loss:0.042726
Epoch [1/10], Iter [1428/3125], train_loss:0.047623
Epoch [1/10], Iter [1429/3125], train_loss:0.054263
Epoch [1/10], Iter [1430/3125], train_loss:0.065956
Epoch [1/10], Iter [1431/3125], train_loss:0.067826
Epoch [1/10], Iter [1432/3125], train_loss:0.049903
Epoch [1/10], Iter [1433/3125], train_loss:0.058264
Epoch [1/10], Iter [1434/3125], train_loss:0.082112
Epoch [1/10], Iter [1435/3125], train_loss:0.048372
Epoch [1/10], Iter [1436/3125], train_loss:0.089613
Epoch [1/10], Iter [1437/3125], train_loss:0.070496
Epoch [1/10], Iter [1438/3125], train_loss:0.048467
Epoch [1/10], Iter [1439/3125], train_loss:0.048719
Epoch [1/10], Iter [1440/3125], train_loss:0.051029
Epoch [1/10], Iter [1441/3125], train_loss:0.066726
Epoch [1/10], Iter [1442/3125], train_loss:0.074743
Epoch [1/10], Iter [1443/3125], train_loss:0.062530
Epoch [1/10], Iter [1444/3125], train_loss:0.031921
Epoch [1/10], Iter [1445/3125], train_loss:0.082468
Epoch [1/10], Iter [1446/3125], train_loss:0.066029
Epoch [1/10], Iter [1447/3125], train_loss:0.079104
Epoch [1/10], Iter [1448/3125], train_loss:0.050547
Epoch [1/10], Iter [1449/3125], train_loss:0.070847
Epoch [1/10], Iter [1450/3125], train_loss:0.066685
Epoch [1/10], Iter [1451/3125], train_loss:0.062502
Epoch [1/10], Iter [1452/3125], train_loss:0.039792
Epoch [1/10], Iter [1453/3125], train_loss:0.074898
Epoch [1/10], Iter [1454/3125], train_loss:0.082731
Epoch [1/10], Iter [1455/3125], train_loss:0.051062
Epoch [1/10], Iter [1456/3125], train_loss:0.081949
Epoch [1/10], Iter [1457/3125], train_loss:0.048781
Epoch [1/10], Iter [1458/3125], train_loss:0.031672
Epoch [1/10], Iter [1459/3125], train_loss:0.081797
Epoch [1/10], Iter [1460/3125], train_loss:0.043624
Epoch [1/10], Iter [1461/3125], train_loss:0.042655
Epoch [1/10], Iter [1462/3125], train_loss:0.065425
Epoch [1/10], Iter [1463/3125], train_loss:0.051312
Epoch [1/10], Iter [1464/3125], train_loss:0.069975
Epoch [1/10], Iter [1465/3125], train_loss:0.054417
Epoch [1/10], Iter [1466/3125], train_loss:0.068450
Epoch [1/10], Iter [1467/3125], train_loss:0.055852
Epoch [1/10], Iter [1468/3125], train_loss:0.056495
Epoch [1/10], Iter [1469/3125], train_loss:0.048216
Epoch [1/10], Iter [1470/3125], train_loss:0.116062
Epoch [1/10], Iter [1471/3125], train_loss:0.076963
Epoch [1/10], Iter [1472/3125], train_loss:0.061780
Epoch [1/10], Iter [1473/3125], train_loss:0.057824
Epoch [1/10], Iter [1474/3125], train_loss:0.051863
Epoch [1/10], Iter [1475/3125], train_loss:0.064877
Epoch [1/10], Iter [1476/3125], train_loss:0.026023
Epoch [1/10], Iter [1477/3125], train_loss:0.071512
Epoch [1/10], Iter [1478/3125], train_loss:0.046893
Epoch [1/10], Iter [1479/3125], train_loss:0.086675
Epoch [1/10], Iter [1480/3125], train_loss:0.056367
Epoch [1/10], Iter [1481/3125], train_loss:0.086944
Epoch [1/10], Iter [1482/3125], train_loss:0.059426
Epoch [1/10], Iter [1483/3125], train_loss:0.062180
Epoch [1/10], Iter [1484/3125], train_loss:0.036093
Epoch [1/10], Iter [1485/3125], train_loss:0.053832
Epoch [1/10], Iter [1486/3125], train_loss:0.059764
Epoch [1/10], Iter [1487/3125], train_loss:0.069709
Epoch [1/10], Iter [1488/3125], train_loss:0.058866
Epoch [1/10], Iter [1489/3125], train_loss:0.042857
Epoch [1/10], Iter [1490/3125], train_loss:0.051318
Epoch [1/10], Iter [1491/3125], train_loss:0.046036
Epoch [1/10], Iter [1492/3125], train_loss:0.067652
Epoch [1/10], Iter [1493/3125], train_loss:0.068058
Epoch [1/10], Iter [1494/3125], train_loss:0.058382
Epoch [1/10], Iter [1495/3125], train_loss:0.071653
Epoch [1/10], Iter [1496/3125], train_loss:0.030701
Epoch [1/10], Iter [1497/3125], train_loss:0.085657
Epoch [1/10], Iter [1498/3125], train_loss:0.051193
Epoch [1/10], Iter [1499/3125], train_loss:0.047368
Epoch [1/10], Iter [1500/3125], train_loss:0.056843
Epoch [1/10], Iter [1501/3125], train_loss:0.077672
Epoch [1/10], Iter [1502/3125], train_loss:0.046002
Epoch [1/10], Iter [1503/3125], train_loss:0.050379
Epoch [1/10], Iter [1504/3125], train_loss:0.067272
Epoch [1/10], Iter [1505/3125], train_loss:0.039557
Epoch [1/10], Iter [1506/3125], train_loss:0.072687
Epoch [1/10], Iter [1507/3125], train_loss:0.049326
Epoch [1/10], Iter [1508/3125], train_loss:0.072209
Epoch [1/10], Iter [1509/3125], train_loss:0.092582
Epoch [1/10], Iter [1510/3125], train_loss:0.049500
Epoch [1/10], Iter [1511/3125], train_loss:0.037127
Epoch [1/10], Iter [1512/3125], train_loss:0.062338
Epoch [1/10], Iter [1513/3125], train_loss:0.047520
Epoch [1/10], Iter [1514/3125], train_loss:0.069938
Epoch [1/10], Iter [1515/3125], train_loss:0.058069
Epoch [1/10], Iter [1516/3125], train_loss:0.070114
Epoch [1/10], Iter [1517/3125], train_loss:0.071238
Epoch [1/10], Iter [1518/3125], train_loss:0.036374
Epoch [1/10], Iter [1519/3125], train_loss:0.067921
Epoch [1/10], Iter [1520/3125], train_loss:0.103123
Epoch [1/10], Iter [1521/3125], train_loss:0.084642
Epoch [1/10], Iter [1522/3125], train_loss:0.052527
Epoch [1/10], Iter [1523/3125], train_loss:0.060209
Epoch [1/10], Iter [1524/3125], train_loss:0.078986
Epoch [1/10], Iter [1525/3125], train_loss:0.055619
Epoch [1/10], Iter [1526/3125], train_loss:0.035694
Epoch [1/10], Iter [1527/3125], train_loss:0.067099
Epoch [1/10], Iter [1528/3125], train_loss:0.058410
Epoch [1/10], Iter [1529/3125], train_loss:0.073605
Epoch [1/10], Iter [1530/3125], train_loss:0.048546
Epoch [1/10], Iter [1531/3125], train_loss:0.059657
Epoch [1/10], Iter [1532/3125], train_loss:0.064168
Epoch [1/10], Iter [1533/3125], train_loss:0.037178
Epoch [1/10], Iter [1534/3125], train_loss:0.053720
Epoch [1/10], Iter [1535/3125], train_loss:0.076513
Epoch [1/10], Iter [1536/3125], train_loss:0.058834
Epoch [1/10], Iter [1537/3125], train_loss:0.071573
Epoch [1/10], Iter [1538/3125], train_loss:0.060269
Epoch [1/10], Iter [1539/3125], train_loss:0.052749
Epoch [1/10], Iter [1540/3125], train_loss:0.037708
Epoch [1/10], Iter [1541/3125], train_loss:0.066439
Epoch [1/10], Iter [1542/3125], train_loss:0.090691
Epoch [1/10], Iter [1543/3125], train_loss:0.056245
Epoch [1/10], Iter [1544/3125], train_loss:0.055924
Epoch [1/10], Iter [1545/3125], train_loss:0.041803
Epoch [1/10], Iter [1546/3125], train_loss:0.048068
Epoch [1/10], Iter [1547/3125], train_loss:0.036092
Epoch [1/10], Iter [1548/3125], train_loss:0.043875
Epoch [1/10], Iter [1549/3125], train_loss:0.079322
Epoch [1/10], Iter [1550/3125], train_loss:0.039852
Epoch [1/10], Iter [1551/3125], train_loss:0.103905
Epoch [1/10], Iter [1552/3125], train_loss:0.091744
Epoch [1/10], Iter [1553/3125], train_loss:0.055681
Epoch [1/10], Iter [1554/3125], train_loss:0.092191
Epoch [1/10], Iter [1555/3125], train_loss:0.062235
Epoch [1/10], Iter [1556/3125], train_loss:0.057970
Epoch [1/10], Iter [1557/3125], train_loss:0.067547
Epoch [1/10], Iter [1558/3125], train_loss:0.055146
Epoch [1/10], Iter [1559/3125], train_loss:0.054776
Epoch [1/10], Iter [1560/3125], train_loss:0.027517
Epoch [1/10], Iter [1561/3125], train_loss:0.072663
Epoch [1/10], Iter [1562/3125], train_loss:0.058465
Epoch [1/10], Iter [1563/3125], train_loss:0.046655
Epoch [1/10], Iter [1564/3125], train_loss:0.119325
Epoch [1/10], Iter [1565/3125], train_loss:0.054731
Epoch [1/10], Iter [1566/3125], train_loss:0.081642
Epoch [1/10], Iter [1567/3125], train_loss:0.048881
Epoch [1/10], Iter [1568/3125], train_loss:0.058173
Epoch [1/10], Iter [1569/3125], train_loss:0.069358
Epoch [1/10], Iter [1570/3125], train_loss:0.061475
Epoch [1/10], Iter [1571/3125], train_loss:0.065325
Epoch [1/10], Iter [1572/3125], train_loss:0.070670
Epoch [1/10], Iter [1573/3125], train_loss:0.081902
Epoch [1/10], Iter [1574/3125], train_loss:0.049094
Epoch [1/10], Iter [1575/3125], train_loss:0.056214
Epoch [1/10], Iter [1576/3125], train_loss:0.069279
Epoch [1/10], Iter [1577/3125], train_loss:0.056715
Epoch [1/10], Iter [1578/3125], train_loss:0.099390
Epoch [1/10], Iter [1579/3125], train_loss:0.051443
Epoch [1/10], Iter [1580/3125], train_loss:0.066337
Epoch [1/10], Iter [1581/3125], train_loss:0.032681
Epoch [1/10], Iter [1582/3125], train_loss:0.036135
Epoch [1/10], Iter [1583/3125], train_loss:0.133781
Epoch [1/10], Iter [1584/3125], train_loss:0.039585
Epoch [1/10], Iter [1585/3125], train_loss:0.040581
Epoch [1/10], Iter [1586/3125], train_loss:0.045098
Epoch [1/10], Iter [1587/3125], train_loss:0.079372
Epoch [1/10], Iter [1588/3125], train_loss:0.083663
Epoch [1/10], Iter [1589/3125], train_loss:0.057084
Epoch [1/10], Iter [1590/3125], train_loss:0.070563
Epoch [1/10], Iter [1591/3125], train_loss:0.065010
Epoch [1/10], Iter [1592/3125], train_loss:0.047786
Epoch [1/10], Iter [1593/3125], train_loss:0.060590
Epoch [1/10], Iter [1594/3125], train_loss:0.081765
Epoch [1/10], Iter [1595/3125], train_loss:0.056855
Epoch [1/10], Iter [1596/3125], train_loss:0.039855
Epoch [1/10], Iter [1597/3125], train_loss:0.046420
Epoch [1/10], Iter [1598/3125], train_loss:0.043999
Epoch [1/10], Iter [1599/3125], train_loss:0.046221
Epoch [1/10], Iter [1600/3125], train_loss:0.064322
Epoch [1/10], Iter [1601/3125], train_loss:0.026215
Epoch [1/10], Iter [1602/3125], train_loss:0.035398
Epoch [1/10], Iter [1603/3125], train_loss:0.082975
Epoch [1/10], Iter [1604/3125], train_loss:0.069643
Epoch [1/10], Iter [1605/3125], train_loss:0.074299
Epoch [1/10], Iter [1606/3125], train_loss:0.036288
Epoch [1/10], Iter [1607/3125], train_loss:0.089655
Epoch [1/10], Iter [1608/3125], train_loss:0.052850
Epoch [1/10], Iter [1609/3125], train_loss:0.103227
Epoch [1/10], Iter [1610/3125], train_loss:0.021318
Epoch [1/10], Iter [1611/3125], train_loss:0.053062
Epoch [1/10], Iter [1612/3125], train_loss:0.064742
Epoch [1/10], Iter [1613/3125], train_loss:0.041883
Epoch [1/10], Iter [1614/3125], train_loss:0.046411
Epoch [1/10], Iter [1615/3125], train_loss:0.058942
Epoch [1/10], Iter [1616/3125], train_loss:0.044977
Epoch [1/10], Iter [1617/3125], train_loss:0.041410
Epoch [1/10], Iter [1618/3125], train_loss:0.084004
Epoch [1/10], Iter [1619/3125], train_loss:0.064973
Epoch [1/10], Iter [1620/3125], train_loss:0.083455
Epoch [1/10], Iter [1621/3125], train_loss:0.061671
Epoch [1/10], Iter [1622/3125], train_loss:0.040480
Epoch [1/10], Iter [1623/3125], train_loss:0.058023
Epoch [1/10], Iter [1624/3125], train_loss:0.059297
Epoch [1/10], Iter [1625/3125], train_loss:0.056020
Epoch [1/10], Iter [1626/3125], train_loss:0.070588
Epoch [1/10], Iter [1627/3125], train_loss:0.057357
Epoch [1/10], Iter [1628/3125], train_loss:0.056434
Epoch [1/10], Iter [1629/3125], train_loss:0.063109
Epoch [1/10], Iter [1630/3125], train_loss:0.088339
Epoch [1/10], Iter [1631/3125], train_loss:0.098464
Epoch [1/10], Iter [1632/3125], train_loss:0.085437
Epoch [1/10], Iter [1633/3125], train_loss:0.056909
Epoch [1/10], Iter [1634/3125], train_loss:0.044746
Epoch [1/10], Iter [1635/3125], train_loss:0.058112
Epoch [1/10], Iter [1636/3125], train_loss:0.051674
Epoch [1/10], Iter [1637/3125], train_loss:0.073020
Epoch [1/10], Iter [1638/3125], train_loss:0.054744
Epoch [1/10], Iter [1639/3125], train_loss:0.020978
Epoch [1/10], Iter [1640/3125], train_loss:0.040359
Epoch [1/10], Iter [1641/3125], train_loss:0.078304
Epoch [1/10], Iter [1642/3125], train_loss:0.042950
Epoch [1/10], Iter [1643/3125], train_loss:0.035843
Epoch [1/10], Iter [1644/3125], train_loss:0.075233
Epoch [1/10], Iter [1645/3125], train_loss:0.057683
Epoch [1/10], Iter [1646/3125], train_loss:0.058583
Epoch [1/10], Iter [1647/3125], train_loss:0.054886
Epoch [1/10], Iter [1648/3125], train_loss:0.074777
Epoch [1/10], Iter [1649/3125], train_loss:0.035126
Epoch [1/10], Iter [1650/3125], train_loss:0.030282
Epoch [1/10], Iter [1651/3125], train_loss:0.065689
Epoch [1/10], Iter [1652/3125], train_loss:0.038346
Epoch [1/10], Iter [1653/3125], train_loss:0.077780
Epoch [1/10], Iter [1654/3125], train_loss:0.057102
Epoch [1/10], Iter [1655/3125], train_loss:0.054383
Epoch [1/10], Iter [1656/3125], train_loss:0.033800
Epoch [1/10], Iter [1657/3125], train_loss:0.047648
Epoch [1/10], Iter [1658/3125], train_loss:0.040589
Epoch [1/10], Iter [1659/3125], train_loss:0.057799
Epoch [1/10], Iter [1660/3125], train_loss:0.060077
Epoch [1/10], Iter [1661/3125], train_loss:0.045393
Epoch [1/10], Iter [1662/3125], train_loss:0.051922
Epoch [1/10], Iter [1663/3125], train_loss:0.122704
Epoch [1/10], Iter [1664/3125], train_loss:0.048353
Epoch [1/10], Iter [1665/3125], train_loss:0.021179
Epoch [1/10], Iter [1666/3125], train_loss:0.076526
Epoch [1/10], Iter [1667/3125], train_loss:0.079436
Epoch [1/10], Iter [1668/3125], train_loss:0.039214
Epoch [1/10], Iter [1669/3125], train_loss:0.042830
Epoch [1/10], Iter [1670/3125], train_loss:0.042728
Epoch [1/10], Iter [1671/3125], train_loss:0.048967
Epoch [1/10], Iter [1672/3125], train_loss:0.054698
Epoch [1/10], Iter [1673/3125], train_loss:0.041978
Epoch [1/10], Iter [1674/3125], train_loss:0.073049
Epoch [1/10], Iter [1675/3125], train_loss:0.037080
Epoch [1/10], Iter [1676/3125], train_loss:0.027289
Epoch [1/10], Iter [1677/3125], train_loss:0.060551
Epoch [1/10], Iter [1678/3125], train_loss:0.045196
Epoch [1/10], Iter [1679/3125], train_loss:0.080010
Epoch [1/10], Iter [1680/3125], train_loss:0.053764
Epoch [1/10], Iter [1681/3125], train_loss:0.073596
Epoch [1/10], Iter [1682/3125], train_loss:0.070110
Epoch [1/10], Iter [1683/3125], train_loss:0.047264
Epoch [1/10], Iter [1684/3125], train_loss:0.061473
Epoch [1/10], Iter [1685/3125], train_loss:0.041371
Epoch [1/10], Iter [1686/3125], train_loss:0.049107
Epoch [1/10], Iter [1687/3125], train_loss:0.051743
Epoch [1/10], Iter [1688/3125], train_loss:0.109640
Epoch [1/10], Iter [1689/3125], train_loss:0.048228
Epoch [1/10], Iter [1690/3125], train_loss:0.050521
Epoch [1/10], Iter [1691/3125], train_loss:0.079257
Epoch [1/10], Iter [1692/3125], train_loss:0.042919
Epoch [1/10], Iter [1693/3125], train_loss:0.058962
Epoch [1/10], Iter [1694/3125], train_loss:0.072977
Epoch [1/10], Iter [1695/3125], train_loss:0.029940
Epoch [1/10], Iter [1696/3125], train_loss:0.072861
Epoch [1/10], Iter [1697/3125], train_loss:0.075670
Epoch [1/10], Iter [1698/3125], train_loss:0.065588
Epoch [1/10], Iter [1699/3125], train_loss:0.067763
Epoch [1/10], Iter [1700/3125], train_loss:0.037320
Epoch [1/10], Iter [1701/3125], train_loss:0.084554
Epoch [1/10], Iter [1702/3125], train_loss:0.046403
Epoch [1/10], Iter [1703/3125], train_loss:0.040859
Epoch [1/10], Iter [1704/3125], train_loss:0.058458
Epoch [1/10], Iter [1705/3125], train_loss:0.066891
Epoch [1/10], Iter [1706/3125], train_loss:0.100955
Epoch [1/10], Iter [1707/3125], train_loss:0.062376
Epoch [1/10], Iter [1708/3125], train_loss:0.068730
Epoch [1/10], Iter [1709/3125], train_loss:0.038045
Epoch [1/10], Iter [1710/3125], train_loss:0.060304
Epoch [1/10], Iter [1711/3125], train_loss:0.046575
Epoch [1/10], Iter [1712/3125], train_loss:0.048462
Epoch [1/10], Iter [1713/3125], train_loss:0.072498
Epoch [1/10], Iter [1714/3125], train_loss:0.052895
Epoch [1/10], Iter [1715/3125], train_loss:0.065395
Epoch [1/10], Iter [1716/3125], train_loss:0.076119
Epoch [1/10], Iter [1717/3125], train_loss:0.084909
Epoch [1/10], Iter [1718/3125], train_loss:0.058882
Epoch [1/10], Iter [1719/3125], train_loss:0.064582
Epoch [1/10], Iter [1720/3125], train_loss:0.056367
Epoch [1/10], Iter [1721/3125], train_loss:0.059624
Epoch [1/10], Iter [1722/3125], train_loss:0.058548
Epoch [1/10], Iter [1723/3125], train_loss:0.071492
Epoch [1/10], Iter [1724/3125], train_loss:0.087462
Epoch [1/10], Iter [1725/3125], train_loss:0.038312
Epoch [1/10], Iter [1726/3125], train_loss:0.039811
Epoch [1/10], Iter [1727/3125], train_loss:0.047398
Epoch [1/10], Iter [1728/3125], train_loss:0.054377
Epoch [1/10], Iter [1729/3125], train_loss:0.061826
Epoch [1/10], Iter [1730/3125], train_loss:0.051879
Epoch [1/10], Iter [1731/3125], train_loss:0.105766
Epoch [1/10], Iter [1732/3125], train_loss:0.058592
Epoch [1/10], Iter [1733/3125], train_loss:0.058135
Epoch [1/10], Iter [1734/3125], train_loss:0.077106
Epoch [1/10], Iter [1735/3125], train_loss:0.053300
Epoch [1/10], Iter [1736/3125], train_loss:0.099648
Epoch [1/10], Iter [1737/3125], train_loss:0.038420
Epoch [1/10], Iter [1738/3125], train_loss:0.074359
Epoch [1/10], Iter [1739/3125], train_loss:0.075496
Epoch [1/10], Iter [1740/3125], train_loss:0.026707
Epoch [1/10], Iter [1741/3125], train_loss:0.051810
Epoch [1/10], Iter [1742/3125], train_loss:0.061063
Epoch [1/10], Iter [1743/3125], train_loss:0.070292
Epoch [1/10], Iter [1744/3125], train_loss:0.042350
Epoch [1/10], Iter [1745/3125], train_loss:0.059614
Epoch [1/10], Iter [1746/3125], train_loss:0.025684
Epoch [1/10], Iter [1747/3125], train_loss:0.044094
Epoch [1/10], Iter [1748/3125], train_loss:0.039633
Epoch [1/10], Iter [1749/3125], train_loss:0.061609
Epoch [1/10], Iter [1750/3125], train_loss:0.059462
Epoch [1/10], Iter [1751/3125], train_loss:0.085215
Epoch [1/10], Iter [1752/3125], train_loss:0.061459
Epoch [1/10], Iter [1753/3125], train_loss:0.051309
Epoch [1/10], Iter [1754/3125], train_loss:0.055947
Epoch [1/10], Iter [1755/3125], train_loss:0.082786
Epoch [1/10], Iter [1756/3125], train_loss:0.097624
Epoch [1/10], Iter [1757/3125], train_loss:0.061017
Epoch [1/10], Iter [1758/3125], train_loss:0.070072
Epoch [1/10], Iter [1759/3125], train_loss:0.075882
Epoch [1/10], Iter [1760/3125], train_loss:0.039222
Epoch [1/10], Iter [1761/3125], train_loss:0.071271
Epoch [1/10], Iter [1762/3125], train_loss:0.043728
Epoch [1/10], Iter [1763/3125], train_loss:0.060507
Epoch [1/10], Iter [1764/3125], train_loss:0.072506
Epoch [1/10], Iter [1765/3125], train_loss:0.056758
Epoch [1/10], Iter [1766/3125], train_loss:0.043773
Epoch [1/10], Iter [1767/3125], train_loss:0.053143
Epoch [1/10], Iter [1768/3125], train_loss:0.092098
Epoch [1/10], Iter [1769/3125], train_loss:0.027869
Epoch [1/10], Iter [1770/3125], train_loss:0.057473
Epoch [1/10], Iter [1771/3125], train_loss:0.060365
Epoch [1/10], Iter [1772/3125], train_loss:0.040789
Epoch [1/10], Iter [1773/3125], train_loss:0.064049
Epoch [1/10], Iter [1774/3125], train_loss:0.063056
Epoch [1/10], Iter [1775/3125], train_loss:0.051557
Epoch [1/10], Iter [1776/3125], train_loss:0.054645
Epoch [1/10], Iter [1777/3125], train_loss:0.039127
Epoch [1/10], Iter [1778/3125], train_loss:0.024407
Epoch [1/10], Iter [1779/3125], train_loss:0.052543
Epoch [1/10], Iter [1780/3125], train_loss:0.046873
Epoch [1/10], Iter [1781/3125], train_loss:0.041262
Epoch [1/10], Iter [1782/3125], train_loss:0.080122
Epoch [1/10], Iter [1783/3125], train_loss:0.050520
Epoch [1/10], Iter [1784/3125], train_loss:0.055967
Epoch [1/10], Iter [1785/3125], train_loss:0.035253
Epoch [1/10], Iter [1786/3125], train_loss:0.079063
Epoch [1/10], Iter [1787/3125], train_loss:0.074867
Epoch [1/10], Iter [1788/3125], train_loss:0.055334
Epoch [1/10], Iter [1789/3125], train_loss:0.057995
Epoch [1/10], Iter [1790/3125], train_loss:0.040717
Epoch [1/10], Iter [1791/3125], train_loss:0.077024
Epoch [1/10], Iter [1792/3125], train_loss:0.050221
Epoch [1/10], Iter [1793/3125], train_loss:0.094391
Epoch [1/10], Iter [1794/3125], train_loss:0.074695
Epoch [1/10], Iter [1795/3125], train_loss:0.058015
Epoch [1/10], Iter [1796/3125], train_loss:0.047358
Epoch [1/10], Iter [1797/3125], train_loss:0.065972
Epoch [1/10], Iter [1798/3125], train_loss:0.045176
Epoch [1/10], Iter [1799/3125], train_loss:0.038734
Epoch [1/10], Iter [1800/3125], train_loss:0.066014
Epoch [1/10], Iter [1801/3125], train_loss:0.046584
Epoch [1/10], Iter [1802/3125], train_loss:0.057352
Epoch [1/10], Iter [1803/3125], train_loss:0.036245
Epoch [1/10], Iter [1804/3125], train_loss:0.040863
Epoch [1/10], Iter [1805/3125], train_loss:0.120763
Epoch [1/10], Iter [1806/3125], train_loss:0.031612
Epoch [1/10], Iter [1807/3125], train_loss:0.073508
Epoch [1/10], Iter [1808/3125], train_loss:0.059417
Epoch [1/10], Iter [1809/3125], train_loss:0.072521
Epoch [1/10], Iter [1810/3125], train_loss:0.063052
Epoch [1/10], Iter [1811/3125], train_loss:0.059529
Epoch [1/10], Iter [1812/3125], train_loss:0.046363
Epoch [1/10], Iter [1813/3125], train_loss:0.073090
Epoch [1/10], Iter [1814/3125], train_loss:0.034225
Epoch [1/10], Iter [1815/3125], train_loss:0.085764
Epoch [1/10], Iter [1816/3125], train_loss:0.046848
Epoch [1/10], Iter [1817/3125], train_loss:0.059717
Epoch [1/10], Iter [1818/3125], train_loss:0.047675
Epoch [1/10], Iter [1819/3125], train_loss:0.084691
Epoch [1/10], Iter [1820/3125], train_loss:0.079962
Epoch [1/10], Iter [1821/3125], train_loss:0.089780
Epoch [1/10], Iter [1822/3125], train_loss:0.060596
Epoch [1/10], Iter [1823/3125], train_loss:0.049416
Epoch [1/10], Iter [1824/3125], train_loss:0.091829
Epoch [1/10], Iter [1825/3125], train_loss:0.086237
Epoch [1/10], Iter [1826/3125], train_loss:0.051125
Epoch [1/10], Iter [1827/3125], train_loss:0.097379
Epoch [1/10], Iter [1828/3125], train_loss:0.102906
Epoch [1/10], Iter [1829/3125], train_loss:0.080723
Epoch [1/10], Iter [1830/3125], train_loss:0.040206
Epoch [1/10], Iter [1831/3125], train_loss:0.059156
Epoch [1/10], Iter [1832/3125], train_loss:0.043076
Epoch [1/10], Iter [1833/3125], train_loss:0.029663
Epoch [1/10], Iter [1834/3125], train_loss:0.051820
Epoch [1/10], Iter [1835/3125], train_loss:0.068084
Epoch [1/10], Iter [1836/3125], train_loss:0.036504
Epoch [1/10], Iter [1837/3125], train_loss:0.048193
Epoch [1/10], Iter [1838/3125], train_loss:0.053339
Epoch [1/10], Iter [1839/3125], train_loss:0.051840
Epoch [1/10], Iter [1840/3125], train_loss:0.019614
Epoch [1/10], Iter [1841/3125], train_loss:0.055469
Epoch [1/10], Iter [1842/3125], train_loss:0.069309
Epoch [1/10], Iter [1843/3125], train_loss:0.077044
Epoch [1/10], Iter [1844/3125], train_loss:0.091119
Epoch [1/10], Iter [1845/3125], train_loss:0.056013
Epoch [1/10], Iter [1846/3125], train_loss:0.052507
Epoch [1/10], Iter [1847/3125], train_loss:0.079659
Epoch [1/10], Iter [1848/3125], train_loss:0.053403
Epoch [1/10], Iter [1849/3125], train_loss:0.077848
Epoch [1/10], Iter [1850/3125], train_loss:0.051112
Epoch [1/10], Iter [1851/3125], train_loss:0.046792
Epoch [1/10], Iter [1852/3125], train_loss:0.041306
Epoch [1/10], Iter [1853/3125], train_loss:0.043293
Epoch [1/10], Iter [1854/3125], train_loss:0.051519
Epoch [1/10], Iter [1855/3125], train_loss:0.055836
Epoch [1/10], Iter [1856/3125], train_loss:0.047736
Epoch [1/10], Iter [1857/3125], train_loss:0.069006
Epoch [1/10], Iter [1858/3125], train_loss:0.046833
Epoch [1/10], Iter [1859/3125], train_loss:0.112520
Epoch [1/10], Iter [1860/3125], train_loss:0.049536
Epoch [1/10], Iter [1861/3125], train_loss:0.054126
Epoch [1/10], Iter [1862/3125], train_loss:0.079082
Epoch [1/10], Iter [1863/3125], train_loss:0.046699
Epoch [1/10], Iter [1864/3125], train_loss:0.042452
Epoch [1/10], Iter [1865/3125], train_loss:0.050977
Epoch [1/10], Iter [1866/3125], train_loss:0.037490
Epoch [1/10], Iter [1867/3125], train_loss:0.044270
Epoch [1/10], Iter [1868/3125], train_loss:0.022775
Epoch [1/10], Iter [1869/3125], train_loss:0.048254
Epoch [1/10], Iter [1870/3125], train_loss:0.047147
Epoch [1/10], Iter [1871/3125], train_loss:0.064558
Epoch [1/10], Iter [1872/3125], train_loss:0.033295
Epoch [1/10], Iter [1873/3125], train_loss:0.037831
Epoch [1/10], Iter [1874/3125], train_loss:0.035450
Epoch [1/10], Iter [1875/3125], train_loss:0.120475
Epoch [1/10], Iter [1876/3125], train_loss:0.065689
Epoch [1/10], Iter [1877/3125], train_loss:0.051821
Epoch [1/10], Iter [1878/3125], train_loss:0.030954
Epoch [1/10], Iter [1879/3125], train_loss:0.055886
Epoch [1/10], Iter [1880/3125], train_loss:0.046567
Epoch [1/10], Iter [1881/3125], train_loss:0.054960
Epoch [1/10], Iter [1882/3125], train_loss:0.060007
Epoch [1/10], Iter [1883/3125], train_loss:0.042093
Epoch [1/10], Iter [1884/3125], train_loss:0.042883
Epoch [1/10], Iter [1885/3125], train_loss:0.072663
Epoch [1/10], Iter [1886/3125], train_loss:0.047739
Epoch [1/10], Iter [1887/3125], train_loss:0.072337
Epoch [1/10], Iter [1888/3125], train_loss:0.032112
Epoch [1/10], Iter [1889/3125], train_loss:0.063742
Epoch [1/10], Iter [1890/3125], train_loss:0.126797
Epoch [1/10], Iter [1891/3125], train_loss:0.060045
Epoch [1/10], Iter [1892/3125], train_loss:0.050613
Epoch [1/10], Iter [1893/3125], train_loss:0.018665
Epoch [1/10], Iter [1894/3125], train_loss:0.118631
Epoch [1/10], Iter [1895/3125], train_loss:0.072257
Epoch [1/10], Iter [1896/3125], train_loss:0.048342
Epoch [1/10], Iter [1897/3125], train_loss:0.053053
Epoch [1/10], Iter [1898/3125], train_loss:0.046766
Epoch [1/10], Iter [1899/3125], train_loss:0.041298
Epoch [1/10], Iter [1900/3125], train_loss:0.039161
Epoch [1/10], Iter [1901/3125], train_loss:0.052756
Epoch [1/10], Iter [1902/3125], train_loss:0.088474
Epoch [1/10], Iter [1903/3125], train_loss:0.054476
Epoch [1/10], Iter [1904/3125], train_loss:0.074824
Epoch [1/10], Iter [1905/3125], train_loss:0.038476
Epoch [1/10], Iter [1906/3125], train_loss:0.034390
Epoch [1/10], Iter [1907/3125], train_loss:0.031541
Epoch [1/10], Iter [1908/3125], train_loss:0.042509
Epoch [1/10], Iter [1909/3125], train_loss:0.048603
Epoch [1/10], Iter [1910/3125], train_loss:0.033619
Epoch [1/10], Iter [1911/3125], train_loss:0.088345
Epoch [1/10], Iter [1912/3125], train_loss:0.073088
Epoch [1/10], Iter [1913/3125], train_loss:0.053431
Epoch [1/10], Iter [1914/3125], train_loss:0.074593
Epoch [1/10], Iter [1915/3125], train_loss:0.067950
Epoch [1/10], Iter [1916/3125], train_loss:0.036191
Epoch [1/10], Iter [1917/3125], train_loss:0.057052
Epoch [1/10], Iter [1918/3125], train_loss:0.062682
Epoch [1/10], Iter [1919/3125], train_loss:0.073875
Epoch [1/10], Iter [1920/3125], train_loss:0.059812
Epoch [1/10], Iter [1921/3125], train_loss:0.049579
Epoch [1/10], Iter [1922/3125], train_loss:0.111791
Epoch [1/10], Iter [1923/3125], train_loss:0.076176
Epoch [1/10], Iter [1924/3125], train_loss:0.049307
Epoch [1/10], Iter [1925/3125], train_loss:0.037029
Epoch [1/10], Iter [1926/3125], train_loss:0.078327
Epoch [1/10], Iter [1927/3125], train_loss:0.073983
Epoch [1/10], Iter [1928/3125], train_loss:0.071034
Epoch [1/10], Iter [1929/3125], train_loss:0.072575
Epoch [1/10], Iter [1930/3125], train_loss:0.035677
Epoch [1/10], Iter [1931/3125], train_loss:0.078652
Epoch [1/10], Iter [1932/3125], train_loss:0.050624
Epoch [1/10], Iter [1933/3125], train_loss:0.061268
Epoch [1/10], Iter [1934/3125], train_loss:0.030012
Epoch [1/10], Iter [1935/3125], train_loss:0.064447
Epoch [1/10], Iter [1936/3125], train_loss:0.067326
Epoch [1/10], Iter [1937/3125], train_loss:0.047509
Epoch [1/10], Iter [1938/3125], train_loss:0.080461
Epoch [1/10], Iter [1939/3125], train_loss:0.065088
Epoch [1/10], Iter [1940/3125], train_loss:0.045047
Epoch [1/10], Iter [1941/3125], train_loss:0.048151
Epoch [1/10], Iter [1942/3125], train_loss:0.041551
Epoch [1/10], Iter [1943/3125], train_loss:0.062923
Epoch [1/10], Iter [1944/3125], train_loss:0.047921
Epoch [1/10], Iter [1945/3125], train_loss:0.055047
Epoch [1/10], Iter [1946/3125], train_loss:0.047319
Epoch [1/10], Iter [1947/3125], train_loss:0.079555
Epoch [1/10], Iter [1948/3125], train_loss:0.060398
Epoch [1/10], Iter [1949/3125], train_loss:0.024709
Epoch [1/10], Iter [1950/3125], train_loss:0.057181
Epoch [1/10], Iter [1951/3125], train_loss:0.073039
Epoch [1/10], Iter [1952/3125], train_loss:0.080788
Epoch [1/10], Iter [1953/3125], train_loss:0.027360
Epoch [1/10], Iter [1954/3125], train_loss:0.099107
Epoch [1/10], Iter [1955/3125], train_loss:0.039013
Epoch [1/10], Iter [1956/3125], train_loss:0.085083
Epoch [1/10], Iter [1957/3125], train_loss:0.061486
Epoch [1/10], Iter [1958/3125], train_loss:0.054446
Epoch [1/10], Iter [1959/3125], train_loss:0.069039
Epoch [1/10], Iter [1960/3125], train_loss:0.040418
Epoch [1/10], Iter [1961/3125], train_loss:0.073553
Epoch [1/10], Iter [1962/3125], train_loss:0.045772
Epoch [1/10], Iter [1963/3125], train_loss:0.060261
Epoch [1/10], Iter [1964/3125], train_loss:0.065421
Epoch [1/10], Iter [1965/3125], train_loss:0.076194
Epoch [1/10], Iter [1966/3125], train_loss:0.064436
Epoch [1/10], Iter [1967/3125], train_loss:0.076793
Epoch [1/10], Iter [1968/3125], train_loss:0.055979
Epoch [1/10], Iter [1969/3125], train_loss:0.029151
Epoch [1/10], Iter [1970/3125], train_loss:0.038949
Epoch [1/10], Iter [1971/3125], train_loss:0.041652
Epoch [1/10], Iter [1972/3125], train_loss:0.057385
Epoch [1/10], Iter [1973/3125], train_loss:0.063295
Epoch [1/10], Iter [1974/3125], train_loss:0.065931
Epoch [1/10], Iter [1975/3125], train_loss:0.063027
Epoch [1/10], Iter [1976/3125], train_loss:0.069438
Epoch [1/10], Iter [1977/3125], train_loss:0.043597
Epoch [1/10], Iter [1978/3125], train_loss:0.077617
Epoch [1/10], Iter [1979/3125], train_loss:0.075510
Epoch [1/10], Iter [1980/3125], train_loss:0.064318
Epoch [1/10], Iter [1981/3125], train_loss:0.057600
Epoch [1/10], Iter [1982/3125], train_loss:0.051950
Epoch [1/10], Iter [1983/3125], train_loss:0.060522
Epoch [1/10], Iter [1984/3125], train_loss:0.043160
Epoch [1/10], Iter [1985/3125], train_loss:0.046968
Epoch [1/10], Iter [1986/3125], train_loss:0.030345
Epoch [1/10], Iter [1987/3125], train_loss:0.067975
Epoch [1/10], Iter [1988/3125], train_loss:0.070917
Epoch [1/10], Iter [1989/3125], train_loss:0.050825
Epoch [1/10], Iter [1990/3125], train_loss:0.056659
Epoch [1/10], Iter [1991/3125], train_loss:0.075110
Epoch [1/10], Iter [1992/3125], train_loss:0.018620
Epoch [1/10], Iter [1993/3125], train_loss:0.086012
Epoch [1/10], Iter [1994/3125], train_loss:0.061522
Epoch [1/10], Iter [1995/3125], train_loss:0.115937
Epoch [1/10], Iter [1996/3125], train_loss:0.045985
Epoch [1/10], Iter [1997/3125], train_loss:0.053937
Epoch [1/10], Iter [1998/3125], train_loss:0.070547
Epoch [1/10], Iter [1999/3125], train_loss:0.042071
Epoch [1/10], Iter [2000/3125], train_loss:0.043023
Epoch [1/10], Iter [2001/3125], train_loss:0.081274
Epoch [1/10], Iter [2002/3125], train_loss:0.066850
Epoch [1/10], Iter [2003/3125], train_loss:0.033427
Epoch [1/10], Iter [2004/3125], train_loss:0.061561
Epoch [1/10], Iter [2005/3125], train_loss:0.062892
Epoch [1/10], Iter [2006/3125], train_loss:0.029832
Epoch [1/10], Iter [2007/3125], train_loss:0.084254
Epoch [1/10], Iter [2008/3125], train_loss:0.086006
Epoch [1/10], Iter [2009/3125], train_loss:0.075942
Epoch [1/10], Iter [2010/3125], train_loss:0.086731
Epoch [1/10], Iter [2011/3125], train_loss:0.061293
Epoch [1/10], Iter [2012/3125], train_loss:0.031159
Epoch [1/10], Iter [2013/3125], train_loss:0.094308
Epoch [1/10], Iter [2014/3125], train_loss:0.058767
Epoch [1/10], Iter [2015/3125], train_loss:0.042780
Epoch [1/10], Iter [2016/3125], train_loss:0.053814
Epoch [1/10], Iter [2017/3125], train_loss:0.044383
Epoch [1/10], Iter [2018/3125], train_loss:0.054721
Epoch [1/10], Iter [2019/3125], train_loss:0.037710
Epoch [1/10], Iter [2020/3125], train_loss:0.050791
Epoch [1/10], Iter [2021/3125], train_loss:0.088299
Epoch [1/10], Iter [2022/3125], train_loss:0.023384
Epoch [1/10], Iter [2023/3125], train_loss:0.059585
Epoch [1/10], Iter [2024/3125], train_loss:0.047600
Epoch [1/10], Iter [2025/3125], train_loss:0.050966
Epoch [1/10], Iter [2026/3125], train_loss:0.069498
Epoch [1/10], Iter [2027/3125], train_loss:0.059679
Epoch [1/10], Iter [2028/3125], train_loss:0.054175
Epoch [1/10], Iter [2029/3125], train_loss:0.048971
Epoch [1/10], Iter [2030/3125], train_loss:0.055469
Epoch [1/10], Iter [2031/3125], train_loss:0.042843
Epoch [1/10], Iter [2032/3125], train_loss:0.054261
Epoch [1/10], Iter [2033/3125], train_loss:0.034696
Epoch [1/10], Iter [2034/3125], train_loss:0.050647
Epoch [1/10], Iter [2035/3125], train_loss:0.075666
Epoch [1/10], Iter [2036/3125], train_loss:0.082343
Epoch [1/10], Iter [2037/3125], train_loss:0.050409
Epoch [1/10], Iter [2038/3125], train_loss:0.050441
Epoch [1/10], Iter [2039/3125], train_loss:0.068800
Epoch [1/10], Iter [2040/3125], train_loss:0.064183
Epoch [1/10], Iter [2041/3125], train_loss:0.033020
Epoch [1/10], Iter [2042/3125], train_loss:0.068810
Epoch [1/10], Iter [2043/3125], train_loss:0.036257
Epoch [1/10], Iter [2044/3125], train_loss:0.060899
Epoch [1/10], Iter [2045/3125], train_loss:0.061538
Epoch [1/10], Iter [2046/3125], train_loss:0.044145
Epoch [1/10], Iter [2047/3125], train_loss:0.039485
Epoch [1/10], Iter [2048/3125], train_loss:0.042501
Epoch [1/10], Iter [2049/3125], train_loss:0.063631
Epoch [1/10], Iter [2050/3125], train_loss:0.046520
Epoch [1/10], Iter [2051/3125], train_loss:0.055999
Epoch [1/10], Iter [2052/3125], train_loss:0.063847
Epoch [1/10], Iter [2053/3125], train_loss:0.069343
Epoch [1/10], Iter [2054/3125], train_loss:0.052924
Epoch [1/10], Iter [2055/3125], train_loss:0.036919
Epoch [1/10], Iter [2056/3125], train_loss:0.054971
Epoch [1/10], Iter [2057/3125], train_loss:0.048387
Epoch [1/10], Iter [2058/3125], train_loss:0.084165
Epoch [1/10], Iter [2059/3125], train_loss:0.044616
Epoch [1/10], Iter [2060/3125], train_loss:0.033628
Epoch [1/10], Iter [2061/3125], train_loss:0.027558
Epoch [1/10], Iter [2062/3125], train_loss:0.055136
Epoch [1/10], Iter [2063/3125], train_loss:0.062519
Epoch [1/10], Iter [2064/3125], train_loss:0.050408
Epoch [1/10], Iter [2065/3125], train_loss:0.033982
Epoch [1/10], Iter [2066/3125], train_loss:0.087878
Epoch [1/10], Iter [2067/3125], train_loss:0.044555
Epoch [1/10], Iter [2068/3125], train_loss:0.036030
Epoch [1/10], Iter [2069/3125], train_loss:0.047172
Epoch [1/10], Iter [2070/3125], train_loss:0.057118
Epoch [1/10], Iter [2071/3125], train_loss:0.050927
Epoch [1/10], Iter [2072/3125], train_loss:0.055021
Epoch [1/10], Iter [2073/3125], train_loss:0.042873
Epoch [1/10], Iter [2074/3125], train_loss:0.069662
Epoch [1/10], Iter [2075/3125], train_loss:0.086718
Epoch [1/10], Iter [2076/3125], train_loss:0.060907
Epoch [1/10], Iter [2077/3125], train_loss:0.055302
Epoch [1/10], Iter [2078/3125], train_loss:0.063130
Epoch [1/10], Iter [2079/3125], train_loss:0.041546
Epoch [1/10], Iter [2080/3125], train_loss:0.079889
Epoch [1/10], Iter [2081/3125], train_loss:0.059205
Epoch [1/10], Iter [2082/3125], train_loss:0.077855
Epoch [1/10], Iter [2083/3125], train_loss:0.040796
Epoch [1/10], Iter [2084/3125], train_loss:0.063951
Epoch [1/10], Iter [2085/3125], train_loss:0.060815
Epoch [1/10], Iter [2086/3125], train_loss:0.105773
Epoch [1/10], Iter [2087/3125], train_loss:0.055865
Epoch [1/10], Iter [2088/3125], train_loss:0.058389
Epoch [1/10], Iter [2089/3125], train_loss:0.085886
Epoch [1/10], Iter [2090/3125], train_loss:0.037964
Epoch [1/10], Iter [2091/3125], train_loss:0.037571
Epoch [1/10], Iter [2092/3125], train_loss:0.051286
Epoch [1/10], Iter [2093/3125], train_loss:0.072742
Epoch [1/10], Iter [2094/3125], train_loss:0.027918
Epoch [1/10], Iter [2095/3125], train_loss:0.064145
Epoch [1/10], Iter [2096/3125], train_loss:0.062825
Epoch [1/10], Iter [2097/3125], train_loss:0.047760
Epoch [1/10], Iter [2098/3125], train_loss:0.051347
Epoch [1/10], Iter [2099/3125], train_loss:0.066230
Epoch [1/10], Iter [2100/3125], train_loss:0.062902
Epoch [1/10], Iter [2101/3125], train_loss:0.047526
Epoch [1/10], Iter [2102/3125], train_loss:0.039127
Epoch [1/10], Iter [2103/3125], train_loss:0.046777
Epoch [1/10], Iter [2104/3125], train_loss:0.059681
Epoch [1/10], Iter [2105/3125], train_loss:0.061811
Epoch [1/10], Iter [2106/3125], train_loss:0.039108
Epoch [1/10], Iter [2107/3125], train_loss:0.075459
Epoch [1/10], Iter [2108/3125], train_loss:0.063627
Epoch [1/10], Iter [2109/3125], train_loss:0.035721
Epoch [1/10], Iter [2110/3125], train_loss:0.060149
Epoch [1/10], Iter [2111/3125], train_loss:0.067085
Epoch [1/10], Iter [2112/3125], train_loss:0.059505
Epoch [1/10], Iter [2113/3125], train_loss:0.056017
Epoch [1/10], Iter [2114/3125], train_loss:0.020455
Epoch [1/10], Iter [2115/3125], train_loss:0.081689
Epoch [1/10], Iter [2116/3125], train_loss:0.039513
Epoch [1/10], Iter [2117/3125], train_loss:0.048386
Epoch [1/10], Iter [2118/3125], train_loss:0.059267
Epoch [1/10], Iter [2119/3125], train_loss:0.082934
Epoch [1/10], Iter [2120/3125], train_loss:0.060041
Epoch [1/10], Iter [2121/3125], train_loss:0.061388
Epoch [1/10], Iter [2122/3125], train_loss:0.042897
Epoch [1/10], Iter [2123/3125], train_loss:0.045056
Epoch [1/10], Iter [2124/3125], train_loss:0.060849
Epoch [1/10], Iter [2125/3125], train_loss:0.049667
Epoch [1/10], Iter [2126/3125], train_loss:0.048343
Epoch [1/10], Iter [2127/3125], train_loss:0.068228
Epoch [1/10], Iter [2128/3125], train_loss:0.037251
Epoch [1/10], Iter [2129/3125], train_loss:0.027494
Epoch [1/10], Iter [2130/3125], train_loss:0.064851
Epoch [1/10], Iter [2131/3125], train_loss:0.044079
Epoch [1/10], Iter [2132/3125], train_loss:0.058055
Epoch [1/10], Iter [2133/3125], train_loss:0.028688
Epoch [1/10], Iter [2134/3125], train_loss:0.063009
Epoch [1/10], Iter [2135/3125], train_loss:0.049375
Epoch [1/10], Iter [2136/3125], train_loss:0.070779
Epoch [1/10], Iter [2137/3125], train_loss:0.061121
Epoch [1/10], Iter [2138/3125], train_loss:0.045141
Epoch [1/10], Iter [2139/3125], train_loss:0.032898
Epoch [1/10], Iter [2140/3125], train_loss:0.044351
Epoch [1/10], Iter [2141/3125], train_loss:0.056783
Epoch [1/10], Iter [2142/3125], train_loss:0.056133
Epoch [1/10], Iter [2143/3125], train_loss:0.088715
Epoch [1/10], Iter [2144/3125], train_loss:0.068217
Epoch [1/10], Iter [2145/3125], train_loss:0.043055
Epoch [1/10], Iter [2146/3125], train_loss:0.032986
Epoch [1/10], Iter [2147/3125], train_loss:0.041009
Epoch [1/10], Iter [2148/3125], train_loss:0.044360
Epoch [1/10], Iter [2149/3125], train_loss:0.065169
Epoch [1/10], Iter [2150/3125], train_loss:0.075291
Epoch [1/10], Iter [2151/3125], train_loss:0.050981
Epoch [1/10], Iter [2152/3125], train_loss:0.062930
Epoch [1/10], Iter [2153/3125], train_loss:0.058825
Epoch [1/10], Iter [2154/3125], train_loss:0.076227
Epoch [1/10], Iter [2155/3125], train_loss:0.083203
Epoch [1/10], Iter [2156/3125], train_loss:0.063778
Epoch [1/10], Iter [2157/3125], train_loss:0.045961
Epoch [1/10], Iter [2158/3125], train_loss:0.070411
Epoch [1/10], Iter [2159/3125], train_loss:0.064471
Epoch [1/10], Iter [2160/3125], train_loss:0.056950
Epoch [1/10], Iter [2161/3125], train_loss:0.074447
Epoch [1/10], Iter [2162/3125], train_loss:0.052749
Epoch [1/10], Iter [2163/3125], train_loss:0.057865
Epoch [1/10], Iter [2164/3125], train_loss:0.037370
Epoch [1/10], Iter [2165/3125], train_loss:0.103615
Epoch [1/10], Iter [2166/3125], train_loss:0.076190
Epoch [1/10], Iter [2167/3125], train_loss:0.044481
Epoch [1/10], Iter [2168/3125], train_loss:0.050516
Epoch [1/10], Iter [2169/3125], train_loss:0.036114
Epoch [1/10], Iter [2170/3125], train_loss:0.037495
Epoch [1/10], Iter [2171/3125], train_loss:0.058162
Epoch [1/10], Iter [2172/3125], train_loss:0.072126
Epoch [1/10], Iter [2173/3125], train_loss:0.058480
Epoch [1/10], Iter [2174/3125], train_loss:0.057047
Epoch [1/10], Iter [2175/3125], train_loss:0.058543
Epoch [1/10], Iter [2176/3125], train_loss:0.044135
Epoch [1/10], Iter [2177/3125], train_loss:0.021453
Epoch [1/10], Iter [2178/3125], train_loss:0.091287
Epoch [1/10], Iter [2179/3125], train_loss:0.030686
Epoch [1/10], Iter [2180/3125], train_loss:0.043142
Epoch [1/10], Iter [2181/3125], train_loss:0.061297
Epoch [1/10], Iter [2182/3125], train_loss:0.052431
Epoch [1/10], Iter [2183/3125], train_loss:0.064683
Epoch [1/10], Iter [2184/3125], train_loss:0.052090
Epoch [1/10], Iter [2185/3125], train_loss:0.059552
Epoch [1/10], Iter [2186/3125], train_loss:0.043549
Epoch [1/10], Iter [2187/3125], train_loss:0.039106
Epoch [1/10], Iter [2188/3125], train_loss:0.033696
Epoch [1/10], Iter [2189/3125], train_loss:0.059473
Epoch [1/10], Iter [2190/3125], train_loss:0.042966
Epoch [1/10], Iter [2191/3125], train_loss:0.038413
Epoch [1/10], Iter [2192/3125], train_loss:0.048166
Epoch [1/10], Iter [2193/3125], train_loss:0.062529
Epoch [1/10], Iter [2194/3125], train_loss:0.063281
Epoch [1/10], Iter [2195/3125], train_loss:0.068794
Epoch [1/10], Iter [2196/3125], train_loss:0.060039
Epoch [1/10], Iter [2197/3125], train_loss:0.059375
Epoch [1/10], Iter [2198/3125], train_loss:0.052642
Epoch [1/10], Iter [2199/3125], train_loss:0.046952
Epoch [1/10], Iter [2200/3125], train_loss:0.071861
Epoch [1/10], Iter [2201/3125], train_loss:0.044257
Epoch [1/10], Iter [2202/3125], train_loss:0.057232
Epoch [1/10], Iter [2203/3125], train_loss:0.039750
Epoch [1/10], Iter [2204/3125], train_loss:0.074284
Epoch [1/10], Iter [2205/3125], train_loss:0.029797
Epoch [1/10], Iter [2206/3125], train_loss:0.058231
Epoch [1/10], Iter [2207/3125], train_loss:0.066111
Epoch [1/10], Iter [2208/3125], train_loss:0.067477
Epoch [1/10], Iter [2209/3125], train_loss:0.065425
Epoch [1/10], Iter [2210/3125], train_loss:0.039687
Epoch [1/10], Iter [2211/3125], train_loss:0.054980
Epoch [1/10], Iter [2212/3125], train_loss:0.052664
Epoch [1/10], Iter [2213/3125], train_loss:0.065844
Epoch [1/10], Iter [2214/3125], train_loss:0.094000
Epoch [1/10], Iter [2215/3125], train_loss:0.053468
Epoch [1/10], Iter [2216/3125], train_loss:0.061695
Epoch [1/10], Iter [2217/3125], train_loss:0.067787
Epoch [1/10], Iter [2218/3125], train_loss:0.035557
Epoch [1/10], Iter [2219/3125], train_loss:0.054791
Epoch [1/10], Iter [2220/3125], train_loss:0.074102
Epoch [1/10], Iter [2221/3125], train_loss:0.053827
Epoch [1/10], Iter [2222/3125], train_loss:0.064904
Epoch [1/10], Iter [2223/3125], train_loss:0.048594
Epoch [1/10], Iter [2224/3125], train_loss:0.038459
Epoch [1/10], Iter [2225/3125], train_loss:0.033388
Epoch [1/10], Iter [2226/3125], train_loss:0.053181
Epoch [1/10], Iter [2227/3125], train_loss:0.070912
Epoch [1/10], Iter [2228/3125], train_loss:0.087150
Epoch [1/10], Iter [2229/3125], train_loss:0.043372
Epoch [1/10], Iter [2230/3125], train_loss:0.053783
Epoch [1/10], Iter [2231/3125], train_loss:0.040672
Epoch [1/10], Iter [2232/3125], train_loss:0.045534
Epoch [1/10], Iter [2233/3125], train_loss:0.040906
Epoch [1/10], Iter [2234/3125], train_loss:0.046060
Epoch [1/10], Iter [2235/3125], train_loss:0.073936
Epoch [1/10], Iter [2236/3125], train_loss:0.048040
Epoch [1/10], Iter [2237/3125], train_loss:0.044033
Epoch [1/10], Iter [2238/3125], train_loss:0.058578
Epoch [1/10], Iter [2239/3125], train_loss:0.046442
Epoch [1/10], Iter [2240/3125], train_loss:0.070717
Epoch [1/10], Iter [2241/3125], train_loss:0.057559
Epoch [1/10], Iter [2242/3125], train_loss:0.071514
Epoch [1/10], Iter [2243/3125], train_loss:0.072684
Epoch [1/10], Iter [2244/3125], train_loss:0.071098
Epoch [1/10], Iter [2245/3125], train_loss:0.029106
Epoch [1/10], Iter [2246/3125], train_loss:0.047889
Epoch [1/10], Iter [2247/3125], train_loss:0.074630
Epoch [1/10], Iter [2248/3125], train_loss:0.039345
Epoch [1/10], Iter [2249/3125], train_loss:0.076240
Epoch [1/10], Iter [2250/3125], train_loss:0.046938
Epoch [1/10], Iter [2251/3125], train_loss:0.051236
Epoch [1/10], Iter [2252/3125], train_loss:0.060951
Epoch [1/10], Iter [2253/3125], train_loss:0.072658
Epoch [1/10], Iter [2254/3125], train_loss:0.072621
Epoch [1/10], Iter [2255/3125], train_loss:0.071780
Epoch [1/10], Iter [2256/3125], train_loss:0.047900
Epoch [1/10], Iter [2257/3125], train_loss:0.083139
Epoch [1/10], Iter [2258/3125], train_loss:0.042750
Epoch [1/10], Iter [2259/3125], train_loss:0.030537
Epoch [1/10], Iter [2260/3125], train_loss:0.071231
Epoch [1/10], Iter [2261/3125], train_loss:0.058627
Epoch [1/10], Iter [2262/3125], train_loss:0.061551
Epoch [1/10], Iter [2263/3125], train_loss:0.057065
Epoch [1/10], Iter [2264/3125], train_loss:0.063427
Epoch [1/10], Iter [2265/3125], train_loss:0.052468
Epoch [1/10], Iter [2266/3125], train_loss:0.052080
Epoch [1/10], Iter [2267/3125], train_loss:0.033376
Epoch [1/10], Iter [2268/3125], train_loss:0.041073
Epoch [1/10], Iter [2269/3125], train_loss:0.065047
Epoch [1/10], Iter [2270/3125], train_loss:0.062026
Epoch [1/10], Iter [2271/3125], train_loss:0.109442
Epoch [1/10], Iter [2272/3125], train_loss:0.056198
Epoch [1/10], Iter [2273/3125], train_loss:0.063348
Epoch [1/10], Iter [2274/3125], train_loss:0.039659
Epoch [1/10], Iter [2275/3125], train_loss:0.062523
Epoch [1/10], Iter [2276/3125], train_loss:0.057241
Epoch [1/10], Iter [2277/3125], train_loss:0.026030
Epoch [1/10], Iter [2278/3125], train_loss:0.060936
Epoch [1/10], Iter [2279/3125], train_loss:0.037769
Epoch [1/10], Iter [2280/3125], train_loss:0.047071
Epoch [1/10], Iter [2281/3125], train_loss:0.067723
Epoch [1/10], Iter [2282/3125], train_loss:0.071875
Epoch [1/10], Iter [2283/3125], train_loss:0.049202
Epoch [1/10], Iter [2284/3125], train_loss:0.060309
Epoch [1/10], Iter [2285/3125], train_loss:0.068315
Epoch [1/10], Iter [2286/3125], train_loss:0.072877
Epoch [1/10], Iter [2287/3125], train_loss:0.063042
Epoch [1/10], Iter [2288/3125], train_loss:0.078719
Epoch [1/10], Iter [2289/3125], train_loss:0.026097
Epoch [1/10], Iter [2290/3125], train_loss:0.060497
Epoch [1/10], Iter [2291/3125], train_loss:0.078648
Epoch [1/10], Iter [2292/3125], train_loss:0.068681
Epoch [1/10], Iter [2293/3125], train_loss:0.044549
Epoch [1/10], Iter [2294/3125], train_loss:0.079612
Epoch [1/10], Iter [2295/3125], train_loss:0.036360
Epoch [1/10], Iter [2296/3125], train_loss:0.029000
Epoch [1/10], Iter [2297/3125], train_loss:0.055833
Epoch [1/10], Iter [2298/3125], train_loss:0.078257
Epoch [1/10], Iter [2299/3125], train_loss:0.064521
Epoch [1/10], Iter [2300/3125], train_loss:0.053077
Epoch [1/10], Iter [2301/3125], train_loss:0.061464
Epoch [1/10], Iter [2302/3125], train_loss:0.054382
Epoch [1/10], Iter [2303/3125], train_loss:0.029077
Epoch [1/10], Iter [2304/3125], train_loss:0.047081
Epoch [1/10], Iter [2305/3125], train_loss:0.034250
Epoch [1/10], Iter [2306/3125], train_loss:0.067229
Epoch [1/10], Iter [2307/3125], train_loss:0.038814
Epoch [1/10], Iter [2308/3125], train_loss:0.059177
Epoch [1/10], Iter [2309/3125], train_loss:0.029574
Epoch [1/10], Iter [2310/3125], train_loss:0.034070
Epoch [1/10], Iter [2311/3125], train_loss:0.077129
Epoch [1/10], Iter [2312/3125], train_loss:0.036397
Epoch [1/10], Iter [2313/3125], train_loss:0.065701
Epoch [1/10], Iter [2314/3125], train_loss:0.044045
Epoch [1/10], Iter [2315/3125], train_loss:0.078438
Epoch [1/10], Iter [2316/3125], train_loss:0.099388
Epoch [1/10], Iter [2317/3125], train_loss:0.053328
Epoch [1/10], Iter [2318/3125], train_loss:0.033426
Epoch [1/10], Iter [2319/3125], train_loss:0.045820
Epoch [1/10], Iter [2320/3125], train_loss:0.071173
Epoch [1/10], Iter [2321/3125], train_loss:0.058071
Epoch [1/10], Iter [2322/3125], train_loss:0.032791
Epoch [1/10], Iter [2323/3125], train_loss:0.049563
Epoch [1/10], Iter [2324/3125], train_loss:0.037852
Epoch [1/10], Iter [2325/3125], train_loss:0.071495
Epoch [1/10], Iter [2326/3125], train_loss:0.051821
Epoch [1/10], Iter [2327/3125], train_loss:0.049604
Epoch [1/10], Iter [2328/3125], train_loss:0.084093
Epoch [1/10], Iter [2329/3125], train_loss:0.050646
Epoch [1/10], Iter [2330/3125], train_loss:0.035999
Epoch [1/10], Iter [2331/3125], train_loss:0.079603
Epoch [1/10], Iter [2332/3125], train_loss:0.036003
Epoch [1/10], Iter [2333/3125], train_loss:0.029306
Epoch [1/10], Iter [2334/3125], train_loss:0.080034
Epoch [1/10], Iter [2335/3125], train_loss:0.056424
Epoch [1/10], Iter [2336/3125], train_loss:0.067404
Epoch [1/10], Iter [2337/3125], train_loss:0.048945
Epoch [1/10], Iter [2338/3125], train_loss:0.034922
Epoch [1/10], Iter [2339/3125], train_loss:0.060189
Epoch [1/10], Iter [2340/3125], train_loss:0.041691
Epoch [1/10], Iter [2341/3125], train_loss:0.076982
Epoch [1/10], Iter [2342/3125], train_loss:0.075437
Epoch [1/10], Iter [2343/3125], train_loss:0.056825
Epoch [1/10], Iter [2344/3125], train_loss:0.038702
Epoch [1/10], Iter [2345/3125], train_loss:0.048160
Epoch [1/10], Iter [2346/3125], train_loss:0.054957
Epoch [1/10], Iter [2347/3125], train_loss:0.073520
Epoch [1/10], Iter [2348/3125], train_loss:0.025029
Epoch [1/10], Iter [2349/3125], train_loss:0.078251
Epoch [1/10], Iter [2350/3125], train_loss:0.058632
Epoch [1/10], Iter [2351/3125], train_loss:0.027224
Epoch [1/10], Iter [2352/3125], train_loss:0.078937
Epoch [1/10], Iter [2353/3125], train_loss:0.047743
Epoch [1/10], Iter [2354/3125], train_loss:0.051082
Epoch [1/10], Iter [2355/3125], train_loss:0.079061
Epoch [1/10], Iter [2356/3125], train_loss:0.073499
Epoch [1/10], Iter [2357/3125], train_loss:0.043175
Epoch [1/10], Iter [2358/3125], train_loss:0.056764
Epoch [1/10], Iter [2359/3125], train_loss:0.019714
Epoch [1/10], Iter [2360/3125], train_loss:0.063975
Epoch [1/10], Iter [2361/3125], train_loss:0.051211
Epoch [1/10], Iter [2362/3125], train_loss:0.057849
Epoch [1/10], Iter [2363/3125], train_loss:0.069020
Epoch [1/10], Iter [2364/3125], train_loss:0.062727
Epoch [1/10], Iter [2365/3125], train_loss:0.038595
Epoch [1/10], Iter [2366/3125], train_loss:0.029429
Epoch [1/10], Iter [2367/3125], train_loss:0.039399
Epoch [1/10], Iter [2368/3125], train_loss:0.065248
Epoch [1/10], Iter [2369/3125], train_loss:0.031663
Epoch [1/10], Iter [2370/3125], train_loss:0.027714
Epoch [1/10], Iter [2371/3125], train_loss:0.041660
Epoch [1/10], Iter [2372/3125], train_loss:0.023911
Epoch [1/10], Iter [2373/3125], train_loss:0.043590
Epoch [1/10], Iter [2374/3125], train_loss:0.027625
Epoch [1/10], Iter [2375/3125], train_loss:0.027970
Epoch [1/10], Iter [2376/3125], train_loss:0.086231
Epoch [1/10], Iter [2377/3125], train_loss:0.030232
Epoch [1/10], Iter [2378/3125], train_loss:0.048442
Epoch [1/10], Iter [2379/3125], train_loss:0.037288
Epoch [1/10], Iter [2380/3125], train_loss:0.036998
Epoch [1/10], Iter [2381/3125], train_loss:0.062230
Epoch [1/10], Iter [2382/3125], train_loss:0.077990
Epoch [1/10], Iter [2383/3125], train_loss:0.037560
Epoch [1/10], Iter [2384/3125], train_loss:0.060333
Epoch [1/10], Iter [2385/3125], train_loss:0.067466
Epoch [1/10], Iter [2386/3125], train_loss:0.044783
Epoch [1/10], Iter [2387/3125], train_loss:0.061185
Epoch [1/10], Iter [2388/3125], train_loss:0.020483
Epoch [1/10], Iter [2389/3125], train_loss:0.040517
Epoch [1/10], Iter [2390/3125], train_loss:0.080889
Epoch [1/10], Iter [2391/3125], train_loss:0.078674
Epoch [1/10], Iter [2392/3125], train_loss:0.038500
Epoch [1/10], Iter [2393/3125], train_loss:0.043009
Epoch [1/10], Iter [2394/3125], train_loss:0.045287
Epoch [1/10], Iter [2395/3125], train_loss:0.052948
Epoch [1/10], Iter [2396/3125], train_loss:0.096492
Epoch [1/10], Iter [2397/3125], train_loss:0.084607
Epoch [1/10], Iter [2398/3125], train_loss:0.018984
Epoch [1/10], Iter [2399/3125], train_loss:0.058866
Epoch [1/10], Iter [2400/3125], train_loss:0.054521
Epoch [1/10], Iter [2401/3125], train_loss:0.035970
Epoch [1/10], Iter [2402/3125], train_loss:0.083726
Epoch [1/10], Iter [2403/3125], train_loss:0.040679
Epoch [1/10], Iter [2404/3125], train_loss:0.065046
Epoch [1/10], Iter [2405/3125], train_loss:0.094652
Epoch [1/10], Iter [2406/3125], train_loss:0.059551
Epoch [1/10], Iter [2407/3125], train_loss:0.065810
Epoch [1/10], Iter [2408/3125], train_loss:0.050208
Epoch [1/10], Iter [2409/3125], train_loss:0.066216
Epoch [1/10], Iter [2410/3125], train_loss:0.058400
Epoch [1/10], Iter [2411/3125], train_loss:0.053513
Epoch [1/10], Iter [2412/3125], train_loss:0.060500
Epoch [1/10], Iter [2413/3125], train_loss:0.044563
Epoch [1/10], Iter [2414/3125], train_loss:0.029764
Epoch [1/10], Iter [2415/3125], train_loss:0.047340
Epoch [1/10], Iter [2416/3125], train_loss:0.035138
Epoch [1/10], Iter [2417/3125], train_loss:0.071377
Epoch [1/10], Iter [2418/3125], train_loss:0.024064
Epoch [1/10], Iter [2419/3125], train_loss:0.042528
Epoch [1/10], Iter [2420/3125], train_loss:0.043153
Epoch [1/10], Iter [2421/3125], train_loss:0.030465
Epoch [1/10], Iter [2422/3125], train_loss:0.072440
Epoch [1/10], Iter [2423/3125], train_loss:0.055920
Epoch [1/10], Iter [2424/3125], train_loss:0.035570
Epoch [1/10], Iter [2425/3125], train_loss:0.056007
Epoch [1/10], Iter [2426/3125], train_loss:0.041977
Epoch [1/10], Iter [2427/3125], train_loss:0.063373
Epoch [1/10], Iter [2428/3125], train_loss:0.052605
Epoch [1/10], Iter [2429/3125], train_loss:0.036802
Epoch [1/10], Iter [2430/3125], train_loss:0.034278
Epoch [1/10], Iter [2431/3125], train_loss:0.052479
Epoch [1/10], Iter [2432/3125], train_loss:0.039629
Epoch [1/10], Iter [2433/3125], train_loss:0.060461
Epoch [1/10], Iter [2434/3125], train_loss:0.022422
Epoch [1/10], Iter [2435/3125], train_loss:0.058592
Epoch [1/10], Iter [2436/3125], train_loss:0.085719
Epoch [1/10], Iter [2437/3125], train_loss:0.055790
Epoch [1/10], Iter [2438/3125], train_loss:0.033942
Epoch [1/10], Iter [2439/3125], train_loss:0.074614
Epoch [1/10], Iter [2440/3125], train_loss:0.042400
Epoch [1/10], Iter [2441/3125], train_loss:0.066518
Epoch [1/10], Iter [2442/3125], train_loss:0.084506
Epoch [1/10], Iter [2443/3125], train_loss:0.045445
Epoch [1/10], Iter [2444/3125], train_loss:0.058341
Epoch [1/10], Iter [2445/3125], train_loss:0.050448
Epoch [1/10], Iter [2446/3125], train_loss:0.053517
Epoch [1/10], Iter [2447/3125], train_loss:0.061119
Epoch [1/10], Iter [2448/3125], train_loss:0.067219
Epoch [1/10], Iter [2449/3125], train_loss:0.038764
Epoch [1/10], Iter [2450/3125], train_loss:0.050990
Epoch [1/10], Iter [2451/3125], train_loss:0.068929
Epoch [1/10], Iter [2452/3125], train_loss:0.112174
Epoch [1/10], Iter [2453/3125], train_loss:0.045488
Epoch [1/10], Iter [2454/3125], train_loss:0.034194
Epoch [1/10], Iter [2455/3125], train_loss:0.088972
Epoch [1/10], Iter [2456/3125], train_loss:0.044014
Epoch [1/10], Iter [2457/3125], train_loss:0.051432
Epoch [1/10], Iter [2458/3125], train_loss:0.038895
Epoch [1/10], Iter [2459/3125], train_loss:0.091389
Epoch [1/10], Iter [2460/3125], train_loss:0.067894
Epoch [1/10], Iter [2461/3125], train_loss:0.077940
Epoch [1/10], Iter [2462/3125], train_loss:0.035168
Epoch [1/10], Iter [2463/3125], train_loss:0.057799
Epoch [1/10], Iter [2464/3125], train_loss:0.039412
Epoch [1/10], Iter [2465/3125], train_loss:0.055779
Epoch [1/10], Iter [2466/3125], train_loss:0.039693
Epoch [1/10], Iter [2467/3125], train_loss:0.044370
Epoch [1/10], Iter [2468/3125], train_loss:0.072034
Epoch [1/10], Iter [2469/3125], train_loss:0.039117
Epoch [1/10], Iter [2470/3125], train_loss:0.041900
Epoch [1/10], Iter [2471/3125], train_loss:0.078160
Epoch [1/10], Iter [2472/3125], train_loss:0.043799
Epoch [1/10], Iter [2473/3125], train_loss:0.034027
Epoch [1/10], Iter [2474/3125], train_loss:0.033906
Epoch [1/10], Iter [2475/3125], train_loss:0.040556
Epoch [1/10], Iter [2476/3125], train_loss:0.076365
Epoch [1/10], Iter [2477/3125], train_loss:0.044474
Epoch [1/10], Iter [2478/3125], train_loss:0.050639
Epoch [1/10], Iter [2479/3125], train_loss:0.094295
Epoch [1/10], Iter [2480/3125], train_loss:0.049790
Epoch [1/10], Iter [2481/3125], train_loss:0.058790
Epoch [1/10], Iter [2482/3125], train_loss:0.063505
Epoch [1/10], Iter [2483/3125], train_loss:0.049205
Epoch [1/10], Iter [2484/3125], train_loss:0.056420
Epoch [1/10], Iter [2485/3125], train_loss:0.034539
Epoch [1/10], Iter [2486/3125], train_loss:0.060778
Epoch [1/10], Iter [2487/3125], train_loss:0.061710
Epoch [1/10], Iter [2488/3125], train_loss:0.059184
Epoch [1/10], Iter [2489/3125], train_loss:0.051106
Epoch [1/10], Iter [2490/3125], train_loss:0.055393
Epoch [1/10], Iter [2491/3125], train_loss:0.069071
Epoch [1/10], Iter [2492/3125], train_loss:0.038927
Epoch [1/10], Iter [2493/3125], train_loss:0.055511
Epoch [1/10], Iter [2494/3125], train_loss:0.030150
Epoch [1/10], Iter [2495/3125], train_loss:0.046406
Epoch [1/10], Iter [2496/3125], train_loss:0.050650
Epoch [1/10], Iter [2497/3125], train_loss:0.067050
Epoch [1/10], Iter [2498/3125], train_loss:0.065522
Epoch [1/10], Iter [2499/3125], train_loss:0.039835
Epoch [1/10], Iter [2500/3125], train_loss:0.037947
Epoch [1/10], Iter [2501/3125], train_loss:0.087482
Epoch [1/10], Iter [2502/3125], train_loss:0.049749
Epoch [1/10], Iter [2503/3125], train_loss:0.075907
Epoch [1/10], Iter [2504/3125], train_loss:0.048454
Epoch [1/10], Iter [2505/3125], train_loss:0.056744
Epoch [1/10], Iter [2506/3125], train_loss:0.063433
Epoch [1/10], Iter [2507/3125], train_loss:0.093217
Epoch [1/10], Iter [2508/3125], train_loss:0.060091
Epoch [1/10], Iter [2509/3125], train_loss:0.038879
Epoch [1/10], Iter [2510/3125], train_loss:0.073510
Epoch [1/10], Iter [2511/3125], train_loss:0.078042
Epoch [1/10], Iter [2512/3125], train_loss:0.018318
Epoch [1/10], Iter [2513/3125], train_loss:0.071369
Epoch [1/10], Iter [2514/3125], train_loss:0.055521
Epoch [1/10], Iter [2515/3125], train_loss:0.074205
Epoch [1/10], Iter [2516/3125], train_loss:0.034892
Epoch [1/10], Iter [2517/3125], train_loss:0.059679
Epoch [1/10], Iter [2518/3125], train_loss:0.044943
Epoch [1/10], Iter [2519/3125], train_loss:0.039163
Epoch [1/10], Iter [2520/3125], train_loss:0.033841
Epoch [1/10], Iter [2521/3125], train_loss:0.095452
Epoch [1/10], Iter [2522/3125], train_loss:0.052355
Epoch [1/10], Iter [2523/3125], train_loss:0.097691
Epoch [1/10], Iter [2524/3125], train_loss:0.043344
Epoch [1/10], Iter [2525/3125], train_loss:0.082170
Epoch [1/10], Iter [2526/3125], train_loss:0.037574
Epoch [1/10], Iter [2527/3125], train_loss:0.046212
Epoch [1/10], Iter [2528/3125], train_loss:0.028267
Epoch [1/10], Iter [2529/3125], train_loss:0.048699
Epoch [1/10], Iter [2530/3125], train_loss:0.089290
Epoch [1/10], Iter [2531/3125], train_loss:0.080898
Epoch [1/10], Iter [2532/3125], train_loss:0.040260
Epoch [1/10], Iter [2533/3125], train_loss:0.079006
Epoch [1/10], Iter [2534/3125], train_loss:0.044073
Epoch [1/10], Iter [2535/3125], train_loss:0.056003
Epoch [1/10], Iter [2536/3125], train_loss:0.049989
Epoch [1/10], Iter [2537/3125], train_loss:0.045744
Epoch [1/10], Iter [2538/3125], train_loss:0.049811
Epoch [1/10], Iter [2539/3125], train_loss:0.059298
Epoch [1/10], Iter [2540/3125], train_loss:0.041965
Epoch [1/10], Iter [2541/3125], train_loss:0.044184
Epoch [1/10], Iter [2542/3125], train_loss:0.070333
Epoch [1/10], Iter [2543/3125], train_loss:0.061322
Epoch [1/10], Iter [2544/3125], train_loss:0.033247
Epoch [1/10], Iter [2545/3125], train_loss:0.037805
Epoch [1/10], Iter [2546/3125], train_loss:0.031448
Epoch [1/10], Iter [2547/3125], train_loss:0.034567
Epoch [1/10], Iter [2548/3125], train_loss:0.053322
Epoch [1/10], Iter [2549/3125], train_loss:0.081269
Epoch [1/10], Iter [2550/3125], train_loss:0.078102
Epoch [1/10], Iter [2551/3125], train_loss:0.022630
Epoch [1/10], Iter [2552/3125], train_loss:0.032897
Epoch [1/10], Iter [2553/3125], train_loss:0.050063
Epoch [1/10], Iter [2554/3125], train_loss:0.053164
Epoch [1/10], Iter [2555/3125], train_loss:0.033120
Epoch [1/10], Iter [2556/3125], train_loss:0.046334
Epoch [1/10], Iter [2557/3125], train_loss:0.068456
Epoch [1/10], Iter [2558/3125], train_loss:0.070154
Epoch [1/10], Iter [2559/3125], train_loss:0.036025
Epoch [1/10], Iter [2560/3125], train_loss:0.070635
Epoch [1/10], Iter [2561/3125], train_loss:0.052198
Epoch [1/10], Iter [2562/3125], train_loss:0.043804
Epoch [1/10], Iter [2563/3125], train_loss:0.067197
Epoch [1/10], Iter [2564/3125], train_loss:0.080402
Epoch [1/10], Iter [2565/3125], train_loss:0.071421
Epoch [1/10], Iter [2566/3125], train_loss:0.044109
Epoch [1/10], Iter [2567/3125], train_loss:0.063801
Epoch [1/10], Iter [2568/3125], train_loss:0.075022
Epoch [1/10], Iter [2569/3125], train_loss:0.030197
Epoch [1/10], Iter [2570/3125], train_loss:0.060289
Epoch [1/10], Iter [2571/3125], train_loss:0.041631
Epoch [1/10], Iter [2572/3125], train_loss:0.047699
Epoch [1/10], Iter [2573/3125], train_loss:0.028659
Epoch [1/10], Iter [2574/3125], train_loss:0.046188
Epoch [1/10], Iter [2575/3125], train_loss:0.031889
Epoch [1/10], Iter [2576/3125], train_loss:0.066076
Epoch [1/10], Iter [2577/3125], train_loss:0.062998
Epoch [1/10], Iter [2578/3125], train_loss:0.034345
Epoch [1/10], Iter [2579/3125], train_loss:0.045776
Epoch [1/10], Iter [2580/3125], train_loss:0.063058
Epoch [1/10], Iter [2581/3125], train_loss:0.049935
Epoch [1/10], Iter [2582/3125], train_loss:0.084482
Epoch [1/10], Iter [2583/3125], train_loss:0.057923
Epoch [1/10], Iter [2584/3125], train_loss:0.045246
Epoch [1/10], Iter [2585/3125], train_loss:0.058265
Epoch [1/10], Iter [2586/3125], train_loss:0.035428
Epoch [1/10], Iter [2587/3125], train_loss:0.042721
Epoch [1/10], Iter [2588/3125], train_loss:0.067164
Epoch [1/10], Iter [2589/3125], train_loss:0.045646
Epoch [1/10], Iter [2590/3125], train_loss:0.038400
Epoch [1/10], Iter [2591/3125], train_loss:0.038546
Epoch [1/10], Iter [2592/3125], train_loss:0.072927
Epoch [1/10], Iter [2593/3125], train_loss:0.030221
Epoch [1/10], Iter [2594/3125], train_loss:0.056022
Epoch [1/10], Iter [2595/3125], train_loss:0.056454
Epoch [1/10], Iter [2596/3125], train_loss:0.044413
Epoch [1/10], Iter [2597/3125], train_loss:0.031464
Epoch [1/10], Iter [2598/3125], train_loss:0.051813
Epoch [1/10], Iter [2599/3125], train_loss:0.077083
Epoch [1/10], Iter [2600/3125], train_loss:0.040987
Epoch [1/10], Iter [2601/3125], train_loss:0.037267
Epoch [1/10], Iter [2602/3125], train_loss:0.033299
Epoch [1/10], Iter [2603/3125], train_loss:0.049933
Epoch [1/10], Iter [2604/3125], train_loss:0.050345
Epoch [1/10], Iter [2605/3125], train_loss:0.068158
Epoch [1/10], Iter [2606/3125], train_loss:0.063846
Epoch [1/10], Iter [2607/3125], train_loss:0.057081
Epoch [1/10], Iter [2608/3125], train_loss:0.050321
Epoch [1/10], Iter [2609/3125], train_loss:0.084901
Epoch [1/10], Iter [2610/3125], train_loss:0.061853
Epoch [1/10], Iter [2611/3125], train_loss:0.059709
Epoch [1/10], Iter [2612/3125], train_loss:0.057150
Epoch [1/10], Iter [2613/3125], train_loss:0.034964
Epoch [1/10], Iter [2614/3125], train_loss:0.044947
Epoch [1/10], Iter [2615/3125], train_loss:0.089898
Epoch [1/10], Iter [2616/3125], train_loss:0.052279
Epoch [1/10], Iter [2617/3125], train_loss:0.065590
Epoch [1/10], Iter [2618/3125], train_loss:0.079470
Epoch [1/10], Iter [2619/3125], train_loss:0.064696
Epoch [1/10], Iter [2620/3125], train_loss:0.031827
Epoch [1/10], Iter [2621/3125], train_loss:0.057286
Epoch [1/10], Iter [2622/3125], train_loss:0.059908
Epoch [1/10], Iter [2623/3125], train_loss:0.050808
Epoch [1/10], Iter [2624/3125], train_loss:0.076302
Epoch [1/10], Iter [2625/3125], train_loss:0.054479
Epoch [1/10], Iter [2626/3125], train_loss:0.050685
Epoch [1/10], Iter [2627/3125], train_loss:0.057106
Epoch [1/10], Iter [2628/3125], train_loss:0.050811
Epoch [1/10], Iter [2629/3125], train_loss:0.025450
Epoch [1/10], Iter [2630/3125], train_loss:0.035107
Epoch [1/10], Iter [2631/3125], train_loss:0.037918
Epoch [1/10], Iter [2632/3125], train_loss:0.049256
Epoch [1/10], Iter [2633/3125], train_loss:0.062963
Epoch [1/10], Iter [2634/3125], train_loss:0.043879
Epoch [1/10], Iter [2635/3125], train_loss:0.043937
Epoch [1/10], Iter [2636/3125], train_loss:0.043007
Epoch [1/10], Iter [2637/3125], train_loss:0.033700
Epoch [1/10], Iter [2638/3125], train_loss:0.024870
Epoch [1/10], Iter [2639/3125], train_loss:0.039514
Epoch [1/10], Iter [2640/3125], train_loss:0.067759
Epoch [1/10], Iter [2641/3125], train_loss:0.062978
Epoch [1/10], Iter [2642/3125], train_loss:0.073482
Epoch [1/10], Iter [2643/3125], train_loss:0.051648
Epoch [1/10], Iter [2644/3125], train_loss:0.065120
Epoch [1/10], Iter [2645/3125], train_loss:0.023624
Epoch [1/10], Iter [2646/3125], train_loss:0.019855
Epoch [1/10], Iter [2647/3125], train_loss:0.106905
Epoch [1/10], Iter [2648/3125], train_loss:0.058358
Epoch [1/10], Iter [2649/3125], train_loss:0.072519
Epoch [1/10], Iter [2650/3125], train_loss:0.070563
Epoch [1/10], Iter [2651/3125], train_loss:0.073849
Epoch [1/10], Iter [2652/3125], train_loss:0.051423
Epoch [1/10], Iter [2653/3125], train_loss:0.041773
Epoch [1/10], Iter [2654/3125], train_loss:0.042694
Epoch [1/10], Iter [2655/3125], train_loss:0.041109
Epoch [1/10], Iter [2656/3125], train_loss:0.046723
Epoch [1/10], Iter [2657/3125], train_loss:0.032426
Epoch [1/10], Iter [2658/3125], train_loss:0.031085
Epoch [1/10], Iter [2659/3125], train_loss:0.071443
Epoch [1/10], Iter [2660/3125], train_loss:0.034657
Epoch [1/10], Iter [2661/3125], train_loss:0.064858
Epoch [1/10], Iter [2662/3125], train_loss:0.011753
Epoch [1/10], Iter [2663/3125], train_loss:0.056094
Epoch [1/10], Iter [2664/3125], train_loss:0.039091
Epoch [1/10], Iter [2665/3125], train_loss:0.067260
Epoch [1/10], Iter [2666/3125], train_loss:0.054605
Epoch [1/10], Iter [2667/3125], train_loss:0.073443
Epoch [1/10], Iter [2668/3125], train_loss:0.047724
Epoch [1/10], Iter [2669/3125], train_loss:0.061778
Epoch [1/10], Iter [2670/3125], train_loss:0.052013
Epoch [1/10], Iter [2671/3125], train_loss:0.040040
Epoch [1/10], Iter [2672/3125], train_loss:0.058101
Epoch [1/10], Iter [2673/3125], train_loss:0.058269
Epoch [1/10], Iter [2674/3125], train_loss:0.056329
Epoch [1/10], Iter [2675/3125], train_loss:0.074943
Epoch [1/10], Iter [2676/3125], train_loss:0.060055
Epoch [1/10], Iter [2677/3125], train_loss:0.066210
Epoch [1/10], Iter [2678/3125], train_loss:0.077830
Epoch [1/10], Iter [2679/3125], train_loss:0.069789
Epoch [1/10], Iter [2680/3125], train_loss:0.022511
Epoch [1/10], Iter [2681/3125], train_loss:0.074430
Epoch [1/10], Iter [2682/3125], train_loss:0.064221
Epoch [1/10], Iter [2683/3125], train_loss:0.033731
Epoch [1/10], Iter [2684/3125], train_loss:0.057155
Epoch [1/10], Iter [2685/3125], train_loss:0.071050
Epoch [1/10], Iter [2686/3125], train_loss:0.031468
Epoch [1/10], Iter [2687/3125], train_loss:0.061247
Epoch [1/10], Iter [2688/3125], train_loss:0.033162
Epoch [1/10], Iter [2689/3125], train_loss:0.053674
Epoch [1/10], Iter [2690/3125], train_loss:0.052903
Epoch [1/10], Iter [2691/3125], train_loss:0.053036
Epoch [1/10], Iter [2692/3125], train_loss:0.031536
Epoch [1/10], Iter [2693/3125], train_loss:0.047191
Epoch [1/10], Iter [2694/3125], train_loss:0.053092
Epoch [1/10], Iter [2695/3125], train_loss:0.046388
Epoch [1/10], Iter [2696/3125], train_loss:0.081545
Epoch [1/10], Iter [2697/3125], train_loss:0.031258
Epoch [1/10], Iter [2698/3125], train_loss:0.065705
Epoch [1/10], Iter [2699/3125], train_loss:0.085829
Epoch [1/10], Iter [2700/3125], train_loss:0.036830
Epoch [1/10], Iter [2701/3125], train_loss:0.039658
Epoch [1/10], Iter [2702/3125], train_loss:0.034230
Epoch [1/10], Iter [2703/3125], train_loss:0.046603
Epoch [1/10], Iter [2704/3125], train_loss:0.062321
Epoch [1/10], Iter [2705/3125], train_loss:0.074843
Epoch [1/10], Iter [2706/3125], train_loss:0.064365
Epoch [1/10], Iter [2707/3125], train_loss:0.041580
Epoch [1/10], Iter [2708/3125], train_loss:0.042753
Epoch [1/10], Iter [2709/3125], train_loss:0.054325
Epoch [1/10], Iter [2710/3125], train_loss:0.029269
Epoch [1/10], Iter [2711/3125], train_loss:0.056201
Epoch [1/10], Iter [2712/3125], train_loss:0.032027
Epoch [1/10], Iter [2713/3125], train_loss:0.041384
Epoch [1/10], Iter [2714/3125], train_loss:0.042245
Epoch [1/10], Iter [2715/3125], train_loss:0.049180
Epoch [1/10], Iter [2716/3125], train_loss:0.071382
Epoch [1/10], Iter [2717/3125], train_loss:0.053056
Epoch [1/10], Iter [2718/3125], train_loss:0.076437
Epoch [1/10], Iter [2719/3125], train_loss:0.036449
Epoch [1/10], Iter [2720/3125], train_loss:0.037378
Epoch [1/10], Iter [2721/3125], train_loss:0.056445
Epoch [1/10], Iter [2722/3125], train_loss:0.070102
Epoch [1/10], Iter [2723/3125], train_loss:0.032661
Epoch [1/10], Iter [2724/3125], train_loss:0.045753
Epoch [1/10], Iter [2725/3125], train_loss:0.051136
Epoch [1/10], Iter [2726/3125], train_loss:0.048787
Epoch [1/10], Iter [2727/3125], train_loss:0.078822
Epoch [1/10], Iter [2728/3125], train_loss:0.053859
Epoch [1/10], Iter [2729/3125], train_loss:0.061877
Epoch [1/10], Iter [2730/3125], train_loss:0.068190
Epoch [1/10], Iter [2731/3125], train_loss:0.059085
Epoch [1/10], Iter [2732/3125], train_loss:0.041527
Epoch [1/10], Iter [2733/3125], train_loss:0.037386
Epoch [1/10], Iter [2734/3125], train_loss:0.045102
Epoch [1/10], Iter [2735/3125], train_loss:0.072924
Epoch [1/10], Iter [2736/3125], train_loss:0.024766
Epoch [1/10], Iter [2737/3125], train_loss:0.036317
Epoch [1/10], Iter [2738/3125], train_loss:0.060391
Epoch [1/10], Iter [2739/3125], train_loss:0.026071
Epoch [1/10], Iter [2740/3125], train_loss:0.045086
Epoch [1/10], Iter [2741/3125], train_loss:0.060746
Epoch [1/10], Iter [2742/3125], train_loss:0.037758
Epoch [1/10], Iter [2743/3125], train_loss:0.042991
Epoch [1/10], Iter [2744/3125], train_loss:0.057417
Epoch [1/10], Iter [2745/3125], train_loss:0.029067
Epoch [1/10], Iter [2746/3125], train_loss:0.095886
Epoch [1/10], Iter [2747/3125], train_loss:0.033592
Epoch [1/10], Iter [2748/3125], train_loss:0.043915
Epoch [1/10], Iter [2749/3125], train_loss:0.085850
Epoch [1/10], Iter [2750/3125], train_loss:0.066093
Epoch [1/10], Iter [2751/3125], train_loss:0.062001
Epoch [1/10], Iter [2752/3125], train_loss:0.069263
Epoch [1/10], Iter [2753/3125], train_loss:0.041522
Epoch [1/10], Iter [2754/3125], train_loss:0.056623
Epoch [1/10], Iter [2755/3125], train_loss:0.076867
Epoch [1/10], Iter [2756/3125], train_loss:0.063004
Epoch [1/10], Iter [2757/3125], train_loss:0.055485
Epoch [1/10], Iter [2758/3125], train_loss:0.066020
Epoch [1/10], Iter [2759/3125], train_loss:0.033939
Epoch [1/10], Iter [2760/3125], train_loss:0.032806
Epoch [1/10], Iter [2761/3125], train_loss:0.054655
Epoch [1/10], Iter [2762/3125], train_loss:0.050211
Epoch [1/10], Iter [2763/3125], train_loss:0.025504
Epoch [1/10], Iter [2764/3125], train_loss:0.052584
Epoch [1/10], Iter [2765/3125], train_loss:0.029184
Epoch [1/10], Iter [2766/3125], train_loss:0.020083
Epoch [1/10], Iter [2767/3125], train_loss:0.027875
Epoch [1/10], Iter [2768/3125], train_loss:0.024596
Epoch [1/10], Iter [2769/3125], train_loss:0.055002
Epoch [1/10], Iter [2770/3125], train_loss:0.055419
Epoch [1/10], Iter [2771/3125], train_loss:0.024973
Epoch [1/10], Iter [2772/3125], train_loss:0.086723
Epoch [1/10], Iter [2773/3125], train_loss:0.048133
Epoch [1/10], Iter [2774/3125], train_loss:0.046027
Epoch [1/10], Iter [2775/3125], train_loss:0.047695
Epoch [1/10], Iter [2776/3125], train_loss:0.037621
Epoch [1/10], Iter [2777/3125], train_loss:0.049847
Epoch [1/10], Iter [2778/3125], train_loss:0.050305
Epoch [1/10], Iter [2779/3125], train_loss:0.028408
Epoch [1/10], Iter [2780/3125], train_loss:0.057841
Epoch [1/10], Iter [2781/3125], train_loss:0.037195
Epoch [1/10], Iter [2782/3125], train_loss:0.046566
Epoch [1/10], Iter [2783/3125], train_loss:0.059322
Epoch [1/10], Iter [2784/3125], train_loss:0.089970
Epoch [1/10], Iter [2785/3125], train_loss:0.035622
Epoch [1/10], Iter [2786/3125], train_loss:0.036376
Epoch [1/10], Iter [2787/3125], train_loss:0.049406
Epoch [1/10], Iter [2788/3125], train_loss:0.027285
Epoch [1/10], Iter [2789/3125], train_loss:0.024182
Epoch [1/10], Iter [2790/3125], train_loss:0.058590
Epoch [1/10], Iter [2791/3125], train_loss:0.031623
Epoch [1/10], Iter [2792/3125], train_loss:0.064973
Epoch [1/10], Iter [2793/3125], train_loss:0.083880
Epoch [1/10], Iter [2794/3125], train_loss:0.063413
Epoch [1/10], Iter [2795/3125], train_loss:0.027198
Epoch [1/10], Iter [2796/3125], train_loss:0.065740
Epoch [1/10], Iter [2797/3125], train_loss:0.045814
Epoch [1/10], Iter [2798/3125], train_loss:0.058582
Epoch [1/10], Iter [2799/3125], train_loss:0.037425
Epoch [1/10], Iter [2800/3125], train_loss:0.040245
Epoch [1/10], Iter [2801/3125], train_loss:0.069127
Epoch [1/10], Iter [2802/3125], train_loss:0.038190
Epoch [1/10], Iter [2803/3125], train_loss:0.076748
Epoch [1/10], Iter [2804/3125], train_loss:0.063528
Epoch [1/10], Iter [2805/3125], train_loss:0.050070
Epoch [1/10], Iter [2806/3125], train_loss:0.043468
Epoch [1/10], Iter [2807/3125], train_loss:0.037768
Epoch [1/10], Iter [2808/3125], train_loss:0.069925
Epoch [1/10], Iter [2809/3125], train_loss:0.027971
Epoch [1/10], Iter [2810/3125], train_loss:0.045305
Epoch [1/10], Iter [2811/3125], train_loss:0.072035
Epoch [1/10], Iter [2812/3125], train_loss:0.027901
Epoch [1/10], Iter [2813/3125], train_loss:0.055258
Epoch [1/10], Iter [2814/3125], train_loss:0.033380
Epoch [1/10], Iter [2815/3125], train_loss:0.035067
Epoch [1/10], Iter [2816/3125], train_loss:0.062196
Epoch [1/10], Iter [2817/3125], train_loss:0.031055
Epoch [1/10], Iter [2818/3125], train_loss:0.027535
Epoch [1/10], Iter [2819/3125], train_loss:0.074925
Epoch [1/10], Iter [2820/3125], train_loss:0.014863
Epoch [1/10], Iter [2821/3125], train_loss:0.040033
Epoch [1/10], Iter [2822/3125], train_loss:0.073055
Epoch [1/10], Iter [2823/3125], train_loss:0.044778
Epoch [1/10], Iter [2824/3125], train_loss:0.041350
Epoch [1/10], Iter [2825/3125], train_loss:0.045701
Epoch [1/10], Iter [2826/3125], train_loss:0.069052
Epoch [1/10], Iter [2827/3125], train_loss:0.070689
Epoch [1/10], Iter [2828/3125], train_loss:0.073792
Epoch [1/10], Iter [2829/3125], train_loss:0.027273
Epoch [1/10], Iter [2830/3125], train_loss:0.070355
Epoch [1/10], Iter [2831/3125], train_loss:0.050928
Epoch [1/10], Iter [2832/3125], train_loss:0.063157
Epoch [1/10], Iter [2833/3125], train_loss:0.052722
Epoch [1/10], Iter [2834/3125], train_loss:0.066621
Epoch [1/10], Iter [2835/3125], train_loss:0.049870
Epoch [1/10], Iter [2836/3125], train_loss:0.045198
Epoch [1/10], Iter [2837/3125], train_loss:0.047708
Epoch [1/10], Iter [2838/3125], train_loss:0.031084
Epoch [1/10], Iter [2839/3125], train_loss:0.054982
Epoch [1/10], Iter [2840/3125], train_loss:0.062080
Epoch [1/10], Iter [2841/3125], train_loss:0.052313
Epoch [1/10], Iter [2842/3125], train_loss:0.027638
Epoch [1/10], Iter [2843/3125], train_loss:0.069474
Epoch [1/10], Iter [2844/3125], train_loss:0.051465
Epoch [1/10], Iter [2845/3125], train_loss:0.047240
Epoch [1/10], Iter [2846/3125], train_loss:0.043358
Epoch [1/10], Iter [2847/3125], train_loss:0.046753
Epoch [1/10], Iter [2848/3125], train_loss:0.059748
Epoch [1/10], Iter [2849/3125], train_loss:0.032166
Epoch [1/10], Iter [2850/3125], train_loss:0.051633
Epoch [1/10], Iter [2851/3125], train_loss:0.032861
Epoch [1/10], Iter [2852/3125], train_loss:0.046734
Epoch [1/10], Iter [2853/3125], train_loss:0.031587
Epoch [1/10], Iter [2854/3125], train_loss:0.028285
Epoch [1/10], Iter [2855/3125], train_loss:0.063359
Epoch [1/10], Iter [2856/3125], train_loss:0.063512
Epoch [1/10], Iter [2857/3125], train_loss:0.048190
Epoch [1/10], Iter [2858/3125], train_loss:0.070683
Epoch [1/10], Iter [2859/3125], train_loss:0.016137
Epoch [1/10], Iter [2860/3125], train_loss:0.045513
Epoch [1/10], Iter [2861/3125], train_loss:0.033696
Epoch [1/10], Iter [2862/3125], train_loss:0.056089
Epoch [1/10], Iter [2863/3125], train_loss:0.040835
Epoch [1/10], Iter [2864/3125], train_loss:0.059301
Epoch [1/10], Iter [2865/3125], train_loss:0.065590
Epoch [1/10], Iter [2866/3125], train_loss:0.054262
Epoch [1/10], Iter [2867/3125], train_loss:0.032128
Epoch [1/10], Iter [2868/3125], train_loss:0.070486
Epoch [1/10], Iter [2869/3125], train_loss:0.050579
Epoch [1/10], Iter [2870/3125], train_loss:0.048929
Epoch [1/10], Iter [2871/3125], train_loss:0.059329
Epoch [1/10], Iter [2872/3125], train_loss:0.059987
Epoch [1/10], Iter [2873/3125], train_loss:0.038087
Epoch [1/10], Iter [2874/3125], train_loss:0.042215
Epoch [1/10], Iter [2875/3125], train_loss:0.037359
Epoch [1/10], Iter [2876/3125], train_loss:0.064945
Epoch [1/10], Iter [2877/3125], train_loss:0.032644
Epoch [1/10], Iter [2878/3125], train_loss:0.035471
Epoch [1/10], Iter [2879/3125], train_loss:0.054034
Epoch [1/10], Iter [2880/3125], train_loss:0.055840
Epoch [1/10], Iter [2881/3125], train_loss:0.040988
Epoch [1/10], Iter [2882/3125], train_loss:0.076851
Epoch [1/10], Iter [2883/3125], train_loss:0.084683
Epoch [1/10], Iter [2884/3125], train_loss:0.052963
Epoch [1/10], Iter [2885/3125], train_loss:0.033718
Epoch [1/10], Iter [2886/3125], train_loss:0.047949
Epoch [1/10], Iter [2887/3125], train_loss:0.066821
Epoch [1/10], Iter [2888/3125], train_loss:0.062198
Epoch [1/10], Iter [2889/3125], train_loss:0.064902
Epoch [1/10], Iter [2890/3125], train_loss:0.057373
Epoch [1/10], Iter [2891/3125], train_loss:0.048909
Epoch [1/10], Iter [2892/3125], train_loss:0.047169
Epoch [1/10], Iter [2893/3125], train_loss:0.037598
Epoch [1/10], Iter [2894/3125], train_loss:0.044367
Epoch [1/10], Iter [2895/3125], train_loss:0.059186
Epoch [1/10], Iter [2896/3125], train_loss:0.027673
Epoch [1/10], Iter [2897/3125], train_loss:0.046781
Epoch [1/10], Iter [2898/3125], train_loss:0.044963
Epoch [1/10], Iter [2899/3125], train_loss:0.053782
Epoch [1/10], Iter [2900/3125], train_loss:0.037537
Epoch [1/10], Iter [2901/3125], train_loss:0.043916
Epoch [1/10], Iter [2902/3125], train_loss:0.056527
Epoch [1/10], Iter [2903/3125], train_loss:0.025347
Epoch [1/10], Iter [2904/3125], train_loss:0.038642
Epoch [1/10], Iter [2905/3125], train_loss:0.066414
Epoch [1/10], Iter [2906/3125], train_loss:0.041623
Epoch [1/10], Iter [2907/3125], train_loss:0.050016
Epoch [1/10], Iter [2908/3125], train_loss:0.043550
Epoch [1/10], Iter [2909/3125], train_loss:0.039868
Epoch [1/10], Iter [2910/3125], train_loss:0.026067
Epoch [1/10], Iter [2911/3125], train_loss:0.045635
Epoch [1/10], Iter [2912/3125], train_loss:0.070421
Epoch [1/10], Iter [2913/3125], train_loss:0.063436
Epoch [1/10], Iter [2914/3125], train_loss:0.049509
Epoch [1/10], Iter [2915/3125], train_loss:0.071456
Epoch [1/10], Iter [2916/3125], train_loss:0.029413
Epoch [1/10], Iter [2917/3125], train_loss:0.042938
Epoch [1/10], Iter [2918/3125], train_loss:0.060789
Epoch [1/10], Iter [2919/3125], train_loss:0.035195
Epoch [1/10], Iter [2920/3125], train_loss:0.049221
Epoch [1/10], Iter [2921/3125], train_loss:0.032330
Epoch [1/10], Iter [2922/3125], train_loss:0.037042
Epoch [1/10], Iter [2923/3125], train_loss:0.065629
Epoch [1/10], Iter [2924/3125], train_loss:0.022151
Epoch [1/10], Iter [2925/3125], train_loss:0.056095
Epoch [1/10], Iter [2926/3125], train_loss:0.034682
Epoch [1/10], Iter [2927/3125], train_loss:0.081066
Epoch [1/10], Iter [2928/3125], train_loss:0.038369
Epoch [1/10], Iter [2929/3125], train_loss:0.025391
Epoch [1/10], Iter [2930/3125], train_loss:0.043224
Epoch [1/10], Iter [2931/3125], train_loss:0.073949
Epoch [1/10], Iter [2932/3125], train_loss:0.062411
Epoch [1/10], Iter [2933/3125], train_loss:0.048195
Epoch [1/10], Iter [2934/3125], train_loss:0.041265
Epoch [1/10], Iter [2935/3125], train_loss:0.051641
Epoch [1/10], Iter [2936/3125], train_loss:0.051737
Epoch [1/10], Iter [2937/3125], train_loss:0.085035
Epoch [1/10], Iter [2938/3125], train_loss:0.041058
Epoch [1/10], Iter [2939/3125], train_loss:0.052639
Epoch [1/10], Iter [2940/3125], train_loss:0.067252
Epoch [1/10], Iter [2941/3125], train_loss:0.067398
Epoch [1/10], Iter [2942/3125], train_loss:0.035560
Epoch [1/10], Iter [2943/3125], train_loss:0.026009
Epoch [1/10], Iter [2944/3125], train_loss:0.028872
Epoch [1/10], Iter [2945/3125], train_loss:0.100868
Epoch [1/10], Iter [2946/3125], train_loss:0.073545
Epoch [1/10], Iter [2947/3125], train_loss:0.064018
Epoch [1/10], Iter [2948/3125], train_loss:0.038802
Epoch [1/10], Iter [2949/3125], train_loss:0.035678
Epoch [1/10], Iter [2950/3125], train_loss:0.057404
Epoch [1/10], Iter [2951/3125], train_loss:0.038700
Epoch [1/10], Iter [2952/3125], train_loss:0.066487
Epoch [1/10], Iter [2953/3125], train_loss:0.036224
Epoch [1/10], Iter [2954/3125], train_loss:0.049169
Epoch [1/10], Iter [2955/3125], train_loss:0.060712
Epoch [1/10], Iter [2956/3125], train_loss:0.054164
Epoch [1/10], Iter [2957/3125], train_loss:0.045852
Epoch [1/10], Iter [2958/3125], train_loss:0.046974
Epoch [1/10], Iter [2959/3125], train_loss:0.046566
Epoch [1/10], Iter [2960/3125], train_loss:0.029474
Epoch [1/10], Iter [2961/3125], train_loss:0.048267
Epoch [1/10], Iter [2962/3125], train_loss:0.093090
Epoch [1/10], Iter [2963/3125], train_loss:0.059621
Epoch [1/10], Iter [2964/3125], train_loss:0.053808
Epoch [1/10], Iter [2965/3125], train_loss:0.019410
Epoch [1/10], Iter [2966/3125], train_loss:0.080236
Epoch [1/10], Iter [2967/3125], train_loss:0.048073
Epoch [1/10], Iter [2968/3125], train_loss:0.045536
Epoch [1/10], Iter [2969/3125], train_loss:0.037549
Epoch [1/10], Iter [2970/3125], train_loss:0.077696
Epoch [1/10], Iter [2971/3125], train_loss:0.044552
Epoch [1/10], Iter [2972/3125], train_loss:0.028185
Epoch [1/10], Iter [2973/3125], train_loss:0.027866
Epoch [1/10], Iter [2974/3125], train_loss:0.047479
Epoch [1/10], Iter [2975/3125], train_loss:0.047819
Epoch [1/10], Iter [2976/3125], train_loss:0.040483
Epoch [1/10], Iter [2977/3125], train_loss:0.070177
Epoch [1/10], Iter [2978/3125], train_loss:0.021798
Epoch [1/10], Iter [2979/3125], train_loss:0.041524
Epoch [1/10], Iter [2980/3125], train_loss:0.038104
Epoch [1/10], Iter [2981/3125], train_loss:0.050260
Epoch [1/10], Iter [2982/3125], train_loss:0.047825
Epoch [1/10], Iter [2983/3125], train_loss:0.059096
Epoch [1/10], Iter [2984/3125], train_loss:0.036488
Epoch [1/10], Iter [2985/3125], train_loss:0.048905
Epoch [1/10], Iter [2986/3125], train_loss:0.092370
Epoch [1/10], Iter [2987/3125], train_loss:0.065375
Epoch [1/10], Iter [2988/3125], train_loss:0.050387
Epoch [1/10], Iter [2989/3125], train_loss:0.040478
Epoch [1/10], Iter [2990/3125], train_loss:0.070799
Epoch [1/10], Iter [2991/3125], train_loss:0.074366
Epoch [1/10], Iter [2992/3125], train_loss:0.035977
Epoch [1/10], Iter [2993/3125], train_loss:0.050263
Epoch [1/10], Iter [2994/3125], train_loss:0.038603
Epoch [1/10], Iter [2995/3125], train_loss:0.091508
Epoch [1/10], Iter [2996/3125], train_loss:0.041844
Epoch [1/10], Iter [2997/3125], train_loss:0.037022
Epoch [1/10], Iter [2998/3125], train_loss:0.035034
Epoch [1/10], Iter [2999/3125], train_loss:0.035311
Epoch [1/10], Iter [3000/3125], train_loss:0.027116
Epoch [1/10], Iter [3001/3125], train_loss:0.029279
Epoch [1/10], Iter [3002/3125], train_loss:0.033700
Epoch [1/10], Iter [3003/3125], train_loss:0.058413
Epoch [1/10], Iter [3004/3125], train_loss:0.023097
Epoch [1/10], Iter [3005/3125], train_loss:0.045443
Epoch [1/10], Iter [3006/3125], train_loss:0.029848
Epoch [1/10], Iter [3007/3125], train_loss:0.052713
Epoch [1/10], Iter [3008/3125], train_loss:0.035926
Epoch [1/10], Iter [3009/3125], train_loss:0.058838
Epoch [1/10], Iter [3010/3125], train_loss:0.056548
Epoch [1/10], Iter [3011/3125], train_loss:0.039738
Epoch [1/10], Iter [3012/3125], train_loss:0.053625
Epoch [1/10], Iter [3013/3125], train_loss:0.032034
Epoch [1/10], Iter [3014/3125], train_loss:0.099142
Epoch [1/10], Iter [3015/3125], train_loss:0.041366
Epoch [1/10], Iter [3016/3125], train_loss:0.041256
Epoch [1/10], Iter [3017/3125], train_loss:0.037890
Epoch [1/10], Iter [3018/3125], train_loss:0.051505
Epoch [1/10], Iter [3019/3125], train_loss:0.032262
Epoch [1/10], Iter [3020/3125], train_loss:0.108767
Epoch [1/10], Iter [3021/3125], train_loss:0.039950
Epoch [1/10], Iter [3022/3125], train_loss:0.074630
Epoch [1/10], Iter [3023/3125], train_loss:0.074800
Epoch [1/10], Iter [3024/3125], train_loss:0.068196
Epoch [1/10], Iter [3025/3125], train_loss:0.039287
Epoch [1/10], Iter [3026/3125], train_loss:0.052125
Epoch [1/10], Iter [3027/3125], train_loss:0.025400
Epoch [1/10], Iter [3028/3125], train_loss:0.066438
Epoch [1/10], Iter [3029/3125], train_loss:0.038479
Epoch [1/10], Iter [3030/3125], train_loss:0.057109
Epoch [1/10], Iter [3031/3125], train_loss:0.034795
Epoch [1/10], Iter [3032/3125], train_loss:0.027901
Epoch [1/10], Iter [3033/3125], train_loss:0.050128
Epoch [1/10], Iter [3034/3125], train_loss:0.032854
Epoch [1/10], Iter [3035/3125], train_loss:0.053708
Epoch [1/10], Iter [3036/3125], train_loss:0.088014
Epoch [1/10], Iter [3037/3125], train_loss:0.075370
Epoch [1/10], Iter [3038/3125], train_loss:0.075677
Epoch [1/10], Iter [3039/3125], train_loss:0.063172
Epoch [1/10], Iter [3040/3125], train_loss:0.076501
Epoch [1/10], Iter [3041/3125], train_loss:0.058156
Epoch [1/10], Iter [3042/3125], train_loss:0.061623
Epoch [1/10], Iter [3043/3125], train_loss:0.066724
Epoch [1/10], Iter [3044/3125], train_loss:0.053383
Epoch [1/10], Iter [3045/3125], train_loss:0.050633
Epoch [1/10], Iter [3046/3125], train_loss:0.058951
Epoch [1/10], Iter [3047/3125], train_loss:0.042557
Epoch [1/10], Iter [3048/3125], train_loss:0.030441
Epoch [1/10], Iter [3049/3125], train_loss:0.024813
Epoch [1/10], Iter [3050/3125], train_loss:0.033426
Epoch [1/10], Iter [3051/3125], train_loss:0.055847
Epoch [1/10], Iter [3052/3125], train_loss:0.044011
Epoch [1/10], Iter [3053/3125], train_loss:0.027693
Epoch [1/10], Iter [3054/3125], train_loss:0.051109
Epoch [1/10], Iter [3055/3125], train_loss:0.040254
Epoch [1/10], Iter [3056/3125], train_loss:0.022783
Epoch [1/10], Iter [3057/3125], train_loss:0.052132
Epoch [1/10], Iter [3058/3125], train_loss:0.056355
Epoch [1/10], Iter [3059/3125], train_loss:0.058088
Epoch [1/10], Iter [3060/3125], train_loss:0.031884
Epoch [1/10], Iter [3061/3125], train_loss:0.049938
Epoch [1/10], Iter [3062/3125], train_loss:0.039419
Epoch [1/10], Iter [3063/3125], train_loss:0.083298
Epoch [1/10], Iter [3064/3125], train_loss:0.052872
Epoch [1/10], Iter [3065/3125], train_loss:0.035879
Epoch [1/10], Iter [3066/3125], train_loss:0.040194
Epoch [1/10], Iter [3067/3125], train_loss:0.053528
Epoch [1/10], Iter [3068/3125], train_loss:0.036000
Epoch [1/10], Iter [3069/3125], train_loss:0.039297
Epoch [1/10], Iter [3070/3125], train_loss:0.058124
Epoch [1/10], Iter [3071/3125], train_loss:0.032619
Epoch [1/10], Iter [3072/3125], train_loss:0.056250
Epoch [1/10], Iter [3073/3125], train_loss:0.053652
Epoch [1/10], Iter [3074/3125], train_loss:0.033999
Epoch [1/10], Iter [3075/3125], train_loss:0.041154
Epoch [1/10], Iter [3076/3125], train_loss:0.064491
Epoch [1/10], Iter [3077/3125], train_loss:0.051499
Epoch [1/10], Iter [3078/3125], train_loss:0.072850
Epoch [1/10], Iter [3079/3125], train_loss:0.074374
Epoch [1/10], Iter [3080/3125], train_loss:0.037571
Epoch [1/10], Iter [3081/3125], train_loss:0.043772
Epoch [1/10], Iter [3082/3125], train_loss:0.042835
Epoch [1/10], Iter [3083/3125], train_loss:0.049374
Epoch [1/10], Iter [3084/3125], train_loss:0.069075
Epoch [1/10], Iter [3085/3125], train_loss:0.028113
Epoch [1/10], Iter [3086/3125], train_loss:0.037884
Epoch [1/10], Iter [3087/3125], train_loss:0.050082
Epoch [1/10], Iter [3088/3125], train_loss:0.063452
Epoch [1/10], Iter [3089/3125], train_loss:0.053441
Epoch [1/10], Iter [3090/3125], train_loss:0.041038
Epoch [1/10], Iter [3091/3125], train_loss:0.059465
Epoch [1/10], Iter [3092/3125], train_loss:0.027648
Epoch [1/10], Iter [3093/3125], train_loss:0.034605
Epoch [1/10], Iter [3094/3125], train_loss:0.019859
Epoch [1/10], Iter [3095/3125], train_loss:0.031989
Epoch [1/10], Iter [3096/3125], train_loss:0.051489
Epoch [1/10], Iter [3097/3125], train_loss:0.056322
Epoch [1/10], Iter [3098/3125], train_loss:0.046863
Epoch [1/10], Iter [3099/3125], train_loss:0.047653
Epoch [1/10], Iter [3100/3125], train_loss:0.050260
Epoch [1/10], Iter [3101/3125], train_loss:0.080984
Epoch [1/10], Iter [3102/3125], train_loss:0.039387
Epoch [1/10], Iter [3103/3125], train_loss:0.029410
Epoch [1/10], Iter [3104/3125], train_loss:0.038941
Epoch [1/10], Iter [3105/3125], train_loss:0.043713
Epoch [1/10], Iter [3106/3125], train_loss:0.037539
Epoch [1/10], Iter [3107/3125], train_loss:0.025358
Epoch [1/10], Iter [3108/3125], train_loss:0.071836
Epoch [1/10], Iter [3109/3125], train_loss:0.056706
Epoch [1/10], Iter [3110/3125], train_loss:0.033099
Epoch [1/10], Iter [3111/3125], train_loss:0.037032
Epoch [1/10], Iter [3112/3125], train_loss:0.038965
Epoch [1/10], Iter [3113/3125], train_loss:0.041378
Epoch [1/10], Iter [3114/3125], train_loss:0.049832
Epoch [1/10], Iter [3115/3125], train_loss:0.044040
Epoch [1/10], Iter [3116/3125], train_loss:0.029385
Epoch [1/10], Iter [3117/3125], train_loss:0.059979
Epoch [1/10], Iter [3118/3125], train_loss:0.067147
Epoch [1/10], Iter [3119/3125], train_loss:0.057981
Epoch [1/10], Iter [3120/3125], train_loss:0.028045
Epoch [1/10], Iter [3121/3125], train_loss:0.042211
Epoch [1/10], Iter [3122/3125], train_loss:0.056431
Epoch [1/10], Iter [3123/3125], train_loss:0.044317
Epoch [1/10], Iter [3124/3125], train_loss:0.054007
Epoch [1/10], Iter [3125/3125], train_loss:0.042914



---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_14844/2960384600.py in 
     40     test_total_correct = 0
     41     test_total_num = 0
---> 42     for iter,(images,labels) in enumerate(test_loader):
     43         images = images.to(device)
     44         labels = labels.to(device)


NameError: name 'test_loader' is not defined

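2. Dynamically adjusting the learning rate

2.1 torch.optim.lr_scheduler

Problems with choosing a learning rate:

  • 1. A learning rate that is too small greatly slows convergence and lengthens training.
  • 2. A learning rate that is too large can make the parameters oscillate back and forth across the optimum.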
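
Both failure modes are easy to reproduce on a toy problem. The sketch below is not from the original notebook: it assumes plain gradient descent on f(x) = x**2, and the helper name `descend` and the two learning rates are chosen only for illustration.

```python
def descend(lr, steps=20, x0=1.0):
    """Run gradient descent on f(x) = x**2 (gradient 2x) and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * 2 * xs[-1])
    return xs


slow = descend(lr=0.01)    # too small: x shrinks by only 2% per step
bouncy = descend(lr=0.95)  # too large: x lands on the other side of x = 0 each step

print("lr=0.01, x after 20 steps:", round(slow[-1], 4))
print("lr=0.95, first iterates:  ", [round(x, 3) for x in bouncy[:4]])
```

With the small rate, x is still far from the optimum after 20 steps; with the large rate, the sign of x flips on every update, which is the oscillation described above.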

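Both problems mean the learning rate does not match what training needs at that point. The fix:

  • PyTorch provides schedulers that change the learning rate during training

Dynamic learning-rate schedulers available in the official torch.optim.lr_scheduler API: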

  • lr_scheduler.LambdaLR

  • lr_scheduler.MultiplicativeLR

  • lr_scheduler.StepLR

  • lr_scheduler.MultiStepLR

  • lr_scheduler.ExponentialLR

  • lr_scheduler.CosineAnnealingLR

  • lr_scheduler.ReduceLROnPlateau

  • lr_scheduler.CyclicLR

  • lr_scheduler.OneCycleLR

  • lr_scheduler.CosineAnnealingWarmRestarts

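2.2 torch.optim.lr_scheduler.LambdaLR

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

# How LambdaLR computes the learning rate
lr_lambda = f(epoch)
new_lr = lr_lambda * init_lr

Idea: the initial learning rate is multiplied by a factor. Because the factor is always applied to the initial learning rate, not to the current one, it is usually written as a function of the epoch.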

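# Pseudocode: train(...) and validate(...) stand in for your own loops.
# A single lr_lambda function is applied to every parameter group.
lambda1 = lambda epoch: 1 / (epoch + 1)
scheduler = LambdaLR(optimizer, lr_lambda=lambda1)

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()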
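
To see the values this schedule produces, here is a small runnable sketch. It is not the notebook's training code: the dummy parameter, the SGD optimizer and the base rate of 0.1 are assumptions made only so the scheduler has something to drive.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# A single dummy parameter is enough to build an optimizer (illustrative only).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1 / (epoch + 1))

lrs = []
for epoch in range(4):
    lrs.append(optimizer.param_groups[0]["lr"])  # rate used during this epoch
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()   # move to the next epoch's rate

print(lrs)
```

The recorded rates are the base rate divided by 1, 2, 3 and 4, because each factor is applied to the initial learning rate rather than to the previous value.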


MultiplicativeLR

torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

Unlike LambdaLR, this scheduler multiplies the previous learning rate by lr_lambda, so the lr_lambda function usually does not need to depend on the epoch.

new_lr = lr_lambda * old_lr
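
The difference between the two schedulers shows up clearly when both receive the same constant factor. The following sketch assumes a dummy parameter, an SGD optimizer with a base rate of 0.1 and a factor of 0.5; the helper `schedule` is a hypothetical name used only here.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR, MultiplicativeLR


def schedule(scheduler_cls, epochs=4, base_lr=0.1):
    """Return the learning rate used in each epoch for a constant factor of 0.5."""
    param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter, illustrative only
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = scheduler_cls(optimizer, lr_lambda=lambda epoch: 0.5)
    lrs = []
    for _ in range(epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # stands in for one epoch of training
        scheduler.step()
    return lrs


lambda_lrs = schedule(LambdaLR)
mult_lrs = schedule(MultiplicativeLR)
print("LambdaLR:        ", lambda_lrs)
print("MultiplicativeLR:", mult_lrs)
```

With a constant factor, LambdaLR stays at half of the initial rate in every epoch, while MultiplicativeLR halves the rate again after each epoch. That is why a LambdaLR factor normally depends on the epoch and a MultiplicativeLR factor usually does not.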


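2.3 Custom scheduler

What if none of the official learning-rate scheduling APIs meets our needs?

We can write our own function, adjust_learning_rate, that changes the lr value in each of the optimizer's param_groups:
  • 1. None of the official APIs meets the requirement
  • 2. We implement the learning-rate schedule ourselves in adjust_learning_rate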
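# Step decay: after every `size` epochs (epochs are 0-indexed), set the learning
# rate to the initial learning rate times gamma ** ((epoch + 1) // size).
def adjust_learning_rate(optim, epoch, size=10, gamma=0.1):
    for param_group in optim.param_groups:
        # Remember the starting learning rate the first time this group is seen.
        init_lr = param_group.setdefault('initial_lr', param_group['lr'])
        if (epoch + 1) % size == 0:
            power = (epoch + 1) // size
            param_group['lr'] = init_lr * gamma ** power

# Calling it from the training loop (train/validate are placeholders)
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
for epoch in range(10):
    train(...)
    validate(...)
    adjust_learning_rate(optimizer, epoch)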
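
The step decay written by hand in adjust_learning_rate follows the same rule as the built-in StepLR, so the two can be checked against each other. The sketch below is an assumption-laden illustration, not the notebook's code: it uses a dummy parameter, a period of 3 epochs instead of 10 to keep the printout short, and a hypothetical helper `step_decay` that takes the initial rate explicitly.

```python
import torch
from torch.optim.lr_scheduler import StepLR


def make_optimizer(lr=0.1):
    param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter, illustrative only
    return torch.optim.SGD([param], lr=lr)


def step_decay(optimizer, epoch, init_lr, size, gamma):
    """Hypothetical helper: hand-written step decay applied at the end of an epoch."""
    if (epoch + 1) % size == 0:
        for group in optimizer.param_groups:
            group["lr"] = init_lr * gamma ** ((epoch + 1) // size)


manual_opt, builtin_opt = make_optimizer(), make_optimizer()
scheduler = StepLR(builtin_opt, step_size=3, gamma=0.1)

manual_lrs, builtin_lrs = [], []
for epoch in range(7):
    manual_lrs.append(manual_opt.param_groups[0]["lr"])
    builtin_lrs.append(builtin_opt.param_groups[0]["lr"])
    manual_opt.step()    # stands in for one epoch of training
    builtin_opt.step()
    step_decay(manual_opt, epoch, init_lr=0.1, size=3, gamma=0.1)
    scheduler.step()

print("manual :", manual_lrs)
print("StepLR :", builtin_lrs)
```

Both lists drop by a factor of 10 after every third epoch. A custom function is therefore only worth writing when the rule cannot be expressed with one of the built-in schedulers.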

Code example

  • lr_scheduler.LambdaLR
  • adjust_learning_rate

# Training & validation
# Resnet50, lr, max_epochs, train_loader and test_loader come from earlier cells.
writer = SummaryWriter("../train_skills")
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimizer
optimizer = torch.optim.Adam(Resnet50.parameters(), lr=lr)

# LambdaLR scheduler: the rate for epoch index i is lr / (i + 1).
# verbose=True prints every update; the argument is deprecated in newer PyTorch releases.
scheduler_my = LambdaLR(optimizer, lr_lambda=lambda epoch: 1/(epoch+1), verbose=True)
print("Initial learning rate:", optimizer.defaults['lr'])

epoch = max_epochs
Resnet50 = Resnet50.to(device)
total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    Resnet50.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    for step, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        outputs = Resnet50(images)
        loss = criterion(outputs, labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()

        # backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        # criterion already averages over the batch, so the printed value is
        # the batch-mean loss divided by the batch size.
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.6f}".format(i+1, epoch, step+1, total_step, loss.item()/labels.shape[0]))

    # Log the rate used in this epoch (the optimizer object is `optimizer`, not `optim`)
    writer.add_scalar("lr", optimizer.param_groups[0]['lr'], i)

    print("Learning rate in epoch %d: %f" % (i + 1, optimizer.param_groups[0]['lr']))
    scheduler_my.step()  # LambdaLR update, once per epoch
    # Alternative: adjust the lr with the custom function instead
    # adjust_learning_rate(optimizer, i)

    Resnet50.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for step, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet50(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]
    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i+1, epoch, train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
        test_total_loss / test_total_num, test_total_correct / test_total_num * 100
    ))
    train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
    test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
writer.close()

Adjusting learning rate of group 0 to 1.0000e-04.
Initial learning rate: 0.0001
Epoch [1/2], Iter [1/3125], train_loss:0.777986
Epoch [1/2], Iter [2/3125], train_loss:0.662992
Epoch [1/2], Iter [3/3125], train_loss:0.767887
Epoch [1/2], Iter [4/3125], train_loss:0.748286
Epoch [1/2], Iter [5/3125], train_loss:0.686887
Epoch [1/2], Iter [6/3125], train_loss:0.675070
Epoch [1/2], Iter [7/3125], train_loss:0.655532
Epoch [1/2], Iter [8/3125], train_loss:0.713970
Epoch [1/2], Iter [9/3125], train_loss:0.675706
Epoch [1/2], Iter [10/3125], train_loss:0.665308
Epoch [1/2], Iter [11/3125], train_loss:0.670263
Epoch [1/2], Iter [12/3125], train_loss:0.597091
Epoch [1/2], Iter [13/3125], train_loss:0.541138
Epoch [1/2], Iter [14/3125], train_loss:0.471112
Epoch [1/2], Iter [15/3125], train_loss:0.570017
Epoch [1/2], Iter [16/3125], train_loss:0.569556
Epoch [1/2], Iter [17/3125], train_loss:0.552114
Epoch [1/2], Iter [18/3125], train_loss:0.569929
Epoch [1/2], Iter [19/3125], train_loss:0.524716
Epoch [1/2], Iter [20/3125], train_loss:0.522762
Epoch [1/2], Iter [21/3125], train_loss:0.499370
Epoch [1/2], Iter [22/3125], train_loss:0.459812
Epoch [1/2], Iter [23/3125], train_loss:0.407852
Epoch [1/2], Iter [24/3125], train_loss:0.472173
Epoch [1/2], Iter [25/3125], train_loss:0.370801
Epoch [1/2], Iter [26/3125], train_loss:0.459706
Epoch [1/2], Iter [27/3125], train_loss:0.403983
Epoch [1/2], Iter [28/3125], train_loss:0.372209
Epoch [1/2], Iter [29/3125], train_loss:0.357835
Epoch [1/2], Iter [30/3125], train_loss:0.501332
Epoch [1/2], Iter [31/3125], train_loss:0.354409
Epoch [1/2], Iter [32/3125], train_loss:0.352994
Epoch [1/2], Iter [33/3125], train_loss:0.359231
Epoch [1/2], Iter [34/3125], train_loss:0.378708
Epoch [1/2], Iter [35/3125], train_loss:0.445062
Epoch [1/2], Iter [36/3125], train_loss:0.345325
Epoch [1/2], Iter [37/3125], train_loss:0.290598
Epoch [1/2], Iter [38/3125], train_loss:0.355161
Epoch [1/2], Iter [39/3125], train_loss:0.295590
Epoch [1/2], Iter [40/3125], train_loss:0.269099
Epoch [1/2], Iter [41/3125], train_loss:0.339802
Epoch [1/2], Iter [42/3125], train_loss:0.251694
Epoch [1/2], Iter [43/3125], train_loss:0.328401
Epoch [1/2], Iter [44/3125], train_loss:0.257955
Epoch [1/2], Iter [45/3125], train_loss:0.325558
Epoch [1/2], Iter [46/3125], train_loss:0.342137
Epoch [1/2], Iter [47/3125], train_loss:0.259149
Epoch [1/2], Iter [48/3125], train_loss:0.249372
Epoch [1/2], Iter [49/3125], train_loss:0.257600
Epoch [1/2], Iter [50/3125], train_loss:0.289483
Epoch [1/2], Iter [51/3125], train_loss:0.301230
Epoch [1/2], Iter [52/3125], train_loss:0.217237
Epoch [1/2], Iter [53/3125], train_loss:0.279841
Epoch [1/2], Iter [54/3125], train_loss:0.261875
Epoch [1/2], Iter [55/3125], train_loss:0.216530
Epoch [1/2], Iter [56/3125], train_loss:0.279174
Epoch [1/2], Iter [57/3125], train_loss:0.188948
Epoch [1/2], Iter [58/3125], train_loss:0.207412
Epoch [1/2], Iter [59/3125], train_loss:0.239609
Epoch [1/2], Iter [60/3125], train_loss:0.195655
Epoch [1/2], Iter [61/3125], train_loss:0.196358
Epoch [1/2], Iter [62/3125], train_loss:0.264320
Epoch [1/2], Iter [63/3125], train_loss:0.193350
Epoch [1/2], Iter [64/3125], train_loss:0.165940
Epoch [1/2], Iter [65/3125], train_loss:0.267849
Epoch [1/2], Iter [66/3125], train_loss:0.221301
Epoch [1/2], Iter [67/3125], train_loss:0.269790
Epoch [1/2], Iter [68/3125], train_loss:0.227033
Epoch [1/2], Iter [69/3125], train_loss:0.156358
Epoch [1/2], Iter [70/3125], train_loss:0.210391
Epoch [1/2], Iter [71/3125], train_loss:0.251990
Epoch [1/2], Iter [72/3125], train_loss:0.177134
Epoch [1/2], Iter [73/3125], train_loss:0.155195
Epoch [1/2], Iter [74/3125], train_loss:0.251515
Epoch [1/2], Iter [75/3125], train_loss:0.159152
Epoch [1/2], Iter [76/3125], train_loss:0.166255
Epoch [1/2], Iter [77/3125], train_loss:0.115882
Epoch [1/2], Iter [78/3125], train_loss:0.175745
Epoch [1/2], Iter [79/3125], train_loss:0.138844
Epoch [1/2], Iter [80/3125], train_loss:0.176611
Epoch [1/2], Iter [81/3125], train_loss:0.161312
Epoch [1/2], Iter [82/3125], train_loss:0.148712
Epoch [1/2], Iter [83/3125], train_loss:0.207151
Epoch [1/2], Iter [84/3125], train_loss:0.111603
Epoch [1/2], Iter [85/3125], train_loss:0.107699
Epoch [1/2], Iter [86/3125], train_loss:0.162084
Epoch [1/2], Iter [87/3125], train_loss:0.199193
Epoch [1/2], Iter [88/3125], train_loss:0.138881
Epoch [1/2], Iter [89/3125], train_loss:0.161221
Epoch [1/2], Iter [90/3125], train_loss:0.149200
Epoch [1/2], Iter [91/3125], train_loss:0.151864
Epoch [1/2], Iter [92/3125], train_loss:0.201360
Epoch [1/2], Iter [93/3125], train_loss:0.169258
Epoch [1/2], Iter [94/3125], train_loss:0.149062
Epoch [1/2], Iter [95/3125], train_loss:0.149584
Epoch [1/2], Iter [96/3125], train_loss:0.145563
Epoch [1/2], Iter [97/3125], train_loss:0.126489
Epoch [1/2], Iter [98/3125], train_loss:0.139146
Epoch [1/2], Iter [99/3125], train_loss:0.138828
Epoch [1/2], Iter [100/3125], train_loss:0.133510
Epoch [1/2], Iter [101/3125], train_loss:0.137596
Epoch [1/2], Iter [102/3125], train_loss:0.130815
Epoch [1/2], Iter [103/3125], train_loss:0.156223
Epoch [1/2], Iter [104/3125], train_loss:0.101501
Epoch [1/2], Iter [105/3125], train_loss:0.119640
Epoch [1/2], Iter [106/3125], train_loss:0.145987
Epoch [1/2], Iter [107/3125], train_loss:0.182159
Epoch [1/2], Iter [108/3125], train_loss:0.134178
Epoch [1/2], Iter [109/3125], train_loss:0.125466
Epoch [1/2], Iter [110/3125], train_loss:0.136854
Epoch [1/2], Iter [111/3125], train_loss:0.114577
Epoch [1/2], Iter [112/3125], train_loss:0.176352
Epoch [1/2], Iter [113/3125], train_loss:0.114336
Epoch [1/2], Iter [114/3125], train_loss:0.132073
Epoch [1/2], Iter [115/3125], train_loss:0.132009
Epoch [1/2], Iter [116/3125], train_loss:0.138485
Epoch [1/2], Iter [117/3125], train_loss:0.131889
Epoch [1/2], Iter [118/3125], train_loss:0.127713
Epoch [1/2], Iter [119/3125], train_loss:0.136108
Epoch [1/2], Iter [120/3125], train_loss:0.099374
Epoch [1/2], Iter [121/3125], train_loss:0.177180
Epoch [1/2], Iter [122/3125], train_loss:0.133789
Epoch [1/2], Iter [123/3125], train_loss:0.108010
Epoch [1/2], Iter [124/3125], train_loss:0.124499
Epoch [1/2], Iter [125/3125], train_loss:0.145130
Epoch [1/2], Iter [126/3125], train_loss:0.139046
Epoch [1/2], Iter [127/3125], train_loss:0.162694
Epoch [1/2], Iter [128/3125], train_loss:0.106318
Epoch [1/2], Iter [129/3125], train_loss:0.136911
Epoch [1/2], Iter [130/3125], train_loss:0.161438
Epoch [1/2], Iter [131/3125], train_loss:0.116436
Epoch [1/2], Iter [132/3125], train_loss:0.145941
Epoch [1/2], Iter [133/3125], train_loss:0.114138
Epoch [1/2], Iter [134/3125], train_loss:0.167708
Epoch [1/2], Iter [135/3125], train_loss:0.137426
Epoch [1/2], Iter [136/3125], train_loss:0.181821
Epoch [1/2], Iter [137/3125], train_loss:0.126747
Epoch [1/2], Iter [138/3125], train_loss:0.161444
Epoch [1/2], Iter [139/3125], train_loss:0.137294
Epoch [1/2], Iter [140/3125], train_loss:0.140909
Epoch [1/2], Iter [141/3125], train_loss:0.127225
Epoch [1/2], Iter [142/3125], train_loss:0.086217
Epoch [1/2], Iter [143/3125], train_loss:0.125356
Epoch [1/2], Iter [144/3125], train_loss:0.152855
Epoch [1/2], Iter [145/3125], train_loss:0.182545
Epoch [1/2], Iter [146/3125], train_loss:0.076299
Epoch [1/2], Iter [147/3125], train_loss:0.154243
Epoch [1/2], Iter [148/3125], train_loss:0.101580
Epoch [1/2], Iter [149/3125], train_loss:0.136949
Epoch [1/2], Iter [150/3125], train_loss:0.137361
Epoch [1/2], Iter [151/3125], train_loss:0.119204
Epoch [1/2], Iter [152/3125], train_loss:0.126940
Epoch [1/2], Iter [153/3125], train_loss:0.127168
Epoch [1/2], Iter [154/3125], train_loss:0.132602
Epoch [1/2], Iter [155/3125], train_loss:0.112731
Epoch [1/2], Iter [156/3125], train_loss:0.128222
Epoch [1/2], Iter [157/3125], train_loss:0.112968
Epoch [1/2], Iter [158/3125], train_loss:0.106631
Epoch [1/2], Iter [159/3125], train_loss:0.131883
Epoch [1/2], Iter [160/3125], train_loss:0.105249
Epoch [1/2], Iter [161/3125], train_loss:0.148656
Epoch [1/2], Iter [162/3125], train_loss:0.115082
Epoch [1/2], Iter [163/3125], train_loss:0.099327
Epoch [1/2], Iter [164/3125], train_loss:0.131512
Epoch [1/2], Iter [165/3125], train_loss:0.121838
Epoch [1/2], Iter [166/3125], train_loss:0.122599
Epoch [1/2], Iter [167/3125], train_loss:0.108223
Epoch [1/2], Iter [168/3125], train_loss:0.157398
Epoch [1/2], Iter [169/3125], train_loss:0.112632
Epoch [1/2], Iter [170/3125], train_loss:0.092063
Epoch [1/2], Iter [171/3125], train_loss:0.092099
Epoch [1/2], Iter [172/3125], train_loss:0.143247
Epoch [1/2], Iter [173/3125], train_loss:0.107952
Epoch [1/2], Iter [174/3125], train_loss:0.150982
Epoch [1/2], Iter [175/3125], train_loss:0.154513
Epoch [1/2], Iter [176/3125], train_loss:0.122460
Epoch [1/2], Iter [177/3125], train_loss:0.130054
Epoch [1/2], Iter [178/3125], train_loss:0.075364
Epoch [1/2], Iter [179/3125], train_loss:0.092844
Epoch [1/2], Iter [180/3125], train_loss:0.131176
Epoch [1/2], Iter [181/3125], train_loss:0.089559
Epoch [1/2], Iter [182/3125], train_loss:0.137490
Epoch [1/2], Iter [183/3125], train_loss:0.148960
Epoch [1/2], Iter [184/3125], train_loss:0.088713
Epoch [1/2], Iter [185/3125], train_loss:0.098040
Epoch [1/2], Iter [186/3125], train_loss:0.159430
Epoch [1/2], Iter [187/3125], train_loss:0.091044
Epoch [1/2], Iter [188/3125], train_loss:0.108532
Epoch [1/2], Iter [189/3125], train_loss:0.089453
Epoch [1/2], Iter [190/3125], train_loss:0.112841
Epoch [1/2], Iter [191/3125], train_loss:0.150818
Epoch [1/2], Iter [192/3125], train_loss:0.112883
Epoch [1/2], Iter [193/3125], train_loss:0.124884
Epoch [1/2], Iter [194/3125], train_loss:0.107502
Epoch [1/2], Iter [195/3125], train_loss:0.099678
Epoch [1/2], Iter [196/3125], train_loss:0.183032
Epoch [1/2], Iter [197/3125], train_loss:0.111150
Epoch [1/2], Iter [198/3125], train_loss:0.136155
Epoch [1/2], Iter [199/3125], train_loss:0.113451
Epoch [1/2], Iter [200/3125], train_loss:0.144825
Epoch [1/2], Iter [201/3125], train_loss:0.133655
Epoch [1/2], Iter [202/3125], train_loss:0.111885
Epoch [1/2], Iter [203/3125], train_loss:0.111356
Epoch [1/2], Iter [204/3125], train_loss:0.107932
Epoch [1/2], Iter [205/3125], train_loss:0.143930
Epoch [1/2], Iter [206/3125], train_loss:0.097970
Epoch [1/2], Iter [207/3125], train_loss:0.088761
Epoch [1/2], Iter [208/3125], train_loss:0.131987
Epoch [1/2], Iter [209/3125], train_loss:0.135780
Epoch [1/2], Iter [210/3125], train_loss:0.096630
Epoch [1/2], Iter [211/3125], train_loss:0.128221
Epoch [1/2], Iter [212/3125], train_loss:0.155038
Epoch [1/2], Iter [213/3125], train_loss:0.099105
Epoch [1/2], Iter [214/3125], train_loss:0.111038
Epoch [1/2], Iter [215/3125], train_loss:0.142604
Epoch [1/2], Iter [216/3125], train_loss:0.145580
Epoch [1/2], Iter [217/3125], train_loss:0.111073
Epoch [1/2], Iter [218/3125], train_loss:0.128455
Epoch [1/2], Iter [219/3125], train_loss:0.096221
Epoch [1/2], Iter [220/3125], train_loss:0.086480
Epoch [1/2], Iter [221/3125], train_loss:0.115596
Epoch [1/2], Iter [222/3125], train_loss:0.093819
Epoch [1/2], Iter [223/3125], train_loss:0.068540
Epoch [1/2], Iter [224/3125], train_loss:0.105397
Epoch [1/2], Iter [225/3125], train_loss:0.081237
Epoch [1/2], Iter [226/3125], train_loss:0.127183
Epoch [1/2], Iter [227/3125], train_loss:0.133673
Epoch [1/2], Iter [228/3125], train_loss:0.102121
Epoch [1/2], Iter [229/3125], train_loss:0.124757
Epoch [1/2], Iter [230/3125], train_loss:0.124150
Epoch [1/2], Iter [231/3125], train_loss:0.109962
Epoch [1/2], Iter [232/3125], train_loss:0.121613
Epoch [1/2], Iter [233/3125], train_loss:0.122472
Epoch [1/2], Iter [234/3125], train_loss:0.093679
Epoch [1/2], Iter [235/3125], train_loss:0.104721
Epoch [1/2], Iter [236/3125], train_loss:0.102781
Epoch [1/2], Iter [237/3125], train_loss:0.093572
Epoch [1/2], Iter [238/3125], train_loss:0.094514
Epoch [1/2], Iter [239/3125], train_loss:0.099495
Epoch [1/2], Iter [240/3125], train_loss:0.106375
Epoch [1/2], Iter [241/3125], train_loss:0.111261
Epoch [1/2], Iter [242/3125], train_loss:0.089024
Epoch [1/2], Iter [243/3125], train_loss:0.107102
Epoch [1/2], Iter [244/3125], train_loss:0.098898
Epoch [1/2], Iter [245/3125], train_loss:0.105752
Epoch [1/2], Iter [246/3125], train_loss:0.098761
Epoch [1/2], Iter [247/3125], train_loss:0.110852
Epoch [1/2], Iter [248/3125], train_loss:0.110072
Epoch [1/2], Iter [249/3125], train_loss:0.106461
Epoch [1/2], Iter [250/3125], train_loss:0.123407
Epoch [1/2], Iter [251/3125], train_loss:0.092958
Epoch [1/2], Iter [252/3125], train_loss:0.111045
Epoch [1/2], Iter [253/3125], train_loss:0.129692
Epoch [1/2], Iter [254/3125], train_loss:0.096450
Epoch [1/2], Iter [255/3125], train_loss:0.084925
Epoch [1/2], Iter [256/3125], train_loss:0.141627
Epoch [1/2], Iter [257/3125], train_loss:0.088181
Epoch [1/2], Iter [258/3125], train_loss:0.110038
Epoch [1/2], Iter [259/3125], train_loss:0.132803
Epoch [1/2], Iter [260/3125], train_loss:0.098667
Epoch [1/2], Iter [261/3125], train_loss:0.085513
Epoch [1/2], Iter [262/3125], train_loss:0.121055
Epoch [1/2], Iter [263/3125], train_loss:0.099879
Epoch [1/2], Iter [264/3125], train_loss:0.149433
Epoch [1/2], Iter [265/3125], train_loss:0.116061
Epoch [1/2], Iter [266/3125], train_loss:0.090697
Epoch [1/2], Iter [267/3125], train_loss:0.087413
Epoch [1/2], Iter [268/3125], train_loss:0.146219
Epoch [1/2], Iter [269/3125], train_loss:0.097796
Epoch [1/2], Iter [270/3125], train_loss:0.088155
Epoch [1/2], Iter [271/3125], train_loss:0.107575
Epoch [1/2], Iter [272/3125], train_loss:0.101357
Epoch [1/2], Iter [273/3125], train_loss:0.090542
Epoch [1/2], Iter [274/3125], train_loss:0.092936
Epoch [1/2], Iter [275/3125], train_loss:0.107296
Epoch [1/2], Iter [276/3125], train_loss:0.078067
Epoch [1/2], Iter [277/3125], train_loss:0.099335
Epoch [1/2], Iter [278/3125], train_loss:0.118054
Epoch [1/2], Iter [279/3125], train_loss:0.098823
Epoch [1/2], Iter [280/3125], train_loss:0.100404
Epoch [1/2], Iter [281/3125], train_loss:0.116890
Epoch [1/2], Iter [282/3125], train_loss:0.083836
Epoch [1/2], Iter [283/3125], train_loss:0.134695
Epoch [1/2], Iter [284/3125], train_loss:0.092292
Epoch [1/2], Iter [285/3125], train_loss:0.089188
Epoch [1/2], Iter [286/3125], train_loss:0.103081
Epoch [1/2], Iter [287/3125], train_loss:0.127043
Epoch [1/2], Iter [288/3125], train_loss:0.116650
Epoch [1/2], Iter [289/3125], train_loss:0.121881
Epoch [1/2], Iter [290/3125], train_loss:0.186911
Epoch [1/2], Iter [291/3125], train_loss:0.126078
Epoch [1/2], Iter [292/3125], train_loss:0.091569
Epoch [1/2], Iter [293/3125], train_loss:0.079495
Epoch [1/2], Iter [294/3125], train_loss:0.099240
Epoch [1/2], Iter [295/3125], train_loss:0.118772
Epoch [1/2], Iter [296/3125], train_loss:0.093694
Epoch [1/2], Iter [297/3125], train_loss:0.108655
Epoch [1/2], Iter [298/3125], train_loss:0.095032
Epoch [1/2], Iter [299/3125], train_loss:0.111288
Epoch [1/2], Iter [300/3125], train_loss:0.098187
Epoch [1/2], Iter [301/3125], train_loss:0.097793
Epoch [1/2], Iter [302/3125], train_loss:0.096069
Epoch [1/2], Iter [303/3125], train_loss:0.098303
Epoch [1/2], Iter [304/3125], train_loss:0.053307
Epoch [1/2], Iter [305/3125], train_loss:0.089034
Epoch [1/2], Iter [306/3125], train_loss:0.079592
Epoch [1/2], Iter [307/3125], train_loss:0.127933
Epoch [1/2], Iter [308/3125], train_loss:0.098109
Epoch [1/2], Iter [309/3125], train_loss:0.064728
Epoch [1/2], Iter [310/3125], train_loss:0.173963
Epoch [1/2], Iter [311/3125], train_loss:0.076444
Epoch [1/2], Iter [312/3125], train_loss:0.104166
Epoch [1/2], Iter [313/3125], train_loss:0.098701
Epoch [1/2], Iter [314/3125], train_loss:0.080666
Epoch [1/2], Iter [315/3125], train_loss:0.114130
Epoch [1/2], Iter [316/3125], train_loss:0.077030
Epoch [1/2], Iter [317/3125], train_loss:0.118316
Epoch [1/2], Iter [318/3125], train_loss:0.057820
Epoch [1/2], Iter [319/3125], train_loss:0.126976
Epoch [1/2], Iter [320/3125], train_loss:0.071933
Epoch [1/2], Iter [321/3125], train_loss:0.090767
Epoch [1/2], Iter [322/3125], train_loss:0.090457
Epoch [1/2], Iter [323/3125], train_loss:0.105079
Epoch [1/2], Iter [324/3125], train_loss:0.101791
Epoch [1/2], Iter [325/3125], train_loss:0.106632
Epoch [1/2], Iter [326/3125], train_loss:0.087738
Epoch [1/2], Iter [327/3125], train_loss:0.082531
Epoch [1/2], Iter [328/3125], train_loss:0.123027
Epoch [1/2], Iter [329/3125], train_loss:0.089840
Epoch [1/2], Iter [330/3125], train_loss:0.123866
Epoch [1/2], Iter [331/3125], train_loss:0.139623
Epoch [1/2], Iter [332/3125], train_loss:0.097267
Epoch [1/2], Iter [333/3125], train_loss:0.087837
Epoch [1/2], Iter [334/3125], train_loss:0.079422
Epoch [1/2], Iter [335/3125], train_loss:0.085209
Epoch [1/2], Iter [336/3125], train_loss:0.147867
Epoch [1/2], Iter [337/3125], train_loss:0.149562
Epoch [1/2], Iter [338/3125], train_loss:0.107306
Epoch [1/2], Iter [339/3125], train_loss:0.114367
Epoch [1/2], Iter [340/3125], train_loss:0.075745
Epoch [1/2], Iter [341/3125], train_loss:0.081646
Epoch [1/2], Iter [342/3125], train_loss:0.114543
Epoch [1/2], Iter [343/3125], train_loss:0.107771
Epoch [1/2], Iter [344/3125], train_loss:0.091723
Epoch [1/2], Iter [345/3125], train_loss:0.085628
Epoch [1/2], Iter [346/3125], train_loss:0.069710
Epoch [1/2], Iter [347/3125], train_loss:0.080913
Epoch [1/2], Iter [348/3125], train_loss:0.078024
Epoch [1/2], Iter [349/3125], train_loss:0.132719
Epoch [1/2], Iter [350/3125], train_loss:0.119744
Epoch [1/2], Iter [351/3125], train_loss:0.116647
Epoch [1/2], Iter [352/3125], train_loss:0.109735
Epoch [1/2], Iter [353/3125], train_loss:0.081496
Epoch [1/2], Iter [354/3125], train_loss:0.073368
Epoch [1/2], Iter [355/3125], train_loss:0.111581
Epoch [1/2], Iter [356/3125], train_loss:0.075484
Epoch [1/2], Iter [357/3125], train_loss:0.072975
Epoch [1/2], Iter [358/3125], train_loss:0.062364
Epoch [1/2], Iter [359/3125], train_loss:0.076667
Epoch [1/2], Iter [360/3125], train_loss:0.080340
Epoch [1/2], Iter [361/3125], train_loss:0.063418
Epoch [1/2], Iter [362/3125], train_loss:0.061630
Epoch [1/2], Iter [363/3125], train_loss:0.062767
Epoch [1/2], Iter [364/3125], train_loss:0.084588
Epoch [1/2], Iter [365/3125], train_loss:0.105539
Epoch [1/2], Iter [366/3125], train_loss:0.071236
Epoch [1/2], Iter [367/3125], train_loss:0.087279
Epoch [1/2], Iter [368/3125], train_loss:0.076322
Epoch [1/2], Iter [369/3125], train_loss:0.116615
Epoch [1/2], Iter [370/3125], train_loss:0.100660
Epoch [1/2], Iter [371/3125], train_loss:0.099755
Epoch [1/2], Iter [372/3125], train_loss:0.114215
Epoch [1/2], Iter [373/3125], train_loss:0.112513
Epoch [1/2], Iter [374/3125], train_loss:0.101781
Epoch [1/2], Iter [375/3125], train_loss:0.067294
Epoch [1/2], Iter [376/3125], train_loss:0.098053
Epoch [1/2], Iter [377/3125], train_loss:0.107353
Epoch [1/2], Iter [378/3125], train_loss:0.081777
Epoch [1/2], Iter [379/3125], train_loss:0.080122
Epoch [1/2], Iter [380/3125], train_loss:0.107728
Epoch [1/2], Iter [381/3125], train_loss:0.095094
Epoch [1/2], Iter [382/3125], train_loss:0.083242
Epoch [1/2], Iter [383/3125], train_loss:0.102041
Epoch [1/2], Iter [384/3125], train_loss:0.072550
Epoch [1/2], Iter [385/3125], train_loss:0.088450
Epoch [1/2], Iter [386/3125], train_loss:0.092246
Epoch [1/2], Iter [387/3125], train_loss:0.105446
Epoch [1/2], Iter [388/3125], train_loss:0.127865
Epoch [1/2], Iter [389/3125], train_loss:0.072769
Epoch [1/2], Iter [390/3125], train_loss:0.073997
Epoch [1/2], Iter [391/3125], train_loss:0.066677
Epoch [1/2], Iter [392/3125], train_loss:0.102232
Epoch [1/2], Iter [393/3125], train_loss:0.117690
Epoch [1/2], Iter [394/3125], train_loss:0.084889
Epoch [1/2], Iter [395/3125], train_loss:0.103554
Epoch [1/2], Iter [396/3125], train_loss:0.073418
Epoch [1/2], Iter [397/3125], train_loss:0.096942
Epoch [1/2], Iter [398/3125], train_loss:0.089206
Epoch [1/2], Iter [399/3125], train_loss:0.126500
Epoch [1/2], Iter [400/3125], train_loss:0.119990
Epoch [1/2], Iter [401/3125], train_loss:0.065327
Epoch [1/2], Iter [402/3125], train_loss:0.127086
Epoch [1/2], Iter [403/3125], train_loss:0.089086
Epoch [1/2], Iter [404/3125], train_loss:0.088689
Epoch [1/2], Iter [405/3125], train_loss:0.118437
Epoch [1/2], Iter [406/3125], train_loss:0.111353
Epoch [1/2], Iter [407/3125], train_loss:0.128636
Epoch [1/2], Iter [408/3125], train_loss:0.104118
Epoch [1/2], Iter [409/3125], train_loss:0.090673
Epoch [1/2], Iter [410/3125], train_loss:0.125681
Epoch [1/2], Iter [411/3125], train_loss:0.115205
Epoch [1/2], Iter [412/3125], train_loss:0.077153
Epoch [1/2], Iter [413/3125], train_loss:0.094824
Epoch [1/2], Iter [414/3125], train_loss:0.098783
Epoch [1/2], Iter [415/3125], train_loss:0.087345
Epoch [1/2], Iter [416/3125], train_loss:0.097017
Epoch [1/2], Iter [417/3125], train_loss:0.096015
Epoch [1/2], Iter [418/3125], train_loss:0.075332
Epoch [1/2], Iter [419/3125], train_loss:0.084599
Epoch [1/2], Iter [420/3125], train_loss:0.111044
Epoch [1/2], Iter [421/3125], train_loss:0.093526
Epoch [1/2], Iter [422/3125], train_loss:0.063629
Epoch [1/2], Iter [423/3125], train_loss:0.067428
Epoch [1/2], Iter [424/3125], train_loss:0.079753
Epoch [1/2], Iter [425/3125], train_loss:0.135439
Epoch [1/2], Iter [426/3125], train_loss:0.112857
Epoch [1/2], Iter [427/3125], train_loss:0.074499
Epoch [1/2], Iter [428/3125], train_loss:0.052821
Epoch [1/2], Iter [429/3125], train_loss:0.075851
Epoch [1/2], Iter [430/3125], train_loss:0.104684
Epoch [1/2], Iter [431/3125], train_loss:0.102066
Epoch [1/2], Iter [432/3125], train_loss:0.083621
Epoch [1/2], Iter [433/3125], train_loss:0.064658
Epoch [1/2], Iter [434/3125], train_loss:0.111376
Epoch [1/2], Iter [435/3125], train_loss:0.055758
Epoch [1/2], Iter [436/3125], train_loss:0.128865
Epoch [1/2], Iter [437/3125], train_loss:0.100289
Epoch [1/2], Iter [438/3125], train_loss:0.084247
Epoch [1/2], Iter [439/3125], train_loss:0.073448
Epoch [1/2], Iter [440/3125], train_loss:0.080761
Epoch [1/2], Iter [441/3125], train_loss:0.119340
Epoch [1/2], Iter [442/3125], train_loss:0.173922
Epoch [1/2], Iter [443/3125], train_loss:0.067979
Epoch [1/2], Iter [444/3125], train_loss:0.080348
Epoch [1/2], Iter [445/3125], train_loss:0.132988
Epoch [1/2], Iter [446/3125], train_loss:0.069152
Epoch [1/2], Iter [447/3125], train_loss:0.084873
Epoch [1/2], Iter [448/3125], train_loss:0.088424
Epoch [1/2], Iter [449/3125], train_loss:0.094467
Epoch [1/2], Iter [450/3125], train_loss:0.111121
Epoch [1/2], Iter [451/3125], train_loss:0.067928
Epoch [1/2], Iter [452/3125], train_loss:0.065471
Epoch [1/2], Iter [453/3125], train_loss:0.075276
Epoch [1/2], Iter [454/3125], train_loss:0.076016
Epoch [1/2], Iter [455/3125], train_loss:0.088840
Epoch [1/2], Iter [456/3125], train_loss:0.061118
Epoch [1/2], Iter [457/3125], train_loss:0.079531
Epoch [1/2], Iter [458/3125], train_loss:0.122364
Epoch [1/2], Iter [459/3125], train_loss:0.100249
Epoch [1/2], Iter [460/3125], train_loss:0.073599
Epoch [1/2], Iter [461/3125], train_loss:0.084068
Epoch [1/2], Iter [462/3125], train_loss:0.056314
Epoch [1/2], Iter [463/3125], train_loss:0.079495
Epoch [1/2], Iter [464/3125], train_loss:0.076411
Epoch [1/2], Iter [465/3125], train_loss:0.130830
Epoch [1/2], Iter [466/3125], train_loss:0.086917
Epoch [1/2], Iter [467/3125], train_loss:0.093509
Epoch [1/2], Iter [468/3125], train_loss:0.084006
Epoch [1/2], Iter [469/3125], train_loss:0.070421
Epoch [1/2], Iter [470/3125], train_loss:0.107369
Epoch [1/2], Iter [471/3125], train_loss:0.065467
Epoch [1/2], Iter [472/3125], train_loss:0.069032
Epoch [1/2], Iter [473/3125], train_loss:0.073237
Epoch [1/2], Iter [474/3125], train_loss:0.151757
Epoch [1/2], Iter [475/3125], train_loss:0.097692
Epoch [1/2], Iter [476/3125], train_loss:0.100925
Epoch [1/2], Iter [477/3125], train_loss:0.091285
Epoch [1/2], Iter [478/3125], train_loss:0.103061
Epoch [1/2], Iter [479/3125], train_loss:0.064359
Epoch [1/2], Iter [480/3125], train_loss:0.082491
Epoch [1/2], Iter [481/3125], train_loss:0.057366
Epoch [1/2], Iter [482/3125], train_loss:0.092543
Epoch [1/2], Iter [483/3125], train_loss:0.067777
Epoch [1/2], Iter [484/3125], train_loss:0.067935
Epoch [1/2], Iter [485/3125], train_loss:0.105495
Epoch [1/2], Iter [486/3125], train_loss:0.136604
Epoch [1/2], Iter [487/3125], train_loss:0.092469
Epoch [1/2], Iter [488/3125], train_loss:0.082614
Epoch [1/2], Iter [489/3125], train_loss:0.122642
Epoch [1/2], Iter [490/3125], train_loss:0.064453
Epoch [1/2], Iter [491/3125], train_loss:0.127374
Epoch [1/2], Iter [492/3125], train_loss:0.090427
Epoch [1/2], Iter [493/3125], train_loss:0.076251
Epoch [1/2], Iter [494/3125], train_loss:0.061046
Epoch [1/2], Iter [495/3125], train_loss:0.103997
Epoch [1/2], Iter [496/3125], train_loss:0.109734
Epoch [1/2], Iter [497/3125], train_loss:0.070913
Epoch [1/2], Iter [498/3125], train_loss:0.069599
Epoch [1/2], Iter [499/3125], train_loss:0.078603
Epoch [1/2], Iter [500/3125], train_loss:0.133940
Epoch [1/2], Iter [501/3125], train_loss:0.072970
Epoch [1/2], Iter [502/3125], train_loss:0.075337
Epoch [1/2], Iter [503/3125], train_loss:0.094221
Epoch [1/2], Iter [504/3125], train_loss:0.091344
Epoch [1/2], Iter [505/3125], train_loss:0.085541
Epoch [1/2], Iter [506/3125], train_loss:0.089418
Epoch [1/2], Iter [507/3125], train_loss:0.066250
Epoch [1/2], Iter [508/3125], train_loss:0.112804
Epoch [1/2], Iter [509/3125], train_loss:0.084062
Epoch [1/2], Iter [510/3125], train_loss:0.087550
Epoch [1/2], Iter [511/3125], train_loss:0.073422
Epoch [1/2], Iter [512/3125], train_loss:0.089989
Epoch [1/2], Iter [513/3125], train_loss:0.056597
Epoch [1/2], Iter [514/3125], train_loss:0.084649
Epoch [1/2], Iter [515/3125], train_loss:0.095353
Epoch [1/2], Iter [516/3125], train_loss:0.057524
Epoch [1/2], Iter [517/3125], train_loss:0.086105
Epoch [1/2], Iter [518/3125], train_loss:0.100302
Epoch [1/2], Iter [519/3125], train_loss:0.085303
Epoch [1/2], Iter [520/3125], train_loss:0.097001
Epoch [1/2], Iter [521/3125], train_loss:0.078477
Epoch [1/2], Iter [522/3125], train_loss:0.118421
Epoch [1/2], Iter [523/3125], train_loss:0.094699
Epoch [1/2], Iter [524/3125], train_loss:0.081237
Epoch [1/2], Iter [525/3125], train_loss:0.082480
Epoch [1/2], Iter [526/3125], train_loss:0.082260
Epoch [1/2], Iter [527/3125], train_loss:0.088543
Epoch [1/2], Iter [528/3125], train_loss:0.072576
Epoch [1/2], Iter [529/3125], train_loss:0.095206
Epoch [1/2], Iter [530/3125], train_loss:0.076497
Epoch [1/2], Iter [531/3125], train_loss:0.051827
Epoch [1/2], Iter [532/3125], train_loss:0.051135
Epoch [1/2], Iter [533/3125], train_loss:0.088031
Epoch [1/2], Iter [534/3125], train_loss:0.111677
Epoch [1/2], Iter [535/3125], train_loss:0.070332
Epoch [1/2], Iter [536/3125], train_loss:0.084658
Epoch [1/2], Iter [537/3125], train_loss:0.099877
Epoch [1/2], Iter [538/3125], train_loss:0.083049
Epoch [1/2], Iter [539/3125], train_loss:0.080456
Epoch [1/2], Iter [540/3125], train_loss:0.060653
Epoch [1/2], Iter [541/3125], train_loss:0.126004
Epoch [1/2], Iter [542/3125], train_loss:0.089957
Epoch [1/2], Iter [543/3125], train_loss:0.097005
Epoch [1/2], Iter [544/3125], train_loss:0.098928
Epoch [1/2], Iter [545/3125], train_loss:0.050157
Epoch [1/2], Iter [546/3125], train_loss:0.068912
Epoch [1/2], Iter [547/3125], train_loss:0.105661
Epoch [1/2], Iter [548/3125], train_loss:0.063028
Epoch [1/2], Iter [549/3125], train_loss:0.101849
Epoch [1/2], Iter [550/3125], train_loss:0.087718
Epoch [1/2], Iter [551/3125], train_loss:0.085455
Epoch [1/2], Iter [552/3125], train_loss:0.101876
Epoch [1/2], Iter [553/3125], train_loss:0.069947
Epoch [1/2], Iter [554/3125], train_loss:0.082198
Epoch [1/2], Iter [555/3125], train_loss:0.078910
Epoch [1/2], Iter [556/3125], train_loss:0.071619
Epoch [1/2], Iter [557/3125], train_loss:0.091170
Epoch [1/2], Iter [558/3125], train_loss:0.073899
Epoch [1/2], Iter [559/3125], train_loss:0.097393
Epoch [1/2], Iter [560/3125], train_loss:0.059482
Epoch [1/2], Iter [561/3125], train_loss:0.086727
Epoch [1/2], Iter [562/3125], train_loss:0.067922
Epoch [1/2], Iter [563/3125], train_loss:0.049343
Epoch [1/2], Iter [564/3125], train_loss:0.079434
Epoch [1/2], Iter [565/3125], train_loss:0.082183
Epoch [1/2], Iter [566/3125], train_loss:0.093476
Epoch [1/2], Iter [567/3125], train_loss:0.078752
Epoch [1/2], Iter [568/3125], train_loss:0.091465
Epoch [1/2], Iter [569/3125], train_loss:0.089662
Epoch [1/2], Iter [570/3125], train_loss:0.080252
Epoch [1/2], Iter [571/3125], train_loss:0.068077
Epoch [1/2], Iter [572/3125], train_loss:0.061509
Epoch [1/2], Iter [573/3125], train_loss:0.085185
Epoch [1/2], Iter [574/3125], train_loss:0.079471
Epoch [1/2], Iter [575/3125], train_loss:0.053422
Epoch [1/2], Iter [576/3125], train_loss:0.077580
Epoch [1/2], Iter [577/3125], train_loss:0.097711
Epoch [1/2], Iter [578/3125], train_loss:0.088529
Epoch [1/2], Iter [579/3125], train_loss:0.078072
Epoch [1/2], Iter [580/3125], train_loss:0.066475
Epoch [1/2], Iter [581/3125], train_loss:0.100759
Epoch [1/2], Iter [582/3125], train_loss:0.059701
Epoch [1/2], Iter [583/3125], train_loss:0.109780
Epoch [1/2], Iter [584/3125], train_loss:0.091762
Epoch [1/2], Iter [585/3125], train_loss:0.092769
Epoch [1/2], Iter [586/3125], train_loss:0.087646
Epoch [1/2], Iter [587/3125], train_loss:0.077475
Epoch [1/2], Iter [588/3125], train_loss:0.082140
Epoch [1/2], Iter [589/3125], train_loss:0.064143
Epoch [1/2], Iter [590/3125], train_loss:0.118475
Epoch [1/2], Iter [591/3125], train_loss:0.061369
Epoch [1/2], Iter [592/3125], train_loss:0.103518
Epoch [1/2], Iter [593/3125], train_loss:0.109588
Epoch [1/2], Iter [594/3125], train_loss:0.075540
Epoch [1/2], Iter [595/3125], train_loss:0.066279
Epoch [1/2], Iter [596/3125], train_loss:0.084220
Epoch [1/2], Iter [597/3125], train_loss:0.093858
Epoch [1/2], Iter [598/3125], train_loss:0.064187
Epoch [1/2], Iter [599/3125], train_loss:0.066326
Epoch [1/2], Iter [600/3125], train_loss:0.081327
Epoch [1/2], Iter [601/3125], train_loss:0.083892
Epoch [1/2], Iter [602/3125], train_loss:0.072193
Epoch [1/2], Iter [603/3125], train_loss:0.070572
Epoch [1/2], Iter [604/3125], train_loss:0.099174
Epoch [1/2], Iter [605/3125], train_loss:0.073340
Epoch [1/2], Iter [606/3125], train_loss:0.075066
Epoch [1/2], Iter [607/3125], train_loss:0.089540
Epoch [1/2], Iter [608/3125], train_loss:0.087063
Epoch [1/2], Iter [609/3125], train_loss:0.067917
Epoch [1/2], Iter [610/3125], train_loss:0.078777
Epoch [1/2], Iter [611/3125], train_loss:0.073020
Epoch [1/2], Iter [612/3125], train_loss:0.053916
Epoch [1/2], Iter [613/3125], train_loss:0.099749
Epoch [1/2], Iter [614/3125], train_loss:0.076472
Epoch [1/2], Iter [615/3125], train_loss:0.092774
Epoch [1/2], Iter [616/3125], train_loss:0.072519
Epoch [1/2], Iter [617/3125], train_loss:0.115796
Epoch [1/2], Iter [618/3125], train_loss:0.111423
Epoch [1/2], Iter [619/3125], train_loss:0.035930
Epoch [1/2], Iter [620/3125], train_loss:0.053881
Epoch [1/2], Iter [621/3125], train_loss:0.121114
Epoch [1/2], Iter [622/3125], train_loss:0.121951
Epoch [1/2], Iter [623/3125], train_loss:0.073308
Epoch [1/2], Iter [624/3125], train_loss:0.048398
Epoch [1/2], Iter [625/3125], train_loss:0.107412
Epoch [1/2], Iter [626/3125], train_loss:0.068145
Epoch [1/2], Iter [627/3125], train_loss:0.077340
Epoch [1/2], Iter [628/3125], train_loss:0.085913
Epoch [1/2], Iter [629/3125], train_loss:0.085568
Epoch [1/2], Iter [630/3125], train_loss:0.075331
Epoch [1/2], Iter [631/3125], train_loss:0.063729
Epoch [1/2], Iter [632/3125], train_loss:0.096395
Epoch [1/2], Iter [633/3125], train_loss:0.091692
Epoch [1/2], Iter [634/3125], train_loss:0.087556
Epoch [1/2], Iter [635/3125], train_loss:0.128987
Epoch [1/2], Iter [636/3125], train_loss:0.078282
Epoch [1/2], Iter [637/3125], train_loss:0.072686
Epoch [1/2], Iter [638/3125], train_loss:0.101055
Epoch [1/2], Iter [639/3125], train_loss:0.088135
Epoch [1/2], Iter [640/3125], train_loss:0.076548
Epoch [1/2], Iter [641/3125], train_loss:0.074535
Epoch [1/2], Iter [642/3125], train_loss:0.133764
Epoch [1/2], Iter [643/3125], train_loss:0.081785
Epoch [1/2], Iter [644/3125], train_loss:0.081873
Epoch [1/2], Iter [645/3125], train_loss:0.052027
Epoch [1/2], Iter [646/3125], train_loss:0.065710
Epoch [1/2], Iter [647/3125], train_loss:0.066639
Epoch [1/2], Iter [648/3125], train_loss:0.077497
Epoch [1/2], Iter [649/3125], train_loss:0.071994
Epoch [1/2], Iter [650/3125], train_loss:0.077160
Epoch [1/2], Iter [651/3125], train_loss:0.088668
Epoch [1/2], Iter [652/3125], train_loss:0.091575
Epoch [1/2], Iter [653/3125], train_loss:0.063036
Epoch [1/2], Iter [654/3125], train_loss:0.077080
Epoch [1/2], Iter [655/3125], train_loss:0.120097
Epoch [1/2], Iter [656/3125], train_loss:0.057079
Epoch [1/2], Iter [657/3125], train_loss:0.078749
Epoch [1/2], Iter [658/3125], train_loss:0.080975
Epoch [1/2], Iter [659/3125], train_loss:0.084412
Epoch [1/2], Iter [660/3125], train_loss:0.081507
Epoch [1/2], Iter [661/3125], train_loss:0.106032
Epoch [1/2], Iter [662/3125], train_loss:0.044990
Epoch [1/2], Iter [663/3125], train_loss:0.071733
Epoch [1/2], Iter [664/3125], train_loss:0.068678
Epoch [1/2], Iter [665/3125], train_loss:0.060852
Epoch [1/2], Iter [666/3125], train_loss:0.061496
Epoch [1/2], Iter [667/3125], train_loss:0.099616
Epoch [1/2], Iter [668/3125], train_loss:0.043187
Epoch [1/2], Iter [669/3125], train_loss:0.042735
Epoch [1/2], Iter [670/3125], train_loss:0.063698
Epoch [1/2], Iter [671/3125], train_loss:0.054137
Epoch [1/2], Iter [672/3125], train_loss:0.122349
Epoch [1/2], Iter [673/3125], train_loss:0.045259
Epoch [1/2], Iter [674/3125], train_loss:0.096469
Epoch [1/2], Iter [675/3125], train_loss:0.058725
Epoch [1/2], Iter [676/3125], train_loss:0.092602
Epoch [1/2], Iter [677/3125], train_loss:0.066935
Epoch [1/2], Iter [678/3125], train_loss:0.077298
Epoch [1/2], Iter [679/3125], train_loss:0.110552
Epoch [1/2], Iter [680/3125], train_loss:0.048738
Epoch [1/2], Iter [681/3125], train_loss:0.096448
Epoch [1/2], Iter [682/3125], train_loss:0.110349
Epoch [1/2], Iter [683/3125], train_loss:0.119194
Epoch [1/2], Iter [684/3125], train_loss:0.078200
Epoch [1/2], Iter [685/3125], train_loss:0.090346
Epoch [1/2], Iter [686/3125], train_loss:0.067279
Epoch [1/2], Iter [687/3125], train_loss:0.056750
Epoch [1/2], Iter [688/3125], train_loss:0.103682
Epoch [1/2], Iter [689/3125], train_loss:0.070194
Epoch [1/2], Iter [690/3125], train_loss:0.077888
Epoch [1/2], Iter [691/3125], train_loss:0.089339
Epoch [1/2], Iter [692/3125], train_loss:0.069433
Epoch [1/2], Iter [693/3125], train_loss:0.062627
Epoch [1/2], Iter [694/3125], train_loss:0.088834
Epoch [1/2], Iter [695/3125], train_loss:0.057176
Epoch [1/2], Iter [696/3125], train_loss:0.062857
Epoch [1/2], Iter [697/3125], train_loss:0.107247
Epoch [1/2], Iter [698/3125], train_loss:0.075563
Epoch [1/2], Iter [699/3125], train_loss:0.075217
Epoch [1/2], Iter [700/3125], train_loss:0.073498
Epoch [1/2], Iter [701/3125], train_loss:0.084294
Epoch [1/2], Iter [702/3125], train_loss:0.055456
Epoch [1/2], Iter [703/3125], train_loss:0.101781
Epoch [1/2], Iter [704/3125], train_loss:0.102988
Epoch [1/2], Iter [705/3125], train_loss:0.090018
Epoch [1/2], Iter [706/3125], train_loss:0.071555
Epoch [1/2], Iter [707/3125], train_loss:0.066634
Epoch [1/2], Iter [708/3125], train_loss:0.075814
Epoch [1/2], Iter [709/3125], train_loss:0.077288
Epoch [1/2], Iter [710/3125], train_loss:0.104503
Epoch [1/2], Iter [711/3125], train_loss:0.067886
Epoch [1/2], Iter [712/3125], train_loss:0.079606
Epoch [1/2], Iter [713/3125], train_loss:0.071527
Epoch [1/2], Iter [714/3125], train_loss:0.085514
Epoch [1/2], Iter [715/3125], train_loss:0.057681
Epoch [1/2], Iter [716/3125], train_loss:0.078999
Epoch [1/2], Iter [717/3125], train_loss:0.071168
Epoch [1/2], Iter [718/3125], train_loss:0.089825
Epoch [1/2], Iter [719/3125], train_loss:0.045149
Epoch [1/2], Iter [720/3125], train_loss:0.084063
Epoch [1/2], Iter [721/3125], train_loss:0.066844
Epoch [1/2], Iter [722/3125], train_loss:0.111551
Epoch [1/2], Iter [723/3125], train_loss:0.090148
Epoch [1/2], Iter [724/3125], train_loss:0.088762
Epoch [1/2], Iter [725/3125], train_loss:0.053935
Epoch [1/2], Iter [726/3125], train_loss:0.097556
Epoch [1/2], Iter [727/3125], train_loss:0.057640
Epoch [1/2], Iter [728/3125], train_loss:0.099852
Epoch [1/2], Iter [729/3125], train_loss:0.072951
Epoch [1/2], Iter [730/3125], train_loss:0.086131
Epoch [1/2], Iter [731/3125], train_loss:0.076418
Epoch [1/2], Iter [732/3125], train_loss:0.093934
Epoch [1/2], Iter [733/3125], train_loss:0.086792
Epoch [1/2], Iter [734/3125], train_loss:0.076435
Epoch [1/2], Iter [735/3125], train_loss:0.098343
Epoch [1/2], Iter [736/3125], train_loss:0.064591
Epoch [1/2], Iter [737/3125], train_loss:0.136798
Epoch [1/2], Iter [738/3125], train_loss:0.086149
Epoch [1/2], Iter [739/3125], train_loss:0.071737
Epoch [1/2], Iter [740/3125], train_loss:0.064806
Epoch [1/2], Iter [741/3125], train_loss:0.080049
Epoch [1/2], Iter [742/3125], train_loss:0.096013
Epoch [1/2], Iter [743/3125], train_loss:0.060116
Epoch [1/2], Iter [744/3125], train_loss:0.067535
Epoch [1/2], Iter [745/3125], train_loss:0.093100
Epoch [1/2], Iter [746/3125], train_loss:0.072566
Epoch [1/2], Iter [747/3125], train_loss:0.103533
Epoch [1/2], Iter [748/3125], train_loss:0.083829
Epoch [1/2], Iter [749/3125], train_loss:0.058632
Epoch [1/2], Iter [750/3125], train_loss:0.063049
Epoch [1/2], Iter [751/3125], train_loss:0.072190
Epoch [1/2], Iter [752/3125], train_loss:0.081107
Epoch [1/2], Iter [753/3125], train_loss:0.073657
Epoch [1/2], Iter [754/3125], train_loss:0.063324
Epoch [1/2], Iter [755/3125], train_loss:0.061974
Epoch [1/2], Iter [756/3125], train_loss:0.064494
Epoch [1/2], Iter [757/3125], train_loss:0.077813
Epoch [1/2], Iter [758/3125], train_loss:0.070678
Epoch [1/2], Iter [759/3125], train_loss:0.062416
Epoch [1/2], Iter [760/3125], train_loss:0.062071
Epoch [1/2], Iter [761/3125], train_loss:0.030896
Epoch [1/2], Iter [762/3125], train_loss:0.054023
Epoch [1/2], Iter [763/3125], train_loss:0.123419
Epoch [1/2], Iter [764/3125], train_loss:0.080511
Epoch [1/2], Iter [765/3125], train_loss:0.088166
Epoch [1/2], Iter [766/3125], train_loss:0.044754
Epoch [1/2], Iter [767/3125], train_loss:0.065380
Epoch [1/2], Iter [768/3125], train_loss:0.062831
Epoch [1/2], Iter [769/3125], train_loss:0.082807
Epoch [1/2], Iter [770/3125], train_loss:0.106045
Epoch [1/2], Iter [771/3125], train_loss:0.039265
Epoch [1/2], Iter [772/3125], train_loss:0.040538
Epoch [1/2], Iter [773/3125], train_loss:0.064032
Epoch [1/2], Iter [774/3125], train_loss:0.098438
Epoch [1/2], Iter [775/3125], train_loss:0.044762
Epoch [1/2], Iter [776/3125], train_loss:0.059482
Epoch [1/2], Iter [777/3125], train_loss:0.071769
Epoch [1/2], Iter [778/3125], train_loss:0.081381
Epoch [1/2], Iter [779/3125], train_loss:0.077327
Epoch [1/2], Iter [780/3125], train_loss:0.062736
Epoch [1/2], Iter [781/3125], train_loss:0.093462
Epoch [1/2], Iter [782/3125], train_loss:0.072988
Epoch [1/2], Iter [783/3125], train_loss:0.060638
Epoch [1/2], Iter [784/3125], train_loss:0.093783
Epoch [1/2], Iter [785/3125], train_loss:0.071993
Epoch [1/2], Iter [786/3125], train_loss:0.100763
Epoch [1/2], Iter [787/3125], train_loss:0.072992
Epoch [1/2], Iter [788/3125], train_loss:0.092503
Epoch [1/2], Iter [789/3125], train_loss:0.087834
Epoch [1/2], Iter [790/3125], train_loss:0.112599
Epoch [1/2], Iter [791/3125], train_loss:0.078161
Epoch [1/2], Iter [792/3125], train_loss:0.080000
Epoch [1/2], Iter [793/3125], train_loss:0.043560
Epoch [1/2], Iter [794/3125], train_loss:0.080028
Epoch [1/2], Iter [795/3125], train_loss:0.104163
Epoch [1/2], Iter [796/3125], train_loss:0.064733
Epoch [1/2], Iter [797/3125], train_loss:0.051298
Epoch [1/2], Iter [798/3125], train_loss:0.069372
Epoch [1/2], Iter [799/3125], train_loss:0.044411
Epoch [1/2], Iter [800/3125], train_loss:0.071995
Epoch [1/2], Iter [801/3125], train_loss:0.058943
Epoch [1/2], Iter [802/3125], train_loss:0.075079
Epoch [1/2], Iter [803/3125], train_loss:0.065944
Epoch [1/2], Iter [804/3125], train_loss:0.054138
Epoch [1/2], Iter [805/3125], train_loss:0.061844
Epoch [1/2], Iter [806/3125], train_loss:0.075249
Epoch [1/2], Iter [807/3125], train_loss:0.090213
Epoch [1/2], Iter [808/3125], train_loss:0.106900
Epoch [1/2], Iter [809/3125], train_loss:0.087969
Epoch [1/2], Iter [810/3125], train_loss:0.082871
Epoch [1/2], Iter [811/3125], train_loss:0.083834
Epoch [1/2], Iter [812/3125], train_loss:0.067130
Epoch [1/2], Iter [813/3125], train_loss:0.081398
Epoch [1/2], Iter [814/3125], train_loss:0.075722
Epoch [1/2], Iter [815/3125], train_loss:0.102066
Epoch [1/2], Iter [816/3125], train_loss:0.095934
Epoch [1/2], Iter [817/3125], train_loss:0.073375
Epoch [1/2], Iter [818/3125], train_loss:0.114593
Epoch [1/2], Iter [819/3125], train_loss:0.080349
Epoch [1/2], Iter [820/3125], train_loss:0.093809
Epoch [1/2], Iter [821/3125], train_loss:0.057519
Epoch [1/2], Iter [822/3125], train_loss:0.060332
Epoch [1/2], Iter [823/3125], train_loss:0.069837
Epoch [1/2], Iter [824/3125], train_loss:0.081108
Epoch [1/2], Iter [825/3125], train_loss:0.064217
Epoch [1/2], Iter [826/3125], train_loss:0.077845
Epoch [1/2], Iter [827/3125], train_loss:0.062394
Epoch [1/2], Iter [828/3125], train_loss:0.078574
Epoch [1/2], Iter [829/3125], train_loss:0.077207
Epoch [1/2], Iter [830/3125], train_loss:0.052881
Epoch [1/2], Iter [831/3125], train_loss:0.105506
Epoch [1/2], Iter [832/3125], train_loss:0.085921
Epoch [1/2], Iter [833/3125], train_loss:0.062045
Epoch [1/2], Iter [834/3125], train_loss:0.078639
Epoch [1/2], Iter [835/3125], train_loss:0.091643
Epoch [1/2], Iter [836/3125], train_loss:0.070230
Epoch [1/2], Iter [837/3125], train_loss:0.061350
Epoch [1/2], Iter [838/3125], train_loss:0.100740
Epoch [1/2], Iter [839/3125], train_loss:0.085829
Epoch [1/2], Iter [840/3125], train_loss:0.060633
Epoch [1/2], Iter [841/3125], train_loss:0.071548
Epoch [1/2], Iter [842/3125], train_loss:0.083561
Epoch [1/2], Iter [843/3125], train_loss:0.066375
Epoch [1/2], Iter [844/3125], train_loss:0.100119
Epoch [1/2], Iter [845/3125], train_loss:0.088684
Epoch [1/2], Iter [846/3125], train_loss:0.055062
Epoch [1/2], Iter [847/3125], train_loss:0.074315
Epoch [1/2], Iter [848/3125], train_loss:0.069999
Epoch [1/2], Iter [849/3125], train_loss:0.035895
Epoch [1/2], Iter [850/3125], train_loss:0.037956
Epoch [1/2], Iter [851/3125], train_loss:0.100308
Epoch [1/2], Iter [852/3125], train_loss:0.067342
Epoch [1/2], Iter [853/3125], train_loss:0.100173
Epoch [1/2], Iter [854/3125], train_loss:0.095898
Epoch [1/2], Iter [855/3125], train_loss:0.037566
Epoch [1/2], Iter [856/3125], train_loss:0.109127
Epoch [1/2], Iter [857/3125], train_loss:0.086012
Epoch [1/2], Iter [858/3125], train_loss:0.042612
Epoch [1/2], Iter [859/3125], train_loss:0.095185
Epoch [1/2], Iter [860/3125], train_loss:0.041484
Epoch [1/2], Iter [861/3125], train_loss:0.077971
Epoch [1/2], Iter [862/3125], train_loss:0.077879
Epoch [1/2], Iter [863/3125], train_loss:0.074702
Epoch [1/2], Iter [864/3125], train_loss:0.065591
Epoch [1/2], Iter [865/3125], train_loss:0.044043
Epoch [1/2], Iter [866/3125], train_loss:0.086357
Epoch [1/2], Iter [867/3125], train_loss:0.076382
Epoch [1/2], Iter [868/3125], train_loss:0.126473
Epoch [1/2], Iter [869/3125], train_loss:0.111014
Epoch [1/2], Iter [870/3125], train_loss:0.053985
Epoch [1/2], Iter [871/3125], train_loss:0.066713
Epoch [1/2], Iter [872/3125], train_loss:0.092710
Epoch [1/2], Iter [873/3125], train_loss:0.072230
Epoch [1/2], Iter [874/3125], train_loss:0.072040
Epoch [1/2], Iter [875/3125], train_loss:0.128901
Epoch [1/2], Iter [876/3125], train_loss:0.094567
Epoch [1/2], Iter [877/3125], train_loss:0.068851
Epoch [1/2], Iter [878/3125], train_loss:0.124406
Epoch [1/2], Iter [879/3125], train_loss:0.060597
Epoch [1/2], Iter [880/3125], train_loss:0.053799
Epoch [1/2], Iter [881/3125], train_loss:0.089491
Epoch [1/2], Iter [882/3125], train_loss:0.056719
Epoch [1/2], Iter [883/3125], train_loss:0.076862
Epoch [1/2], Iter [884/3125], train_loss:0.068522
Epoch [1/2], Iter [885/3125], train_loss:0.104225
Epoch [1/2], Iter [886/3125], train_loss:0.082506
Epoch [1/2], Iter [887/3125], train_loss:0.052971
Epoch [1/2], Iter [888/3125], train_loss:0.059774
Epoch [1/2], Iter [889/3125], train_loss:0.086975
Epoch [1/2], Iter [890/3125], train_loss:0.056777
Epoch [1/2], Iter [891/3125], train_loss:0.087735
Epoch [1/2], Iter [892/3125], train_loss:0.070902
Epoch [1/2], Iter [893/3125], train_loss:0.111826
Epoch [1/2], Iter [894/3125], train_loss:0.059331
Epoch [1/2], Iter [895/3125], train_loss:0.094341
Epoch [1/2], Iter [896/3125], train_loss:0.051812
Epoch [1/2], Iter [897/3125], train_loss:0.112401
Epoch [1/2], Iter [898/3125], train_loss:0.061509
Epoch [1/2], Iter [899/3125], train_loss:0.064180
Epoch [1/2], Iter [900/3125], train_loss:0.038741
Epoch [1/2], Iter [901/3125], train_loss:0.053055
Epoch [1/2], Iter [902/3125], train_loss:0.054728
Epoch [1/2], Iter [903/3125], train_loss:0.078024
Epoch [1/2], Iter [904/3125], train_loss:0.044780
Epoch [1/2], Iter [905/3125], train_loss:0.089853
Epoch [1/2], Iter [906/3125], train_loss:0.101245
Epoch [1/2], Iter [907/3125], train_loss:0.052246
Epoch [1/2], Iter [908/3125], train_loss:0.071536
Epoch [1/2], Iter [909/3125], train_loss:0.075075
Epoch [1/2], Iter [910/3125], train_loss:0.074174
Epoch [1/2], Iter [911/3125], train_loss:0.072227
Epoch [1/2], Iter [912/3125], train_loss:0.101729
Epoch [1/2], Iter [913/3125], train_loss:0.071239
Epoch [1/2], Iter [914/3125], train_loss:0.101731
Epoch [1/2], Iter [915/3125], train_loss:0.066899
Epoch [1/2], Iter [916/3125], train_loss:0.042201
Epoch [1/2], Iter [917/3125], train_loss:0.057565
Epoch [1/2], Iter [918/3125], train_loss:0.043300
Epoch [1/2], Iter [919/3125], train_loss:0.101549
Epoch [1/2], Iter [920/3125], train_loss:0.080133
Epoch [1/2], Iter [921/3125], train_loss:0.088354
Epoch [1/2], Iter [922/3125], train_loss:0.079794
Epoch [1/2], Iter [923/3125], train_loss:0.082035
Epoch [1/2], Iter [924/3125], train_loss:0.043397
Epoch [1/2], Iter [925/3125], train_loss:0.101342
Epoch [1/2], Iter [926/3125], train_loss:0.070656
Epoch [1/2], Iter [927/3125], train_loss:0.068928
Epoch [1/2], Iter [928/3125], train_loss:0.086801
Epoch [1/2], Iter [929/3125], train_loss:0.059911
Epoch [1/2], Iter [930/3125], train_loss:0.079392
Epoch [1/2], Iter [931/3125], train_loss:0.083579
Epoch [1/2], Iter [932/3125], train_loss:0.051975
Epoch [1/2], Iter [933/3125], train_loss:0.083430
Epoch [1/2], Iter [934/3125], train_loss:0.066587
Epoch [1/2], Iter [935/3125], train_loss:0.087434
Epoch [1/2], Iter [936/3125], train_loss:0.087518
Epoch [1/2], Iter [937/3125], train_loss:0.075971
Epoch [1/2], Iter [938/3125], train_loss:0.060921
Epoch [1/2], Iter [939/3125], train_loss:0.059609
Epoch [1/2], Iter [940/3125], train_loss:0.053374
Epoch [1/2], Iter [941/3125], train_loss:0.059154
Epoch [1/2], Iter [942/3125], train_loss:0.037160
Epoch [1/2], Iter [943/3125], train_loss:0.094307
Epoch [1/2], Iter [944/3125], train_loss:0.069412
Epoch [1/2], Iter [945/3125], train_loss:0.093543
Epoch [1/2], Iter [946/3125], train_loss:0.057713
Epoch [1/2], Iter [947/3125], train_loss:0.050613
Epoch [1/2], Iter [948/3125], train_loss:0.101521
Epoch [1/2], Iter [949/3125], train_loss:0.099398
Epoch [1/2], Iter [950/3125], train_loss:0.098440
Epoch [1/2], Iter [951/3125], train_loss:0.036929
Epoch [1/2], Iter [952/3125], train_loss:0.062752
Epoch [1/2], Iter [953/3125], train_loss:0.048165
Epoch [1/2], Iter [954/3125], train_loss:0.075584
Epoch [1/2], Iter [955/3125], train_loss:0.080492
Epoch [1/2], Iter [956/3125], train_loss:0.087700
Epoch [1/2], Iter [957/3125], train_loss:0.043403
Epoch [1/2], Iter [958/3125], train_loss:0.069215
Epoch [1/2], Iter [959/3125], train_loss:0.044430
Epoch [1/2], Iter [960/3125], train_loss:0.066561
Epoch [1/2], Iter [961/3125], train_loss:0.106058
Epoch [1/2], Iter [962/3125], train_loss:0.066117
Epoch [1/2], Iter [963/3125], train_loss:0.075821
Epoch [1/2], Iter [964/3125], train_loss:0.076452
Epoch [1/2], Iter [965/3125], train_loss:0.068917
Epoch [1/2], Iter [966/3125], train_loss:0.073009
Epoch [1/2], Iter [967/3125], train_loss:0.066570
Epoch [1/2], Iter [968/3125], train_loss:0.078626
Epoch [1/2], Iter [969/3125], train_loss:0.071714
Epoch [1/2], Iter [970/3125], train_loss:0.073739
Epoch [1/2], Iter [971/3125], train_loss:0.036135
Epoch [1/2], Iter [972/3125], train_loss:0.077290
Epoch [1/2], Iter [973/3125], train_loss:0.108345
Epoch [1/2], Iter [974/3125], train_loss:0.085700
Epoch [1/2], Iter [975/3125], train_loss:0.081209
Epoch [1/2], Iter [976/3125], train_loss:0.034647
Epoch [1/2], Iter [977/3125], train_loss:0.056354
Epoch [1/2], Ite
