Note: torch.mm only works for 2-D matrix multiplication and is not recommended; use torch.matmul to compute matrix products instead.
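A minimal sketch of the difference (the shapes below are arbitrary examples):
import torch
a2 = torch.rand(3, 4)
b2 = torch.rand(4, 5)
torch.mm(a2, b2).shape      # works: both operands are 2-D -> torch.Size([3, 5])
torch.matmul(a2, b2).shape  # same result for 2-D inputs
# torch.mm(torch.rand(2, 3, 4), torch.rand(2, 4, 5))  # would raise an error: mm expects 2-D tensors
torch.matmul(torch.rand(2, 3, 4), torch.rand(2, 4, 5)).shape  # batched matmul: torch.Size([2, 3, 5])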
a = torch.rand(4,3,28,64)
b = torch.rand(4,3,64,32)
torch.matmul(a,b).shape # only the last two dims are multiplied
:torch.Size([4,3,28,32])
a = torch.rand(4,3,28,64)
b = torch.rand(4,1,64,32)
torch.matmul(a,b).shape # broadcasting is applied to the batch dims
:torch.Size([4,3,28,32])
a = torch.full([8], 1.) # float values, so that norm() works
b = a.view(2,4)
c = a.view(2,2,2)
a.norm(1) # L1 norm of tensor a
: tensor(8.)
b.norm(1)
: tensor(8.)
c.norm(1)
: tensor(8.)
b.norm(2) # L2 norm of tensor b
: tensor(2.8284)
b.norm(1,dim=1)
: tensor([4., 4.])
For argmin / argmax: if no dimension is given, the tensor is first flattened to 1-D, and the index of the minimum / maximum element in the flattened tensor is returned.
a = torch.arange(8).view(2,4).float()
: tensor([[0., 1., 2., 3.],
          [4., 5., 6., 7.]])
a.min(),a.max(),a.mean(),a.prod(),a.sum(),a.argmin(),a.argmax()
:tensor(0.),tensor(7.),tensor(3.5000),tensor(0.),tensor(28.),tensor(0),tensor(7)
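Because argmax() with no dim indexes into the flattened tensor, the flat index can be converted back to 2-D coordinates; a small sketch reusing the same a:
flat_idx = a.argmax()                          # tensor(7): index into the flattened 2x4 tensor
row, col = divmod(flat_idx.item(), a.size(1))  # -> (1, 3)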
a = torch.rand(4,10)
: tensor([[0.4992, 0.4095, 0.5239, 0.8184, 0.3184, 0.6433, 0.2028, 0.1133, 0.6991,0.3260],
[0.1473, 0.2765, 0.1476, 0.2192, 0.8490, 0.7610, 0.0072, 0.6767, 0.1496, 0.2772],
[0.0691, 0.4229, 0.6794, 0.9665, 0.3935, 0.9259, 0.3509, 0.6875, 0.8682, 0.0592],
[0.2496, 0.3506, 0.8447, 0.2141, 0.4849, 0.2772, 0.3786, 0.6603, 0.8913, 0.1118]])
a.max(dim=1)
:(tensor([0.8184, 0.8490, 0.9665, 0.8913]), tensor([3, 4, 3, 8]))
a.argmax(dim=1)
: tensor([3, 4, 3, 8])
a.max(dim=1,keepdim=True) # keep the number of dims of the result equal to that of a
:(tensor([[0.8184],
[0.8490],
[0.9665],
[0.8913]]), tensor([[3],
[4],
[3],
[8]]))
a.argmax(dim=1,keepdim=True)
:tensor([[3],
[4],
[3],
[8]])
kthvalue: the k-th smallest value
a = torch.rand(4,10)
: tensor([[0.4992, 0.4095, 0.5239, 0.8184, 0.3184, 0.6433, 0.2028, 0.1133, 0.6991,0.3260],
[0.1473, 0.2765, 0.1476, 0.2192, 0.8490, 0.7610, 0.0072, 0.6767, 0.1496, 0.2772],
[0.0691, 0.4229, 0.6794, 0.9665, 0.3935, 0.9259, 0.3509, 0.6875, 0.8682, 0.0592],
[0.2496, 0.3506, 0.8447, 0.2141, 0.4849, 0.2772, 0.3786, 0.6603, 0.8913, 0.1118]])
a.topk(3,dim=1)
:(tensor([[0.8184, 0.6991, 0.6433],
[0.8490, 0.7610, 0.6767],
[0.9665, 0.9259, 0.8682],
[0.8913, 0.8447, 0.6603]]), tensor([[3, 8, 5],
[4, 5, 7],
[3, 5, 8],
[8, 2, 7]]))
a.topk(3,dim=1,largest=False) # the 3 smallest values per row
:(tensor([[0.1133, 0.2028, 0.3184],
[0.0072, 0.1473, 0.1476],
[0.0592, 0.0691, 0.3509],
[0.1118, 0.2141, 0.2496]]), tensor([[7, 6, 4],
[6, 0, 2],
[9, 0, 6],
[9, 3, 0]]))
a.kthvalue(8,dim=1) # the 8th smallest value, which out of 10 is the 3rd largest
:(tensor([0.6433, 0.6767, 0.8682, 0.6603]), tensor([5, 7, 8, 7]))
a.kthvalue(8) # dim defaults to the last dim, so the result is the same
:(tensor([0.6433, 0.6767, 0.8682, 0.6603]), tensor([5, 7, 8, 7]))
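As a quick consistency check (a sketch reusing the same a): the 8th smallest value per row equals the smallest of the top-3 values returned by topk.
vals, _ = a.topk(3, dim=1)        # 3 largest per row, sorted in descending order
kth, _ = a.kthvalue(8, dim=1)     # 8th smallest per row = 3rd largest out of 10
torch.allclose(vals[:, -1], kth)  # True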
torch.where(condition,a,b) -> tensor c: each element of c is taken from a or b depending on condition.
cond = torch.rand(2,2)
:tensor([[0.9019, 0.2225],
[0.4002, 0.4745]])
a = torch.zeros(2,2)
:tensor([[0., 0.],
[0., 0.]])
b = torch.ones(2,2)
:tensor([[1., 1.],
[1., 1.]])
torch.where(cond>0.5,a,b)# avoids nested for loops and can run on the GPU
:tensor([[0., 1.],
[1., 1.]])
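For comparison, a sketch of the nested-loop version that torch.where replaces (using the 2x2 tensors cond, a, b from above):
c = torch.empty(2, 2)
for i in range(2):
    for j in range(2):
        c[i, j] = a[i, j] if cond[i, j] > 0.5 else b[i, j]
# c equals torch.where(cond > 0.5, a, b), but this loop runs element by element on the CPU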
torch.gather(input,dim,index,out=None) -> Tensor
input: the table to look values up in
dim: the dimension of input along which to index
index: the indices of the values to gather
prob = torch.randn(4,10)
idx = prob.topk(dim=1,k=3) # returns (values, indices)
:(tensor([[1.3720, 1.0751, 1.0114],
[2.3205, 0.9811, 0.5586],
[1.1462, 0.9951, 0.9102],
[1.9489, 0.9159, 0.7970]]), tensor([[9, 3, 2],
[9, 4, 5],
[0, 2, 9],
[3, 8, 6]]))
label = torch.arange(10)+100
:tensor([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
idx = idx[1] # keep only the indices part of the topk result
:tensor([[9, 3, 2],
[9, 4, 5],
[0, 2, 9],
[3, 8, 6]])
torch.gather(label.expand(4,10),dim=1,index=idx.long())
:tensor([[109, 103, 102],
[109, 104, 105],
[100, 102, 109],
[103, 108, 106]])
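The gather call looks up each top-3 class index in the relabeled table; with a 1-D label tensor, plain advanced indexing gives the same result (a sketch):
label[idx]  # same output: every index i is replaced by label[i], i.e. i + 100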
How to search for a minimum?
Gradient descent update rule: $\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$
function:
$J(\theta_1,\theta_2)=\theta_1^2+\theta_2^2$
objective:
$\min_{\theta_1,\theta_2} J(\theta_1,\theta_2)$
Update rules:
$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1}J(\theta_1,\theta_2)$
$\theta_2 := \theta_2 - \alpha \frac{d}{d\theta_2}J(\theta_1,\theta_2)$
derivatives:
$\frac{d}{d\theta_1}J(\theta_1,\theta_2)=\frac{d}{d\theta_1}\theta_1^2+\frac{d}{d\theta_1}\theta_2^2=2\theta_1$
$\frac{d}{d\theta_2}J(\theta_1,\theta_2)=\frac{d}{d\theta_2}\theta_1^2+\frac{d}{d\theta_2}\theta_2^2=2\theta_2$
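A minimal sketch of this update rule in plain PyTorch (the starting point, learning rate, and step count are arbitrary choices):
theta = torch.tensor([3.0, -2.0])  # [theta_1, theta_2]
alpha = 0.1                        # learning rate
for _ in range(100):
    grad = 2 * theta               # gradient of J = theta_1^2 + theta_2^2
    theta = theta - alpha * grad   # gradient descent step
# theta is now very close to the minimum at (0, 0)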
Saddle points and local minima can trap the search for the minimum.
optimizer
1. MSE: $loss = \sum(y-\hat{y})^2$
2. L2_norm = $\|y-\hat{y}\|_2 = \sqrt{\sum{(y-\hat{y})}^2}$; note the square root.
3. torch.norm((y-pred),2) includes the square root, so it is the L2 norm rather than the MSE; see the sketch below.
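A small sketch of the difference (y and pred are placeholder tensors):
import torch
import torch.nn.functional as F
y = torch.tensor([1.0, 2.0, 3.0])
pred = torch.tensor([1.5, 2.0, 2.0])
torch.norm(y - pred, 2)         # L2 norm: sqrt(sum((y - pred)^2))
(y - pred).pow(2).sum().sqrt()  # same value
F.mse_loss(y, pred)             # mean of the squared errors (reduction='mean' by default)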
x = torch.ones(1)
w = torch.full([1], 2.) # float, so that gradients can be tracked
w.requires_grad_() # mark w as requiring gradients
mse = F.mse_loss(torch.ones(1),x * w)
torch.autograd.grad(mse,[w]) # mse is the loss; [w] lists the parameters to differentiate with respect to
: (tensor([2.]),)
mse.backward() # backward() also computes the gradients: the computation graph remembers the path, and each gradient is stored on the tensor that requires it and read back via tensor.grad (if autograd.grad was already called on this graph, the loss must be recomputed or retain_graph=True passed)
w.grad
:tensor([2.])
Note: softmax makes all of the output probabilities sum to 1.
sigmoid: $S(x) = \frac{1}{1+e^{-x}}$
softmax: $S(y_i) = \frac{e^{y_i}}{\sum_j{e^{y_j}}}$
Logits scores [2.0, 1.0, 0.1] are passed through the softmax function and become (approximately) the probabilities [0.7, 0.2, 0.1].
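A quick check of those numbers (a sketch, with torch.nn.functional imported as F as elsewhere in these notes):
logits = torch.tensor([2.0, 1.0, 0.1])
F.softmax(logits, dim=0)  # tensor([0.6590, 0.2424, 0.0986]), roughly [0.7, 0.2, 0.1]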
Derivative of softmax: $p_i$ is the corresponding output probability and $a_j$ is the input (logit); the derivative is $\frac{\partial p_i}{\partial a_j} = p_i(\delta_{ij} - p_j)$, which is positive when $i = j$ and negative otherwise.
a = torch.rand(3)
a.requires_grad_()
p = F.softmax(a,dim=0)
# p.backward()# if backward()/autograd.grad needs to be called again on the same graph afterwards, retain_graph=True must be set
torch.autograd.grad(p[1],[a],retain_graph=True)
:(tensor([-0.1143, 0.2311, -0.1168]),)
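The signs in that gradient match the formula above (positive at $i = j = 1$, negative elsewhere); a small sketch of the check, reusing the a and p from above:
manual = p[1] * ((torch.arange(3) == 1).float() - p)         # p_1 * (delta_{1j} - p_j)
auto = torch.autograd.grad(p[1], [a], retain_graph=True)[0]
torch.allclose(manual, auto)                                  # True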
1. binary
2. multi-class
3. +softmax
4. leave it to Logistic Regression Part