import time

import mxnet.ndarray as nd
import numpy as np

print("===============start using mxnet ndarray============")
start = time.time()
x = nd.random.uniform(shape=(2000, 2000))
y = nd.dot(x, x)
print('workloads are queued:\t%.4f sec' % (time.time() - start))
print(y)
print('workloads are finished:\t%.4f sec' % (time.time() - start))

# Compare against NumPy below
print("===============start using numpy============")
start_np = time.time()
x = np.random.uniform(size=(2000, 2000))
y = np.dot(x, x)
print('workloads are queued:\t%.4f sec' % (time.time() - start_np))
print(y)
print('workloads are finished:\t%.4f sec' % (time.time() - start_np))
Output:
===============start using mxnet ndarray============
workloads are queued: 0.0009 sec
[[501.1584 508.2972 495.65228 ... 492.84705 492.69086 490.04797]
[508.81046 507.18216 495.17426 ... 503.1053 497.2932 493.67917]
[489.566 499.47006 490.17715 ... 490.9994 488.05002 483.2883 ]
...
[484.00186 495.7179 479.92142 ... 493.6995 478.89175 487.20737]
[499.64926 507.651 497.5939 ... 493.04742 500.74512 495.82703]
[516.0143 519.17145 506.35403 ... 510.08868 496.3561 495.42523]]
<NDArray 2000x2000 @cpu(0)>
workloads are finished: 0.0855 sec
===============start using numpy============
workloads are queued: 0.2381 sec
[[499.78141775 500.20765179 482.10959684 ... 495.70733609 494.62182745
496.71387707]
[499.84924786 498.85308693 489.2123237 ... 502.57251308 502.91773143
501.31651937]
[476.24190127 482.92743953 470.31281454 ... 487.23095596 483.80486825
477.8116057 ]
...
[497.14541701 499.29581685 487.46588491 ... 497.46870583 502.95987923
504.09474747]
[507.14380432 517.89740016 507.72814195 ... 517.49853163 516.01229087
517.79480039]
[505.88017037 506.28039536 498.74898774 ... 504.91917801 511.03318716
508.53549885]]
workloads are finished: 0.2390 sec
As you can see, MXNet's NDArray only completes the computation at the print(y) step: the workload is queued asynchronously and the front end returns almost immediately, whereas NumPy executes each statement imperatively, in order.
In MXNet you can also make the front-end thread block until results are ready before continuing:
call NDArray.wait_to_read() to wait for one specific result,
or nd.waitall() to wait for all pending results.
For example:
import time
import mxnet.ndarray as nd

# Wait for one specific result
start = time.time()
x = nd.random.uniform(shape=(2000, 2000))
y = nd.dot(x, x)
y.wait_to_read()
print('%.4f' % (time.time() - start))

# Wait for all pending results
start = time.time()
x = nd.random.uniform(shape=(2000, 2000))
y = nd.dot(x, x)
z = nd.dot(x, x)
nd.waitall()
print('%.4f' % (time.time() - start))
Output:
0.0836
0.0964
Any operation that moves data out of an NDArray into a data structure that does not support deferred execution also triggers a wait,
for example asnumpy() and asscalar().
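As a minimal sketch, reusing the same 2000x2000 dot product from above, timing an asnumpy() call makes this implicit synchronization visible:

import time
import mxnet.ndarray as nd

start = time.time()
x = nd.random.uniform(shape=(2000, 2000))
y = nd.dot(x, x)
print('workloads are queued:\t%.4f sec' % (time.time() - start))
# asnumpy() copies into a NumPy array, which cannot be deferred,
# so it blocks until the dot product has actually finished
y_np = y.asnumpy()
print('workloads are finished:\t%.4f sec' % (time.time() - start))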
import os
import subprocess
from time import time

from mxnet import gluon
from mxnet import ndarray as nd
from mxnet.gluon import nn

def get_data():
    start = time()
    batch_size = 1024
    for i in range(60):
        if i % 10 == 0 and i != 0:
            print('batch %d, time %f sec' % (i, time() - start))
        x = nd.ones((batch_size, 1024))
        y = nd.ones((batch_size,))
        yield x, y

net = nn.Sequential()
with net.name_scope():
    net.add(
        nn.Dense(1024, activation='relu'),
        nn.Dense(1024, activation='relu'),
        nn.Dense(1),
    )
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {})
loss = gluon.loss.L2Loss()

def get_mem():
    """Resident memory of this process in MB, parsed from ps."""
    res = subprocess.check_output(['ps', 'u', '-p', str(os.getpid())])
    return int(str(res).split()[15]) / 1e3

# Warm up with one batch so lazy initialization does not skew the measurement
for x, y in get_data():
    break
loss(y, net(x)).wait_to_read()

mem = get_mem()
for x, y in get_data():
    loss(y, net(x)).wait_to_read()
nd.waitall()
print('Increased memory %f MB' % (get_mem() - mem))
Note that the final

for x, y in get_data():
    loss(y, net(x)).wait_to_read()
nd.waitall()

triggers a wait at the end of every iteration, so the loop never pushes a large backlog of batches into memory at once.
If you wrote only loss(y, net(x)) with no synchronization, all of the batches would be handed to the backend in a very short time.
Deferred execution gives the system more room, including memory headroom, for performance optimization, but we recommend calling at least one synchronization function per batch, such as evaluating the loss, to avoid dumping too many tasks into the backend at once.
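As a minimal sketch of that recommendation, reusing net, trainer, loss, and get_data() from the example above, calling asscalar() on the batch loss serves as the per-batch synchronization point while also yielding a training metric:

from mxnet import autograd

for x, y in get_data():
    with autograd.record():
        l = loss(y, net(x))
    l.backward()
    trainer.step(x.shape[0])
    # asscalar() returns a Python float, which blocks until this
    # batch's work has finished: one synchronization per batch
    batch_loss = l.mean().asscalar()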