TVM Monthly - August 2021

As discussed by the TVM PMC, our goal is to provide a monthly summary of the project so users and developers can get a better understanding of the goings on of the TVM community.

Feedback and suggestions are welcomed so that we can further improve these updates.

Community

During August 2021 we welcomed many new contributors to the project. In particular, we welcomed @manupa-arm as a new committer, and @electriclilies, @Mousius, @gromero, @Lunderberg, and @mdw-octoml as new reviewers. Thanks to everyone for the hard work and contributions!

We continue to improve TOPI and frontend support, especially the ONNX importer and new frontends (PaddlePaddle, OneFlow). TensorIR is making steady progress, with several schedule primitives added. We started adding features for Meta Schedule (AutoTIR), the new auto-scheduling system built on top of TensorIR. We improved Relay with better profilers, executors, and mixed-precision support. We landed the Project API, an infrastructure for MicroTVM platforms. The community has also made various improvements to CI and documentation.

This forum received 122k pageviews and 2.8k user visits in the last month.

Pull Requests

Below is a high-level summary of the PRs closed in the last month, grouped by area.

TensorIR

  • Fix a typo in include/tvm/ir/function.h #8617
  • Add from_legacy_te_schedule attr to TE PrimFuncs #8641
  • LowerWarpMemory: remove unneeded shuffle when accessing from the same thread #8681
  • Storage Align #8693
  • Improve the error message in module.cc #8694
  • Parallel, Vectorize, Bind & Unroll #8716
  • Reorder #8767
  • CacheRead/Write #8863
  • enhance tir signed-unsigned cast #8706
  • Change Integer Implicit Conversion Rule to C Standard Way #8733
  • Support fold constants in specialize process #8803
  • Fix buffer scope in structural equal #8768
  • Add LowerTEPass, and convert calls to LowerTE to application of LowerTEPass #8802
  • GetBlockReadWriteRegion #8875
  • Bug fix for a floormod rewrite simplify rule #8852
  • Fix opaque access in buffer locator pass and match_buffer in region detector #8855
  • Fix printing ForNode annotations #8891
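
As background for the floormod simplification fix listed above, here is a minimal sketch of why rewrite rules for floor-based and truncation-based modulo cannot be shared. It is not the rule changed in #8852; the helper names floormod and truncmod are hypothetical and only mirror the usual mathematical definitions.

```python
def floormod(a, b):
    """Remainder of floor division; the result takes the sign of b."""
    return a % b  # Python's % on ints already uses floor semantics


def truncmod(a, b):
    """Remainder of truncating division (C-style); the result takes the sign of a."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q


# The identity mod(x + c, c) == mod(x, c) holds for floormod on all integers,
# but breaks for truncmod as soon as x + c and x have different signs.
x, c = -1, 3
floor_rule_holds = floormod(x + c, c) == floormod(x, c)
trunc_rule_holds = truncmod(x + c, c) == truncmod(x, c)
```

A simplifier that applies a floor-only identity to a truncating operator (or the reverse) produces wrong results only for negative operands, which is why such bugs tend to surface late.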

Relay

  • Extend FakeQuantizationToInteger to more ops #8241
  • Change Default "opt_level" of Sequential from 2 to 0 #8634
  • Support for non scalar zero points in qnn.conv2d #8620
  • Remove redundant cuda kernels caused by fusion of less & logical or #8618
  • Replace compile engine with TE compiler in the VM #8501
  • Dense alter layout fixed for packed input #8669
  • Refactor Interpreter to treat lowering as IRModule->IRModule rewrite. #8597
  • Extract dataflow matcher data structure into header #8774
  • Support of depthwise conv2d NHWC for Mali/Bifrost. #8584
  • Avoid Override Generic Op Strategy in "hls.py" #8614
  • Add batch_matmul conversion to FQ2I pass #8635
  • Expose FTVMInferCorrectLayout Python interface #8755
  • ToBasicBlockNormalForm immutability #8778
  • Disallow fp16 conversion for summation-like ops #8810
  • Add an option to rewrite the graph only once #8843

Frontend

  • add support for 'aten::upsample_bicubic2d' #8648
  • Support for nn.SiLU added #8753
  • Implement fake quant #8780
  • GRU layer #8781
  • Unified LSTM cell #8599
  • Add onnx opset v13 support for softmax, logsoftmax #8625
  • Add a PaddlePaddle Frontend #8645
  • Support TensorFlow < 1.13 for test_sparse_add #8647
  • add support for half_pixel_centers in resize #8689
  • in-place methods (sigmoid_ and tanh_) used by Tacotron2 were added #8692
  • Fix ELU conversion #8699
  • chunk and unsafe chunk #8718
  • Make from_tensorflow.py more GPU memory friendly. #8763
  • Add support for QLinearMul ONNX op #8773
  • Increased tolerance on onnx test_forward::test_aten #8798
  • extend repeat_interleave op for relay.Expr #8839
  • Simplify onnx input since name accesses are not reliable. #8867

Topi & Operators

  • Improve the performance of scatter_nd #8479
  • Float16 unittests for dense, conv2d, depthwise conv2d #8529
  • Sparse Conv2d Implementation for 3x3 kernels #8605
  • Add transpose_a/b for TensorRT batch_matmul #8607
  • minor bugs #8622
  • CMSIS-NN graph partitioner for softmax #8653
  • remove wrong fix in x86's dense_nopack operator #8687
  • densenet implementation fix #8704
  • Celu #8741
  • Bug fix for batch_matmul parameters mismatch #8785
  • Support select_last_index for argmin/max #8816
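
To illustrate what a select_last_index option means for argmin/argmax, here is a plain-Python sketch of the tie-breaking behaviour. It assumes the semantics of the ONNX attribute of the same name (return the last index among ties instead of the first); the function arg_extreme is hypothetical and is not the TOPI implementation.

```python
def arg_extreme(values, is_max=True, select_last_index=False):
    """Index of the max (or min) value, with a configurable tie-break."""
    target = max(values) if is_max else min(values)
    # Collect every position holding the extreme value, in order.
    ties = [i for i, v in enumerate(values) if v == target]
    return ties[-1] if select_last_index else ties[0]


data = [1, 3, 2, 3, 1]
first_max = arg_extreme(data)                          # first tie wins
last_max = arg_extreme(data, select_last_index=True)   # last tie wins
last_min = arg_extreme(data, is_max=False, select_last_index=True)
```

The option only changes the result when the extreme value occurs more than once; for inputs without ties both settings agree.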

Executor & AOT

  • add set_output_zero_copy #8497
  • Remove unused parameter. #8580
  • Add get_input_index support. #8661
  • Add graph_executor get_input_index API. #8633
  • Remove unused variables in AOT tests #8686
  • Refactor AOT Test Utils parameters into object #8650
  • Convert AOT to TECompiler #8697
  • Run AOT tests against reference system #8744
  • Remove old AOT Executor code #8758
  • Change AOT from ExprVisitor to MixedModeVisitor #8856
  • Better reflect allocator names in CRT tests #8828
  • Remove unused allocated memory in crt initialization #8819
  • Switch profile flag to use new profiler #8710
  • Add benchmarking function to graph executor and vm #8807
  • Add end to end benchmarking of models #8858
  • Correctly link to PAPI #8691

AutoTVM & AutoScheduler & MetaSchedule

  • Fix deserialization of workload registry entry #8662
  • Fix FLOPS estimation #8695
  • Use PopenPool instead of multiprocessing.pool #8492
  • Update AutoScheduler Docs – Units for cooldown_interval #8736
  • Fix exception handling in measure.py #8754
  • Configurable workload keys #8862
  • Fix use of fallback AutoTVM knobs in default scheduling #8707
  • Updated tolerances to avoid flaky unit test. #8723
  • Add parameter to allow caller to supply a Runner #8747
  • Extend tune_relay_x86 tutorial to measure default and kernel level tune #8794
  • Use PopenPool in XGBoostCostModel #8820
  • Traced Schedule #8623
  • Linear Congruential Random Engine #8642
  • Add Sampling Primitive SampleCategorical. #8817
  • Instruction and Trace #8615
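
The random engine and the SampleCategorical primitive listed above fit together naturally: a reproducible generator drives a weighted choice. The sketch below shows the general technique only. The constants are the classic minstd parameters, used here as an assumption rather than taken from #8642, and the class and function names are hypothetical.

```python
class LinearCongruentialEngine:
    """state <- (a * state + c) mod m, with assumed minstd constants."""

    MULTIPLIER = 48271
    INCREMENT = 0
    MODULUS = 2147483647  # 2**31 - 1

    def __init__(self, seed=1):
        self.state = seed % self.MODULUS
        if self.state == 0:
            self.state = 1  # with c == 0, a zero state would never change

    def next(self):
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state

    def uniform(self):
        """Pseudo-random float strictly between 0 and 1."""
        return self.next() / self.MODULUS


def sample_categorical(engine, candidates, probs):
    """Pick one candidate according to probs by walking the cumulative sum."""
    r = engine.uniform()
    cumulative = 0.0
    for candidate, prob in zip(candidates, probs):
        cumulative += prob
        if r < cumulative:
            return candidate
    return candidates[-1]  # guards against rounding in the cumulative sum


engine = LinearCongruentialEngine(seed=42)
tile_size = sample_categorical(engine, [8, 16, 32], [0.2, 0.5, 0.3])
```

Because the engine is fully determined by its seed, replaying the same seed reproduces the same sequence of sampling decisions, which is the property a traced schedule relies on.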

Target & Codegen

  • [Texture support] TIR lowering and OpenCL support #7686
  • Allow spaces in target attributes #8587
  • Add support for AOT in external code generation tests #8591
  • Framework for device querying for all targets. #8602
  • Fix test_external_codegen, broken by #8591 #8630
  • Add __launch_bounds__ directive as part of the CUDA code generation #8678
  • Disallow fp16 conversion for arange op #8644
  • Several minor corrections to the device property query #8651
  • Correct passing of target-queried bool/int parameters #8660
  • Support fp16 input in cpu sort #8672
  • Fix builtin_fp16.h path according to: https://discuss.tvm.apache.org/… #8705
  • fix tir.erf codegen to opencl directly #8756
  • Check at codegen if the shader is within shared memory limits. #8746
  • Fix Vulkan runtime support #8791
  • Remove target.h #include #8813
  • Remove uses of LLVM from simulator runtime #8821
  • Reuse Hexagon SDK analysis across cmake files #8822
  • Rework tvm.target.hexagon() interface #8823
  • Change target string to Target object in the TE compiler and interpreter #8835
  • Add support for llvm parameter -mabi (aka -target-abi) #8860
  • Added the driver name to the vulkan target string. #8882

MicroTVM

  • Introduce --interface-api={c,packed} parameter #8280
  • Set the number of cores based on the VM sizing #8624
  • Fix platform name in base-box-tool #8612
  • Add skip for AOT test #8628
  • Project API infrastructure #8380
  • Add Arduino CLI support to ci-qemu #8504
  • Rev ci-qemu to 0.07 (add arduino-cli to ci-qemu) #8698
  • Zephyr Test Refactor #8713
  • Remove QEMU installation from RVM #8701
  • Fix warnings on Zephyr tests #8740
  • Fix ci-qemu Arduino install dir #8766
  • Project API Arduino support #8708
  • Fix base-box-tool command in README.md #8613
  • Fix: Test fails on hardware because of short timeout #8677
  • Fix platform name for qemu_x86 in Zephyr AOT tests #8762
  • skip aot checks when USE_MICRO=OFF #8772
  • Increase timeout to fix flaky tests #8846
  • Add Arduino RVM #8748
  • Update QemuTransport#write() to match new write API contract. #8761
  • Remove AOT Executor header from Arduino project #8857

VTA

  • Fix vta rpc server, refactor launch cond to not depend on sys.argv #8671
  • Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. #8731
  • VTA cmake change to include Verilator header for building tsim library #8797

Rust

  • Fix rust rt link #8631
  • Allow rust tvm build configuration through cargo features #8665
  • Memory leak #8714
  • Fix memory leak #2 #8725

Docs

  • Fix scipy docs inv #8619
  • Fix the usage of executors in tutorials #8586
  • TVM install addenda for M1 Macs #8568
  • Added documentation on pytest target parametrization. #8638
  • Updated target parametrization documentation #8724
  • Moved the generated tutorials folders into a _staging folder. #8735
  • refactor optimize GEMM on CPU tutorial #8825
  • Correct function signatures for CreateXPass functions in docs #8829
  • Add link to docs and tutorials in the README. #8832

CI & Build

  • Add caching to CMake #8373
  • Add pre-commit configuration to perform minimal checks locally #8382
  • Docker env for Arm® Ethos™-U55 Port #8514
  • Add USE_PAPI configuration to config.cmake #8567
  • Fix global pip cache disable change #8590
  • Move flake8 to ci_lint #8652
  • Refactor RPC test to isolate runs into a sub-function #8656
  • Restore the Rust CI testing after Docker image update #8657
  • Refactor/clean-up of docker/bash.sh #8670
  • Fix error when compile tvm with latest llvm14git #8682
  • Increase atol for CI #8712
  • Add Arm Compute Library to Arm CI unit test pipeline #8734
  • Enable custom images to be set in TVM Jenkinsfile #8721
  • Add PaddlePaddle dependency in docker file #8742
  • Rev ci-cpu to v0.76 #8786
  • Move Rust Format Script #8726
  • Install rust in ci-lint so cargo fmt can move to lint stage #8727
  • Allow Linker script files to be committed #8745
  • Add params.* to Jenkins file parameters #8771
  • Rev ci-qemu to v0.08 #8776
  • Allow Vulkan GPU access in docker container. #8784
  • Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG #8796
  • Update CPU and GPU Image #8853
  • Add synr==0.3.0 dependency for Docker images and Python dependency. #8801
  • A small bug fix on the CmakeLists #8826
  • Support for CMSIS-NN in Corstone300 Makefile #8831
  • Force CMake targets in top-level Makefile to run #8840
  • Update CI Lint Image Version #8841
  • make pre-commit hooks to run on every push instead of every commit #8888

Unit tests

  • Added cuDNN to default test targets #8383
  • Expose TVM pytest helpers as plugin #8532
  • Apply correct requires_gpu() pytest marks for parametrized target #8542
  • Parametrize ONNX Unit tests #8621
  • Use CTest for C++ tests #8809
  • Apply CPPLint to C++ Unit Tests #8827
  • Apply CPPLint to CRT Tests #8844
  • Bump up tolerance on flaky test #8850
  • Require cached fixtures to be copy-able, with opt-in. #8451
  • Remove duplicated PackedFunc C++ test #8812

Misc

  • Rename .asnumpy() to .numpy() #8659
  • Add DictAttrs to IRModule and refactor DictAttrs utility functions #8750
  • Force a gc between sphinx-gallery items to reclaim GPU memory. #8722
  • Restore License #8779
  • Remove reference to Apache Incubator status. #8837
  • Allow customized initializer in PopenPool #8789
  • Fix typos #8787
  • Remove unnecessary memset in TVMMutableFuncRegistry initialization #8818
  • Fix threadpool reset by killing threads before destroying their shared queue #8658
  • Fix ios_rpc build #8864
  • Change declaration order of unique_ptr objects to fix crash #8859
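
For code that has to run both before and after the .asnumpy() to .numpy() rename listed above, a small shim along these lines may help. It is a sketch only: to_numpy is a hypothetical helper, and it assumes nothing about the array object beyond exposing one of the two method names.

```python
def to_numpy(array):
    """Convert an NDArray-like object, preferring the new .numpy() name."""
    if hasattr(array, "numpy"):
        return array.numpy()
    # Fall back to the pre-rename spelling on older versions.
    return array.asnumpy()
```

Call sites can then use to_numpy(result) everywhere and drop the shim once the older versions are no longer supported.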
