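Configuration 1:
(1) GPU: GTX 1050 Ti
(2) System environment: Windows 10, cuda=9.2
(3) pom dependencies: cuda=9.2, nd4j=1.0.0-beta6
Configuration 2:
(1) GPU: RTX 3080
(2) System environment: Windows 10, cuda=11.2 or cuda=11.6
(3) pom dependencies: cuda=11.2, nd4j=1.0.0-M1.1 (Do not use 1.0.0-M1 here: it fails with the error shown below, a bug that no longer occurs in M1.1. Do not use 1.0.0-M2 either: nd4j-cuda-11.2-platform is available up to 1.0.0-M2, but deeplearning4j-cuda-11.2 is only available up to 1.0.0-M1.1.)
Note: this shows that as long as the CUDA major version (the number before the first dot) matches, the minor version installed on the system and the one used in pom.xml may differ.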
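The major-version rule in the note above can be written down as a small check. This is only a sketch of the observation made in this post, not an official compatibility rule; the class and method names (CudaVersionCheck, major, sameMajor) are invented for illustration and are not part of ND4J or DL4J.

```java
final class CudaVersionCheck {

    /** Returns the part before the first dot, e.g. "11" for "11.6". */
    static String major(String version) {
        String v = version.trim();
        int dot = v.indexOf('.');
        return dot < 0 ? v : v.substring(0, dot);
    }

    /** True when the installed toolkit and the pom.xml artifact share a major version. */
    static boolean sameMajor(String systemCuda, String pomCuda) {
        return major(systemCuda).equals(major(pomCuda));
    }

    public static void main(String[] args) {
        // System 11.6 with pom 11.2: the combination that worked in this post.
        System.out.println(sameMajor("11.6", "11.2")); // true
        // System 11.6 with pom 10.2: one of the failing combinations.
        System.out.println(sameMajor("11.6", "10.2")); // false
    }
}
```

A matching major version is necessary in the cases reported here, but it is not sufficient on its own: the nd4j version still matters.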
Combinations that failed:
(1) System cuda=11.2 (laptop) or cuda=11.6, with cuda=11.2 and nd4j=1.0.0-M1 in pom.xml.
Error log:
[main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
[main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot create instance of class 'org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper'.
java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:103)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:89)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:74)
at org.deeplearning4j.nn.layers.HelperUtils.createHelper(HelperUtils.java:57)
at org.deeplearning4j.nn.layers.recurrent.LSTM.initializeHelper(LSTM.java:53)
at org.deeplearning4j.nn.layers.recurrent.LSTM.<init>(LSTM.java:49)
at org.deeplearning4j.nn.conf.layers.LSTM.instantiate(LSTM.java:78)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:714)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
at zj.rnn.effectiveness.train.wordvector.TestWordVector.main(TestWordVector.java:89)
Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:108)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:89)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:74)
at org.deeplearning4j.nn.layers.HelperUtils.createHelper(HelperUtils.java:57)
at org.deeplearning4j.nn.layers.recurrent.LSTM.initializeHelper(LSTM.java:53)
at org.deeplearning4j.nn.layers.recurrent.LSTM.<init>(LSTM.java:49)
at org.deeplearning4j.nn.conf.layers.LSTM.instantiate(LSTM.java:78)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:714)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
at zj.rnn.effectiveness.train.wordvector.TestWordVector.main(TestWordVector.java:89)
Caused by: java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:103)
... 9 more
Process finished with exit code 1
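The reason 1.0.0-M1.1 ends up as the working choice can be sketched as "take the newest release that both CUDA 11.2 artifacts provide". The sketch below assumes the release order of the versions named in this post and takes the per-artifact maxima from the note in Configuration 2; the class name CommonDl4jVersion and its method are hypothetical helpers, and the availability data should be checked against Maven Central before relying on it.

```java
import java.util.Arrays;
import java.util.List;

final class CommonDl4jVersion {

    // Releases mentioned in this post, oldest to newest.
    static final List<String> RELEASES = Arrays.asList(
            "1.0.0-beta6", "1.0.0-beta7", "1.0.0-M1", "1.0.0-M1.1", "1.0.0-M2");

    /** Returns the older of the two maxima, i.e. the newest release both artifacts offer. */
    static String highestCommon(String maxA, String maxB) {
        int a = RELEASES.indexOf(maxA);
        int b = RELEASES.indexOf(maxB);
        if (a < 0 || b < 0) {
            throw new IllegalArgumentException("unknown release: " + maxA + " / " + maxB);
        }
        return RELEASES.get(Math.min(a, b));
    }

    public static void main(String[] args) {
        String nd4jMax = "1.0.0-M2";   // nd4j-cuda-11.2-platform, as stated in this post
        String dl4jMax = "1.0.0-M1.1"; // deeplearning4j-cuda-11.2, as stated in this post
        System.out.println(highestCommon(nd4jMax, dl4jMax)); // 1.0.0-M1.1
    }
}
```

Since 1.0.0-M1 itself triggers the CudnnLSTMHelper error logged above, 1.0.0-M1.1 is the only release in this list that satisfies both constraints.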
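(2) System cuda=11.6, with cuda=10.2 and nd4j=1.0.0-beta7 in pom.xml.
This error is caused by the CUDA and cuDNN versions installed on the system not matching those in pom.xml. Some also attribute it to the RTX 3080's higher compute capability, which CUDA 10.2 does not match.
Fix: upgrade to cuda=11.2 and nd4j=1.0.0-M1.1.
Error log: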
[main] WARN org.nd4j.linalg.factory.Nd4jBackend - Skipped [JCublasBackend] backend (unavailable): java.lang.UnsatisfiedLinkError: C:\Users\A\.javacpp\cache\rnn-effective-0.0.1-bin.jar\org\bytedeco\cuda\windows-x86_64\jnicudart.dll: Can't find dependent libraries
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.models.embeddings.inmemory.InMemoryLookupTable$Builder.<init>(InMemoryLookupTable.java:637)
at org.deeplearning4j.models.sequencevectors.SequenceVectors$Builder.presetTables(SequenceVectors.java:941)
at org.deeplearning4j.models.word2vec.Word2Vec$Builder.build(Word2Vec.java:615)
at zj.rnn.effectiveness.util.PrepareWordVector.trainWordVector(PrepareWordVector.java:133)
at zj.rnn.effectiveness.train.wordvector.RnnClassifyWithTrainWordVector.main(RnnClassifyWithTrainWordVector.java:64)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5094)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:270)
... 5 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
... 6 more
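(3) System cuda=10.2, with cuda=10.2 and nd4j=1.0.0-beta7 in pom.xml.
The error occurs when reading the word vectors. The vectors are saved and read with the same family of methods, yet reading still fails. Switching to the newer cuda=11.2 and nd4j=1.0.0-M1.1 resolved all of these problems.
The code that trains, saves, and reads the word vectors is shown first, followed by the error log: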
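import java.io.File;
import java.io.FileNotFoundException;
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

// hanLpFilePath, wordVectorSize and vectorPath are defined elsewhere in the class.
// 1. Train the word vectors
try {
    SentenceIterator iter = new BasicLineIterator(hanLpFilePath);
    TokenizerFactory t = new DefaultTokenizerFactory();
    Word2Vec vec = new Word2Vec.Builder()
            .minWordFrequency(3) // minimum number of times a word must occur in the training text (independent of window size); for short texts, lower it so that one occurrence is enough
            .epochs(5) // number of training epochs
            .layerSize(wordVectorSize) // dimensionality of each word vector
            .seed(42)
            .windowSize(8) // context window: 8 words before and 8 words after each word, independent of the minimum frequency
            .iterate(iter)
            .tokenizerFactory(t)
            .build();
    vec.fit();
    // Save the word vectors
    WordVectorSerializer.writeWord2VecModel(vec, vectorPath);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
// 2. Read the word vectors back
WordVectors wordVectors = WordVectorSerializer.readWord2VecModel(new File(vectorPath));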
[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
[main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 10.2.89
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [NVIDIA GeForce RTX 3080]; cc: [8.6]; Total memory: [10736893952]
[main] ERROR org.deeplearning4j.models.embeddings.loader.WordVectorSerializer - Cannot read binary model
U syn0.txt\???[q??????χH??B &??Rw?L?#,?#E??O?ZUk)q?7s?9???CZ?j??9????????k??9?????Zf???3??s??Yu?}V?{??U???~??[??g???m?y????m??????Y??z???z??_????r?~????[W?{?V????7?=G??L?????m?~{?]?????SN)k?>&???e???)s???Vj[?6}?,z????}?y[ie?~??zic???\K??G??????????/??N?E?X{???????????:???\????????Z??T????????f/?\???n|s??????????o?1?.???j??7k?1?V?????+u7?3???z?z?^J??q?v?/??j??u???;?E?(??U??V???/K+Z?,K???t?o{??E?d?it??g??7'*7u??G:??m?V??j?v??;??,?~??1"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readBinaryModel(WordVectorSerializer.java:278)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2444)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
[main] ERROR org.deeplearning4j.models.embeddings.loader.WordVectorSerializer - Unable to guess input file format
java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2447)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
Exception in thread "main" java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2416)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
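For how GPUs map to CUDA versions, see the article "NVIDIA GPU, CUDA, cuDNN, tensorflow-gpu, and torch-gpu version correspondence".
Note: the mappings on the official site give the highest matching version. For example, the RTX 3080 is listed with CUDA 11.7 as its highest match, so any cuda <= 11.7 is acceptable. However, a version below 11 may not match the card's compute capability (CC, NVIDIA's term for the GPU feature level), and model training may then fail as well.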
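The lower bound mentioned above can be illustrated with a lookup from compute capability to the first CUDA toolkit that targets it. The two entries below come from my recollection of NVIDIA's documentation (cc 6.1 is the GTX 1050 Ti, cc 8.6 is the RTX 3080 as the log above reports) and should be verified there; the class name ComputeCapabilityCheck and its method are hypothetical, and the table is deliberately incomplete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ComputeCapabilityCheck {

    // Assumed minimum CUDA toolkit {major, minor} for each compute capability.
    static final Map<String, int[]> MIN_CUDA = new LinkedHashMap<>();
    static {
        MIN_CUDA.put("6.1", new int[] {8, 0});  // Pascal, e.g. GTX 1050 Ti
        MIN_CUDA.put("8.6", new int[] {11, 1}); // Ampere, e.g. RTX 3080
    }

    /** True when the given CUDA version is at least the assumed minimum for the capability. */
    static boolean supports(String cudaVersion, String computeCapability) {
        int[] min = MIN_CUDA.get(computeCapability);
        if (min == null) {
            throw new IllegalArgumentException("no entry for cc " + computeCapability);
        }
        String[] parts = cudaVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > min[0] || (major == min[0] && minor >= min[1]);
    }

    public static void main(String[] args) {
        System.out.println(supports("10.2", "8.6")); // false
        System.out.println(supports("11.2", "8.6")); // true
        System.out.println(supports("9.2", "6.1"));  // true
    }
}
```

Under these assumptions, CUDA 10.2 on the RTX 3080 falls below the minimum, which is consistent with the failures reported for that combination.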
I installed cuda 11.6, 11.2, and 10.2 side by side on the RTX 3080 desktop, and cuda 9.2 and 9.0 side by side on the GTX 1050 Ti machine.
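With several toolkits installed side by side, it helps to see which ones the machine actually exposes. As far as I know, the NVIDIA installer on Windows sets CUDA_PATH plus one CUDA_PATH_Vx_y variable per installed version; the sketch below relies on that convention. The class name InstalledCudaToolkits and its method are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class InstalledCudaToolkits {

    private static final String PREFIX = "CUDA_PATH_V";

    /** Maps a version such as "11.2" to its install directory, based on CUDA_PATH_Vx_y variables. */
    static Map<String, String> fromEnv(Map<String, String> env) {
        Map<String, String> found = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey().toUpperCase(Locale.ROOT);
            if (key.startsWith(PREFIX)) {
                // CUDA_PATH_V11_2 -> 11.2
                String version = key.substring(PREFIX.length()).replace('_', '.');
                found.put(version, e.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        fromEnv(System.getenv()).forEach((v, p) -> System.out.println(v + " -> " + p));
        // CUDA_PATH usually points at the most recently installed toolkit.
        System.out.println("CUDA_PATH = " + System.getenv("CUDA_PATH"));
    }
}
```

This only lists what is installed; which toolkit's DLLs are picked up at run time still depends on the order of entries in PATH.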
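pom.xml used with Configuration 2 (coordinates, properties, and dependencies):
<modelVersion>4.0.0</modelVersion>
<artifactId>maotiao-classify-gpu</artifactId>
<properties>
    <nd4j.version>1.0.0-M1.1</nd4j.version>
    <dl4j.version>1.0.0-M1.1</dl4j.version>
    <cuda.version>11.2</cuda.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.25</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>com.hankcs</groupId>
        <artifactId>hanlp</artifactId>
        <version>portable-1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.13</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-${cuda.version}-platform</artifactId>
        <version>${nd4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-${cuda.version}</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
</dependencies>
<version>0.0.1</version>
<groupId>com.tianque</groupId>
The rest of the file references ${project.artifactId} and configures these build plugins:
(1) org.apache.maven.plugins:maven-resources-plugin 2.7, encoding UTF-8
(2) org.apache.maven.plugins:maven-compiler-plugin 3.5.1, Java 1.8, encoding UTF-8
(3) org.codehaus.mojo:exec-maven-plugin 1.4.0, goal exec, executable java
(4) org.apache.maven.plugins:maven-shade-plugin 3.0.0, shaded artifact attached (true) with classifier bin, dependency-reduced POM enabled (true); a filter on *:* excludes org/datanucleus/**, META-INF/*.SF, META-INF/*.DSA, and META-INF/*.RSA; the shade goal runs in the package phase and appends reference.conf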