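Colaboratory is a free Jupyter notebook environment with access to GPUs and TPUs. It requires no setup and runs entirely in the cloud.
With Colaboratory you can write and execute code, save and share your analyses, and use powerful computing resources, all for free from your browser.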
Step 1
# Install the required packages
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
result
E: Package 'python-software-properties' has no installation candidate
Selecting previously unselected package google-drive-ocamlfuse.
(Reading database ... 145605 files and directories currently installed.)
Preparing to unpack .../google-drive-ocamlfuse_0.7.14-0ubuntu1~ubuntu18.04.1_amd64.deb ...
Unpacking google-drive-ocamlfuse (0.7.14-0ubuntu1~ubuntu18.04.1) ...
Setting up google-drive-ocamlfuse (0.7.14-0ubuntu1~ubuntu18.04.1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
··········
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
Please enter the verification code: Access token retrieved correctly.
Step 2
# Mount Google Drive; the contents are refreshed on every mount
!mkdir -p drive
!google-drive-ocamlfuse drive -o nonempty
# Change into your own folder on Drive
import os
os.chdir('drive/bert-master')
# List the contents of the folder
!ls
result
CONTRIBUTING.md predicting_movie_reviews_with_bert_on_tf_hub.ipynb
create_pretraining_data.py __pycache__
data_dir README.md
extract_features.py requirements.txt
__init__.py run_classifier.py
LICENSE run_classifier_with_tfhub.py
modeling.py run_pretraining.py
modeling_test.py run_squad.py
multilingual.md sample_text.txt
optimization.py tokenization.py
optimization_test.py tokenization_test.py
!python run_classifier.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=data_dir \
--vocab_file=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=output
result
I1202 13:57:51.438973 140524128954240 run_classifier.py:923] ***** Eval results *****
INFO:tensorflow: eval_accuracy = 0.85294116
I1202 13:57:51.439047 140524128954240 run_classifier.py:925] eval_accuracy = 0.85294116
INFO:tensorflow: eval_loss = 0.4763845
I1202 13:57:51.960015 140524128954240 run_classifier.py:925] eval_loss = 0.4763845
INFO:tensorflow: global_step = 343
I1202 13:57:51.960390 140524128954240 run_classifier.py:925] global_step = 343
INFO:tensorflow: loss = 0.4763845
I1202 13:57:51.960465 140524128954240 run_classifier.py:925] loss = 0.4763845
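In the public BERT run_classifier.py, the metrics logged above are also written to eval_results.txt inside --output_dir as "key = value" lines. Below is a minimal sketch for reading them back into a dict, assuming that layout. The helper name parse_eval_results is hypothetical, and the sample values are copied from the log above.

```python
import os
import tempfile


def parse_eval_results(path):
    """Parse 'key = value' lines into a dict of floats (hypothetical helper)."""
    results = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "=" not in line:
                continue  # skip blank or unrelated lines
            key, value = line.split("=", 1)
            results[key.strip()] = float(value)
    return results


# Demo on a temporary file; in the notebook the path would be
# output/eval_results.txt (assumption based on --output_dir=output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "eval_results.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("eval_accuracy = 0.85294116\n")
        f.write("eval_loss = 0.4763845\n")
        f.write("global_step = 343\n")
        f.write("loss = 0.4763845\n")
    metrics = parse_eval_results(demo_path)

print(metrics["eval_accuracy"])
```

Reading the file instead of scrolling the log makes it easier to compare runs with different hyperparameters.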
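For a text classification problem, first put the dataset into a form the code can read, then modify the code to match.
The main file to modify is run_classifier.py.
Each data format is handled by its own DataProcessor subclass. For data saved with the label in the first column and the text in the second, the processor can be written as follows: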
class SafeProcessor(DataProcessor):
  """Processor for the move data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.csv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.csv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.csv")), "test")

  def get_labels(self):
    """See base class."""
    # Label set; adjust it to the classes in your own data.
    return ["0", "1", "2", "3"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training, dev, and test sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      if set_type == "test":
        # Test rows carry only the text; "0" is a placeholder label.
        text_a = tokenization.convert_to_unicode(line[0])
        label = "0"
      else:
        # Train/dev rows: label in column 0, text in column 1.
        text_a = tokenization.convert_to_unicode(line[1])
        label = tokenization.convert_to_unicode(line[0])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples
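SafeProcessor reads its files with _read_tsv, which in the public BERT code parses tab-delimited rows, so the .csv files need to be tab-separated despite their extension: label then text for train.csv and dev.csv, text only for test.csv. Below is a minimal sketch of writing such files under that assumption. The helper write_rows and the sample sentences are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows as tab-separated lines (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            # Tabs and newlines inside a field would break the row layout.
            cleaned = [str(col).replace("\t", " ").replace("\n", " ")
                       for col in row]
            f.write("\t".join(cleaned) + "\n")


# Made-up sample rows: (label, text) for train/dev, (text,) for test.
train_rows = [("0", "the service was slow"), ("3", "great value for money")]
test_rows = [("an unlabeled sentence",)]

data_dir = tempfile.mkdtemp()  # stand-in for the data_dir folder on Drive
write_rows(os.path.join(data_dir, "train.csv"), train_rows)
write_rows(os.path.join(data_dir, "test.csv"), test_rows)

# Read the file back the way a tab-delimited reader would see it.
with open(os.path.join(data_dir, "train.csv"), encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
print(read_back)
```

Every label written to the first column should be one of the strings returned by get_labels.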
Then register the new class in the processors dict:
processors = {
    "cola": ColaProcessor,
    "mnli": MnliProcessor,
    "mrpc": MrpcProcessor,
    "xnli": XnliProcessor,
    "safe": SafeProcessor,
}
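In the public BERT run_classifier.py, the value of --task_name is lowercased and used as a key into this dict, so the new entry would be selected with --task_name=safe. Below is a minimal sketch of that lookup. The classes are empty stand-ins for the real ones, and select_processor is a hypothetical helper, not a function from the BERT code.

```python
class DataProcessor:
    """Stand-in for the base class in run_classifier.py."""


class MrpcProcessor(DataProcessor):
    """Stand-in; the real class reads the MRPC files."""


class SafeProcessor(DataProcessor):
    """Stand-in that only exposes the label list."""

    def get_labels(self):
        return ["0", "1", "2", "3"]


processors = {"mrpc": MrpcProcessor, "safe": SafeProcessor}


def select_processor(task_name):
    """Look up a processor by lowercased task name (hypothetical helper)."""
    key = task_name.lower()
    if key not in processors:
        raise ValueError("Task not found: %s" % key)
    return processors[key]()


processor = select_processor("SAFE")
print(type(processor).__name__, processor.get_labels())
```

Because the key is lowercased before the lookup, the dict keys themselves should be lowercase.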