Importing Projects into Colab

Intro

Google Colab (short for Colaboratory) is an online platform for data science hosted by Google. It consists of an online notebook that runs in the cloud and uses no resources on your local machine.

Many data science students and professionals love this solution because it combines convenience, portability and computational power in one package. You don’t need to install packages or manage virtual environments. It gives you portability with zero effort, which is a big bonus if you have to switch computers often (e.g. home and office). It lets you move your heavy computations to the cloud and save your poor laptop from a meltdown. Finally, you can avoid downloading huge datasets locally.

However, Colab constrains you to a single-notebook environment, with all the issues that derive from it (as explained here and here). During my time with Colab, I have collected a few hacks that I use daily and that help me make the most of the platform. You can find them scattered across the internet, but this is, to my knowledge, the first attempt to put them in one article.

Tip 1: Importing Google Drive

Google lets you read and write files on GDrive from Colab. As explained in the official docs, you need to run the following cell:

from google.colab import drive
drive.mount('/content/drive')

A one-time link will appear in the output cell. Click it and you will be redirected to an interface that allows you to grant temporary Drive access to your notebook. Copy the code shown at the end of the procedure and paste it into the notebook.
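
Because the google.colab package only exists inside a Colab runtime, the mount cell fails with an import error anywhere else. Below is a minimal sketch of a guarded mount, assuming you also want to run the same notebook on a local machine. The helper names in_colab and mount_drive_if_colab are hypothetical and not part of any library.

```python
def in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only present on Colab runtimes)
    except ImportError:
        return False
    return True


def mount_drive_if_colab(mount_point: str = "/content/drive"):
    """Mount Drive on Colab; do nothing and return None elsewhere."""
    if not in_colab():
        return None
    from google.colab import drive

    drive.mount(mount_point)  # starts the authorization flow described above
    return mount_point
```

Outside Colab the call simply returns None, so the rest of the notebook can fall back to local paths.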

As we will see, Drive can be used to host files, datasets and modules containing code that you want to reuse, so it can become a great tool for your daily work. However, file I/O from Drive is painfully slow, especially when you try to access numerous small files, which suggests that the interface between the two adds a large overhead to every file access. This means that you don’t want to host your big datasets on Drive, even if you have enough space to leave them there.

What can you do instead? There are three workarounds.

1. Still host them on GDrive, but compressed. Accessing a single file has less overhead, and the compression reduces the file size. You can then decompress it from the command line.
2. If you are using a popular dataset, like MNIST, many Python packages can download and read it for you.
3. Download it directly from the command line.
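
The first workaround can be sketched in Python. This is a minimal example, assuming the dataset is a folder of many small files that you pack once on the machine where it currently lives. pack_folder and the folder names in the usage note are hypothetical, chosen only for illustration; the sketch relies on shutil.make_archive from the standard library.

```python
import shutil
from pathlib import Path


def pack_folder(folder: str, archive_stem: str) -> Path:
    """Zip the contents of `folder` into `<archive_stem>.zip` and return its path."""
    # make_archive appends the ".zip" suffix itself and returns the archive name.
    archive = shutil.make_archive(archive_stem, "zip", root_dir=folder)
    return Path(archive)
```

For example, pack_folder("my_dataset", "my_dataset_packed") would produce my_dataset_packed.zip, which you can then upload to Drive as a single file. Keep the archive outside the folder being packed so that it does not try to include itself.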

If available, the second option is the most convenient. However, many datasets don’t come bundled inside Python packages, so we’ll need to explore the third solution.

Tip 2: Downloading files from the notebook

Jupyter Notebooks allow you to execute shell commands by prepending an exclamation mark (!) to the line. For example, if you want to view the contents of the current working directory, you can simply write !ls.

This gives you the opportunity to use curl to download files from the command line:

!curl -O http://your.link/here

This line of code saves the data in your current working directory (/content by default on Colab). The machines on which your kernels run have very fast network connections, so even big datasets often download in a matter of seconds.
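
If you would rather stay in Python than call curl, the standard library can do a similar job. The following is a rough sketch, not a replacement for curl's many options. download is a hypothetical helper, and it assumes the URL ends in a file name, as curl -O does.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download(url: str, dest_dir: str = ".") -> str:
    """Save `url` into `dest_dir` under its remote file name, like `curl -O`."""
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError("URL has no file name component: " + url)
    target = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, target)  # copies the response body to disk
    return target
```

Calling download("http://your.link/here") with the placeholder link above would write a file named here into the current working directory.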

Now that you have the raw materials in your notebook, it’s likely that they need to be decompressed before you can use them. There’s a variety of archive formats out there, and for each of them there’s an appropriate command that you can use to decompress the file. For example, if the dataset is zipped you can type:

!unzip -q -d destination_path archive_to_unzip.zip
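
The same extraction can be done without leaving Python. Here is a small sketch using the standard zipfile module. extract_zip is a hypothetical helper, and the archive and destination names in the usage note mirror the placeholders in the command above.

```python
import zipfile
from pathlib import Path


def extract_zip(archive: str, destination: str) -> list:
    """Extract every member of `archive` into `destination`; return the member names."""
    Path(destination).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        return zf.namelist()
```

For instance, extract_zip("archive_to_unzip.zip", "destination_path") unpacks the archive and returns the list of extracted names, which is handy for a quick sanity check.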

Tip 3: Importing Modules from Drive

If you have some custom modules that you want to use in your notebook, you can upload them to Drive and import them later. Let’s see how.

Before you can import them, you have to tell Python where to look. First, create a Project folder on Drive and put your .py files in it. Then execute:

import sys
sys.path.insert(0, '/content/drive/My Drive/Project')

After executing the cell above, your modules should import without errors:

import foo
import bar

Why is this necessary? When you write import foo, Python looks for a module named foo in a list of locations called sys.path. By inserting the location of our modules as the first entry, we make sure that foo and bar will be found there first.
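
To see the search path mechanism at work, here is a self-contained sketch. The helper add_to_search_path is hypothetical; it puts a folder at the front of sys.path only if the folder is not already listed, so re-running the cell does not pile up duplicate entries.

```python
import sys


def add_to_search_path(folder: str) -> bool:
    """Put `folder` first in sys.path; return False if it was already listed."""
    if folder in sys.path:
        return False
    sys.path.insert(0, folder)
    return True
```

On Colab you would call add_to_search_path('/content/drive/My Drive/Project') once and then write import foo as usual.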

Note that this trick is a kind of workaround that will make many expert Python practitioners uneasy. Abusing this hack on your personal computer, or on a production server, will likely lead to a messy import path and unpredictable consequences if you have different modules with the same name. However, in the experimental and temporary environment of a notebook, this may be acceptable.
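
One way to catch such name clashes is to ask Python which file it would load for a given name before you import it. This is a minimal sketch; module_origin is a hypothetical helper built on importlib.util.find_spec from the standard library.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be loaded from.

    Returns None when no such module is found (namespace packages,
    which have no single file, also report None).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

If module_origin("foo") returns a path outside your Project folder, another module named foo is shadowing yours.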

If you want more info, take a look at Learning Python, 5th Edition by Mark Lutz, chapter 22, section The Module Search Path.

What if this isn’t enough

Colab doesn’t make any guarantees about the consistency or availability of the service. Furthermore, if you run a kernel for too many consecutive hours, your session will be shut down. Therefore, this platform isn’t intended for professional or heavy usage.

If you need more power, or if you wish to attach permanent storage to your cloud machine, you’d better check out the options available on Google Cloud, AWS or Microsoft Azure.

Conclusion

Colab is a powerful tool that, if used correctly, can save your laptop and your time. These tricks allow you to make the most of the platform and overcome some of the limitations of an online notebook environment.

If you know some other tricks that I didn’t mention, feel free to share them in the comments.

Translated from: https://medium.com/@d_toniolo/how-to-migrate-your-projects-to-colab-2c8a5769c802