How I Passed the Google Professional Data Engineer Exam in 2020
Want to get this certification? Well, it is not an easy one. You'll need to do the homework. From what I read online, people usually spend 2–3 months on preparation.
It's not a secret that many of us won't be using every Google product every day, but we need to know them, right? This article is for those who don't have time to read all the manuals. I will describe what I did to get ready for this exam in 8 days.
First of all, I need to say that I didn't have a clue how serious this exam really is. The exam questions were far more complex than, and different from, any online course questions I know. So if you don't have any developer background, please take your time, read the books and do the tutorials.
It took 1 hour and 35 minutes to pass the test. Every exam question was exactly the same though.
“Why did I choose to do that to myself?”
After the 3rd question I had a strong feeling that I knew nothing. I was really scared, as I had told everyone that I was going to take this exam. A bit of advice: don't put yourself under that pressure. Don't tell your boss or your girlfriend. If you fail, you will be able to take the exam again in two weeks. I think I passed only because I am a lucky guy and was wearing my lucky T-shirt that day. After I submitted my final answers I saw the exam result straight away. It was a 'Pass'. My certificate came the next week, along with a promo code for a hoodie.
Recommended read
There is a May 2020 book, the Official Google Cloud Certified Professional Data Engineer Study Guide from Wiley. It's $40 and it has a table of contents that gives a basic understanding of the types of questions on the exam. Whether to buy it is up to you. You might want to just familiarise yourself with the contents, and that will probably be enough.
Preparation
Day 1
I started with practice tests. There are plenty of online courses for this. I give an overview below of some of the ones that helped me get ready. Long story short, if you don't want to waste your time, start with the tests.
There is a practice exam from Google, which I took and failed, but afterwards I knew exactly what the questions look like. It gives you the format, level, and scope of questions you may encounter on the certification exam.
Then there is a Linux Academy practice exam.
They recently migrated some courses from A Cloud Guru, including the Google Professional Data Engineer course! That course is newer than the one they previously had on their platform, but you are still able to view the original Data Engineer course here: Google Cloud Certified Professional Data Engineer (LA), which was updated to the July 2019 exam objectives.
The first one is free! So go unlock the challenge. For the second one, you will have two free practice attempts after registration.
And they have a handy course book (free): the Google Cloud Professional Data Engineer Exam Handbook by Linux Academy, a summary of the main concepts in scope.
Day 2
On day 2 I started to form an idea of how to deal with the case studies and what the exam structure and questions look like. I started to pay attention to words like economically, cost-effective, as soon as possible, etc. These kinds of keywords very often define the right answer, because on the exam you can find multiple answers that technically satisfy the requirements.
Day 3–5
I did two practice exams a day, occasionally reading Google docs on topics I didn't know. I did this during my morning cardio at the gym, while cycling. 30–40 minutes is more than enough for a practice exam. There is also research suggesting that moderate aerobic activity improves cognitive function. I found it very useful, and cardio didn't sound painful anymore. I was learning.
Day 6–8
I still did two practice exams a day, but now I had two browser tabs open with previously passed practice exams. Every question I was uncertain about I checked straight away and read the docs. I think this tactic helped to polish my knowledge. I also started to take some product-specific notes and tie them to the keywords I mentioned earlier.
By day 7 I was scoring at least 90% on the practice exams.
How to pass the exam?
There is no generic answer to that question. During the real exam I felt that I knew nothing and the questions seemed very difficult. However, the following strategy worked for me:
1. Do the practice tests to understand the type of questions and structure.
2. Learn product features.
3. Pay attention to question keywords, as very often they define the correct answer.
Put on your lucky T-shirt or whatever lucky thing you have. You'll need it.
4. Read the manual. It’s optional but very useful.
Read the official Google docs, at least the overviews and the case studies. These guides are great and have all the information you need to pass the exam. I would check a topic in the Professional Data Engineer Study Guide from Wiley and then search for that topic in the Google docs.
Most of the exam questions were case studies: how to fix something, how to design a process, best practices, or machine learning.
The real exam is very machine-learning heavy.
Typical questions
Ensuring solution quality
Case studies and best-practice questions. There will be a lot of them.
Example 1: You are monitoring GCP Operations (formerly Stackdriver) metrics which show that your Bigtable instance’s storage utilization is approaching 70% per node. What do you do?
Answer: Add additional nodes to the cluster to increase storage processing capacity. Even though Cloud Bigtable table data is stored in Google Colossus, a cluster needs to be sized appropriately so that nodes have enough resources to process the total storage in use. When instance storage utilization reaches 70% per node, additional nodes should be added.
Read: Quotas & limits | Cloud Bigtable Documentation | Google Cloud
Example 2: Your organization has just recently started using Google Cloud. Everyone in the company has access to all datasets in BigQuery, using it as they see fit without documenting their use cases. You need to implement a formal security policy, but need to first determine what everyone has been doing in BigQuery. What is your first step to do so?
Answer: Use Stackdriver Logging to review data access. Stackdriver Logging will record the audit logs of jobs and queries for each individual user's actions. Query slots won't work because they measure BigQuery performance and resource usage but give no visibility into individual user activity. You will not be able to view user activity via billing records. IAM policies are applied to datasets, but not to individual tables inside each dataset. Furthermore, IAM policies show who has permissions on resources, but not their activity.
Read: BigQuery documentation | Google Cloud
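To make the audit-log approach concrete, here is a minimal sketch (my own, not from the exam) using the Cloud Logging Python client; the project ID and the exact filter string are illustrative assumptions.

```python
from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client(project="my-project")  # hypothetical project ID

# Audit log entries written by BigQuery when a job (query, load, extract) completes.
log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="jobservice.jobcompleted"'
)

for entry in client.list_entries(filter_=log_filter, page_size=20):
    # The payload records who ran the job (authenticationInfo.principalEmail) and what it did.
    print(entry.timestamp, entry.payload)
```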
Example 3: Your security team have decided that your Dataproc cluster must be isolated from the public internet and not have any public IP addresses. How can you achieve this?
Answer: Using the --no-address flag will prevent public IPs from being assigned to nodes in a Cloud Dataproc cluster. However, Private Google Access is still required for the subnet to access certain GCP APIs.
Read: Dataproc Cluster Network Configuration | Dataproc Documentation
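For illustration only, the --no-address behaviour corresponds to the internal_ip_only setting when creating a cluster through the Dataproc Python client; a rough sketch with made-up project, region and subnet names:

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical values

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "isolated-cluster",
    "config": {
        "gce_cluster_config": {
            # The subnet must have Private Google Access enabled for GCP API calls.
            "subnetwork_uri": "private-subnet",
            "internal_ip_only": True,  # no public IPs, like gcloud's --no-address
        }
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # block until the cluster is created
```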
Explore what Google recommends as best practice
BigQuery: https://cloud.google.com/bigquery/docs/best-practices
Stackdriver and Logging: https://cloud.google.com/products/operations
BigTable: https://cloud.google.com/bigtable/docs/performance
IAM and security: https://cloud.google.com/iam/docs/concepts
Cloud Storage: https://cloud.google.com/storage/docs/best-practices
Above all, I would recommend reading the overviews of all the database products, as there will be a lot of questions about them: https://cloud.google.com/products/databases
Designing data processing systems
Example: A customer has a 400GB MySQL database running in a datacentre. What would be the best approach for migrating this database to GCP?
Answer: Create a Cloud SQL for MySQL 2nd generation instance and migrate the data. For a MySQL database of this size, a Cloud SQL for MySQL instance would be the recommended approach. Using Compute Engine adds additional operational overhead. Postgres and Spanner would not be suitable migration hosts for a MySQL database.
Recommended read: Migration from MySQL to Cloud SQL | Solutions | Google Cloud
Choosing Google database products
Example: Your database is 500 GB in size. The data is semi-structured and does not need full atomicity. You need to process transactions in a point-of-sale application on Google Cloud Platform, account for exponential user growth, and avoid managing infrastructure overhead.
Use Datastore
Example: The data is more than 1 TB and low latency is required (and you probably don't care about costs):
Use BigTable
Low latency is not required and/or you need to run ANSI SQL analytics economically? Need to easily load data from CSV and JSON for later inspection with SQL?
Use BigQuery. Cloud Datastore supports JSON and SQL-like queries but cannot easily ingest CSV files. Cloud SQL can read from CSV but not easily convert from JSON. Cloud Bigtable does not support SQL-like queries.
You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably.
Use Cloud Spanner
You need strongly consistent transactions? Data less than 500 GB? The data does not need to be streaming or real-time?
Use Cloud SQL
Pay attention to:
High availability and performance, and things like failover and read replicas.
There are a lot of BigTable questions.
Pay attention to:
Development and Production instances, Disk Types (HDD vs. SSD).
BigTable Performance Example: Your organization will be deploying a new fleet of IoT devices, and writes to your Bigtable instance are expected to peak at 50,000 queries per second. You have optimized your row key design and need to design a cluster that can meet this demand. What do you do?
Answer: An optimized Bigtable instance with a well-designed row key schema can theoretically support up to 10,000 write queries per second per node, so 5 nodes are required.
Read: Understanding Cloud Bigtable performance
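The sizing arithmetic behind that answer is straightforward; a tiny sketch using the 10,000 writes-per-second-per-node figure from the answer above:

```python
import math

peak_write_qps = 50_000        # expected peak from the IoT fleet
write_qps_per_node = 10_000    # theoretical max per node with an optimized row key (SSD)

nodes_required = math.ceil(peak_write_qps / write_qps_per_node)
print(nodes_required)  # 5
```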
BigTable Performance Example: You are asked to investigate a Bigtable instance that is performing poorly. Each row in the table represents a record from an IoT device and contains 128 different metrics in their own column, each metric containing a 32-bit integer. How could you modify the design to improve performance?
BigTable性能示例:要求您调查性能不佳的Bigtable实例。 表中的每一行代表来自IoT设备的一条记录,并且在其自己的列中包含128个不同的指标,每个指标包含一个32位整数。 您如何修改设计以提高性能?
Answer: Large numbers of cells in a row can cause poor performance in Cloud Bigtable. When the data itself is so small, as in this scenario, it would be more efficient to simply retrieve all of the metrics from a single cell, and use delimiters inside the cell to separate the data. Row versioning would compound the problem by creating the most new entries along the least efficient dimension of the table, and HDD disks will always slow things down.
Read: Understanding Cloud Bigtable performance
BigTable Performance Example: Your production Bigtable instance is currently using four nodes. Due to the increased size of your table, you need to add additional nodes to offer better performance. How should you accomplish this without the risk of data loss?
Answer: Edit the instance details and increase the number of nodes, then save your changes. Data will redistribute with no downtime. You can add or remove Bigtable nodes with no downtime necessary.
Read: Overview of Cloud Bigtable | Cloud Bigtable Documentation
BigTable Performance Example: You currently have a Bigtable instance you’ve been using for development running a development instance type, using HDDs for storage. You are ready to upgrade your development instance to a production instance for increased performance. You also want to upgrade your storage to SSDs as you need maximum performance for your instance. What should you do?
Answer: You cannot change the disk type on an existing Bigtable instance; you will need to export/import your Bigtable data into a new instance with the different storage type. You will need to export to Cloud Storage and then back into Bigtable again.
BigTable Performance Example: Your customer uses a Bigtable instance that contains 2 replicating clusters for regional disaster recovery. Table transactions from the application are required to be strongly consistent. How can you guarantee that for this configuration?
Answer: Determine one cluster as the master, and use an application profile that specifies single-cluster routing. By default, Cloud Bigtable is eventually consistent. To guarantee strong consistency you must limit queries to a single cluster in an instance by using an application profile.
Read: Overview of Replication | Cloud Bigtable Documentation | Google Cloud
Read: Overview of Cloud Bigtable | Cloud Bigtable Documentation
BigTable Performance Example: What will happen to your data in a Bigtable instance if a node goes down?
Answer: Nothing, as the storage is separated from the node compute. Rebuilding from RAID is not a valid Bigtable function. Storage and compute are separate, so a node going down may affect performance, but not data integrity; nodes only store pointers to storage as metadata.
Read: Overview of Cloud Bigtable | Cloud Bigtable Documentation
BigTable Performance Example: You are monitoring GCP Operations (formerly Stackdriver) metrics which show that your Bigtable instance’s storage utilization is approaching 70% per node. What do you do?
Answer: Add additional nodes to the cluster to increase storage processing capacity. Even though Cloud Bigtable tablet data is stored in Google Colossus, a cluster needs to be sized appropriately so that nodes have enough resources to process the total storage in use. When instance storage utilization reaches 70% per node, additional nodes should be added.
Read: Quotas & limits | Cloud Bigtable Documentation | Google Cloud
BigTable Performance Example: Which of these is NOT a valid reason to choose an HDD storage type over SSD in a Bigtable instance?
Answer: Bigtable can integrate with Cloud Storage regardless of the type of disk in use by the instance. The other reasons are valid for choosing HDD as an outlying case, but in general SSD disks are preferred as HDD disks will cause a significant drop in performance.
Read: Overview of Cloud Bigtable | Cloud Bigtable Documentation
Relational database questions
Pay attention to:
Replicas, availability and migration guides.
Example: You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably. What should you do?
Answer: Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
Example: Your customer is looking to move a 2TB MySQL database to GCP. Their business requires an uptime SLA exceeding 99.95%. How can you achieve this?
Answer: Migrate the database to a Cloud SQL for MySQL high availability configuration with a standby instance in a secondary zone. Cloud SQL’s standard SLA is 99.95%. Uptime in excess of this can be achieved by using a high availability configuration with a failover instance in a secondary availability zone. Failover replicas are not a feature — read replicas are. Cold-spares are inefficient as they will not be automatically switched to, unlike the proper HA configuration. Compute Engine options are not required to achieve the required SLA.
Read: Cloud SQL Service Level Agreement (SLA) | Cloud SQL Documentation, and Overview of the high availability configuration | Cloud SQL for MySQL
A lot of questions about Pub/Sub, Kafka and windowing.
Pay attention to:
Kafka mirroring, and the differences between the two.
Pub/Sub
Pub/Sub handles the need to scale exponentially with traffic coming from around the globe. Apache Kafka will not be able to handle exponential global user growth as well as Pub/Sub can.
Cloud Pub/Sub guarantees to deliver messages at least once to every subscriber. As multiple systems need to be notified of every order, you should create one topic and use multiple subscribers. Order of delivery is not guaranteed by Pub/Sub so attach a timestamp in the publishing system if possible.
Read: https://cloud.google.com/pubsub/architecture
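A minimal sketch of that pattern with the Pub/Sub Python client: one topic, and a timestamp attached as a message attribute at publish time. The project and topic names are made up.

```python
from datetime import datetime, timezone
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "orders")  # hypothetical names

def publish_order(order_json: bytes) -> str:
    # Attach the publish time as a message attribute so every subscriber
    # (inventory, billing, analytics, ...) can order events itself.
    future = publisher.publish(
        topic_path,
        order_json,
        event_time=datetime.now(timezone.utc).isoformat(),
    )
    return future.result()  # server-assigned message ID

publish_order(b'{"order_id": 42, "sku": "ABC-1"}')
```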
Example: Your company’s Kafka server cluster has been unable to scale to the demands of their data ingest needs. Streaming data ingest comes from locations all around the world. How can they migrate this functionality to Google Cloud to be able to scale for future growth?
Answer: Create a single Pub/Sub topic. Configure endpoints to publish to the Pub/Sub topic, and configure Cloud Dataflow to subscribe to the same topic to process messages as they come in.
Security, encryption and key management
Example: Your organization has a security policy which mandates that the security department must own and manage encryption keys for data stored in Cloud Storage buckets. Analysts and developers need to store data that is encrypted with these keys without access to the keys themselves. How can this be achieved?
Answer: Use Cloud KMS for the security team to manage their own encryption keys in a dedicated project. Grant the Cloud KMS CryptoKey Encrypter/Decrypter role for the keys to the Cloud Storage service accounts in the other projects. Cloud KMS allows you to create and manage your own encryption keys which can then be used by service accounts in other projects. Developers in those projects can then access services without any access to the underlying keys. A staging area is not required, neither is any other manual intervention by the security team.
Read: Using customer-managed encryption keys | Cloud Storage | Google Cloud
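As a sketch of what that looks like for a bucket (assuming the security team has already created the key in its own project and granted the role; all resource names below are invented):

```python
from google.cloud import storage

# The key lives in the security team's project; writers never see the key material.
KMS_KEY = (
    "projects/security-team-prj/locations/us/keyRings/data-keys/cryptoKeys/storage-key"
)

client = storage.Client(project="analytics-prj")
bucket = client.get_bucket("analytics-data")

bucket.default_kms_key_name = KMS_KEY  # new objects get encrypted with this CMEK
bucket.patch()
```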
Building and operationalizing data processing systems
Dataproc
Pay attention to:
HDFS vs. Google Cloud Storage for Dataproc workloads.
Best practice: Dataproc clusters should be job-specific. Use Cloud Storage if you need scaling, because HDFS won't scale well and needs custom settings. Google also recommends using Cloud Storage instead of HDFS, as it is much more cost-effective, especially when jobs aren't running.
Read: https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-jobs
Dataflow
Pay attention to:
PCollection branching, Flatten and joins, transformations and sliding windows.

Flatten: You can use the Flatten transform in the Beam SDKs to merge multiple PCollections of the same type.

Join: You can use the CoGroupByKey transform in the Beam SDK to perform a relational join between two PCollections. The PCollections must be keyed (i.e. they must be collections of key/value pairs) and they must use the same key type.
Read: https://beam.apache.org/documentation/pipelines/design-your-pipeline/
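A small Apache Beam (Python) sketch of both transforms, using toy in-memory data:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    clicks = p | "clicks" >> beam.Create([("u1", "click_a"), ("u2", "click_b")])
    views = p | "views" >> beam.Create([("u1", "view_x"), ("u3", "view_y")])

    # Flatten: merge multiple PCollections of the same type into one.
    all_events = (clicks, views) | "merge" >> beam.Flatten()
    all_events | "print merged" >> beam.Map(print)

    # CoGroupByKey: relational join of keyed PCollections sharing a key type.
    joined = {"clicks": clicks, "views": views} | beam.CoGroupByKey()
    joined | "print joined" >> beam.Map(print)  # e.g. ('u1', {'clicks': [...], 'views': [...]})
```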
Windowing:
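A minimal sliding-window sketch in Beam Python; the timestamps here are synthetic, whereas in a real streaming pipeline they would come from the source or from message attributes:

```python
import time
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.window import TimestampedValue

base = time.time()

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([(0, ("sensor-1", 5)), (15, ("sensor-1", 3)), (40, ("sensor-2", 7))])
        # Returning a TimestampedValue from Map sets each element's event timestamp.
        | beam.MapTuple(lambda offset, kv: TimestampedValue(kv, base + offset))
        # 60-second windows starting every 10 seconds: one element can fall into
        # several overlapping windows.
        | beam.WindowInto(window.SlidingWindows(size=60, period=10))
        | beam.CombinePerKey(sum)
        | beam.Map(print)
    )
```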
Example: You are writing a streaming Cloud Dataflow pipeline that transforms user activity updates before writing them to a time-series database. While continually transforming each element as it arrives, you also need to depend on some additional data at run-time to create the transformation. How can you achieve this?
Answer: Side inputs are useful if your ParDo needs to inject additional data when processing each element in the input PCollection, but the additional data needs to be determined at runtime (and not hard-coded). A combine would not achieve the same outcome, and using an external shell script is unnecessary and inefficient.
Read: Beam Programming Guide
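A sketch of a side input in Beam's Python SDK; the single threshold value standing in for the run-time data is my own toy example:

```python
import apache_beam as beam
from apache_beam.pvalue import AsSingleton

with beam.Pipeline() as p:
    activity = p | "activity" >> beam.Create([("u1", 3), ("u2", 8), ("u3", 1)])

    # Data only known at run time (e.g. read from a config store), passed to the
    # transform as a side input instead of being hard-coded.
    threshold = p | "threshold" >> beam.Create([5])

    flagged = activity | beam.Map(
        lambda kv, limit: (kv[0], kv[1], kv[1] > limit),
        limit=AsSingleton(threshold),
    )
    flagged | beam.Map(print)  # e.g. ('u2', 8, True)
```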
Example: You need to design a pipeline that can ingest batch data from your organization’s application metrics as well as your user database, then join the data using a common key before outputting to BigQuery. What is the most efficient way to go about this?
Answer: Create a Cloud Dataflow pipeline and join the two PCollections using CoGroupByKey transform on the common key. CoGroupByKey performs a relational join of two or more key/value PCollections that have the same key type; “common key” is the magic clue in this question.
Read: Beam Programming Guide
Example: You are setting up multiple MySQL databases on Compute Engine. You need to collect logs from your MySQL applications for audit purposes. How should you approach this?
Answer: Install the Stackdriver Logging agent on your database instances and configure the fluentd plugin to read and export your MySQL logs into Stackdriver Logging. The Stackdriver Logging agent requires the fluentd plugin to be configured to read logs from your database application. Not Stackdriver Monitoring, not Cloud Composer. Cloud Composer is used for managing workflows, not logging. Stackdriver Monitoring is useful for measuring performance metrics and alerts, but not for logs.
Read: About the Logging agent | Cloud Logging | Google Cloud
Example: You want to make changes to a Cloud Dataflow pipeline that is currently in production, which reads data from Cloud Storage and writes the output back to Cloud Storage. What is the easiest and safest way to test changes while in development?
Answer: Use a DirectRunner to test-run the pipeline using local compute power, and a staging storage bucket. Using a DirectRunner configuration with a staging storage bucket is the quickest and easiest way of testing a new pipeline, without risking changes to a pipeline that is currently in production.
Read: Direct Runner
Example: You are asked to investigate a Bigtable instance that is performing poorly. Each row in the table represents a record from an IoT device and contains 128 different metrics, each in its own column, with each metric containing a 32-bit integer. How could you modify the design to improve performance?
Answer: Store the metrics in a single column by using delimiters. Make sure the cluster is using SSD disks. Large numbers of cells in a row can cause poor performance in Cloud Bigtable. When the data itself is so small, as in this scenario, it would be more efficient to simply retrieve all of the metrics from a single cell, and use delimiters inside the cell to separate the data. Row versioning would compound the problem by creating the most new entries along the least efficient dimension of the table, and HDD disks will always slow things down.
Read: Understanding Cloud Bigtable performance
Operationalizing machine learning models
I started with the Google docs straight away as I was already familiar with basic ML concepts, but I think the official Google ML crash course is really useful for exam purposes. I definitely should have started with this one first. It has a lot of reading content as well as videos.
And then there is a quiz in each section so you can check your understanding:
This exam is really machine learning heavy.
Example questions:
Example (L1/L2 regularization): You are attempting to train a TensorFlow model but you are aware that some of your input features will have no significant impact on a prediction. What technique can you employ to discourage model complexity?
Answer: L2 regularization is more relevant when all features have relatively equal weights/influence, which is not the case here. Hyperparameters deal with learning rate, which is not relevant for this question. L1 regularization is able to reduce the weights of less important features to zero or near zero.
Read: Regularization for Simplicity | Machine Learning Crash Course
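A small Keras sketch of L1 regularization on a dense layer; the layer sizes and regularization factor are arbitrary:

```python
import tensorflow as tf

# L1 regularization pushes the weights of uninformative features to (near) zero,
# which discourages model complexity and effectively performs feature selection.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64,
        activation="relu",
        kernel_regularizer=tf.keras.regularizers.l1(0.01),
        input_shape=(20,),  # assume 20 input features
    ),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```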
Example (what is overfitting/underfitting and how to fix it): You are training a facial detection machine learning model. Your model is suffering from overfitting on your training data. What steps can you take to solve this problem?
Simple answer: To fix underfitting (for example, when the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set), increase the complexity of your model (introduce an additional layer or increase the size of vocabularies). A model overfits when it predicts the training data well but performs poorly on the validation set. To fix overfitting: reduce the number of features (regularization), add more data to increase the variety of samples and better generalize your model, use dropout layers, or reduce the network's capacity by removing layers or reducing the number of elements in the hidden layers. Increasing your regularization parameters also allows you to reduce 'noise' in your model and so reduce overfitting.
Read: https://developers.google.com/machine-learning/crash-course/generalization/peril-of-overfitting and here: Machine learning workflow | AI Platform | Google Cloud
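And a matching sketch of two of the overfitting fixes mentioned above (a dropout layer plus reduced capacity); again, the sizes are arbitrary:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),  # randomly zero 30% of activations during training
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```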
Synthetic features
Simple answer:
A feature not present among the input features but created from one or more of them. Kinds of synthetic features include:
- Bucketing a continuous feature into range bins.
- Multiplying (or dividing) one feature value by another feature value, or by itself.
- Creating a feature cross: a synthetic feature formed by crossing (taking the Cartesian product of) individual binary features. Feature crosses help represent nonlinear relationships.
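A toy sketch of bucketing and a feature cross in plain pandas; the bin edges and labels are arbitrary:

```python
import pandas as pd

df = pd.DataFrame(
    {"latitude": [37.8, 40.7, 34.1], "longitude": [-122.4, -74.0, -118.2]}
)

# Bucketing: turn continuous features into range bins.
df["lat_bin"] = pd.cut(df["latitude"], bins=[30, 36, 42], labels=["south", "north"])
df["lon_bin"] = pd.cut(df["longitude"], bins=[-125, -100, -70], labels=["west", "east"])

# Feature cross: the Cartesian product of the bucketed features, letting a linear
# model learn a separate weight per (lat_bin, lon_bin) cell, i.e. a nonlinearity.
df["lat_x_lon"] = df["lat_bin"].astype(str) + "_x_" + df["lon_bin"].astype(str)
print(df)
```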
Example (Google Machine Learning APIs): You are developing an application that will only recognize and tag specific business-to-business product logos in images. You do not have an extensive background working with machine learning models, but need to get your application working. What is the current best method to accomplish this task?
Answer: Use the AutoML Vision service to train a custom model using the Vision API. Cloud Vision API can recognize common logos, but would struggle to find specific business logos on its own. The best option is AutoML, which allows you to take the pre-trained Vision API and apply it to custom images. Creating a custom ML model from scratch would be time-consuming and is not necessary when you can build on existing models.
Read: Cloud Auto ML Vision
Example: You need to quickly add functionality to your application that will allow it to process uploaded user images, extract any text contained within them and perform sentiment analysis on the text. What should you do?
Answer: Call the Cloud Vision API for Optical Character Recognition (OCR) then call the Natural Language API for sentiment analysis. Cloud Vision API for OCR is the quickest way to extract from user uploaded images. The Natural Language API already has a built-in model for sentiment analysis.
Read: Detect text in images | Cloud Vision API | Google Cloud, and Analyzing Sentiment | Cloud Natural Language API | Google Cloud
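A minimal sketch of chaining the two APIs with their Python client libraries (assuming recent google-cloud-vision and google-cloud-language packages):

```python
from google.cloud import vision
from google.cloud import language_v1

def image_text_sentiment(image_bytes: bytes) -> float:
    # Step 1: OCR with the Vision API to pull text out of the uploaded image.
    vision_client = vision.ImageAnnotatorClient()
    ocr = vision_client.text_detection(image=vision.Image(content=image_bytes))
    text = ocr.full_text_annotation.text

    # Step 2: sentiment analysis on the extracted text with the Natural Language API.
    nl_client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    result = nl_client.analyze_sentiment(request={"document": document})
    return result.document_sentiment.score  # -1.0 (negative) .. 1.0 (positive)
```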
Example: You’re developing a mobile application that allows a food processing business to detect fruit that has gone bad. Staff at warehouses will use mobile devices to take pictures of fruit to determine whether it should be discarded. Which GCP services could you use to accomplish this?
Answer: Train an AutoML Vision model using labeled images of fruit that has gone bad. Use AutoML Vision Edge in ML Kit to deploy the custom model to mobile devices using the ML Kit client libraries. The AutoML Vision service is the quickest way to train a custom classification model using image data, and Vision Edge in ML Kit allows the model to be deployed to Android and iOS devices. Images must be labelled in order for the model to be trained. There is no need to use Kubernetes Engine or App Engine.
Read: AutoML Vision API Tutorial | Cloud AutoML Vision | Google Cloud
Example: You have a large number of images that you wish to process through a custom AutoML Vision model. Time is not a factor, but cost is. Which approach should you take?
Answer: Make an asynchronous prediction request for the entire batch of images using the batchPredict method. Batch prediction often offers a lower cost per inference and higher throughput than synchronous (online) prediction. However, batch prediction produces a long-running operation (LRO), meaning that results are only available once the LRO has completed.
Read: Making batch predictions | Cloud AutoML Vision | Google Cloud
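A rough sketch of an asynchronous batch prediction request with the AutoML Python client; the model ID and Cloud Storage paths are placeholders, and the class names assume a recent google-cloud-automl release:

```python
from google.cloud import automl

project_id, model_id = "my-project", "ICN1234567890"  # hypothetical values
prediction_client = automl.PredictionServiceClient()

model_full_id = f"projects/{project_id}/locations/us-central1/models/{model_id}"

input_config = automl.BatchPredictInputConfig(
    gcs_source=automl.GcsSource(input_uris=["gs://my-bucket/batch_inputs.csv"])
)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=automl.GcsDestination(output_uri_prefix="gs://my-bucket/results/")
)

# batch_predict returns a long-running operation (LRO); results only appear in
# Cloud Storage once the operation has completed.
response = prediction_client.batch_predict(
    name=model_full_id, input_config=input_config, output_config=output_config
)
print("Waiting for the batch prediction to finish...")
print(response.result())
```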
ML products
This is a full list, and it is very easy to find all the docs. Just skim through the Overview and Use cases sections.
Speech to text API best practices: https://cloud.google.com/speech-to-text/docs/best-practices
Cloud Vision API: https://cloud.google.com/vision/docs/labels
Natural Language API
Example question: You wish to build an AutoML Natural Language model for classifying some documents with user-defined labels. How can you ensure you are providing quality training data for the model?
Answer: Ensure you provide at minimum 10 training documents per label, but ideally 100 times more documents for the most common label than for the least common label. To achieve the best results when preparing training data for AutoML Natural Language classification models, the minimum number of documents per label is 10, and ideally you should have at least 100 times more documents for the most common label than for the least common label.
Read: Preparing your training data | AutoML Natural Language | Google Cloud
Google Cloud AI Platform
Google Cloud TPUs
Google Glossary of ML terms
Also I would recommend this guide for best practice:
Online practice exams
Google Certified Professional Data Engineer from Linux Academy
Rank: 4/5. Probably the best one, but it still won't cover all the aspects you may face during the exam.
Coursera Google Cloud Professional Data Engineer course
It has a good resources section with PDFs you can use for your exam preparation, and a 7-day free trial as well. The final practice exam includes 25 questions, of which only 4 I found somewhat different from Linux Academy's.
Cloud Academy course
The latest update here was on Jul 10, 2020. Unfortunately you can't take a practice exam for free here, but they have a nice mobile app and lecture transcripts. So I did that one while I was cycling in the gym.
Just click register for the 7-day free trial.
Final tips
- Decide if you really need this certification. Exam preparation is a huge commitment.
- Don't tell anyone.
- Learn more about machine learning. There will be a lot of ML questions.
- Pay attention to ML product features.
- Real exam questions are more complex than the ones you'll face during practice tests.
Recommended read:
A study guide by Ivam Luz: https://docs.google.com/spreadsheets/d/1LUtqhOEjUMySCfn3zj8Arhzcmazr3vrPzy7VzJwIshE/edit?usp=sharing
Translated from: https://towardsdatascience.com/how-i-passed-google-professional-data-engineer-exam-in-2020-2830e10658b6