Chapter 7: Databases and AWS
- B. Amazon RDS is best suited for traditional OLTP transactions. Amazon Redshift, on the other hand, is designed for OLAP workloads. Amazon Glacier is designed for cold archival storage.
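- Traditional OLTP workloads generally run on RDS. AWS RDS supports the following engines:
- Aurora engine: compatible with MySQL and PostgreSQL; up to 5x the throughput of MySQL and 3x the throughput of PostgreSQL; 64 TB of storage; six-way replication across three AZs; up to 15 read replicas with replica lag of no more than 10 ms; fault monitoring, with failover within 30 seconds.
- MySQL engine: supports cross-region read replicas, up to 32 vCPUs and 244 GB of memory, 16 TB of storage, automated backups, and point-in-time recovery.
- MariaDB engine: supports cross-region read replicas, up to 32 vCPUs and 244 GB of memory, 16 TB of storage, automated backups, and point-in-time recovery, plus global transaction IDs and thread pooling.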
- PostgreSQL engine: highly stable and reliable; often chosen as a migration target for Oracle workloads.
- SQL Server engine: available in four editions: Express (10 GB of storage), Web, Standard, and Enterprise.
- Oracle engine: available in four editions: Enterprise, Standard (32 vCPUs), Standard One (16 vCPUs), and Standard Two (16 vCPUs).
- D. Amazon DynamoDB is best suited for non-relational databases. Amazon RDS and Amazon Redshift are both structured relational databases.
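- DynamoDB is a non-relational database and a typical example of a NoSQL database.
- RDS and Redshift are both structured, relational databases.
- DynamoDB's closest open-source counterpart is Cassandra.
- To create a global table, make sure the table is empty and that DynamoDB Streams is enabled.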
- C. In this scenario, the best idea is to use read replicas to scale out the database and thus maximize read performance. When using Multi-AZ, the secondary database is not accessible and all reads and writes must go to the primary or any read replicas.
- Separating reads from writes is the best way to improve read performance here.
- A. Amazon Redshift is best suited for traditional OLAP transactions. While Amazon RDS can also be used for OLAP, Amazon Redshift is purpose-built as an OLAP data warehouse.
- Redshift is the AWS solution for OLAP; it handles online analytical processing workloads.
- B. DB Snapshots can be used to restore a complete copy of the database at a specific point in time. Individual tables cannot be extracted from a snapshot.
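- Because RDS automated backups are enabled, the data can be recovered by restoring a snapshot taken at a given point in time.
- A restore cannot extract individual tables; it always restores the whole database.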
- A. All Amazon RDS database engines support Multi-AZ deployment.
- Every RDS database engine supports Multi-AZ deployment: Aurora, MySQL, SQL Server, Oracle, MariaDB, and PostgreSQL.
- B. Read replicas are supported by MySQL, MariaDB, PostgreSQL, and Aurora.
- The engines that support read replicas are MySQL, MariaDB, PostgreSQL, and Aurora.
- A. You can force a failover from one Availability Zone to another by rebooting the primary instance in the AWS Management Console. This is often how people test a failover in the real world. There is no need to create a support case.
- To test RDS Multi-AZ, enable the Multi-AZ deployment and then reboot the primary database.
- D. Monitor the environment while Amazon RDS attempts to recover automatically. AWS will update the DB endpoint to point to the secondary instance automatically.
- With Multi-AZ enabled, the standby takes over all traffic automatically when the primary goes down; no manual intervention is needed.
- A. Amazon RDS supports Microsoft SQL Server Enterprise edition and the license is available only under the BYOL model.
- SQL Server on AWS supports the BYOL model, meaning you launch it with a license you bring yourself.
- B. General Purpose (SSD) volumes are generally the right choice for databases that have bursts of activity.
- General Purpose (SSD) is enough here: it accrues burst credits, so it can absorb a sudden, short-lived jump in I/O demand.
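The burst-credit behaviour can be made concrete with a little arithmetic. The sketch below assumes the gp2 figures AWS has published (a baseline of 3 IOPS per GiB with a floor of 100 IOPS, a burst ceiling of 3,000 IOPS, and a full bucket of 5.4 million I/O credits). None of these numbers come from the notes above, and they may change, so treat them as assumptions.

```python
# Assumed gp2 parameters (from AWS's published figures, not from these notes).
BASELINE_IOPS_PER_GIB = 3
MIN_BASELINE_IOPS = 100
BURST_IOPS = 3000
FULL_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS that a volume of this size earns continuously."""
    return max(MIN_BASELINE_IOPS, BASELINE_IOPS_PER_GIB * size_gib)


def burst_seconds(size_gib: int) -> float:
    """How long a full credit bucket can sustain the burst ceiling."""
    base = baseline_iops(size_gib)
    if base >= BURST_IOPS:
        # The baseline already reaches the ceiling, so credits never run out.
        return float("inf")
    # Credits drain at (burst rate - refill rate) per second.
    return FULL_CREDIT_BUCKET / (BURST_IOPS - base)


if __name__ == "__main__":
    for size in (20, 100, 500):
        print(size, baseline_iops(size), round(burst_seconds(size)))
```

Under these assumptions a 100 GiB volume has a 300 IOPS baseline and can hold 3,000 IOPS for 2,000 seconds from a full bucket, which is why short bursts are cheap but sustained load calls for Provisioned IOPS.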
- B. NoSQL databases like Amazon DynamoDB excel at scaling to hundreds of thousands of requests with key/value access to user profile and session.
- Note that the requirement is to store session data. A NoSQL database is the best fit for that, so DynamoDB matches well.
- A, C, D. DB snapshots allow you to back up and recover your data, while read replicas and a Multi-AZ deployment allow you to replicate your data and reduce the time to failover.
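- Snapshots let us back up and restore the database's data.
- Read replicas and Multi-AZ deployments replicate the data, so service can be restored quickly with little or no data loss.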
- C, D. Amazon RDS allows for the creation of one or more read-replicas for many engines that can be used to handle reads. Another common pattern is to create a cache using Memcached and Amazon ElastiCache to store frequently used queries. The secondary slave DB Instance is not accessible and cannot be used to offload queries.
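- The goal is to take read load off the primary. The standby in a Multi-AZ deployment is not accessible, so it cannot help.
- The only workable strategies are creating read replicas or adding a cache with ElastiCache.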
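The caching option usually follows a cache-aside pattern: look in the cache first and go to the database only on a miss. The sketch below is a minimal illustration in which a plain dictionary stands in for a Memcached cluster; the class name `CacheAside` and the `load_from_db` callback are hypothetical and not part of any AWS API.

```python
from typing import Callable, Dict


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[str, str] = {}  # stand-in for a Memcached cluster
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]       # cache hit: the database is not touched
        self.db_reads += 1
        value = self._load_from_db(key)   # cache miss: read from the database
        self._cache[key] = value          # populate the cache for later readers
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes so stale data is not served."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    fake_db = {"user:1": "alice"}  # hypothetical data
    cache = CacheAside(lambda key: fake_db[key])
    cache.get("user:1")
    cache.get("user:1")
    print(cache.db_reads)
```

Repeated reads of the same key reach the database only once, which is the load reduction the answer is after.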
- A, B, C. Protecting your database requires a multilayered approach that secures the infrastructure, the network, and the database itself. Amazon RDS is a managed service and direct access to the OS is not available.
- With AWS RDS you cannot directly access the operating system of the instance that hosts the database.
- A, B, C. Vertically scaling up is one of the simpler options that can give you additional processing power without making any architectural changes. Read replicas require some application changes but let you scale processing power horizontally. Finally, busy databases are often I/O-bound, so upgrading storage to General Purpose (SSD) or Provisioned IOPS (SSD) can often allow for additional request processing.
- Ways to raise RDS performance quickly: move to a more powerful instance class, add read replicas, and use Provisioned IOPS (SSD) storage.
- C. Query is the most efficient operation to find a single item in a large table.
- Query is the most efficient way to look up a single item.
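A toy comparison shows why Query wins for single-item lookups. This is an in-memory illustration, not DynamoDB code: `scan_for` models a Scan that reads every item and filters afterwards, while `query_for` models a lookup through the key. The function names and the `user_id` attribute are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Item = Dict[str, str]


def scan_for(items: List[Item], attribute: str, value: str) -> Tuple[Optional[Item], int]:
    """Scan: read every item in the table and keep the first one that matches."""
    examined = 0
    match: Optional[Item] = None
    for item in items:
        examined += 1
        if match is None and item[attribute] == value:
            match = item
    return match, examined


def query_for(index: Dict[str, Item], key_value: str) -> Tuple[Optional[Item], int]:
    """Query: go straight to the item through its key."""
    item = index.get(key_value)
    return item, (1 if item is not None else 0)


if __name__ == "__main__":
    table = [{"user_id": f"u{i}"} for i in range(1000)]
    by_key = {item["user_id"]: item for item in table}
    print(scan_for(table, "user_id", "u500")[1], query_for(by_key, "u500")[1])
```

Both calls return the same item, but the scan examines all 1,000 items while the keyed lookup examines one.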
- A. Using the Username as a partition key will evenly spread your users across the partitions. Messages are often filtered down by time range, so Timestamp makes sense as a sort key.
- Using Username as the partition key and Timestamp as the sort key is the sensible design.
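The key design can be sketched with a small in-memory model: items are grouped by partition key and kept sorted by sort key, so a time-range query for one user touches only that user's slice. `MessageTable` is a hypothetical class for illustration and assumes integer timestamps; it does not reflect how DynamoDB is implemented internally.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class MessageTable:
    """Toy table: Username is the partition key, Timestamp is the sort key."""

    def __init__(self) -> None:
        # Each partition keeps its (timestamp, text) pairs sorted by timestamp.
        self._partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def put(self, username: str, timestamp: int, text: str) -> None:
        bisect.insort(self._partitions[username], (timestamp, text))

    def query(self, username: str, start: int, end: int) -> List[str]:
        """Return one user's messages with start <= timestamp <= end, oldest first."""
        rows = self._partitions.get(username, [])
        lo = bisect.bisect_left(rows, (start, ""))
        hi = bisect.bisect_left(rows, (end + 1, ""))  # relies on integer timestamps
        return [text for _, text in rows[lo:hi]]


if __name__ == "__main__":
    table = MessageTable()
    table.put("alice", 30, "third")
    table.put("alice", 10, "first")
    table.put("alice", 20, "second")
    table.put("bob", 15, "hello")
    print(table.query("alice", 10, 20))
```

The range lookup is a binary search inside one partition, which mirrors why filtering by time works well when Timestamp is the sort key.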
- B, D. You can only have a single local secondary index, and it must be created at the same time the table is created. You can create many global secondary indexes after the table has been created.
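- A local secondary index must be created together with the table and cannot be added afterwards. The answer key allows only one, although DynamoDB's documented quota is five per table.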
- B, C. Amazon Redshift is an Online Analytical Processing (OLAP) data warehouse designed for analytics, Extract, Transform, Load (ETL), and high-speed querying. It is not well suited for running transactional applications that require high volumes of small inserts or updates.
- Redshift targets OLAP use cases and is well suited to data warehousing and data analysis.
Key Points Summary
Know what a relational database is. A relational database consists of one or more tables. Communication to and from relational databases usually involves simple SQL queries, such as “Add a new record,” or “What is the cost of product x?” These simple queries are often referred to as OLTP.
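- A relational database is made up of one or more tables, and you interact with it through SQL queries such as adding a record or looking up a product's price. Simple queries like these are typical OLTP work.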
Understand which databases are supported by Amazon RDS. Amazon RDS currently supports six relational database engines: Microsoft SQL Server, MySQL, Oracle, PostgreSQL, MariaDB, and Amazon Aurora.
- RDS currently supports six engines: SQL Server, MySQL, Oracle, PostgreSQL, MariaDB, and Aurora.
Understand the operational benefits of using Amazon RDS. Amazon RDS is a managed service provided by AWS. AWS is responsible for patching, antivirus, and management of the underlying guest OS for Amazon RDS. Amazon RDS greatly simplifies the process of setting a secondary slave with replication for failover and setting up read replicas to offload queries. Remember that you cannot access the underlying OS for Amazon RDS DB instances. You cannot use Remote Desktop Protocol (RDP) or SSH to connect to the underlying OS. If you need to access the OS, install custom software or agents, or want to use a database engine not supported by Amazon RDS, consider running your database on Amazon EC2 instead.
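- Know the benefits of RDS. It is a managed service: AWS handles patching, antivirus, and maintenance of the guest OS, and RDS greatly simplifies setting up a standby for failover and read replicas. You cannot access the EC2 instance behind RDS, and you cannot use RDP or SSH to reach its OS. If you need OS access, custom software or agents, or an engine that RDS does not support, deploy the database on EC2 yourself.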
Know that you can increase availability using Amazon RDS Multi-AZ deployment. Add fault tolerance to your Amazon RDS database using Multi-AZ deployment. You can quickly set up a secondary DB Instance in another Availability Zone with Multi-AZ for rapid failover.
- Know how Multi-AZ increases RDS availability. A Multi-AZ deployment adds fault tolerance by keeping a second DB instance in another AZ, ready for rapid failover.
Understand the importance of RPO and RTO. Each application should set RPO and RTO targets to define the amount of acceptable data loss and also the amount of time required to recover from an incident. Amazon RDS can be used to meet a wide range of RPO and RTO requirements.
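- Understand why RPO and RTO matter. Every application should set an RPO target (how much data loss is acceptable after an incident) and an RTO target (how long recovery may take). RDS can meet a wide range of RPO and RTO requirements.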
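A quick way to reason about these targets is to compare a backup plan's worst-case data loss and restore time against them. The sketch below is a simplified model: `RecoveryPlan` and all the numbers in the usage lines are hypothetical, and real recovery times depend on database size and engine.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_min: float  # time between consecutive recovery points
    restore_time_min: float     # time needed to bring the database back

    def worst_case_data_loss_min(self) -> float:
        # A failure just before the next recovery point loses a full interval of writes.
        return self.backup_interval_min

    def meets(self, rpo_min: float, rto_min: float) -> bool:
        """True when worst-case loss fits the RPO and restore time fits the RTO."""
        return (self.worst_case_data_loss_min() <= rpo_min
                and self.restore_time_min <= rto_min)


if __name__ == "__main__":
    daily_snapshots = RecoveryPlan(backup_interval_min=24 * 60, restore_time_min=60)
    frequent_log_backups = RecoveryPlan(backup_interval_min=5, restore_time_min=30)
    # Hypothetical targets: lose at most 1 hour of data, recover within 2 hours.
    print(daily_snapshots.meets(rpo_min=60, rto_min=120))
    print(frequent_log_backups.meets(rpo_min=60, rto_min=120))
```

With these example numbers the daily-snapshot plan misses the one-hour RPO even though its restore time is acceptable, which shows that the two targets have to be checked separately.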
Understand that Amazon RDS handles Multi-AZ failover for you. If your primary Amazon RDS Instance becomes unavailable, AWS fails over to your secondary instance in another Availability Zone automatically. This failover is done by pointing your existing database endpoint to a new IP address. You do not have to change the connection string manually; AWS handles the DNS change automatically.
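- Understand how RDS handles Multi-AZ failover. When the primary becomes unavailable, AWS automatically fails over to the standby in another AZ by pointing the existing database endpoint at a new IP address. No connection string has to be changed by hand; the switch happens through DNS resolution.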
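The endpoint mechanism can be illustrated with a small simulation. Nothing here calls AWS: `FakeDns` and `MultiAzDeployment` are hypothetical stand-ins, and the endpoint name and IP addresses are invented. The point is only that the client keeps using one name while the address behind it changes.

```python
from typing import Dict


class FakeDns:
    """Stand-in for DNS: maps an endpoint name to whichever IP is current."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def point(self, name: str, ip: str) -> None:
        self._records[name] = ip

    def resolve(self, name: str) -> str:
        return self._records[name]


class MultiAzDeployment:
    """Toy primary/standby pair that shares a single endpoint name."""

    def __init__(self, dns: FakeDns, endpoint: str, primary_ip: str, standby_ip: str) -> None:
        self.dns = dns
        self.endpoint = endpoint
        self.primary_ip = primary_ip
        self.standby_ip = standby_ip
        dns.point(endpoint, primary_ip)

    def fail_over(self) -> None:
        # Promote the standby and repoint the same endpoint name at it.
        self.primary_ip, self.standby_ip = self.standby_ip, self.primary_ip
        self.dns.point(self.endpoint, self.primary_ip)


def connect(dns: FakeDns, endpoint: str) -> str:
    """The client knows only the endpoint name and resolves it on every connect."""
    return dns.resolve(endpoint)


if __name__ == "__main__":
    dns = FakeDns()
    db = MultiAzDeployment(dns, "mydb.example.internal", "10.0.1.10", "10.0.2.10")
    before = connect(dns, db.endpoint)
    db.fail_over()
    after = connect(dns, db.endpoint)
    print(before, after)
```

In practice this is why applications should connect by endpoint name and avoid caching the resolved IP for long periods.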
Remember that Amazon RDS read replicas are used for scaling out and increased performance. This replication feature makes it easy to scale out your read-intensive databases. Read replicas are currently supported in Amazon RDS for MySQL, MariaDB, PostgreSQL, and Amazon Aurora. You can create one or more replicas of a database within a single AWS Region or across multiple AWS Regions. Amazon RDS uses native replication to propagate changes made to a source DB Instance to any associated read replicas. Amazon RDS also supports cross-region read replicas to replicate changes asynchronously to another geography or AWS Region.
- Remember that RDS read replicas are for scaling out and improving performance, and they make it easy to scale read-intensive databases. You can create one or more replicas within a single Region or across Regions. RDS uses the engine's native replication to propagate changes from the source DB to its replicas, and cross-region replication is asynchronous.
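Read replicas only help if the application sends reads to them, which is the "application change" mentioned in the answers. The sketch below shows one simple way to do that: writes go to the primary and reads rotate across replicas. `ReadWriteRouter` and the endpoint strings are hypothetical, and classifying a statement by its leading `SELECT` is a deliberate simplification.

```python
import itertools
from typing import Iterator, List


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas round-robin."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers: Iterator[str] = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary.example", ["replica-1.example", "replica-2.example"])
    for statement in ("SELECT * FROM users", "SELECT 1", "UPDATE users SET name = 'x'"):
        print(statement, "->", router.endpoint_for(statement))
```

Because replication is asynchronous, a read that must see a write made a moment ago should still go to the primary.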
Know what a NoSQL database is. NoSQL databases are non-relational databases, meaning that you do not have to have an existing table created in which to store your data. NoSQL databases come in the following formats:
Document databases
Graph stores
Key/value stores
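Wide-column stores
- Understand what a NoSQL database is. NoSQL databases are non-relational, which means you do not have to create a table up front to store your data. They come in these forms: document databases, graph stores, key/value stores, and wide-column stores.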
Remember that Amazon DynamoDB is the AWS NoSQL service. You should remember that for NoSQL databases, AWS provides a fully managed service called Amazon DynamoDB. Amazon DynamoDB is an extremely fast NoSQL database with predictable performance and high scalability. You can use Amazon DynamoDB to create a table that can store and retrieve any amount of data and serve any level of request traffic. Amazon DynamoDB automatically spreads the data and traffic for the table over a sufficient number of partitions to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance.
- DynamoDB is the fully managed NoSQL service on AWS. It is very fast, with predictable performance and high scalability. You can create tables that store any amount of data and serve any level of request traffic, and DynamoDB automatically spreads the data and traffic across enough partitions while keeping performance consistent and fast.
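Spreading data across partitions is typically done by hashing the partition key. The sketch below illustrates the general idea with SHA-256 and a fixed partition count; DynamoDB's actual hash function and partition management are internal to the service, so the function names and the choice of hash here are assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number via a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in keys)


if __name__ == "__main__":
    usernames = [f"user-{i}" for i in range(1000)]  # hypothetical keys
    for partition, count in sorted(spread(usernames, 4).items()):
        print(partition, count)
```

A key with many distinct values, such as a username, spreads items roughly evenly, whereas a key with only a few distinct values would pile items onto a few partitions.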
Know what a data warehouse is. A data warehouse is a central repository for data that can come from one or more sources. This data repository would be used for query and analysis using OLAP. An organization’s management typically uses a data warehouse to compile reports on specific data. Data warehouses are usually queried with highly complex queries.
- Know what a data warehouse is: a central repository that stores data from one or more sources and is used for querying and analysis. The typical use in an organization is generating reports, and the queries run against a warehouse are usually highly complex.
Remember that Amazon Redshift is the AWS data warehouse service. You should remember that Amazon Redshift is Amazon’s data warehouse service. Amazon Redshift organizes the data by column instead of storing data as a series of rows. Because only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, which greatly improves query performance. Another advantage of columnar data storage is the increased compression, which can further reduce overall I/O.
- Redshift is the data warehouse service on AWS. It stores data by column; because a query reads only the columns it needs, columnar storage answers queries faster with less I/O. Columns also compress better, which reduces I/O further.
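The I/O argument for columnar storage can be seen in a toy comparison. The sketch below sums one column from a row layout and from a column layout and counts how many values each approach has to read, then run-length encodes a repetitive column. It is an in-memory illustration with made-up sample data; run-length encoding is used only as a simple example of why columns compress well and is not a description of Redshift's internals.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot a list of rows into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(rows: List[Row], column: str) -> Tuple[int, int]:
    """Row layout: every row is read in full just to reach one field."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_from_columns(columns: Dict[str, List[Any]], column: str) -> Tuple[int, int]:
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    encoded: List[Tuple[Any, int]] = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


if __name__ == "__main__":
    sales = [
        {"region": "us", "product": "a", "amount": 10},
        {"region": "us", "product": "b", "amount": 20},
        {"region": "eu", "product": "a", "amount": 5},
    ]
    cols = to_columns(sales)
    print(sum_from_rows(sales, "amount"), sum_from_columns(cols, "amount"))
    print(run_length_encode(cols["region"]))
```

Both layouts produce the same total, but the row layout reads nine values and the column layout reads three, and the gap widens as tables gain more columns.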