SQL or NoSQL: Google App Engine, Part 2

In the first part of this series, we looked at relational databases and how NoSQL differs from them. In this part we will look at the Google App Engine Datastore, which is one of the options for storing your data with Google App Engine. There are two other options: Google Cloud SQL, a relational database in the cloud based on MySQL, and Google Cloud Storage, a storage service for files and objects up to terabytes in size.

Google App Engine Datastore

The Datastore is an infinitely scalable, schemaless object store, right at your disposal. It handles data quite differently from an RDBMS, which is why it avoids many of the shortcomings of an RDBMS. At the same time, it comes with its own set of restrictions and requires different ways of modelling your data and building the data access layer of your application. Let’s look at some features of the Google App Engine Datastore.

Schemaless

The Datastore doesn’t require a fixed schema for your data. It’s an object store: you can throw objects at it and it will store them. Let’s work through a practical example. Say we need to store business card information for an application. First name, last name and email address are mandatory fields, and there are optional fields such as mobile number, LinkedIn URL and Twitter handle. When we store an entity of type “Business”, we of course need to supply the mandatory fields, but optional fields are stored only when they are available. So one entity might have only a Twitter handle stored, while another might have a Twitter handle and a mobile number. Let’s look at a JSON representation of two such objects to see what the entities look like:

Entity 1

{
  "firstName": "John",
  "LastName": "Taylor",
  "Email": "john@gmail.com",
  "twitter_id": "john_t"
}

Entity 2

{
  "firstName": "Tom",
  "LastName": "Rogers",
  "Email": "trogers@gmail.com",
  "twitter_id": "trogers",
  "mobileNumber": "567 555 1256"
}

Let’s compare this with how it would be modeled in a relational DB. There, we would create a table with one column for every mandatory and every optional field. For entities that have no value for an optional field, the column is filled with “null”. Adding a new field to the entity means changing the table structure and populating a value for that field in every existing row.
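
To make the contrast concrete, here is a minimal Python sketch of the schemaless idea. It is not App Engine code: the names REQUIRED_FIELDS, make_business_card and store are hypothetical, a plain dict stands in for the Datastore, and the third card is made-up sample data. The application, not the store, enforces the mandatory fields, and each entity carries only the optional fields it actually has.

```python
# Hypothetical names throughout: REQUIRED_FIELDS, make_business_card and the
# in-memory `store` dict are illustrations, not part of any App Engine API.
REQUIRED_FIELDS = ("firstName", "LastName", "Email")


def make_business_card(**fields):
    """Return an entity dict, rejecting it if a mandatory field is missing."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    # Optional fields are kept only when a value was actually supplied.
    return {key: value for key, value in fields.items() if value is not None}


store = {}  # stands in for the Datastore: id -> entity
store[1] = make_business_card(firstName="John", LastName="Taylor",
                              Email="john@gmail.com", twitter_id="john_t")
store[2] = make_business_card(firstName="Tom", LastName="Rogers",
                              Email="trogers@gmail.com", twitter_id="trogers",
                              mobileNumber="567 555 1256")

# A brand-new optional field ("website", invented for this example) needs no
# migration: the two existing entities are simply left untouched.
store[3] = make_business_card(firstName="Ann", LastName="Lee",
                              Email="ann@example.com", linkedin_url=None,
                              website="https://example.com/ann")
```

In the relational version, the step that adds "website" would be a table alteration affecting every row; here it only affects the entity that uses it.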

Infinitely Scalable

You can store as much data in the Datastore as you like (leaving the per-GB cost aside for a moment) and none of your queries will slow down. Performance-wise, fetching five entities out of 50 is no different from fetching five entities out of 50 million. Query runtime grows only with the size of your result set, not with the size of the data set.

If you recall the discussion we had in part 1 of this series, you’ll quickly ask yourself how the Datastore is able to shard automatically when an RDBMS can’t. This property is due to the way data is modeled inside the Datastore. Instead of spreading attributes across several relations, all information about a single entity is kept in one place. All entities are then ordered by their unique id. A simple algorithm can split this ordered list of entities by id and store the resulting shards on separate machines. The same algorithm can then be used to route every request to the appropriate machine.
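
The splitting and routing idea can be sketched in a few lines of Python. This is only an illustration of range-based sharding, not the Datastore’s actual algorithm: the split points, the shards list and the function names are all assumptions, and each "machine" is just a dict.

```python
import bisect

# Hypothetical split points: shard 0 holds ids below 1000, shard 1 holds
# 1000 <= id < 5000, shard 2 holds everything from 5000 upwards.
SPLIT_POINTS = [1000, 5000]
shards = [dict() for _ in range(len(SPLIT_POINTS) + 1)]  # one dict per "machine"


def shard_for(entity_id):
    """Return the index of the shard responsible for entity_id."""
    return bisect.bisect_right(SPLIT_POINTS, entity_id)


def put(entity_id, entity):
    """Store the whole entity on the one shard that owns its id."""
    shards[shard_for(entity_id)][entity_id] = entity


def get(entity_id):
    """Route the read with the same function that routed the write."""
    return shards[shard_for(entity_id)].get(entity_id)


put(42, {"firstName": "John"})
put(1000, {"firstName": "Tom"})
put(73000, {"firstName": "Ann"})
```

Because an entity lives entirely on one shard, a read or write touches exactly one machine, which is what lets the number of machines grow without slowing down individual requests.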

Strong consistency vs. eventual consistency

The Datastore guarantees strong consistency only for reads of an entity by its key and for “ancestor queries”; every other query is eventually consistent. So there is a slight chance that a user will not see the most up-to-date version of an entity if it was updated very recently. This is not a big deal for a lot of use cases (“Gosh! This Tweet only showed up now, when it was posted two seconds ago!”). There are ways to trade performance for strong consistency, but I won’t go into them here.
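
The difference between the two guarantees can be shown with a toy model in Python. It is a deliberately simplified mental model, not a description of the Datastore’s internals: the ToyDatastore class and its methods are hypothetical, and the "lag" is simulated by an explicit apply_pending() call.

```python
class ToyDatastore:
    """Toy model: entities are written at once, the query index lags behind."""

    def __init__(self):
        self.entities = {}  # id -> entity, always current
        self.index = {}     # id -> entity as seen by non-ancestor queries
        self.pending = []   # index updates that have not been applied yet

    def put(self, entity_id, entity):
        self.entities[entity_id] = dict(entity)
        self.pending.append((entity_id, dict(entity)))

    def get(self, entity_id):
        """Lookup by key: strongly consistent in this model."""
        return self.entities.get(entity_id)

    def query(self, prop, value):
        """Query served from the index, so it may return stale results."""
        return sorted(eid for eid, e in self.index.items()
                      if e.get(prop) == value)

    def apply_pending(self):
        """Simulate the index catching up some time after the writes."""
        for entity_id, entity in self.pending:
            self.index[entity_id] = entity
        self.pending.clear()


ds = ToyDatastore()
ds.put("tweet1", {"author": "john_t", "text": "Hello"})
print(ds.get("tweet1") is not None)  # lookup by key sees the write at once
print(ds.query("author", "john_t"))  # the index has not caught up yet
ds.apply_pending()
print(ds.query("author", "john_t"))  # now the query sees the new entity
```

The middle query is the "Tweet that shows up two seconds late": the data is safely stored, but a query that relies on the index cannot see it until the index has been updated.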

Sacrificing perfect consistency means there is no need to synchronize all machines immediately, whereas an RDBMS would have to wait until every machine has finished updating the data. The Datastore also won’t check referential integrity, because that would mean reading data from other machines to validate each update. Another benefit of being able to scale horizontally is that the Datastore is impressively fault-tolerant. Many machines in the data center could fail, and your data would most likely still be served without hesitation.

Modeling your data

In the Datastore, you have to model your data based on the queries you want to run in your application. Because all the data for one entity is kept in one place, the Datastore has to build an index for every query you need before that query can be served. This implies that ad hoc queries are not an option with the Datastore. App Engine provides (great) tools like MapReduce to crunch huge datasets and perform analytical tasks, but such tasks take significantly more time to implement than the elegant SQL statement you might be used to.
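
The following Python sketch illustrates what "an index for every query" means in practice. It is a toy under stated assumptions, not the Datastore’s index format: the IndexedKind class, its methods and the third sample card are hypothetical. Each declared index is a sorted list of (value, id) pairs, a query jumps to the first matching row and reads only the matches, and a query without a matching index is refused rather than answered by scanning everything.

```python
import bisect


class IndexedKind:
    """Toy kind that answers only queries backed by a pre-declared index."""

    def __init__(self, indexed_properties):
        # One sorted list of (value, entity_id) pairs per declared index.
        self.indexes = {prop: [] for prop in indexed_properties}
        self.entities = {}

    def put(self, entity_id, entity):
        # Note: overwriting an existing id is not handled in this sketch.
        self.entities[entity_id] = entity
        for prop, index in self.indexes.items():
            if prop in entity:
                bisect.insort(index, (entity[prop], entity_id))

    def query_equal(self, prop, value):
        if prop not in self.indexes:
            raise LookupError("no index for property %r" % prop)
        index = self.indexes[prop]
        # Jump to the first matching row, then read only the matching rows.
        pos = bisect.bisect_left(index, (value,))
        ids = []
        while pos < len(index) and index[pos][0] == value:
            ids.append(index[pos][1])
            pos += 1
        return ids


cards = IndexedKind(["LastName"])
cards.put(1, {"firstName": "John", "LastName": "Taylor"})
cards.put(2, {"firstName": "Tom", "LastName": "Rogers"})
cards.put(3, {"firstName": "Sam", "LastName": "Taylor"})
taylors = cards.query_equal("LastName", "Taylor")
```

The work done by query_equal is a binary search plus one step per result, which is why query time tracks the size of the result set. Asking for cards by "firstName" raises LookupError here, because no such index was declared up front.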

Choosing between SQL and NoSQL datastore

To summarize the discussion so far:

“The Datastore provides a way to persist ‘dumb’ data, which the application turns into information; an RDBMS provides a way to persist structured data, which the application can use directly.”

Remember the discussion in part 1: an RDBMS needs information about the structure of your data in order to provide application-independent services, such as data aggregation at the database level. The Datastore won’t do that. It only cares about the pieces of data it needs to build indexes; the rest of your entity is treated as a sealed blob of bytes.

Let’s look at some scenarios and use cases and evaluate whether an RDBMS or a NoSQL store is the better solution:

Can a single server provide the performance we need, perhaps with the help of caching? In this case an RDBMS is the way to go. Look, for example, into Cloud SQL, App Engine’s purely relational offering.

Do you plan on growing a multi-application environment working on the same dataset? Depending on your data volume, an RDBMS might be what you need, because it separates the database layer very strictly from the application.

Do you need ad hoc queries? In terms of query flexibility, SQL is the clear winner.

Do you require perfect consistency? Even though there are ways to achieve strong consistency in the Datastore, that is not what it was designed for. Again, an RDBMS is the better choice.

Are you expecting millions of reads and writes per second? The Datastore provides automatic scaling to infinity and beyond, and it’s right there for you to use with App Engine.

Do you need a simple, scalable way to persist entities with variable attributes? Even though you’ll have to handle consistency and data aggregation yourself, the schemaless Datastore should be what you need. And it’s integrated right into App Engine, so it’s the ideal choice for quick prototypes with changing entities.

Choosing the right database is a vast topic, but with two great options at hand (Cloud SQL and the Datastore), you at least won’t have to steer away from App Engine. I hope this article has made the decision easier for you, and I wish you all the best with whatever great application you have in mind.

Further reading

SQL vs NoSQL: Battle of the Backends. A very informative and somewhat entertaining presentation at Google I/O 2012

The great App Engine documentation: Provides lots of information about everything Datastore-related

More details on NoSQL: Data modeling

Google App Engine: Database Strategies

Final words

Choosing the right datastore for your application and understanding NoSQL are both big topics. What we have covered in this two-part series is just the tip of the iceberg. Any omission of a major or emerging player in this space is purely inadvertent. We at CloudSpring hope to cover more of this great subject in the near future, so keep watching.

Source: https://www.sitepoint.com/sql-or-nosql-google-app-engine-part-2/
