Amazon Neptune architectures for scale, availability, and insight

Ok, we can get started. Hello everyone. Thank you very much for coming along this morning, at the very start of the week. My name is Ian Robinson and I'm a Principal Graph Architect at AWS. In this session, I'll be talking about how you can take your Amazon Neptune applications to the next level in terms of scale, availability, and the insights you offer graph practitioners.

So I'll start by talking about scaling, or what I sometimes call scaling for success. If your application is successful, if it does the job that it was intended to do very well, then you're invariably going to attract more users and more queries, you may add features that require more complex queries, and you may even introduce new workloads running against the same underlying data. By that, I mean you may have started with some online queries supporting a web application, but you then decide that you want to add some reporting or some lightweight analytics, or you may want to use the very same data to train a machine learning model. So: different workloads running against the same underlying data.

All of these things are drivers for scaling. So I'm going to talk about a couple of the things that you should be looking out for that will help you identify your scaling needs, and then about the decisions you can make with regards to the resources that you assign to meet those needs. In particular, I'll be talking about how you can determine whether Neptune Serverless is a good fit for your workload. And then finally, having assigned the right resources to meet those workload needs, how do you ensure that they're utilized effectively? That's the last part of this scaling section.

So a couple of things to be looking out for with an existing application. The first thing is the situation where all of the worker threads on your database instances are busy most of the time, or all of the time, servicing queries. When that happens, new queries coming in will end up being queued on the server. The server has a server-side queue, and those queries coming in will sit on that queue for a while. That can introduce some additional latency, or some unpredictable latencies, into your application: a query may take just a few milliseconds to execute one time, and the next time round it may take several seconds because it's been sitting behind other queries in that queue. You can monitor the depth of that server-side queue on each instance using the MainRequestQueuePendingRequests CloudWatch metric, and if that is frequently or always above zero, then it's likely you're sending more work to the cluster than it can deal with immediately.

If that's the case, then have a look at the CPU utilization metrics on those instances. If those metrics are really high, 80-90% or more, then it's likely that all of those worker threads are busy doing useful work, so this is probably a good indicator that you need to scale for more worker threads, for more CPU. If that CPU utilization is pretty low, 50% or lower, those threads are perhaps busy, but they're waiting for data to come back from storage. It may be that they don't have the data available to them that they need to process and compute a result, so they're waiting for data to come back from storage. So there's a kind of network I/O wait.

So that brings me to the second thing that you should be looking out for, and this is buffer cache churn. This is the situation where your working set can't necessarily fit into main memory, into the buffer cache on those database instances. Queries are always going to run fastest if the data that they need is already in main memory. But if it's not in memory, then Neptune is going to have to go down to that separate managed storage layer to retrieve one or more data pages, and it will bring them back into the buffer cache. But if that buffer cache is already full when those pages come back, then it's going to have to evict some of the least recently used pages in the cache. And at that point, you're going to be experiencing a lot of buffer cache churn.

And you can see that happening: you'll see the buffer cache hit ratio going down, and if it's frequently going below 99.9%, then your buffer cache is probably smaller than your working set. As that BufferCacheHitRatio goes down, you'll often see VolumeReadIOPs going up, because we're constantly going down to storage to get all of those additional data pages. So the impact here is very similar: additional latencies from the application's point of view, because it's waiting longer for a particular piece of work to be done. But there's also a cost implication, because one of Neptune's charging dimensions is the number of I/O operations. So if you're experiencing a lot of buffer cache churn, you're potentially racking up costs along that dimension, because you're constantly going down to the storage layer.
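As a rough sketch, these are all standard CloudWatch metrics in the AWS/Neptune namespace, so you can keep an eye on them with a few lines of Boto3 (the instance identifier here is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

INSTANCE_ID = "my-neptune-instance-1"  # placeholder instance identifier
cloudwatch = boto3.client("cloudwatch")

def instance_metric(metric_name, stat="Average", hours=24):
    """Fetch a per-instance Neptune metric at 5-minute resolution."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Neptune",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": INSTANCE_ID}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=[stat],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

queue_depth = instance_metric("MainRequestQueuePendingRequests", stat="Maximum")
cache_hits = instance_metric("BufferCacheHitRatio")

if any(d["Maximum"] > 0 for d in queue_depth):
    print("Requests are queueing; check CPUUtilization to decide whether to scale.")
if any(d["Average"] < 99.9 for d in cache_hits):
    print("BufferCacheHitRatio dips below 99.9%; the working set may not fit in memory.")
```

VolumeReadIOPs, which is reported at the cluster level, is worth plotting alongside these to see the cost impact of all those reads against storage.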

So this is an indication that you may want to scale for a larger buffer cache, for more memory. Having identified your scaling needs, what can you do about it? Well, you can scale up with larger instances. Bigger instances give you more worker threads (the number of worker threads is two times the number of virtual CPUs on each instance), and bigger instances will also give you a bigger buffer cache (the size of the buffer cache is approximately two thirds of the main memory available on each instance). But besides scaling up, you can also scale out for high read workloads with additional read replicas; you can add up to 15 read replicas to a cluster.

Now you can do that manually via the console or the CLI or the SDK or a CloudFormation template. But Neptune also has an auto scaling feature that will add and remove read replicas based upon CPU thresholds that you specify or based on a schedule that you supply. I actually think the best use of this feature is scheduled auto scaling. If you have a variable workload but you can predict when it's going to peak (say a workload that peaks every day around about the same time and then tails off around about the same time every day), then you can use scheduled auto scaling to add read replicas to address the peak and then schedule their removal once the traffic has died off.
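Scheduled auto scaling for Neptune read replicas is configured through Application Auto Scaling. A minimal sketch, with placeholder cluster name, capacities and cron schedules:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

RESOURCE_ID = "cluster:my-neptune-cluster"       # placeholder cluster name
DIMENSION = "neptune:cluster:ReadReplicaCount"

# Register the cluster's read-replica count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="neptune",
    ResourceId=RESOURCE_ID,
    ScalableDimension=DIMENSION,
    MinCapacity=1,
    MaxCapacity=8,
)

# Add replicas ahead of the daily peak...
autoscaling.put_scheduled_action(
    ServiceNamespace="neptune",
    ScheduledActionName="scale-out-for-peak",
    ResourceId=RESOURCE_ID,
    ScalableDimension=DIMENSION,
    Schedule="cron(0 8 ? * MON-FRI *)",   # 08:00 UTC, weekdays
    ScalableTargetAction={"MinCapacity": 4, "MaxCapacity": 8},
)

# ...and remove them once the traffic has died off.
autoscaling.put_scheduled_action(
    ServiceNamespace="neptune",
    ScheduledActionName="scale-in-after-peak",
    ResourceId=RESOURCE_ID,
    ScalableDimension=DIMENSION,
    Schedule="cron(0 18 ? * MON-FRI *)",  # 18:00 UTC, weekdays
    ScalableTargetAction={"MinCapacity": 1, "MaxCapacity": 2},
)
```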

And then once you've decided whether you're going to scale up or out, or do both, have a look at the particular instance types or instance families available to you. Different instance types have specific features that help address particular workload needs. So for example, the r5d instance types, in fact any of the instance types that Neptune supports with a 'd' in the name, all support an NVMe-based property lookup cache. This is in addition to the buffer cache, and it's useful for workloads where you have queries that are frequently accessing lots of node and edge properties or RDF literals. If you need a bigger buffer cache but you don't necessarily need more worker threads (you're not seeing a lot of queuing, but you are seeing a lot of buffer cache churn), then look at the x2 instances.

These have a larger memory-to-virtual-CPU ratio than their peers in the other instance families, typically four times the amount of memory per virtual CPU versus the other instance types. And then finally, if you have a variable workload, a workload where you experience peaks and then troughs, very high throughput and then periods where you've got relatively low throughput or the cluster is idle for long periods of time, then you should consider using serverless instances. Serverless instances can scale dynamically in order to address those workload needs, and that's what I'm going to talk about next.

So as you may know, Neptune Serverless instances scale using what are called Neptune Capacity Units. An NCU is equivalent to two gigabytes of RAM and the associated CPU and network, and an instance can scale as low as one NCU and as high as 128 NCUs. And then you can further control that at the cluster level by specifying minimum and maximum values that will apply to all of the serverless instances in that cluster.

The minimum value determines how quickly a new serverless instance or an idle serverless instance is going to scale up. And the maximum value allows you to control or cap the amount of capacity that will be assigned to each serverless instance. These values apply on a per-instance basis, and using that maximum value you can effectively control costs: you know what the maximum spend is going to be, given that maximum capacity.
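A minimal sketch of setting that cluster-level capacity range with Boto3 (the cluster identifier and the capacity values are placeholders to adjust for your workload):

```python
import boto3

neptune = boto3.client("neptune")

neptune.modify_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",   # placeholder
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 2.0,    # NCUs an idle or newly started serverless instance scales from
        "MaxCapacity": 64.0,   # per-instance cap, which also caps the spend
    },
    ApplyImmediately=True,
)
```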

So serverless scales along several different dimensions: CPU utilization, memory utilization and network throughput. Effectively, what it's doing on a second-by-second basis is some of the stuff I was talking about a moment ago: it's looking at the memory demands and CPU demands on that instance.

What I really want to talk about with regards to this slide are some of the situations in which you would choose not to use serverless. So if you have a large, frequent write job, perhaps once a week you bulk load many millions or billions of records into Neptune, these very large write jobs can sometimes cause Neptune to refresh or recalculate the DFE statistics. These are the statistics that are used by the query engine when it's planning a query, and the same statistics that are used by the summary API to give you a summary of the graph.

Now, that recalculation for a very, very large data set can sometimes take quite a while. And if you've got a serverless primary instance, which is where this recalculation takes place, that instance will be at its maximum capacity for many hours or even days, and that's not a good use of serverless. So for these very large, repeated write jobs, I would recommend using a large provisioned instance, even if you choose to use serverless with your read replicas.

Another thing: don't use serverless if you've got very latency-sensitive query requirements. Serverless instances will start scaling up in a matter of seconds as the traffic increases. But until they can acquire all of the capacity that they need in order to service that higher traffic, some of those requests coming in may end up on that server-side queue. And if a worker thread can't acquire enough memory whilst it's generating intermediate results, it will spill to disk. Both of these things can cause some additional latency for some period of time, until that instance has scaled up fully.

And then finally, don't use serverless if you've got a very memory-intensive workload, if the buffer cache requirement is greater than what can be provided by an 8xlarge instance. 128 NCUs is equivalent to an 8xlarge instance, and if you need a buffer cache larger than that, then serverless is probably not for you.

But assuming that your workload isn't constrained in any of those ways, how can you tell whether Neptune Serverless is genuinely a good fit for your needs? Well, it's partly trial and error, but there is a bit of method to it as well. What we recommend is take an existing workload and look at the CPU utilization metrics for a period of time, for a duration that represents all of the variability in that workload, all of the cyclical behaviors, things like that. If the area under that CPU utilization curve is less than, or around about, 50% of the total potential CPU utilization (were it running at its peak for the entire duration), then this is potentially a good candidate for serverless.
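As a rough sketch of that back-of-the-envelope check, assuming hourly CPUUtilization averages over a window that covers the workload's full cycle (the instance identifier is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Neptune",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-neptune-instance-1"}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Average"],
)

values = [d["Average"] for d in resp["Datapoints"]]
peak = max(values)
# Area under the utilization curve, relative to running at peak for the whole window.
ratio = sum(values) / (peak * len(values))

print(f"Peak CPU: {peak:.1f}%, area ratio: {ratio:.2f}")
if ratio <= 0.5:
    print("Under ~50% of peak-for-the-whole-duration: worth trialling serverless.")
else:
    print("Mostly busy: provisioned (plus scheduled auto scaling) may be cheaper.")
```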

And at that point, you can run an experiment: you can flip to running serverless, run your workload for a few days, drive out all of that variability, all of those cyclical behaviors, and then review the costs. Compare the cost of running serverless with the cost of running the provisioned instances.

So how can you estimate the costs of running serverless, or calculate the cost of having run serverless for just a few days? Well, we've produced a very simple little tool, which is in our AWS Labs GitHub repository, called the serverless cost evaluator. It's a command-line tool, and you run it against a serverless instance that's been running for a few days, and it will estimate how much it cost you to run that instance for that period of time. The other nice thing that it does is that it will try to recommend a provisioned instance that would have addressed those workload needs were it running at its peak. So it will recommend a provisioned instance for the peak part of that workload, and it will give you the cost of having run that provisioned instance as well. So now you can start to compare costs and decide whether serverless is appropriate for you.

So I actually ran some experiments in preparation for today. I've got three different variable workloads, and they all peak at around 1,500 requests a second and they're a mixture of reads and writes. The first one I'm going to characterize as an office hours workload. It peaks and runs at its peak for 6, 7, 8 hours a day and then it drops off, and it's a relatively low throughput workload for the remainder of the day, perhaps for another 16 or 18 hours, something like that. And it's idle at the weekends.

The second workload is what I call an international hours workload. So it's very similar, but now it's running at its peak for about 18 hours a day and it's idle only for around about six hours a day. And again, it's idle at the weekends.

And then the third one is an on-off workload. It's constantly peaking: peaks and troughs, high throughput, low throughput, and that's running constantly, seven days a week.

So if you look at that first one, the office hours workload, and we look at that CPU utilization curve, it definitely looks as though the area under that curve is way less than 50% of the total potential CPU utilization. So this seems a good candidate for serverless.

So what I did then was flip to using serverless: I simply provisioned the serverless instance, failed over to that instance and deleted my old provisioned instance without interrupting the workload, and then ran this for a few days and ran the serverless cost evaluator. And it tells me that for that five-day period, serverless cost me approximately $180.

And then it's saying that an equivalent provisioned instance, a good provisioned instance, would be a 2xlarge. This would address the peak needs of that workload, and if you were running that for five days, it would cost just over $140.

And then if I adjust those figures for the seven-day week, the two days at the weekend where it's idle, we can see that serverless is definitely a lot cheaper than provisioned for this particular workload. So this seems to be a very, very good workload for serverless.

The second one is that international hours workload. And as soon as I look at that utilization curve, I can see that it's way more than 50% of the total potential consumption. So right from the outset, I don't think this is a good candidate for serverless. I do think this is a good candidate for that scheduled auto scaling. This is the kind of workload that at 5 or 6 in the morning begins to peak and then drops off after 18 hours or so, and it does that repeatedly, every day or five days a week. It's very easy to schedule the addition of read replicas and then to have those read replicas removed once that workload tails off. And then finally, that on-off workload: I did exactly the same.

It looked to me as though the area under the curve was around about 50%, so it was worthwhile running the experiment. I ran it for a few days; I think I ran the cost evaluator for just three days' worth of metrics (the cost evaluator uses CloudWatch metrics to generate those estimates). It says that serverless cost me just over $30; the equivalent 2xlarge instance would have cost just under $30, $28 or $29. So it's very, very close, and just a small change in that workload, just a couple of hours more every day running at its peak, could potentially make serverless a lot more expensive versus provisioned.

So in this case, I'm slightly on the fence, but I'd probably stick with running provisioned for this particular workload. Ok. So that's identifying some of your scaling needs and looking at the kinds of database resources that you can provision to address those needs.

The next thing I want to talk about is how you can utilize those resources efficiently. Ideally, you want each of those instances that are dedicated to a workload to be doing roughly the same amount of work. If one of them is underutilized, it's going to cost you for no appreciable benefit. So ideally, you want to be able to distribute the work across all the instances that are servicing a particular workload.

And what I want to talk about here is how you can scale graphs for multiple workloads. And I mean two slightly different things here when I talk about multiple workloads. The first is the situation where you've got different query and access patterns running against the same underlying data. So this is the situation in which you may have some online queries backing that web app, and then you introduce some reporting or some analytics, or you decide to train a machine learning model. So very different query access patterns, but all touching the same underlying data set.

The second meaning of multiple workloads in this context is multiple tenants. So you may have multiple clients, multiple customers, each with their own discrete data set, their own discrete component or subgraph within that larger data set. All of those clients or customers are probably running similar query access patterns, but against very discrete data sets.

Now, irrespective of whether it's one or the other, the problems are pretty much the same: you've got multiple workloads that are all competing for resources within the cluster, and in many cases the aggregate working set of all of those workloads is greater than you can support with a single instance. So not all of that data is going to fit into memory; you've got all of these different workloads competing for resources, and they're all competing for the buffer cache.

And what you may see there is a lot of that buffer cache churn, and that brings all of those problems around additional latencies and additional costs, all of those read operations against the underlying storage. And then on top of that, you get no query prioritization. If they're all competing for resources, you may end up in situations where a really important but short-running query is sitting in the queue behind a less important, less critical but long-running query, or is waiting on a long-running query to complete.

So how can you ensure all of these database resources are employed efficiently when you have these multiple workloads? Well, there's a well-established technique called cache sharding, which is effectively routing specific workload traffic to an individual replica or a set of replicas that are dedicated to servicing that workload. That way, hopefully, you end up with a smaller working set. You don't have a very large aggregate working set, you have a smaller working set, so it's more likely you can size instances and the buffer cache to accommodate that working set. You get fewer cache evictions, and then you can tune individual replicas or sets of instances according to the workload needs.

So you've got all of those choices around scaling up, around choosing particular instance types, perhaps adopting the lookup cache or using serverless. And you can do that on a workload-by-workload basis, as long as you consistently route traffic belonging to a specific workload to a particular replica or set of replicas.

So how can you do that? I mean, you can imagine developing some reasonably complex application logic to do that routing, or using load balancers to route traffic appropriately. But the things that you need to think about are how you are going to accommodate changes in the cluster topology. And your cluster is going to change: you're going to add read replicas, you may remove replicas, you may experience a situation where a replica is promoted to primary. That cluster topology is going to change. How do you ensure that you can accommodate those changes in all of your routing logic? We don't really want you to spend time worrying about that.

Fortunately, there is something in Neptune, an inbuilt feature today, that can help you with that, and that's custom endpoints. Custom endpoints are like the reader endpoint, but you get to decide the membership set, and then the custom endpoint is going to balance connections across all of the instances that belong to that particular endpoint's membership set. You get a couple of very simple controls in terms of defining how you create these custom endpoints: you can use include lists or exclude lists. So I could create a custom endpoint that only includes those two readers, or I could create a custom endpoint that includes the primary and a reader, so that I can utilize the primary: if it's underutilized for writing, I might want to include it for servicing some read requests.
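As a minimal sketch, custom endpoints are created through the management API with static include or exclude lists; the cluster and instance identifiers below are placeholders:

```python
import boto3

neptune = boto3.client("neptune")

# A reader-only endpoint for two specific replicas.
neptune.create_db_cluster_endpoint(
    DBClusterIdentifier="my-neptune-cluster",
    DBClusterEndpointIdentifier="analytics",
    EndpointType="READER",
    StaticMembers=["my-neptune-replica-2", "my-neptune-replica-3"],
)

# An endpoint that also includes the primary, for read traffic when the writer is underutilized.
neptune.create_db_cluster_endpoint(
    DBClusterIdentifier="my-neptune-cluster",
    DBClusterEndpointIdentifier="reads-plus-writer",
    EndpointType="ANY",
    StaticMembers=["my-neptune-primary", "my-neptune-replica-1"],
)
```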

Custom endpoints have the advantage that it's really simple to configure your application. You simply need to configure the application with the specific endpoint addresses for each of those endpoints, and it works across all of the different query languages and data models. There are no special client drivers or anything like that that you need to configure. It's just going to work, irrespective of whether you're using property graph or RDF, openCypher, Gremlin or SPARQL.

There are some downsides as well. Custom endpoints suffer from the same problem that the reader endpoint suffers from, which is connection swarms. The way these things are implemented, the IP address to which that endpoint resolves changes every five seconds. If you open a large connection pool and open all those connections around about the same time, they're all going to get locked on, typically to a single instance. And then, because they're long-lived connections, you stay locked on to that instance for a very long period of time, tens of minutes, hours even. So you can often end up in this situation where, despite having an endpoint that's pointing at many instances, all of your traffic is just going to one instance.

So the thing to do here is recycle connections, recycle connections every 10 seconds or so, and if possible turn off or reduce the time-to-live in any DNS caches. That way you'll get a more even distribution of traffic across the instances, whether you're using the reader endpoint or a custom endpoint.
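A minimal sketch of that recycling pattern using the open source Gremlin Python driver; the endpoint address and the 10-second interval are placeholders:

```python
import time
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

ENDPOINT = "wss://my-custom-endpoint.cluster-custom-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin"
RECYCLE_AFTER_SECONDS = 10

class RecyclingConnection:
    """Reopen the connection periodically so that traffic spreads across
    whichever instance the endpoint's DNS name currently resolves to."""

    def __init__(self):
        self._conn = None
        self._opened_at = 0.0

    def g(self):
        now = time.monotonic()
        if self._conn is None or now - self._opened_at > RECYCLE_AFTER_SECONDS:
            if self._conn is not None:
                self._conn.close()
            self._conn = DriverRemoteConnection(ENDPOINT, "g")  # resolves DNS afresh
            self._opened_at = now
        return traversal().withRemote(self._conn)

conn = RecyclingConnection()
print(conn.g().V().limit(1).count().next())
```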

Another downside of custom endpoints is that, while they're very simple to configure, they're not very flexible. They're not very powerful for creating application-meaningful semantics around the different endpoints that you want to employ; all you've got are these include lists and exclude lists. So you actually have to explicitly add specific instance IDs to one or other of those lists, and it's still a little bit difficult to ensure that these endpoints properly adapt to changes in the cluster topology. If an instance's role changes, if a reader is promoted to primary, it will still remain a member of any of those endpoints that it previously belonged to, even if that wasn't your intention.

So to address some of those issues, we've created a very simple little package called dynamic custom endpoints. It's still using custom endpoints, but it's a CloudFormation template that will install a Lambda function and a scheduler in your account. And then it gives you the ability to create these very rich declarative specifications that describe your membership set, and every minute or so the Lambda function will update the custom endpoints in your cluster.

So here's an example specification. This is saying that any instance that is in the role of reader and that is tagged either BI or reporting is a member of this particular custom endpoint. Now, there are lots of different ways in which you can specify the membership: you can use things such as availability zones, instance types, instance sizes, the names of particular instances. But I think the most powerful way of using it is with tags, because you can now control which instances are added and removed based on adding and removing tags.

So it may be that you hide an instance from a custom endpoint for a while and then tag it, and the next minute it will be picked up and included in that custom endpoint. Tagging here gives you a very nice, application-meaningful way of defining the members, or the membership set, for a custom endpoint.

So given that cluster topology there, those three instances, two of which are tagged BI and one of which is tagged reporting, will all be considered members of that group. And if I were to remove one of those tags, remove one of those BI tags, within a minute the membership set for that custom endpoint will have adjusted automatically so that it only includes two readers.

So this is your normal setup: you've got a client or an application sitting in the Neptune VPC. When you run the CloudFormation template, you just get a very simple Lambda function and a scheduler. You supply a JSON document as the configuration for that Lambda function; that's your specification of all the different custom endpoints. The scheduler triggers the Lambda function every minute, and the Lambda function goes to the management API, gets the cluster topology, evaluates it against those specifications and then decides whether it needs to create or update existing custom endpoints.

So that's scaling. What I want to talk about next is availability, and in particular, I want to talk about how you can reduce downtime and increase the availability of your application and your database during Neptune engine updates.

Neptune frequently releases engine updates in the form of major version changes, minor version changes and patch releases. The major and the minor versions are optional, which means that you can decide to stay on the version of the engine that you're currently on for as long as you want, up until that version is deprecated.

So earlier this year we deprecated the last of the 1.0 engines, and at the end of January next year, January 2024, we'll be deprecating the 1.1.0 engine. So major and minor versions are optional; patch releases are mandatory.

So after a patch is made available, there's a 14-day grace period in which you can choose to apply that patch yourself at any time that suits you. But if you don't apply it within that 14-day grace period, then the patch will be automatically applied during the next maintenance window.
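As a rough sketch, you can list any outstanding patches and opt in to applying them yourself during that grace period via the management API (the cluster ARN below is a placeholder):

```python
import boto3

neptune = boto3.client("neptune")

# List outstanding maintenance actions across your Neptune resources.
pending = neptune.describe_pending_maintenance_actions()
for resource in pending["PendingMaintenanceActions"]:
    print(resource["ResourceIdentifier"])
    for action in resource["PendingMaintenanceActionDetails"]:
        print("  ", action["Action"], action.get("Description", ""))

# Apply a pending action now, rather than waiting for the maintenance window.
neptune.apply_pending_maintenance_action(
    ResourceIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:my-neptune-cluster",
    ApplyAction="system-update",
    OptInType="immediate",
)
```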

Now, up until version 1.3.0, which we released just a couple of weeks ago, the difference between minor versions and patches was a little bit vague, and often new features that ideally should have gone into a minor version release were actually introduced by way of a patch. But from 1.3.0 onwards, we're being a lot stricter about this separation, and only critical bug fixes, security fixes and operating system patches are going to go into those patch releases.

Now, irrespective of the type of update, whether it's major, minor or a patch, there is going to be some downtime, because they all require restarting instances. If it's a patch or a minor version update, the downtime is probably going to be in the order of several tens of seconds, or perhaps a couple of minutes. But if it's a major version change, the downtime could be many tens of minutes or even more.

So some of those major version upgrades have involved changes to the underlying storage format. If you've got a very large storage volume, those changes can take many tens of minutes, perhaps an hour or more. And during that time, your cluster isn't available.

So this is obviously a bit of an issue, and we've spoken to many customers who've asked whether there are ways to mitigate or minimize this downtime, particularly for those major version updates.

That's why, earlier this year, we released Neptune Blue Green Deployment, which allows you to clone a blue production cluster to a green cluster, where it's upgraded in place, and then migrate your application to that green cluster once you're happy that it's all working well, minimizing the downtime. So even for major version updates, Blue Green Deployment results in a downtime of probably just a couple of minutes, several minutes or so.

So it's built on a couple of native Neptune features including fast database cloning and Neptune Streams and it's installed via a CloudFormation template. And once that template has been installed, it runs automatically and it will begin to clone the blue production cluster to the green. It will upgrade the green cluster and then it will replicate any changes that have taken place in the interim from the blue to the green. And all this time, your application is continuing to interact with that blue cluster. So it's only at the very end where there's a small amount of downtime when you flip from blue to green.

So there are a few things that you need to do, and a few things that you need to think about, before you can use Blue Green Deployment. First of all, you need to ensure that Neptune Streams is enabled on your blue production cluster. If it isn't, that's a cluster parameter change and you will need to restart the instances, but that's just a matter of a couple of seconds. You also need to ensure that you have a DynamoDB VPC endpoint in that Neptune VPC, and the reason for that is that the replication part of the migration uses DynamoDB to checkpoint its progress.
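A minimal sketch of those two prerequisites with Boto3, assuming placeholder parameter group, VPC and route table identifiers:

```python
import boto3

neptune = boto3.client("neptune")
ec2 = boto3.client("ec2")

# 1. Enable Neptune Streams on the blue cluster's parameter group
#    (the instances pick this up after a restart).
neptune.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-blue-cluster-params",
    Parameters=[{
        "ParameterName": "neptune_streams",
        "ParameterValue": "1",
        "ApplyMethod": "pending-reboot",
    }],
)

# 2. Add a DynamoDB gateway endpoint to the Neptune VPC so the migration
#    can checkpoint its replication progress.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.dynamodb",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```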

But once the migration is complete, if you uninstall that CloudFormation stack, then a lot of the ancillary assets, such as that DynamoDB table and a little EC2 instance that's been managing the process, are all deleted on your behalf. And there are a few things that you need to think about. You need to choose a period of relatively low write traffic. You can keep your application working against the database whilst all this is taking place, but if you've got really, really high write throughput during that period, there's a chance that the green cluster will never catch up with the blue when we're trying to replicate those changes. So try to identify a period of relatively low traffic. And then you're going to need to think about how you're going to manage the changeover at the very end: towards the end of the process, you're going to have to pause the writes on the blue cluster just to allow the last few transactions to replicate over, and then you're going to have to configure your application to switch from using the blue to the green cluster, because this is a new cluster with new endpoints, new DNS addresses. So somehow you will need to ensure that you can configure your application to direct the traffic from one to the other at the end of the process.

So what does this look like in practice? I'll just quickly run through it. Here we've got a blue cluster. It's on 1.1.0, that version that will be deprecated at the end of January next year, and it's a primary instance with three read replicas. So the first things I do: I ensure that Neptune Streams has been enabled on that cluster, and that I've got a DynamoDB VPC endpoint. Then I can install the CloudFormation template. I get to specify the target cluster ID, the name of the target cluster, and this name will be incorporated into all of those endpoint addresses later on. So you get to specify the name of that green cluster, as well as the source cluster, and we're also specifying the target engine version.

And then once that's up and running, the migration begins automatically. The first thing it does is clone the blue cluster: it uses Neptune's fast database cloning feature to create a copy of the database, and that green cluster will contain a copy of the data in the blue cluster at the point in time that the cloning was initiated. So we've got a green cluster with most of the data already in it at this point in time. The process also makes a note of the very last transaction ID that had been applied to that green cluster, and it's going to use that a little later on in the process: when we begin the replication to catch up, it will be able to resume applying transactions from the transaction immediately after this ID.

At this point, you can start monitoring the migration via CloudWatch Logs. Here there's lots of detail about the cloning that's taking place. Other things are being copied over too, things like security groups, IAM roles, tags and all of the configuration, so it's making a complete copy of the source. Once that cloning is complete, the process then begins to upgrade the green cluster in place, and it may go through several intermediate upgrades. If there are some major version upgrades, again, some of those may take many tens of minutes, but that's not really an issue for your application, because your application is still pointed at the blue cluster. It's still working very happily against that blue cluster.

Once we've upgraded to the target version number, the process then adds in any necessary read replicas. Up until this point, it's just been a single instance, but our source here had three read replicas, so we add those in, and again we're applying the same configuration, security groups, tags, things like that. And then the last part of the process is to begin catching up, applying any additional transactions, any additional writes, that have taken place on the blue cluster in the intervening period. All of those will have been captured as change data capture records in that Neptune stream.

So all the process needs to do is take that last transaction ID that it made a note of a little earlier on, look for the point in the stream immediately following that transaction, and then begin to apply all of those changes. Even at this point, you can still be writing to the blue cluster, and those changes will make their way through by way of that Neptune stream. And at this point, you need to be watching the logs a lot more closely.

Every few seconds, there'll be a couple of records emitted into those logs, and the second one here is the most important one, the one that I've highlighted in green: the stream event ID difference. This represents the difference between the last transaction that's been applied on the green cluster and the transactions that are still pending, still sitting in the stream waiting to be applied. Ideally, over time this number is going to be coming down, it's going to be reducing. If it's not, or if it starts going up, that probably means you've got too much write traffic on the blue cluster and the green cluster is never going to catch up.

So in those kinds of situations, it may be that you have to somewhat prematurely pause or throttle writes on the blue cluster, so as to give the green a chance to catch up. But in many situations that number is just going to keep coming down, and you're really looking to anticipate a time when it's going to get very, very close to zero. That's the point in time where you're going to choose to flip from the blue cluster to the green cluster.

Now, whilst all this is taking place, you can be qualifying and testing and reviewing the green cluster. Don't perform any additional writes against it, only do that against the blue cluster, but other than that you can test that your application is working fine with it. And then at the very end of the process, at a point in time where you're happy with this, you pause writes on your blue cluster. The way that you do that is going to be very dependent upon your application: it may be that you can buffer writes in a stream or a queue, or you can use back pressure, or reconfigure your application to ensure that no writes make it all the way down to that blue database. But you need to be able to pause those writes.

So the last few transactions finally drain down and are applied to that green cluster, and at that point you're good to switch: that difference will have come all the way down to zero, so now you can switch over. You need to configure your application to use the new endpoints from the green cluster, and at that point both read and write traffic can be running against that green cluster and you've successfully migrated. At some point, you can then choose to delete that CloudFormation stack, and that's going to remove all of those ancillary assets, things like the DynamoDB table and the small EC2 instance that's been managing the process. And then later on, once you're happy that there's no need to roll back, you can finally remove that blue cluster. So you've successfully migrated from 1.1.0 up to 1.2.0.2.

And then the last thing that I want to talk about today is the way in which you can help graph practitioners derive better insights from your connected data. And by graph practitioners, I mean application developers, data engineers, data scientists and analysts. So I'm going to talk about some of the practitioner tooling that we've built over the last few years and how that's all come together. And then I've just got a little kind of future oriented bit talking about some of the opportunities that we're seeing emerge when you marry graphs with generative AI.

So over the last few years, the Neptune team have developed several pieces of open source software that they've contributed back to the graph community. A couple that I want to talk about are the Graph Notebook and the Graph Explorer. These are both Apache 2.0 open-sourced pieces of software, and they both work with multiple different graph databases: different Gremlin Server implementations, Blazegraph, Amazon Neptune obviously, Neo4j, so multiple different graph databases, and these tools work with all of them.

The Graph Notebook is a Python package that provides Jupyter notebooks with cell and line magics that make it really easy to interact with your database, to write queries, to visualize the results of those queries and to tune those queries, without any other boilerplate code. So you get to focus just on writing the query and visualizing the results. You can review the explain plans if you're running against Neptune, and you can use those to further tune the query. But there's no other application code surrounding it; you're not worrying about any other boilerplate code.

The Graph Explorer provides you with a browser-based, no-code visualization environment. It allows you to search and visualize both property graph data and RDF data. So the Graph Notebook is ideally suited for practitioners who like or need to write queries, but who don't want to write any other kind of boilerplate code; they just want to focus on writing those queries and potentially visualizing the results. The Graph Explorer is suited for users who don't want to write any queries at all. They just want to search for, filter and expand parts of their network via the visualization in the browser, and then export or save the results of their work to a file.

So they're both open source pieces of software that work with multiple different graph databases. The Neptune notebooks provide you with a fully managed IDE for Neptune that combines query authoring, application development and no-code visualization, all in one environment. The Neptune notebooks include the Graph Notebook and the Graph Explorer, but they also include the AWS SDK for Python, so that's the Boto3 libraries, the SDKs, the APIs that you can use to interact with any other AWS service. And they also include the AWS SDK for pandas, which is another piece of open source software that allows you to write complex ETL and machine learning tasks and run them against different AWS data and analytics services.

So you've got a whole package of libraries here whenever you provision a notebook environment. You can provision a notebook via a CloudFormation template or via the console, and the notebooks themselves are actually hosted by Amazon SageMaker, so you're effectively getting a fully managed, hosted Jupyter or JupyterLab environment. But besides that, you can also take advantage of all those other SageMaker features, so there are lots of other machine learning features that you can utilize by way of the Neptune notebooks.

When you create the notebook from the console, it's automatically configured for your cluster, and that includes things like IAM database authentication. So it's going to be very easy to connect to your cluster and start to interact with it. Using a Neptune notebook, for example, I can open the Graph Explorer for a quick and easy visualization of my data; I can begin to explore that data and save the results to a file once I've discovered the things that are of interest to me. Or I can author queries using the notebook magics, the Jupyter magics, and just focus on writing those queries, tuning those queries and visualizing the results.

So here I've written a couple of queries, one in Gremlin and one in openCypher, but against the same underlying data. And if you've got an RDF data set, you can do exactly the same with SPARQL.

So really at this point, I'm focused on query authoring. And then the last part of the puzzle was added in just the last few weeks. So this is the ability to be able to write application code that queries or interacts with Neptune.

So that AWS SDK for Python, Boto3, now includes the Neptune Data API, a new data SDK that we released just a few weeks ago that includes over 40 different data operations. These include things such as loading data, bulk loading data, and running different Neptune machine learning jobs or tasks. You can use it to get the engine details and the graph statistics, the summary statistics.

But most importantly, you can use the Data API now for querying Neptune. So today it supports openCypher and Gremlin, but we've got SPARQL support coming very, very soon. And the nice thing about this is, you know, it's just a standard AWS SDK. It really simplifies application development. It deals with all of the connection management. It deals with signing the requests, it deals with interacting with an IAM enabled database without you really having to do any special configuration.

So if you were using some of the open source drivers, open source libraries for interacting with Neptune, you may know that there was quite a lot of configuration, tuning, connection management and so on that you had to do there. All of that is taken care of in the Neptune Data API SDKs.

So how would you use these things together? What might be the typical workflow? Well, imagine you're tasked with writing an AWS Lambda function that queries Neptune. What I might do is begin by focusing first of all on authoring a query using those cell magics. So I just get to focus on the query: I can review the explain plan, I can visualize the results, and I can iterate on that query until I'm absolutely happy that I've got the right query for the task at hand.

At that point, within that notebook environment, I can then port it: I can develop some application code using the Data API and embed that query in the application code. But now I can introduce all that additional application logic, perhaps some preprocessing I want to do, or some work that I want to do once I've got the results back from Neptune. So I can flesh out all of that application logic, and then once I'm happy with that, it's a very easy matter to port that over to an AWS Lambda function handler.
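As a minimal sketch of that last step, here's a Lambda handler using the neptunedata client; the endpoint environment variable and the airports/routes data model are assumptions for illustration:

```python
import os
import boto3

# The Data API client signs requests and handles IAM auth for you.
client = boto3.client(
    "neptunedata",
    endpoint_url=os.environ["NEPTUNE_ENDPOINT"],  # e.g. https://<cluster-endpoint>:8182
)

def handler(event, context):
    # The query was authored and tuned in the notebook first, then embedded here.
    response = client.execute_open_cypher_query(
        openCypherQuery=(
            "MATCH (a:airport {code: 'AUS'})-[:route]->(d) "
            "RETURN count(d) AS outgoingRoutes"
        )
    )
    return response["results"]
```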

So that's some of the productivity tools. The very last topic for today is graphs and generative AI. Over the last few months, we've identified several different ways in which you can use generative AI to benefit your application development and derive better insights from your data, and there are probably three different high-level use cases.

The first is to use a large language model to write or author a graph query. You give it a natural language question and it gives you back a query in Gremlin or openCypher or SPARQL, whichever you specify. This can be useful during the application development process: I'm an application developer, I just want some starting point for authoring a query, so I could perhaps use an LLM to help me begin authoring that query. But I could also choose to incorporate some of this if I'm developing a chatbot or an assistant.

The second use case is using an LLM to design a graph application data model. Again, I provide a natural language description of my domain and the kinds of things that I want to ask of that domain, and I can guide the LLM to describe for me a good working graph data model. I can even ask it to create some sample data for me so that I can begin to test it, again perhaps in one of those notebook environments.

And in fact, my colleague Michael Hay has recently released a blog post on the AWS Database blog about using generative AI to create graph application data models.

And then the third use case is retrieval augmented generation. This is the ability to enrich the results, or the power, of a large language model by invoking an external source of data. So it may be that a large language model doesn't know how to describe all of the routes that connect Austin with London, all of the air routes that connect Austin with London. But if I can guide it by giving it access to a structured representation of all those air routes, a knowledge graph for example, then the large language model can help answer some detailed and interesting questions around that specific use case.

So how might all of this work? I've got a very simple example. We've got our source of data, our database, our graph, and that's Amazon Neptune, but we're also going to need a large language model.

In this use case, we're going to be using Amazon Bedrock, which is a managed service that gives you access to foundation models from companies such as Anthropic and Cohere. The model we're going to be using here is Anthropic's Claude v2 model, which is an AI assistant that's really good at text summarization, question and answer, generating content, retrieval augmented generation and even some programming tasks. So it's a good fit for some of the things that we want to do here.

So we've got Bedrock and Neptune. The last part of the architecture here is LangChain. This is an open source framework, an open source piece of software, that makes it really easy to build generative AI applications. Effectively, what LangChain is doing is brokering interactions between the large language model, in Bedrock in this instance, and the external source of data, Neptune.

So how would this work? Well, our user supplies a natural language question: how many outgoing routes are there from the airport in the city of Austin? So they submit a question, and LangChain takes that question and queries Neptune to get a representation of the graph schema. It just wants some simple representation of the kinds of nodes and edges and the properties that are contained within the graph.

And it's going to take that question and the schema and generate some additional prompts. It's going to do a little bit of prompt engineering and submit all of that to the large language model. So it's effectively going to ask the large language model: can you give me a query, in this case in openCypher, that satisfies this question? And here are some details of the graph that I want you to query against.

The large language model, Claude in this instance, is going to generate an openCypher query and return it to LangChain. LangChain then runs that query against Neptune, so you can see LangChain's brokering all these interactions between the large language model and Neptune. It runs that query against Neptune and gets the results.

And then it submits those results to Claude, to the LLM. Again, it's doing a little bit of prompt engineering, and the LLM is going to give us a natural language representation of those results, which we're going to return to the user.

So if you're using one of the latest Neptune notebooks, there's not a lot that you actually need to do to set this up. Most of the prerequisites are already in place. All you need to do is install LangChain, so it's just a matter of pip install langchain, and then you need to update the notebook's IAM role so that it can invoke that Claude v2 model in Bedrock. That's all the prerequisites that we need to set up.

And then the code itself is relatively simple. Here we're using LangChain to create what's called a NeptuneGraph, which is really just a connector to Neptune, so we're supplying the host and the port. Then we're creating a Bedrock client, and we're taking that NeptuneGraph and that Bedrock client and wiring them up together using this NeptuneOpenCypherQAChain. This gives us back a chain object, and this chain object is what we're then going to use to submit a question and get the results.
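As a minimal sketch of that wiring (the cluster endpoint is a placeholder, and the import paths reflect the LangChain releases current at the time of the talk; they may have moved in newer versions):

```python
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.graphs import NeptuneGraph
from langchain.chains import NeptuneOpenCypherQAChain

# Connector to the Neptune cluster (placeholder host).
graph = NeptuneGraph(
    host="my-neptune-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com",
    port=8182,
)

# Claude v2 via Amazon Bedrock.
llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=boto3.client("bedrock-runtime"),
)

# LangChain fetches the schema, prompts Claude to generate openCypher,
# runs it against Neptune, and summarizes the results.
chain = NeptuneOpenCypherQAChain.from_llm(llm=llm, graph=graph)

print(chain.run("How many outgoing routes does the Austin airport have?"))
```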

So this is the way in which we're going to interact with all of those different moving parts of the architecture. You can see that, using that chain object, I can supply a question: how many outgoing routes does the Austin airport have? And then you can see in the notebook the openCypher query that is being generated by Claude, and you can see the results of running that against Neptune.

So we get back some details from Neptune, and those results are handed back to Claude to give us a natural language response. And then finally, that response is: based on the information provided, the Austin airport has 93 outgoing routes. So it works really well for nice, simple questions.

With more complex questions, things begin to break down, and this is common across all of the models that you may be working with, so the work today is probably indicative of future possibilities. If you ask more complex questions, you may get a valid query, a valid openCypher query, but it may generate nonsensical results: it may run, but the results are not appropriate for answering the question.

And in some cases, we've seen examples where Claude, the large language model, tries to generate a query but injects functions or keywords that just don't exist, that aren't part of openCypher. But over time, we expect these things to improve enormously, so this is really just indicative of future possibilities.

There are ways of improving the kinds of responses that you get. Whilst LangChain is doing some of that prompt engineering, you can actually supply additional prompts that will further help guide the model to produce a meaningful query or a meaningful response. And this document from Anthropic is a very good introduction to how you can do your own prompt engineering to help improve the kind of results that you're getting from working with a large language model.

So if you're interested in this, that's just a very brief introduction to some of the work that we've been doing. We've got a workshop this afternoon in the MGM Grand, getting started with Neptune, LLMs and LangChain. I think it is fully booked up, but you can always queue and see whether there's a no-show or something like that.

And then throughout the rest of the week, we've got lots and lots of other Neptune sessions. There are sessions where you can get hands-on, or see other people getting really hands-on with code running against Neptune, and there are sessions where we're going to talk about some of the more recent features that are going into Neptune.

And then if you're interested in meeting some of the developers who are working on some of the open source tooling, I think they're going to be in the Expo Hall on Thursday giving some demos of some of that stuff, and that's a great chance to talk to them and ask them about some of that material.

And then finally, there are lots of new resources that we've put out this year that can help as you're beginning to architect your applications and as you're beginning to scale them out. We've produced some Well-Architected guidance for Neptune that was published recently in the documentation.

We have a very deep-dive data modeling course that's freely available through Skills Builder. I think that's a property graph data modeling course, and there's a lot of depth to it. And then we're frequently publishing lots of blogs on the AWS Database blog, and that top one there is the one from Michael Hay around using generative AI to build a data model for Amazon Neptune.

So that's it. Hopefully that was useful, and hopefully there's something you can take away in terms of scale, availability, or the way in which you might use those productivity tools. Thank you very much for coming along today, and I hope you have a really good rest of the week. Thank you.
