Explore text-generation FMs for top use cases with Amazon Bedrock

Good afternoon, everybody. Thank you for coming to our session on exploring text generation foundation models on Bedrock. I'm Muhan Anwar, one of the founding members of the Amazon Bedrock service. I'm here with Dennis Bado, who is a senior leader in our worldwide sales organization, and Daniel Charles, who is SVP at Alida, leading their generative AI initiatives.

We have a packed agenda for you in the next 55 minutes or so. We will be walking you through how you can put foundation models in Bedrock to use. We will be showing you how easy it is for you to incorporate these models in your applications through the Bedrock API. We'll give you some best practices to keep in mind to get the most out of these models. And then we'll hear from Dan about Alida's exciting journey with generative AI.

The very exciting thing about foundation models is that they're able to process a diverse set of data. Text models, for example, can process long-form documents, messages, even logs. This gives us a unique opportunity to take advantage of a lot of the data that is already available in our organizations and unleash trillions of dollars of impact over the next few years. This is a big number, but there's reason to be optimistic about this impact today. Developers are using generative AI to uplevel their customer experiences, improve the productivity of employees in their organizations, and remove waste from business processes.

For example, we are seeing organizations ranging from retailers to SaaS providers that are using generative AI to build autonomous AI agents to solve customer issues without requiring human intervention. We are seeing startups create tools that can empower individuals to create content that they could not before on their own. And we are seeing enterprises rethink workflows that involved processing a lot of data.

Let's walk through some of these use cases with crisp examples. I mentioned customer support which is a very popular use case. This is because foundation models are ideally suited to answer customer questions, ask follow up questions and go retrieve the information necessary to solve an issue without requiring human intervention. It is also a very fast way to have a good impact on the business and keep your customers satisfied.

In this example, I'm using a text generation model to hold a conversation with a customer who is having problems with a TV that they bought from a seller. The AI chatbot powered by a foundation model can look up the specific information about the TV from mere order information and give the specific guidance needed to resolve the issue in a matter of ten dialogue turns, saving both the seller and the customer valuable time.

Now you can imagine as a seller that you don't just want to serve one customer, but uplevel your customer support function for all your customers. In this example, I'm passing transcripts from multiple customers (these are fictitious examples, by the way) to a foundation model and asking it to generate a table of all the products, the issues we have encountered, and whether we were able to resolve them or not. I used the Llama 2 13B model to generate such a table. The table has order information and product information, and it tells you the specific issue that occurred with the product and the resolution.
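The session doesn't show the underlying API call, but a minimal sketch of what it could look like with the Bedrock runtime API and the Llama 2 13B chat model is below. The region, the exact model ID version, and the transcripts list are assumptions for illustration.

```python
import json
import boto3

# Bedrock runtime client for model inference (region is an assumption)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical data: one string per customer support transcript
transcripts = ["...customer 1 transcript...", "...customer 2 transcript..."]

prompt = (
    "Below are customer support transcripts. Produce a markdown table with the "
    "columns: order ID, product, issue, resolved (yes/no).\n\n"
    + "\n\n---\n\n".join(transcripts)
)

response = bedrock_runtime.invoke_model(
    modelId="meta.llama2-13b-chat-v1",  # Llama 2 13B chat on Bedrock; IDs vary by region/version
    body=json.dumps({"prompt": prompt, "max_gen_len": 512, "temperature": 0.2}),
)
print(json.loads(response["body"].read())["generation"])
```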

Now you can imagine as a seller that having a table with tens of thousands of products across many customers can give you great insights not only into how you can improve your customer service function, but also into how you can refine your catalog and what feedback you can give to your suppliers.

Let's stick with transcripts, because they are becoming more and more pervasive as a form of data. A lot of us spend time collaborating in meetings, and it is becoming easier than ever to generate transcripts from meeting recordings. In this example, I use the AI21 Jurassic-2 Ultra model on Bedrock to create a crisp list of action items that has owners and due dates. Now you can imagine as a developer that you can take this one level further: you could create an application that not only extracts this information but goes and schedules to-dos on the participants' calendars and then follows up with them. Such an application could save everybody time and lead to better results.

Here's an example of generating a blog post. As a product manager, I have to generate a lot of blog posts every now and then. In this one, I'm trying to generate a blog post on fashion trends in the United States, not a topic that I'm particularly familiar with, but I asked Claude 2.0 to do it. And it's able to give me a very crisp outline, bringing up specific subtopics that I can go and research further, saving me a lot of time and giving me a strong jumping-off point.

Now, foundation models are also particularly good at extracting specific information from complex documents such as legal contracts. Here is an example where I'm using the Cohere Command model to extract the names of the parties and the start date of a contract. Now, you can imagine a busy legal professional could use this information across their contracts to help prioritize which ones they should review first because they are coming due sooner.

Now, let's look at how you could use a similar capability to uplevel a workflow that is particularly cumbersome. In this example, I have a fictitious events company that has a very complex payment schedule that their clients have to adhere to. Now, you can imagine that as an events company, you may have hundreds of such agreements live at the same time. It could be particularly challenging for you to track whether customers are adhering to your service agreement or not. But this is simplified by use of text generation models.

In this example, I'm asking a Claude model whether a customer is adhering to the contract or not, simply by passing in their transaction history. And the Claude Instant model is able to tell me that they're not. Further, it can also tell me how it arrived at that conclusion, breaking down its entire analysis, which can then help me follow up on this case.

It's easy to imagine how as a developer, you could build an application that is analyzing hundreds of contracts and just generating a list of customers you need to follow up with. You could also make this application follow up with the customer themselves and maybe even give them an alternative payment schedule to help them out. So the sky is really the limit with these use cases and text generation models. And we are particularly excited to offer a full range of them on top of Bedrock.

We offer text generation models from leading AI startups such as Anthropic, Cohere, and AI21 Labs, and we also offer our own Titan models for customers who prefer a first-party solution. And you can expect us to continue adding more to this selection. We initially announced the product with just three providers. We then added Cohere, and most recently Meta, with the Llama 2 models that you can now use on Bedrock.

Now, there are a bunch of factors that might be helpful to keep in mind as developers as you're trying to navigate this choice. I'm going to go through a few of them that come up very often in our conversations.

First, developers are very keen to learn about what foundation models they can use for which use case. The good thing about foundation models is that they generalize, they generalize to a broad range of tasks. So a foundation model is useful across many, many different use cases. What is particularly helpful in the early stages of building an application is identifying broadly what those use cases might be in your modality, whether it's text generation or image generation, and then listing out all the tasks that you expect the application to perform.

You can then look up evaluation studies and benchmarks that are available online and run your own experiments to see which of these models suits your application the best, or which set of models you can use together for your application. One important point to bear in mind is that model providers may also prohibit certain use cases for safety or policy reasons. So it's helpful to read the license agreements and, when available, the acceptable use policy of a provider to understand what kind of applications you can use the model in.

Foundation models are created in stages. First, they are pre-trained on raw datasets; this could be data that has been collected from the internet. Then they are refined, or fine-tuned, using labeled datasets, which help the models align better with human preferences. In Bedrock, the models we offer for on-demand inference are the fine-tuned type that are useful out of the box. But when you hear terms like pre-trained, fine-tuned, instruction-tuned, or dialogue-tuned, it's helpful to look up the model cards and learn a bit about how the models have been trained so you know what to expect from them.

Model size, in terms of the number of parameters, has been part of the community conversation for a couple of years. Foundation models emerged when model size grew to a few billion parameters, which gives them the ability to reason and perform tasks that they may not have been explicitly trained to do. We call this the emergent behavior of foundation models.

However, recently providers are moving away from talking about model sizes, because there are many other ways to get the desired result from a model other than just increasing its size. When the model size is mentioned, such as in the case of Llama 2, where you have a 13B model and a 70B model, and you're starting with a complex task, it might make sense to try the bigger model as well to see if it gives you better results.

Developers are very excited to build products for a diverse global customer base. Foundation models support many different languages out of the box since they have been trained on data in different languages. Providers may or may not officially mention what languages they support, but if you experiment, you'll find many models perform really well in multiple different languages.

Here is an example of a simple prompt that I translated into 20 different languages using one of the models on Bedrock. You can find language specific benchmarks in the model cards. And finally, these models also support programming languages so you can perform tasks such as debugging or generating code or asking it to fix issues for you.

Another important factor to consider is context size. Context size refers to the total amount of input tokens and output tokens you can process in a single request on Bedrock. We have models ranging from 4k context size to 100k. Larger context size models can help you tackle use cases such as long form document processing. They also give you an opportunity to engineer your prompts so you can specify rules and preferences using that context size to get a more specific result.

One thing to bear in mind when thinking about context sizes is that different models may have different input and output context sizes. So you may have a large context model, but it may allow a larger context for the inputs than for the outputs. It's an important detail to bear in mind when you are experimenting with different use cases.

Many Bedrock models are fine-tunable, including the Titan models. Fine-tuning basically involves improving the model by adjusting its weights using your own labeled data. It's particularly helpful when your use cases are domain specific and you want the model to have that context somewhat intrinsically.

We've seen customers achieve great improvement over the performance of the base model by fine-tuning with only about 1,000 records of labeled data. We've made it easy for you to do this in Bedrock. All you need to do is point to a dataset; Bedrock runs a fine-tuning job for you and stages the model so you can use it in your application with a simple API call.
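The talk doesn't walk through the fine-tuning API, but a minimal sketch of kicking off such a job with the Bedrock control-plane API could look like the following. All names, the role ARN, S3 paths, base model, and hyperparameter values are hypothetical placeholders.

```python
import boto3

# Control-plane client ("bedrock"), as opposed to "bedrock-runtime" used for inference
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="support-summaries-ft-job",                      # placeholder names
    customModelName="titan-text-support-summaries",
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuningRole",
    baseModelIdentifier="amazon.titan-text-express-v1",      # a fine-tunable base model
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```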

Now, many times customers ask: when is the right time to fine-tune a model? The answer is experimentation. You can get a lot by engineering your prompts.

So reading the best practices about prompt engineering for different models can get you pretty far. The examples I went through earlier in this presentation were all done using base models, and you can get quite a bit of functionality just by writing the right prompt.

So consider making that trade-off early. Coming back to choice in Bedrock, our goal really is to give you a variety of models with different context sizes and different feature functionality at different price points. For the third-party models, the price is set by the sellers; for the first-party models, we are the ones that set the price. But the overarching goal is that you can find the right fit for your use case and for your budget. There's no one-size-fits-all solution here.

For certain use cases, it might make sense for customers to start with a small context size model and see if that's sufficient; it often is for basic Q&A and customer support. For more complex tasks involving reasoning over documents such as articles, research papers, and contracts, a larger context size model with more reasoning abilities is probably a better choice.

So with that, I'm going to hand it over to my friend Dennis here, who will show you how to use these models in Bedrock. Thank you.

Thank you, Muhan. So we're going to look at yet another interesting example; there are so many fascinating use cases that generative AI unlocks. In this case, we're going to take a transcript, the entire dialogue recorded from a movie, and try to create an advertising summary, a synopsis of sorts. And we're also going to try to personalize it to a particular persona.

This is based on the Cornell Movie-Dialogs Corpus, which contains a large collection of movie dialogues, including movie titles and so on. The code examples I'm going to show are based on, or include, samples from the Amazon Bedrock Workshop, which is a great way to start using Bedrock. I recommend looking at the many examples there for summarization, text generation, classification, and so on.

Here's a transcript. In fact, we're going to use one from the movie 10 Things I Hate About You, as just one of the possibilities. Imagine that I've downloaded that entire corpus and loaded it into a pandas DataFrame, df, that contains the whole movie dataset. You can see we can look at the contents of that DataFrame: we have the movie titles, the entire dialogue, and even a bunch of genres; we select just the primary genre out of this dataset. Then we pick 10 Things I Hate About You and print the title and the start of the dialogue from that movie.

Great. So next, we're going to create a Bedrock client. I'm going to use the utils module from that Bedrock Workshop I mentioned. You can see we import a bunch of things, including boto3 of course; I get the Bedrock client and then create a helper function, because I'm going to invoke that client multiple times. To that function I pass the Bedrock runtime client, the prompt, a bunch of parameters that control the randomness of the generation (temperature, top-k, top-p), and the number of tokens to generate. These are short summaries I'm trying to generate, so we'll limit it to 200. Stop sequences we'll come back to later. You can see that in this function I use the Anthropic Claude v2 model. I've just run that cell, the Bedrock client is created for me, and that's all I need to start using Bedrock for different use cases.
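The exact workshop code isn't reproduced here, but a minimal sketch of such a helper might look like this. The function name, defaults, and docstring are my own, not the workshop's; the prompt format and parameter names follow the Anthropic Claude text-completions API on Bedrock.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def generate(prompt, temperature=0.5, top_k=250, top_p=1.0,
             max_tokens=200, stop_sequences=None):
    """Invoke Anthropic Claude v2 on Bedrock with the text-completions format."""
    body = {
        "prompt": prompt,                    # must contain \n\nHuman: ... \n\nAssistant:
        "temperature": temperature,          # randomness controls
        "top_k": top_k,
        "top_p": top_p,
        "max_tokens_to_sample": max_tokens,  # short ad copy, so keep this small
        "stop_sequences": stop_sequences or [],
    }
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-v2",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["completion"]
```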

First, I'm going to make a rather naive attempt at a prompt. In fact, I'll use the PromptTemplate class from the LangChain library, just because it's convenient in Python for substituting in portions of the prompt; in this case, the transcript is a potentially large piece of text. So I'm just going to say: write a description of the movie based on the following transcript, with that transcript variable substituted in. Then I run the client and, wait a minute, Bedrock shows a validation exception. That's because for Claude v2 you actually need to use the Human and Assistant tags within your prompt to get better results; that's part of the Claude best practices. So Bedrock is helping us get there right away.

OK, so my naive attempt didn't quite work out, and I'll try to do better in the next version of the prompt. Now I'll include the transcript within transcript tags, just to make it clear in my prompt where the transcript is. I'm also asking it to avoid spoilers. You can see that I'm also saying "interesting to a young female," so that's already trying to target a particular demographic in this demo. When you look at the language the model used, you can see it's probably going to appeal to this demographic: "hottest guy in school," "rule-obsessed father," and so on. You could say it's sort of matching. But I'm then going to switch to a different version of the prompt where I parameterize the persona, to make it easy to substitute whatever persona I want into my prompt. So I add a variable in there and now use "young man in his twenties." Again, 10 Things I Hate About You, some people would say it's a chick flick, but let's see what this prompt will generate for this new demographic.

This is showing you in real time how long it takes to run this prompt; it's a long transcript that Bedrock needs to read through. Now we're seeing that different terms are starting to be used: "popular jock," "rebellious punk rocker," "bad boy," and a few other things. So it's interesting that with the same exact transcript, we can now customize the output to a different persona. But there's one issue here: I care about the advertising copy, and I don't really care about the initial "here is a 127-word description" preamble; I just want to grab my summary and put it somewhere. So in this next prompt we're going to put the words into Claude's mouth, so to speak. If you look at the prompt, we're now saying: format the advertisement using Markdown and output it in between the ad tags. I'm also putting the words in Claude's mouth because I'm starting the Assistant's response with "Here is the advertisement" and the opening ad tag, and then I need to set the stop sequences to the closing tag so it generates only whatever is in between. And of course, now I'm changing the persona to a middle-aged woman who's a fan of Heath Ledger's work. Let's see what this generates. It's a bit on the longer side, so maybe on the next try we want to make it a little more concise through prompting. But you can see something interesting here: the model recognized the movie. We didn't actually say which movie it was; it recognized that this is 10 Things I Hate About You, and it also knows which character is played by Heath Ledger in this movie. So there are pluses and minuses here. On the one hand, it's great that that knowledge was used; on the other, there's a possibility of hallucinations. If we had used not Heath Ledger but some other actor, it might try to find a character for that actor in the transcript, or not recognize the movie, and so on. So there's potentially more work we want to do here, but you can also see the progress and how you gradually build on top of these different prompts using Bedrock.
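Here's a minimal sketch of that "words in Claude's mouth" step, reusing the generate helper sketched above and the transcript loaded earlier from the DataFrame. The tag name, template wording, and persona string are illustrative, not the workshop's exact code.

```python
from langchain.prompts import PromptTemplate  # LangChain-style prompt templating

template = PromptTemplate(
    input_variables=["transcript", "persona"],
    template=(
        "\n\nHuman: Here is a movie transcript:\n"
        "<transcript>\n{transcript}\n</transcript>\n"
        "Write a short advertisement for this movie that would appeal to {persona}. "
        "Avoid spoilers. Format the advertisement using Markdown and place it "
        "between <ad></ad> tags."
        # Prefill the Assistant turn so generation starts inside the <ad> tag
        "\n\nAssistant: Here is the advertisement: <ad>"
    ),
)

prompt = template.format(
    transcript=transcript,  # the movie dialogue loaded earlier
    persona="a middle-aged woman who is a fan of Heath Ledger's work",
)

# Stop at the closing tag so only the ad body is returned
ad_copy = generate(prompt, stop_sequences=["</ad>"])
print(ad_copy)
```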

Cool. So, in summary: different models require different prompts. It's clear that one size doesn't fit all. We really need to know which model we're working with, and we need to look at its prompting guide. There is something to be said about the whole prompt engineering job role, as much as we sometimes like to joke about it, because there are specifics you need to be aware of: you need to understand how these models were trained and what sort of instructions were provided during training. I think that's key.

So let's take a look at a few other examples of where this matters. We're going to look at a question answering task. Again, this shows exactly what you've seen earlier: if you don't say Human and Assistant, you're going to get a validation exception, so your naive prompt here will probably not work. Once we put in the right structure and use the right template, we get the results that we wanted. So that's Claude; we've seen that already. But if we switch to, say, Llama 2, a different set of prompting requirements emerges.

It's interesting that if you don't know any better, your naive attempt at a prompt may actually lead to some strange results, potentially including hallucinations. But reading the guide, it is recommended that you incorporate your utterance inside the [INST] tags. This is especially important if you have a chat conversation with multiple turns going on. We can also use the system-prompt portion to indicate and guide the model as to what tone it needs to use and what additional characteristics and constraints should be put in place. Now we're getting a good response, and also some quite interesting contextual information: your question may seem simple enough, but in fact it's nuanced; there's no single distance to the moon, the difference matters, and maybe that's something you care about.
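For reference, a minimal sketch of the Llama 2 chat prompt structure being described might look like this. The system text, question, and parameter values are illustrative; the [INST] and <<SYS>> markers follow Llama 2's published chat format.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

system = (
    "You are a concise, factual assistant. Answer in one short paragraph "
    "and mention any important nuance."
)
question = "How far away is the moon from the Earth?"

# Llama 2 chat format: system prompt inside <<SYS>> ... <</SYS>>,
# and each user turn wrapped in [INST] ... [/INST]
llama_prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

response = bedrock_runtime.invoke_model(
    modelId="meta.llama2-13b-chat-v1",
    body=json.dumps({"prompt": llama_prompt, "max_gen_len": 256, "temperature": 0.2}),
)
print(json.loads(response["body"].read())["generation"])
```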

Cool. Same thing with Titan. Again, you might be surprised: it seems like a simple question, yet you can get a very terse "I don't know" answer, even though the model actually does know the right answer. The recommendation is to use something like User and Assistant labels, and then you'll get the accurate answer you were looking for.

We can also look at classification. Summarization is something we've seen in the demo, generating a summary as text, but for classification we want to take a long passage, a support conversation, and you can read it; it opens with a welcome message from the assistant.

Person 1: How can I help you?

Person 2: Hi, I need help with my purchase.

Person 1: Sure. Could you please tell me your order ID? ...and so on and so forth.

So there's a whole conversation, an entire transcript here, and we just want to label it: is it about information gathering, is it feedback, support, or other? Give me a single tag, a single classification. With a naive kind of approach (and you can assume there's also a Human and Assistant wrapper somewhere; it's just too long to show), you're going to get a classification for individual phrases or individual utterances: this first part was about information gathering, maybe something else was about feedback, and so on. That's not really what we want; we want just a single overall class.

With the right kind of prompting here, saying here's the conversation, here are the categories, classify it into exactly one category, we get "support" as the right answer.
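A minimal sketch of that constrained-classification prompt, reusing the Claude helper sketched earlier, might look like the following. The category names come from the talk; the prompt wording and the sample conversation string are illustrative.

```python
categories = ["information gathering", "feedback", "support", "other"]

# A tiny stand-in for the full support transcript shown on the slide
conversation = (
    "Person 1: How can I help you?\n"
    "Person 2: Hi, I need help with my purchase.\n"
    "Person 1: Sure. Could you please tell me your order ID?"
)

classification_prompt = (
    "\n\nHuman: Here is a conversation between a customer and a support assistant:\n"
    f"<conversation>\n{conversation}\n</conversation>\n"
    "Classify the overall conversation into exactly one of these categories: "
    + ", ".join(categories)
    + ". Respond with the category name only."
    "\n\nAssistant:"
)

# Reusing the Claude v2 helper sketched earlier; temperature 0 for a stable label
label = generate(classification_prompt, temperature=0.0, max_tokens=10)
print(label.strip())  # expected: support
```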

We can look at other models too: the equivalent Llama 2 version is shown here, and again we get the category "support," as opposed to turn-by-turn classification. And the same thing with Amazon Titan: a clean and concise answer of "support."

So the takeaways are somewhat obvious: you need to design prompt templates. Definitely work with templates; prompt templates from LangChain, for example, which integrates with Bedrock, are one simple way to start. You need to test them across many different tasks, and for your production use case you need to pick the best-performing ones, monitor them, and rinse and repeat.

Best practices and monitoring are the key concepts here. All right, with that, let me pass to Dan, who's going to talk about the exciting generative AI applications they've built at Alida.

Thank you, Dennis. Hi, I'm Daniel Charles, the senior vice president of product and go-to-market at Alida. I'm going to tell you today about how we revolutionized our insights using Amazon Bedrock and Alida's human-centered generative AI.

First, I'm going to tell you a little bit about who Alida is. Then I'll talk about some of the challenges with text analytics pre-generative AI. Then we'll go into what we did with generative AI on Amazon Bedrock and take you under the hood of our generative AI solution. And finally, I'll tell you why we're really excited about the future of generative AI at Alida.

So let me jump right in. Alida is a community-centric customer research platform that lets companies like HBO Max, Hulu, X, Lululemon, Volvo, and Jam City, amongst others, collect both qualitative and quantitative feedback from their customers in order to build better products and user experiences.

We have more than 176 million verified and engaged respondents who have used our platform overall, and the platform keeps growing by millions of people every month. We've been recognized by Gartner as a leader in the voice-of-the-customer (VoC) space. Our customers collect a lot of qualitative and quantitative feedback in the form of surveys and other types of activities, so we have to take a whole bunch of different types of data into our system to deliver a really great text analytics solution.

So let's start by taking a look at what we can gather from the following text; you might recognize this coffee shop. We took these reviews off the internet, and I just want to take a look at this one.

"Well, as the star of Starbucks, Starbucks is my favorite. As a hip-hop artist, it gets me through tough times. The only issue I have is that the app freezes sometimes."

Well, it's quite clear that this is a very big fan of Starbucks. He gave it a five-star review; Starbucks is his favorite coffee shop; he's a hip-hop artist. He's got a couple of minor issues with the app. He's a high spender, and he's had some issues with the Christmas promotion and some of the rewards; he wished he could get more stuff, and doesn't everybody wish they could get more stuff from a rewards program? And he was unhappy with that Christmas promotion because he never wins anything, and he spent $10,000 there. Imagine spending $10,000 at Starbucks and never winning a thing. And lastly, there's no punctuation and there's lots of slang.

So imagine running this through a text analytics engine 24 months ago, before generative AI existed, or before we all really knew about generative AI. What might the results look like?

We looked at IBM Watson and Amazon Comprehend, and without training those models to get a really good result, what you find is that the overall sentiment is negative. Now, clearly, if he is a very big fan of Starbucks, then the sentiment can't be negative, but there are more negative comments than positive ones, so that's presumably how the model worked.

And it extracted a whole bunch of keywords: hip-hop artist, Starbucks, tough times, and so on. These are not particularly actionable keywords; they're not topics, and it doesn't have any idea about the context. So you get a keyword analysis that looks something like this, with a whole bunch of gibberish, and you don't have the ability to differentiate the context.

Our generative AI models, by contrast, understand context; they understand that this is a review. In the old world we had these high-maintenance taxonomies: maybe we want to label that data with categories like customer service, technical issues, or product quality.

And to do that in the pre-generative-AI world, we would train the model on several examples for the customer, but we weren't able to extract value from that without doing it on a one-off basis for every single customer, which is not something we wanted to keep doing.

And lastly, you had to translate everything into English. "Hey Jude, don't be so thoughtless, make a sad song and make it better, don't forget to carry them in your heart, and then you can start doing better." You lose a lot of the meaning of the song when you keep translating it back and forth between English and other languages, so you don't get the actual context. New models understand multiple languages, so we don't have those same language barriers.

So what did we build? We built a new text analytics engine on Bedrock, using the Anthropic Claude model. What you can see immediately is that app feedback has a broad distribution: some negative, some neutral, some positive, which is what you would expect. We didn't give it any context about what kind of data it was analyzing; mobile app experience, positive and negative, these are very good in terms of sentiment.

If you drill into login issues, you see that about 7 or 8% of the total content here is login issues. And if you then drill into that data, what you find is that login issues are a very actionable thing for a product or user experience person: how can I go fix those login issues? What kind of problems are my customers having with the login of the app?

Well, it turns out that people forget their passwords a lot, and it turns out they can't log in because they want an anonymous version of the app. So you can drill into that, fix those problems, and deliver value back to the business immediately. This was not true when we had those seemingly random keywords. These are now actionable things we can go and act on, fixing those problems for the customers we're serving or helping them fix their own.

So that's Amazon Bedrock in our dashboard. We've been thrilled with the results so far, and now let's talk a little bit about how we did this under the hood.

We're going to go back to the "star of Starbucks" and look at what we asked Amazon Bedrock. Using the Claude Instant model, we asked: give me the sentiment of each aspect of my app that the person is talking about, with a score of negative one through one, in a format like this. That format is a JSON format that we were able to parse and utilize in our system. And then we inserted the review into that prompt.

So what does the result look like? You might guess that Amazon Bedrock, right out of the gate, was able to deliver this as a positive sentiment, and of course it's a positive sentiment: he spent $10,000 there, Starbucks is his favorite, and we didn't give it any other context. Starbucks coffee gets a positive sentiment. For mobile app functionality, he's had some challenges, so that's an action item we can go and take. The rewards program is something else: he's slightly upset, not very upset, close to neutral. And the Christmas promotion, well, we all want to win the Christmas promotion. So this is what we got as a result.
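This is not Alida's actual prompt, but a minimal sketch of the pattern being described (aspect-level sentiment scored from -1 to 1, returned as JSON and parsed downstream) could look like this. The review text, prompt wording, and JSON shape are illustrative.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

review = "5 stars. Starbucks is my favorite... the only issue is the app freezes sometimes."

prompt = (
    "\n\nHuman: Give me the sentiment of each aspect of my app that the person "
    "is talking about, with a score from -1 to 1, returned as JSON in the form "
    '[{"aspect": "...", "score": 0.0}]. Return only the JSON.\n'
    f"<review>\n{review}\n</review>"
    "\n\nAssistant:"
)

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-instant-v1",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0.0}),
)
aspects = json.loads(json.loads(response["body"].read())["completion"])
for item in aspects:
    print(item["aspect"], item["score"])
```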

So what are some of the challenges that we faced? To begin with, how do we narrow down the list of topics when you're dealing with a model like this? Through the help of Amazon (and if any of you are private-equity-funded companies, this is relevant), we used the Prompt 100 program, and they helped us figure out how to go from a very granular list of topics to something more coarse-grained: we used embeddings and clustering to narrow the number of topics down to a set that makes sense.
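A minimal sketch of that idea, assuming Amazon Titan Embeddings on Bedrock and scikit-learn k-means (not Alida's actual pipeline), might look like this: embed each raw topic string, then cluster near-duplicates together to get a coarser topic list.

```python
import json
import boto3
import numpy as np
from sklearn.cluster import KMeans

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text):
    """Get a Titan Embeddings vector for a topic string."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

# Hypothetical raw topics extracted by the model across many reviews
raw_topics = ["login problems", "can't sign in", "password reset", "app crashes", "app freezes"]
vectors = np.array([embed(t) for t in raw_topics])

# Cluster near-duplicate topics together to get a coarser-grained list
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for topic, cluster in zip(raw_topics, kmeans.labels_):
    print(cluster, topic)
```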

Next, we looked at consistency in the API responses. What do I mean by that? Well, you're dealing with a probabilistic model. For "I don't like the coffee" or "I had a terrible experience today," you may get "bad coffee" as a topic one time and "customer service" another. How do you make sure you get the same results every time? We had to build a repository of topics and map new topics back to topics that had already been seen, so that every time we call the API on the same text, we get the same results. This is a challenge you might face if you're trying to get repeatable results.

The last thing that we experienced was API rate limits, and Amazon really helped us in this regard by increasing our Bedrock rate limits.

"And so this was not a challenge from Amazon. But the other thing that, that, that you want to do when you're calling the API is it's pretty expensive to call an API, especially at scale. We're dealing with tons and tons of customer data, all this qualitative data that we're sending. And so we found that batching our requests was a way to get around this. And so a prompt doesn't need to be a 1 to 1 ratio. And so we could send in multiple reviews at the same time and and get those back.

Whether it's reviews or any other type of qualitative data we're collecting, we batch that data and send it in. These are actionable insights that people are going to use to make product decisions later, and those product decisions or user experience changes aren't real time, so we didn't need to get results in real time. Batching was a really great solution for us.
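A minimal sketch of that batching idea (not Alida's production code) is below: number several reviews in one prompt so each result can be matched back, and tune the batch size against the model's context window and cost.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def analyze_batch(reviews):
    """Send several reviews in a single request and get one JSON result per review."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    prompt = (
        "\n\nHuman: For each numbered review below, return a JSON array with one "
        'object per review, in the form {"review": <number>, "topics": [...], "sentiment": <score from -1 to 1>}. '
        "Return only the JSON.\n" + numbered + "\n\nAssistant:"
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 1000, "temperature": 0.0}),
    )
    return json.loads(json.loads(response["body"].read())["completion"])

# Hypothetical review data; batch size is a trade-off between cost and context limits
all_reviews = ["Great coffee but the app froze.", "Never win the Christmas promotion.", "Love the rewards."]
batch_size = 20
for start in range(0, len(all_reviews), batch_size):
    print(analyze_batch(all_reviews[start:start + batch_size]))
```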

So overall, I'm thrilled with Amazon Bedrock. Using Claude Instant, we've been able to improve our contextual understanding without having to train our models over and over again. You can see general feedback, app experience, app issues, app usability; these are all things the model pulled out from the example we shared with you today.

So we've got these low-maintenance taxonomies, and we don't have to go and train our model every single time we have a new customer. Thinking about this, we have customers across financial services, media, technology, and healthcare; they have different needs and collect different types of responses. A general-purpose model saves us time, allows us to deploy a lot faster, and gets great results for all those customers across different industries, where the qualitative data they're collecting is different.

And lastly, remember the Beatles song that I translated back and forth? Well, we can overcome language barriers. Anthropic's Claude does a great job on Western languages, from French to German to Spanish, and our customers operate in all of these different places. When we prompt with data, the model doesn't have to translate that data into English first. We can put the data in in its native language, and like a native speaker of that language, the model can extract the topics and express them back in our base language, which might be English, French, or Spanish for any one of our customers. They can work in the base language without having to understand the other language. This works great for some of the multinational brands we work with every day.

So what are we most excited about in terms of what we're going to do next? Well, we're going to allow people a little more control in our text analytics engine over the tags and topics we generate. We're going to prime the model with their topics, build those embeddings and cluster the data, and allow them to customize the topics it comes up with.

The next thing relates to the activities, surveys and other types of activities, that we send out to collect data from respondents. In the past, when a multinational brand was using our survey platform, they would have to go to a third party to translate surveys into multiple languages. Now they can do that with the click of a button through our platform. So translation is going to be a really good capability for creating content we can ship to multiple markets and getting feedback across a large cross-section of different respondents.

The next thing is summarization, and summarization is super interesting. I think we've done a great job with qualitative data, but people also want to start summarizing quantitative data. We saw some of that a little earlier in Dennis's and Muhan's presentations, in the extraction of data. So we want to do better summarization of that data with our platform.

The last thing is really interesting. Imagine you get a survey after this session, AIM333, and that survey asks, "How did you like AIM333?" and you answer, "It was great." Well, there's not much analysis I can do if you just respond "it was great." But imagine if the engine could prompt you: "Why did you think it was great?" We've experimented with this; we haven't quite figured out the user interface yet. "Why do you think it was great?" "I think it was great because it covered a lot of really cool use cases." "And which use case did you like best?" And you might say, "Well, I thought the Alida use case was incredible." So open-ended prompting is something we're pretty excited about, and we think it could be transformative in the way you answer surveys.

So with that, I'm going to pass it back over to Dennis, and he's going to wrap us up.

Thank you so much, Dan. I think this is a very cool and also very practical application; it's exciting to see what these generative systems allow us to do.

In the past two years, I've personally been focusing on responsible AI. Even before generative AI came onto the scene in full force, it was already obvious how important it is to make sure that we build all these machine learning models and AI systems, and use them, in a responsible fashion. So I'm going to talk a little bit about that to close this off.

First of all, I want to make sure that you're aware of all the controls and measures that we've taken with Amazon Bedrock. We make a very clear statement that you're in control of your data: whatever goes into the models and whatever comes out of these models, we're not storing that information. In fact, if you want to log that information for your own needs, you need to integrate with CloudWatch Logs, for example, or set up an S3 bucket and put the results there.

It's very important to talk about these aspects. What about encryption, and making sure that the data doesn't leave your virtual private cloud? There is PrivateLink and the ability to connect Amazon Bedrock to your own VPC, and that's clearly a very important consideration here. And of course, you can expect identity and access management controls, as well as the ability to use CloudTrail to monitor API activity: who invoked the API, at what time, and so on.

This is table stakes in some ways for AWS services, but it's important to highlight that it's available with Bedrock. And if you're used to encrypting data with customer managed keys, this is fully supported with Bedrock as well: CMKs are there, and you decide which keys to encrypt with.

All right. How many of you listened in to or attended the keynote this morning? Yeah, so you must know that we have launched Guardrails for Amazon Bedrock. This is a very exciting and long-awaited set of functionality that customers have been requesting. You want to have controls and verifications, and the ability to benchmark your models on certain characteristics; but even more important, in real time as you're running your system, you need some controls in place. With Guardrails for Amazon Bedrock, you have a pretty interesting capability.

Oftentimes we talk about toxicity of prompts and toxicity of output from these models, and of course about ensuring that these don't happen; you don't want the models to offend you in any way, and that's a natural expectation. There are multiple content filters that Bedrock guardrails offer: you can see there are hate, insults, sexual, and violence filters that you can specify at various strength levels, both for the prompts going into the model and for the output that comes out.

A very interesting capability as well is this idea of topics. You're actually saying: I want these models to stay within a certain conversation, a certain topic that I care about. I'm running a business; I would like the customer support chatbots or Q&A systems I'm launching to answer things that are relevant to what I do. I don't want them conversing on random things.

Certainly, I don't want any politics, for example, coming up in standard questions and answers. The example here is investment advice: you want to avoid giving investment advice, so you specify the topic with a title, but also provide a longer description or definition, in natural language, of what you want to avoid.

With sample prompts and final responses, you can see that in this case everything was fine; the topic was not triggered or detected. But you can also clearly trigger it, and you can see that a particular topic was detected and, as a result, the request was denied. You can also issue a particular final response saying, hey, this is something I can't comment on or am not comfortable discussing, and so on. You have full control over that.
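At the time of this talk the feature had just been announced, so the exact API may differ; a hedged sketch of defining such a guardrail with the CreateGuardrail control-plane operation, with illustrative names and messages, might look like this.

```python
import boto3

bedrock = boto3.client("bedrock")  # control-plane client

bedrock.create_guardrail(
    name="customer-support-guardrail",                       # placeholder name
    description="Keep the assistant on supported topics and filter harmful content.",
    # Denied topic, defined in natural language
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "Investment advice",
            "definition": "Recommendations or guidance about investing money, "
                          "securities, or financial products.",
            "type": "DENY",
        }]
    },
    # Content filters applied to both prompts and model outputs
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, this is something I can't comment on.",
    blockedOutputsMessaging="Sorry, this is something I can't comment on.",
)
```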

Great. So with that, I just want to wrap up with a couple of resources. We have the Amazon Bedrock Getting Started course from AWS Training and Certification, so go there to learn how to get going with Bedrock. And then, of course, I mentioned the Amazon Bedrock Workshop, with lots and lots of interesting examples using different models and different approaches.

For instance, with summarization of these long transcripts, if you take something like Claude, it does allow for a large context window, so you can pass a very large transcript in. But you need to look at the economics of it. There are other approaches where you chunk the input into sections, feed them into a model that handles a smaller context, and then combine the summaries collectively.
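A minimal sketch of that chunk-then-combine approach, assuming Claude Instant on Bedrock and a simple character-based split (the file name and chunk size are illustrative), could look like this.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def claude(prompt, max_tokens=300):
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1",
        body=json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                         "max_tokens_to_sample": max_tokens, "temperature": 0.2}),
    )
    return json.loads(response["body"].read())["completion"]

def summarize_long(text, chunk_chars=6000):
    # Map step: summarize each chunk independently within a smaller context
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial = [claude(f"Summarize this part of a transcript:\n{c}") for c in chunks]
    # Reduce step: combine the partial summaries into one
    return claude("Combine these partial summaries into a single coherent summary:\n"
                  + "\n".join(partial))

print(summarize_long(open("transcript.txt").read()))  # hypothetical input file
```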

And of course, all the exciting announcements coming from us on the machine learning side are covered on the blog. There's a link here, but you can also just search for the Amazon AI/ML blogs. For example, there's already a blog post out on the Guardrails for Amazon Bedrock capability that I've just talked about.

Great. So there's a way for you to reach us on X, myself and Muhan. And don't forget to fill out the session survey; it would be cool to try what Alida does with it. Thank you so much.
