This tutorial series shows how to create a multi-tier ASP.NET MVC 4 web application that uses Windows Azure Storage tables, queues, and blobs, and how to deploy the application to a Windows Azure Cloud Service. The tutorials assume that you have no prior experience using Windows Azure. On completing the series, you'll know how to build a resilient and scalable data-driven web application and deploy it to the cloud.
This content is also available as a free e-book in the TechNet E-Book Gallery.
In this tutorial series you'll learn the following:
The application that you'll build is an email list service. The front-end of the multi-tier application includes web pages that administrators of the service use to manage email lists.
There is also a set of pages that administrators use to create messages to be sent to an email list.
Clients of the service are companies that give their customers an opportunity to sign up for a mailing list on the client web site. For example, an administrator sets up a list for Contoso University History Department announcements. When a student interested in History Department announcements clicks a link on the Contoso University web site, Contoso University makes a web service call to the Windows Azure Email Service application. The service method causes an email to be sent to the customer. That email contains a hyperlink, and when the recipient clicks the link, a page welcoming the customer to the History Department Announcements list is displayed.
Every email sent by the service (except the subscribe confirmation) includes a hyperlink that can be used to unsubscribe. If a recipient clicks the link, a web page asks for confirmation of intent to unsubscribe.
If the recipient clicks the Confirm button, a page is displayed confirming that the person has been removed from the list.
Here is a list of the tutorials with a summary of their contents:
If you just want to download the application and try it out, all you need is the first two tutorials. If you want to see all the steps that go into building an application like this from scratch, go through the last three tutorials after you go through the first two.
We chose an email list service for this sample application because it is the kind of application that needs to be resilient and scalable, two features that make it especially appropriate for Windows Azure.
If a server fails while sending out emails to a large list, you want to be able to spin up a new server easily and quickly, and you want the application to pick up where it left off without losing or duplicating any emails. A Windows Azure Cloud Service web or worker role instance (a virtual machine) is automatically replaced if it fails. And Windows Azure Storage queues and tables provide a means to implement server-to-server communication that can survive a failure without losing work.
An email service also must be able to handle spikes in workload, since sometimes you are sending emails to small lists and sometimes to very large lists. In many hosting environments, you have to purchase and maintain sufficient hardware to handle the spikes in workload, and you're paying for all that capacity 100% of the time although you might only use it 5% of the time. With Windows Azure, you pay only for the amount of computing power that you actually need for only as long as you need it. To scale up for a large mailing, you just change a configuration setting to increase the number of servers you have available to process the workload, and this can be done programmatically. For example, you could configure the application so that if the number of work items waiting in the queue exceeds a certain number, Windows Azure automatically spins up additional instances of the worker role that processes those work items.
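The queue-length rule described above can be sketched as a small decision function. This is only an illustration in Python, not part of the sample application: the function name, the threshold of 1,000 work items per instance, and the instance limits are all hypothetical values chosen for the example.

```python
import math

def desired_worker_count(queue_length, items_per_worker=1000,
                         min_workers=1, max_workers=10):
    """Return how many worker instances to run for a given queue backlog.

    All thresholds are hypothetical; tune them for a real workload.
    """
    needed = math.ceil(queue_length / items_per_worker)
    # Clamp the result so there is always at least one instance
    # and never more than the configured maximum.
    return max(min_workers, min(max_workers, needed))
```

A scheduler could call this function periodically with the current queue length and change the instance count in the service configuration when the result changes.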
The front-end stores email lists and messages to be sent to them in Windows Azure tables. When an administrator schedules a message to be sent, a table row containing the scheduled date and other data such as the subject line is added to the message table. A worker role periodically scans the message table looking for messages that need to be sent (we'll call this worker role A).
When worker role A finds a message that needs to be sent, it creates a queue work item for each email to be sent and adds a corresponding email row to the message table.
A second worker role (worker role B) polls the queue for work items. When worker role B finds a work item, it processes the item by sending the email, and then it deletes the work item from the queue. The following diagram shows these relationships.
No emails are missed if worker role B goes down and has to be restarted, because a queue work item for an email isn't deleted until after the email has been sent. The back-end also implements table processing that prevents multiple emails from getting sent in case worker role A goes down and has to be restarted. In that case, multiple queue work items might be generated for a given destination email address. But for each destination email address, a row in the message table tracks whether the email has been sent. Depending on the timing of the restart and email processing, worker A uses this row to avoid creating a second queue work item, or worker B uses this row to avoid sending a second email.
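The following Python sketch illustrates the idea of deleting a work item only after the email is sent, and of using a tracking row to skip duplicates. It is a simplified simulation, not the sample's actual code: a deque and a dictionary stand in for the Azure queue and the message table, and the function and parameter names are hypothetical.

```python
from collections import deque

def drain_queue(queue, message_table, send_email):
    """Simulate worker role B: send each email at most once."""
    while queue:
        key = queue[0]                    # read the item but leave it on the queue
        row = message_table[key]
        if not row["EmailSent"]:          # duplicate work items are skipped here
            send_email(row["EmailAddress"])
            row["EmailSent"] = True       # record the send before deleting the item
        queue.popleft()                   # delete only after the email is handled
```

If the worker stopped before `popleft`, the item would still be on the queue after a restart, and the `EmailSent` check would keep the email from being sent twice.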
Worker role B also polls a subscription queue for work items put there by the Web API service method for new subscriptions. When it finds one, it sends the confirmation email.
The Windows Azure Email Service application stores data in Windows Azure Storage tables. Windows Azure tables are a NoSQL data store, not a relational database like Windows Azure SQL Database. That makes them a good choice when efficiency and scalability are more important than data normalization and relational integrity. For example, in this application, one worker role creates a row every time a queue work item is created, and another one retrieves and updates a row every time an email is sent, which might become a performance bottleneck if a relational database were used. Additionally, Windows Azure tables are cheaper than Windows Azure SQL. For more information about Windows Azure tables, see the resources that are listed at the end of the last tutorial in this series.
The following sections describe the contents of the Windows Azure tables that are used by the Windows Azure Email Service application. For a diagram that shows the tables and their relationships, see the Windows Azure Email Service data diagram later in this page.
The mailinglist table stores information about mailing lists and the subscribers to mailing lists. (The Windows Azure table naming convention best practice is to use all lower-case letters.) Administrators use web pages to create and edit mailing lists, and clients and subscribers use a set of web pages and a service method to subscribe and unsubscribe.
In NoSQL tables, different rows can have different schemas, and this flexibility is commonly used to make one table store data that would require multiple tables in a relational database. For example, to store mailing list data in SQL Database you could use three tables: a mailinglist table that stores information about the list, a subscriber table that stores information about subscribers, and a mailinglistsubscriber table that associates mailing lists with subscribers and vice versa. In the NoSQL table in this application, all of those functions are rolled into one table named mailinglist.
In a Windows Azure table, every row has a partition key and a row key that uniquely identifies the row. The partition key divides the table up logically into partitions. Within a partition, the row key uniquely identifies a row. There are no secondary indexes; therefore to make sure that the application will be scalable, it is important to design your tables so that you can always specify partition key and row key values in the Where clause of queries.
The partition key for the mailinglist table is the name of the mailing list.
The row key for the mailinglist table can be one of two things: the constant "mailinglist" or the email address of the subscriber. Rows that have row key "mailinglist" include information about the mailing list. Rows that have the email address as the row key have information about the subscribers to the list.
In other words, rows with row key "mailinglist" are equivalent to a mailinglist table in a relational database. Rows with row key = email address are equivalent to a subscriber table and a mailinglistsubscriber association table in a relational database.
Making one table serve multiple purposes in this way facilitates better performance. In a relational database three tables would have to be read, and three sets of rows would have to be sorted and matched up against each other, which takes time. Here just one table is read and its rows are automatically returned in partition key and row key order.
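As a rough illustration of this design, the Python sketch below uses a dictionary keyed by (PartitionKey, RowKey) as a stand-in for the mailinglist table and reads one list with a single pass over its partition. The function name is hypothetical, and the address student2@contoso.edu is made up for the example.

```python
# In-memory stand-in for the mailinglist table, keyed by (PartitionKey, RowKey).
table = {
    ("contoso1", "mailinglist"): {
        "Description": "Contoso University History Department announcements"},
    ("contoso1", "student1@contoso.edu"): {"Verified": True},
    ("contoso1", "student2@contoso.edu"): {"Verified": False},  # hypothetical address
    ("fabrikam1", "mailinglist"): {
        "Description": "Fabrikam Engineering job postings"},
}

def read_list(table, list_name):
    """Return (list_info, subscribers) from a single partition scan."""
    info, subscribers = None, {}
    for (pk, rk) in sorted(table):        # rows come back in key order
        if pk != list_name:
            continue
        if rk == "mailinglist":           # the row that describes the list
            info = table[(pk, rk)]
        else:                             # every other row is a subscriber
            subscribers[rk] = table[(pk, rk)]
    return info, subscribers
```

One scan of the contoso1 partition returns both the list description and its subscribers, with no join step.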
The following grid shows row properties for the rows that contain mailing list information (row key = "mailinglist").
Property | Data Type | Description |
---|---|---|
PartitionKey | String | ListName: A unique identifier for the mailing list, for example: contoso1. The typical use for the table is to retrieve all information for a specific mailing list, so using the list name is an efficient way to partition the table. |
RowKey | String | The constant "mailinglist". |
Description | String | Description of the mailing list, for example: "Contoso University History Department announcements". |
FromEmailAddress | String | The "From" email address in the emails sent to this list, for example: [email protected]. |
The following grid shows row properties for the rows that contain subscriber information for the list (row key = email address).
Property | Data Type | Description |
---|---|---|
PartitionKey | String | ListName: The name (unique identifier) of the mailing list, for example: contoso1. |
RowKey | String | EmailAddress: The subscriber email address, for example: [email protected]. |
SubscriberGUID | String | Generated when the email address is added to a list. Used in subscribe and unsubscribe links so that it's difficult to subscribe or unsubscribe someone else's email address. Some queries for the Subscribe and Unsubscribe web pages specify only the PartitionKey and this property. Querying a partition without using the RowKey limits the scalability of the application, because queries will take longer as mailing list sizes increase. An option for improving scalability is to add lookup rows that have the SubscriberGUID in the RowKey property. For example, for each email address one row could have "email:[email protected]" in the RowKey and another row for the same subscriber could have "guid:6f32b03b-90ed-41a9-b8ac-c1310c67b66a" in the RowKey. This is simple to implement because atomic batch transactions on rows within a partition are easy to code. We hope to implement this in the next release of the sample application. For more information, see Real World: Designing a Scalable Partitioning Strategy for Windows Azure Table Storage |
Verified | Boolean | When the row is initially created for a new subscriber, the value is false. It changes to true only after the new subscriber clicks the Confirm hyperlink in the welcome email or an administrator sets it to true. If a message is sent to a list while the Verified value for one of its subscribers is false, no email is sent to that subscriber. |
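The lookup-row option mentioned in the SubscriberGUID description could look something like the following Python sketch. It is not implemented in the sample; the function names are hypothetical, and a dictionary keyed by (PartitionKey, RowKey) stands in for the table.

```python
def subscriber_rows(list_name, email, guid):
    """Build the two rows that would be written together for one subscriber."""
    base = {"PartitionKey": list_name, "EmailAddress": email,
            "SubscriberGUID": guid, "Verified": False}
    return [
        {**base, "RowKey": "email:" + email},   # lookup by email address
        {**base, "RowKey": "guid:" + guid},     # lookup by subscriber GUID
    ]

def find_by_guid(table, list_name, guid):
    """Point lookup that specifies both partition key and row key."""
    return table.get((list_name, "guid:" + guid))
```

Because both rows share a partition key, they could be inserted in one atomic batch, and a subscribe or unsubscribe page could then find the subscriber by GUID without scanning the partition.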
The following list shows an example of what data in the table might look like.
Partition Key | contoso1 |
---|---|
Row Key | mailinglist |
Description | Contoso University History Department announcements |
FromEmailAddress | [email protected] |
Partition Key | contoso1 |
---|---|
Row Key | [email protected] |
SubscriberGUID | 6f32b03b-90ed-41a9-b8ac-c1310c67b66a |
Verified | true |
Partition Key | contoso1 |
---|---|
Row Key | [email protected] |
SubscriberGUID | 01234567-90ed-41a9-b8ac-c1310c67b66a |
Verified | false |
Partition Key | fabrikam1 |
---|---|
Row Key | mailinglist |
Description | Fabrikam Engineering job postings |
FromEmailAddress | [email protected] |
Partition Key | fabrikam1 |
---|---|
Row Key | [email protected] |
SubscriberGUID | 76543210-90ed-41a9-b8ac-c1310c67b66a |
Verified | true |
The message table stores information about messages that are scheduled to be sent to a mailing list. Administrators create and edit rows in this table using web pages, and the worker roles use it to pass information about each email from worker role A to worker role B.
The partition key for the message table is the date the email is scheduled to be sent, in yyyy-mm-dd format. This optimizes the table for the query that is executed most often against this table, which selects rows that have ScheduledDate of today or earlier. However, it does create a potential performance bottleneck, because Windows Azure Storage tables have a maximum throughput of 500 entities per second for a partition. For each email to be sent, the application writes a message table row, reads a row, and deletes a row. Therefore the shortest possible time for processing 1,000,000 emails scheduled for a single day is almost two hours (3,000,000 operations at 500 per second is 6,000 seconds, or 100 minutes), regardless of how many worker roles are added in order to handle increased loads.
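The arithmetic behind that estimate can be checked with a few lines of Python. The function name is hypothetical; the three operations per email and the 500 entities per second limit come from the paragraph above.

```python
def min_processing_seconds(emails, ops_per_email=3, ops_per_second=500):
    """Lower bound on processing time when one partition handles every operation."""
    return emails * ops_per_email / ops_per_second

seconds = min_processing_seconds(1_000_000)
minutes = seconds / 60   # 6,000 seconds is 100 minutes
```

Adding worker role instances does not change this bound, because the limit applies to the partition rather than to the workers.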
The row key for the message table can be one of two things: the constant "message" plus a unique key for the message called the MessageRef, or the MessageRef value plus the email address of the subscriber. Rows that have a row key that begins with "message" include information about the message, such as the mailing list to send it to and when it should be sent. Rows that have the MessageRef and email address as the row key have all of the information needed to send an email to that email address.
In relational database terms, rows with a row key that begins with "message" are equivalent to a message table. Rows with row key = MessageRef plus email address are equivalent to a join query view that contains mailinglist, message, and subscriber information.
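The key scheme for the message table can be sketched with three small helpers. This is an illustrative Python sketch with hypothetical function names; it assumes that the MessageRef and the email address are concatenated directly with no separator, which the text implies but does not state.

```python
from datetime import date

def partition_key(scheduled: date) -> str:
    """Scheduled date in yyyy-mm-dd format."""
    return scheduled.strftime("%Y-%m-%d")

def message_row_key(message_ref: int) -> str:
    """Row key for the row that describes the message itself."""
    return "message" + str(message_ref)

def email_row_key(message_ref: int, email: str) -> str:
    """Row key for one email row (assumes plain concatenation)."""
    return str(message_ref) + email
```

Because message rows start with the literal "message" and email rows start with digits, the two kinds of rows never collide within a partition.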
The following grid shows row properties for the message table rows that have information about the message itself.
Property | Data Type | Description |
---|---|---|
PartitionKey | String | The date the message is scheduled to be sent, in yyyy-mm-dd format. |
RowKey | String | The constant "message" concatenated with the MessageRef value. The MessageRef is a unique value created by getting the Ticks value from DateTime.Now when the row is created. Note: High volume multi-threaded, multi-instance applications should be prepared to handle duplicate RowKey exceptions when using Ticks. Ticks are not guaranteed to be unique. |
ScheduledDate | Date | The date the message is scheduled to be sent. (Same as PartitionKey but in Date format.) |
SubjectLine | String | The subject line of the email. |
ListName | String | The list that this message is to be sent to. |
Status | String | The processing status of the message, for example Pending or Processing. |
When worker role A creates a queue message for an email to be sent to a list, it creates an email row in the message table. When worker role B sends the email, it moves the email row to the messagearchive table and updates the EmailSent property to true. When all of the email rows for a message in Processing status have been archived, worker role A sets the status to Completed and moves the message row to the messagearchive table.
The following grid shows row properties for the email rows in the message table.
Property | Data Type | Description |
---|---|---|
PartitionKey | String | The date the message is scheduled to be sent, in yyyy-mm-dd format. |
RowKey | String | The MessageRef value and the destination email address from the subscriber row of the mailinglist table. |
MessageRef | Long | Same as the MessageRef component of the RowKey. |
ScheduledDate | Date | The scheduled date from the message row of the message table. (Same as PartitionKey but in Date format.) |
SubjectLine | String | The email subject line from the message row of the message table. |
ListName | String | The mailing list name from the mailinglist table. |
FromEmailAddress | String | The "from" email address from the mailinglist row of the mailinglist table. |
EmailAddress | String | The email address from the subscriber row of the mailinglist table. |
SubscriberGUID | String | The subscriber GUID from the subscriber row of the mailinglist table. |
EmailSent | Boolean | False means the email has not been sent yet; true means the email has been sent. |
There is redundant data in these rows, which you would typically avoid in a relational database. But in this case you are trading some of the disadvantages of redundant data for the benefit of greater processing efficiency and scalability. Because all of the data needed for an email is present in one of these rows, worker role B only needs to read one row in order to send an email when it pulls a work item off the queue.
You might wonder where the body of the email comes from. These rows don't have blob references for the files that contain the body of the email, because that value is derived from the MessageRef value. For example, if the MessageRef is 634852858215726983, the blobs are named 634852858215726983.htm and 634852858215726983.txt.
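Deriving the blob names from the MessageRef is a one-line concatenation, shown here as a small Python sketch (the function name is hypothetical):

```python
def blob_names(message_ref: int) -> tuple:
    """Return the (HTML, plain text) blob names for a message."""
    ref = str(message_ref)
    return ref + ".htm", ref + ".txt"
```

Since the names can always be recomputed, the email rows do not need to carry a blob reference.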
The following list shows an example of what data in the table might look like.
Partition Key | 2012-10-15 |
---|---|
Row Key | message634852858215726983 |
MessageRef | 634852858215726983 |
ScheduledDate | 2012-10-15 |
SubjectLine | New lecture series |
ListName | contoso1 |
Status | Processing |
Partition Key | 2012-10-15 |
---|---|
Row Key | [email protected] |
MessageRef | 634852858215726983 |
ScheduledDate | 2012-10-15 |
SubjectLine | New lecture series |
ListName | contoso1 |
FromEmailAddress | [email protected] |
EmailAddress | [email protected] |
SubscriberGUID | 76543210-90ed-41a9-b8ac-c1310c67b66a |
EmailSent | true |
Partition Key | 2012-10-15 |
---|---|
Row Key | [email protected] |
MessageRef | 634852858215726983 |
ScheduledDate | 2012-10-15 |
SubjectLine | New lecture series |
ListName | contoso1 |
FromEmailAddress | [email protected] |
EmailAddress | [email protected] |
SubscriberGUID | 12345678-90ed-41a9-b8ac-c1310c679876 |
EmailSent | true |
Partition Key | 2012-11-15 |
---|---|
Row Key | message124852858215726999 |
MessageRef | 124852858215726999 |
ScheduledDate | 2012-11-15 |
SubjectLine | New job postings |
ListName | fabrikam1 |
Status | Pending |
One strategy for making sure that queries execute efficiently, especially if you have to search on fields other than PartitionKey and RowKey, is to limit the size of the table. The query in worker role A that checks to see if all emails have been sent for a message needs to find email rows in the message table that have EmailSent = false. The EmailSent value is not in the PartitionKey or RowKey, so this would not be an efficient query for a message with a large number of email rows. Therefore, the application moves email rows to the messagearchive table as the emails are sent. As a result, the query to check if all emails for a message have been sent only has to query the message table on PartitionKey and RowKey, because if it finds any email rows for a message at all, that means there are unsent messages and the message can't be marked Complete.
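The completion check can be sketched as follows. This Python example is only an illustration with a hypothetical function name: a dictionary keyed by (PartitionKey, RowKey) stands in for the message table, and the prefix match assumes that email row keys begin with the MessageRef value.

```python
def all_emails_sent(message_table, scheduled_date, message_ref):
    """True when no email rows remain in the message table for this message."""
    prefix = str(message_ref)
    # Only the keys are examined; no other property has to be read.
    return not any(
        pk == scheduled_date and rk.startswith(prefix)
        for (pk, rk) in message_table
    )
```

The message row itself is not counted, because its row key starts with "message" rather than with the MessageRef digits.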
The schema of rows in the messagearchive table is identical to that of the message table. Depending on what you want to do with this archival data, you could limit its size and expense by reducing the number of properties stored for each row, and by deleting rows older than a certain age.
Windows Azure queues facilitate communication between tiers of this multi-tier application, and between worker roles in the back-end tier. Queues are used to communicate between worker role A and worker role B in order to make the application scalable. Worker role A could create a row in the message table for each email, and worker role B could scan the table for rows representing emails that haven't been sent, but you wouldn't be able to add additional instances of worker role B in order to divide up the work. The problem with using table rows to coordinate the work between worker role A and worker role B is that you have no way of ensuring that only one worker role instance will pick up any given table row for processing. Queues give you that assurance. When a worker role instance pulls a work item off a queue, the queue service hides the item from other worker role instances for a visibility timeout period, so no other instance can pull the same work item while it is being processed. This exclusive lease feature of Windows Azure queues facilitates sharing a workload among multiple instances of a worker role.
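To make the lease behavior concrete, here is a toy Python model of a queue in which a retrieved item becomes invisible to other consumers for a fixed period and reappears if it is not deleted. It is a simulation only: the class, its method names, and the 30-second default are hypothetical and are not the Azure Storage API.

```python
class LeasedQueue:
    """Toy queue in which a retrieved item is hidden for a timeout period."""

    def __init__(self, visibility_timeout=30.0):
        self._items = []                  # each entry: [payload, invisible_until]
        self._timeout = visibility_timeout

    def put(self, payload):
        self._items.append([payload, 0.0])

    def get(self, now):
        """Return the first visible item and hide it, or None if none is visible."""
        for entry in self._items:
            if entry[1] <= now:
                entry[1] = now + self._timeout
                return entry[0]
        return None

    def delete(self, payload):
        """Remove an item permanently once its work is done."""
        self._items = [e for e in self._items if e[0] != payload]
```

If a worker crashes after `get` but before `delete`, the item becomes visible again once the timeout passes, so another instance can pick it up.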
Windows Azure also provides the Service Bus queue service. For more information about Windows Azure Storage queues and Service Bus queues, see the resources that are listed at the end of the last tutorial in this series.
The Windows Azure Email Service application uses two queues, named AzureMailQueue and AzureMailSubscribeQueue.
The AzureMailQueue queue coordinates the sending of emails to email lists. Worker role A places a work item on the queue for each email to be sent, and worker role B pulls a work item from the queue and sends the email.
A queue work item contains a comma-delimited string that consists of the scheduled date of the message (partition key to the message table) and the MessageRef and EmailAddress values (row key to the message table), plus a flag indicating whether the item was created after the worker role went down and restarted, for example:
2012-10-15,634852858215726983,student1@contoso.edu,0
Worker role B uses these values to look up the row in the message table that contains all of the information needed to send the email. If the restart flag indicates a restart, worker B makes sure the email has not already been sent before sending it.
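Parsing such a work item might look like the following Python sketch. The names are hypothetical, the row key is assumed to be the MessageRef followed directly by the email address, and the flag value "1" is assumed to mean "created after a restart" (the example above shows only "0").

```python
from typing import NamedTuple

class WorkItem(NamedTuple):
    partition_key: str   # scheduled date, yyyy-mm-dd
    row_key: str         # MessageRef followed by the email address
    restart: bool        # True if created after a restart (assumed encoding)

def parse_work_item(text: str) -> WorkItem:
    """Split a work item string into table keys and the restart flag."""
    # Split the first two fields from the left and the flag from the right,
    # so the email address in the middle is kept intact.
    scheduled_date, message_ref, rest = text.split(",", 2)
    email, flag = rest.rsplit(",", 1)
    return WorkItem(scheduled_date, message_ref + email, flag == "1")
```

With the partition key and row key in hand, the worker can read the email row with a single point lookup.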
When traffic spikes, the Cloud Service can be reconfigured so that multiple instances of worker role B are instantiated, and each of them can independently pull work items off the queue.
The AzureMailSubscribeQueue queue coordinates the sending of subscription confirmation emails. In response to a service method call, the service method places a work item on the queue. Worker role B pulls the work item from the queue and sends the subscription confirmation email.
A queue work item contains the subscriber GUID. This value uniquely identifies an email address and the list to subscribe it to, which is all that worker role B needs to send a confirmation email. As explained earlier, this requires a query on a field that is not in the PartitionKey or RowKey, which is inefficient. To make the application more scalable, the mailinglist table would have to be restructured to include the subscriber GUID in the RowKey.
The following diagram shows the tables and queues and their relationships.
Blobs are "binary large objects." The Windows Azure Blob service provides a means for uploading and storing files in the cloud. For more information about Windows Azure blobs, see the resources that are listed at the end of the last tutorial in this series.
Windows Azure Email Service administrators put the body of an email in HTML form in an .htm file and in plain text in a .txt file. When they schedule an email, they upload these files in the Create Message web page, and the ASP.NET MVC controller for the page stores each uploaded file in a Windows Azure blob.
Blobs are stored in blob containers, much like files are stored in folders. The Windows Azure Email Service application uses a single blob container, named azuremailblobcontainer. The name of each blob in the container is derived by concatenating the MessageRef value with the file extension, for example: 634852858215726983.htm and 634852858215726983.txt.
Since both HTML and plain text messages are essentially strings, we could have designed the application to store the email message body in string properties in the message table instead of in blobs. However, there is a 64K limit on the size of a property in a table row, so using a blob avoids that limitation on email body size. (64K is the maximum total size of the property; after allowing for encoding overhead, the maximum string size you can store in a property is actually closer to 48K.)
When you download the Windows Azure Email Service, it is configured so that the front-end and back-end all run in a single Windows Azure Cloud Service.
An alternative architecture is to run the front-end in a Windows Azure Web Site.
Keeping all components in a cloud service simplifies configuration and deployment. If you create the application with the ASP.NET MVC front end in a Windows Azure Web Site, you will have two deployments, one to the Windows Azure Web Site and one to the Windows Azure Cloud Service. In addition, Windows Azure Cloud Service web roles provide the following features that are unavailable in Windows Azure Web Sites:
The alternative architecture might offer some cost benefits, because a Windows Azure Web Site might be less expensive for similar capacity compared to a web role running in a Cloud Service. Later tutorials in the series explain implementation details that differ between the two architectures.
For more information about how to choose between Windows Azure Web Sites and Windows Azure Cloud Services, see Windows Azure Execution Models.
This section provides a brief overview of costs for running the sample application in Windows Azure, given rates in effect when the tutorial was published in December of 2012. Before making any business decisions based on costs, be sure to check current rates on the following web pages:
Costs are affected by the number of web and worker role instances you decide to maintain. In order to qualify for the Azure Cloud Service 99.95% Service Level Agreement (SLA), you must deploy two or more instances of each role. One of the reasons you must run at least two role instances is that the virtual machines that run your application are restarted approximately twice per month for operating system upgrades. (For more information on OS updates, see Role Instance Restarts Due to OS Upgrades.)
The work performed by the two worker roles in this sample is not time critical and so does not need the 99.95% SLA. Therefore, running a single instance of each worker role is feasible so long as one instance can keep up with the workload. The web role is time sensitive, that is, users expect the web site to not have any down time, so a production application should have at least two instances of the web role.
The following table shows the costs for the default architecture for the Windows Azure Email Service sample application assuming a minimal workload. The costs shown are based on using an extra small (shared) virtual machine size. The default virtual machine size when you create a Visual Studio cloud project is small, which is about six times more expensive than the extra small size.
Component or Service | Rate | Cost per month |
---|---|---|
Web role | 2 instances at $.02/hour for extra small instances | $29.00 |
Worker role A (schedules emails to be sent) | 1 instance at $.02/hour for an extra small instance | $14.50 |
Worker role B (sends emails) | 1 instance at $.02/hour for an extra small instance | $14.50 |
Windows Azure storage transactions | 1 million transactions per month at $0.10/million (Each query counts as a transaction; worker role A continuously queries tables for messages that need to be sent. The application is also configured to write diagnostic data to Windows Azure Storage, and each time it does that is a transaction.) | $0.10 |
Windows Azure locally redundant storage | $2.33 for 25 GB (Includes storage for application tables and diagnostic data.) | $2.33 |
Bandwidth | 5 GB egress is free | Free |
SendGrid | Windows Azure customers can send 25,000 emails per month for free | Free |
Total | | $60.43 |
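The total in the table can be reproduced with a short Python calculation. The figure of 725 billable hours per month is an assumption inferred from the table ($14.50 divided by $0.02 per hour); the function name is hypothetical, and the rates are the December 2012 rates shown above.

```python
HOURS_PER_MONTH = 725  # assumption inferred from $14.50 / $0.02 per hour

def instance_cost(instances, rate=0.02, hours=HOURS_PER_MONTH):
    """Monthly cost in dollars for extra small role instances."""
    return round(instances * rate * hours, 2)

web_role = instance_cost(2)   # two web role instances
worker_a = instance_cost(1)   # worker role A
worker_b = instance_cost(1)   # worker role B
storage_transactions = 0.10   # 1 million transactions
storage = 2.33                # 25 GB locally redundant storage

total = round(web_role + worker_a + worker_b + storage_transactions + storage, 2)
```

Role instances account for $58.00 of the total, which is why the number of instances dominates the cost discussion.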
As you can see, role instances are a major component of the overall cost. Role instances incur a cost even if they are stopped; you must delete a role instance to not incur any charges. One cost saving approach would be to move all the code from worker role A and worker role B into one worker role. For these tutorials we deliberately chose to implement two worker roles in order to simplify scale out. The work that worker role B does is coordinated by the Windows Azure Queue service, which means that you can scale out worker role B simply by increasing the number of role instances. (Worker role B is the limiting factor for high load conditions.) The work performed by worker role A is not coordinated by queues, therefore you cannot run multiple instances of worker role A. If the two worker roles were combined and you wanted to enable scale out, you would need to implement a mechanism for ensuring that worker role A tasks run in only one instance. (One such mechanism is provided by CloudFx. See the WorkerRole.cs sample.)
It is also possible to move all of the code from the two worker roles into the web role so that everything runs in the web role. However, performing background tasks in ASP.NET is not supported or considered robust, and this architecture would complicate scalability. For more information see The Dangers of Implementing Recurring Background Tasks In ASP.NET. See also How to Combine a Worker and Web Role in Windows Azure and Combining Multiple Azure Worker Roles into an Azure Web Role.
Another architecture alternative that would reduce cost is to use the Autoscaling Application Block to automatically deploy worker roles only during scheduled periods, and delete them when work is completed. For more information on autoscaling, see the links at the end of the last tutorial in this series.
Windows Azure in the future might provide a notification mechanism for scheduled reboots, which would allow you to only spin up an extra web role instance for the reboot time window. You wouldn't qualify for the 99.95% SLA, but you could reduce your costs by almost half and ensure your web application remains available during the reboot interval.
In a production application you would implement an authentication and authorization mechanism like the ASP.NET membership system for the ASP.NET MVC web front-end, including the ASP.NET Web API service method. There are also other options, such as using a shared secret, for securing the Web API service method. Authentication and authorization functionality has been omitted from the sample application to keep it simple to set up and deploy. (The second tutorial in the series shows how to implement IP restrictions so that unauthorized persons can't use the application when you deploy it to the cloud.)
For more information about how to implement authentication and authorization in an ASP.NET MVC web project, see the following resources:
Note: We planned to include a mechanism for securing the Web API service method by using a shared secret, but that was not completed in time for the initial release. Therefore the third tutorial does not show how to build the Web API controller for the subscription process. We hope to include instructions for implementing a secure subscription process in the next version of this tutorial. Until then, you can test the application by using the administrator web pages to subscribe email addresses to lists.
In the next tutorial, you'll download the sample project, configure your development environment, configure the project for your environment, and test the project locally and in the cloud. In the following tutorials you'll see how to build the project from scratch.
For links to additional resources for working with Windows Azure Storage tables, queues, and blobs, see the end of the last tutorial in this series.