Source: http://www.vckbase.com/document/viewdoc/?id=1359
This article uses the following technologies: ASP.NET, the .NET Framework, and IIS.
Writing a Web application with ASP.NET is unbelievably easy. So easy, many developers don't take the time to structure their applications for great performance. In this article, I'm going to present 10 tips for writing high-performance Web apps. I'm not limiting my comments to ASP.NET applications because they are just one subset of Web applications. This article won't be the definitive guide for performance-tuning Web applications—an entire book could easily be devoted to that. Instead, think of this as a good place to start.
Before becoming a workaholic, I used to do a lot of rock climbing. Prior to any big climb, I'd review the route in the guidebook and read the recommendations made by people who had visited the site before. But, no matter how good the guidebook, you need actual rock climbing experience before attempting a particularly challenging climb. Similarly, you can only learn how to write high-performance Web applications when you're faced with either fixing performance problems or running a high-throughput site.
My personal experience comes from having been an infrastructure Program Manager on the ASP.NET team at Microsoft, running and managing www.asp.net, and helping architect Community Server, which is the next version of several well-known ASP.NET applications (ASP.NET Forums, .Text, and nGallery combined into one platform). I'm sure that some of the tips that have helped me will help you as well.
You should think about the separation of your application into logical tiers. You might have heard of the term 3-tier (or n-tier) physical architecture. These are usually prescribed architecture patterns that physically divide functionality across processes and/or hardware. As the system needs to scale, more hardware can easily be added. There is, however, a performance hit associated with process and machine hopping, thus it should be avoided. So, whenever possible, run the ASP.NET pages and their associated components together in the same application.
Because of the separation of code and the boundaries between tiers, using Web services or remoting will decrease performance by 20 percent or more.
The data tier is a bit of a different beast since it is usually better to have dedicated hardware for your database. However, the cost of process hopping to the database is still high, thus performance on the data tier is the first place to look when optimizing your code.
Before diving in to fix performance problems in your applications, make sure you profile your applications to see exactly where the problems lie. Key performance counters (such as the one that indicates the percentage of time spent performing garbage collections) are also very useful for finding out where applications are spending the majority of their time. Yet the places where time is spent are often quite unintuitive.
There are two types of performance improvements described in this article: large optimizations, such as using the ASP.NET Cache, and tiny optimizations that repeat themselves. These tiny optimizations are sometimes the most interesting. You make a small change to code that gets called thousands and thousands of times. With a big optimization, you might see overall performance take a large jump. With a small one, you might shave a few milliseconds on a given request, but when compounded across the total requests per day, it can result in an enormous improvement.
Performance on the Data Tier
When it comes to performance-tuning an application, there is a single litmus test you can use to prioritize work: does the code access the database? If so, how often? Note that the same test could be applied for code that uses Web services or remoting, too, but I'm not covering those in this article.
If you have a database request required in a particular code path and you see other areas such as string manipulations that you want to optimize first, stop and perform your litmus test. Unless you have an egregious performance problem, your time would be better utilized trying to optimize the time spent in and connected to the database, the amount of data returned, and how often you make round-trips to and from the database.
With that general information established, let's look at ten tips that can help your application perform better. I'll begin with the changes that can make the biggest difference.
Tip 1—Return Multiple Resultsets
Review your database code to see if you have request paths that go to the database more than once. Each of those round-trips decreases the number of requests per second your application can serve. By returning multiple resultsets in a single database request, you can cut the total time spent communicating with the database. You'll be making your system more scalable, too, as you'll cut down on the work the database server is doing managing requests.
While you can return multiple resultsets using dynamic SQL, I prefer to use stored procedures. It's arguable whether business logic should reside in a stored procedure, but I think that if logic in a stored procedure can constrain the data returned (reduce the size of the dataset, time spent on the network, and not having to filter the data in the logic tier), it's a good thing.
Using a SqlCommand instance and its ExecuteReader method to populate strongly typed business classes, you can move the resultset pointer forward by calling NextResult. Figure 1 shows a sample conversation populating several ArrayLists with typed classes. Returning only the data you need from the database will additionally decrease memory allocations on your server.
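Figure 1 itself is not reproduced in this repost, but the pattern is straightforward. Here is a minimal sketch of it; the stored procedure name, column names, and connection string are hypothetical stand-ins, and the real Figure 1 populates strongly typed classes rather than raw strings:

using System.Collections;
using System.Data;
using System.Data.SqlClient;

static void LoadForumsAndThreads(string connectionString,
    ArrayList forums, ArrayList threads)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("GetForumsAndThreads", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        conn.Open();

        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())                    // first resultset: forums
                forums.Add((string)reader["Name"]);

            if (reader.NextResult())                 // advance to the second resultset
                while (reader.Read())                // second resultset: threads
                    threads.Add((string)reader["Subject"]);
        }
    }
}

Both resultsets come back on one connection in one round-trip; NextResult simply moves the reader to the next SELECT emitted by the stored procedure.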
Tip 2—Paged Data Access
The ASP.NET DataGrid exposes a wonderful capability: data paging support. When paging is enabled in the DataGrid, a fixed number of records is shown at a time. Additionally, paging UI is also shown at the bottom of the DataGrid for navigating through the records. The paging UI allows you to navigate backwards and forwards through displayed data, displaying a fixed number of records at a time.
There's one slight wrinkle. Paging with the DataGrid requires all of the data to be bound to the grid. For example, your data layer will need to return all of the data and then the DataGrid will filter all the displayed records based on the current page. If 100,000 records are returned when you're paging through the DataGrid, 99,975 records would be discarded on each request (assuming a page size of 25). As the number of records grows, the performance of the application will suffer as more and more data must be sent on each request.
One good approach to writing better paging code is to use stored procedures. Figure 2 shows a sample stored procedure that pages through the Orders table in the Northwind database. In a nutshell, all you're doing here is passing in the page index and the page size. The appropriate resultset is calculated and then returned.
In Community Server, we wrote a paging server control to do all the data paging. You'll see that I am using the ideas discussed in Tip 1, returning two resultsets from one stored procedure: the total number of records and the requested data.
The total number of records returned can vary depending on the query being executed. For example, a WHERE clause can be used to constrain the data returned. The total number of records to be returned must be known in order to calculate the total pages to be displayed in the paging UI. For example, if there are 1,000,000 total records and a WHERE clause is used that filters this to 1,000 records, the paging logic needs to be aware of the total number of records to properly render the paging UI. A sketch of the calling side follows below.
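Figure 2 (the stored procedure) is not reproduced here, but on the application side the calling pattern looks roughly like the following sketch. The procedure name "GetOrdersPaged" and its parameters are hypothetical stand-ins:

static ArrayList GetOrdersPage(string connectionString,
    int pageIndex, int pageSize, out int totalRecords)
{
    ArrayList orderIds = new ArrayList();
    totalRecords = 0;

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("GetOrdersPaged", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@PageIndex", SqlDbType.Int).Value = pageIndex;
        cmd.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;
        conn.Open();

        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            if (reader.Read())
                totalRecords = reader.GetInt32(0);    // resultset 1: total count

            if (reader.NextResult())
                while (reader.Read())                 // resultset 2: one page of rows
                    orderIds.Add(reader.GetInt32(0)); // e.g. OrderID
        }
    }
    return orderIds;
}

Only pageSize rows cross the wire per request, and the count needed by the paging UI rides along in the same round-trip.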
Tip 3—Connection Pooling
Setting up the TCP connection between your Web application and SQL Server can be an expensive operation. Developers at Microsoft have been able to take advantage of connection pooling for some time now, allowing them to reuse connections to the database. Rather than setting up a new TCP connection on each request, a new connection is set up only when one is not available in the connection pool. When the connection is closed, it is returned to the pool where it remains connected to the database, as opposed to completely tearing down that TCP connection.
Of course you need to watch out for leaking connections. Always close your connections when you're finished with them. I repeat: no matter what anyone says about garbage collection within the Microsoft .NET Framework, always call Close or Dispose explicitly on your connection when you are finished with it. Do not trust the common language runtime (CLR) to clean up and close your connection for you at a predetermined time. The CLR will eventually destroy the class and force the connection closed, but you have no guarantee when the garbage collection on the object will actually happen.
To use connection pooling optimally, there are a couple of rules to live by. First, open the connection, do the work, and then close the connection. It's okay to open and close the connection multiple times on each request if you have to (optimally you apply Tip 1) rather than keeping the connection open and passing it around through different methods. Second, use the same connection string (and the same thread identity if you're using integrated authentication). If you don't use the same connection string, for example customizing the connection string based on the logged-in user, you won't get the same optimization value provided by connection pooling. And if you use integrated authentication while impersonating a large set of users, your pooling will also be much less effective. The .NET CLR data performance counters can be very useful when attempting to track down any performance issues that are related to connection pooling.
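A minimal sketch of the open-late/close-early pattern described above (the query and connection string are placeholders):

// Open late, close early. The using blocks guarantee Dispose, which
// returns the connection to the pool even if an exception is thrown.
// Building the connection string the same way every time means all
// requests share a single pool.
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn))
{
    conn.Open();                        // drawn from the pool when one is free
    int orderCount = (int)cmd.ExecuteScalar();
}                                       // returned to the pool here, still connected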
Whenever your application is connecting to a resource, such as a database, running in another process, you should optimize by focusing on the time spent connecting to the resource, the time spent sending or retrieving data, and the number of round-trips. Optimizing any kind of process hop in your application is the first place to start to achieve better performance.
The application tier contains the logic that connects to your data layer and transforms data into meaningful class instances and business processes. For example, in Community Server, this is where you populate a Forums or Threads collection, and apply business rules such as permissions; most importantly it is where the Caching logic is performed.
Tip 4—ASP.NET Cache API
One of the very first things you should do before writing a line of application code is architect the application tier to maximize and exploit the ASP.NET Cache feature.
If your components are running within an ASP.NET application, you simply need to include a reference to System.Web.dll in your application project. When you need access to the Cache, use the HttpRuntime.Cache property (the same object is also accessible through Page.Cache and HttpContext.Cache).
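In practice, most uses of the Cache follow a check/load/insert pattern. A minimal sketch, assuming a hypothetical LoadForums() data-tier call and a five-minute policy chosen purely for illustration (requires System.Web and System.Web.Caching):

// Cache-aside sketch: check the cache, fall back to the loader, then
// insert with a bounded lifetime so the cache can't grow without limit.
DataSet forums = (DataSet)HttpRuntime.Cache["Forums"];
if (forums == null)
{
    forums = LoadForums();                       // expensive database work
    HttpRuntime.Cache.Insert("Forums", forums,
        null,                                    // no dependency
        DateTime.Now.AddMinutes(5),              // absolute expiration
        Cache.NoSlidingExpiration);
}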
There are several rules for caching data. First, if data can be used more than once it's a good candidate for caching. Second, if data is general rather than specific to a given request or user, it's a great candidate for the cache. If the data is user- or request-specific, but is long lived, it can still be cached, but may not be used as frequently. Third, an often overlooked rule is that sometimes you can cache too much. Generally on an x86 machine, you want to run a process with no higher than 800MB of private bytes in order to reduce the chance of an out-of-memory error. Therefore, caching should be bounded. In other words, you may be able to reuse a result of a computation, but if that computation takes 10 parameters, you might attempt to cache on 10 permutations, which will likely get you into trouble. One of the most common support calls for ASP.NET is out-of-memory errors caused by overcaching, especially of large datasets.
There are several great features of the Cache that you need to know. The first is that the Cache implements a least-recently-used algorithm, allowing ASP.NET to force a Cache purge—automatically removing unused items from the Cache—if memory is running low. The second is that the Cache supports expiration dependencies that can force invalidation; these include time, key, and file. Time is often used, but with ASP.NET 2.0 a new and more powerful invalidation type is being introduced: database cache invalidation, meaning entries are automatically removed from the cache when data in the database changes. For more information on database cache invalidation, see Dino Esposito's Cutting Edge column in the July 2004 issue of MSDN Magazine. For a look at the architecture of the cache, see Figure 3.
Figure 3 ASP.NET Cache
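As a sketch of the file-based expiration dependency mentioned above (the key, value, and file path are hypothetical), an entry can be tied to a file so that it is evicted the moment the file changes:

// Evict "SiteConfig" automatically whenever the underlying file changes.
HttpRuntime.Cache.Insert("SiteConfig", config,
    new CacheDependency(HttpContext.Current.Server.MapPath("~/site.config")));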
Tip 5—Per-Request Caching
Earlier in the article, I mentioned that small improvements to frequently traversed code paths can lead to big, overall performance gains. One of my absolute favorites of these is something I've termed per-request caching.
Whereas the Cache API is designed to cache data for a long period or until some condition is met, per-request caching simply means caching the data for the duration of the request. A particular code path is accessed frequently on each request but the data only needs to be fetched, applied, modified, or updated once. This sounds fairly theoretical, so let's consider a concrete example.
In the Forums application of Community Server, each server control used on a page requires personalization data to determine which skin to use, the style sheet to use, as well as other personalization data. Some of this data can be cached for a long period of time, but some data, such as the skin to use for the controls, is fetched once on each request and reused multiple times during the execution of the request.
To accomplish per-request caching, use the ASP.NET HttpContext. An instance of HttpContext is created with every request and is accessible anywhere during that request from the HttpContext.Current property. The HttpContext class has a special Items collection property; objects and data added to this Items collection are cached only for the duration of the request. Just as you can use the Cache to store frequently accessed data, you can use HttpContext.Items to store data that you'll use only on a per-request basis. The logic behind this is simple: data is added to the HttpContext.Items collection when it doesn't exist, and on subsequent lookups the data found in HttpContext.Items is simply returned.
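A minimal sketch of the lookup described above; LoadSkinFromDatabase() and the "Skin" key are hypothetical:

// Per-request caching via HttpContext.Items. The loader runs at most
// once per request, no matter how many controls ask for the skin.
public static string GetSkin()
{
    HttpContext context = HttpContext.Current;
    string skin = (string)context.Items["Skin"];
    if (skin == null)
    {
        skin = LoadSkinFromDatabase();  // fetched once for this request
        context.Items["Skin"] = skin;   // lives only until the request ends
    }
    return skin;
}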
Tip 6—Background Processing
The path through your code should be as fast as possible, right? There may be times when you find yourself performing expensive tasks on each request or once every n requests. Sending out e-mails or parsing and validating incoming data are just a few examples.
When tearing apart ASP.NET Forums 1.0 and rebuilding what became Community Server, we found that the code path for adding a new post was pretty slow. Each time a post was added, the application first needed to ensure that there were no duplicate posts, then it had to parse the post using a "badword" filter, parse the post for emoticons, tokenize and index the post, add the post to the moderation queue when required, validate attachments, and finally, once posted, send e-mail notifications out to any subscribers. Clearly, that's a lot of work.
It turns out that most of the time was spent in the indexing logic and sending e-mails. Indexing a post was a time-consuming operation, and it turned out that the built-in System.Web.Mail functionality would connect to an SMTP server and send the e-mails serially. As the number of subscribers to a particular post or topic area increased, it would take longer and longer to perform the AddPost function.
Indexing and sending e-mail didn't need to happen on each request. Ideally, we wanted to batch this work together, indexing 25 posts at a time or sending all the e-mails every five minutes. We decided to use the same code I had used to prototype database cache invalidation for what eventually got baked into Visual Studio 2005.
The Timer class, found in the System.Threading namespace, is a wonderfully useful, but less well-known class in the .NET Framework, at least for Web developers. Once created, the Timer will invoke the specified callback on a thread from the ThreadPool at a configurable interval. This means you can set up code to execute without an incoming request to your ASP.NET application, an ideal situation for background processing. You can do work such as indexing or sending e-mail in this background process too.
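A minimal sketch of this technique, assuming a hypothetical batch worker; the five-minute interval mirrors the batching goal above:

using System.Threading;

// Keep a reference in a static field; otherwise the Timer itself can be
// garbage collected and will stop firing.
static Timer indexTimer;

static void StartBackgroundIndexing()
{
    indexTimer = new Timer(
        new TimerCallback(IndexPendingPosts),  // runs on a ThreadPool thread
        null,                                  // no state object
        TimeSpan.FromMinutes(5),               // first run after five minutes
        TimeSpan.FromMinutes(5));              // then every five minutes
}

// Hypothetical batch worker: index queued posts and send queued e-mail.
static void IndexPendingPosts(object state)
{
    // drain the accumulated work queue in one batch here
}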
There are a couple of problems with this technique, though. If your application domain unloads, the timer instance will stop firing its events. In addition, since the CLR has a hard gate on the number of threads per process, you can get into a situation on a heavily loaded server where timers may not have threads to complete on and can be somewhat delayed. ASP.NET tries to minimize the chances of this happening by reserving a certain number of free threads in the process and only using a portion of the total threads for request processing. However, if you have lots of asynchronous work, this can be an issue.
There is not enough room to go into the code here, but you can download a digestible sample at www.rob-howard.net. Just grab the slides and demos from the Blackbelt TechEd 2004 presentation.
Tip 7—Page Output Caching and Proxy Servers
ASP.NET is your presentation layer (or should be); it consists of pages, user controls, server controls (HttpHandlers and HttpModules), and the content that they generate. If you have an ASP.NET page that generates output, whether HTML, XML, images, or any other data, and you run this code on each request and it generates the same output, you have a great candidate for page output caching.
By simply adding this line to the top of your page:
<%@ OutputCache Duration="60" VaryByParam="none" %>
you can effectively generate the output for this page once and reuse it multiple times for up to 60 seconds, at which point the page will re-execute and the output will once again be added to the ASP.NET Cache. This behavior can also be accomplished using some lower-level programmatic APIs. There are several configurable settings for output caching, such as the VaryByParam attribute just used. VaryByParam is required, and it allows you to specify the HTTP GET or HTTP POST parameters by which to vary the cache entries. For example, default.aspx?Report=1 and default.aspx?Report=2 could be output-cached separately simply by setting VaryByParam="Report". Additional parameters can be named in a semicolon-separated list.
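For reference, a rough sketch of the lower-level programmatic route mentioned above, set from inside the page's code; these are real HttpCachePolicy calls, though the directive also manages cache-key variation for you:

// Roughly equivalent to the OutputCache directive above.
Response.Cache.SetCacheability(HttpCacheability.Public);  // allow downstream caching
Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));   // 60-second lifetime
Response.Cache.SetValidUntilExpires(true);                // honor until expiration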
Many people don't realize that when the Output Cache is used, the ASP.NET page also generates a set of HTTP headers that downstream caching servers can use, such as those run by Microsoft Internet Security and Acceleration Server or by Akamai. When the HTTP cache headers are set, documents can be cached on these network resources, and client requests can be satisfied without having to go back to the origin server.
Using page output caching, then, does not make your application more efficient, but it can reduce the load on your server as downstream caching technology caches documents. Of course, this can only be anonymous content; once it's downstream, you won't see the requests anymore and can't perform authentication to prevent access to it.
Tip 8—Run IIS 6.0 (If Only for Kernel Caching)
If you're not running IIS 6.0 (Windows Server 2003), you're missing out on some great performance enhancements in the Microsoft Web server. In Tip 7, I talked about output caching. In IIS 5.0, a request comes through IIS and then to ASP.NET. When caching is involved, an HttpModule in ASP.NET receives the request and returns the contents from the Cache.
If you're using IIS 6.0, there is a nice little feature called kernel caching that doesn't require any code changes to ASP.NET. When a request is output-cached by ASP.NET, the IIS kernel cache receives a copy of the cached data. When a request comes from the network driver, a kernel-level driver (no context switch to user mode) receives the request, and if cached, flushes the cached data to the response, and completes execution. This means that when you use kernel-mode caching with IIS and ASP.NET output caching, you'll see unbelievable performance results. At one point during the Visual Studio 2005 development of ASP.NET, I was the program manager responsible for ASP.NET performance. The developers did the magic, but I saw all the reports on a daily basis. The kernel mode caching results were always the most interesting. The common characteristic was network saturation by requests/responses and IIS running at about five percent CPU utilization. It was amazing! There are certainly other reasons for using IIS 6.0, but kernel mode caching is an obvious one.
Tip 9—Use Gzip Compression
While not necessarily a server performance tip (since you might see CPU utilization go up), using gzip compression can decrease the number of bytes sent by your server. This gives the perception of faster pages and also cuts down on bandwidth usage. Depending on the data sent, how well it can be compressed, and whether the client browsers support it (IIS will only send gzip compressed content to clients that support gzip compression, such as Internet Explorer 6.0 and Firefox), your server can serve more requests per second. In fact, just about any time you can decrease the amount of data returned, you will increase requests per second.
The good news is that gzip compression is built into IIS 6.0 and is much better than the gzip compression used in IIS 5.0. Unfortunately, when attempting to turn on gzip compression in IIS 6.0, you may not be able to locate the setting on the properties dialog in IIS. The IIS team built awesome gzip capabilities into the server, but neglected to include an administrative UI for enabling it. To enable gzip compression, you have to spelunk into the innards of the XML configuration settings of IIS 6.0 (which isn't for the faint of heart). By the way, the credit goes to Scott Forsyth of OrcsWeb who helped me figure this out for the www.asp.net servers hosted by OrcsWeb.
Rather than include the procedure in this article, just read the article by Brad Wilson at IIS6 Compression. There's also a Knowledge Base article on enabling compression for ASPX, available at Enable ASPX Compression in IIS. It should be noted, however, that dynamic compression and kernel caching are mutually exclusive on IIS 6.0 due to some implementation details.
Tip 10—Server Control View State
View state is a fancy name for ASP.NET storing some state data in a hidden input field inside the generated page. When the page is posted back to the server, the server can parse, validate, and apply this view state data back to the page's tree of controls. View state is a very powerful capability since it allows state to be persisted with the client and it requires no cookies or server memory to save this state. Many ASP.NET server controls use view state to persist settings made during interactions with elements on the page, for example, saving the current page that is being displayed when paging through data.
There are a number of drawbacks to the use of view state, however. First of all, it increases the total payload of the page both when served and when requested. There is also an additional overhead incurred when serializing or deserializing view state data that is posted back to the server. Lastly, view state increases the memory allocations on the server.
Several server controls, the most well known of which is the DataGrid, tend to make excessive use of view state, even in cases where it is not needed. The default behavior of the ViewState property is enabled, but if you don't need it, you can turn it off at the control or page level. Within a control, you simply set the EnableViewState property to false, or you can set it globally within the page using this setting:
<%@ Page EnableViewState="false" %>
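The control-level alternative mentioned above is a one-liner; a sketch, assuming a DataGrid field named myDataGrid:

// Disable view state for one control instead of the whole page.
myDataGrid.EnableViewState = false;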
If you are not doing postbacks in a page or are always regenerating the controls on a page on each request, you should disable view state at the page level.
Conclusion
I've offered you some tips that I've found useful for writing high-performance ASP.NET applications. As I mentioned at the beginning of this article, this is more a preliminary guide than the last word on ASP.NET performance. (More information on improving the performance of ASP.NET apps can be found at Improving ASP.NET Performance.) Only through your own experience can you find the best way to solve your unique performance problems. However, during your journey, these tips should provide you with good guidance. In software development, there are very few absolutes; every application is unique.
Common Performance Myths
One of the most common myths is that C# code is faster than Visual Basic code. There is a grain of truth in this, since Visual Basic permits some performance-hindering behaviors that C# does not, such as not explicitly declaring types. But if good programming practices are followed, there is no reason why Visual Basic and C# code cannot execute with nearly identical performance. To put it more succinctly: similar code produces similar results.
Another myth is that code-behind is faster than inline code; this is absolutely false. Performance has nothing to do with where your ASP.NET application code lives, whether in a code-behind file or inline in the ASP.NET page. Sometimes I prefer to use inline code, because changes don't incur the same update cost as code-behind. For example, with code-behind you have to update the entire code-behind DLL, which can be a scary proposition.
A third myth is that components are faster than pages. This was true in classic ASP, where compiled COM servers were much faster than VBScript. But it does not hold for ASP.NET, where both pages and components are classes. Whether your code is inline in a page, in code-behind, or in a separate component makes little performance difference; that kind of organization is about grouping functionality logically, not about performance.
The last myth I want to dispel is using Web services to implement functionality between two applications. Web services should be used to connect disparate systems or to provide remote access to system functionality and behavior. They should not be used internally to connect two identical systems. While they are easy to use, there are far better alternatives. The worst thing you can do is use Web services for communication between ASP and ASP.NET applications running on the same server, something I have seen all too often.