Summary of Regular Expression Usage on iOS


太阳火神的美丽人生 (http://blog.csdn.net/opengl_es)

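This article is published under the Creative Commons "Attribution-NonCommercial-ShareAlike" license.

When reposting, please keep this sentence: 太阳火神的美丽人生 - this blog focuses on agile development and on research into mobile and IoT devices: iOS, Android, Html5, Arduino, pcDuino. Otherwise, articles from this blog may not be reposted or re-reposted. Thank you for your cooperation.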


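If the regex tool in a given language is a wooden bucket, then the regular expressions it processes are the water. There is plenty of water, whether fresh water, salt water, or rain water; for the Perl regular expression family at least, there is enough to fill any bucket. The only question is whether the bucket is built well enough to hold all the water of that kind.

So do not get hung up on odd differences. Focus on implementing your feature, start from your end goal, study what a given tool can do and how to do it, and stop once the expected result is achieved. This is arguably one expression of agile development: meeting the goal at hand is enough, and there is no need to agonize over the rest.

Outside of work and specific tasks, however, it is still better to learn more, so that you build up your own body of knowledge and can, whenever needed, locate from several angles where the knowledge that solves a given problem sits within it.

So traditional software engineering methods are not without merit. They are simply a necessary stage in the long development of software as a discipline, a system formed through study and research. Once a given layer has taken shape, things become like using the Wubi input method: you no longer agonize over how to decompose a character, you just type what you see, with typing rather than decomposition as the goal. At that point, the here-and-now principle of agile software development is the result of that long period of exploration and study.

There is really no need to make agile out to be something mystical and boundless. Anyone who has developed software long enough eventually emerges from the continual study of traditional software engineering methods and starts doing things in a more convenient way that eliminates repeated work. That is agile: treating the immediate goal in front of you as the first priority.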

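As for time, this is a major reason agile development is hard to adopt at present. The same job can be done in a year, in a month, or sometimes in a day; slow work yields fine products. We developers usually work backwards from the time already allotted, figuring out which core parts must be done and where the remaining time allows cutting corners to save effort, so that the work finishes right on schedule.

More often now, however, developers are asked to estimate time without reasonably clear and detailed requirements to refer to. Sometimes those requirements exist only in the customer's head; sometimes they were obtained, or overlooked, while the project manager talked with the customer. In the end, an essential factor gets dropped for no good reason.

This depends largely on the project manager's personal ability: first, solid technical skills; second, a sense for the market and for requirements, so as to draw out, accurately or at least methodically, the customer's real requirements and their priority levels.

Experienced developers will often use all sorts of ways to "squeeze" the project manager until this root is dug out, and then the problem solves itself. (Perhaps holding this back is the only means a project manager has of keeping developers under control; if developers knew even this, what use would the manager be?! It bears out what Mr. Yu Shiwei (余世维) said about a failing of Chinese managers: "they must make others feel that they are very important......") Sometimes, though, the squeezing goes badly and backfires, bringing intervention and suppression from higher-level company leadership, and the loss outweighs the gain. In that case you might as well go through the motions day by day and trust to luck. If it goes well and you happen to take the right fork, you get praised. If it goes badly and time is wasted, the responsibility undoubtedly lands on the developers, with a whole pile of reasons waiting for you. So do not bother arguing; you know it is dark, so what is the use of saying so? Just close your eyes and feel your way forward, try not to hit another wall, and be glad to come out in one piece.

If you do come out in one piece, congratulations: you have leveled up and become an old hand (老油条)! An old hand is someone who has walked the thorny road often enough to know where it is deep and where it is shallow. Never mind whether the road is long or short, and wasted time is none of your concern; otherwise, putting your heart into it amounts to offering up your own life.

It is pitiable and lamentable that people who truly want to get things done end up like this; those who survive must take a roundabout path to protect themselves.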

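Pitiable: the loyal and the brave have fared this way since ancient times.

Lamentable: the fashion of the day is to drift with the current, and destroying the loyal can only be called folly.

May the wise, and the small-business owners whose ambitions are still unfulfilled, keep their eyes open and gather the talented, the loyal, and the brave under their banner, valuing talent as Liu Bei did. But do not, as he did, abandon Sichuan to wage an exhausting campaign against Wu (setting loyal counsel aside) and so ruin your own financial future!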


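Having added the passage above, I am getting sleepy. I slept from 7:30 last night to 0:28 this morning and could not fall back asleep after 6 hours. I wrote this post on and off over a couple of hours, hopefully not while sleepwalking.

I do not know whether 6 hours was really enough, or whether the gallbladder's detox period from 0:00 to 3:00 had arrived. Has something gone wrong again? Better to take medicine first: a Chinese patent medicine that works well, Jigucao (鸡骨草) capsules. Every dose makes me want another, and everything tastes good afterwards... I hope so!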


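1. NSRegularExpression

Abel  22:14:23
Folks, can anyone share some material on iOS regular expressions, or even just a link? I learned Perl regular expressions before, but they seem to differ from the ones on iOS.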


The following should be plenty for reference! The sentence "The regular expression patterns and behavior are based on Perl's regular expressions." shows that it is still based on Perl regular expressions, though with some extensions for the C++ environment.


http://userguide.icu-project.org/strings/regexp


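2. NSPredicate

In addition, the NSPredicate class can also filter using regular expressions; simply put, it matches using regular expression syntax. A predicate's format string can express conventional SQL-like matching, and to use a regular expression you use the MATCHES operator in the format string.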

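3. Open-source regex libraries

RegexKitLite. There may be other open-source libraries available as well; this list will be updated as more are found.

Detailed usage is described at: http://regexkit.sourceforge.net/RegexKitLite/index.html#ICUSyntax. This address may require a proxy to reach; goAgent can be used.

The original text from the link above is reproduced below: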



RegexKitLite

Lightweight Objective-C Regular Expressions for Mac OS X using the ICU Library

Introduction to RegexKitLite

This document introduces RegexKitLite for Mac OS X. RegexKitLite enables easy access to regular expressions by providing a number of additions to the standard Foundation NSString class. RegexKitLite acts as a bridge between the NSString class and the regular expression engine in the International Components for Unicode, or ICU, dynamic shared library that is shipped with Mac OS X.

Highlights

  • Uses the regular expression engine from the ICU library which is shipped with Mac OS X.
  • Automatically caches compiled regular expressions.
  • Uses direct access to a string's UTF-16 buffer if it is available.
  • Caches the UTF-16 conversion that is required by the ICU library when direct access to a string's UTF-16 buffer is unavailable.
  • Small size makes it ideal for use in iPhone applications.
  • Multithreading safe.
  • 64-bit support.
  • Custom DTrace probe points.
  • Support for Mac OS X 10.5 Garbage Collection.
  • Support for the Blocks language extension.
  • Uses Core Foundation for greater speed.
  • Very easy to use, all functionality is provided by a category extension to the NSString class.
  • Consists of two files, a header and the Objective-C source.
  • Xcode 3 integrated documentation available.
  • Distributed under the terms of the BSD License.

Who Should Read This Document

This document is intended for readers who would like to be able to use regular expressions in their Objective-C applications, whether those applications are for the Mac OS X desktop, or for the iPhone.

This document, and RegexKitLite, is also intended for anyone who has the need to search and manipulate NSString objects. If you've ever used the NSScanner, NSCharacterSet, and NSPredicate classes, or any of the NSString rangeOf… methods, RegexKitLite is for you.

Regular expressions are a powerful way to search, match and extract, and manipulate strings. RegexKitLite can perform many of the same operations that NSScanner, NSCharacterSet, and NSPredicate perform, and usually do it with far fewer lines of code. As an example, RegexKitLite Cookbook - Parsing CSV Data contains an example that is just over a dozen lines, but is a full featured CSV, or Comma Separated Value, parser that takes a CSV input and turns it in to a NSArray of NSArrays.

Organization of This Document

This document follows the conventions and styles used by Apple's documentation and is divided in to two main parts:

  • The Class Reference part, RegexKitLite NSString Additions Reference.
  • The Programming Guide part, which consists of the following chapters:
    • RegexKitLite Overview
    • Using RegexKitLite
    • ICU Syntax
    • RegexKitLite Cookbook
    • Adding RegexKitLite to your Project

Additional information can be found in the following sections:

  • Release Information, which contains the Release Notes for RegexKitLite 4.0
  • License Information, which contains the RegexKitLite BSD License.

Supporting RegexKitLite through Financial Donations

A significant amount of time and effort has gone in to the development of RegexKitLite. Even though it is distributed under the terms of the BSD License, you are encouraged to contribute financially if you are using RegexKitLite in a profitable commercial application. Should you decide to contribute to RegexKitLite, please keep the following in mind:

  • What it would have cost you in terms of hours, or consultant fees, to develop similar functionality.
  • The Ohloh.net metrics do not factor in the cost of writing documentation, which is where most of the effort is spent.
  • The target audience for RegexKitLite is very small, so there are relatively few units "sold".

You can contribute by visiting SourceForge.net's donation page for RegexKitLite.

Important:

You are always required to acknowledge the use of RegexKitLite in your product as specified in the terms of the BSD License.

Download

You can download RegexKitLite distribution that corresponds to this version of the documentation here— RegexKitLite-4.0.tar.bz2 (139.1K). To be automatically notified when a new version of RegexKitLite is available, add the RegexKitLite documentation feed to Xcode.

PDF Documentation

This document is available in PDF format: RegexKitLite-4.0.pdf (1.1M).

Note:

If you wish to print this document, it is recommended that you use the PDF version.

Reporting Bugs

You can file bug reports, or review submitted bugs, at the following URL: http://sourceforge.net/tracker/?group_id=204582&atid=990188

Note:

Anonymous bug reports are no longer accepted due to spam. A SourceForge.net account is required to file a bug report.

Contacting The Author

The author can be contacted at [email protected].

RegexKitLite Overview

While RegexKitLite is not a descendant of the RegexKit.framework source code, it does provide a small subset of RegexKit's NSString methods for performing various regular expression tasks. These include determining the range that a regular expression matches within a string, easily creating a new string from the results of a match, splitting a string in to a NSArray with a regular expression, and performing search and replace operations with regular expressions using common $n substitution syntax.

RegexKitLite uses the regular expression engine provided by the ICU library that ships with Mac OS X. The two files, RegexKitLite.h and RegexKitLite.m, and linking against the /usr/lib/libicucore.dylib ICU shared library are all that is required. Adding RegexKitLite to your project only adds a few kilobytes of overhead to your application's size and typically only requires a few kilobytes of memory at run-time. Since a regular expression must first be compiled by the ICU library before it can be used, RegexKitLite keeps a small 4-way set associative cache with a least recently used replacement policy of the compiled regular expressions.

See Also
  • RegexKit Framework
  • International Components for Unicode
  • Unicode Home Page

Official Support from Apple for ICU Regular Expressions

Mac OS X

As of Mac OS X 10.6, the author is not aware of any official support from Apple for linking to the libicucore.dylib library. On the other hand, the author is unaware of any official prohibition against it, either. Linking to the ICU library and making use of the ICU regular expression API is slightly different than making use of private, undocumented API's. There are a number of very good reasons why you shouldn't use private, undocumented API's, such as:

  • The undocumented, private API is not yet mature enough for Apple to commit to supporting it. Once an API is made "public", developers expect future versions to at least be compatible with previously published versions.
  • The undocumented, private API may expose implementation specific details that can change between versions. Public API's are the proper "abstraction layer boundary" that allows the provider of the API to hide implementation specific details.

The ICU library, on the other hand, contains a "published, public API" in which the ICU developers have committed to supporting in later releases, and RegexKitLite uses only these public APIs. One could argue that Apple is not obligated to continue to include the ICU library in later versions of Mac OS X, but this seems unlikely for a number of reasons which will not be discussed here. With the introduction of iPhone OS 3.2, Apple now officially supports iPhone applications linking to the ICU library for the purpose of using its regular expression functionality. This is encouraging news for Mac OS X developers if one assumes that Apple will try to keep some kind of parity between the iPhone OS and Mac OS X API's.

iPhone OS < 3.2

Prior to iPhone OS 3.2, there was never any official support from Apple for linking to the libicucore.dylib library. It was unclear if linking to the library would violate the iPhone OS SDK Agreement prohibition against using undocumented API's, but a large number of iPhone applications choose to use RegexKitLite, and the author is not aware of a single rejection because of it.

iPhone OS ≥ 3.2

Starting with iPhone OS 3.2, Apple now officially allows iPhone OS applications to link with the ICU library. The ICU library contains a lot of functionality for dealing with internationalization and localization, but Apple only officially permits the use of the ICU Regular Expression functionality.

Apple also provides a way to use ICU based regular expressions from Foundation by adding a new option to NSStringCompareOptions: NSRegularExpressionSearch. This new option can be used with the NSString rangeOfString:options: method, and the following example of its usage is given:

    // finds phone number in format nnn-nnn-nnnn
    NSRange r;
    NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
    r = [textView.text rangeOfString:regEx options:NSRegularExpressionSearch];
    if (r.location != NSNotFound) {
        NSLog(@"Phone number is %@", [textView.text substringWithRange:r]);
    } else {
        NSLog(@"Not found.");
    }

At this time, rangeOfString:options: is the only regular expression functionality Apple has added to Foundation and capture groups in a regular expression are not supported. Apple also gives the following note:

Note:

As noted in "ICU Regular-Expression Support," the ICU libraries related to regular expressions are included in iPhone OS 3.2. However, you should only use the ICU facilities if the NSString alternative is not sufficient for your needs.
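
The phone-number search above can also be tried outside Foundation. The following is a minimal sketch in plain C (which also compiles as Objective-C) that uses the POSIX regcomp()/regexec() functions in place of the ICU engine, so behavior may differ for patterns that rely on ICU-specific syntax. The helper name find_phone is hypothetical and is not part of Foundation or RegexKitLite.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: locate the first nnn-nnn-nnnn phone number in text.
 * Uses POSIX extended regular expressions, not the ICU engine.
 * Returns 1 and fills *start and *length on a match, 0 otherwise. */
int find_phone(const char *text, size_t *start, size_t *length)
{
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    if (regcomp(&re, "[0-9]{3}-[0-9]{3}-[0-9]{4}", REG_EXTENDED) != 0) {
        return 0; /* the pattern failed to compile */
    }
    if (regexec(&re, text, 1, m, 0) == 0) {
        /* rm_so / rm_eo are byte offsets of the match, like an NSRange */
        *start = (size_t)m[0].rm_so;
        *length = (size_t)(m[0].rm_eo - m[0].rm_so);
        found = 1;
    }
    regfree(&re);
    return found;
}
```

The start/length pair plays the same role as the NSRange returned by rangeOfString:options:, with a return value of 0 standing in for NSNotFound.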

RegexKitLite provides a much richer API, such as the automatic extraction of a match as a NSString. Using RegexKitLite, the example can be rewritten as:

    // finds phone number in format nnn-nnn-nnnn
    NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
    NSString *match = [textView.text stringByMatching:regEx];
    // length is 0 both for an empty string and for a nil result
    if ([match length] > 0) {
        NSLog(@"Phone number is %@", match);
    } else {
        NSLog(@"Not found.");
    }

What's more, RegexKitLite provides easy access to all the matches of a regular expression in a NSString:

    // finds phone number in format nnn-nnn-nnnn
    NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
    for (NSString *match in [textView.text componentsMatchedByRegex:regEx]) {
        NSLog(@"Phone number is %@", match);
    }

To do the same thing using just NSRegularExpressionSearch would require significantly more code and effort on your part. RegexKitLite also provides powerful search and replace functionality:

    // finds phone number in format nnn-nnn-nnnn
    NSString *regEx = @"([0-9]{3})-([0-9]{3}-[0-9]{4})";
    // and transforms the phone number in to the format of (nnn) nnn-nnnn
    NSString *replaced = [textView.text stringByReplacingOccurrencesOfRegex:regEx withString:@"($1) $2"];

RegexKitLite also has a number of performance enhancing features built in such as caching compiled regular expressions. Although the author does not have any benchmarks comparing NSRegularExpressionSearch to RegexKitLite, it is likely that RegexKitLite outperforms NSRegularExpressionSearch.
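
To make the role of the two capture groups concrete, here is a rough plain C sketch of the same search and replace. It uses POSIX regcomp()/regexec() rather than ICU or RegexKitLite, and builds the "($1) $2" output by hand from the capture offsets. The function name reformat_phones and its buffer-based interface are assumptions made for illustration only.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite every nnn-nnn-nnnn in `in` as (nnn) nnn-nnnn.
 * Writes a NUL-terminated result of at most out_size - 1 bytes to `out`.
 * Returns the number of replacements, or -1 on compile error or overflow. */
int reformat_phones(const char *in, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] whole match, m[1] and m[2] capture groups */
    size_t used = 0;
    int count = 0;

    if (out_size == 0 ||
        regcomp(&re, "([0-9]{3})-([0-9]{3}-[0-9]{4})", REG_EXTENDED) != 0) {
        return -1;
    }
    while (regexec(&re, in, 3, m, 0) == 0) {
        /* Copy the text before the match, then "(group 1) group 2". */
        int n = snprintf(out + used, out_size - used, "%.*s(%.*s) %.*s",
                         (int)m[0].rm_so, in,
                         (int)(m[1].rm_eo - m[1].rm_so), in + m[1].rm_so,
                         (int)(m[2].rm_eo - m[2].rm_so), in + m[2].rm_so);
        if (n < 0 || (size_t)n >= out_size - used) {
            regfree(&re);
            return -1; /* output buffer too small */
        }
        used += (size_t)n;
        in += m[0].rm_eo; /* continue searching after this match */
        count++;
    }
    regfree(&re);
    /* Copy whatever follows the last match. */
    if (strlen(in) >= out_size - used) {
        return -1;
    }
    strcpy(out + used, in);
    return count;
}
```

With a 64-byte buffer, the input "a 555-123-4567 b" would be rewritten as "a (555) 123-4567 b", which is what the $1 and $2 references in the replacement string express.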

See Also
  • What's New in iPhone OS - ICU Regular-Expression Support
  • What's New in iPhone OS - Foundation Framework Changes
  • iPad Programming Guide - ICU Regular-Expression Support
  • iPad Programming Guide - Foundation-Level Regular Expressions
  • NSRegularExpressionSearch
  • - rangeOfString:options:

The iPhone 4.0 SDK Agreement

While iPhone OS 3.2 included official, Apple sanctioned use of linking of the ICU library for the purposes of using the ICU regular expression engine, the iPhone OS 4.0 SDK included the following change to the iPhone OS SDK Agreement:

3.3.1

Applications may only use Documented APIs in the manner prescribed by Apple and must not use or call any private APIs. Applications must be originally written in Objective-C, C, C++, or JavaScript as executed by the iPhone OS WebKit engine, and only code written in C, C++, and Objective-C may compile and directly link against the Documented APIs (e.g., Applications that link to Documented APIs through an intermediary translation or compatibility layer or tool are prohibited).

This raises a number of obvious questions:

  • Does 3.3.1 apply to RegexKitLite?
  • Will the use of RegexKitLite in an iPhone OS application be grounds for rejection under 3.3.1?

There is considerable speculation as to what is covered by this change, but at the time of this writing, there is no empirical evidence or official guidelines from Apple to make any kind of an informed decision as to whether or not the use of RegexKitLite would violate 3.3.1. It is the author's opinion that RegexKitLite could be considered as a compatibility layer between NSString and the now Documented APIs for regular expressions in the ICU library.

It is widely speculated that the motivation for the change to 3.3.1 was to prevent the development of Flash applications for the iPhone. The author believes that most reasonable people would consider the application of compatibility layer in this context to mean something entirely different than what it means when applied to RegexKitLite.

At this time, the author is not aware of a single iPhone application that has been rejected due to the use of RegexKitLite. If your application is rejected due to the use of RegexKitLite, please let the author know by emailing [email protected]. As always, CAVEAT EMPTOR.

The Difference Between RegexKit.framework and RegexKitLite

RegexKit.framework and RegexKitLite are two different projects. In retrospect, RegexKitLite should have been given a more distinctive name. Below is a table summarizing some of the key differences between the two:

|                   | RegexKit.framework                         | RegexKitLite                                              |
|-------------------|--------------------------------------------|-----------------------------------------------------------|
| Regex Library     | PCRE (Perl Compatible Regular Expressions) | ICU (International Components For Unicode)                |
| Library Included  | Yes, built into framework object file.     | No, provided by Mac OS X.                                 |
| Library Linked As | Statically linked into framework.          | Dynamically linked to /usr/lib/libicucore.dylib.          |
| Compiled Size     | Approximately 371KB per architecture. (1)  | Very small, approximately 16KB to 20KB per architecture. (2) |
| Style             | External, linked to framework.             | Compiled directly in to final executable.                 |
| Feature Set       | Large, with additions to many classes.     | Minimal, NSString only.                                   |

(1) Version 0.6.0. About half of the 371KB is the PCRE library. The default distribution framework shared library file is 1.4MB in size and includes the ppc, ppc64, i386, and x86_64 architectures. If 64-bit support is removed, the framework shared library file size drops to 664KB.
(2) Since the ICU library is part of Mac OS X, it does not add to the final size.

Compiled Regular Expression Cache

The NSString that contains the regular expression must be compiled in to an ICU URegularExpression. This can be an expensive, time consuming step, and the compiled regular expression can be reused again in another search, even if the strings to be searched are different. Therefore RegexKitLite keeps a small cache of recently compiled regular expressions.

The cache is organized as a 4-way set associative cache, and the size of the cache can be tuned with the pre-processor define RKL_CACHE_SIZE. The default cache size, which should always be a prime number, is set to 13. Since the cache is 4-way set associative, the total number of compiled regular expressions that can be cached is RKL_CACHE_SIZE times four, for a total of 13 * 4, or 52. The NSString regexString is mapped to a cache set using modular arithmetic: Cache set ≡ [regexString hash] mod RKL_CACHE_SIZE, i.e. cacheSet = [regexString hash] % 13;. Since RegexKitLite uses Core Foundation, this is actually coded as cacheSet = CFHash(regexString) % RKL_CACHE_SIZE;.
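
The set-selection arithmetic described above can be sketched in plain C as follows. This is an illustration only: pattern_hash is a stand-in (32-bit FNV-1a) for CFHash(), whose actual values differ, and RKL_CACHE_WAYS, cache_set_for, and cache_capacity are hypothetical names that do not appear in RegexKitLite.

```c
#include <stdint.h>

#define RKL_CACHE_SIZE 13u /* default number of cache sets, a prime */
#define RKL_CACHE_WAYS 4u  /* hypothetical name: 4-way set associative */

/* Stand-in for CFHash(): 32-bit FNV-1a over the bytes of the pattern.
 * An assumption for illustration; Core Foundation hashes differently. */
uint32_t pattern_hash(const char *pattern)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)pattern; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Map a pattern to its cache set: hash mod RKL_CACHE_SIZE. */
unsigned cache_set_for(const char *pattern)
{
    return pattern_hash(pattern) % RKL_CACHE_SIZE;
}

/* Total number of compiled expressions the cache can hold: 13 * 4 = 52. */
unsigned cache_capacity(void)
{
    return RKL_CACHE_SIZE * RKL_CACHE_WAYS;
}
```

Because the set index depends only on the hash, two different patterns can land in the same set; the four ways per set are what allow them to coexist.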

Each of the four "ways" of a cache set are checked to see if it contains a NSString that was used to create the compiled regular expression that is identical to the NSString for the regular expression that is being checked. If there is an exact match, then the matching "way" is updated as the most recently used, and the compiled regular expression is used as-is. Otherwise, the least recently used, or LRU, "way" in the cache set is cleared and replaced with the compiled regular expression for the regular expression that wasn't in the cache.
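
The hit and least-recently-used replacement logic for a single cache set might look like the sketch below. It is written in plain C under simplifying assumptions: each way stores only a copy of the pattern text (a real cache would also hold the compiled expression), patterns are limited to 63 bytes, and recency is tracked with a simple counter. The names cache_set and cache_set_access are hypothetical.

```c
#include <string.h>

#define WAYS 4
#define MAX_PATTERN 64

/* One cache set with four "ways". An empty string marks an unused way. */
struct cache_set {
    char pattern[WAYS][MAX_PATTERN];
    unsigned long last_used[WAYS]; /* larger value = used more recently */
    unsigned long clock;           /* incremented on every access */
};

/* Look up `pattern` in a zero-initialized or previously used set.
 * Hit: mark the way most recently used and set *hit to 1.
 * Miss: replace the least recently used way and set *hit to 0.
 * Returns the way index, or -1 if the pattern is empty or too long
 * for this sketch. */
int cache_set_access(struct cache_set *set, const char *pattern, int *hit)
{
    size_t len = strlen(pattern);
    int lru = 0;

    if (len == 0 || len >= MAX_PATTERN) {
        return -1;
    }
    for (int i = 0; i < WAYS; i++) {
        if (strcmp(set->pattern[i], pattern) == 0) {
            set->last_used[i] = ++set->clock;
            *hit = 1;
            return i;
        }
        if (set->last_used[i] < set->last_used[lru]) {
            lru = i;
        }
    }
    /* Miss: this is where the pattern would be compiled before being
     * stored in place of the least recently used entry. */
    memcpy(set->pattern[lru], pattern, len + 1);
    set->last_used[lru] = ++set->clock;
    *hit = 0;
    return lru;
}
```

Unused ways have a last_used value of zero, so they are filled before any occupied way is evicted.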

In addition to the compiled regular expression cache, RegexKitLite keeps a small lookaside cache that maps a regular expression's NSString pointer and RKLRegexOptions directly to a cached compiled regular expression. When a regular expression's NSString pointer and RKLRegexOptions is in the lookaside cache, RegexKitLite can bypass calling CFHash(regexString) and checking each of the four "ways" in a cache set since the lookaside cache has provided the exact cached compiled regular expression. The lookaside cache is quite small at just 64 bytes and it was added because Shark.app profiling during performance tuning showed that CFHash(), while quite fast, was the primary bottleneck when retrieving already compiled and cached regular expressions, typically accounting for ≅40% of the look up time.

Regular Expressions in Mutable Strings

When a regular expression is compiled, an immutable copy of the string is kept. For immutable NSString objects, the copy is usually the same object with its reference count increased by one. Only NSMutableString objects will cause a new, immutable NSString to be created.

If the regular expression being used is stored in a NSMutableString, the cached regular expression will continue to be used as long as the NSMutableString remains unchanged. Once mutated, the changed NSMutableString will no longer be a match for the cached compiled regular expression that was being used by it previously. Even if the newly mutated string's hash is congruent to the previous unmutated string's hash modulo RKL_CACHE_SIZE, that is to say they share the same cache set (i.e., ([mutatedString hash] % RKL_CACHE_SIZE) == ([unmutatedString hash] % RKL_CACHE_SIZE)), the immutable copy of the regular expression string used to create the compiled regular expression is used to ensure true equality. The newly mutated string will have to go through the whole regular expression compile and cache creation process.

This means that NSMutableString objects can be safely used as regular expressions, and any mutations to those objects will immediately be detected and reflected in the regular expression used for matching.

Searching Mutable Strings

Unfortunately, the ICU regular expression API requires that the compiled regular expression be "set" to the string to be searched. To search a different string, the compiled regular expression must be "set" to the new string. Therefore, RegexKitLite tracks the last NSString that each compiled regular expression was set to, recording the pointer to the NSString object, its hash, and its length. If any of these parameters are different from the last parameters used for a compiled regular expression, the compiled regular expression is "set" to the new string. Since mutating a string will likely change its hash value, it's generally safe to search NSMutableString objects, and in most cases the mutation will reset the compiled regular expression to the updated contents of the NSMutableString.
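
The "has the target string changed?" check described above amounts to comparing three recorded values. A minimal plain C sketch, with the hypothetical names last_target and needs_reset, and with the hash passed in by the caller rather than computed by Core Foundation:

```c
#include <stddef.h>

/* What is remembered about the last string a compiled regex was "set" to. */
struct last_target {
    const void *object;  /* pointer identity of the string object */
    unsigned long hash;  /* its hash when it was last set */
    size_t length;       /* its length when it was last set */
};

/* Returns 1 if the compiled regex must be "set" to the string again
 * (any of pointer, hash, or length differs) and records the new values;
 * returns 0 if the previous setting can be reused. */
int needs_reset(struct last_target *last, const void *object,
                unsigned long hash, size_t length)
{
    if (last->object == object && last->hash == hash &&
        last->length == length) {
        return 0;
    }
    last->object = object;
    last->hash = hash;
    last->length = length;
    return 1;
}
```

A mutation that leaves the pointer, hash, and length all unchanged would go unnoticed by a check of this shape, which is why the text says searching mutable strings is only "generally" safe.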

Caution:

Care must be taken when mutable strings are searched and there exists the possibility that the string has mutated between searches. See NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings for more information.

Last Match Information

When performing a match, the arguments used to perform the match are kept. If those same arguments are used again, the actual matching operation is skipped because the compiled regular expression already contains the results for the given arguments. This is mostly useful when a regular expression contains multiple capture groups, and the results for different capture groups for the same match are needed. This means that there is only a small penalty for iterating over all the capture groups in a regular expression for a match, and essentially becomes the direct ICU regular expression API equivalent of uregex_start() and uregex_end().

See Also
  • ICU4C C API - Regular Expressions
  • ICU Regular Expression Syntax

UTF-16 Conversion Cache

RegexKitLite is ideal when the string being matched is a non-ASCII, Unicode string. This is because the regular expression engine used, ICU, can only operate on UTF-16 encoded strings. Since Cocoa keeps essentially all non-ASCII strings encoded in UTF-16 form internally, this means that RegexKitLite can operate directly on the string's buffer without having to make a temporary copy and transcode the string in to ICU's required format.

Like all object oriented programming, the internal representation of an object's information is private. However, the ICU regular expression engine requires that the text to be searched be encoded as a UTF-16 string. For pragmatic purposes, Core Foundation has several public functions that can provide direct access to the buffer used to hold the contents of the string, but such direct access is only available if the private buffer is already encoded in the requested direct access format. As a rough rule of thumb, 8-bit simple strings, such as ASCII, are kept in their 8-bit format. Non 8-bit simple strings are stored as UTF-16 strings. Of course, this is an implementation private detail, so this behavior should never be relied upon. It is mentioned because of the tremendous impact on matching performance and efficiency it can have if a string must be converted to UTF-16.

For strings in which direct access to the UTF-16 string is available, RegexKitLite uses that buffer. This is the ideal case as no extra work needs to be performed, such as converting the string in to a UTF-16 string, and allocating memory to hold the temporary conversion. Of course, direct access is not always available, and occasionally the string to be searched will need to be converted in to a UTF-16 string.

RegexKitLite has two conversion cache types. Each conversion cache type contains four buffers each, and buffers are re-used on a least recently used basis. If the selected cache type does not contain the contents of the NSString that is currently being searched in any of its buffers, the least recently used buffer is cleared and the current NSString takes its place. The first conversion cache type is fixed in size and set by the C pre-processor define RKL_FIXED_LENGTH, which defaults to 2048. Any string whose length is less than RKL_FIXED_LENGTH will use the fixed size conversion cache type. The second conversion cache type, for strings whose length is longer than RKL_FIXED_LENGTH, will use a dynamically sized conversion buffer. The memory allocation for the dynamically sized conversion buffer is resized for each conversion with realloc() to the size needed to hold the entire contents of the UTF-16 converted string.
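
The choice between the two conversion cache types, and the realloc()-based resizing of a dynamic buffer, can be sketched in plain C as below. The names buffer_kind, conversion_buffer_kind, dynamic_buffer, and dynamic_buffer_resize are hypothetical. The text does not say which cache type handles a string whose length equals RKL_FIXED_LENGTH exactly; this sketch assumes the dynamic one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RKL_FIXED_LENGTH 2048u /* default from the text above */

enum buffer_kind { BUFFER_FIXED, BUFFER_DYNAMIC };

/* Strings shorter than RKL_FIXED_LENGTH use a fixed-size buffer; longer
 * ones use a dynamically sized buffer. The boundary case (length equal
 * to RKL_FIXED_LENGTH) is assumed to be dynamic in this sketch. */
enum buffer_kind conversion_buffer_kind(size_t length)
{
    return length < RKL_FIXED_LENGTH ? BUFFER_FIXED : BUFFER_DYNAMIC;
}

/* A dynamically sized buffer of UTF-16 code units. */
struct dynamic_buffer {
    uint16_t *units;
    size_t capacity; /* in UTF-16 code units */
};

/* Resize the buffer to hold `length` code units. Returns 1 on success.
 * On failure the old allocation is left untouched, so nothing leaks. */
int dynamic_buffer_resize(struct dynamic_buffer *buf, size_t length)
{
    uint16_t *p = realloc(buf->units, (length ? length : 1) * sizeof *p);
    if (p == NULL) {
        return 0;
    }
    buf->units = p;
    buf->capacity = length;
    return 1;
}
```

Reusing one pointer per buffer through realloc() keeps the bookkeeping to a fixed number of pointers, which is the simplicity argument the following paragraph makes.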

This strategy was chosen for its relative simplicity. Keeping track of dynamically created resources is required to prevent memory leaks. As designed, there are only four pointers to dynamically allocated memory: the four pointers to hold the conversion contents of strings whose length is larger than RKL_FIXED_LENGTH. However, since realloc() is used to manage those memory allocations, it becomes very difficult to accidentally leak the buffers. Having the fixed sized buffers means that the memory allocation system isn't bothered with many small requests, most of which are transient in nature to begin with. The current strategy tries to strike the best balance between performance and simplicity.

Mutable Strings

When converted in to a UTF-16 string, the hash of the NSString is recorded, along with the pointer to the NSString object and the string's length. In order for RegexKitLite to use the cached conversion, all of these parameters must be equal to their values of the NSString to be searched. If there is any difference, the cached conversion is discarded and the current NSString, or NSMutableString as the case may be, is reconverted in to a UTF-16 string.

Caution:

Care must be taken when mutable strings are searched and there exists the possibility that the string has mutated between searches. See NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings for more information.

Multithreading Safety

RegexKitLite is also multithreading safe. Access to the compiled regular expression cache and the conversion cache is protected by a single OSSpinLock to ensure that only one thread has access at a time. The lock remains held while the regular expression match is performed since the compiled regular expression returned by the ICU library is not safe to use from multiple threads. Once the match has completed, the lock is released, and another thread is free to lock the cache and perform a match.

Important:

While it is safe to use the same regular expression from any thread at any time, the usual multithreading caveats apply. For example, it is not safe to mutate a NSMutableString in one thread while performing a match on the mutating string in another.

If Blocks functionality is enabled, and a RegexKitLite method that takes a Block as one of its parameters is used, RegexKitLite takes a slightly different approach in order to support the asynchronous, and possibly re-entrant, nature of Blocks.

First, an autoreleased Block helper proxy object is created and is used to keep track of any Block local resources needed to perform a Block-based enumeration.

Then the regular expression cache is checked exactly as before. Once a compiled regular expression is obtained, the ICU function uregex_clone is used to create a Block local copy of the regular expression. After the Block local copy has been made, the global compiled regular expression cache lock is unlocked.

If the string to be searched requires conversion to UTF-16, then a one time use Block local UTF-16 conversion of the string is created.

These changes mean that RegexKitLite Block-based enumeration methods are just as multithreading safe and easy to use as non-Block-based enumeration methods, such as the ability to continue to use RegexKitLite methods without any restrictions from within the Block used for enumeration.

64-bit Support

RegexKitLite is 64-bit clean. Internally, RegexKitLite uses Cocoa's standard NSInteger and NSUInteger types for representing integer values. The size of these types changes between 32-bit and 64-bit automatically, depending on the target architecture. ICU, on the other hand, uses a signed 32-bit int type for many of its arguments, such as string offset values. Because of this, the maximum length of a string that RegexKitLite will accept is the maximum value that can be represented by a signed 32-bit integer, which is approximately 2 gigabytes. Strings that are longer than this limit will raise NSRangeException. This limitation may be significant to those who are switching to 64-bit because the size of the data they need to process exceeds what can be represented with 32-bits.
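
As a small illustration of that limit, the check below (plain C, hypothetical function name length_fits_icu) tests whether a length can be represented as a signed 32-bit integer. RegexKitLite itself raises NSRangeException in this situation; the sketch only shows the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `length` can be passed to an API that takes a signed
 * 32-bit integer (at most INT32_MAX, i.e. 2147483647), 0 otherwise. */
int length_fits_icu(size_t length)
{
    return length <= (size_t)INT32_MAX;
}
```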

Note:

Several numeric constants throughout this document will have either L or UL appended to them, for example 0UL or 2L. This is to ensure that they are treated as 64-bit long or unsigned long values, respectively, when targeting a 64-bit architecture.

Using RegexKitLite

The goal of RegexKitLite is not to be a comprehensive Objective-C regular expression framework, but to provide a set of easy to use primitives from which additional functionality can be created. To this end, RegexKitLite provides the following two core primitives from which everything else is built:

  • - (NSInteger)captureCountWithOptions:(RKLRegexOptions)options error:(NSError **)error;
  • - (NSRange)rangeOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range capture:(NSInteger)capture error:(NSError **)error;

There is often a need to create a new string of the characters that were matched by a regular expression. RegexKitLite provides the following method which conveniently combines sending the receiver substringWithRange: with the range returned by rangeOfRegex:.

  • - (NSString *)stringByMatching:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range capture:(NSInteger)capture error:(NSError **)error;

RegexKitLite 2.0 adds the ability to split strings by dividing them with a regular expression, and the ability to perform search and replace operations using common $n substitution syntax. replaceOccurrencesOfRegex:withString: is used to modify the contents of NSMutableString objects directly and stringByReplacingOccurrencesOfRegex:withString: will create a new, immutable NSString from the receiver.

  • - (NSArray *)componentsSeparatedByRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error;
  • - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options withString:(NSString *)replacement range:(NSRange)range error:(NSError **)error;
  • - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options withString:(NSString *)replacement range:(NSRange)range error:(NSError **)error;

RegexKitLite 3.0 adds several new methods that return a NSArray containing the aggregated results of a number of individual regex operations.

  • - (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  error:(NSError **)error;
  • - (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  error:(NSError **)error;
  • - (NSArray *)componentsMatchedByRegex:(NSString *)regex  range:(NSRange)range;

RegexKitLite 4.0 adds several new methods that take advantage of the new blocks language extension.

  • - (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
  • - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;

No additional classes supply the regular expression matching functionality; everything is built on the two core primitives listed at the start of this section. These methods are added to the existing NSString class via an Objective-C category extension. See RegexKitLite NSString Additions Reference for a complete list of methods.

The real workhorse is the rangeOfRegex:options:inRange:capture:error: method. The receiver of the message is an ordinary NSString class member that you wish to perform a regular expression match on. The parameters of the method are an NSString containing the regular expression regex, any RKLRegexOptions match options, the NSRange range of the receiver that is to be searched, the capture number from the regular expression regex that you would like the result for, and an optional error parameter that, if a problem occurs, will contain an NSError object with the details of the error.
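To make the roles of the capture count and the optional error parameter concrete, here is a minimal sketch. It assumes Foundation's NSRegularExpression (which is also built on ICU) as a stand-in, so it does not call RegexKitLite itself; the helper name SketchCaptureCount is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of RegexKitLite): returns the number of
// capture groups in `pattern`, or -1 if the pattern fails to compile.
// On failure, *outError (when non-NULL) receives the NSError from Foundation.
static NSInteger SketchCaptureCount(NSString *pattern, NSError **outError) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        if (outError != NULL) { *outError = error; }
        return -1;
    }
    return (NSInteger)[regex numberOfCaptureGroups];
}

int main(void) {
    @autoreleasepool {
        NSError *error = nil;

        // Three parenthesized groups, so the count is 3.
        NSInteger good = SketchCaptureCount(@"(\\w+)\\s+(\\w+)\\s+(\\w+)", &error);
        NSLog(@"captures: %ld", (long)good);

        // Unbalanced parenthesis: compilation fails and the error is filled in.
        NSInteger bad = SketchCaptureCount(@"(\\w+", &error);
        NSLog(@"captures: %ld error: %@", (long)bad, error);
    }
    return 0;
}
```

Passing NULL for the error argument is fine when the details are not needed, which mirrors how the doc describes the parameter as optional.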

Important:

The C language assigns special meaning to the \ character inside a quoted " " string in your source code. The \ character is the escape character, and the character that follows it takes on a different meaning than normal. The most common example is \n, which translates into the new-line character. Because of this, you must 'escape' every \ by prepending another \ to it. In practical terms, this means doubling every \ in a regular expression (and backslashes are quite common in regular expressions) whenever the regular expression appears inside a quoted " " string in your source code. Failure to do so will result in numerous compiler warnings about unknown escape sequences. To match a single literal \ with a regular expression requires no fewer than four backslashes: "\\\\".
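The two layers of escaping can be checked with a small sketch. It assumes NSString's rangeOfString:options: with NSRegularExpressionSearch from Foundation rather than RegexKitLite, and the helper name SketchFindBackslash is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: finds the first literal backslash in `subject`.
// The source text of the pattern has four backslashes; the compiler
// turns that into a two-character string, and the regex engine reads
// those two characters as "one literal backslash".
static NSRange SketchFindBackslash(NSString *subject) {
    NSString *pattern = @"\\\\";
    return [subject rangeOfString:pattern options:NSRegularExpressionSearch];
}

int main(void) {
    @autoreleasepool {
        NSString *pattern = @"\\\\";
        NSString *subject = @"C:\\temp";   // run-time text is C:, one backslash, temp

        NSLog(@"pattern length at run time: %lu", (unsigned long)[pattern length]);
        NSLog(@"backslash found at: %@", NSStringFromRange(SketchFindBackslash(subject)));
    }
    return 0;
}
```

The pattern has a run-time length of 2, and the match lands on index 2 of the subject with length 1.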

See Also
  • ICU Regular Expression Syntax
  • RegexKitLite Cookbook
  • RegexKitLite NSString Additions Reference
  • Regular Expression Options
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Blocks Programming Topics

Finding the Range of a Match

A simple example:

NSString *searchString = @"This is neat.";
NSString *regexString = @"(\\w+)\\s+(\\w+)\\s+(\\w+)";
NSRange matchedRange = NSMakeRange(NSNotFound, 0UL);
NSRange searchRange = NSMakeRange(0UL, [searchString length]);
NSError *error = NULL;

matchedRange = [searchString rangeOfRegex:regexString options:RKLNoOptions inRange:searchRange capture:2L error:&error];
NSLog(@"matchedRange: %@", NSStringFromRange(matchedRange));
// 2008-03-18 03:51:16.530 test[51583:813] matchedRange: {5, 2}

In the previous example, the NSRange that capture number 2 matched is {5, 2}, which corresponds to the word is in searchString. Once the NSRange is known, you can create a new string containing just the matching text:

NSString *matchedString = [searchString substringWithRange:matchedRange];
NSLog(@"matchedString: '%@'", matchedString);
// 2008-03-18 03:51:16.532 test[51583:813] matchedString: 'is'

RegexKitLite can conveniently combine the two steps above with stringByMatching:. This example also demonstrates the use of one of the simpler convenience methods, where some of the arguments are automatically filled in with default values:

NSString *searchString = @"This is neat.";
NSString *regexString = @"(\\w+)\\s+(\\w+)\\s+(\\w+)";
NSString *matchedString = [searchString stringByMatching:regexString capture:2L];
NSLog(@"matchedString: '%@'", matchedString);
// 2008-03-18 03:53:42.949 test[51583:813] matchedString: 'is'

See Also
  • - rangeOfRegex:
  • - stringByMatching:
  • ICU Regular Expression Syntax
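
For projects that cannot add RegexKitLite, the same "range of a capture, then substring" flow can be sketched with Foundation's NSRegularExpression. This is an assumption-laden equivalent, not the RegexKitLite implementation, and SketchStringByMatching is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text matched by capture group
// `capture` in the first match of `pattern`, or nil when there is no
// match, the group index is out of bounds, or the group did not take part.
static NSString *SketchStringByMatching(NSString *subject, NSString *pattern, NSUInteger capture) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }

    NSTextCheckingResult *match =
        [regex firstMatchInString:subject
                          options:0
                            range:NSMakeRange(0, [subject length])];
    if (match == nil || capture >= [match numberOfRanges]) { return nil; }

    // Step 1: the range of the requested capture group.
    NSRange range = [match rangeAtIndex:capture];
    if (range.location == NSNotFound) { return nil; }

    // Step 2: the substring for that range.
    return [subject substringWithRange:range];
}

int main(void) {
    @autoreleasepool {
        NSString *matched = SketchStringByMatching(@"This is neat.",
                                                   @"(\\w+)\\s+(\\w+)\\s+(\\w+)", 2);
        NSLog(@"matchedString: '%@'", matched);
    }
    return 0;
}
```

With the same search string and pattern as above, capture 2 yields the word is.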

Search and Replace

You can perform search and replace operations on NSString objects and use common $n capture group substitution in the replacement string:

NSString *searchString = @"This is neat.";
NSString *regexString = @"\\b(\\w+)\\b";
NSString *replaceWithString = @"{$1}";
NSString *replacedString = NULL;

replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog(@"replaced string: '%@'", replacedString);
// 2008-07-01 19:03:03.195 test[68775:813] replaced string: '{This} {is} {neat}.'

Important:
Search and replace methods will raise an RKLICURegexException if the replacementString contains $n capture references where n is greater than the number of capture groups in the regular expression.

In this example, the regular expression \b(\w+)\b has a single capture group, which is created with the use of () parentheses. The text that was matched inside the parentheses is available for use in the replacement text by using $n, where n is the parenthesized capture group you would like to use. Additional capture groups are numbered sequentially in the order that they appear from left to right. Capture group 0 (zero) is also available and is equivalent to all the text that the regular expression matched.
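
The $n numbering described above can be tried out with a short sketch. It assumes Foundation's NSRegularExpression and its template syntax as a stand-in for RegexKitLite, and SketchReplace is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every match of `pattern` in `subject`
// using a template that may contain $0, $1, $2, ... references.
static NSString *SketchReplace(NSString *subject, NSString *pattern, NSString *templ) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }
    return [regex stringByReplacingMatchesInString:subject
                                           options:0
                                             range:NSMakeRange(0, [subject length])
                                      withTemplate:templ];
}

int main(void) {
    @autoreleasepool {
        NSString *subject = @"This is neat.";

        // $1 is the first (and only) parenthesized group.
        NSLog(@"%@", SketchReplace(subject, @"\\b(\\w+)\\b", @"{$1}"));

        // Groups are numbered left to right, so $2 $1 swaps the two words.
        NSLog(@"%@", SketchReplace(subject, @"(\\w+) (\\w+)", @"$2 $1"));

        // $0 is everything the regular expression matched.
        NSLog(@"%@", SketchReplace(subject, @"\\w+", @"<$0>"));
    }
    return 0;
}
```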

Mutable strings can be manipulated directly:

NSMutableString *mutableString = [NSMutableString stringWithString:@"This is neat."];
NSString *regexString = @"\\b(\\w+)\\b";
NSString *replaceWithString = @"{$1}";
NSUInteger replacedCount = 0UL;

replacedCount = [mutableString replaceOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog(@"count: %lu string: '%@'", (u_long)replacedCount, mutableString);
// 2008-07-01 21:25:43.433 test[69689:813] count: 3 string: '{This} {is} {neat}.'

Search and Replace using Blocks

RegexKitLite 4.0 adds support for performing the same search and replace operations on strings, except that the contents of the replacement string are created by the Block that is passed as the argument. For each match that is found in the string, the Block is called and passed the details of the match: a C array of NSString objects, one for each capture, along with a C array of NSRange structures with the range information for the current match. The text that was matched is replaced with the NSString object that the Block is required to return.

This gives you complete control over the contents of the replaced text, such as doing complex transformations of the matched text, which is much more flexible and powerful than the simple, fixed replacement functionality provided by stringByReplacingOccurrencesOfRegex:withString:. The example below is essentially the same as the previous search and replace examples, except that it uses the capitalizedString method to capitalize the matched result, which is then used in the string that is returned as the replacement text. Note that the first letter of each word in replacedString is now capitalized.

NSString *searchString = @"This is neat.";
NSString *regexString = @"\\b(\\w+)\\b";
NSString *replacedString = NULL;

replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString usingBlock:
  ^NSString *(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
    return([NSString stringWithFormat:@"{%@}", [capturedStrings[1] capitalizedString]]);
  }];
NSLog(@"replaced string: '%@'", replacedString);
// 2010-04-14 21:00:42.726 test[35053:a0f] replaced string: '{This} {Is} {Neat}.'

See Also
  • - replaceOccurrencesOfRegex:usingBlock:
  • - replaceOccurrencesOfRegex:withString:
  • - stringByReplacingOccurrencesOfRegex:usingBlock:
  • - stringByReplacingOccurrencesOfRegex:withString:
  • ICU Regular Expression Syntax
  • ICU Replacement Text Syntax
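
A computed replacement can also be sketched without RegexKitLite. The version below assumes Foundation's NSRegularExpression, collects the matches first, and rewrites the string from the last match to the first so that earlier ranges stay valid. SketchReplaceUsingBlock and its single-argument block are hypothetical simplifications of the RegexKitLite block signature.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces each match of `pattern` with whatever
// `transform` returns for the matched text (capture group 0).
static NSString *SketchReplaceUsingBlock(NSString *subject,
                                         NSString *pattern,
                                         NSString *(^transform)(NSString *matched)) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }

    NSArray *matches = [regex matchesInString:subject
                                      options:0
                                        range:NSMakeRange(0, [subject length])];
    NSMutableString *result = [NSMutableString stringWithString:subject];

    // Walk backwards: replacing a later range never shifts an earlier one.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *matched = [subject substringWithRange:[match range]];
        [result replaceCharactersInRange:[match range] withString:transform(matched)];
    }
    return result;
}

int main(void) {
    @autoreleasepool {
        NSString *replaced = SketchReplaceUsingBlock(@"This is neat.", @"\\b(\\w+)\\b",
            ^NSString *(NSString *matched) {
                return [NSString stringWithFormat:@"{%@}", [matched capitalizedString]];
            });
        NSLog(@"replaced string: '%@'", replaced);
    }
    return 0;
}
```

Collecting all matches up front costs some memory on very large inputs; that trade-off keeps the sketch short.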

Splitting Strings

Strings can be split with a regular expression using the componentsSeparatedByRegex: methods. This functionality is nearly identical to the preexisting NSString method componentsSeparatedByString:, except instead of only being able to use a fixed string as a separator, you can use a regular expression:

NSString *searchString = @"This is neat.";
NSString *regexString = @"\\s+";
NSArray *splitArray = NULL;

splitArray = [searchString componentsSeparatedByRegex:regexString];
// splitArray == { @"This", @"is", @"neat." }

NSLog(@"splitArray: %@", splitArray);

The output from NSLog() when run from a shell:

shell% ./splitArray
2008-07-01 20:58:39.025 splitArray[69618:813] splitArray: (
    This,
    is,
    "neat."
)
shell%

Unfortunately, our example string @"This is neat." doesn't allow us to show off the power of regular expressions. As you can probably imagine, splitting the string with the regular expression \s+ allows for one or more white space characters to be matched. This can be much more flexible than a fixed string of @" ", which will split on a single space only. If our example string contained extra spaces, say @"This   is     neat.", the result would have been the same.

See Also
  • - componentsSeparatedByRegex:
  • ICU Regular Expression Syntax
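
The difference between a fixed separator and a \s+ separator shows up once the input has runs of spaces. The sketch below assumes Foundation's NSRegularExpression and builds the pieces from the gaps between matches; SketchSplit is a hypothetical helper, not the RegexKitLite method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `subject` wherever `pattern` matches.
static NSArray *SketchSplit(NSString *subject, NSString *pattern) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }

    NSMutableArray *pieces = [NSMutableArray array];
    NSUInteger start = 0;
    NSArray *separators = [regex matchesInString:subject
                                         options:0
                                           range:NSMakeRange(0, [subject length])];
    for (NSTextCheckingResult *separator in separators) {
        NSRange sep = [separator range];
        // Text between the previous separator and this one.
        [pieces addObject:[subject substringWithRange:NSMakeRange(start, sep.location - start)]];
        start = NSMaxRange(sep);
    }
    // Whatever follows the last separator.
    [pieces addObject:[subject substringFromIndex:start]];
    return pieces;
}

int main(void) {
    @autoreleasepool {
        NSString *spaced = @"This   is     neat.";

        // Regex split: runs of white space count as one separator.
        NSLog(@"regex split: %@", SketchSplit(spaced, @"\\s+"));

        // Fixed-string split: every single space separates, so empty
        // strings appear between consecutive spaces.
        NSLog(@"fixed split: %@", [spaced componentsSeparatedByString:@" "]);
    }
    return 0;
}
```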

Creating an Array of Every Match

RegexKitLite 3.0 adds several methods that conveniently perform a number of individual RegexKitLite operations and aggregate the results into an NSArray. Since the result is an NSArray, the standard Cocoa collection enumeration patterns can be used, such as NSEnumerator and Objective-C 2.0's for…in feature. One of the most common tasks is to extract all of the matches of a regular expression from a string. componentsMatchedByRegex: returns the entire text matched by a regular expression even if the regular expression contains additional capture groups, effectively capture group 0. For example:

NSString *searchString = @"$10.23, $1024.42, $3099";
NSString *regexString = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray *matchArray = NULL;

matchArray = [searchString componentsMatchedByRegex:regexString];
// matchArray == { @"$10.23", @"$1024.42", @"$3099" };

NSLog(@"matchArray: %@", matchArray);

The output from NSLog() when run from a shell:

shell% ./matchArray
2009-05-06 03:20:03.546 matchArray[69939:10b] matchArray: (
    "$10.23",
    "$1024.42",
    "$3099"
)
shell%

As the example above demonstrates, componentsMatchedByRegex: returns the entire text that the regular expression matched even though the regular expression contains capture groups. arrayOfCaptureComponentsMatchedByRegex: can be used if you need to get the text that the individual capture groups matched as well:

NSString *searchString = @"$10.23, $1024.42, $3099";
NSString *regexString = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray *capturesArray = NULL;

capturesArray = [searchString arrayOfCaptureComponentsMatchedByRegex:regexString];
/* capturesArray ==
[NSArray arrayWithObjects:
  [NSArray arrayWithObjects: @"$10.23", @"10.23", @"10", @"23", NULL],
  [NSArray arrayWithObjects: @"$1024.42", @"1024.42", @"1024", @"42", NULL],
  [NSArray arrayWithObjects: @"$3099", @"3099", @"3099", @"", NULL],
  NULL];
*/

NSLog(@"capturesArray: %@", capturesArray);

The output from NSLog() when run from a shell:

shell% ./capturesArray
2009-05-06 03:25:46.852 capturesArray[69981:10b] capturesArray: (
    (
        "$10.23",
        "10.23",
        10,
        23
    ),
    (
        "$1024.42",
        "1024.42",
        1024,
        42
    ),
    (
        "$3099",
        3099,
        3099,
        ""
    )
)
shell%

See Also
  • - arrayOfCaptureComponentsMatchedByRegex:
  • - captureComponentsMatchedByRegex:
  • - componentsMatchedByRegex:
  • Using RegexKitLite - Enumerating Matches
  • Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements
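
The "array of capture arrays" shape can be reproduced with a short sketch. It assumes Foundation's NSRegularExpression in place of RegexKitLite and, as an assumption made to mirror the example above, stores an empty string for a capture group that did not take part in a match. SketchCaptureComponents is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one inner array per match, holding the text of
// capture groups 0..n for that match.
static NSArray *SketchCaptureComponents(NSString *subject, NSString *pattern) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }

    NSMutableArray *all = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:subject
                                      options:0
                                        range:NSMakeRange(0, [subject length])];
    for (NSTextCheckingResult *match in matches) {
        NSMutableArray *captures = [NSMutableArray array];
        for (NSUInteger i = 0; i < [match numberOfRanges]; i++) {
            NSRange r = [match rangeAtIndex:i];
            // A group that did not participate reports NSNotFound;
            // this sketch records it as an empty string.
            NSString *text = (r.location == NSNotFound) ? @"" : [subject substringWithRange:r];
            [captures addObject:text];
        }
        [all addObject:captures];
    }
    return all;
}

int main(void) {
    @autoreleasepool {
        NSArray *captures = SketchCaptureComponents(@"$10.23, $1024.42, $3099",
                                                    @"\\$((\\d+)(?:\\.(\\d+)|\\.?))");
        for (NSArray *oneMatch in captures) {
            NSLog(@"whole: %@ dollars: %@ cents: '%@'",
                  [oneMatch objectAtIndex:0],
                  [oneMatch objectAtIndex:2],
                  [oneMatch objectAtIndex:3]);
        }
    }
    return 0;
}
```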

Enumerating Matches

The RegexKitLite componentsMatchedByRegex: method enables you to quickly create a NSArray containing all the matches of a regular expression in a string. To enumerate the contents of the NSArray, you can send the array an objectEnumerator message.

See Also
  • - componentsMatchedByRegex:
  • NSArray Class Reference
  • NSEnumerator Class Reference
  • Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements

An example using componentsMatchedByRegex: and a NSEnumerator:

File name:main.m
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString *searchString = @"one\ntwo\n\nfour\n";
  NSArray *matchArray = NULL;
  NSEnumerator *matchEnumerator = NULL;
  NSString *regexString = @"(?m)^.*$";

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  matchArray = [searchString componentsMatchedByRegex:regexString];
  matchEnumerator = [matchArray objectEnumerator];

  NSUInteger line = 0UL;
  NSString *matchedString = NULL;

  while((matchedString = [matchEnumerator nextObject]) != NULL) {
    NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[matchedString length], matchedString);
  }

  [pool release];
  return(0);
}

The following shell transcript demonstrates compiling the example and executing it. Line number three clearly demonstrates that matches of zero length are possible. Without the additional logic in nextObject to handle this special case, the enumerator would never advance past the match.

Note:

In the shell transcript below, the NSLog() line that prints searchString has been annotated with the '⏎' character to help visually identify the corresponding \n new-line characters in searchString.

shell% cd examples
shell% gcc -I.. -g -o main main.m ../RegexKitLite.m -framework Foundation -licucore
shell% ./main
2008-03-21 15:56:17.469 main[44050:807] searchString: 'one⏎
two⏎
⏎
four⏎
'
2008-03-21 15:56:17.520 main[44050:807] regexString : '(?m)^.*$'
2008-03-21 15:56:17.575 main[44050:807] 1: 3 'one'
2008-03-21 15:56:17.580 main[44050:807] 2: 3 'two'
2008-03-21 15:56:17.584 main[44050:807] 3: 0 ''
2008-03-21 15:56:17.590 main[44050:807] 4: 4 'four'
shell%

Enumerating Matches with Objective-C 2.0

You can enumerate all the matches of a regular expression in a string using Objective-C 2.0's for…in feature. Compared to using a NSEnumerator, using for…in not only takes fewer lines of code to accomplish the same thing, it is usually faster as well.

See Also
  • The Objective-C 2.0 Programming Language - Fast Enumeration
  • Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements

An example using the Objective-C 2.0 for…in feature:

File name:for_in.m
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString *searchString = @"one\ntwo\n\nfour\n";
  NSString *regexString = @"(?m)^.*$";
  NSUInteger line = 0UL;

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  for(NSString *matchedString in [searchString componentsMatchedByRegex:regexString]) {
    NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[matchedString length], matchedString);
  }

  [pool release];
  return(0);
}

Note:

The output of the preceding example is identical to the NSEnumerator shell output.

Enumerating Matches using Blocks

A third way to enumerate all the matches of a regular expression in a string is to use one of the Blocks-based enumeration methods.
See Also
  • - enumerateStringsMatchedByRegex:usingBlock:
  • Regular Expression Enumeration Options
  • RegexKitLiteNSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
An example using enumerateStringsMatchedByRegex:usingBlock::

#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString *searchString = @"one\ntwo\n\nfour\n";
  NSString *regexString = @"(?m)^.*$";
  __block NSUInteger line = 0UL;

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  [searchString enumerateStringsMatchedByRegex:regexString usingBlock:
    ^(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
      NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[capturedStrings[0] length], capturedStrings[0]);
    }];

  [pool release];
  return(0);
}

Note:

The output of the preceding example is identical to the NSEnumerator shell output.
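
The stop argument in the block signatures above lets an enumeration end early. A minimal sketch of that idea follows, assuming Foundation's enumerateMatchesInString:options:range:usingBlock: rather than the RegexKitLite method; SketchFirstLines is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects at most `limit` lines of `subject`,
// stopping the enumeration as soon as enough lines have been seen.
static NSArray *SketchFirstLines(NSString *subject, NSUInteger limit) {
    NSMutableArray *lines = [NSMutableArray array];
    if (limit == 0) { return lines; }

    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?m)^.*$" options:0 error:NULL];
    if (regex == nil) { return nil; }

    [regex enumerateMatchesInString:subject
                            options:0
                              range:NSMakeRange(0, [subject length])
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        [lines addObject:[subject substringWithRange:[match range]]];
        if ([lines count] >= limit) {
            *stop = YES;   // no further matches are delivered after this call
        }
    }];
    return lines;
}

int main(void) {
    @autoreleasepool {
        NSArray *lines = SketchFirstLines(@"one\ntwo\n\nfour\n", 2);
        NSLog(@"first lines: %@", lines);
    }
    return 0;
}
```

Stopping early avoids scanning the rest of a long string when only the first few matches are needed.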

DTrace

Important:

DTrace support is not enabled by default. To enable DTrace support, use the RKL_DTRACE pre-processor flag: -DRKL_DTRACE

RegexKitLite has two DTrace probe points that provide information about its internal caches:

  • RegexKitLite:::compiledRegexCache( unsigned long eventID,  const char *regexUTF8,  int options,  int captures,  int hitMiss,  int icuStatusCode,  const char *icuErrorMessage,  double *hitRate);
  • RegexKitLite:::utf16ConversionCache( unsigned long eventID,  unsigned int lookupResultFlags,  double *hitRate,  const void *string,  unsigned long NSRange_location, unsigned long NSRange_length,  long length);

Each of the probe points supplies information via a number of arguments that are accessible through the DTrace variables arg0 … argn.

The first argument, eventID via arg0, is a unique event ID that is incremented each time the RegexKitLite mutex lock is acquired. All the probes that fire while the mutex is held will share the same event ID. This can help if you are trying to correlate multiple events across different CPUs.

Important:

Most uses of the dtrace command require superuser privileges. The examples given here use sudo to execute dtrace as the root user.

The following is available in examples/compiledRegexCache.d and demonstrates the use of all the arguments available via the RegexKitLite:::compiledRegexCache probe point:

File name:compiledRegexCache.d
#!/usr/sbin/dtrace -s

RegexKitLite*:::compiledRegexCache {
  this->eventID = (unsigned long)arg0;
  this->regexUTF8 = copyinstr(arg1);
  this->options = (unsigned int)arg2;
  this->captures = (int)arg3;
  this->hitMiss = (int)arg4;
  this->icuStatusCode = (int)arg5;
  this->icuErrorMessage = (arg6 == 0) ? "" : copyinstr(arg6);
  this->hitRate = (double *)copyin(arg7, sizeof(double));

  printf("%5d: %-60.60s Opt: %#8.8x Cap: %2d Hit: %2d Rate: %6.2f%% code: %5d msg: %s\n",
         this->eventID,
         this->regexUTF8,
         this->options,
         this->captures,
         this->hitMiss,
         *this->hitRate,
         this->icuStatusCode,
         this->icuErrorMessage);
}

Below is an example of the output, which has been trimmed for brevity, from compiledRegexCache.d:

shell% sudo dtrace -Z -q -s compiledRegexCache.d
110: (\[{2})(.+?)(]{2}) Opt: 0x00000000 Cap: 3 Hit: 0 Rate: 63.64% code: 0 msg:
111: (\[{2})(.+?)(]{2}) Opt: 0x00000000 Cap: 3 Hit: 1 Rate: 63.96% code: 0 msg:
131: (\w+ Opt: 0x00000000 Cap: -1 Hit: -1 Rate: 63.36% code: 66310 msg: U_REGEX_MISMATCHED_PAREN
164: \b\s* Opt: 0x00000000 Cap: 0 Hit: 0 Rate: 60.98% code: 0 msg:
165: \$((\d+)(?:\.(\d+)|\.?)) Opt: 0x00000000 Cap: 3 Hit: 1 Rate: 61.21% code: 0 msg:
166: \b(https?)://([a-zA-Z0-9\-.]+)((?:/[a-zA-Z0-9\-._?,'+\&%$… Opt: 0x00000000 Cap: 3 Hit: 0 Rate: 60.84% code: 0 msg:
shell%

An example that prints the number of times that a compiled regular expression was not in the cache per second:

shell% sudo dtrace -Z -q -n 'RegexKitLite*:::compiledRegexCache /arg4 == 0/ { @miss[pid, execname] = count(); }' -n 'tick-1sec { printa("%-8d %-40s %@d/sec\n", @miss); trunc(@miss); }'
67003 RegexKitLite_tests 16/sec
67008 RegexKitLite_tests 50/sec
^C
shell%

See Also
  • RegexKitLite:::compiledRegexCache
  • Solaris Dynamic Tracing Guide (as .PDF)

The following is available in examples/utf16ConversionCache.d and demonstrates the use of all the arguments available via the RegexKitLite:::utf16ConversionCache probe point.

File name:utf16ConversionCache.d
#!/usr/sbin/dtrace -s

enum {
  RKLCacheHitLookupFlag = 1 << 0,
  RKLConversionRequiredLookupFlag = 1 << 1,
  RKLSetTextLookupFlag = 1 << 2,
  RKLDynamicBufferLookupFlag = 1 << 3,
  RKLErrorLookupFlag = 1 << 4
};

RegexKitLite*:::utf16ConversionCache {
  this->eventID = (unsigned long)arg0;
  this->lookupResultFlags = (unsigned int)arg1;
  this->hitRate = (double *)copyin(arg2, sizeof(double));
  this->stringPtr = (void *)arg3;
  this->NSRange_location = (unsigned long)arg4;
  this->NSRange_length = (unsigned long)arg5;
  this->length = (long)arg6;

  printf("%5lu: flags: %#8.8x {Hit: %d Conv: %d SetText: %d Dyn: %d Error: %d} rate: %6.2f%% string: %#8.8p NSRange {%6lu, %6lu} length: %ld\n",
         this->eventID,
         this->lookupResultFlags,
         (this->lookupResultFlags & RKLCacheHitLookupFlag) != 0,
         (this->lookupResultFlags & RKLConversionRequiredLookupFlag) != 0,
         (this->lookupResultFlags & RKLSetTextLookupFlag) != 0,
         (this->lookupResultFlags & RKLDynamicBufferLookupFlag) != 0,
         (this->lookupResultFlags & RKLErrorLookupFlag) != 0,
         *this->hitRate,
         this->stringPtr,
         this->NSRange_location,
         this->NSRange_length,
         this->length);
}

Below is an example of the output, which has been trimmed for brevity, from utf16ConversionCache.d:

shell% sudo dtrace -Z -q -s utf16ConversionCache.d
85: flags: 0x00000000 {Hit: 0 Conv: 0 SetText: 0 Dyn: 0 Error: 0} rate: 59.18% string: 0x0010f530 NSRange { 0, 18} length: 18
86: flags: 0x00000004 {Hit: 0 Conv: 0 SetText: 1 Dyn: 0 Error: 0} rate: 59.18% string: 0x0010f530 NSRange { 0, 18} length: 18
87: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 58.00% string: 0x00054930 NSRange { 1, 37} length: 39
88: flags: 0x00000003 {Hit: 1 Conv: 1 SetText: 0 Dyn: 0 Error: 0} rate: 58.82% string: 0x00054930 NSRange { 1, 37} length: 39
109: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 53.62% string: 0x00054d00 NSRange { 0, 56} length: 56
110: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 52.86% string: 0x00054680 NSRange { 0, 1064} length: 1064
111: flags: 0x00000007 {Hit: 1 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 53.52% string: 0x00054680 NSRange { 46, 978} length: 1064
shell%

An example that prints the number of times that a string required a conversion to UTF-16 and was not in the cache per second:

shell% sudo dtrace -Z -q -n 'RegexKitLite*:::utf16ConversionCache /(arg1 & 0x3) == 0x2/ { @miss[pid, execname] = count(); }' -n 'tick-1sec { printa("%-8d %-40s %@d/sec\n", @miss); trunc(@miss); }'
67020 RegexKitLite_tests 73/sec
67037 RegexKitLite_tests 64/sec
^C
shell%

See Also
  • RegexKitLite:::utf16ConversionCache
  • RegexKitLite:::utf16ConversionCache arg1 Flags
  • Solaris Dynamic Tracing Guide (as .PDF)

ICU Syntax

In this section:

  • ICU Regular Expression Syntax
  • ICU Regular Expression Character Classes
  • Unicode Properties
  • ICU Replacement Text Syntax

ICU Regular Expression Syntax

For your convenience, the regular expression syntax from the ICU documentation is included below. When in doubt, you should refer to the official ICU User Guide - Regular Expressions documentation page.

See Also
  • ICU User Guide - Regular Expressions
  • Unicode Technical Standard #18 - Unicode Regular Expressions
Metacharacters
Character Description
\a Match a BELL, \u0007.
\A Match at the beginning of the input. Differs from ^ in that \A will not match after a new-line within the input.
\b, outside of a [Set] Match if the current position is a word boundary. Boundaries occur at the transitions between word \w and non-word \W characters, with combining marks ignored.
See also: RKLUnicodeWordBoundaries
\b, within a [Set] Match a BACKSPACE, \u0008.
\B Match if the current position is not a word boundary.
\cx Match a Control-x character.
\d Match any character with the Unicode General Category of Nd (Number, Decimal Digit).
\D Match any character that is not a decimal digit.
\e Match an ESCAPE, \u001B.
\E Terminates a \Q\E quoted sequence.
\f Match a FORM FEED, \u000C.
\G Match if the current position is at the end of the previous match.
\n Match a LINE FEED, \u000A.
\N{Unicode Character Name} Match the named Unicode Character.
\p{Unicode Property Name} Match any character with the specified Unicode Property.
\P{Unicode Property Name} Match any character not having the specified Unicode Property.
\Q Quotes all following characters until \E.
\r Match a CARRIAGE RETURN, \u000D.
\s Match a white space character. White space is defined as [\t\n\f\r\p{Z}].
\S Match a non-white space character.
\t Match a HORIZONTAL TABULATION, \u0009.
\uhhhh Match the character with the hex value hhhh.
\Uhhhhhhhh Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\w Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
\W Match a non-word character.
\x{hhhh} Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh Match the character with two digit hex value hh.
\X Match a Grapheme Cluster.
\Z Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z Match if the current position is at the end of input.
\n (where n is a number) Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ the total number of capture groups in the pattern.
Note: Octal escapes, such as \012, are not supported.
[pattern] Match any one character from the set. See ICU Regular Expression Character Classes for a full description of what may appear in the pattern.
. Match any character.
^ Match at the beginning of a line.
$ Match at the end of a line.
\ Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /
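
A few of the metacharacters above are easier to tell apart by counting matches. The sketch below assumes Foundation's NSRegularExpression, which is documented as using ICU syntax, as the engine; SketchCount is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: number of matches of `pattern` in `subject`,
// or NSNotFound if the pattern does not compile.
static NSUInteger SketchCount(NSString *pattern, NSString *subject) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return NSNotFound; }
    return [regex numberOfMatchesInString:subject
                                  options:0
                                    range:NSMakeRange(0, [subject length])];
}

int main(void) {
    @autoreleasepool {
        NSString *twoLines = @"cat\ncat";

        // With (?m), ^ matches at the start of each line ...
        NSLog(@"^  : %lu", (unsigned long)SketchCount(@"(?m)^cat", twoLines));
        // ... while \A still matches only at the start of the input.
        NSLog(@"\\A : %lu", (unsigned long)SketchCount(@"(?m)\\Acat", twoLines));

        // \b requires a word boundary, so the "cat" inside "concat" is skipped.
        NSLog(@"\\b : %lu", (unsigned long)SketchCount(@"\\bcat\\b", @"cat concat cat."));

        // \Q ... \E quotes the + so it is matched literally.
        NSLog(@"quoted  : %lu", (unsigned long)SketchCount(@"\\Q1+1\\E", @"1+1"));
        // Unquoted, 1+1 means "one or more 1, then 1", which is absent here.
        NSLog(@"unquoted: %lu", (unsigned long)SketchCount(@"1+1", @"1+1"));
    }
    return 0;
}
```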
Operators
Operator Description
| Alternation. A|B matches either A or B.
* Match zero or more times. Match as many times as possible.
+ Match one or more times. Match as many times as possible.
? Match zero or one times. Prefer one.
{n} Match exactly n times.
{n,} Match at least n times. Match as many times as possible.
{n,m} Match between n and m times. Match as many times as possible, but not more than m.
*? Match zero or more times. Match as few times as possible.
+? Match one or more times. Match as few times as possible.
?? Match zero or one times. Prefer zero.
{n}? Match exactly n times.
{n,}? Match at least n times, but no more than required for an overall pattern match.
{n,m}? Match between n and m times. Match as few times as possible, but not less than n.
*+ Match zero or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails. Possessive match.
++ Match one or more times. Possessive match.
?+ Match zero or one times. Possessive match.
{n}+ Match exactly n times. Possessive match.
{n,}+ Match at least n times. Possessive match.
{n,m}+ Match between n and m times. Possessive match.
() Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:) Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
(?>) Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the (?> .
(?#) Free-format comment (?#comment).
(?=) Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?!) Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<=) Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?<!) Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?ismwx-ismwx:) Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled.
(?ismwx-ismwx) Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.
See also: Regular Expression Options
See Also
  • ICU User Guide - Regular Expressions
  • Regular Expression Options
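
The operator table is dense, so here is a small sketch contrasting greedy, reluctant, and possessive quantifiers, plus look-around and an inline flag. It assumes Foundation's NSRegularExpression as the ICU-based engine; SketchFirstMatch is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: text of the first match of `pattern` in
// `subject`, or nil when there is no match.
static NSString *SketchFirstMatch(NSString *pattern, NSString *subject) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return nil; }
    NSRange r = [regex rangeOfFirstMatchInString:subject
                                         options:0
                                           range:NSMakeRange(0, [subject length])];
    return (r.location == NSNotFound) ? nil : [subject substringWithRange:r];
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<b>bold</b>";

        // Greedy: as much as possible, so the match runs to the last >.
        NSLog(@"greedy    : %@", SketchFirstMatch(@"<.+>", html));
        // Reluctant: as little as possible, so the match ends at the first >.
        NSLog(@"reluctant : %@", SketchFirstMatch(@"<.+?>", html));
        // Possessive: .*+ swallows the closing quote and never gives it
        // back, so the overall match fails and nil is returned.
        NSLog(@"possessive: %@", SketchFirstMatch(@"\".*+\"", @"\"abc\""));

        // Look-ahead: a word that is followed by a period (period not consumed).
        NSLog(@"look-ahead : %@", SketchFirstMatch(@"\\w+(?=\\.)", @"This is neat."));
        // Look-behind: digits that are preceded by a dollar sign.
        NSLog(@"look-behind: %@", SketchFirstMatch(@"(?<=\\$)\\d+", @"$10.23"));
        // Negative look-behind: a number that does not follow a dollar sign.
        NSLog(@"negative   : %@", SketchFirstMatch(@"(?<!\\$)\\b\\d+", @"$10 and 20"));

        // Inline flag: (?i) switches to case insensitive matching.
        NSLog(@"flag       : %@", SketchFirstMatch(@"(?i)NEAT", @"This is neat."));
    }
    return 0;
}
```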

ICU Regular Expression Character Classes

The following was originally from ICU User Guide - UnicodeSet, but has been adapted to fit the needs of this documentation. Specifically, the ICU UnicodeSet documentation describes an ICU C++ object, UnicodeSet. The term UnicodeSet was effectively replaced with Character Class, which is more appropriate in the context of regular expressions. As always, you should refer to the original, official documentation when in doubt.

See Also
  • ICU User Guide - UnicodeSet
  • UTS #18 Unicode Regular Expressions - Subtraction and Intersection
  • UTS #18 Unicode Regular Expressions - Properties

Overview

A character class is a regular expression pattern that represents a set of Unicode characters or character strings. The following table contains some example character class patterns:

Pattern Description
[a-z] The lower case letters a through z
[abc123] The six characters a, b, c, 1, 2, and 3
[\p{Letter}] All characters with the Unicode General Category of Letter.

String Values

In addition to being a set of Unicode code point characters, a character class may also contain string values. Conceptually, a character class is always a set of strings, not a set of characters. Historically, regular expressions have treated [] character classes as being composed of single characters only, which is equivalent to a string that contains only a single character.

Character Class Patterns

Patterns are a series of characters bounded by square brackets that contain lists of characters and Unicode property sets. Lists are a sequence of characters that may have ranges indicated by a - between two characters, as in a-z. The sequence specifies the range of all characters from the left to the right, in Unicode order. For example, [a c d-f m] is equivalent to [a c d e f m]. Whitespace can be freely used for clarity as [a c d-f m] means the same as [acd-fm].

Unicode property sets are specified by a Unicode property, such as [:Letter:]. ICU version 2.0 supports General Category, Script, and Numeric Value properties (ICU will support additional properties in the future). For a list of the property names, see the end of this section. The syntax for specifying the property names is an extension of either POSIX or Perl syntax with the addition of =value. For example, you can match letters by using the POSIX syntax [:Letter:], or by using the Perl syntax \p{Letter}. The type can be omitted for the Category and Script properties, but is required for other properties.

The following table lists the standard and negated forms for specifying Unicode properties in both POSIX and Perl syntax. The negated form specifies a character class that includes everything but the specified property. For example, [:^Letter:] matches all characters that are not [:Letter:].

Syntax Style Standard Negated
POSIX [:type=value:] [:^type=value:]
Perl \p{type=value} \P{type=value}

See Also
  • UTS #18 Unicode Regular Expressions - Properties

Character classes can then be modified using standard set operations— Union, Inverse, Difference, and Intersection.

  • To union two sets, simply concatenate them. For example, [[:letter:] [:number:]]

  • To intersect two sets, use the & operator. For example, [[:letter:] & [a-z]]

  • To take the set-difference of two sets, use the - operator. For example, [[:letter:] - [a-z]]

  • To invert a set, place a ^ immediately after the opening [. For example, [^a-z]. In any other location, the ^ does not have a special meaning.

The binary operators & and - have equal precedence and bind left-to-right. Thus [[:letter:]-[a-z]-[\u0100-\u01FF]] is equivalent to [[[:letter:]-[a-z]]-[\u0100-\u01FF]]. As another example, the set [[ace][bdf] - [abc][def]] is not the empty set, but instead the set [def]. This only really matters for the difference operation, as the intersection operation is commutative.

Another caveat with the & and - operators is that they operate between sets. That is, they must be immediately preceded and immediately followed by a set. For example, the pattern [[:Lu:]-A] is illegal, since it is interpreted as the set [:Lu:] followed by the incomplete range -A. To specify the set of uppercase letters except for A, enclose the A in a set: [[:Lu:]-[A]].

Pattern Description
[a] The set containing a.
[a-z] The set containing a through z and all letters in between, in Unicode order.
[^a-z] The set containing all characters but a through z, that is, U+0000 through a-1 and z+1 through U+10FFFF.
[[pat1][pat2]] The union of sets specified by pat1 and pat2.
[[pat1]&[pat2]] The intersection of sets specified by pat1 and pat2.
[[pat1]-[pat2]] The asymmetric difference of sets specified by pat1 and pat2.
[:Lu:] The set of characters belonging to the given Unicode category. In this case, Unicode uppercase letters. The long form for this is [:UppercaseLetter:].
[:L:] The set of characters belonging to all Unicode categories starting with L, that is, [[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]. The long form for this is [:Letter:].

See Also
  • UTS #18 Unicode Regular Expressions - Subtraction and Intersection

String Values in Character Classes

String values are enclosed in {curly brackets}. For example:

Pattern Description
[abc{def}] A set containing four members, the single characters a, b, and c and the string def.
[{abc}{def}] A set containing two members, the string abc and the string def.
[{a}{b}{c}] and [abc] These two sets are equivalent. Each contains three items, the three individual characters a, b, and c. A {string} containing a single character is equivalent to that same character specified in any other way.

Character Quoting and Escaping in ICU Character Class Patterns

Single Quote

Two single quotes represent a single quote, either inside or outside single quotes. Text within single quotes is not interpreted in any way, except for two adjacent single quotes. It is taken as literal text— special characters become non-special. These quoting conventions for ICU character classes differ from those of Perl or Java. In those environments, single quotes have no special meaning, and are treated like any other literal character.

Backslash Escapes

Outside of single quotes, certain backslashed characters have special meaning:

Pattern Description
\uhhhh Exactly 4 hex digits; h in [0-9A-Fa-f]
\Uhhhhhhhh Exactly 8 hex digits
\xhh 1-2 hex digits
\ooo 1-3 octal digits; o in [0-7]
\a U+0007 BELL
\b U+0008 BACKSPACE
\t U+0009 HORIZONTAL TAB
\n U+000A LINE FEED
\v U+000B VERTICAL TAB
\f U+000C FORM FEED
\r U+000D CARRIAGE RETURN
\\ U+005C BACKSLASH

Anything else following a backslash is mapped to itself, except in an environment where it is defined to have some special meaning. For example, \p{Lu} is the set of uppercase letters. Any character formed as the result of a backslash escape loses any special meaning and is treated as a literal. In particular, note that \u and \U escapes create literal characters.

Whitespace

Whitespace, as defined by the ICU API, is ignored unless it is quoted or backslashed.

Property Values

The following property value styles are recognized:

Style Description
Short Omits the =type argument. Used to prevent ambiguity and only allowed with the Category and Script properties.
Medium Uses an abbreviated type and value.
Long Uses a full type and value.

If the type or value is omitted, then the = equals sign is also omitted. The short style is only used for Category and Script properties because these properties are very common and their omission is unambiguous.

In actual practice, you can mix type names and values that are omitted, abbreviated, or full. For example, if Category=Unassigned you could use what is in the table explicitly, \p{gc=Unassigned}, \p{Category=Cn}, or \p{Unassigned}.

When these are processed, case and whitespace are ignored so you may use them for clarity, if desired. For example, \p{Category = Uppercase Letter} or \p{Category = uppercase letter}.

For a list of properties supported by ICU, see ICU User Guide - Unicode Properties.

See Also
  • ICU User Guide - Unicode Properties
  • UTS #18 Unicode Regular Expressions - Properties
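
As a rough illustration of the short and medium property styles described above, the following sketch counts matches for a few \p{…} patterns. It assumes Foundation's NSRegularExpression instead of RegexKitLite; CountMatches is a hypothetical helper, and the sample strings are made up for this example.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation properties.m
#import <Foundation/Foundation.h>

// CountMatches is a hypothetical helper: number of non-overlapping matches of
// pattern in text, or 0 if the pattern does not compile.
static NSUInteger CountMatches(NSString *pattern, NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return 0; }
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"Hello World 42";
        NSLog(@"%lu", (unsigned long)CountMatches(@"\\p{Lu}", text));     // 2: H and W (short style)
        NSLog(@"%lu", (unsigned long)CountMatches(@"\\p{gc=Lu}", text));  // 2: same set, medium style
        NSLog(@"%lu", (unsigned long)CountMatches(@"\\p{Nd}", text));     // 2: the digits 4 and 2
        NSLog(@"%lu", (unsigned long)CountMatches(@"\\P{L}", text));      // 4: two spaces, two digits
        // A Script property, matched against a mixed Latin/Greek string.
        NSLog(@"%lu", (unsigned long)CountMatches(@"\\p{Script=Greek}", @"abc αβγ"));  // 3
    }
    return 0;
}
```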

Unicode Properties

The following tables list some of the commonly used Unicode Properties, which can be matched in a regular expression with \p{Property}. The tables were created from the Unicode 5.2 Unicode Character Database, which is the version used by ICU that ships with Mac OS X 10.6.

Category
L Letter
LC CasedLetter
Lu UppercaseLetter
Ll LowercaseLetter
Lt TitlecaseLetter
Lm ModifierLetter
Lo OtherLetter
 
P Punctuation
Pc ConnectorPunctuation
Pd DashPunctuation
Ps OpenPunctuation
Pe ClosePunctuation
Pi InitialPunctuation
Pf FinalPunctuation
Po OtherPunctuation
 
N Number
Nd DecimalNumber
Nl LetterNumber
No OtherNumber
 
M Mark
Mn NonspacingMark
Mc SpacingMark
Me EnclosingMark
 
S Symbol
Sm MathSymbol
Sc CurrencySymbol
Sk ModifierSymbol
So OtherSymbol
 
Z Separator
Zs SpaceSeparator
Zl LineSeparator
Zp ParagraphSeparator
 
C Other
Cc Control
Cf Format
Cs Surrogate
Co PrivateUse
Cn Unassigned
Script
Arabic Armenian Balinese
Bengali Bopomofo Braille
Buginese Buhid Canadian_Aboriginal
Carian Cham Cherokee
Common Coptic Cuneiform
Cypriot Cyrillic Deseret
Devanagari Ethiopic Georgian
Glagolitic Gothic Greek
Gujarati Gurmukhi Han
Hangul Hanunoo Hebrew
Hiragana Inherited Kannada
Katakana Kayah_Li Kharoshthi
Khmer Lao Latin
Lepcha Limbu Linear_B
Lycian Lydian Malayalam
Mongolian Myanmar New_Tai_Lue
Nko Ogham Ol_Chiki
Old_Italic Old_Persian Oriya
Osmanya Phags_Pa Phoenician
Rejang Runic Saurashtra
Shavian Sinhala Sundanese
Syloti_Nagri Syriac Tagalog
Tagbanwa Tai_Le Tamil
Telugu Thaana Thai
Tibetan Tifinagh Ugaritic
Unknown Vai Yi
Extended Property Class
ASCII_Hex_Digit Alphabetic
Bidi_Control Dash
Default_Ignorable_Code_Point Deprecated
Diacritic Extender
Grapheme_Base Grapheme_Extend
Grapheme_Link Hex_Digit
Hyphen IDS_Binary_Operator
IDS_Trinary_Operator ID_Continue
ID_Start Ideographic
Join_Control Logical_Order_Exception
Lowercase Math
Noncharacter_Code_Point Other_Alphabetic
Other_Default_Ignorable_Code_Point Other_Grapheme_Extend
Other_ID_Continue Other_ID_Start
Other_Lowercase Other_Math
Other_Uppercase Pattern_Syntax
Pattern_White_Space Quotation_Mark
Radical STerm
Soft_Dotted Terminal_Punctuation
Unified_Ideograph Uppercase
Variation_Selector White_Space
XID_Continue XID_Start

Unicode Character Database

Unicode properties are defined in the Unicode Character Database, or UCD. From time to time the UCD is revised and updated. The properties available, and the definition of the characters they match, depend on the UCD that ICU was built with.

Note:

In general, the ICU and UCD versions change with each major operating system release.

See Also
  • UTS #18 Unicode Regular Expressions - Properties
  • UTS #18 Unicode Regular Expressions - Compatibility Properties
  • Unicode Character Database
  • The Unicode Standard - Unicode 5.2
  • Versions of the Unicode Standard

ICU Replacement Text Syntax

Replacement Text Syntax
Character Description
$n
The text of capture group n will be substituted for $n. n must be ≥ 0 and not greater than the number of capture groups. A $ not followed by a digit has no special meaning, and will appear in the substitution text as itself, a $.
Important:
Methods will raise a RKLICURegexException if n is greater than the number of capture groups in the regular expression.
\ Treat the character following the backslash as a literal, suppressing any special meaning. Backslash escaping in substitution text is only required for $ and \, but may precede any character. The backslash itself will not be copied to the substitution text.
See Also
  • ICU User Guide - Replacement Text
  • - replaceOccurrencesOfRegex:withString:options:range:error:
  • - stringByReplacingOccurrencesOfRegex:withString:options:range:error:
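
The $n and backslash rules can be sketched as follows. This uses NSRegularExpression's template replacement, which follows the same $n convention, as a stand-in for the RegexKitLite methods listed above; ReplaceAll is a hypothetical helper, and the RKLICURegexException behaviour is specific to RegexKitLite and is not shown.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation replace.m
#import <Foundation/Foundation.h>

// ReplaceAll is a hypothetical helper: replaces every match of pattern in text
// using the replacement template, or returns text unchanged on a bad pattern.
static NSString *ReplaceAll(NSString *pattern, NSString *text, NSString *templ) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return text; }
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:templ];
}

int main(void) {
    @autoreleasepool {
        // $1 and $2 refer to the first and second capture groups.
        NSLog(@"%@", ReplaceAll(@"(\\w+) (\\w+)", @"John Smith", @"$2, $1"));     // Smith, John
        // $0 is the whole match.
        NSLog(@"%@", ReplaceAll(@"[0-9]+", @"a1b22", @"<$0>"));                   // a<1>b<22>
        // A backslash makes the following $ literal; the backslash is not copied.
        NSLog(@"%@", ReplaceAll(@"([0-9]+) dollars", @"5 dollars", @"\\$$1"));    // $5
    }
    return 0;
}
```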

RegexKitLite Cookbook

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
Jamie Zawinski

This section contains a collection of regular expressions and example code demonstrating how RegexKitLite makes some common programming chores easier. RegexKitLite makes it easy to match part of a string and extract just that part, or even create an entirely new string using just a few pieces of the original string. A great example of this is a string that contains a URL and you need to extract just a part of it, perhaps the host or maybe just the port used. This example demonstrates how easy it is to extract the port used from a URL, which is then converted into an NSInteger value:

searchString = @"http://www.example.com:8080/index.html";
regexString  = @"\\bhttps?://[a-zA-Z0-9\\-.]+(?::(\\d+))?(?:(?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";
NSInteger portInteger = [[searchString stringByMatching:regexString capture:1L] integerValue];
NSLog(@"portInteger: '%ld'", (long)portInteger);
// 2008-10-15 08:52:52.500 host_port[8021:807] portInteger: '8080'

Inside you'll find more examples like this that you can use as the starting point for your own regular expression pattern matching solution. Keep in mind that these are meant to be examples to help get you started and not necessarily the ideal solution for every need. Trade‑offs are usually made when creating a regular expression; matching an email address is a perfect example of this. A regular expression that precisely matches the formal definition of email address is both complicated and usually unnecessary. Knowing which trade‑offs are acceptable requires that you understand what it is you're trying to match, the data that you're searching through, and the requirements and uses of the matched results. It won't take long until you gain an appreciation for Jamie Zawinski's infamous quote.

See Also
  • O'Reilly - Mastering Regular Expressions, 3rd edition by Jeffrey Friedl
  • RegExLib.com - Regular Expression Library
  • ICU Userguide - Regular Expressions
  • Regular-Expressions.info - Regex Tutorial, Examples, and Reference
  • Wikipedia - Regular Expression

Pattern Matching Recipes

Numbers

Description                   Regex                                                    Examples
Integer                       [+\-]?[0-9]+                                             123   -42   +23
Hex Number                    0[xX][0-9a-fA-F]+                                        0x0   0xdeadbeef   0xF3
Floating Point                [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.)                        123.   .123   +.42
Floating Point with Exponent  [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+\-]?[0-9]+)?   123.   .123   10.0E13   1.23e-7
Comma Separated Number        [0-9]{1,3}(?:,[0-9]{3})*                                 42   1,234   1,234,567
Comma Separated Number        [0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?                    42   1,234   1,234,567.89

Extracting and Converting Numbers

NSString includes several methods for converting the contents of the string in to a numeric value in the various C primitive types. The following demonstrates the matching of an int and double in a NSString, and then converting the matched string in to its base type.

Integer conversion…
NSString *searchString = @"The int 5542 to convert";
NSString *regexString  = @"([+\\-]?[0-9]+)";
int matchedInt = [[searchString stringByMatching:regexString capture:1L] intValue];

The variable matchedInt now contains the value of 5542.

Floating Point conversion…
NSString *searchString = @"The double 4321.9876 to convert";
NSString *regexString  = @"([+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.))";
double matchedDouble = [[searchString stringByMatching:regexString capture:1L] doubleValue];

The variable matchedDouble now contains the value of 4321.9876. doubleValue can even convert numbers that are in scientific notation, which represent numbers as n × 10^exp:

Floating Point conversion…
NSString *searchString = @"The double 1.010489e5 to convert";
NSString *regexString  = @"([+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.)(?:[eE][+\\-]?[0-9]+)?)";
double matchedDouble = [[searchString stringByMatching:regexString capture:1L] doubleValue];

The variable matchedDouble now contains the value of 101048.9.
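
For readers without RegexKitLite, the same extract-then-convert idea can be sketched with Foundation's NSRegularExpression. This is an assumption-laden illustration, not the RegexKitLite API: ExtractDoubles is a hypothetical helper that collects every match of the "Floating Point with Exponent" pattern and converts each with doubleValue.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation numbers.m
#import <Foundation/Foundation.h>

// ExtractDoubles is a hypothetical helper: returns every floating point number
// found in text, in order of appearance. Plain integers such as "42" are not
// matched because the pattern requires a decimal point.
static NSArray<NSNumber *> *ExtractDoubles(NSString *text) {
    NSString *pattern = @"[+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.)(?:[eE][+\\-]?[0-9]+)?";
    NSMutableArray<NSNumber *> *values = [NSMutableArray array];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return values; }
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        [values addObject:@([[text substringWithRange:match.range] doubleValue])];
    }
    return values;
}

int main(void) {
    @autoreleasepool {
        // Three numbers are expected: 1.5, -225 and 0.5.
        NSLog(@"%@", ExtractDoubles(@"a 1.5 b -2.25e2 c .5"));
    }
    return 0;
}
```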

Extracting and Converting Hex Numbers

Converting a string that contains a hex number in to a more basic type, such as an int, takes a little more work. Unfortunately, Foundation does not provide an easy way to convert a hex value in a string in to a more basic type as it does with intValue or doubleValue. Thankfully the standard C library provides a set of functions for performing such a conversion. For this example we will use the strtol() (string to long) function to convert the hex value we've extracted from searchString. We can not pass the pointer to the NSString object that contains the matched hex value since strtol() is part of the standard C library which can only work on pointers to C strings. We use the UTF8String method to get a pointer to a compatible C string of the matched hex value.

Hex conversion…
NSString *searchString = @"A hex value: 0x0badf00d";
NSString *regexString  = @"\\b(0[xX][0-9a-fA-F]+)\\b";
NSString *hexString    = [searchString stringByMatching:regexString capture:1L];
// Use strtol() to convert the string to a long.
long hexLong = strtol([hexString UTF8String], NULL, 16);
NSLog(@"hexLong: 0x%lx / %ld", (u_long)hexLong, hexLong);
// 2008-09-01 09:40:44.848 hex_example[30583:807] hexLong: 0xbadf00d / 195948557

The full set of string to… functions are: strtol(), strtoll(), strtoul(), and strtoull(). These convert a string value, from base 2 to base 36, in to a long, long long, unsigned long, and unsigned long long respectively.
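
A small sketch of the base argument, using strtoull() from the standard C library. ParseInBase is a hypothetical wrapper written for this example only.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation bases.m
#import <Foundation/Foundation.h>
#include <stdlib.h>

// ParseInBase is a hypothetical wrapper: converts text to an unsigned long long
// in the given base (2 through 36). Returns 0 when no digits can be converted.
static unsigned long long ParseInBase(NSString *text, int base) {
    return strtoull([text UTF8String], NULL, base);
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%llu", ParseInBase(@"0xdeadbeef", 16));  // 3735928559 (0x prefix allowed in base 16)
        NSLog(@"%llu", ParseInBase(@"101", 2));          // 5
        NSLog(@"%llu", ParseInBase(@"zz", 36));          // 1295 (35 * 36 + 35)
    }
    return 0;
}
```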

Adding Hex Value Conversions to NSString

Since it seems to be a frequently asked question, and a common search engine query for RegexKit web site visitors, here is an NSString category addition that converts the receiver's text into an NSInteger value. This is the same functionality as intValue or doubleValue, except that it converts hexadecimal text values instead of decimal text values.

Note:

The following code can also be found in the RegexKitLite distribution's examples/ directory.

The example conversion code is fairly quick since it uses Core Foundation directly along with the stack to hold any temporary string conversions. Any whitespace at the beginning of the string will be skipped and the hexadecimal text to be converted may be optionally prefixed with either 0x or 0X. Returns 0 if the receiver does not begin with a valid hexadecimal text representation. Refer to strtol(3) for additional conversion details.

Important:

If the receiver needs to be converted in to an encoding that is compatible with strtol(), only the first sixty characters of the receiver are converted.

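File name: NSString-HexConversion.h
#import <Foundation/Foundation.h> // NSString, NSInteger

@interface NSString (HexConversion)
-(NSInteger)hexValue;
@end

File name: NSString-HexConversion.m
#import "NSString-HexConversion.h"
#import <CoreFoundation/CoreFoundation.h> // CFString functions
#include <stdlib.h>                       // strtol()

@implementation NSString (HexConversion)

-(NSInteger)hexValue
{
  // __bridge is required when compiling with ARC and has no effect without it.
  CFStringRef cfSelf = (__bridge CFStringRef)self;
  UInt8 buffer[64];
  const char *cptr;

  // Fast path: use the string's internal buffer if it is directly available.
  if((cptr = CFStringGetCStringPtr(cfSelf, kCFStringEncodingMacRoman)) == NULL) {
    // Otherwise convert at most 60 bytes in to the stack buffer.
    CFRange range     = CFRangeMake(0L, CFStringGetLength(cfSelf));
    CFIndex usedBytes = 0L;
    CFStringGetBytes(cfSelf, range, kCFStringEncodingUTF8, '?', false, buffer, 60L, &usedBytes);
    buffer[usedBytes] = 0;
    cptr              = (const char *)buffer;
  }

  return((NSInteger)strtol(cptr, NULL, 16));
}

@end

See Also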
  • strtol(3)
  • - intValue
  • - doubleValue

Text Files

Description Regex
Empty Line (?m:^$)
Empty or Whitespace Only Line (?m-s:^\s*$)
Strip Leading Whitespace (?m-s:^\s*(.*?)$)
Strip Trailing Whitespace (?m-s:^(.*?)\s*$)
Strip Leading and Trailing Whitespace (?m-s:^\s*(.*?)\s*$)
Quoted String, Can Span Multiple Lines, May Contain \" "(?:[^"\\]*+|\\.)*"
Quoted String, Single Line Only, May Contain \" "(?:[^"\\\r\n]*+|\\[^\r\n])*"
HTML Comment (?s:<!--.*?-->)
Perl / Shell Comment (?m-s:#.*$)
C, C++, or ObjC Comment (?m-s://.*$)
C, C++, or ObjC Comment and Leading Whitespace (?m-s:\s*//.*$)
C, C++, or ObjC Comment (?s:/\*.*?\*/)

The Newline Debacle

Unfortunately, when processing text files, there is no standard 'newline' character or character sequence. Today this most commonly surfaces when converting text between Microsoft Windows / MS-DOS and Unix / Mac OS X. The reason for the proliferation of newline standards is largely historical and goes back many decades. Below is a table of the dominant newline character sequence 'standards':

Description Sequence C String Control Common Uses
Line Feed \u000A \n ^J Unix, Amiga, Mac OS X
Vertical Tab \u000B \v ^K  
Form Feed \u000C \f ^L  
Carriage Return \u000D \r ^M Apple ][, Mac OS ≤ 9
Next Line (NEL) \u0085     IBM / EBCDIC
Line Separator \u2028     Unicode
Paragraph Separator \u2029     Unicode
Carriage Return + Line Feed \u000D\u000A \r\n ^M^J MS-DOS, Windows

Ideally, one should be flexible enough to accept any of these character sequences if one has to process text files, especially if the origin of those text files is not known. Thankfully, regular expressions excel at just such a task. Below is a regular expression pattern that will match any of the above character sequences. This is also the character sequence that the metacharacter $ matches.

Description Regex Notes
Any newline (?:\r\n|[\n\v\f\r\x85\p{Zl}\p{Zp}]) UTS #18 recommended. Character sequence that $ matches.

See Also
  • UTS #18: Unicode Regular Expressions - Line Boundaries
  • Wikipedia - Newline

Matching the Beginning and End of a Line

It is often necessary to work with the individual lines of a file. There are two regular expression metacharacters, ^ and $, that match the beginning and end of a line, respectively. However, exactly what is matched by ^ and $ depends on whether or not the multi-line option is enabled for the regular expression, which by default is disabled. It can be enabled for the entire regular expression by passing RKLMultiline via the options: method argument, or within the regular expression using the options syntax, (?m:…).

If multi-line is disabled, then ^ and $ match the beginning and end of the entire string. If there is a newline character sequence at the very end of the string, then $ will match the character just before the newline character sequence. Any newline character sequences in the middle of the string will not be matched.

If multi-line is enabled, then ^ and $ match the beginning and end of a line, where the end of a line is the newline character sequence. The metacharacter ^ matches either the first character in the string, or the first character following a newline character sequence. The metacharacter $ matches either the last character in the string, or the character just before a newline character sequence.
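
The difference between the two modes can be observed with a short sketch. It assumes Foundation's NSRegularExpression, where the option NSRegularExpressionAnchorsMatchLines plays the role that RKLMultiline plays in RegexKitLite; CountMatches is a hypothetical helper.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation multiline.m
#import <Foundation/Foundation.h>

// CountMatches is a hypothetical helper: number of matches of pattern in text
// when compiled with the given options, or 0 if the pattern is invalid.
static NSUInteger CountMatches(NSString *pattern, NSString *text,
                               NSRegularExpressionOptions options) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:options error:NULL];
    if (regex == nil) { return 0; }
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"one\ntwo\nthree";
        // Multi-line disabled: ^ and $ anchor to the whole string, so no single word fits.
        NSLog(@"%lu", (unsigned long)CountMatches(@"^\\w+$", text, 0));                                     // 0
        // Multi-line enabled through the options argument: every line matches.
        NSLog(@"%lu", (unsigned long)CountMatches(@"^\\w+$", text, NSRegularExpressionAnchorsMatchLines));  // 3
        // Multi-line enabled inside the pattern with the (?m) flag.
        NSLog(@"%lu", (unsigned long)CountMatches(@"(?m)^\\w+$", text, 0));                                 // 3
        // Multi-line disabled: $ still matches just before a newline at the very end.
        NSLog(@"%lu", (unsigned long)CountMatches(@"^\\w+$", @"one\n", 0));                                 // 1
    }
    return 0;
}
```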

Creating a NSArray Containing Every Line in a String

A common text processing pattern is to process a file one line at a time. Using the recommended regular expression for matching any newline and the componentsSeparatedByRegex: method, you can easily create an NSArray containing every line in a file and process it one line at a time:

Process every line…
NSString *fileNameString = @"example";
NSString *regexString    = @"(?:\r\n|[\n\v\f\r\302\205\\p{Zl}\\p{Zp}])";
NSError  *error          = NULL;
NSString *fileString     = [NSString stringWithContentsOfFile:fileNameString usedEncoding:NULL error:&error];

if(fileString) {
  NSArray *linesArray = [fileString componentsSeparatedByRegex:regexString];
  for(NSString *lineString in linesArray) { // ObjC 2.0 for…in loop.
    // Per line processing.
  }
} else {
  NSLog(@"Error reading file '%@'", fileNameString);
  if(error) { NSLog(@"Error: %@", error); }
}

The componentsSeparatedByRegex: method effectively 'chops off' the matched regular expression, or in this case any newline character. In the example above, within the for…in loop, lineString will not have a newline character at the end of the string.
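
Foundation's NSRegularExpression has no direct counterpart to componentsSeparatedByRegex:, so the sketch below shows one way the same 'chop off the separator' behaviour might be built by hand. SplitLines is a hypothetical helper, and the newline pattern is the one from the table above with \v and U+0085 written as \x0B and \x85.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation split_lines.m
#import <Foundation/Foundation.h>

// SplitLines is a hypothetical helper: returns the pieces of text that lie
// between newline sequences. The newline sequences themselves are dropped.
static NSArray<NSString *> *SplitLines(NSString *text) {
    NSString *pattern = @"(?:\\r\\n|[\\n\\x0B\\f\\r\\x85\\p{Zl}\\p{Zp}])";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return @[text]; }

    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    NSUInteger start = 0;  // Index where the current line begins.
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        NSRange lineRange = NSMakeRange(start, match.range.location - start);
        [lines addObject:[text substringWithRange:lineRange]];
        start = NSMaxRange(match.range);  // Skip over the newline sequence.
    }
    [lines addObject:[text substringFromIndex:start]];  // Text after the last newline.
    return lines;
}

int main(void) {
    @autoreleasepool {
        // Windows, Unix, and classic Mac OS newlines in one string.
        NSLog(@"%@", SplitLines(@"a\r\nb\nc\rd"));  // a, b, c, d
    }
    return 0;
}
```

Because \r\n is listed before the single-character alternatives, a Windows newline counts as one separator rather than two. A string that ends with a newline produces a final empty element in this sketch.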

Parsing CSV Data
Description Regex
Split CSV line ,(?=(?:(?:[^"\\]*+|\\")*"(?:[^"\\]*+|\\")*")*(?!(?:[^"\\]*+|\\")*"(?:[^"\\]*+|\\")*$))

This regular expression essentially works by ensuring that there are an even number of unescaped " quotes following a , comma. This is done by using look-ahead assertions. The first look-ahead assertion, (?=, is a pattern that matches zero or more strings that contain two " characters. Then, a negative look-ahead assertion matches a single, unpaired " quote character remaining at the $ end of the line. It also uses possessive matches in the form of *+ for speed, which prevents the regular expression engine from backtracking excessively. It's certainly not a beginner's regular expression.
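
To see which commas the pattern accepts as separators, here is a minimal sketch that applies it to a single CSV line. It assumes Foundation's NSRegularExpression rather than RegexKitLite, and SplitCSVLine is a hypothetical helper; the pattern itself is the one from the table above, escaped for an Objective-C string literal.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation csv_line.m
#import <Foundation/Foundation.h>

// SplitCSVLine is a hypothetical helper: splits line at every comma that is
// followed by an even number of unescaped quotes. Quotes are kept in the fields.
static NSArray<NSString *> *SplitCSVLine(NSString *line) {
    NSString *pattern = @",(?=(?:(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*\")*(?!(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*$))";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (regex == nil) { return @[line]; }

    NSMutableArray<NSString *> *fields = [NSMutableArray array];
    NSUInteger start = 0;  // Index where the current field begins.
    NSArray<NSTextCheckingResult *> *commas =
        [regex matchesInString:line options:0 range:NSMakeRange(0, line.length)];
    for (NSTextCheckingResult *comma in commas) {
        [fields addObject:[line substringWithRange:NSMakeRange(start, comma.range.location - start)]];
        start = NSMaxRange(comma.range);
    }
    [fields addObject:[line substringFromIndex:start]];
    return fields;
}

int main(void) {
    @autoreleasepool {
        // The comma inside "Mar 23, 2008" has an odd number of quotes after it,
        // so it should not be treated as a separator: four fields are expected.
        NSLog(@"%@", SplitCSVLine(@"RegexKitLite,1.0,\"Mar 23, 2008\",27004"));
    }
    return 0;
}
```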

The following is used as a substitute for a CSV data file in the example below.

Example CSV data…
NSString *csvFileString = @"RegexKitLite,1.0,\"Mar 23, 2008\",27004\n"
                          @"RegexKitLite,1.1,\"Mar 28, 2008\",28081\n"
                          @"RegexKitLite,1.2,\"Apr 01, 2008\",28765\n"
                          @"RegexKitLite,2.0,\"Jul 07, 2008\",40569\n"
                          @"RegexKitLite,2.1,\"Jul 12, 2008\",40660\n";

This example really highlights the power of regular expressions when it comes to processing text. It takes just 17 lines, which includes comments, to parse a CSV data file of any newline type and create a row by column of NSArray values of the results while correctly handling " quoted values, including escaped \" quotes.

Parse CSV data…
NSString *newlineRegex = @"(?:\r\n|[\n\v\f\r\\x85\\p{Zl}\\p{Zp}])";
NSString *splitCSVLineRegex = @",(?=(?:(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*\")*(?!(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*$))";

// Create a NSArray of every line in csvFileString.
NSArray *csvLinesArray = [csvFileString componentsSeparatedByRegex:newlineRegex];

// Create an id array to hold the comma split line results.
id splitLines[[csvLinesArray count]]; // C99 variable length array.
NSUInteger splitLinesIndex = 0UL;     // Index of next splitLines[] member.

for(NSString *csvLineString in csvLinesArray) { // ObjC 2 for…in loop.
  if([csvLineString isMatchedByRegex:@"^\\s*$"]) { continue; } // Skip empty lines.
  splitLines[splitLinesIndex++] = [csvLineString componentsSeparatedByRegex:splitCSVLineRegex];
}

// Gather up all the individual comma split results in to a single NSArray.
NSArray *splitLinesArray = [NSArray arrayWithObjects:&splitLines[0] count:splitLinesIndex];

Network and URL

Description Regex
HTTP \bhttps?://[a-zA-Z0-9\-.]+(?:(?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
HTTP \b(https?)://([a-zA-Z0-9\-.]+)((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
HTTP \b(https?)://(?:(\S+?)(?::(\S+?))?@)?([a-zA-Z0-9\-.]+)(?::(\d+))?((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
E-Mail \b([a-zA-Z0-9%_.+\-]+)@([a-zA-Z0-9.\-]+?\.[a-zA-Z]{2,6})\b
Hostname \b(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6}\b
IP \b(?:\d{1,3}\.){3}\d{1,3}\b
IP with Optional Netmask \b((?:\d{1,3}\.){3}\d{1,3})(?:/(\d{1,2}))?\b
IP or Hostname \b(?:(?:\d{1,3}\.){3}\d{1,3}|(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6})\b
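
Note that the IP pattern only checks the shape of an address: it also accepts text such as 999.999.999.999. A rough sketch of adding a numeric range check on top of it follows. It assumes Foundation's NSRegularExpression, and IsPlausibleIPv4 is a hypothetical helper that is not part of RegexKitLite.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation ipv4.m
#import <Foundation/Foundation.h>

// IsPlausibleIPv4 is a hypothetical helper: YES when the entire candidate
// matches the IP pattern from the table and every octet is at most 255.
static BOOL IsPlausibleIPv4(NSString *candidate) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b"
                                                  options:0
                                                    error:NULL];
    if (regex == nil) { return NO; }

    // The match must cover the whole string, not just part of it.
    NSRange whole = NSMakeRange(0, candidate.length);
    NSTextCheckingResult *match = [regex firstMatchInString:candidate options:0 range:whole];
    if (match == nil || !NSEqualRanges(match.range, whole)) { return NO; }

    // The regex cannot express "0 to 255", so check each octet numerically.
    for (NSString *octet in [candidate componentsSeparatedByString:@"."]) {
        if ([octet integerValue] > 255) { return NO; }
    }
    return YES;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%d", IsPlausibleIPv4(@"192.168.1.1"));  // 1
        NSLog(@"%d", IsPlausibleIPv4(@"999.1.1.1"));    // 0: matches the pattern, fails the range check
        NSLog(@"%d", IsPlausibleIPv4(@"1.2.3"));        // 0: only three octets
    }
    return 0;
}
```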

Creating a NSDictionary of URL Information

The following example demonstrates how to match several fields in a URL and create a NSDictionary with the extracted results. Only the capture groups that result in a successful match will create a corresponding key in the dictionary.

HTTP URL…
NSString *searchString = @"http://johndoe:secret@www.example.com:8080/private/mail/index.html";
NSString *regexString  = @"\\b(https?)://(?:(\\S+?)(?::(\\S+?))?@)?([a-zA-Z0-9\\-.]+)(?::(\\d+))?((?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";

if([searchString isMatchedByRegex:regexString]) {
  NSString *protocolString = [searchString stringByMatching:regexString capture:1L];
  NSString *userString     = [searchString stringByMatching:regexString capture:2L];
  NSString *passwordString = [searchString stringByMatching:regexString capture:3L];
  NSString *hostString     = [searchString stringByMatching:regexString capture:4L];
  NSString *portString     = [searchString stringByMatching:regexString capture:5L];
  NSString *pathString     = [searchString stringByMatching:regexString capture:6L];

  NSMutableDictionary *urlDictionary = [NSMutableDictionary dictionary];

  if(protocolString) { [urlDictionary setObject:protocolString forKey:@"protocol"]; }
  if(userString)     { [urlDictionary setObject:userString     forKey:@"user"];     }
  if(passwordString) { [urlDictionary setObject:passwordString forKey:@"password"]; }
  if(hostString)     { [urlDictionary setObject:hostString     forKey:@"host"];     }
  if(portString)     { [urlDictionary setObject:portString     forKey:@"port"];     }
  if(pathString)     { [urlDictionary setObject:pathString     forKey:@"path"];     }

  NSLog(@"urlDictionary: %@", urlDictionary);
}

RegexKitLite 4.0 adds a new method, dictionaryByMatchingRegex:…, that makes the creation of NSDictionary objects like this much easier, as the following example demonstrates:

RegexKitLite ≥ 4.0 example…
NSString *searchString = @"http://johndoe:secret@www.example.com:8080/private/mail/index.html";
NSString *regexString  = @"\\b(https?)://(?:(\\S+?)(?::(\\S+?))?@)?([a-zA-Z0-9\\-.]+)(?::(\\d+))?((?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";
NSDictionary *urlDictionary = [searchString dictionaryByMatchingRegex:regexString
                               withKeysAndCaptures:@"protocol", 1, @"user", 2, @"password", 3, @"host", 4, @"port", 5, @"path", 6, NULL];

if(urlDictionary != NULL) { NSLog(@"urlDictionary: %@", urlDictionary); }
Note:

Other than the difference in mutability for the dictionary containing the result, the RegexKitLite 4.0 dictionaryByMatchingRegex:… example produces the same result as the more verbose, pre-4.0 example.

These examples can form the basis of a function or method that takes a NSString as an argument and returns a NSDictionary as a result, maybe even as a category addition to NSString. The following is the output when the examples above are compiled and run:

shell% ./http_example
2008-09-01 10:57:55.245 test_nsstring[31306:807] urlDictionary: {
    host = "www.example.com";
    password = secret;
    path = "/private/mail/index.html";
    port = 8080;
    protocol = http;
    user = johndoe;
}
shell%
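
The same dictionary can also be built without RegexKitLite. The following is a sketch under the assumption that Foundation's NSRegularExpression is used instead; ParseHTTPURL is a hypothetical helper, and the key names simply mirror the examples above.

```objc
// Build (assumption): clang -fobjc-arc -framework Foundation url_parts.m
#import <Foundation/Foundation.h>

// ParseHTTPURL is a hypothetical helper: maps the six capture groups of the
// HTTP pattern to dictionary keys, skipping any group that did not take part
// in the match. Returns nil when the pattern does not match at all.
static NSDictionary<NSString *, NSString *> *ParseHTTPURL(NSString *url) {
    NSString *pattern = @"\\b(https?)://(?:(\\S+?)(?::(\\S+?))?@)?([a-zA-Z0-9\\-.]+)(?::(\\d+))?((?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:url options:0 range:NSMakeRange(0, url.length)];
    if (match == nil) { return nil; }

    NSArray<NSString *> *keys = @[@"protocol", @"user", @"password", @"host", @"port", @"path"];
    NSMutableDictionary<NSString *, NSString *> *parts = [NSMutableDictionary dictionary];
    for (NSUInteger i = 0; i < keys.count; i++) {
        // Capture group numbers start at 1; group 0 is the whole match.
        NSRange range = [match rangeAtIndex:i + 1];
        if (range.location != NSNotFound) {
            parts[keys[i]] = [url substringWithRange:range];
        }
    }
    return parts;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", ParseHTTPURL(@"http://johndoe:secret@www.example.com:8080/private/mail/index.html"));
        // No user, password, or port here, so those keys are simply absent.
        NSLog(@"%@", ParseHTTPURL(@"https://example.com/a"));
    }
    return 0;
}
```

A capture group that did not participate in the match reports a range whose location is NSNotFound, which is what the loop tests for.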

Adding RegexKitLite to your Project

Note:

The following outlines a typical set of steps that one would perform. This is not the only way, nor the required way to add RegexKitLite to your application. They may not be correct for your project as each project is unique. They are an overview for those unfamiliar with adding additional shared libraries to the list of libraries your application links against.

Outline of Required Steps

The following outlines the steps required to use RegexKitLite in your project.

  • Link your application to the ICU dynamic shared library.
  • Add the RegexKitLite.m and RegexKitLite.h files to your project and application target.
  • Import the RegexKitLite.h header.

See Also
  • Xcode Build System Guide - Linking

Adding RegexKitLite using Xcode

Important:
These instructions apply to  Xcode versions 2.4.1 and 3.0. Other versions should be similar, but may vary for specific details.

Unfortunately, adding additional dynamic shared libraries that your application links to is not a straightforward process in Xcode, nor is there any recommended standard way. Two options are presented below: the first is the 'easy' way that alters your application's Xcode build settings to pass an additional command line argument directly to the linker. The second option adds the ICU dynamic shared library to the list of resources for your project and configures your executable to link against the added resource.

The 'easy' way is the recommended way to link against the ICU dynamic shared library.

The Easy Way To Add The ICU Library
  1. First, determine the build settings layer of your project to which the altered linking configuration should be applied. The build settings in Xcode are divided in to layers and each layer inherits the build settings from the layer above it. The top, global layer is Project Settings, followed by Target Settings, and finally the most specific layer Executable Settings. If your project is large enough to have multiple targets and executables, you probably have an idea which layer is appropriate. If you are unsure or unfamiliar with the different layers, Target Settings is recommended.

  2. Select the appropriate layer from the Project menu. If you are unsure, Project > Edit Active Target is recommended.

  3. Select Build from the tab near the top of the Target Info window. Find the Other Linker Flags build setting from the many build settings available and edit it. Add -licucore [dash ell icucore as a single word, without spaces]. If there are already other flags present, it is recommended that you add -licucore to the end of the existing flags.

    Important:
    If other linker flags are present, there must be at least one space separating  -licucore from the other linker flags. For example,  -flag1 -licucore -flag2
    Note:
    The Configuration drop down menu controls which build configuration the changes you make are applied to. All Configurations should be selected if this is the first time you are making these changes.
  4. Follow the Add The RegexKitLite Source Files To Your Project steps below.

See Also
  • Xcode Build System Guide - Build Settings

The Hard Way To Add The ICU Library
  1. First, add the ICU dynamic shared library to your Xcode project. You may choose to add the library to any group in your project, and which groups are created by default is dependent on the template type you chose when you created your project. For a typical Cocoa application project, a good choice is the Frameworks group. To add the ICU dynamic shared library, control/right-click on the Frameworks group and choose Add Existing Files…

  2. Next, you will need to choose the ICU dynamic shared library file to add. Exactly which file to choose depends on your project, but a fairly safe choice is to select /Developer/SDKs/MacOSX10.6.sdk/usr/lib/libicucore.dylib. You may have installed your developer tools in a different location than the default /Developer directory, and the Mac OS X SDK version should be the one your project is targeting, typically the latest one available.

  3. Then, in the dialog that follows, make sure that Copy items into… is unselected. Select the targets you will be using RegexKitLite in and then click Add to add the ICU dynamic shared library to your project.

  4. Once the ICU dynamic shared library is added to your project, you will need to add it to the libraries that your executable is linked with. To do so, expand the Targets group, and then expand the executable targets you will be using RegexKitLite in. You will then need to select the libicucore.dylib file that you added in the previous step and drag it in to the Link Binary With Libraries group for each executable target that you will be using RegexKitLite in. The order of the files within the Link Binary With Libraries group is not important, and for a typical Cocoa application the group will contain the Cocoa.framework file.

Add The RegexKitLite Source Files To Your Project
  1. Next, add the RegexKitLite source files to your Xcode project. In the Groups & Files outline view on the left, control/right-click on the group that you would like to add the files to, then select Add > Existing Files…

    Note:

    You can perform the following steps once for each file (RegexKitLite.h and RegexKitLite.m), or once by selecting both files from the file dialog.

  2. Select the RegexKitLite.h and / or RegexKitLite.m file from the file chooser dialog.

  3. The next dialog will present you with several options. If you have not already copied the RegexKitLite files into your project's directory, you may want to click on the Copy items into… option. Select the targets that you would like to add the RegexKitLite functionality to.

  4. Finally, you will need to include the RegexKitLite.h header file. The best way to do this is very dependent on your project. If your project consists of only half a dozen source files, you can add:

    #import "RegexKitLite.h"

    manually to each source file that makes use of RegexKitLite's features. If your project has grown beyond this, you have probably already organized a common "master" header that pulls in the headers required by nearly all source files, and RegexKitLite.h can be added there.

Adding RegexKitLite using the Shell

Using RegexKitLite from the shell is also easy. Again, you need to add the header #import to the appropriate source files. Then, to link to the ICU library, you typically only need to add -licucore, just as you would any other library. Consider the following example:

File name: link_example.m

#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  // Copyright COPYRIGHT_SIGN APPROXIMATELY_EQUAL_TO 2008
  // Copyright \u00a9 \u2245 2008
  char *utf8CString = "Copyright \xC2\xA9 \xE2\x89\x85 2008";
  NSString *regexString = @"Copyright (.*) (\\d+)";

  NSString *subjectString = [NSString stringWithUTF8String:utf8CString];
  NSString *matchedString = [subjectString stringByMatching:regexString capture:1L];

  NSLog(@"subject: \"%@\"", subjectString);
  NSLog(@"matched: \"%@\"", matchedString);

  [pool release];
  return(0);
}

Compiled and run from the shell:

shell% cd examples
shell% gcc -g -I.. -o link_example link_example.m ../RegexKitLite.m -framework Foundation -licucore
shell% ./link_example
2008-03-14 03:52:51.187 test[15283:807] subject: "Copyright © ≅ 2008"
2008-03-14 03:52:51.269 test[15283:807] matched: "© ≅"
shell%

RegexKitLite NSString Additions Reference

Extends by category NSString, NSMutableString
RegexKitLite 4.0
Declared in
  • RegexKitLite.h
Companion guides
  • ICU User Guide - Regular Expressions

Overview

RegexKitLite is not meant to be a full featured regular expression framework. Because of this, it provides only the basic primitives needed to create additional functionality. It is ideal for developers who:

  • Are developing applications for the iPhone.
  • Have modest regular expression needs.
  • Require a very small footprint.
  • Are unable or unwilling to add additional, external frameworks.
  • Deal predominantly in UTF-16 encoded Unicode strings.
  • Require the enhanced word breaking functionality provided by the ICU library.

RegexKitLite consists of only two files, the header file RegexKitLite.h and the implementation file RegexKitLite.m. The only other requirement is to link with the ICU library that comes with Mac OS X. No new classes are created; all functionality is provided as a category extension to the NSString and NSMutableString classes.

See Also
  • RegexKitLite Guide
  • ICU Regular Expression Syntax
  • Adding RegexKitLite to your Project
  • License Information
  • RegexKit Framework
  • International Components for Unicode
  • Unicode Home Page

Compile Time Preprocessor Tunables

The settings listed below are implemented using the C Preprocessor. Some of the settings are simple boolean enabled or disabled settings, while others specify a value, such as the number of cached compiled regular expressions. There are several ways to alter these settings, but if you are not familiar with this style of compile time configuration and how to alter it using the C Preprocessor, it is recommended that you use the default values provided.

Each setting is listed with its default value, followed by its description.

  • NS_BLOCK_ASSERTIONS (default: n/a)
    RegexKitLite contains a number of extra run-time assertion checks that can be disabled with this flag. The standard NSException.h assertion macros are not used because of the multithreading lock. This flag is typically set for Release style builds where the additional error checking is no longer necessary.
  • RKL_APPEND_TO_ICU_FUNCTIONS (default: None)
    This flag is useful if you are supplying your own version of the ICU library. When set, this preprocessor define causes the ICU functions used by RegexKitLite to have the value of RKL_APPEND_TO_ICU_FUNCTIONS appended to them. For example, if RKL_APPEND_TO_ICU_FUNCTIONS is set to _4_0 (i.e., -DRKL_APPEND_TO_ICU_FUNCTIONS=_4_0), it would cause uregex_find() to become uregex_find_4_0().
  • RKL_BLOCKS (default: Automatic)
    Enables blocks support. This feature is automatically enabled if NS_BLOCKS_AVAILABLE is set to 1, which is typically set if support for blocks is appropriate. At the time of this writing, this typically means that the Xcode setting for the minimum version of Mac OS X supported must be 10.6. This feature may be explicitly disabled under all circumstances by setting its value to 0, or alternatively it can be explicitly enabled under all circumstances by setting its value to 1. The behavior is undefined if RKL_BLOCKS is set to 1 and the compiler does not support the blocks language extension or if the run-time does not support blocks.
  • RKL_CACHE_SIZE (default: 13)
    RegexKitLite uses a 4-way set associative cache and RKL_CACHE_SIZE controls the number of sets in the cache. The total number of compiled regular expressions that can be cached is RKL_CACHE_SIZE * 4, for a default value of 52. RKL_CACHE_SIZE should always be a prime number to maximize the use of the cache.
  • RKL_DTRACE (default: Disabled)
    This preprocessor define controls whether or not RegexKitLite provider DTrace probe points are enabled. This feature may be explicitly disabled under all circumstances by setting its value to 0.
  • RKL_FAST_MUTABLE_CHECK (default: Disabled)
    Enables the use of the undocumented, private Core Foundation __CFStringIsMutable() function to determine if the string to be searched is immutable. This can significantly increase the number of matches per second that can be performed on immutable strings since a number of mutation checks can be safely skipped.
  • RKL_FIXED_LENGTH (default: 2048)
    Sets the size of the fixed length UTF-16 conversion cache buffer. Strings that need to be converted to UTF-16 that have a length less than this size will use the fixed length conversion cache. Using a fixed sized buffer for all small strings means less malloc() overhead, less heap fragmentation, and a reduced chance of a memory leak occurring.
  • RKL_METHOD_PREPEND (default: None)
    When set, this preprocessor define causes the RegexKitLite methods defined in RegexKitLite.h to have the value of RKL_METHOD_PREPEND prepended to them. For example, if RKL_METHOD_PREPEND is set to xyz_ (i.e., -DRKL_METHOD_PREPEND=xyz_), it would cause clearStringCache to become xyz_clearStringCache.
  • RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS (default: Automatic)
    This preprocessor define controls whether or not extra code is included that attempts to automatically register with the NSNotificationCenter for the UIApplicationDidReceiveMemoryWarningNotification notification. This feature is automatically enabled if it can be determined at compile time that the iPhone is being targeted. This feature may be explicitly disabled under all circumstances by setting its value to 0.
  • RKL_STACK_LIMIT (default: 131072)
    The maximum amount of stack space that will be used before switching to heap based allocations. This can be useful for multithreading programs where the stack size of secondary threads is much smaller than the main thread.
See Also
  • Assertions and Logging - Using the Assertion Macros
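
To make the RKL_CACHE_SIZE arithmetic concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not RegexKitLite's implementation: the names CACHE_SETS, CACHE_WAYS and the three functions are hypothetical, and choosing a set as "hash modulo number of sets" is an assumption about how a set associative cache is commonly indexed. Under that assumption it shows the default capacity of 52, and why a prime number of sets helps when hashes share a common stride.

```c
#include <stddef.h>

/* Defaults quoted in this section; the names are illustrative only. */
enum { CACHE_SETS = 13, CACHE_WAYS = 4 };

/* Total number of compiled regular expressions the cache can hold. */
static size_t cache_capacity(void) {
    return (size_t)CACHE_SETS * (size_t)CACHE_WAYS;
}

/* ASSUMPTION: the set for an entry is its hash modulo the number of sets. */
static size_t cache_set_for_hash(unsigned long hash, size_t set_count) {
    return (size_t)(hash % set_count);
}

/* Counts how many distinct sets are touched by the hashes
   0, stride, 2*stride, ... (set_count values in total).
   Supports at most 64 sets; returns 0 outside that range. */
static size_t sets_used_by_stride(unsigned long stride, size_t set_count) {
    unsigned char seen[64] = {0};
    size_t used = 0;
    if (set_count == 0 || set_count > 64) {
        return 0;
    }
    for (size_t k = 0; k < set_count; k++) {
        size_t set = cache_set_for_hash((unsigned long)k * stride, set_count);
        if (!seen[set]) {
            seen[set] = 1;
            used++;
        }
    }
    return used;
}
```

With these definitions, hashes that are multiples of 4 land in only 3 of 12 sets, but spread across all 13 sets when the set count is the prime 13.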

Fast Mutable Checks

Setting RKL_FAST_MUTABLE_CHECK allows RegexKitLite to quickly check if a string to search is immutable or not. Every call to RegexKitLite requires checking a string's hash and length values to guard against a string mutating and invalid cached data being used. If the same string is searched repeatedly and it is immutable, these checks aren't necessary since the string can never change while in use. While these checks are fairly quick, they can add approximately 15 to 20 percent of extra overhead, and not performing the checks is always faster.

Since checking a string's mutability requires calling an undocumented, private Core Foundation function, RegexKitLite takes extra precautions and does not use the function directly. Instead, an internal, local stub function is created and called to determine if a string is mutable. The first time this function is called, RegexKitLite uses dlsym() to look up the address of the __CFStringIsMutable() function. If the function is found, RegexKitLite will use it from that point on to determine if a string is immutable. However, if the function is not found, RegexKitLite has no way to determine if a string is mutable or not, so it assumes the worst case that all strings are potentially mutable. This means that the private Core Foundation __CFStringIsMutable() function can go away at any time and RegexKitLite will continue to work, although with slightly less performance.

This feature is disabled by default, but should be fairly safe to enable due to the extra precautions that are taken. If this feature is enabled and the __CFStringIsMutable() function is not found for some reason, RegexKitLite falls back to its default behavior, which is the same as if this feature was not enabled.

iPhone Low Memory Notifications

The RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS preprocessor define controls whether or not extra code is compiled in that automatically registers for the iPhone UIKit UIApplicationDidReceiveMemoryWarningNotification notification. When enabled, an initialization function tagged with __attribute__((constructor)) is executed by the linker at load time which causes RegexKitLite to check if the low memory notification symbol is available. If the symbol is present then RegexKitLite registers to receive the notification. When the notification is received, RegexKitLite will automatically call clearStringCache to flush the caches and return the memory used to hold any cached compiled regular expressions.

This feature is normally automatically enabled if it can be determined at compile time that the iPhone is being targeted. This feature is safe to enable even if the target is Mac OS X for the desktop. It can also be explicitly disabled, even when targeting the iPhone, by setting RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS to 0 (i.e., -DRKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS=0).

See Also
  • Memory Usage Performance Guidelines - Responding to Low-Memory Warnings in iPhone OS

Using RegexKitLite with a Custom ICU Build

The details of building and linking to a custom build of ICU will not be covered here. ICU is a very large and complex library that can be configured and packaged in countless ways. Building and linking your application to a custom build of ICU is non‑trivial. Apple provides the full source to the version of ICU that they supply with Mac OS X. At the time of this writing, the latest version available was for Mac OS X 10.6.2: ICU-400.38.tar.gz.

RegexKitLite provides the RKL_APPEND_TO_ICU_FUNCTIONS pre-processor define if you would like to use RegexKitLite with a custom ICU build that you supply. A custom version of ICU will typically have the ICU version appended to all of its functions, and RKL_APPEND_TO_ICU_FUNCTIONS allows you to append that version to the ICU functions that RegexKitLite calls. For example, passing -DRKL_APPEND_TO_ICU_FUNCTIONS=_4_0 to gcc would cause the ICU function uregex_find() used by RegexKitLite to be called as uregex_find_4_0().
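
The renaming described above relies on ordinary C preprocessor token pasting. The following is a minimal sketch of that technique in plain C, not a copy of RegexKitLite's own macros: ICU_SUFFIX stands in for a value passed with -D on the command line, and every macro and function name here (WITH_ICU_SUFFIX, demo_find, and so on) is hypothetical. No ICU function is actually called; demo_find only plays the role of one.

```c
/* Hypothetical stand-in for passing -DICU_SUFFIX=_4_0 to the compiler. */
#define ICU_SUFFIX _4_0

/* Two macro levels are needed so that ICU_SUFFIX is expanded to _4_0
   before it is pasted or turned into a string. */
#define PASTE_INNER(a, b) a##b
#define PASTE(a, b) PASTE_INNER(a, b)
#define WITH_ICU_SUFFIX(name) PASTE(name, ICU_SUFFIX)
#define STRINGIFY_INNER(x) #x
#define STRINGIFY(x) STRINGIFY_INNER(x)

/* Defines a function whose real name is demo_find_4_0. */
static int WITH_ICU_SUFFIX(demo_find)(void) {
    return 1;
}

/* Call sites go through the same macro, so they resolve to the suffixed name. */
static int call_demo_find(void) {
    return WITH_ICU_SUFFIX(demo_find)();
}

/* The name uregex_find is rewritten to, as a string for inspection. */
static const char *suffixed_uregex_find_name(void) {
    return STRINGIFY(WITH_ICU_SUFFIX(uregex_find));
}
```

Because both the definition and the call sites go through one macro, changing the suffix in a single place (or on the compiler command line) renames every reference consistently. The same pattern applies when a prefix is added instead, as RKL_METHOD_PREPEND does for method names.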

Xcode 3 Integrated Documentation

This documentation is available in the Xcode DocSet format at the following URL:

feed://regexkit.sourceforge.net/RegexKitLiteDocSets.atom

For Xcode < 3.2, select Help > Documentation. Then, in the lower left hand corner of the documentation window, there should be a gear icon with a drop down menu indicator which you should select and choose New Subscription… and enter the DocSet URL.

For Xcode ≥ 3.2, select Xcode > Preferences…. Then select the Documentation preference group, typically the right most group, and press Add Documentation Set Publisher… and enter the DocSet URL.

Once you have added the URL, a new group should appear, inside which will be the RegexKitLite documentation with a Get button. Click on the Get button and follow the prompts.

Note:

Xcode will ask you to enter an administrator's password to install the documentation, which is explained here.

Cached Information and Mutable Strings

While RegexKitLite takes steps to ensure that the information it has cached is valid for the strings it searches, there exists the possibility that out of date cached information may be used when searching mutable strings. For each compiled regular expression, RegexKitLite caches the following information about the last NSString that was searched:

  • The string's length, hash value, and the pointer to the NSString object.
  • The pointer to the UTF-16 buffer that contains the contents of the string, which may be an internal buffer if the string required conversion.
  • The NSRange used for the inRange: parameter for the last search, and the NSRange result for capture 0 of that search.

An ICU compiled regular expression must be "set" to the text to be searched. Before a compiled regular expression is used, the pointer to the string object to search, its hash, length, and the pointer to the UTF-16 buffer are compared with the values that the compiled regular expression was last "set" to. If any of these values are different, the compiled regular expression is reset and "set" to the new string.

If a NSMutableString is mutated between two uses of the same compiled regular expression and its hash, length, or UTF-16 buffer changes between uses, RegexKitLite will automatically reset the compiled regular expression with the new values of the mutated string. The results returned will correctly reflect the mutations that have taken place between searches.

It is possible that the mutations to a string can go undetected, however. If the mutation keeps the length the same, then the only way a change can be detected is if the string's hash value changes. For most mutations the hash value will change, but it is possible for two different strings to share the same hash. This is known as a hash collision. Should this happen, the results returned by RegexKitLite may not be correct.
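
The undetected-mutation case can be illustrated with a small, self-contained sketch in plain C. It is not RegexKitLite's code: CachedStringInfo, toy_hash, snapshot and cache_looks_valid are hypothetical names, and toy_hash is deliberately weak (it just sums the bytes) so that a collision is easy to produce. The real NSString hash is different and collides far less often, but the principle is the same: a mutation that keeps the pointer, the length and the hash unchanged cannot be seen by this kind of check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* What the cache remembers about the last string searched (illustrative). */
typedef struct {
    const char   *pointer;
    size_t        length;
    unsigned long hash;
} CachedStringInfo;

/* Deliberately weak hash: the sum of the bytes, so "ab" and "ba" collide. */
static unsigned long toy_hash(const char *s) {
    unsigned long h = 0;
    while (*s != '\0') {
        h += (unsigned char)*s++;
    }
    return h;
}

/* Records the pointer, length and hash of a string at search time. */
static CachedStringInfo snapshot(const char *s) {
    CachedStringInfo info = { s, strlen(s), toy_hash(s) };
    return info;
}

/* The cache is treated as valid only if all three values are unchanged. */
static bool cache_looks_valid(const CachedStringInfo *info, const char *s) {
    return info->pointer == s
        && info->length == strlen(s)
        && info->hash == toy_hash(s);
}
```

For a buffer holding "ab", a snapshot followed by an in-place change to "ba" still passes cache_looks_valid, even though the contents differ, while a change to "bc" is detected because the hash changes. The first case is the situation in which the cached information has to be cleared by hand.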

Therefore, if you are using RegexKitLite to search NSMutableString objects, and those strings may have mutated in such a way that RegexKitLite is unable to detect that the string has changed, you must manually clear the internal cache to ensure that the results accurately reflect the mutations. To clear the cached information for a specific string you send the instance a flushCachedRegexData message:

NSMutableString *aMutableString; // Assumed to be valid.
[aMutableString flushCachedRegexData];

To clear all of the cached information in RegexKitLite, which includes all the cached compiled regular expressions along with any cached information and UTF-16 conversions for strings that have been searched, you use the following class method:

[NSString clearStringCache];
Warning:

When searching NSMutableString objects that have mutated between searches, failure to clear the cache may result in undefined behavior. Use flushCachedRegexData to selectively clear the cached information about a NSMutableString object.

See Also
  • + clearStringCache
  • - flushCachedRegexData

Block-based Enumeration Methods

The RegexKitLite Block-based enumeration methods are modeled after their NSString counterparts. There are a few differences, however.

  • RegexKitLite does not support mutating a NSMutableString object while it is undergoing Block-based enumeration.
  • There is no support for concurrent enumeration.

While RegexKitLite may not support mutating a NSMutableString during Block-based enumeration, it does provide the means to create a new string from the NSString object returned by the block used to enumerate the matches in a string, and in the case of NSMutableString, to replace the contents of that NSMutableString with the modified string at the end of the enumeration. This functionality is available via the following methods:

  • - stringByReplacingOccurrencesOfRegex:usingBlock: (NSString)
  • - replaceOccurrencesOfRegex:usingBlock: (NSMutableString)

Exception to the Cocoa Memory Management Rules for Block-based Enumeration

The standard Cocoa Memory Management Rules specify that objects returned by a method, or in this case the objects passed to a Block, remain valid throughout the scope of the calling method. Due to the potentially large volume of temporary strings that are created during a Block-based enumeration, RegexKitLite makes an exception to this rule: the strings passed to a Block via capturedStrings[] are valid only until the closing brace of the Block:

[searchString enumerateStringsMatchedByRegex:regex usingBlock:
  ^(NSInteger captureCount,
    NSString * const capturedStrings[captureCount],
    const NSRange capturedRanges[captureCount],
    volatile BOOL * const stop) {
    // Block code.
  } /* <- capturedStrings[] is valid only up to this point. */
];

If you need to refer to a string past the closing brace of the Block, you need to send that string a retain message. Of course, it is not always necessary to explicitly send a capturedStrings[] string a retain message when you need it to exist past the closing brace of a Block. For example, adding a capturedStrings[] string to a NSMutableDictionary will send the string a retain as a side effect of adding it to the dictionary.

Memory management during RegexKitLite Block-based enumeration is conceptually similar to the following pseudo-code:

NSInteger captureCount = [regex captureCount];
while(moreMatches) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
  BOOL stop = NO;
  NSRange capturedRanges[captureCount];
  NSString *capturedStrings[captureCount];

  for(capture = 0L; capture < captureCount; capture++) {
    capturedRanges[capture] = [searchString rangeOfRegex:regex capture:capture];
    capturedStrings[capture] = [searchString stringByMatching:regex capture:capture];
  }

  // The following line represents the execution of the user supplied Block.
  enumerationBlock(captureCount, capturedStrings, capturedRanges, &stop);
  // ... and this represents when the user supplied Block has finished / returned.

  [pool release]; // capturedStrings[] are sent a release at this point.
  if(stop != NO) { break; }
}

While conceptually and behaviorally similar, it is important to note that RegexKitLite does not actually use or create autorelease pools when performing Block-based enumeration. Instead, a CFMutableArray object is used to accumulate the temporary string objects during an iteration, and at the start of an iteration, any previously accumulated temporary string objects are removed from the array.
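
The lifetime rule can also be pictured without any Cocoa machinery. The sketch below, in plain C, is an analogy rather than RegexKitLite's implementation: enumerate_tokens, token_block, Keeper and keep_longest are hypothetical names, and it splits on spaces instead of matching a regular expression. What it shares with the behavior described above is that the string handed to the callback lives in storage that is reused on the next iteration, so a callback that wants to keep a string must take its own copy (the counterpart of sending retain).

```c
#include <stdlib.h>
#include <string.h>

/* Callback type: token is valid only until the callback returns. */
typedef void (*token_block)(const char *token, void *context);

/* Calls block once per space-separated token. The scratch buffer is
   overwritten on every iteration; tokens longer than 63 bytes are
   delivered in pieces. */
static void enumerate_tokens(const char *text, token_block block, void *context) {
    char scratch[64];
    while (*text != '\0') {
        while (*text == ' ') {
            text++;
        }
        size_t n = 0;
        while (text[n] != '\0' && text[n] != ' ' && n < sizeof scratch - 1) {
            n++;
        }
        if (n == 0) {
            break;
        }
        memcpy(scratch, text, n);
        scratch[n] = '\0';
        block(scratch, context);   /* scratch is reused after this returns */
        text += n;
    }
}

/* Example context: remembers the longest token by copying it. */
typedef struct {
    char *kept;   /* heap copy owned by the caller, or NULL */
    int   count;  /* number of tokens seen */
} Keeper;

static void keep_longest(const char *token, void *context) {
    Keeper *keeper = context;
    keeper->count++;
    if (keeper->kept == NULL || strlen(token) > strlen(keeper->kept)) {
        free(keeper->kept);
        keeper->kept = malloc(strlen(token) + 1);
        if (keeper->kept != NULL) {
            strcpy(keeper->kept, token);   /* copy, do not store the pointer */
        }
    }
}
```

Storing the token pointer itself in Keeper would leave it pointing at the scratch buffer, whose contents change on the next iteration and disappear when enumerate_tokens returns. Copying inside the callback avoids that, at the cost of one allocation per kept string.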

See Also
  • Blocks Programming Topics
  • Regular Expression Enumeration Options

Usage Notes

Convenience Methods

For convenience methods where an argument is not present, the default value used is given below.

Argument Default Value
capture: 0
options: RKLNoOptions
range: The entire range of the receiver.
enumerationOptions: RKLRegexEnumerationNoOptions

Exceptions Raised

Methods will raise an exception if their arguments are invalid, such as passing NULL for a required parameter. An invalid regular expression or RKLRegexOptions parameter will not raise an exception. Instead, a NSError object with information about the error will be created and returned via the address given with the optional error argument. If information about the problem is not required, error may be NULL. For convenience methods that do not have an error argument, the primary method is invoked with NULL passed as the argument for error.

Important:
Methods raise NSInvalidArgumentException if regex is NULL, or if capture < 0 or is not valid for regex.
Important:
Methods raise NSRangeException if range exceeds the bounds of the receiver.
Important:
Methods raise NSRangeException if the receiver's length exceeds the maximum value that can be represented by a signed 32-bit integer, even on 64-bit architectures.
Important:
Search and replace methods raise RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
See Also
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys

Tasks

DTrace Probe Points
  • RegexKitLite:::compiledRegexCache
  • RegexKitLite:::utf16ConversionCache
Clearing Cached Information
  • + clearStringCache
  • - flushCachedRegexData
Determining the Number of Captures
  • + captureCountForRegex: Deprecated in RegexKitLite 3.0
  • + captureCountForRegex:options:error: Deprecated in RegexKitLite 3.0
  • - captureCount
  • - captureCountWithOptions:error:
Finding all Captures of all Matches
  • - arrayOfCaptureComponentsMatchedByRegex:
  • - arrayOfCaptureComponentsMatchedByRegex:range:
  • - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
Getting all the Captures of a Match
  • - captureComponentsMatchedByRegex:
  • - captureComponentsMatchedByRegex:range:
  • - captureComponentsMatchedByRegex:options:range:error:
Finding all Matches
  • - componentsMatchedByRegex:
  • - componentsMatchedByRegex:capture:
  • - componentsMatchedByRegex:range:
  • - componentsMatchedByRegex:options:range:capture:error:
  • - enumerateStringsMatchedByRegex:usingBlock:
  • - enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Dividing Strings
  • - componentsSeparatedByRegex:
  • - componentsSeparatedByRegex:range:
  • - componentsSeparatedByRegex:options:range:error:
  • - enumerateStringsSeparatedByRegex:usingBlock:
  • - enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Identifying Matches
  • - isMatchedByRegex:
  • - isMatchedByRegex:inRange:
  • - isMatchedByRegex:options:inRange:error:
Determining if a Regular Expression is Valid
  • - isRegexValid
  • - isRegexValidWithOptions:error:
Determining the Range of a Match
  • - rangeOfRegex:
  • - rangeOfRegex:capture:
  • - rangeOfRegex:inRange:
  • - rangeOfRegex:options:inRange:capture:error:
Modifying Mutable Strings
  • - replaceOccurrencesOfRegex:withString:
  • - replaceOccurrencesOfRegex:withString:range:
  • - replaceOccurrencesOfRegex:withString:options:range:error:
  • - replaceOccurrencesOfRegex:usingBlock:
  • - replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
Creating Temporary Strings from a Match
  • - stringByMatching:
  • - stringByMatching:capture:
  • - stringByMatching:inRange:
  • - stringByMatching:options:inRange:capture:error:
Replacing Substrings
  • - stringByReplacingOccurrencesOfRegex:withString:
  • - stringByReplacingOccurrencesOfRegex:withString:range:
  • - stringByReplacingOccurrencesOfRegex:withString:options:range:error:
  • - stringByReplacingOccurrencesOfRegex:usingBlock:
  • - stringByReplacingOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
Creating a Dictionary from a Match
  • - dictionaryByMatchingRegex:withKeysAndCaptures:
  • - dictionaryByMatchingRegex:range:withKeysAndCaptures:
  • - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
Creating a Dictionary from every Match
  • - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
  • - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
  • - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:

DTrace Probe Points

RegexKitLite:::compiledRegexCache
This probe point fires each time the compiled regular expression cache is accessed.
RegexKitLite:::compiledRegexCache(unsigned long eventID, const char *regexUTF8, int options, int captures, int hitMiss, int icuStatusCode, const char *icuErrorMessage, double *hitRate);
Arguments
  • arg0, eventID
    The unique ID for this mutex lock acquisition.
  • arg1, regexUTF8
    Up to 64 characters of the regular expression encoded in UTF-8. Must be copied with copyinstr(arg1).
  • arg2, options
    The RKLRegexOptions options used.
  • arg3, captures
    The number of captures present in the regular expression, or -1 if there was an error.
  • arg4, hitMiss
    A boolean value that indicates whether or not this event was a cache hit, or -1 if there was an error.
  • arg5, icuStatusCode
    If an error occurs, this contains the error number returned by ICU.
  • arg6, icuErrorMessage
    If an error occurs, this contains a UTF-8 encoded string of the ICU error. Must be copied with copyinstr(arg6).
  • arg7, hitRate
    A pointer to a floating point value, between 0.0 and 100.0, that represents the effectiveness of the cache. Higher is better. Must be copied with copyin(arg7, sizeof(double)).
Discussion

An example of how to copy the double value pointed to by hitRate:

RegexKitLite*:::compiledRegexCache {
  this->hitRate = (double *)copyin(arg7, sizeof(double));
  printf("compiledRegexCache hitRate: %6.2f%%\n", this->hitRate);
}
See Also
  • RegexKitLite:::utf16ConversionCache
  • Using RegexKitLite - DTrace
  • Solaris Dynamic Tracing Guide (as .PDF)
RegexKitLite:::utf16ConversionCache
This probe point fires each time the UTF-16 conversion cache is accessed.
RegexKitLite:::utf16ConversionCache(unsigned long eventID, unsigned int lookupResultFlags, double *hitRate, const void *string, unsigned long NSRange_location, unsigned long NSRange_length, long length);
Arguments
  • arg0, eventID
    The unique ID for this mutex lock acquisition.
  • arg1, lookupResultFlags
    A set of status flags about the result of the conversion cache lookup.
  • arg2, hitRate
    A pointer to a floating point value, between 0.0 and 100.0, that represents the effectiveness of the cache. Higher is better. Must be copied with copyin(arg2, sizeof(double)).
  • arg3, string
    A pointer to the NSString that this UTF-16 conversion cache check is being performed on.
  • arg4, NSRange_location
    The location value of the range argument from the invoking RegexKitLite method.
  • arg5, NSRange_length
    The length value of the range argument from the invoking RegexKitLite method.
  • arg6, length
    The length of the string.
Discussion

Only strings that require a UTF-16 conversion count towards the value calculated for hitRate.

An example of how to copy the double value pointed to by hitRate:

RegexKitLite*:::utf16ConversionCache {
  this->hitRate = (double *)copyin(arg2, sizeof(double));
  printf("utf16ConversionCache hitRate: %6.2f%%\n", this->hitRate);
}
See Also
  • RegexKitLite:::compiledRegexCache
  • RegexKitLite:::utf16ConversionCache arg1 Flags
  • Using RegexKitLite - DTrace
  • Solaris Dynamic Tracing Guide (as .PDF)

Class Methods

captureCountForRegex:
Returns the number of captures that regex contains. Deprecated in RegexKitLite 3.0. Use captureCount instead.
+ (NSInteger)captureCountForRegex:(NSString *)regex;
Discussion

Since the capture count of a regular expression does not depend on the string to be searched, this is a NSString class method. For example:

NSInteger regexCaptureCount = [NSString captureCountForRegex:@"(\\d+)\\.(\\d+)"];
// regexCaptureCount would be set to 2.
Availability

Deprecated in RegexKitLite 3.0

Return Value

Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.

See Also
  • + captureCountForRegex:options:error: Deprecated in RegexKitLite 3.0
  • - captureCount
  • - captureCountWithOptions:error:
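
The return convention described above (a count, 0 for no captures, -1 on error, with an optional error out-parameter in the longer variant) can be sketched in plain C. This is a deliberately simplified illustration, not how RegexKitLite or ICU count captures: count_captures and the CAPTURE_* codes are hypothetical, and the scan only understands backslash escapes, parentheses, and the "(?" prefix, which it treats as non-capturing. It ignores character classes such as [(] and other syntax that a real regular expression compiler handles.

```c
#include <stddef.h>

/* Hypothetical error codes written through the optional error pointer. */
enum { CAPTURE_OK = 0, CAPTURE_UNBALANCED = 1 };

/* Returns the number of capture groups in regex, 0 if there are none,
   or -1 if the parentheses are unbalanced. error may be NULL when the
   caller does not need details about a failure. */
static int count_captures(const char *regex, int *error) {
    int depth = 0;
    int captures = 0;

    if (error != NULL) {
        *error = CAPTURE_OK;
    }
    for (size_t i = 0; regex[i] != '\0'; i++) {
        char c = regex[i];
        if (c == '\\') {
            if (regex[i + 1] != '\0') {
                i++;               /* skip the escaped character */
            }
        } else if (c == '(') {
            depth++;
            if (regex[i + 1] != '?') {
                captures++;        /* "(?" groups are treated as non-capturing */
            }
        } else if (c == ')') {
            depth--;
            if (depth < 0) {
                break;             /* a ")" without a matching "(" */
            }
        }
    }
    if (depth != 0) {
        if (error != NULL) {
            *error = CAPTURE_UNBALANCED;
        }
        return -1;
    }
    return captures;
}
```

Called with the C string "(\\d+)\\.(\\d+)" this returns 2, matching the example in the discussion. Passing NULL for error mirrors the convenience form, where details about a failure are simply not requested.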
captureCountForRegex:options:error:
Returns the number of captures that regex contains. Deprecated in RegexKitLite 3.0. Use captureCountWithOptions:error: instead.
+ (NSInteger)captureCountForRegex:(NSString *)regex options:(RKLRegexOptions)options error:(NSError **)error;
Discussion

The optional error parameter, if set and an error occurs, will contain a NSError object that describes the problem. This may be set to NULL if information about any errors is not required.

Since the capture count of a regular expression does not depend on the string to be searched, this is a NSString class method. For example:

NSInteger regexCaptureCount = [NSString captureCountForRegex:@"(\\d+)\\.(\\d+)" options:RKLNoOptions error:NULL];
// regexCaptureCount would be set to 2.
Availability

Deprecated in RegexKitLite 3.0

Return Value

Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.

See Also
  • + captureCountForRegex: Deprecated in RegexKitLite 3.0
  • - captureCount
  • - captureCountWithOptions:error:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
clearStringCache
Clears the cached information about strings and cached compiled regular expressions.
+ (void)clearStringCache;
Discussion

This method clears all the cached state maintained by RegexKitLite. This includes all the cached compiled regular expressions and any cached UTF-16 conversions.

An example of clearing the cache:

[NSString clearStringCache]; // Clears all RegexKitLite cache state.
Warning:

When searching NSMutableString objects that have mutated between searches, failure to clear the cache may result in undefined behavior. Use flushCachedRegexData to selectively clear the cached information about a NSMutableString object.

Note:

You do not need to call clearStringCache or flushCachedRegexData when using the NSMutableString replaceOccurrencesOfRegex:withString: methods. The cache entry for that regular expression and NSMutableString is automatically cleared as necessary.

Availability

Available in RegexKitLite 1.1 and later.

See Also
  • - flushCachedRegexData
  • NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings

Instance Methods

arrayOfCaptureComponentsMatchedByRegex:
Returns an array containing all the matches from the receiver that were matched by the regular expression  regex. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex;
Return Value

NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture group.

Discussion

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - arrayOfCaptureComponentsMatchedByRegex:range:
  • - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
arrayOfCaptureComponentsMatchedByRegex:range:
Returns an array containing all the matches from the receiver that were matched by the regular expression  regex within  range. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex  range:(NSRange)range;
Return Value

NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture group.

Discussion

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - arrayOfCaptureComponentsMatchedByRegex:
  • - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
arrayOfCaptureComponentsMatchedByRegex:options:range:error:
Returns an array containing all the matches from the receiver that were matched by the regular expression  regex within  range using  options. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  error:(NSError **)error;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
Return Value

NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture group.

Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

The match results in the array appear in the order they did in the receiver. For example, this code fragment:

NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list arrayOfCaptureComponentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];

produces an NSArray equivalent to:

[NSArray arrayWithObjects:
  [NSArray arrayWithObjects:@"$10.23", @"10.23", @"10", @"23", NULL],       // Index 0
  [NSArray arrayWithObjects:@"$1024.42", @"1024.42", @"1024", @"42", NULL], // Index 1
  [NSArray arrayWithObjects:@"$3099", @"3099", @"3099", @"", NULL],         // Index 2
  NULL];
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - arrayOfCaptureComponentsMatchedByRegex:
  • - arrayOfCaptureComponentsMatchedByRegex:range:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSDictionary containing that match's substrings constructed from the specified set of keys and captures.
Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text.

Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
  • - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex within range. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex range:(NSRange)range withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • range
    The range of the receiver to search.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSDictionary containing that match's substrings constructed from the specified set of keys and captures.
Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text.

Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
  • - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex within range using options. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSArray object containing all the matches of regex in the receiver. Each match result consists of an NSDictionary containing that match's substrings constructed from the specified set of keys and captures.
Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text. It is important to note that a regular expression can successfully match zero characters:

NSString *name = @"Name: Bob\n"
                 @"Name: John Smith";
NSString *regex = @"(?m)^Name:\\s*(\\w*)\\s*(\\w*)$";
NSArray *nameArray = [name arrayOfDictionariesByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// 2010-04-16 01:15:30.061 RegexKitLite[42984:a0f] nameArray: (
//     { first = Bob, last = "" },
//     { first = John, last = Smith }
// )

Compared to this example, where the second capture group does not match any characters:

NSString *name = @"Name: Bob\n"
                 @"Name: John Smith";
NSString *regex = @"(?m)^Name:\\s*(\\w*)(?:\\s*|\\s+(\\w+))$";
NSArray *nameArray = [name arrayOfDictionariesByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// 2010-04-16 01:15:30.061 RegexKitLite[42984:a0f] nameArray: (
//     { first = Bob },
//     { first = John, last = Smith }
// )
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
  • - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
captureComponentsMatchedByRegex:
Returns an array containing the substrings matched by each capture group present in  regex for the first match of  regex in the receiver.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex;
Return Value
NSArray containing the substrings matched by each capture group present in  regex for the first match of  regex in the receiver. Array index  0 represents all of the text matched by  regex and subsequent array indexes contain the text matched by their respective capture group.
Discussion

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - captureComponentsMatchedByRegex:range:
  • - captureComponentsMatchedByRegex:options:range:error:
captureComponentsMatchedByRegex:range:
Returns an array containing the substrings matched by each capture group present in  regex for the first match of  regex within  range of the receiver.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex  range:(NSRange)range;
Return Value
NSArray containing the substrings matched by each capture group present in  regex for the first match of  regex within  range of the receiver. Array index  0 represents all of the text matched by  regex and subsequent array indexes contain the text matched by their respective capture group.
Discussion

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - captureComponentsMatchedByRegex:
  • - captureComponentsMatchedByRegex:options:range:error:
captureComponentsMatchedByRegex:options:range:error:
Returns an array containing the substrings matched by each capture group present in  regex for the first match of  regex within  range of the receiver using  options.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  error:(NSError **)error;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
Return Value
NSArray containing the substrings matched by each capture group present in  regex for the first match of  regex within  range of the receiver using  options. Array index  0 represents all of the text matched by  regex and subsequent array indexes contain the text matched by their respective capture group.
Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

A match result array index will contain an empty string, or @"", if a capture group did not match any text.

The returned value is for the first match of regex in the receiver. For example, this code fragment:

NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list captureComponentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];

produces an NSArray equivalent to:

[NSArray arrayWithObjects: @"$10.23", @"10.23", @"10", @"23", NULL];
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - captureComponentsMatchedByRegex:
  • - captureComponentsMatchedByRegex:range:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
captureCount
Returns the number of captures that  regex contains.
- (NSInteger)captureCount;
Discussion

Returns the capture count of the receiver, which should be a valid regular expression. For example:

NSInteger regexCaptureCount = [@"(\\d+)\\.(\\d+)" captureCount];
// regexCaptureCount would be set to 2.
Return Value

Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - captureCountWithOptions:error:
captureCountWithOptions:error:
Returns the number of captures that  regex contains.
- (NSInteger)captureCountWithOptions:(RKLRegexOptions)options  error:(NSError **)error;
Discussion

The optional error parameter, if set and an error occurs, will contain a NSError object that describes the problem. This may be set to NULL if information about any errors is not required.

Returns the capture count of the receiver, which should be a valid regular expression. For example:

NSInteger regexCaptureCount = [@"(\\d+)\\.(\\d+)" captureCountWithOptions:RKLNoOptions error:NULL];
// regexCaptureCount would be set to 2.
Return Value

Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.
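As a minimal sketch of the error path described above, the following checks for the -1 return value and logs the NSError. It assumes RegexKitLite.h is on the include path. The helper name CaptureCountOrLog is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: returns the capture count of pattern, or -1 after
// logging the error if pattern is not a valid regular expression.
static NSInteger CaptureCountOrLog(NSString *pattern) {
    NSError *error = nil;
    NSInteger count = [pattern captureCountWithOptions:RKLNoOptions error:&error];
    if (count == -1) {
        // error describes why the regular expression could not be used.
        NSLog(@"Invalid regular expression '%@': %@", pattern, error);
    }
    return count;
}
```

Passing a pattern with an unbalanced parenthesis, such as @"(\\d+", is one way to exercise the error branch.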

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - captureCount
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
componentsMatchedByRegex:
Returns an array containing all the substrings from the receiver that were matched by the regular expression  regex.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex;
Return Value
NSArray object containing all the substrings from the receiver that were matched by  regex.
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - componentsMatchedByRegex:capture:
  • - componentsMatchedByRegex:range:
  • - componentsMatchedByRegex:options:range:capture:error:
componentsMatchedByRegex:capture:
Returns an array containing all the substrings from the receiver that were matched by capture number  capture from the regular expression  regex.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex  capture:(NSInteger)capture;
Return Value
NSArray object containing all the substrings for capture group  capture from the receiver that were matched by  regex.
Discussion

An array index will contain an empty string, or @"", if the capture group did not match any text.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - componentsMatchedByRegex:
  • - componentsMatchedByRegex:range:
  • - componentsMatchedByRegex:options:range:capture:error:
componentsMatchedByRegex:range:
Returns an array containing all the substrings from the receiver that were matched by the regular expression  regex within  range.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex  range:(NSRange)range;
Return Value
NSArray object containing all the substrings from the receiver that were matched by  regex within  range.
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - componentsMatchedByRegex:
  • - componentsMatchedByRegex:capture:
  • - componentsMatchedByRegex:options:range:capture:error:
componentsMatchedByRegex:options:range:capture:error:
Returns an array containing all the substrings from the receiver that were matched by capture number  capture from the regular expression  regex within  range using  options.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  capture:(NSInteger)capture  error:(NSError **)error;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • capture
    The string matched by  capture from  regex to return. Use  0 for the entire string that  regex matched.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
Return Value
NSArray object containing all the substrings from the receiver that were matched by capture number  capture from  regex within  range using  options.
Discussion

If the receiver is not matched by regex then the returned value is a NSArray that contains no items.

An array index will contain an empty string, or @"", if a capture group did not match any text.

The match results in the array appear in the order they did in the receiver.

Example:

NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list componentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];
// listItems == [NSArray arrayWithObjects:@"$10.23", @"$1024.42", @"$3099", NULL];

Example of extracting a specific capture group:

NSString *list = @"$10.23, $1024.42, $3099";
NSRange listRange = NSMakeRange(0UL, [list length]);
NSArray *listItems = [list componentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))" options:RKLNoOptions range:listRange capture:3L error:NULL];
// listItems == [NSArray arrayWithObjects:@"23", @"42", @"", NULL];
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - componentsMatchedByRegex:
  • - componentsMatchedByRegex:capture:
  • - componentsMatchedByRegex:range:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
componentsSeparatedByRegex:
Returns an array containing substrings from the receiver that have been divided by the regular expression  regex.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex;
Return Value
NSArray object containing the substrings from the receiver that have been divided by  regex.
Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - componentsSeparatedByRegex:range:
  • - componentsSeparatedByRegex:options:range:error:
componentsSeparatedByRegex:range:
Returns an array containing substrings within  range of the receiver that have been divided by the regular expression  regex.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex  range:(NSRange)range;
Return Value
NSArray object containing the substrings from the receiver that have been divided by  regex.
Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - componentsSeparatedByRegex:
  • - componentsSeparatedByRegex:options:range:error:
componentsSeparatedByRegex:options:range:error:
Returns an array containing substrings within  range of the receiver that have been divided by the regular expression  regex using  options.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  range:(NSRange)range  error:(NSError **)error;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
Return Value
NSArray object containing the substrings from the receiver that have been divided by  regex.
Discussion

The substrings in the array appear in the order they did in the receiver. For example, this code fragment:

NSString *list = @"Norman, Stanley, Fletcher";
NSArray *listItems = [list componentsSeparatedByRegex:@",\\s*"];

produces an array { @"Norman", @"Stanley", @"Fletcher" }.

If the receiver begins or ends with regex, then the first or last substring is, respectively, empty. For example, the string ", Norman, Stanley, Fletcher" creates an array that has these contents: { @"", @"Norman", @"Stanley", @"Fletcher" }.

If the receiver has no separators that are matched by regex (for example, "Norman"), the array contains the string itself, in this case { @"Norman" }.

If regex contains capture groups (for example, @",(\\s*)"), the array will contain the text matched by each capture group as a separate element appended to the normal result. An additional element will be created for each capture group. If an individual capture group does not match any text, the result in the array will be a zero-length string, @"". As an example, the regular expression @",(\\s*)" would produce the array { @"Norman", @" ", @"Stanley", @" ", @"Fletcher" }.
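The difference between a separator with and without a capture group can be sketched as follows, reusing the patterns and the sample string from the discussion above. It assumes RegexKitLite.h is on the include path. The helper name SplitList is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: splits list on separatorRegex. When the separator
// contains capture groups, the captured text is returned as extra elements
// between the ordinary substrings.
static NSArray *SplitList(NSString *list, NSString *separatorRegex) {
    return [list componentsSeparatedByRegex:separatorRegex];
}

static void SplitListExample(void) {
    NSString *list = @"Norman, Stanley, Fletcher";
    NSArray *plain = SplitList(list, @",\\s*");           // no capture group
    NSArray *withCaptures = SplitList(list, @",(\\s*)");  // captures the whitespace
    NSLog(@"plain: %@\nwith captures: %@", plain, withCaptures);
}
```

Keeping the captured separators is useful when the original string has to be reassembled later; otherwise the plain form yields a simpler array.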

Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - componentsSeparatedByRegex:
  • - componentsSeparatedByRegex:range:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
dictionaryByMatchingRegex:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex in the receiver.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSDictionary containing the matched substrings constructed from the specified set of  keys and  captures.
Discussion

The returned value is for the first match of regex in the receiver.

If the receiver is not matched by regex then the returned value is a NSDictionary that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text.

Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - dictionaryByMatchingRegex:range:withKeysAndCaptures:
  • - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
dictionaryByMatchingRegex:range:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex within range of the receiver.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex range:(NSRange)range withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • range
    The range of the receiver to search.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSDictionary containing the matched substrings constructed from the specified set of  keys and  captures.
Discussion

The returned value is for the first match of regex in the receiver.

If the receiver is not matched by regex then the returned value is a NSDictionary that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text.

Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - dictionaryByMatchingRegex:withKeysAndCaptures:
  • - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex within range of the receiver using options.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error withKeysAndCaptures:(id)firstKey, ...;
Parameters
  • regex
    NSString containing a regular expression.
  • options
    A mask of options specified by combining  RKLRegexOptions flags with the C bitwise OR operator. Either  0 or  RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that if set and an error occurs, will contain a  NSError object that describes the problem. This may be set to  NULL if information about any errors is not required.
  • firstKey
    The first key to add to the new dictionary.
  • ...
    First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
    Important:

    Use of non-int sized capture arguments will result in undefined behavior. Do not append an L suffix to capture arguments.

    Important:

    Failure to NULL-terminate the keys and captures list will result in undefined behavior.

Return Value
NSDictionary containing the matched substrings constructed from the specified set of  keys and  captures.
Discussion

The returned value is for the first match of regex in the receiver.

If the receiver is not matched by regex then the returned value is a NSDictionary that contains no items.

A dictionary will not contain a given key if its corresponding capture group did not match any text. It is important to note that a regular expression can successfully match zero characters:

NSString *name = @"Name: Joe";
NSString *regex = @"Name:\\s*(\\w*)\\s*(\\w*)";
NSDictionary *nameDictionary = [name dictionaryByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// 2010-01-29 12:19:54.559 RegexKitLite[64944:a0f] nameDictionary: {
//     first = Joe;
//     last = "";
// }

Compared to this example, where the second capture group does not match any characters:

NSString *name = @"Name: Joe";
NSString *regex = @"Name:\\s*(\\w*)\\s*(\\w+)?";
NSDictionary *nameDictionary = [name dictionaryByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// 2010-01-29 12:12:52.177 RegexKitLite[64893:a0f] nameDictionary: {
//     first = Joe;
// }
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - dictionaryByMatchingRegex:withKeysAndCaptures:
  • - dictionaryByMatchingRegex:range:withKeysAndCaptures:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
enumerateStringsMatchedByRegex:usingBlock:
Enumerates the matches in the receiver by the regular expression  regex and executes  block for each match found.
- (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount,
                         NSString * const capturedStrings[captureCount],
                         const NSRange capturedRanges[captureCount],
                         volatile BOOL * const stop))block;
Parameters
  • regex
    A  NSString containing a regular expression.
  • block
    The block that is executed for each match of  regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that  regex captured.  captureCount is always at least  1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in  regex. The size of the array is  captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to  @"". This argument may be  NULL if  enumerationOptions had  RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in  regex. The size of the array is  captureCount. If a capture group did not match anything, it will contain a  NSRange equal to  {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;, otherwise it should not touch *stop.
Return Value
Returns  YES if there was no error, otherwise returns  NO.
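A minimal sketch of the block-based form follows. It assumes RegexKitLite.h is on the include path, that blocks support is available in the toolchain, and that the file is compiled without ARC. The helper name FirstTwoDollarAmounts and the pattern are hypothetical and chosen only for illustration.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: collects the whole-dollar part of the first two
// dollar amounts in text, then stops the enumeration early.
static NSArray *FirstTwoDollarAmounts(NSString *text) {
    NSMutableArray *amounts = [NSMutableArray array];
    BOOL ok = [text enumerateStringsMatchedByRegex:@"\\$(\\d+)(?:\\.(\\d+))?"
                                        usingBlock:^(NSInteger captureCount,
                                                     NSString * const capturedStrings[captureCount],
                                                     const NSRange capturedRanges[captureCount],
                                                     volatile BOOL * const stop) {
        // Index 0 is the whole match; index 1 is the first capture group.
        [amounts addObject:capturedStrings[1]];
        if ([amounts count] == 2) {
            *stop = YES; // no further matches are delivered to the block
        }
    }];
    return ok ? amounts : nil; // nil signals that the enumeration reported an error
}
```

The block only touches *stop when it wants to end the enumeration, as the parameter description requires.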
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the matches in the receiver by the regular expression regex within range using options and executes block using enumerationOptions for each match found.
- (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount,
                          NSString * const capturedStrings[captureCount],
                          const NSRange capturedRanges[captureCount],
                          volatile BOOL * const stop))block;
Parameters
  • regex
    An NSString containing a regular expression.
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
  • enumerationOptions
    A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
  • block
    The block that is executed for each match of regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that regex captured. captureCount is always at least 1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;. Otherwise the block should not touch *stop.
Return Value
Returns YES if there was no error, otherwise returns NO and indirectly returns an NSError object if error is not NULL.
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - enumerateStringsMatchedByRegex:usingBlock:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
  • Regular Expression Enumeration Options
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
enumerateStringsSeparatedByRegex:usingBlock:
Enumerates the strings of the receiver that have been divided by the regular expression regex and executes block for each divided string.
- (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount,
                          NSString * const capturedStrings[captureCount],
                          const NSRange capturedRanges[captureCount],
                          volatile BOOL * const stop))block;
Parameters
  • regex
    An NSString containing a regular expression.
  • block
    The block that is executed for each match of regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that regex captured. captureCount is always at least 1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;. Otherwise the block should not touch *stop.
Return Value
Returns  YES if there was no error, otherwise returns  NO.
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
  • - componentsSeparatedByRegex:
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the strings of the receiver that have been divided by the regular expression regex within range using options and executes block using enumerationOptions for each divided string.
- (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount,
                          NSString * const capturedStrings[captureCount],
                          const NSRange capturedRanges[captureCount],
                          volatile BOOL * const stop))block;
Parameters
  • regex
    An NSString containing a regular expression.
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
  • enumerationOptions
    A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
  • block
    The block that is executed for each match of regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that regex captured. captureCount is always at least 1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;. Otherwise the block should not touch *stop.
Return Value
Returns YES if there was no error, otherwise returns NO and indirectly returns an NSError object if error is not NULL.
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - enumerateStringsSeparatedByRegex:usingBlock:
  • - componentsSeparatedByRegex:options:range:error:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
  • Regular Expression Enumeration Options
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
flushCachedRegexData
Clears any cached information about the receiver.
- (void)flushCachedRegexData;
Discussion

This method should be used when performing searches on NSMutableString objects and there is the possibility that the string has mutated in between calls to RegexKitLite.

This method clears the cached information for the receiver only. This is more selective than clearStringCache, which clears all the cached information from RegexKitLite, including all the cached compiled regular expressions.

RegexKitLite automatically detects the vast majority of string mutations and clears any cached information for the mutated string. To detect mutations, RegexKitLite records a string's length and hash value at the point in time when it caches data for a string. Cached data for a string is invalidated if either of these values changes between calls to RegexKitLite. The problem case is when a string is mutated but the string's length remains the same and the hash value for the mutated string is identical to the hash value of the string before it was mutated. This is known as a hash collision. Since RegexKitLite is unable to detect that a string has mutated when this happens, the programmer needs to explicitly inform RegexKitLite that any cached data about the receiver needs to be cleared by sending flushCachedRegexData to the mutated string.

While it is possible to have "perfect mutation detection", and therefore guarantee that only valid cached data is used, it has a significant performance penalty. The first problem is that when caching information about a string, an immutable copy of that string needs to be made. The second problem is that determining that two strings are not identical is usually very fast and cheap: if their lengths are not the same, no further checks are required. The most expensive case is when two strings are identical, because it requires a character-by-character comparison of the entire string to guarantee that they are equal. The most expensive case also happens to be the most common case, by far. To make matters worse, Cocoa provides no public way to determine if an instance is a mutable NSMutableString or an immutable NSString object. Therefore RegexKitLite must assume the worst case: that all strings are mutable and have potentially mutated between calls to RegexKitLite.

RegexKitLite is optimized for the common case, which is when regular expression operations are performed on strings that are not mutating. The majority of mutations to a string can be quickly and cheaply detected by RegexKitLite automatically. Since the programmer has the context of the string that is to be matched, and whether or not the string is being mutated, RegexKitLite relies on the programmer to inform it whether or not the possibility exists that the string could have mutated in a way that is undetectable.
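The length-and-hash check described above can be sketched conceptually as follows. This is not RegexKitLite's actual implementation, only an illustration of the idea using plain Foundation calls; StringFingerprint, FingerprintOfString, and FingerprintStillMatches are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical snapshot of the two values the text says are recorded
// when data for a string is cached.
typedef struct {
    NSUInteger length;
    NSUInteger hash;
} StringFingerprint;

static StringFingerprint FingerprintOfString(NSString *string) {
    StringFingerprint fingerprint = { [string length], [string hash] };
    return fingerprint;
}

// Returns YES when neither recorded value has changed. A YES answer does
// not prove the characters are identical: a same-length mutation whose
// hash collides with the old hash would also return YES, which is the
// undetectable case that flushCachedRegexData exists for.
static BOOL FingerprintStillMatches(StringFingerprint cached, NSString *string) {
    return cached.length == [string length] && cached.hash == [string hash];
}
```

The comparison costs two integer checks regardless of string size, which is the trade-off the discussion describes: cheap detection of most mutations in exchange for a rare blind spot.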

An example of clearing a string's cached information:

NSMutableString *mutableSearchString; // Assumed to be valid.
NSString *foundString = [mutableSearchString stringByMatching:@"\\d+"]; // Searched..
[mutableSearchString replaceCharactersInRange:NSMakeRange(5UL, 10UL) withString:@"[replaced]"]; // Mutated..
[mutableSearchString flushCachedRegexData]; // Clear cached information about mutableSearchString.
Warning:

Failure to clear the cached information for an NSMutableString object that has mutated between searches may result in undefined behavior.

Note:

You do not need to call clearStringCache or flushCachedRegexData when using the NSMutableString replaceOccurrencesOfRegex:withString: methods. The cached information for that NSMutableString is automatically cleared as necessary.

Availability

Available in RegexKitLite 3.0 and later.

See Also
  • + clearStringCache
  • NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings
isMatchedByRegex:
Returns a Boolean value that indicates whether the receiver is matched by  regex.
- (BOOL)isMatchedByRegex:(NSString *)regex;
Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - isMatchedByRegex:inRange:
  • - isMatchedByRegex:options:inRange:error:
isMatchedByRegex:inRange:
Returns a Boolean value that indicates whether the receiver is matched by  regex within  range.
- (BOOL)isMatchedByRegex:(NSString *)regex inRange:(NSRange)range;
Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - isMatchedByRegex:
  • - isMatchedByRegex:options:inRange:error:
isMatchedByRegex:options:inRange:error:
Returns a Boolean value that indicates whether the receiver is matched by  regex within  range.
- (BOOL)isMatchedByRegex:(NSString *)regex  options:(RKLRegexOptions)options  inRange:(NSRange)range  error:(NSError **)error;
Discussion

The optional error parameter, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
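A short sketch of the full variant, combining an option flag with an explicit range and an error out-parameter. It assumes RegexKitLite.h is available in the project; ContainsTodo is a hypothetical helper name, and RKLCaseless is the case-insensitive flag from RKLRegexOptions.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: YES if `text` contains the whole word "todo"
// in any letter case. `outError` may be NULL if details are not needed.
static BOOL ContainsTodo(NSString *text, NSError **outError) {
    return [text isMatchedByRegex:@"\\btodo\\b"
                          options:RKLCaseless
                          inRange:NSMakeRange(0UL, [text length])
                            error:outError];
}
```

Because of the \b word boundaries, a string such as @"// TODO: fix" matches while @"mastodon" does not.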

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - isMatchedByRegex:
  • - isMatchedByRegex:inRange:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
isRegexValid
Returns a Boolean value that indicates whether the regular expression contained in the receiver is valid.
- (BOOL)isRegexValid;
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - isRegexValidWithOptions:error:
isRegexValidWithOptions:error:
Returns a Boolean value that indicates whether the regular expression contained in the receiver is valid using  options.
- (BOOL)isRegexValidWithOptions:(RKLRegexOptions)options  error:(NSError **)error;
Parameters
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Discussion

This method can be used to determine if a regular expression is valid. For example:

NSError *error = NULL;
NSString *regexString = @"[a-z"; // Missing the closing ]
if([regexString isRegexValidWithOptions:RKLNoOptions error:&error] == NO) {
  NSLog(@"The regular expression is invalid. Error: %@", error);
}
Availability

Available in RegexKitLite 3.0 and later.

See Also
  • - isRegexValid
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
rangeOfRegex:
Returns the range for the first match of  regex in the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex;
Return Value

NSRange structure giving the location and length of the first match of regex in the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex or an error occurs.

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - rangeOfRegex:capture:
  • - rangeOfRegex:inRange:
  • - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:capture:
Returns the range of capture number  capture for the first match of  regex in the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex  capture:(NSInteger)capture;
Return Value

NSRange structure giving the location and length of capture number capture for the first match of regex in the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex or an error occurs.

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - rangeOfRegex:
  • - rangeOfRegex:inRange:
  • - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:inRange:
Returns the range for the first match of  regex within  range of the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex  inRange:(NSRange)range;
Return Value

NSRange structure giving the location and length of the first match of regex within range of the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex within range or an error occurs.

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - rangeOfRegex:
  • - rangeOfRegex:capture:
  • - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:options:inRange:capture:error:
Returns the range of capture number  capture for the first match of  regex within  range of the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex  options:(RKLRegexOptions)options  inRange:(NSRange)range  capture:(NSInteger)capture  error:(NSError **)error;
Parameters
  • regex
    An NSString containing a regular expression.
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • capture
    The matching range of the capture number from regex to return. Use 0 for the entire range that regex matched.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Return Value

NSRange structure giving the location and length of capture number capture for the first match of regex within range of the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex within range or an error occurs.
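The capture parameter can be illustrated with a minimal sketch, assuming RegexKitLite.h is available in the project. ValueRangeInLine is a hypothetical helper, and the key=value input format is chosen only for illustration.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: for a line shaped like "key=value", returns the
// range of the value part (capture group 2), or {NSNotFound, 0} when
// the line does not match or an error occurs.
static NSRange ValueRangeInLine(NSString *line, NSError **outError) {
    return [line rangeOfRegex:@"^(\\w+)=(\\S*)"
                      options:RKLNoOptions
                      inRange:NSMakeRange(0UL, [line length])
                      capture:2L
                        error:outError];
}
```

For @"name=abc" the returned range has location 5 and length 3, so substringWithRange: on the same string would give @"abc". Passing 0 for capture would instead return the range of the whole match.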

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - rangeOfRegex:
  • - rangeOfRegex:capture:
  • - rangeOfRegex:inRange:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
replaceOccurrencesOfRegex:usingBlock:
Enumerates the matches in the receiver by the regular expression regex and executes block for each match found. Replaces the characters that were matched with the contents of the string returned by block, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex usingBlock:(NSString *(^)(NSInteger captureCount,
                               NSString * const capturedStrings[captureCount],
                               const NSRange capturedRanges[captureCount],
                               volatile BOOL * const stop))block;
Parameters
  • regex
    An NSString containing a regular expression.
  • block
    The block that is executed for each match of regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that regex captured. captureCount is always at least 1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;. Otherwise the block should not touch *stop.
Discussion

This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.

Return Value
Returns  -1 if there was an error, otherwise returns the number of replacements performed.
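A minimal sketch of block-based replacement, assuming RegexKitLite.h is available in the project and blocks are supported by the compiler. DoubleNumbers is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: replaces every run of digits in `text` with twice
// its numeric value and returns the number of replacements, or -1 on error.
static NSInteger DoubleNumbers(NSMutableString *text) {
    return [text replaceOccurrencesOfRegex:@"\\d+" usingBlock:
        ^NSString *(NSInteger captureCount,
                    NSString * const capturedStrings[captureCount],
                    const NSRange capturedRanges[captureCount],
                    volatile BOOL * const stop) {
            // capturedStrings[0] is the whole match, here the digit run.
            NSInteger value = [capturedStrings[0] integerValue];
            // The returned string replaces the matched characters.
            return [NSString stringWithFormat:@"%ld", (long)(value * 2)];
        }];
}
```

Applied to a mutable copy of @"2 apples, 10 pears", the receiver becomes @"4 apples, 20 pears" and the method returns 2. The receiver must be an NSMutableString, since the method edits it in place.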
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the matches in the receiver by the regular expression regex within range using options and executes block using enumerationOptions for each match found. Replaces the characters that were matched with the contents of the string returned by block, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(NSString *(^)(NSInteger captureCount,
                               NSString * const capturedStrings[captureCount],
                               const NSRange capturedRanges[captureCount],
                               volatile BOOL * const stop))block;
Parameters
  • regex
    An NSString containing a regular expression.
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
  • enumerationOptions
    A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
  • block
    The block that is executed for each match of regex in the receiver. The block takes four arguments:
    • captureCount
      The number of strings that regex captured. captureCount is always at least 1.
    • capturedStrings
      An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
    • capturedRanges
      An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
    • stop
      A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES;. Otherwise the block should not touch *stop.
Discussion

This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.

Return Value
Returns -1 if there was an error and indirectly returns an NSError object if error is not NULL, otherwise returns the number of replacements performed.
Availability

Available in RegexKitLite 4.0 and later.

See Also
  • - replaceOccurrencesOfRegex:usingBlock:
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
  • Regular Expression Enumeration Options
  • RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
  • Blocks Programming Topics
replaceOccurrencesOfRegex:withString:
Replaces all occurrences of the regular expression regex with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex withString:(NSString *)replacement;
Important:
Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
Discussion

This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.

Return Value
Returns  -1 if there was an error, otherwise returns the number of replacements performed.
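The capture group substitution mentioned above can be sketched as follows, assuming RegexKitLite.h is available in the project. SwapNames is a hypothetical helper; $1 and $2 in the replacement string refer to the first and second capture groups.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: rewrites each "Last, First" pair in `text` as
// "First Last" and returns the number of replacements, or -1 on error.
static NSInteger SwapNames(NSMutableString *text) {
    // The regex has two capture groups, so $1 and $2 are both valid.
    return [text replaceOccurrencesOfRegex:@"(\\w+), (\\w+)"
                                withString:@"$2 $1"];
}
```

Applied to a mutable copy of @"Doe, Jane", the receiver becomes @"Jane Doe" and the method returns 1. A reference such as $3 would raise RKLICURegexException here, since the pattern defines only two groups.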
Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - replaceOccurrencesOfRegex:withString:range:
  • - replaceOccurrencesOfRegex:withString:options:range:error:
  • ICU Replacement Text Syntax
replaceOccurrencesOfRegex:withString:range:
Replaces all occurrences of the regular expression regex within range with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex withString:(NSString *)replacement range:(NSRange)range;
Important:
Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
Discussion

This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.

Return Value
Returns  -1 if there was an error, otherwise returns the number of replacements performed.
Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - replaceOccurrencesOfRegex:withString:
  • - replaceOccurrencesOfRegex:withString:options:range:error:
  • ICU Replacement Text Syntax
replaceOccurrencesOfRegex:withString:options:range:error:
Replaces all occurrences of the regular expression regex using options within range with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex withString:(NSString *)replacement options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error;
Parameters
  • regex
    An NSString containing a regular expression.
  • options
    A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
  • range
    The range of the receiver to search.
  • replacement
    The string to use as the replacement text for matches by regex. See ICU Replacement Text Syntax for more information.
    Important:
    Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
  • error
    An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Discussion

This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.

Return Value
Returns -1 if there was an error and indirectly returns an NSError object if error is not NULL, otherwise returns the number of replacements performed.
Availability

Available in RegexKitLite 2.0 and later.

See Also
  • - replaceOccurrencesOfRegex:withString:
  • - replaceOccurrencesOfRegex:withString:range:
  • ICU Replacement Text Syntax
  • RegexKitLite NSError Error Domains
  • RegexKitLite NSError and NSException User Info Dictionary Keys
  • Regular Expression Options
stringByMatching:
Returns a string created from the characters of the receiver that are in the range of the first match of  regex.
- (NSString *)stringByMatching:(NSString *)regex;
Return Value

NSString containing the substring of the receiver matched by regex. Returns NULL if the receiver is not matched by regex or an error occurs.
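A minimal sketch of extracting the first match, assuming RegexKitLite.h is available in the project. FirstHexColor is a hypothetical helper, and the six-digit hex color pattern is chosen only for illustration.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: returns the first six-digit hex color such as
// "#ff8800" found in `css`, or nil when there is no match or an error occurs.
static NSString *FirstHexColor(NSString *css) {
    return [css stringByMatching:@"#[0-9A-Fa-f]{6}\\b"];
}
```

For @"color: #ff8800;" the result is @"#ff8800". Since a failed match and an error both produce a NULL result, callers that need to tell the two apart would use the variant that takes an error parameter.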

Availability

Available in RegexKitLite 1.0 and later.

See Also
  • - stringByMatching:capture:
  • - stringByMatching:inRange:
  • - stringByMatching:options:inRange:capture:error:
stringByMatching:capture:
Returns a string created from the characters of the receiver that are in the range of the first match of  regex for  capture.
- (NSString *)stringByMatching:(NSString *)regex  capture:(NSInteger)capture;
Return Value

A&nb
