A Roundup of Regular Expression Usage in iOS
太阳火神的美丽人生 (http://blog.csdn.net/opengl_es)
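This article is published under the Creative Commons Attribution-NonCommercial-ShareAlike license.
When reposting, please keep this sentence: 太阳火神的美丽人生 - this blog focuses on agile development and on research into mobile and IoT devices: iOS, Android, Html5, Arduino, pcDuino. Otherwise, articles from this blog may not be reposted or re-reposted. Thank you for your cooperation.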
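Think of the regex facility in a given language as a bucket, and the regular expressions it processes as the water. There is plenty of water: fresh water, salt water, rain water. The Perl regular expression family alone has enough to fill any bucket. The only limit comes when the bucket itself is not built large enough to hold every kind of that water.
So don't get hung up on odd differences. Focus on implementing your feature. Start from your end goal, study what a given tool can do and how to do it, and stop once you reach the intended result. This may be one expression of agile development: achieve the goal at hand and don't fuss over anything else.
Still, outside of work and specific tasks, it is worth learning more so that you build a knowledge system of your own. Then, whenever the need arises, you can see from a three-dimensional vantage point where the knowledge that solves a given problem sits within that system.
So traditional software engineering methods are not without merit. They were a necessary stage in the long development of software as a discipline, and a system that took shape through study and research. Once a given layer has matured, it can become like using the Wubi input method: you no longer agonize over how to decompose a character, you just type it on sight, because the goal is typing rather than decomposing. Seen this way, agile development's principle of focusing on the present is the product of long exploration and research.
Honestly, there is no need to make agile sound mystical or boundless. Anyone who has done development long enough will eventually move beyond years of studying traditional software engineering methods. They start working in a more convenient way that eliminates repetitive labor. That is all agile is: putting the immediate, present goal first.
As for time, this is currently one of the big obstacles to adopting agile development. The same job can take a year, a month, or sometimes just a day, and slow work yields fine work. We developers usually work against a fixed deadline. We decide which core parts must be done and which remaining parts can be glossed over to save time, so that everything finishes right on schedule.
These days, however, developers are more often asked to estimate time without reasonably clear and detailed requirements to go on. Sometimes those requirements exist only in the client's head. Sometimes they were picked up, or missed, in the project manager's conversations with the client. Either way, an essential factor ends up dropped for no good reason.
This really depends on the project manager's personal ability. First, solid technical skills. Second, a market- and requirements-oriented mindset, able to draw out, accurately or methodically, the client's real requirements and how they are prioritized.
Seasoned developers will often use all sorts of tactics to pressure the project manager until they dig out the root of the requirements, and then everything falls into place. (Perhaps this is the only means a project manager has of keeping developers in the palm of his hand; if developers knew even this, what use would he be?! It bears out exactly what Mr. Yu Shiwei said about a failing of Chinese managers: "you must make others feel that you are important......") If the pressure goes badly, though, it can easily backfire and bring intervention and suppression from higher up in the company, which is not worth it. In that case you might as well go through the motions day by day and trust to luck. If things work out and you happened to take the right fork, you get praised. If not, you have wasted your effort, and the blame will undoubtedly land on the developer, with a pile of reasons waiting for you. So don't bother explaining. You know it's unfair, so what good would speaking up do? Just close your eyes and feel your way forward, try not to run into any more walls, and at least make it out in one piece.
If you do make it out in one piece, congratulations: you've leveled up and become an old hand! An old hand has walked the thorny road so many times that he knows where it is deep and where it is shallow. Whether the road is long or short, wasted effort is no longer his concern. Otherwise, putting your heart into the work is as good as walking to your own death.
It is pitiful and sad that the people who truly want to get things done end up this way. To survive at all, they have to take a roundabout route to protect themselves.
Pitiful: so it has always been for the loyal and brave;
Sad: today's fashion is to drift with the current, even though slaying the loyal has always been called folly;
I earnestly hope that wise people, especially small business owners whose ambitions are not yet fulfilled, will open their eyes. Gather talented, loyal and brave people under your banner, and cherish talent wisely as Liu Bei did. But do not abandon Shu to wear out your troops attacking Wu (setting loyal advice aside) and wreck your own fortunes!
After adding the passage above I'm a little sleepy. I slept from 7:30 last night until 0:28 this morning, and after those 6 hours I couldn't get back to sleep. I wrote this post on and off over about two hours, so hopefully it wasn't written while sleepwalking.
I wonder whether 6 hours really is enough, or whether midnight to 3 a.m., the gallbladder's detox period, has come around again. Has something gone wrong again? I'd better take my medicine first: a Chinese patent medicine, Jigucao (Abrus herb) capsules, which work well. Take one and you'll want another, and everything tastes great......hopefully so!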
1. NSRegularExpression
Abel 22:14:23
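Hi everyone, could someone send me some material on iOS regular expressions, or even just a link? I studied Perl regular expressions before, but the iOS ones seem a bit different.
The following should be all the reference you need! The sentence "The regular expression patterns and behavior are based on Perl's regular expressions." shows that they are still based on Perl regular expressions, with some extensions for the C++ environment.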

http://userguide.icu-project.org/strings/regexp
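2. NSPredicate
NSPredicate can also filter with regular expressions; put simply, it matches using regular expression syntax. The format string used to build a predicate can express common comparisons much like those in SQL statements. To use a regular expression, put the MATCHES operator in the format string.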
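MATCHES succeeds only when the whole string matches the pattern, not when the pattern merely occurs somewhere inside it. In Objective-C such a predicate is typically built as [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern].

The minimal sketch below reproduces those whole-string semantics in plain C, which is also valid inside an Objective-C file. It makes these assumptions:
- POSIX regcomp/regexec stand in for ICU, so POSIX extended syntax applies and ICU extensions such as \d are not available.
- The helper name matches_whole is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only if the ENTIRE subject matches the pattern,
   mimicking NSPredicate's MATCHES. Uses POSIX extended regex, not ICU. */
bool matches_whole(const char *pattern, const char *subject)
{
    /* Wrap the pattern as ^(pattern)$ so a partial match is not enough. */
    size_t len = strlen(pattern) + 5;          /* "^(" + ")$" + NUL */
    char *anchored = malloc(len);
    if (anchored == NULL) return false;
    snprintf(anchored, len, "^(%s)$", pattern);

    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) {
        free(anchored);
        return false;                          /* invalid pattern: treat as no match */
    }
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    free(anchored);
    return rc == 0;
}
```

For example, matches_whole("[0-9]{3}-[0-9]{4}", "555-1234") is true. The same pattern against "call 555-1234 now" is false because of the surrounding text.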


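3. Open-source regex parsing libraries
RegexKitLite. There may be other open-source libraries as well; this post will be updated as I find them....
For detailed usage, see: http://regexkit.sourceforge.net/RegexKitLite/index.html#ICUSyntax (this address may need a proxy to reach; goAgent can be used).
The original text at the link above is reproduced below: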
RegexKitLite
Lightweight Objective-C Regular Expressions for Mac OS X using the ICU Library
Introduction to RegexKitLite
This document introduces RegexKitLite for Mac OS X. RegexKitLite enables easy access to regular expressions by providing a number of additions to the standard Foundation NSString class. RegexKitLite acts as a bridge between the NSString class and the regular expression engine in the International Components for Unicode, or ICU, dynamic shared library that is shipped with Mac OS X.
Highlights
- Uses the regular expression engine from the ICU library which is shipped with Mac OS X.
- Automatically caches compiled regular expressions.
- Uses direct access to a string's UTF-16 buffer if it is available.
- Caches the UTF-16 conversion that is required by the ICU library when direct access to a string's UTF-16 buffer is unavailable.
- Small size makes it ideal for use in iPhone applications.
- Multithreading safe.
- 64-bit support.
- Custom DTrace probe points.
- Support for Mac OS X 10.5 Garbage Collection.
- Support for the Blocks language extension.
- Uses Core Foundation for greater speed.
- Very easy to use, all functionality is provided by a category extension to the NSString class.
- Consists of two files, a header and the Objective-C source.
- Xcode 3 integrated documentation available.
- Distributed under the terms of the BSD License.
Who Should Read This Document
This document is intended for readers who would like to be able to use regular expressions in their Objective-C applications, whether those applications are for the Mac OS X desktop, or for the iPhone.
This document, and RegexKitLite, is also intended for anyone who needs to search and manipulate NSString objects. If you've ever used the NSScanner, NSCharacterSet, and NSPredicate classes, or any of the NSString rangeOf… methods, RegexKitLite is for you.
Regular expressions are a powerful way to search, match and extract, and manipulate strings. RegexKitLite can perform many of the same operations that NSScanner, NSCharacterSet, and NSPredicate perform, and usually does it with far fewer lines of code. For example, RegexKitLite Cookbook - Parsing CSV Data contains an example of just over a dozen lines that is nonetheless a full-featured CSV (Comma Separated Value) parser, turning CSV input into an NSArray of NSArrays.
Organization of This Document
This document follows the conventions and styles used by Apple's documentation and is divided into two main parts:
- The Class Reference part, RegexKitLite NSString Additions Reference.
- The Programming Guide part, which consists of the following chapters:
  - RegexKitLite Overview
  - Using RegexKitLite
  - ICU Syntax
  - RegexKitLite Cookbook
  - Adding RegexKitLite to your Project
Additional information can be found in the following sections:
- Release Information, which contains the Release Notes for RegexKitLite 4.0.
- License Information, which contains the RegexKitLite BSD License.
Supporting RegexKitLite through Financial Donations
A significant amount of time and effort has gone into the development of RegexKitLite. Even though it is distributed under the terms of the BSD License, you are encouraged to contribute financially if you are using RegexKitLite in a profitable commercial application. Should you decide to contribute to RegexKitLite, please keep the following in mind:
- What it would have cost you in terms of hours, or consultant fees, to develop similar functionality.
- The Ohloh.net metrics do not factor in the cost of writing documentation, which is where most of the effort is spent.
- The target audience for RegexKitLite is very small, so there are relatively few units "sold".
You can contribute by visiting SourceForge.net's donation page for RegexKitLite.
Important:
You are always required to acknowledge the use of RegexKitLite in your product as specified in the terms of the BSD License.
Download
You can download the RegexKitLite distribution that corresponds to this version of the documentation here: RegexKitLite-4.0.tar.bz2 (139.1K). To be automatically notified when a new version of RegexKitLite is available, add the RegexKitLite documentation feed to Xcode.
PDF Documentation
This document is available in PDF format: RegexKitLite-4.0.pdf (1.1M).
Note:
If you wish to print this document, it is recommended that you use the PDF version.
Reporting Bugs
You can file bug reports, or review submitted bugs, at the following URL: http://sourceforge.net/tracker/?group_id=204582&atid=990188
Note:
Anonymous bug reports are no longer accepted due to spam. A SourceForge.net account is required to file a bug report.
Contacting the Author
The author can be contacted at [email protected].
RegexKitLite Overview
While RegexKitLite is not a descendant of the RegexKit.framework source code, it does provide a small subset of RegexKit's NSString methods for performing various regular expression tasks. These include:
- determining the range that a regular expression matches within a string;
- creating a new string from the results of a match;
- splitting a string into an NSArray with a regular expression;
- performing search and replace with regular expressions using the common $n substitution syntax.
RegexKitLite uses the regular expression engine provided by the ICU library that ships with Mac OS X. All that is required is the two files, RegexKitLite.h and RegexKitLite.m, plus linking against the /usr/lib/libicucore.dylib ICU shared library. Adding RegexKitLite to your project adds only a few kilobytes of overhead to your application's size, and it typically needs only a few kilobytes of memory at run-time. A regular expression must first be compiled by the ICU library before it can be used. For that reason, RegexKitLite keeps compiled regular expressions in a small 4-way set-associative cache with a least-recently-used replacement policy.
- RegexKit Framework
- International Components for Unicode
- Unicode Home Page
Official Support from Apple for ICU Regular Expressions
Mac OS X
As of Mac OS X 10.6, the author is not aware of any official support from Apple for linking to the libicucore.dylib library. On the other hand, the author is not aware of any official prohibition against it either. Linking to the ICU library and using the ICU regular expression API is slightly different from using private, undocumented APIs. There are a number of very good reasons why you shouldn't use private, undocumented APIs, such as:
- The undocumented, private API is not yet mature enough for Apple to commit to supporting it. Once an API is made "public", developers expect future versions to at least be compatible with previously published versions.
- The undocumented, private API may expose implementation-specific details that can change between versions. Public APIs are the proper "abstraction layer boundary" that allows the provider of the API to hide implementation-specific details.
The ICU library, on the other hand, contains a "published, public API" that the ICU developers have committed to supporting in later releases, and RegexKitLite uses only these public APIs. One could argue that Apple is not obligated to keep including the ICU library in later versions of Mac OS X. However, this seems unlikely for a number of reasons that will not be discussed here. With the introduction of iPhone OS 3.2, Apple now officially supports iPhone applications linking to the ICU library for the purpose of using its regular expression functionality. This is encouraging news for Mac OS X developers, if one assumes that Apple will try to keep some kind of parity between the iPhone OS and Mac OS X APIs.
iPhone OS < 3.2
Prior to iPhone OS 3.2, there was never any official support from Apple for linking to the libicucore.dylib library. It was unclear whether linking to the library would violate the iPhone OS SDK Agreement's prohibition against using undocumented APIs. Even so, a large number of iPhone applications have chosen to use RegexKitLite, and the author is not aware of a single rejection because of it.
iPhone OS ≥ 3.2
Starting with iPhone OS 3.2, Apple now officially allows iPhone OS applications to link with the ICU library. The ICU library contains a lot of functionality for dealing with internationalization and localization, but Apple only officially permits the use of the ICU Regular Expression functionality.
Apple also provides a way to use ICU-based regular expressions from Foundation by adding a new option to NSStringCompareOptions, NSRegularExpressionSearch. This new option can be used with the NSString rangeOfString:options: method, and the following example of its usage is given:
// finds phone number in format nnn-nnn-nnnn
NSRange r;
NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
r = [textView.text rangeOfString:regEx options:NSRegularExpressionSearch];
if (r.location != NSNotFound) {
    NSLog(@"Phone number is %@", [textView.text substringWithRange:r]);
} else {
    NSLog(@"Not found.");
}
At this time, rangeOfString:options: is the only regular expression functionality Apple has added to Foundation and capture groups in a regular expression are not supported. Apple also gives the following note:
Note:
As noted in "ICU Regular-Expression Support," the ICU libraries related to regular expressions are included in iPhone OS 3.2. However, you should only use the ICU facilities if the NSString alternative is not sufficient for your needs.
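rangeOfString:options: with NSRegularExpressionSearch returns the range of the first match, or a location of NSNotFound if there is none. The following is a rough C analogue, offered as a sketch under these assumptions:
- POSIX extended regex is used in place of ICU.
- Offsets are byte offsets into a C string, not the UTF-16 offsets Foundation reports.
- match_range and first_match are hypothetical names, and -1 plays the role of NSNotFound.

```c
#include <regex.h>

/* A (location, length) pair loosely modelled on Foundation's NSRange. */
typedef struct { long location; long length; } match_range;

/* Hypothetical helper: range of the first match of `pattern` in `text`,
   or location -1 (standing in for NSNotFound) if there is none or the
   pattern is invalid. POSIX extended syntax is a stand-in for ICU. */
match_range first_match(const char *pattern, const char *text)
{
    match_range r = { -1, 0 };
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return r;
    regmatch_t m[1];
    if (regexec(&re, text, 1, m, 0) == 0) {
        r.location = (long)m[0].rm_so;                  /* start of the match */
        r.length   = (long)(m[0].rm_eo - m[0].rm_so);   /* bytes matched */
    }
    regfree(&re);
    return r;
}
```

For "Call 555-867-5309 today" this reports location 5 and length 12.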
RegexKitLite provides a much richer API, such as the automatic extraction of a match as an NSString. Using RegexKitLite, the example can be rewritten as:
// finds phone number in format nnn-nnn-nnnn
NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
NSString *match = [textView.text stringByMatching:regEx];
// [match length] is 0 for both an empty string and nil, so this covers either result
if ([match length] > 0) {
    NSLog(@"Phone number is %@", match);
} else {
    NSLog(@"Not found.");
}
What's more, RegexKitLite provides easy access to all the matches of a regular expression in an NSString:
// finds phone numbers in format nnn-nnn-nnnn
NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
for (NSString *match in [textView.text componentsMatchedByRegex:regEx]) {
    NSLog(@"Phone number is %@", match);
}
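Collecting every match, as componentsMatchedByRegex: does, comes down to repeating the search from the end of the previous match. This C sketch records the start offset of each match. It assumes POSIX regex as a stand-in for ICU, and all_match_starts is a hypothetical name. Two details keep the loop correct:
- It passes REG_NOTBOL after the first search, because the cursor is no longer at the start of the text.
- It steps over empty matches, so the loop always advances.

```c
#include <regex.h>

/* Hypothetical helper: counts every non-overlapping match of `pattern` in
   `text` and stores the byte offset of up to `max` match starts in `starts`.
   Conceptually similar to componentsMatchedByRegex:, but built on POSIX
   extended regex rather than ICU. Returns -1 if the pattern is invalid. */
int all_match_starts(const char *pattern, const char *text,
                     long *starts, int max)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    int count = 0;
    const char *cursor = text;
    int flags = 0;              /* becomes REG_NOTBOL once cursor leaves the start */
    regmatch_t m[1];
    while (regexec(&re, cursor, 1, m, flags) == 0) {
        if (count < max) starts[count] = (long)(cursor - text) + (long)m[0].rm_so;
        count++;
        if (m[0].rm_eo == m[0].rm_so) {            /* empty match */
            if (cursor[m[0].rm_so] == '\0') break; /* nothing left to scan */
            cursor += m[0].rm_so + 1;              /* step over one byte */
        } else {
            cursor += m[0].rm_eo;                  /* resume after the match */
        }
        flags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```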
To do the same thing using just NSRegularExpressionSearch would require significantly more code and effort on your part. RegexKitLite also provides powerful search and replace functionality:
// finds phone number in format nnn-nnn-nnnn
NSString *regEx = @"([0-9]{3})-([0-9]{3}-[0-9]{4})";
// and transforms the phone number into the format (nnn) nnn-nnnn
NSString *replaced = [textView.text stringByReplacingOccurrencesOfRegex:regEx withString:@"($1) $2"];
RegexKitLite also has a number of built-in performance-enhancing features, such as caching compiled regular expressions. Although the author does not have any benchmarks comparing NSRegularExpressionSearch to RegexKitLite, it is likely that RegexKitLite outperforms NSRegularExpressionSearch.
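The $n substitution used above expands each $1, $2, ... in the template to the text of that capture group. The following is a minimal C sketch of the same idea. It assumes:
- POSIX extended regex instead of ICU.
- Single-digit group references ($0 to $9) only.
- A caller-supplied output buffer.
- That an empty match simply ends the loop.
replace_all and append are hypothetical names.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Appends n bytes of src to out, keeping one byte free for the final NUL.
   Returns nonzero if the output buffer is too small. */
static int append(char *out, size_t cap, size_t *used, const char *src, size_t n)
{
    if (*used + n >= cap) return -1;
    memcpy(out + *used, src, n);
    *used += n;
    return 0;
}

/* Hypothetical sketch of "$n" substitution: each match of `pattern` in `text`
   is replaced by `tmpl`, where $0..$9 expand to the matching capture group
   (an unmatched group expands to nothing). Returns 0 on success, or -1 if the
   pattern is invalid or `out` (capacity `cap`) is too small. */
int replace_all(const char *pattern, const char *text, const char *tmpl,
                char *out, size_t cap)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    regmatch_t m[10];
    size_t used = 0;
    const char *cursor = text;
    int flags = 0;

    /* To keep the sketch short, an empty match simply ends the replacement loop. */
    while (regexec(&re, cursor, 10, m, flags) == 0 && m[0].rm_eo > m[0].rm_so) {
        if (append(out, cap, &used, cursor, (size_t)m[0].rm_so)) goto fail;  /* text before match */
        for (const char *t = tmpl; *t; t++) {
            if (t[0] == '$' && t[1] >= '0' && t[1] <= '9') {
                int g = t[1] - '0';
                if (m[g].rm_so != -1 &&
                    append(out, cap, &used, cursor + m[g].rm_so,
                           (size_t)(m[g].rm_eo - m[g].rm_so)))
                    goto fail;
                t++;                                   /* skip the digit */
            } else if (append(out, cap, &used, t, 1)) {
                goto fail;
            }
        }
        cursor += m[0].rm_eo;                          /* continue after the match */
        flags = REG_NOTBOL;                            /* cursor is no longer at line start */
    }
    if (append(out, cap, &used, cursor, strlen(cursor))) goto fail;   /* trailing text */
    out[used] = '\0';
    regfree(&re);
    return 0;
fail:
    regfree(&re);
    return -1;
}
```

With the pattern ([0-9]{3})-([0-9]{3}-[0-9]{4}) and the template "($1) $2", the input "Call 555-867-5309 now" becomes "Call (555) 867-5309 now".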
- What's New in iPhone OS - ICU Regular-Expression Support
- What's New in iPhone OS - Foundation Framework Changes
- iPad Programming Guide - ICU Regular-Expression Support
- iPad Programming Guide - Foundation-Level Regular Expressions
- NSRegularExpressionSearch
- - rangeOfString:options:
The iPhone 4.0 SDK Agreement
iPhone OS 3.2 officially sanctioned linking to the ICU library for the purpose of using the ICU regular expression engine. However, the iPhone OS 4.0 SDK included the following change to the iPhone OS SDK Agreement:
3.3.1
Applications may only use Documented APIs in the manner prescribed by Apple and must not use or call any private APIs. Applications must be originally written in Objective-C, C, C++, or JavaScript as executed by the iPhone OS WebKit engine, and only code written in C, C++, and Objective-C may compile and directly link against the Documented APIs (e.g., Applications that link to Documented APIs through an intermediary translation or compatibility layer or tool are prohibited).
This raises a number of obvious questions:
- Does 3.3.1 apply to RegexKitLite?
- Will the use of RegexKitLite in an iPhone OS application be grounds for rejection under 3.3.1?
There is considerable speculation as to what this change covers. At the time of this writing, however, there is no empirical evidence and no official guideline from Apple on which to base an informed decision about whether the use of RegexKitLite would violate 3.3.1. It is the author's opinion that RegexKitLite could be considered a compatibility layer between NSString and the now-documented APIs for regular expressions in the ICU library.
It is widely speculated that the motivation for the change to 3.3.1 was to prevent the development of Flash applications for the iPhone. The author believes that most reasonable people would take "compatibility layer" in that context to mean something entirely different from what it means when applied to RegexKitLite.
At this time, the author is not aware of a single iPhone application that has been rejected due to the use of RegexKitLite. If your application is rejected due to the use of RegexKitLite, please let the author know by emailing [email protected]. As always, CAVEAT EMPTOR.
The Difference Between RegexKit.framework and RegexKitLite
RegexKit.framework and RegexKitLite are two different projects. In retrospect, RegexKitLite should have been given a more distinctive name. Below is a table summarizing some of the key differences between the two:
| | RegexKit.framework | RegexKitLite |
|---|---|---|
| Regex Library | PCRE (Perl Compatible Regular Expressions) | ICU (International Components for Unicode) |
| Library Included | Yes, built into framework object file. | No, provided by Mac OS X. |
| Library Linked As | Statically linked into framework. | Dynamically linked to /usr/lib/libicucore.dylib. |
| Compiled Size | Approximately 371KB† per architecture. | Very small, approximately 16KB to 20KB‡ per architecture. |
| Style | External, linked to framework. | Compiled directly into final executable. |
| Feature Set | Large, with additions to many classes. | Minimal, NSString only. |

† Version 0.6.0. About half of the 371KB is the PCRE library. The default distribution framework shared library file is 1.4MB in size and includes the ppc, ppc64, i386, and x86_64 architectures. If 64-bit support is removed, the framework shared library file size drops to 664KB.
‡ Since the ICU library is part of Mac OS X, it does not add to the final size.
Compiled Regular Expression Cache
The NSString that contains the regular expression must be compiled into an ICU URegularExpression. This can be an expensive, time-consuming step. The compiled regular expression can be reused in another search, even if the strings being searched are different. Therefore RegexKitLite keeps a small cache of recently compiled regular expressions.
The cache is organized as a 4-way set-associative cache, and its size can be tuned with the pre-processor define RKL_CACHE_SIZE. The default cache size, which should always be a prime number, is 13. Since the cache is 4-way set-associative, the total number of compiled regular expressions that can be cached is RKL_CACHE_SIZE times four: 13 * 4, or 52. The NSString regexString is mapped to a cache set using modular arithmetic: Cache set ≡ [regexString hash] mod RKL_CACHE_SIZE, i.e. cacheSet = [regexString hash] % 13;. Since RegexKitLite uses Core Foundation, this is actually coded as cacheSet = CFHash(regexString) % RKL_CACHE_SIZE;.
Each of the four "ways" of a cache set is checked to see whether it holds a compiled regular expression whose source NSString is identical to the regular expression NSString being looked up. If there is an exact match, that "way" is marked as the most recently used, and its compiled regular expression is used as-is. Otherwise, the least recently used (LRU) "way" in the cache set is cleared and replaced with the newly compiled regular expression.
In addition to the compiled regular expression cache, RegexKitLite keeps a small lookaside cache. It maps a regular expression's NSString pointer and RKLRegexOptions directly to a cached compiled regular expression. When a regular expression's NSString pointer and RKLRegexOptions are in the lookaside cache, RegexKitLite can skip calling CFHash(regexString) and checking each of the four "ways" in a cache set, since the lookaside cache has already identified the exact cached compiled regular expression. The lookaside cache is quite small, just 64 bytes. It was added because Shark.app profiling during performance tuning showed that CFHash(), while quite fast, was the primary bottleneck when retrieving already compiled and cached regular expressions, typically accounting for roughly 40% of the lookup time.
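The set selection and LRU replacement described above can be sketched in C. The sketch rests on several assumptions:
- FNV-1a stands in for CFHash().
- An integer id stands in for a compiled URegularExpression.
- Patterns are copied into a fixed 64-byte key, so longer patterns simply never hit in this sketch.
- cache, lookup_compiled and compile_count are hypothetical names.
Only RKL_CACHE_SIZE, its default of 13, and the 4-way organization come from the text, giving 13 × 4 = 52 cached expressions.

```c
#include <stdint.h>
#include <string.h>

#define RKL_CACHE_SIZE 13   /* number of sets; the text says it should be prime */
#define WAYS 4              /* 4-way set associative */

/* Hypothetical cache entry: a copy of the pattern (the exact-match key) plus a
   stand-in for the compiled ICU URegularExpression. */
typedef struct {
    char     regex[64];     /* patterns longer than 63 bytes never hit here */
    int      compiled_id;   /* 0 means the way is empty */
    unsigned last_used;     /* timestamp for least-recently-used replacement */
} way_t;

static way_t    cache[RKL_CACHE_SIZE][WAYS];
static unsigned clock_tick;
static int      compile_count;   /* how many "expensive compiles" happened */

/* FNV-1a, standing in for CFHash(); any decent string hash would do. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Returns the cached id for `regex`; on a miss it "compiles" the pattern and
   replaces the least recently used way of the selected set. */
int lookup_compiled(const char *regex)
{
    way_t *set = cache[hash_str(regex) % RKL_CACHE_SIZE];   /* cache set = hash mod size */
    int lru = 0;
    for (int w = 0; w < WAYS; w++) {
        if (set[w].compiled_id != 0 && strcmp(set[w].regex, regex) == 0) {
            set[w].last_used = ++clock_tick;      /* hit: mark as most recently used */
            return set[w].compiled_id;
        }
        if (set[w].last_used < set[lru].last_used) lru = w;   /* empty ways have 0 */
    }
    strncpy(set[lru].regex, regex, sizeof set[lru].regex - 1);
    set[lru].regex[sizeof set[lru].regex - 1] = '\0';
    set[lru].compiled_id = ++compile_count;       /* miss: the expensive step */
    set[lru].last_used   = ++clock_tick;
    return set[lru].compiled_id;
}
```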
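The lookaside idea is to map a string's pointer and options straight to a cache slot, without hashing the string's contents. The following is a hypothetical sketch, not RegexKitLite's actual layout:
- A 64-entry byte table, 64 bytes in total to match the size quoted above, holds slot indices.
- Every hit is confirmed against the slot's stored key, because different keys can land on the same table entry.
- The names slots, lookaside, lookaside_find and lookaside_remember are illustrative only.

```c
#include <stdint.h>

#define TOTAL_SLOTS     52   /* 13 sets x 4 ways, as described above */
#define LOOKASIDE_SLOTS 64   /* 64 one-byte entries = 64 bytes */

/* Hypothetical key stored with each cache slot. */
typedef struct { const void *regex_ptr; uint32_t options; } slot_key;

static slot_key slots[TOTAL_SLOTS];
static uint8_t  lookaside[LOOKASIDE_SLOTS];   /* slot index + 1; 0 means empty */

/* Mixes the pointer bits with the options; the string contents are never hashed. */
static unsigned lookaside_index(const void *regex_ptr, uint32_t options)
{
    uintptr_t p = (uintptr_t)regex_ptr;
    return (unsigned)(((p >> 4) ^ options) % LOOKASIDE_SLOTS);
}

/* Returns the cached slot for (regex_ptr, options), or -1 when the caller must
   fall back to the normal hash-and-probe lookup. */
int lookaside_find(const void *regex_ptr, uint32_t options)
{
    uint8_t e = lookaside[lookaside_index(regex_ptr, options)];
    if (e == 0) return -1;
    int slot = e - 1;
    /* The table is only a hint: confirm that the slot still holds this key. */
    if (slots[slot].regex_ptr == regex_ptr && slots[slot].options == options)
        return slot;
    return -1;
}

/* Records that (regex_ptr, options) now lives in cache slot `slot` (0..51). */
void lookaside_remember(const void *regex_ptr, uint32_t options, int slot)
{
    slots[slot].regex_ptr = regex_ptr;
    slots[slot].options   = options;
    lookaside[lookaside_index(regex_ptr, options)] = (uint8_t)(slot + 1);
}
```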
Regular Expressions in Mutable Strings
When a regular expression is compiled, an immutable copy of the string is kept. For immutable NSString objects, the copy is usually the same object with its reference count increased by one. Only NSMutableString objects will cause a new, immutable NSString to be created.
If the regular expression being used is stored in an NSMutableString, the cached regular expression will continue to be used as long as the NSMutableString remains unchanged. Once mutated, the changed NSMutableString will no longer match the cached compiled regular expression it was using before. The newly mutated string's hash may still be congruent to the previous, unmutated string's hash modulo RKL_CACHE_SIZE, so that both share the same cache set (i.e., ([mutatedString hash] % RKL_CACHE_SIZE) == ([unmutatedString hash] % RKL_CACHE_SIZE)). Even then, the immutable copy of the regular expression string used to create the compiled regular expression is compared to ensure true equality. The newly mutated string will have to go through the whole regular expression compile and cache creation process.
This means that NSMutableString objects can be safely used as regular expressions, and any mutations to those objects will immediately be detected and reflected in the regular expression used for matching.
Searching Mutable Strings
Unfortunately, the ICU regular expression API requires that the compiled regular expression be "set" to the string to be searched. To search a different string, the compiled regular expression must be "set" to the new string. Therefore, RegexKitLite tracks the last NSString that each compiled regular expression was set to, recording the pointer to the NSString object, its hash, and its length. If any of these parameters are different from the last parameters used for a compiled regular expression, the compiled regular expression is "set" to the new string. Since mutating a string will likely change its hash value, it's generally safe to search NSMutableString objects, and in most cases the mutation will reset the compiled regular expression to the updated contents of the NSMutableString.
Caution:
Care must be taken when searching mutable strings if the string may have been mutated between searches. See NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings for more information.
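The "has the target string changed?" test described in this section can be sketched as a comparison of the three recorded values. set_target and needs_reset are hypothetical names; with ICU, the re-set step itself would be a call to uregex_setText(). The sketch also shows why the caution above matters: a mutation that happens to leave the pointer, hash and length unchanged goes undetected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the string a compiled expression was last "set" to:
   the object pointer, its hash and its length, as the text describes. */
typedef struct {
    const void *string_ptr;
    uint32_t    hash;
    size_t      length;
} set_target;

/* Returns true (and updates the record) when the compiled expression must be
   re-"set" before searching, i.e. when any of the three values differ.
   With ICU, the re-set step would be a call to uregex_setText(). */
bool needs_reset(set_target *last, const void *ptr, uint32_t hash, size_t length)
{
    if (last->string_ptr == ptr && last->hash == hash && last->length == length)
        return false;   /* looks like the same string: keep the current setting */
    last->string_ptr = ptr;
    last->hash       = hash;
    last->length     = length;
    return true;
}
```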
Last Match Information
When performing a match, the arguments used to perform the match are kept. If those same arguments are used again, the actual matching operation is skipped because the compiled regular expression already contains the results for the given arguments. This is mostly useful when a regular expression contains multiple capture groups, and the results for different capture groups for the same match are needed. This means that there is only a small penalty for iterating over all the capture groups in a regular expression for a match, and essentially becomes the direct ICU regular expression API equivalent of uregex_start() and uregex_end().
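Reusing the last match can be sketched by remembering the arguments of the previous call together with all of its capture-group offsets. The sketch makes these assumptions:
- POSIX regex stands in for ICU.
- The memo is keyed on the regex and text pointers only, so, like the pointer-based checks described in this section, it would miss an in-place change to the text.
- last_match, capture_range and engine_runs are hypothetical names.
Asking for group 1 and then group 2 of the same match runs regexec() only once.

```c
#include <regex.h>
#include <stdbool.h>

#define MAX_GROUPS 10

/* Hypothetical memo of the last match: the arguments it ran with plus the
   offsets of every capture group it produced. */
typedef struct {
    const regex_t *re;
    const char    *text;
    bool           matched;
    regmatch_t     groups[MAX_GROUPS];
} last_match;

static int engine_runs;   /* how many times regexec() actually ran */

/* Stores the [start, end) byte offsets of capture group `capture` and returns
   true, or returns false if there is no match or the group did not take part.
   Repeated calls with the same regex and text reuse the stored results. */
bool capture_range(last_match *lm, const regex_t *re, const char *text,
                   int capture, regoff_t *start, regoff_t *end)
{
    if (lm->re != re || lm->text != text) {        /* new arguments: run the engine */
        lm->re = re;
        lm->text = text;
        lm->matched = regexec(re, text, MAX_GROUPS, lm->groups, 0) == 0;
        engine_runs++;
    }
    if (!lm->matched || capture < 0 || capture >= MAX_GROUPS ||
        lm->groups[capture].rm_so == -1)
        return false;
    *start = lm->groups[capture].rm_so;
    *end   = lm->groups[capture].rm_eo;
    return true;
}
```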
- ICU4C C API - Regular Expressions
- ICU Regular Expression Syntax
UTF-16 Conversion Cache
RegexKitLite is ideal when the string being matched is a non-ASCII, Unicode string. The regular expression engine it uses, ICU, can only operate on UTF-16 encoded strings. Cocoa keeps essentially all non-ASCII strings encoded in UTF-16 form internally, so RegexKitLite can operate directly on the string's buffer without making a temporary copy and transcoding the string into ICU's required format.
As is usual in object-oriented programming, the internal representation of an object's information is private. However, the ICU regular expression engine requires that the text to be searched be encoded as a UTF-16 string. For pragmatic purposes, Core Foundation has several public functions that can provide direct access to the buffer holding the contents of a string. Such direct access is only available if the private buffer is already encoded in the requested format. As a rough rule of thumb, 8-bit simple strings, such as ASCII, are kept in their 8-bit format, and strings that are not 8-bit simple strings are stored as UTF-16. This is a private implementation detail, so this behavior should never be relied upon. It is mentioned because of the tremendous impact on matching performance and efficiency when a string must be converted to UTF-16.
For strings in which direct access to the UTF-16 string is available, RegexKitLite uses that buffer. This is the ideal case as no extra work needs to be performed, such as converting the string in to a UTF-16 string, and allocating memory to hold the temporary conversion. Of course, direct access is not always available, and occasionally the string to be searched will need to be converted in to a UTF-16 string.
RegexKitLite has two conversion cache types. Each type contains four buffers, which are re-used on a least-recently-used basis. Suppose none of the buffers in the selected cache type holds the contents of the NSString currently being searched. In that case, the least recently used buffer is cleared and the current NSString takes its place.
- The first conversion cache type is fixed in size, set by the C pre-processor define RKL_FIXED_LENGTH, which defaults to 2048. Any string whose length is less than RKL_FIXED_LENGTH uses the fixed-size conversion cache type.
- The second conversion cache type, for strings whose length is longer than RKL_FIXED_LENGTH, uses a dynamically sized conversion buffer. For each conversion, this buffer is resized with realloc() to the size needed to hold the entire UTF-16 converted string.
This strategy was chosen for its relative simplicity. Keeping track of dynamically created resources is required to prevent memory leaks. As designed, there are only four pointers to dynamically allocated memory: the four pointers to hold the conversion contents of strings whose length is larger than RKL_FIXED_LENGTH. However, since realloc() is used to manage those memory allocations, it becomes very difficult to accidentally leak the buffers. Having the fixed sized buffers means that the memory allocation system isn't bothered with many small requests, most of which are transient in nature to begin with. The current strategy tries to strike the best balance between performance and simplicity.
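The size-based choice between the two conversion buffer types can be sketched as follows. This is a simplified, hypothetical version:
- It keeps one buffer of each type instead of four with LRU reuse.
- It does not perform the actual UTF-16 transcoding.
- conversion_buffer and buffer_for_length are illustrative names.
Only RKL_FIXED_LENGTH, its default of 2048, and the use of realloc() come from the text.

```c
#include <stdint.h>
#include <stdlib.h>

#define RKL_FIXED_LENGTH 2048   /* default threshold quoted in the text */

/* Hypothetical single-slot version of the two conversion buffer types. */
typedef struct {
    uint16_t  fixed[RKL_FIXED_LENGTH];  /* for strings shorter than the threshold */
    uint16_t *dynamic;                  /* for longer strings, managed with realloc() */
    size_t    dynamic_capacity;         /* in UTF-16 code units */
} conversion_buffer;

/* Returns storage for `length` UTF-16 code units, or NULL if realloc() fails.
   Short strings never touch the allocator; long strings get a buffer that is
   realloc()ed to exactly the size needed for this conversion. */
uint16_t *buffer_for_length(conversion_buffer *b, size_t length)
{
    if (length < RKL_FIXED_LENGTH)
        return b->fixed;
    if (length != b->dynamic_capacity) {
        uint16_t *p = realloc(b->dynamic, length * sizeof *p);
        if (p == NULL) return NULL;     /* the old buffer is still owned by b */
        b->dynamic = p;
        b->dynamic_capacity = length;
    }
    return b->dynamic;
}
```

Only b->dynamic ever needs freeing, which reflects the text's point that realloc()-managed pointers are hard to leak.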
Mutable Strings
When a string is converted into UTF-16, RegexKitLite records three values: the hash of the NSString, the pointer to the NSString object, and the string's length. RegexKitLite uses the cached conversion only if all of these values equal those of the NSString to be searched. If there is any difference, the cached conversion is discarded, and the current NSString, or NSMutableString as the case may be, is reconverted into a UTF-16 string.
Caution:
Care must be taken when searching mutable strings if the string may have been mutated between searches. See NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings for more information.
Multithreading Safety
RegexKitLite is also multithreading safe. Access to the compiled regular expression cache and the conversion cache is protected by a single OSSpinLock to ensure that only one thread has access at a time. The lock remains held while the regular expression match is performed since the compiled regular expression returned by the ICU library is not safe to use from multiple threads. Once the match has completed, the lock is released, and another thread is free to lock the cache and perform a match.
Important:
While it is safe to use the same regular expression from any thread at any time, the usual multithreading caveats apply. For example, it is not safe to mutate a NSMutableString in one thread while performing a match on the mutating string in another.
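The locking scheme can be sketched with a POSIX mutex standing in for OSSpinLock, which Apple later deprecated. The pattern and names below are hypothetical.

POSIX regexec() does not store match state in the regex_t the way an ICU URegularExpression does. So in this sketch the lock mainly illustrates where the critical section sits: around both the cache access and the match.

```c
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

#define SHARED_PATTERN "[0-9]{3}-[0-9]{4}"   /* hypothetical pattern */

/* A pthread mutex stands in for RegexKitLite's single OSSpinLock. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static regex_t shared_re;      /* the one cached, compiled expression */
static bool    shared_ready;   /* true once shared_re has been compiled */

/* The lock covers both the cache access (lazy compile) and the match itself,
   mirroring the design described above. */
bool locked_match(const char *text)
{
    bool found = false;
    pthread_mutex_lock(&cache_lock);
    if (!shared_ready)
        shared_ready = regcomp(&shared_re, SHARED_PATTERN,
                               REG_EXTENDED | REG_NOSUB) == 0;
    if (shared_ready)
        found = regexec(&shared_re, text, 0, NULL, 0) == 0;
    pthread_mutex_unlock(&cache_lock);
    return found;
}
```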
If Blocks functionality is enabled, and a RegexKitLite method that takes a Block as one of its parameters is used, RegexKitLite takes a slightly different approach in order to support the asynchronous, and possibly re-entrant, nature of Blocks.
First, an autoreleased Block helper proxy object is created and is used to keep track of any Block local resources needed to perform a Block-based enumeration.
Then the regular expression cache is checked exactly as before. Once a compiled regular expression is obtained, the ICU function uregex_clone is used to create a Block local copy of the regular expression. After the Block local copy has been made, the global compiled regular expression cache lock is unlocked.
If the string to be searched requires conversion to UTF-16, then a one time use Block local UTF-16 conversion of the string is created.
These changes mean that RegexKitLite Block-based enumeration methods are just as multithreading safe and easy to use as non-Block-based enumeration methods, such as the ability to continue to use RegexKitLite methods without any restrictions from within the Block used for enumeration.
64-bit Support
RegexKitLite is 64-bit clean. Internally, RegexKitLite uses Cocoa's standard NSInteger and NSUInteger types to represent integer values. The size of these types changes between 32-bit and 64-bit automatically, depending on the target architecture. ICU, on the other hand, uses a signed 32-bit int type for many of its arguments, such as string offsets. Because of this, the maximum length of a string that RegexKitLite will accept is the maximum value a signed 32-bit integer can represent: 2^31 - 1, or roughly 2 billion UTF-16 characters. Strings longer than this limit will raise an NSRangeException. This limitation may matter to those switching to 64-bit precisely because the data they need to process exceeds what 32 bits can represent.
Note:
Several numeric constants throughout this document have either L or UL appended to them, for example 0UL or 2L. This ensures that they are treated as 64-bit long or unsigned long values, respectively, when targeting a 64-bit architecture.
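The limit follows from ICU taking lengths and offsets as int32_t. The largest representable length is 2^31 - 1 = 2,147,483,647 UTF-16 code units. The following hypothetical guard shows the check a wrapper could apply before handing a length to such an API; RegexKitLite itself raises NSRangeException instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ICU's C API takes lengths and offsets as int32_t, so a length must not
   exceed INT32_MAX (2^31 - 1 = 2,147,483,647). Hypothetical helper. */
bool length_fits_icu(size_t length)
{
    return (uint64_t)length <= (uint64_t)INT32_MAX;
}
```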
Using RegexKitLite
The goal of RegexKitLite is not to be a comprehensive Objective-C regular expression framework. Instead, it provides a set of easy-to-use primitives from which additional functionality can be built. To this end, RegexKitLite provides the following two core primitives, from which everything else is built:
- - (NSInteger)captureCountWithOptions:(RKLRegexOptions)options error:(NSError **)error;
- - (NSRange)rangeOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range capture:(NSInteger)capture error:(NSError **)error;
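The first primitive reports how many capture groups a pattern has. In POSIX regex, which stands in for ICU in this hypothetical sketch, regcomp() records that count in the re_nsub field. The name capture_count is illustrative only.

```c
#include <regex.h>

/* Hypothetical analogue of captureCountWithOptions:error: using POSIX regex.
   Returns the number of parenthesized subexpressions, or -1 if the pattern
   does not compile. */
long capture_count(const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    long n = (long)re.re_nsub;   /* filled in by regcomp() */
    regfree(&re);
    return n;
}
```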
There is often a need to create a new string of the characters that were matched by a regular expression. RegexKitLite provides the following method, which conveniently combines sending the receiver substringWithRange: with the range returned by rangeOfRegex:.
RegexKitLite 2.0 adds the ability to split strings by dividing them with a regular expression. It also adds search and replace operations using the common $n substitution syntax. replaceOccurrencesOfRegex:withString: modifies the contents of NSMutableString objects directly, while stringByReplacingOccurrencesOfRegex:withString: creates a new, immutable NSString from the receiver.
- - (NSArray *)componentsSeparatedByRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error;
- - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options withString:(NSString *)replacement range:(NSRange)range error:(NSError **)error;
- - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options withString:(NSString *)replacement range:(NSRange)range error:(NSError **)error;
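Splitting by a regular expression, as componentsSeparatedByRegex: does, keeps the text between successive matches. The following C sketch makes these assumptions:
- POSIX extended regex stands in for ICU.
- split_by_regex and copy_span are hypothetical names.
- Allocation failures are not handled.
- An empty match ends the split.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, NUL-terminated copy of the first n bytes of s. */
static char *copy_span(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p != NULL) { memcpy(p, s, n); p[n] = '\0'; }
    return p;
}

/* Hypothetical sketch in the spirit of componentsSeparatedByRegex: splits
   `text` at every match of `pattern`, storing up to `max` malloc'd pieces in
   `out` (the caller frees them). Returns the number of pieces, or -1 if the
   pattern is invalid or max < 1. Uses POSIX extended regex, not ICU. */
int split_by_regex(const char *pattern, const char *text, char **out, int max)
{
    regex_t re;
    if (max < 1 || regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    int n = 0;
    const char *cursor = text;
    int flags = 0;
    regmatch_t m[1];
    /* Leave room for the final piece; an empty match ends the split. */
    while (n < max - 1 && regexec(&re, cursor, 1, m, flags) == 0 &&
           m[0].rm_eo > m[0].rm_so) {
        out[n++] = copy_span(cursor, (size_t)m[0].rm_so);  /* text before separator */
        cursor += m[0].rm_eo;                             /* skip the separator */
        flags = REG_NOTBOL;
    }
    out[n++] = copy_span(cursor, strlen(cursor));         /* whatever remains */
    regfree(&re);
    return n;
}
```

For example, splitting "a, b,c,   d" on ", *" yields the four pieces "a", "b", "c" and "d".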
RegexKitLite 3.0 adds several new methods that return an NSArray containing the aggregated results of a number of individual regex operations.
- - (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error;
- - (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex options:(RKLRegexOptions)options range:(NSRange)range error:(NSError **)error;
- - (NSArray *)componentsMatchedByRegex:(NSString *)regex range:(NSRange)range;
RegexKitLite 4.0 adds several new methods that take advantage of the new blocks language extension.
- - (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (NSString *)stringByReplacingOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- - (NSUInteger)replaceOccurrencesOfRegex:(NSString *)regex options:(RKLRegexOptions)options inRange:(NSRange)range error:(NSError **)error enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
There are no additional classes that supply the regular expression matching functionality; everything is accomplished with the two methods above. These methods are added to the existing NSString class via an Objective-C category extension. See RegexKitLite NSString Additions Reference for a complete list of methods.
The real workhorse is the rangeOfRegex:options:inRange:capture:error: method. The receiver of the message is an ordinary NSString object that you wish to perform a regular expression match on. The parameters of the method are an NSString containing the regular expression regex, any RKLRegexOptions match options, the NSRange range of the receiver that is to be searched, the capture number from the regular expression regex that you would like the result for, and an optional error parameter that, if a problem occurs, will contain an NSError object with the details of the error.
Important:
The C language assigns special meaning to the \ character inside a quoted " " string in your source code. The \ character is the escape character, and the character that follows it has a different meaning than normal. The most common example is \n, which translates into the new-line character. Because of this, you must 'escape' every use of \ by prepending another \. In practical terms, this means doubling every \ in a regular expression that appears inside a quoted " " string in your source code, which unfortunately is quite common. Failure to do so will result in numerous compiler warnings about unknown escape sequences. Matching a single literal \ with a regular expression requires no less than four backslashes: "\\\\".
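To make the escaping rule concrete, here is a minimal sketch, assuming RegexKitLite.h and RegexKitLite.m are compiled into the project and linked with -licucore as in the compile command later in this section. The helper names SplitOnBackslash and BackslashPatternLength are hypothetical. The sketch shows that the four-backslash literal "\\\\" reaches the regex engine as only two characters, which ICU then reads as one literal backslash.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: splits a Windows-style path on literal backslashes.
// The source literal @"\\\\" becomes the two characters \\ after the compiler
// processes it, and ICU interprets \\ as "match one literal backslash".
static NSArray *SplitOnBackslash(NSString *path) {
    return [path componentsSeparatedByRegex:@"\\\\"];
}

// Hypothetical helper: the length of the pattern as the regex engine receives it.
static NSUInteger BackslashPatternLength(void) {
    return [@"\\\\" length];  // 2, not 4
}
```

The path literal in the usage below is written @"C:\\Temp\\file.txt" for the same reason: at run time it holds C:\Temp\file.txt.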
- ICU Regular Expression Syntax
- RegexKitLite Cookbook
- RegexKitLite NSString Additions Reference
- Regular Expression Options
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Blocks Programming Topics
Finding the Range of a Match
A simple example:
NSString *searchString = @"This is neat.";
NSString *regexString  = @"(\\w+)\\s+(\\w+)\\s+(\\w+)";
NSRange   searchRange  = NSMakeRange(0UL, [searchString length]); // search the entire string
NSRange   matchedRange = NSMakeRange(NSNotFound, 0UL);
NSError  *error        = NULL;

matchedRange = [searchString rangeOfRegex:regexString
                                  options:RKLNoOptions
                                  inRange:searchRange
                                  capture:2L
                                    error:&error];
NSLog(@"matchedRange: %@", NSStringFromRange(matchedRange));
// matchedRange: {5, 2}
In the previous example, the NSRange that capture number 2 matched is {5, 2}, which corresponds to the word 'is' in searchString. Once the NSRange is known, you can create a new string containing just the matching text:
NSString *matchedString = [searchString substringWithRange:matchedRange];
NSLog(@"matchedString: '%@'", matchedString);
// matchedString: 'is'
RegexKitLite can conveniently combine the two steps above with stringByMatching:. This example also demonstrates the use of one of the simpler convenience methods, where some of the arguments are automatically filled in with default values:
NSString *searchString  = @"This is neat.";
NSString *regexString   = @"(\\w+)\\s+(\\w+)\\s+(\\w+)";
NSString *matchedString = [searchString stringByMatching:regexString capture:2L];
NSLog(@"matchedString: '%@'", matchedString);
// matchedString: 'is'
- - rangeOfRegex:
- - stringByMatching:
- ICU Regular Expression Syntax
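The options and error parameters of rangeOfRegex:options:inRange:capture:error: described earlier can be exercised together. The following is a sketch, not the library's own code. It assumes RegexKitLite is linked with -licucore and that, as the error parameter description suggests, a pattern that fails to compile is reported through a non-NULL error argument with the range {NSNotFound, 0}. The helper name FindCaseless is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: finds the first case-insensitive match of `regex` in
// `text`. Returns {NSNotFound, 0} when nothing matches or when the pattern
// cannot be compiled (assumed behavior); then *outError holds the NSError.
static NSRange FindCaseless(NSString *text, NSString *regex, NSError **outError) {
    NSError *error = nil;
    NSRange searchRange = NSMakeRange(0UL, [text length]);  // search the whole receiver
    NSRange found = [text rangeOfRegex:regex
                               options:RKLCaseless  // same effect as starting the pattern with (?i)
                               inRange:searchRange
                               capture:0L           // capture 0 = the entire match
                                 error:&error];
    if (outError != NULL) { *outError = error; }
    return found;
}
```

The malformed pattern (\w+ used in the usage checks is the same one that the compiledRegexCache DTrace output later in this section shows ICU rejecting with U_REGEX_MISMATCHED_PAREN.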
Search and Replace
You can perform search and replace operations on NSString objects and use common $n capture group substitution in the replacement string:
NSString *searchString      = @"This is neat.";
NSString *regexString       = @"\\b(\\w+)\\b";
NSString *replaceWithString = @"{$1}";
NSString *replacedString    = NULL;

replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString
                                                        withString:replaceWithString];
NSLog(@"replaced string: '%@'", replacedString);
// replaced string: '{This} {is} {neat}.'
Important:
Search and replace methods will raise a RKLICURegexException if the replacementString contains $n capture references where n is greater than the number of capture groups in the regular expression.
In this example, the regular expression \b(\w+)\b has a single capture group, which is created with the use of ( ) parentheses. The text that was matched inside the parentheses is available for use in the replacement text by using $n, where n is the number of the parenthesized capture group you would like to use. Additional capture groups are numbered sequentially in the order in which they appear, from left to right. Capture group 0 (zero) is also available and is equivalent to all the text that the regular expression matched.
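Both rules above, $0 for the whole match and the exception raised for an out-of-range $n, can be shown in a short sketch. It assumes RegexKitLite is linked with -licucore and relies on the RKLICURegexException behavior stated in the Important note. The helper names BracketWords and ReplaceOrNil are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: wraps every word in angle brackets. $0 refers to the
// entire text matched by the regex, so no capture group is needed.
static NSString *BracketWords(NSString *text) {
    return [text stringByReplacingOccurrencesOfRegex:@"\\w+" withString:@"<$0>"];
}

// Hypothetical helper: performs the replacement, but returns nil instead of
// propagating RKLICURegexException (raised, for example, when the template
// uses $2 and the regex defines only one capture group).
static NSString *ReplaceOrNil(NSString *text, NSString *regex, NSString *replacement) {
    @try {
        return [text stringByReplacingOccurrencesOfRegex:regex withString:replacement];
    }
    @catch (NSException *exception) {
        if ([[exception name] isEqualToString:RKLICURegexException]) { return nil; }
        @throw;  // unrelated exceptions are re-raised unchanged
    }
    return nil;  // not reached; keeps the compiler's return-path analysis happy
}
```

Catching the exception is only a convenience for templates that come from user input. For templates written in source code, matching $n to the pattern's capture groups is the simpler fix.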
Mutable strings can be manipulated directly:
NSMutableString *mutableString     = [NSMutableString stringWithString:@"This is neat."];
NSString        *regexString       = @"\\b(\\w+)\\b";
NSString        *replaceWithString = @"{$1}";
NSUInteger       replacedCount     = 0UL;

replacedCount = [mutableString replaceOccurrencesOfRegex:regexString
                                              withString:replaceWithString];
NSLog(@"count: %lu string: '%@'", (u_long)replacedCount, mutableString);
// count: 3 string: '{This} {is} {neat}.'
Search and Replace using Blocks
RegexKitLite 4.0 adds support for performing the same search and replace on strings, except that the contents of the replacement string are now created by the Block that is passed as the argument. For each match found in the string, the Block is called and passed the details of the match, which include a C array of NSString objects, one for each capture, along with a C array of NSRange structures with the range information for the current match. The text that was matched is replaced with the NSString object that the Block must return. This gives you complete control over the contents of the replaced text, such as doing complex transformations of the matched text, which is much more flexible and powerful than the simple, fixed replacement functionality provided by stringByReplacingOccurrencesOfRegex:withString:. The example below is essentially the same as the previous search and replace examples, except that it uses the capitalizedString method to capitalize the matched result, which is then used in the string returned as the replacement text. Note that the first letter of each word in replacedString is now capitalized.
NSString *searchString   = @"This is neat.";
NSString *regexString    = @"\\b(\\w+)\\b";
NSString *replacedString = NULL;

replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString
                                                        usingBlock:
  ^NSString *(NSInteger captureCount,
              NSString * const capturedStrings[captureCount],
              const NSRange capturedRanges[captureCount],
              volatile BOOL * const stop) {
    // capturedStrings[1] is the text matched by capture group 1 (\w+)
    return([NSString stringWithFormat:@"{%@}", [capturedStrings[1] capitalizedString]]);
  }];
NSLog(@"replaced string: '%@'", replacedString);
// 2010-04-14 21:00:42.726 test[35053:a0f] replaced string: '{This} {Is} {Neat}.'
- - replaceOccurrencesOfRegex:usingBlock:
- - replaceOccurrencesOfRegex:withString:
- - stringByReplacingOccurrencesOfRegex:usingBlock:
- - stringByReplacingOccurrencesOfRegex:withString:
- ICU Regular Expression Syntax
- ICU Replacement Text Syntax
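The example above uses the immutable stringByReplacingOccurrencesOfRegex:usingBlock:. The mutable counterpart, replaceOccurrencesOfRegex:usingBlock: (declared earlier in this section), edits the receiver in place and returns the number of replacements. This sketch assumes RegexKitLite 4.0, a compiler with Blocks support, and manual reference counting like the document's own examples. The helper name UppercaseWordsInPlace is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: upper-cases every word of `text` in place and returns
// how many matches were replaced. Whatever the Block returns replaces the match.
static NSUInteger UppercaseWordsInPlace(NSMutableString *text) {
    return [text replaceOccurrencesOfRegex:@"\\w+"
                                usingBlock:^NSString *(NSInteger captureCount,
                                                       NSString * const capturedStrings[captureCount],
                                                       const NSRange capturedRanges[captureCount],
                                                       volatile BOOL * const stop) {
        return [capturedStrings[0] uppercaseString];  // capture 0 = the entire match
    }];
}
```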
Splitting Strings
Strings can be split with a regular expression using the componentsSeparatedByRegex: methods. This functionality is nearly identical to the preexisting NSString method componentsSeparatedByString:, except instead of only being able to use a fixed string as a separator, you can use a regular expression:
NSString *searchString = @"This is neat.";
NSString *regexString  = @"\\s+";
NSArray  *splitArray   = NULL;

splitArray = [searchString componentsSeparatedByRegex:regexString];
NSLog(@"splitArray: %@", splitArray);
The output from NSLog() when run from a shell:
shell% ./splitArray
2008-07-01 20:58:39.025 splitArray[69618:813] splitArray: (
    This,
    is,
    "neat."
)
shell%
Unfortunately, our example string @"This is neat." doesn't allow us to show off the power of regular expressions. As you can probably imagine, splitting the string with the regular expression \s+ allows one or more white space characters to be matched. This can be much more flexible than a fixed string of @" ", which splits on a single space only. If our example string contained extra spaces, say @"This   is   neat.", the result would have been the same.
- - componentsSeparatedByRegex:
- ICU Regular Expression Syntax
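The difference from componentsSeparatedByString: shows up as soon as the input contains repeated spaces. A minimal sketch, assuming RegexKitLite is linked with -licucore. The helper names SplitOnSpaceString and SplitOnWhitespaceRegex are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: fixed separator, so every single space is a split point
// and runs of spaces produce empty components.
static NSArray *SplitOnSpaceString(NSString *text) {
    return [text componentsSeparatedByString:@" "];
}

// Hypothetical helper: \s+ treats any run of one or more white space
// characters as a single separator.
static NSArray *SplitOnWhitespaceRegex(NSString *text) {
    return [text componentsSeparatedByRegex:@"\\s+"];
}
```

With @"This   is  neat." (three spaces, then two), the fixed-string split yields six components, three of them empty. The regex split yields the same three words as the original example.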
Creating an Array of Every Match
RegexKitLite 3.0 adds several methods that conveniently perform a number of individual RegexKitLite operations and aggregate the results into a NSArray. Since the result is a NSArray, the standard Cocoa collection enumeration patterns can be used, such as NSEnumerator and Objective-C 2.0's for…in feature. One of the most common tasks is to extract all of the matches of a regular expression from a string. componentsMatchedByRegex: returns the entire text matched by the regular expression (effectively capture group 0), even if the regular expression contains additional capture groups. For example:
NSString *searchString = @"$10.23, $1024.42, $3099";
NSString *regexString  = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray  *matchArray   = NULL;

matchArray = [searchString componentsMatchedByRegex:regexString];
NSLog(@"matchArray: %@", matchArray);
The output from NSLog() when run from a shell:
shell% ./matchArray
2009-05-06 03:20:03.546 matchArray[69939:10b] matchArray: (
    "$10.23",
    "$1024.42",
    "$3099"
)
shell%
As the example above demonstrates, componentsMatchedByRegex: returns the entire text that the regular expression matched even though the regular expression contains capture groups. arrayOfCaptureComponentsMatchedByRegex: can be used if you also need the text that the individual capture groups matched:
NSString *searchString  = @"$10.23, $1024.42, $3099";
NSString *regexString   = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray  *capturesArray = NULL;

capturesArray = [searchString arrayOfCaptureComponentsMatchedByRegex:regexString];
/* capturesArray ==
   [NSArray arrayWithObjects:
     [NSArray arrayWithObjects: @"$10.23",   @"10.23",   @"10",   @"23", NULL],
     [NSArray arrayWithObjects: @"$1024.42", @"1024.42", @"1024", @"42", NULL],
     [NSArray arrayWithObjects: @"$3099",    @"3099",    @"3099", @"",   NULL],
     NULL];
*/
NSLog(@"capturesArray: %@", capturesArray);
The output from NSLog() when run from a shell:
shell% ./capturesArray
2009-05-06 03:25:46.852 capturesArray[69981:10b] capturesArray: (
    ("$10.23", "10.23", 10, 23),
    ("$1024.42", "1024.42", 1024, 42),
    ("$3099", 3099, 3099, "")
)
shell%
- - arrayOfCaptureComponentsMatchedByRegex:
- - captureComponentsMatchedByRegex:
- - componentsMatchedByRegex:
- Using RegexKitLite - Enumerating Matches
- Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements
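When only the first match is of interest, captureComponentsMatchedByRegex: (listed above) returns the same per-match array that arrayOfCaptureComponentsMatchedByRegex: builds for every match. Index 0 is the whole match, and captures that did not take part in the match come back as empty strings, as the $3099 row above shows. A sketch, assuming RegexKitLite 3.0 or later linked with -licucore. The helper name FirstPriceParts is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: capture components of the first "$dollars.cents" amount,
// using the same pattern as the examples above:
//   [0] entire match, [1] amount without "$", [2] dollars, [3] cents (or @"").
static NSArray *FirstPriceParts(NSString *text) {
    return [text captureComponentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];
}
```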
Enumerating Matches
The RegexKitLite componentsMatchedByRegex: method enables you to quickly create a NSArray containing all the matches of a regular expression in a string. To enumerate the contents of the NSArray, you can send the array an objectEnumerator message.
- - componentsMatchedByRegex:
- NSArray Class Reference
- NSEnumerator Class Reference
- Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements
An example using componentsMatchedByRegex: and a NSEnumerator:
File name: main.m
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString     *searchString    = @"one\ntwo\n\nfour\n";
  NSArray      *matchArray      = NULL;
  NSEnumerator *matchEnumerator = NULL;
  NSString     *regexString     = @"(?m)^.*$";  // (?m): ^ and $ match at every line

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  matchArray      = [searchString componentsMatchedByRegex:regexString];
  matchEnumerator = [matchArray objectEnumerator];

  NSUInteger line          = 0UL;
  NSString  *matchedString = NULL;

  while((matchedString = [matchEnumerator nextObject]) != NULL) {
    NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[matchedString length], matchedString);
  }

  [pool release];
  return(0);
}
The following shell transcript demonstrates compiling the example and executing it. Line number three clearly demonstrates that matches of zero length are possible. Without the additional logic in nextObject to handle this special case, the enumerator would never advance past the match.
Note:
In the shell transcript below, the NSLog() line that prints searchString has been annotated with the '⏎' character to help visually identify the corresponding \n new-line characters in searchString.
shell% cd examples
shell% gcc -I.. -g -o main main.m ../RegexKitLite.m -framework Foundation -licucore
shell% ./main
2008-03-21 15:56:17.469 main[44050:807] searchString: 'one⏎
two⏎
⏎
four⏎
'
2008-03-21 15:56:17.520 main[44050:807] regexString : '(?m)^.*$'
2008-03-21 15:56:17.575 main[44050:807] 1: 3 'one'
2008-03-21 15:56:17.580 main[44050:807] 2: 3 'two'
2008-03-21 15:56:17.584 main[44050:807] 3: 0 ''
2008-03-21 15:56:17.590 main[44050:807] 4: 4 'four'
shell%
Enumerating Matches with Objective-C 2.0
You can enumerate all the matches of a regular expression in a string using Objective-C 2.0's for…in feature. Compared to using a NSEnumerator, using for…in not only takes fewer lines of code to accomplish the same thing, it is usually faster as well.
- The Objective-C 2.0 Programming Language - Fast Enumeration
- Collections Programming Topics for Cocoa - Enumerators: Traversing a Collection's Elements
An example using the Objective-C 2.0 for…in feature:
File name: for_in.m
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString   *searchString = @"one\ntwo\n\nfour\n";
  NSString   *regexString  = @"(?m)^.*$";
  NSUInteger  line         = 0UL;

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  for(NSString *matchedString in [searchString componentsMatchedByRegex:regexString]) {
    NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[matchedString length], matchedString);
  }

  [pool release];
  return(0);
}
Note:
The output of the preceding example is identical to the NSEnumerator shell output.
Enumerating Matches using Blocks
A third way to enumerate all the matches of a regular expression in a string is to use one of the Blocks-based enumeration methods.
- - enumerateStringsMatchedByRegex:usingBlock:
- Regular Expression Enumeration Options
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
An example using enumerateStringsMatchedByRegex:usingBlock::
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString *searchString = @"one\ntwo\n\nfour\n";
  NSString *regexString  = @"(?m)^.*$";
  __block NSUInteger line = 0UL;  // __block: the Block increments it

  NSLog(@"searchString: '%@'", searchString);
  NSLog(@"regexString : '%@'", regexString);

  [searchString enumerateStringsMatchedByRegex:regexString
                                    usingBlock:
    ^(NSInteger captureCount,
      NSString * const capturedStrings[captureCount],
      const NSRange capturedRanges[captureCount],
      volatile BOOL * const stop) {
      NSLog(@"%lu: %lu '%@'", (u_long)++line, (u_long)[capturedStrings[0] length], capturedStrings[0]);
    }];

  [pool release];
  return(0);
}
Note:
The output of the preceding example is identical to the NSEnumerator shell output.
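Unlike the NSArray-based approaches, the Block-based enumeration can end early: setting *stop to YES tells RegexKitLite not to deliver further matches once the Block returns. A minimal sketch, assuming RegexKitLite 4.0, Blocks support, and manual reference counting as in the examples above. The helper name FirstLines is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: collects at most `limit` lines of `text`, stopping the
// enumeration as soon as enough lines have been gathered.
static NSArray *FirstLines(NSString *text, NSUInteger limit) {
    NSMutableArray *lines = [NSMutableArray array];
    if (limit == 0UL) { return lines; }
    [text enumerateStringsMatchedByRegex:@"(?m)^.*$"
                              usingBlock:^(NSInteger captureCount,
                                           NSString * const capturedStrings[captureCount],
                                           const NSRange capturedRanges[captureCount],
                                           volatile BOOL * const stop) {
        [lines addObject:capturedStrings[0]];             // capture 0 = the whole line
        if ([lines count] >= limit) { *stop = YES; }      // no further matches after this one
    }];
    return lines;
}
```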
DTrace
Important:
DTrace support is not enabled by default. To enable DTrace support, use the RKL_DTRACE pre-processor flag: -DRKL_DTRACE
RegexKitLite has two DTrace probe points that provide information about its internal caches:
-
RegexKitLite:::compiledRegexCache(
unsigned long eventID,
const char *regexUTF8,
int options,
int captures,
int hitMiss,
int icuStatusCode,
const char *icuErrorMessage,
double *hitRate);
-
RegexKitLite:::utf16ConversionCache(
unsigned long eventID,
unsigned int lookupResultFlags,
double *hitRate,
const void *string,
unsigned long NSRange_location,
unsigned long NSRange_length,
long length);
Each of the probe points supply information via a number of arguments that are accessible through the DTrace variables arg0 … argn.
The first argument, eventID via arg0, is a unique event ID that is incremented each time the RegexKitLite mutex lock is acquired. All the probes that fire while the mutex is held will share the same event ID. This can help if you are trying to correlate multiple events across different CPUs.
Important:
Most uses of the dtrace command require superuser privileges. The examples given here use sudo to execute dtrace as the root user.
The following is available in examples/compiledRegexCache.d and demonstrates the use of all the arguments available via the RegexKitLite:::compiledRegexCache probe point:
File name:compiledRegexCache.d
#!/usr/sbin/dtrace -s

RegexKitLite*:::compiledRegexCache {
  this->eventID         = (unsigned long)arg0;
  this->regexUTF8       = copyinstr(arg1);
  this->options         = (unsigned int)arg2;
  this->captures        = (int)arg3;
  this->hitMiss         = (int)arg4;
  this->icuStatusCode   = (int)arg5;
  this->icuErrorMessage = (arg6 == 0) ? "" : copyinstr(arg6);
  this->hitRate         = (double *)copyin(arg7, sizeof(double));

  printf("%5d: %-60.60s Opt: %#8.8x Cap: %2d Hit: %2d Rate: %6.2f%% code: %5d msg: %s\n",
         this->eventID, this->regexUTF8, this->options, this->captures, this->hitMiss,
         *this->hitRate, this->icuStatusCode, this->icuErrorMessage);
}
Below is an example of the output, which has been trimmed for brevity, from compiledRegexCache.d:
shell% sudo dtrace -Z -q -s compiledRegexCache.d
110: (\[{2})(.+?)(]{2}) Opt: 0x00000000 Cap: 3 Hit: 0 Rate: 63.64% code: 0 msg:
111: (\[{2})(.+?)(]{2}) Opt: 0x00000000 Cap: 3 Hit: 1 Rate: 63.96% code: 0 msg:
131: (\w+ Opt: 0x00000000 Cap: -1 Hit: -1 Rate: 63.36% code: 66310 msg: U_REGEX_MISMATCHED_PAREN
164: \b\s* Opt: 0x00000000 Cap: 0 Hit: 0 Rate: 60.98% code: 0 msg:
165: \$((\d+)(?:\.(\d+)|\.?)) Opt: 0x00000000 Cap: 3 Hit: 1 Rate: 61.21% code: 0 msg:
166: \b(https?)://([a-zA-Z0-9\-.]+)((?:/[a-zA-Z0-9\-._?,'+\&%$… Opt: 0x00000000 Cap: 3 Hit: 0 Rate: 60.84% code: 0 msg:
shell%
An example that prints the number of times that a compiled regular expression was not in the cache per second:
shell% sudo dtrace -Z -q -n 'RegexKitLite*:::compiledRegexCache /arg4 == 0/ { @miss[pid, execname] = count(); }' -n 'tick-1sec { printa("%-8d %-40s %@d/sec\n", @miss); trunc(@miss); }'
67003    RegexKitLite_tests    16/sec
67008    RegexKitLite_tests    50/sec
^C
shell%
- RegexKitLite:::compiledRegexCache
- Solaris Dynamic Tracing Guide (as .PDF)
The following is available in examples/utf16ConversionCache.d and demonstrates the use of all the arguments available via the RegexKitLite:::utf16ConversionCache probe point.
File name:utf16ConversionCache.d
#!/usr/sbin/dtrace -s

enum {
  RKLCacheHitLookupFlag           = 1 << 0,
  RKLConversionRequiredLookupFlag = 1 << 1,
  RKLSetTextLookupFlag            = 1 << 2,
  RKLDynamicBufferLookupFlag      = 1 << 3,
  RKLErrorLookupFlag              = 1 << 4
};

RegexKitLite*:::utf16ConversionCache {
  this->eventID           = (unsigned long)arg0;
  this->lookupResultFlags = (unsigned int)arg1;
  this->hitRate           = (double *)copyin(arg2, sizeof(double));
  this->stringPtr         = (void *)arg3;
  this->NSRange_location  = (unsigned long)arg4;
  this->NSRange_length    = (unsigned long)arg5;
  this->length            = (long)arg6;

  printf("%5lu: flags: %#8.8x {Hit: %d Conv: %d SetText: %d Dyn: %d Error: %d} rate: %6.2f%% string: %#8.8p NSRange {%6lu, %6lu} length: %ld\n",
    this->eventID,
    this->lookupResultFlags,
    (this->lookupResultFlags & RKLCacheHitLookupFlag)           != 0,
    (this->lookupResultFlags & RKLConversionRequiredLookupFlag) != 0,
    (this->lookupResultFlags & RKLSetTextLookupFlag)            != 0,
    (this->lookupResultFlags & RKLDynamicBufferLookupFlag)      != 0,
    (this->lookupResultFlags & RKLErrorLookupFlag)              != 0,
    *this->hitRate,
    this->stringPtr,
    this->NSRange_location,
    this->NSRange_length,
    this->length);
}
Below is an example of the output, which has been trimmed for brevity, from utf16ConversionCache.d:
shell% sudo dtrace -Z -q -s utf16ConversionCache.d
 85: flags: 0x00000000 {Hit: 0 Conv: 0 SetText: 0 Dyn: 0 Error: 0} rate: 59.18% string: 0x0010f530 NSRange { 0, 18} length: 18
 86: flags: 0x00000004 {Hit: 0 Conv: 0 SetText: 1 Dyn: 0 Error: 0} rate: 59.18% string: 0x0010f530 NSRange { 0, 18} length: 18
 87: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 58.00% string: 0x00054930 NSRange { 1, 37} length: 39
 88: flags: 0x00000003 {Hit: 1 Conv: 1 SetText: 0 Dyn: 0 Error: 0} rate: 58.82% string: 0x00054930 NSRange { 1, 37} length: 39
109: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 53.62% string: 0x00054d00 NSRange { 0, 56} length: 56
110: flags: 0x00000006 {Hit: 0 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 52.86% string: 0x00054680 NSRange { 0, 1064} length: 1064
111: flags: 0x00000007 {Hit: 1 Conv: 1 SetText: 1 Dyn: 0 Error: 0} rate: 53.52% string: 0x00054680 NSRange { 46, 978} length: 1064
shell%
An example that prints the number of times that a string required a conversion to UTF-16 and was not in the cache per second:
shell% sudo dtrace -Z -q -n 'RegexKitLite*:::utf16ConversionCache /(arg1 & 0x3) == 0x2/ { @miss[pid, execname] = count(); }' -n 'tick-1sec { printa("%-8d %-40s %@d/sec\n", @miss); trunc(@miss); }'
67020    RegexKitLite_tests    73/sec
67037    RegexKitLite_tests    64/sec
^C
shell%
- RegexKitLite:::utf16ConversionCache
- RegexKitLite:::utf16ConversionCache arg1 Flags
- Solaris Dynamic Tracing Guide (as .PDF)
ICU Syntax
In this section:
- ICU Regular Expression Syntax
- ICU Regular Expression Character Classes
- Unicode Properties
- ICU Replacement Text Syntax
ICU Regular Expression Syntax
For your convenience, the regular expression syntax from the ICU documentation is included below. When in doubt, you should refer to the official ICU User Guide - Regular Expressions documentation page.
- ICU User Guide - Regular Expressions
- Unicode Technical Standard #18 - Unicode Regular Expressions
Operators
Operator          Description
|                 Alternation. A|B matches either A or B.
*                 Match zero or more times. Match as many times as possible.
+                 Match one or more times. Match as many times as possible.
?                 Match zero or one times. Prefer one.
{n}               Match exactly n times.
{n,}              Match at least n times. Match as many times as possible.
{n,m}             Match between n and m times. Match as many times as possible, but not more than m.
*?                Match zero or more times. Match as few times as possible.
+?                Match one or more times. Match as few times as possible.
??                Match zero or one times. Prefer zero.
{n}?              Match exactly n times.
{n,}?             Match at least n times, but no more than required for an overall pattern match.
{n,m}?            Match between n and m times. Match as few times as possible, but not less than n.
*+                Match zero or more times. Match as many times as possible when first encountered; do not retry with fewer even if the overall match fails. Possessive match.
++                Match one or more times. Possessive match.
?+                Match zero or one times. Possessive match.
{n}+              Match exactly n times. Possessive match.
{n,}+             Match at least n times. Possessive match.
{n,m}+            Match between n and m times. Possessive match.
(…)               Capturing parentheses. The range of input that matched the parenthesized subexpression is available after the match.
(?:…)             Non-capturing parentheses. Groups the included pattern, but does not capture the matching text. Somewhat more efficient than capturing parentheses.
(?>…)             Atomic-match parentheses. The first match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the (?> .
(?#…)             Free-format comment (?#comment).
(?=…)             Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?!…)             Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<=…)            Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?<!…)            Negative look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?ismwx-ismwx:…)  Flag settings. Evaluate the parenthesized expression with the specified flags enabled or disabled.
(?ismwx-ismwx)    Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case-insensitive match. See also: Regular Expression Options
- ICU User Guide - Regular Expressions
- Regular Expression Options
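Several rows of the table above can be checked directly against a tiny input. In this sketch, greedy a* gives back one "a" so the final a can match, while the possessive *+ and the atomic (?>…) forms never give characters back. It also demonstrates a bounded look-behind and the inline (?i) flag. It assumes RegexKitLite is linked with -licucore, and every helper name is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helpers: the same three-character input against three quantifier styles.
static BOOL GreedyMatches(void)     { return [@"aaa" isMatchedByRegex:@"^a*a$"]; }      // backtracks: matches
static BOOL PossessiveMatches(void) { return [@"aaa" isMatchedByRegex:@"^a*+a$"]; }     // no backtracking: fails
static BOOL AtomicMatches(void)     { return [@"aaa" isMatchedByRegex:@"^(?>a*)a$"]; }  // same as possessive: fails

// Hypothetical helper: digit runs preceded by "$"; the look-behind is a single,
// bounded character, and the "$" itself is not part of the returned text.
static NSArray *DollarAmounts(NSString *text) {
    return [text componentsMatchedByRegex:@"(?<=\\$)\\d+"];
}

// Hypothetical helper: (?i) turns on case-insensitive matching for the rest of the pattern.
static BOOL MatchesHelloAnyCase(NSString *text) {
    return [text isMatchedByRegex:@"(?i)^hello$"];
}
```

Against @"$10.23, $1024.42, $3099", DollarAmounts returns only 10, 1024 and 3099, because 23 and 42 are preceded by "." rather than "$".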
ICU Regular Expression Character Classes
The following was originally from ICU User Guide - UnicodeSet, but has been adapted to fit the needs of this documentation. Specifically, the ICU UnicodeSet documentation describes an ICU C++ object, UnicodeSet. The term UnicodeSet was effectively replaced with Character Class, which is more appropriate in the context of regular expressions. As always, you should refer to the original, official documentation when in doubt.
- ICU User Guide - UnicodeSet
- UTS #18 Unicode Regular Expressions - Subtraction and Intersection
- UTS #18 Unicode Regular Expressions - Properties
Overview
A character class is a regular expression pattern that represents a set of Unicode characters or character strings. The following table contains some example character class patterns:
Pattern         Description
[a-z]           The lower case letters a through z
[abc123]        The six characters a, b, c, 1, 2, and 3
[\p{Letter}]    All characters with the Unicode General Category of Letter.
String Values
In addition to being a set of Unicode code point characters, a character class may also contain string values. Conceptually, a character class is always a set of strings, not a set of characters. Historically, regular expressions have treated […] character classes as being composed of single characters only, which is equivalent to a string that contains only a single character.
Character Class Patterns
Patterns are a series of characters bounded by square brackets that contain lists of characters and Unicode property sets. Lists are a sequence of characters that may have ranges indicated by a - between two characters, as in a-z. The sequence specifies the range of all characters from the left to the right, in Unicode order. For example, [a c d-f m] is equivalent to [a c d e f m]. Whitespace can be freely used for clarity, as [a c d-f m] means the same as [acd-fm].
Unicode property sets are specified by a Unicode property, such as [:Letter:]. ICU version 2.0 supports General Category, Script, and Numeric Value properties (ICU will support additional properties in the future). For a list of the property names, see the end of this section. The syntax for specifying the property names is an extension of either POSIX or Perl syntax with the addition of =value. For example, you can match letters by using the POSIX syntax [:Letter:], or by using the Perl syntax \p{Letter}. The type can be omitted for the Category and Script properties, but is required for other properties.
The following table lists the standard and negated forms for specifying Unicode properties in both POSIX or Perl syntax. The negated form specifies a character class that includes everything but the specified property. For example, [:^Letter:] matches all characters that are not [:Letter:].
Syntax Style    Standard          Negated
POSIX           [:type=value:]    [:^type=value:]
Perl            \p{type=value}    \P{type=value}
- UTS #18 Unicode Regular Expressions - Properties
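The Perl-style forms in the table above work directly in RegexKitLite patterns, where the backslash must be doubled in the source literal. The sketch below uses the short category names Lu (uppercase letter) and L (any letter), together with the negated \P form. It assumes RegexKitLite is linked with -licucore, and the helper names are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper: every character in General Category Lu (uppercase letter).
static NSArray *UppercaseLetters(NSString *text) {
    return [text componentsMatchedByRegex:@"\\p{Lu}"];
}

// Hypothetical helper: runs of characters that are NOT letters (negated \P form).
static NSArray *NonLetterRuns(NSString *text) {
    return [text componentsMatchedByRegex:@"\\P{L}+"];
}
```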
Character classes can then be modified using the standard set operations: Union, Inverse, Difference, and Intersection.
- To union two sets, simply concatenate them. For example, [[:letter:] [:number:]]
- To intersect two sets, use the & operator. For example, [[:letter:] & [a-z]]
- To take the set-difference of two sets, use the - operator. For example, [[:letter:] - [a-z]]
- To invert a set, place a ^ immediately after the opening [. For example, [^a-z]. In any other location, the ^ does not have a special meaning.
The binary operators & and - have equal precedence and bind left-to-right. Thus [[:letter:]-[a-z]-[\u0100-\u01FF]] is equivalent to [[[:letter:]-[a-z]]-[\u0100-\u01FF]]. As another example, the set [[ace][bdf] - [abc][def]] is not the empty set, but instead the set [def]. This only really matters for the difference operation, as the intersection operation is commutative.
Another caveat with the & and - operators is that they operate between sets. That is, they must be immediately preceded and immediately followed by a set. For example, the pattern [[:Lu:]-A] is illegal, since it is interpreted as the set [:Lu:] followed by the incomplete range -A. To specify the set of uppercase letters except for A, enclose the A in a set: [[:Lu:]-[A]].
Pattern          Description
[a]              The set containing a.
[a-z]            The set containing a through z and all letters in between, in Unicode order.
[^a-z]           The set containing all characters but a through z, that is, U+0000 through a-1 and z+1 through U+FFFF.
[[pat1][pat2]]   The union of sets specified by pat1 and pat2.
[[pat1]&[pat2]]  The intersection of sets specified by pat1 and pat2.
[[pat1]-[pat2]]  The asymmetric difference of sets specified by pat1 and pat2.
[:Lu:]           The set of characters belonging to the given Unicode category. In this case, Unicode uppercase letters. The long form for this is [:UppercaseLetter:].
[:L:]            The set of characters belonging to all Unicode categories starting with L, that is, [[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]. The long form for this is [:Letter:].
- UTS #18 Unicode Regular Expressions - Subtraction and Intersection
String Values in Character Classes
String values are enclosed in {curly brackets}. For example:
Pattern | Description
[abc{def}] | A set containing four members: the single characters a, b, and c, and the string def.
[{abc}{def}] | A set containing two members: the string abc and the string def.
[{a}{b}{c}] and [abc] | These two sets are equivalent. Each contains three items, the three individual characters a, b, and c. A {string} containing a single character is equivalent to that same character specified in any other way.
Character Quoting and Escaping in ICU Character Class Patterns
Single Quote
Two single quotes represent a single quote, either inside or outside single quotes. Text within single quotes is not interpreted in any way, except for two adjacent single quotes. It is taken as literal text— special characters become non-special. These quoting conventions for ICU character classes differ from those of Perl or Java. In those environments, single quotes have no special meaning, and are treated like any other literal character.
Backslash Escapes
Outside of single quotes, certain backslashed characters have special meaning:
Pattern | Description
\uhhhh | Exactly 4 hex digits; h in [0-9A-Fa-f]
\Uhhhhhhhh | Exactly 8 hex digits
\xhh | 1-2 hex digits
\ooo | 1-3 octal digits; o in [0-7]
\a | U+0007 BELL
\b | U+0008 BACKSPACE
\t | U+0009 HORIZONTAL TAB
\n | U+000A LINE FEED
\v | U+000B VERTICAL TAB
\f | U+000C FORM FEED
\r | U+000D CARRIAGE RETURN
\\ | U+005C BACKSLASH
Anything else following a backslash is mapped to itself, except in an environment where it is defined to have some special meaning. For example, \p{Lu} is the set of uppercase letters. Any character formed as the result of a backslash escape loses any special meaning and is treated as a literal. In particular, note that \u and \U escapes create literal characters.
Whitespace
Whitespace, as defined by the ICU API, is ignored unless it is quoted or backslashed.
Property Values
The following property value styles are recognized:
Style | Description
Short | Omits the type and the = sign. Only allowed with the Category and Script properties, where the omission is unambiguous.
Medium | Uses an abbreviated type and value.
Long | Uses a full type and value.
If the type or value is omitted, then the = equals sign is also omitted. The short style is only used for Category and Script properties because these properties are very common and their omission is unambiguous.
In actual practice, you can mix type names and values that are omitted, abbreviated, or full. For example, for Category=Unassigned you could write \p{gc=Unassigned}, \p{Category=Cn}, or \p{Unassigned}.
When these are processed, case and whitespace are ignored so you may use them for clarity, if desired. For example, \p{Category = Uppercase Letter} or \p{Category = uppercase letter}.
For a list of properties supported by ICU, see ICU User Guide - Unicode Properties.
- ICU User Guide - Unicode Properties
- UTS #18 Unicode Regular Expressions - Properties
Unicode Properties
The following tables list some of the commonly used Unicode Properties, which can be matched in a regular expression with \p{Property}. The tables were created from the Unicode 5.2 Unicode Character Database, which is the version used by ICU that ships with Mac OS X 10.6.
Category | Long form
L | Letter
LC | CasedLetter
Lu | UppercaseLetter
Ll | LowercaseLetter
Lt | TitlecaseLetter
Lm | ModifierLetter
Lo | OtherLetter
P | Punctuation
Pc | ConnectorPunctuation
Pd | DashPunctuation
Ps | OpenPunctuation
Pe | ClosePunctuation
Pi | InitialPunctuation
Pf | FinalPunctuation
Po | OtherPunctuation
N | Number
Nd | DecimalNumber
Nl | LetterNumber
No | OtherNumber
M | Mark
Mn | NonspacingMark
Mc | SpacingMark
Me | EnclosingMark
S | Symbol
Sm | MathSymbol
Sc | CurrencySymbol
Sk | ModifierSymbol
So | OtherSymbol
Z | Separator
Zs | SpaceSeparator
Zl | LineSeparator
Zp | ParagraphSeparator
C | Other
Cc | Control
Cf | Format
Cs | Surrogate
Co | PrivateUse
Cn | Unassigned
Script: Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Unknown, Vai, Yi
Extended Property Class: ASCII_Hex_Digit, Alphabetic, Bidi_Control, Dash, Default_Ignorable_Code_Point, Deprecated, Diacritic, Extender, Grapheme_Base, Grapheme_Extend, Grapheme_Link, Hex_Digit, Hyphen, IDS_Binary_Operator, IDS_Trinary_Operator, ID_Continue, ID_Start, Ideographic, Join_Control, Logical_Order_Exception, Lowercase, Math, Noncharacter_Code_Point, Other_Alphabetic, Other_Default_Ignorable_Code_Point, Other_Grapheme_Extend, Other_ID_Continue, Other_ID_Start, Other_Lowercase, Other_Math, Other_Uppercase, Pattern_Syntax, Pattern_White_Space, Quotation_Mark, Radical, STerm, Soft_Dotted, Terminal_Punctuation, Unified_Ideograph, Uppercase, Variation_Selector, White_Space, XID_Continue, XID_Start
Unicode Character Database
Unicode properties are defined in the Unicode Character Database, or UCD. From time to time the UCD is revised and updated. The properties available, and the definition of the characters they match, depend on the UCD that ICU was built with.
Note:
In general, the ICU and UCD versions change with each major operating system release.
- UTS #18 Unicode Regular Expressions - Properties
- UTS #18 Unicode Regular Expressions - Compatibility Properties
- Unicode Character Database
- The Unicode Standard - Unicode 5.2
- Versions of the Unicode Standard
ICU Replacement Text Syntax
Replacement Text Syntax
Character | Description
$n | The text of capture group n will be substituted for $n. n must be ≥ 0 and not greater than the number of capture groups. A $ not followed by a digit has no special meaning, and will appear in the substitution text as itself, a $.
Important: Methods will raise an RKLICURegexException if n is greater than the number of capture groups in the regular expression.
\ | Treat the character following the backslash as a literal, suppressing any special meaning. Backslash escaping in substitution text is only required for $ and \, but may precede any character. The backslash itself will not be copied to the substitution text.
- ICU User Guide - Replacement Text
- - replaceOccurrencesOfRegex:withString:options:range:error:
- - stringByReplacingOccurrencesOfRegex:withString:options:range:error:
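The $n and backslash rules in the table above can be modelled with a short C function. This is a hypothetical helper written only for illustration: expand_replacement is not a RegexKitLite or ICU function. It also assumes single-digit group numbers. Where RegexKitLite would raise RKLICURegexException for an out-of-range group, the sketch returns -1 instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of RegexKitLite or ICU). It expands a
   replacement template using the $n and \ rules above.
   groups[0] is the whole match; groups[1..ncaptures] are the capture groups
   (a NULL entry is treated as an empty string).
   Simplification: only $0-$9 are recognized. A trailing lone backslash is
   copied as-is (an assumption).
   Returns 0 on success. Returns -1 if a group number exceeds ncaptures
   (where RegexKitLite would raise RKLICURegexException) or if out is too small. */
int expand_replacement(const char *tmpl, const char *const groups[],
                       size_t ncaptures, char *out, size_t outsz) {
    size_t len = 0;
    if (outsz == 0) return -1;
    for (const char *p = tmpl; *p; p++) {
        const char *piece;
        size_t piecelen;
        if (*p == '\\' && p[1] != '\0') {
            piece = ++p;                 /* escaped character, copied literally */
            piecelen = 1;
        } else if (*p == '$' && p[1] >= '0' && p[1] <= '9') {
            size_t n = (size_t)(p[1] - '0');
            if (n > ncaptures) return -1;
            piece = groups[n] ? groups[n] : "";
            piecelen = strlen(piece);
            p++;                         /* skip the digit */
        } else {
            piece = p;                   /* ordinary char, or $ not followed by a digit */
            piecelen = 1;
        }
        if (len + piecelen + 1 > outsz) return -1;
        memcpy(out + len, piece, piecelen);
        len += piecelen;
    }
    out[len] = '\0';
    return 0;
}
```

For example, with groups {"john@example.com", "john", "example.com"}, the template "$2 <- $1" expands to "example.com <- john". The template "\$1" yields the literal text "$1".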
RegexKitLite Cookbook
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
Jamie Zawinski
This section contains a collection of regular expressions and example code demonstrating how RegexKitLite makes some common programming chores easier. RegexKitLite makes it easy to match part of a string and extract just that part, or even create an entirely new string from just a few pieces of the original string. A good example is a string that contains a URL from which you need to extract just one part, perhaps the host or just the port. This example demonstrates how easy it is to extract the port from a URL and convert it into an NSInteger value:
NSString *searchString = @"http://www.example.com:8080/index.html";
NSString *regexString = @"\\bhttps?://[a-zA-Z0-9\\-.]+(?::(\\d+))?(?:(?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";
// Capture group 1 is the optional port number.
NSInteger portInteger = [[searchString stringByMatching:regexString capture:1L] integerValue];
NSLog(@"portInteger: '%ld'", (long)portInteger);
// Logs: portInteger: '8080'
Inside you'll find more examples like this that you can use as the starting point for your own regular expression pattern matching solution. Keep in mind that these are meant to be examples to help get you started and not necessarily the ideal solution for every need. Trade-offs are usually made when creating a regular expression; matching an email address is a perfect example of this. A regular expression that precisely matches the formal definition of an email address is both complicated and usually unnecessary. Knowing which trade-offs are acceptable requires that you understand what it is you're trying to match, the data that you're searching through, and the requirements and uses of the matched results. It won't take long until you gain an appreciation for Jamie Zawinski's infamous quote.
- O'Reilly - Mastering Regular Expressions, 3rd edition by Jeffrey Friedl
- RegExLib.com - Regular Expression Library
- ICU Userguide - Regular Expressions
- Regular-Expressions.info - Regex Tutorial, Examples, and Reference
- Wikipedia - Regular Expression
Pattern Matching Recipes
Numbers
Integer: [+\-]?[0-9]+ (examples: 123 -42 +23)
Hex Number: 0[xX][0-9a-fA-F]+ (examples: 0x0 0xdeadbeef 0xF3)
Floating Point: [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.) (examples: 123. .123 +.42)
Floating Point with Exponent: [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+\-]?[0-9]+)? (examples: 123. .123 10.0E13 1.23e-7)
Comma Separated Number: [0-9]{1,3}(?:,[0-9]{3})* (examples: 42 1,234 1,234,567)
Comma Separated Number with Decimals: [0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)? (examples: 42 1,234 1,234,567.89)
Extracting and Converting Numbers
NSString includes several methods for converting the contents of the string into a numeric value of one of the C primitive types. The following demonstrates matching an int and a double in an NSString, and then converting the matched string into its base type.
Integer conversion…
NSString *searchString = @"The int 5542 to convert";
NSString *regexString = @"([+\\-]?[0-9]+)";
int matchedInt = [[searchString stringByMatching:regexString capture:1L] intValue];
The variable matchedInt now contains the value of 5542.
Floating Point conversion…
NSString *searchString = @"The double 4321.9876 to convert";
NSString *regexString = @"([+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.))";
double matchedDouble = [[searchString stringByMatching:regexString capture:1L] doubleValue];
The variable matchedDouble now contains the value of 4321.9876. doubleValue can even convert numbers written in scientific notation, which represents a number as n × 10^exp:
Floating Point conversion…
NSString *searchString = @"The double 1.010489e5 to convert";
NSString *regexString = @"([+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.)(?:[eE][+\\-]?[0-9]+)?)";
double matchedDouble = [[searchString stringByMatching:regexString capture:1L] doubleValue];
The variable matchedDouble now contains the value of 101048.9.
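Outside of RegexKitLite, the same match-then-convert idea can be sketched in plain C with the POSIX <regex.h> API (regcomp/regexec). This is a swapped-in engine, not ICU. It uses leftmost-longest matching and has no non-capturing groups, so the patterns below are ERE translations of the Integer and Floating Point with Exponent rows. The helper names first_match, first_long and first_double are hypothetical.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* POSIX ERE translations of two rows of the Numbers table. */
#define INT_ERE   "[+-]?[0-9]+"
#define FLOAT_ERE "[+-]?([0-9]*\\.[0-9]+|[0-9]+\\.)([eE][+-]?[0-9]+)?"

/* Copies the first substring of subject matching pattern into buf.
   Returns 1 on a match, 0 on no match, -1 on error or if buf is too small. */
int first_match(const char *subject, const char *pattern, char *buf, size_t bufsz) {
    regex_t re;
    regmatch_t m[1];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    int rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc == REG_NOMATCH) return 0;
    if (rc != 0) return -1;
    size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
    if (len + 1 > bufsz) return -1;
    memcpy(buf, subject + m[0].rm_so, len);
    buf[len] = '\0';
    return 1;
}

/* Like [[s stringByMatching:... capture:...] intValue], but with a fallback. */
long first_long(const char *subject, long fallback) {
    char buf[64];
    return first_match(subject, INT_ERE, buf, sizeof buf) == 1
               ? strtol(buf, NULL, 10) : fallback;
}

/* strtod() also understands scientific notation such as 1.010489e5. */
double first_double(const char *subject, double fallback) {
    char buf[64];
    return first_match(subject, FLOAT_ERE, buf, sizeof buf) == 1
               ? strtod(buf, NULL) : fallback;
}
```

For example, first_long("The int 5542 to convert", -1) returns 5542. first_double("The double 1.010489e5 to convert", 0.0) returns roughly 101048.9.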
Extracting and Converting Hex Numbers
Converting a string that contains a hex number into a more basic type, such as an int, takes a little more work. Unfortunately, Foundation does not provide an easy way to convert a hex value in a string into a more basic type as it does with intValue or doubleValue. Thankfully, the standard C library provides a set of functions for performing such a conversion. For this example we will use the strtol() (string to long) function to convert the hex value we've extracted from searchString. We cannot pass the pointer to the NSString object that contains the matched hex value, because strtol() is part of the standard C library and only works on pointers to C strings. Instead, we use the UTF8String method to get a pointer to a compatible C string of the matched hex value.
Hex conversion…
NSString *searchString = @"A hex value: 0x0badf00d";
NSString *regexString = @"\\b(0[xX][0-9a-fA-F]+)\\b";
NSString *hexString = [searchString stringByMatching:regexString capture:1L];
// hexString is @"0x0badf00d"; strtol() with base 16 accepts the 0x prefix.
long hexLong = strtol([hexString UTF8String], NULL, 16);
NSLog(@"hexLong: 0x%lx / %ld", (u_long)hexLong, hexLong);
// Logs: hexLong: 0xbadf00d / 195948557
The full set of string to… functions is: strtol(), strtoll(), strtoul(), and strtoull(). These convert a string value, in any base from 2 to 36, into a long, long long, unsigned long, and unsigned long long, respectively.
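strtol() reports failure only through its end pointer and errno, so its plain return value cannot distinguish "0" from "no number at all". A slightly more defensive version of the hex conversion might look like the following sketch. parse_hex is a hypothetical helper. It relies on documented strtol() behaviour: with base 16, strtol() skips leading whitespace and accepts an optional 0x or 0X prefix.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses hexadecimal text such as "0x0badf00d" or "ff".
   Returns 1 and stores the value in *out on success. Returns 0 if no hex
   digits were found or the value does not fit in a long. */
int parse_hex(const char *s, long *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 16);
    if (end == s || errno == ERANGE) return 0;  /* nothing parsed, or overflow */
    *out = v;
    return 1;
}
```

The same end-pointer and errno check applies to strtoll(), strtoul() and strtoull(). Other bases work the same way, for example base 2 or base 36.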
Adding Hex Value Conversions to NSString
Since it seems to be a frequently asked question, and a common search engine query for RegexKit web site visitors, here is an NSString category addition that converts the receiver's text into an NSInteger value. This is the same functionality as intValue or doubleValue, except that it converts hexadecimal text values instead of decimal text values.
Note:
The following code can also be found in the RegexKitLite distributions examples/ directory.
The example conversion code is fairly quick since it uses Core Foundation directly along with the stack to hold any temporary string conversions. Any whitespace at the beginning of the string will be skipped and the hexadecimal text to be converted may be optionally prefixed with either 0x or 0X. Returns 0 if the receiver does not begin with a valid hexadecimal text representation. Refer to strtol(3) for additional conversion details.
Important:
If the receiver needs to be converted into an encoding that is compatible with strtol(), only the first 60 bytes of its UTF-8 representation are converted.
File name:NSString-HexConversion.h
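#import <Foundation/Foundation.h>

@interface NSString (HexConversion)
- (NSInteger)hexValue;
@end
File name:NSString-HexConversion.m
#import "NSString-HexConversion.h"
#import <CoreFoundation/CoreFoundation.h>
#include <stdlib.h> // strtol()

@implementation NSString (HexConversion)
- (NSInteger)hexValue
{
  // NSString is toll-free bridged to CFStringRef; __bridge also compiles without ARC.
  CFStringRef cfSelf = (__bridge CFStringRef)self;
  UInt8 buffer[64];
  const char *cptr;

  // Fast path: use the string's internal C string if one is available.
  if((cptr = CFStringGetCStringPtr(cfSelf, kCFStringEncodingMacRoman)) == NULL) {
    // Slow path: convert at most 60 bytes to UTF-8 on the stack and NUL-terminate.
    CFRange range = CFRangeMake(0L, CFStringGetLength(cfSelf));
    CFIndex usedBytes = 0L;
    CFStringGetBytes(cfSelf, range, kCFStringEncodingUTF8, '?', false, buffer, 60L, &usedBytes);
    buffer[usedBytes] = 0;
    cptr = (const char *)buffer;
  }
  return((NSInteger)strtol(cptr, NULL, 16));
}
@end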
- strtol(3)
- - intValue
- - doubleValue
Text Files
Empty Line: (?m:^$)
Empty or Whitespace Only Line: (?m-s:^\s*$)
Strip Leading Whitespace: (?m-s:^\s*(.*?)$)
Strip Trailing Whitespace: (?m-s:^(.*?)\s*$)
Strip Leading and Trailing Whitespace: (?m-s:^\s*(.*?)\s*$)
Quoted String, Can Span Multiple Lines, May Contain \": "(?:[^"\\]*+|\\.)*"
Quoted String, Single Line Only, May Contain \": "(?:[^"\\\r\n]*+|\\[^\r\n])*"
HTML Comment: (?s:<!--.*?-->)
Perl / Shell Comment: (?m-s:#.*$)
C, C++, or ObjC // Comment: (?m-s://.*$)
C, C++, or ObjC // Comment and Leading Whitespace: (?m-s:\s*//.*$)
C, C++, or ObjC /* */ Comment: (?s:/\*.*?\*/)
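For comparison, the Strip Leading and Trailing Whitespace recipe, applied to a single line, can be written in plain C without any regular expression engine. This is a swapped-in technique, not what RegexKitLite does, and trim_span and is_blank_line are hypothetical names. Note that isspace() only recognizes ASCII whitespace in the C locale, which is narrower than ICU's Unicode-aware \s.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first non-whitespace character of line and stores
   the length of the text without trailing whitespace in *len. This is the
   same span that capture group 1 of ^\s*(.*?)\s*$ would cover for one line. */
const char *trim_span(const char *line, size_t *len) {
    while (*line && isspace((unsigned char)*line)) line++;
    size_t n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1])) n--;
    *len = n;
    return line;
}

/* Equivalent of the Empty or Whitespace Only Line recipe for a single line. */
int is_blank_line(const char *line) {
    size_t n;
    trim_span(line, &n);
    return n == 0;
}
```

trim_span returns a pointer into the original buffer instead of allocating a copy. This mirrors how a capture group only records a range of the subject string.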
The Newline Debacle
Unfortunately, when processing text files, there is no standard 'newline' character or character sequence. Today this most commonly surfaces when converting text between Microsoft Windows / MS-DOS and Unix / Mac OS X. The reason for the proliferation of newline standards is largely historical and goes back many decades. Below is a table of the dominant newline character sequence 'standards':
Description | Sequence | C String | Control | Common Uses
Line Feed | \u000A | \n | ^J | Unix, Amiga, Mac OS X
Vertical Tab | \u000B | \v | ^K |
Form Feed | \u000C | \f | ^L |
Carriage Return | \u000D | \r | ^M | Apple ][, Mac OS ≤ 9
Next Line (NEL) | \u0085 | | | IBM / EBCDIC
Line Separator | \u2028 | | | Unicode
Paragraph Separator | \u2029 | | | Unicode
Carriage Return + Line Feed | \u000D\u000A | \r\n | ^M^J | MS-DOS, Windows
Ideally, one should be flexible enough to accept any of these character sequences if one has to process text files, especially if the origin of those text files is not known. Thankfully, regular expressions excel at just such a task. Below is a regular expression pattern that will match any of the above character sequences. This is also the character sequence that the metacharacter $ matches.
Any newline: (?:\r\n|[\n\v\f\r\x85\p{Zl}\p{Zp}]) (recommended by UTS #18; this is the character sequence that $ matches)
- UTS #18: Unicode Regular Expressions - Line Boundaries
- Wikipedia - Newline
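To show exactly which byte sequences the Any newline pattern accepts, the sketch below hand-codes the same alternatives for UTF-8 text in plain C. newline_len and count_lines are hypothetical helpers, not part of RegexKitLite. U+0085, U+2028 and U+2029 appear as their UTF-8 encodings: C2 85, E2 80 A8 and E2 80 A9. \p{Zl} and \p{Zp} each contain a single code point, U+2028 and U+2029 respectively.

```c
#include <stddef.h>

/* Length in bytes of the newline sequence at the start of s (UTF-8), or 0.
   \r\n is tested first so that it counts as a single line break, just as
   the \r\n alternative comes first in the regular expression. */
size_t newline_len(const unsigned char *s) {
    if (s[0] == '\r' && s[1] == '\n') return 2;
    if (s[0] == '\n' || s[0] == '\v' || s[0] == '\f' || s[0] == '\r') return 1;
    if (s[0] == 0xC2 && s[1] == 0x85) return 2;                 /* U+0085 NEL */
    if (s[0] == 0xE2 && s[1] == 0x80 && (s[2] == 0xA8 || s[2] == 0xA9))
        return 3;                                               /* U+2028, U+2029 */
    return 0;
}

/* Number of pieces the text splits into at any newline sequence.
   A trailing newline yields a final empty piece. */
size_t count_lines(const char *text) {
    const unsigned char *p = (const unsigned char *)text;
    size_t lines = 1;
    while (*p) {
        size_t n = newline_len(p);
        if (n) { lines++; p += n; } else { p++; }
    }
    return lines;
}
```

Testing \r\n before the single-character alternatives matters. Otherwise a Windows line ending would be counted as two line breaks with an empty line between them.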
Matching the Beginning and End of a Line
It is often necessary to work with the individual lines of a file. There are two regular expression metacharacters, ^ and $, that match the beginning and end of a line, respectively. However, exactly what ^ and $ match depends on whether or not the multi-line option is enabled for the regular expression; by default it is disabled. It can be enabled for the entire regular expression by passing RKLMultiline via the options: method argument, or within the regular expression using the options syntax (?m:…).
If multi-line is disabled, then ^ and $ match the beginning and end of the entire string. If there is a newline character sequence at the very end of the string, then $ also matches at the position just before that final newline sequence. ^ and $ never match next to newline character sequences in the middle of the string.
If multi-line is enabled, then ^ and $ match the beginning and end of each line, where a line ends at a newline character sequence. The metacharacter ^ matches at the start of the string or just after a newline character sequence. The metacharacter $ matches at the end of the string or just before a newline character sequence.
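The effect of the multi-line option can be tried out with the POSIX <regex.h> API, where the REG_NEWLINE flag plays a role similar to RKLMultiline. This is an analogy rather than an equivalent, for three reasons. POSIX only treats \n as a line break. Without REG_NEWLINE, $ does not match before a trailing newline. REG_NEWLINE also stops . from matching \n. The matches helper is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if pattern (POSIX ERE) matches anywhere in subject, 0 if it does
   not, and -1 on error. multiline != 0 turns on REG_NEWLINE, so ^ and $ also
   match just after and just before each '\n'. */
int matches(const char *subject, const char *pattern, int multiline) {
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB | (multiline ? REG_NEWLINE : 0);
    if (regcomp(&re, pattern, flags) != 0) return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```

For example, matches("first\nsecond", "^second$", 0) returns 0, while the same call with multiline set to 1 returns 1.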
Creating an NSArray Containing Every Line in a String
A common text processing pattern is to process a file one line at a time. Using the recommended regular expression for matching any newline and the componentsSeparatedByRegex: method, you can easily create an NSArray containing every line in a file and process it one line at a time:
Process every line…
NSString *fileNameString = @"example";
// \302\205 is U+0085 (NEL) written as UTF-8 octal escapes.
NSString *regexString = @"(?:\r\n|[\n\v\f\r\302\205\\p{Zl}\\p{Zp}])";
NSError *error = nil;
NSString *fileString = [NSString stringWithContentsOfFile:fileNameString
                                             usedEncoding:NULL
                                                    error:&error];
if(fileString) {
  NSArray *linesArray = [fileString componentsSeparatedByRegex:regexString];
  for(NSString *lineString in linesArray) {
    // Process lineString here; it does not end with a newline.
  }
} else {
  NSLog(@"Error reading file '%@'", fileNameString);
  if(error) { NSLog(@"Error: %@", error); }
}
The componentsSeparatedByRegex: method effectively 'chops off' the matched regular expression, or in this case any newline character. In the example above, within the for…in loop, lineString will not have a newline character at the end of the string.
Parsing CSV Data
Split CSV line: ,(?=(?:(?:[^"\\]*+|\\")*"(?:[^"\\]*+|\\")*")*(?!(?:[^"\\]*+|\\")*"(?:[^"\\]*+|\\")*$))
This regular expression works by ensuring that an even number of unescaped " quotes follow a , comma. It does this with look-ahead assertions. The first look-ahead assertion, (?=, matches zero or more strings that each contain two " characters. Then a negative look-ahead assertion rejects the comma if a single, unpaired " quote character remains before the $ end of the line. The pattern also uses possessive matches in the form of *+ for speed, which prevents the regular expression engine from backtracking excessively. It's certainly not a beginner's regular expression.
The following is used as a substitute for a CSV data file in the example below.
Example CSV data…
NSString *csvFileString
= @"RegexKitLite,1.0,\"Mar 23, 2008\",27004\n"
@"RegexKitLite,1.1,\"Mar 28, 2008\",28081\n"
@"RegexKitLite,1.2,\"Apr 01, 2008\",28765\n"
@"RegexKitLite,2.0,\"Jul 07, 2008\",40569\n"
@"RegexKitLite,2.1,\"Jul 12, 2008\",40660\n";
This example really highlights the power of regular expressions when it comes to processing text. It takes just 17 lines, including comments, to parse a CSV data file with any newline type and create a row-by-column NSArray of the results. It also correctly handles " quoted values, including escaped \" quotes.
Parse CSV data…
NSString *newlineRegex =
  @"(?:\r\n|[\n\v\f\r\\x85\\p{Zl}\\p{Zp}])";
NSString *splitCSVLineRegex =
  @",(?=(?:(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*\")*(?!(?:[^\"\\\\]*+|\\\\\")*\"(?:[^\"\\\\]*+|\\\\\")*$))";

// Split the file into lines, accepting any newline sequence.
NSArray *csvLinesArray = [csvFileString componentsSeparatedByRegex:newlineRegex];

// Temporary C array for the per-line results, sized for the worst case.
id splitLines[[csvLinesArray count]];
NSUInteger splitLinesIndex = 0UL;

for(NSString *csvLineString in csvLinesArray) {
  // Skip empty and whitespace-only lines.
  if([csvLineString isMatchedByRegex:@"^\\s*$"]) { continue; }
  // Split the line on commas that are not inside a "..." quoted value.
  splitLines[splitLinesIndex++] = [csvLineString componentsSeparatedByRegex:splitCSVLineRegex];
}

// Rows of columns: an NSArray whose elements are NSArrays of field strings.
NSArray *splitLinesArray = [NSArray arrayWithObjects:&splitLines[0]
                                               count:splitLinesIndex];
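The comma-splitting rule that the look-ahead regex encodes can also be written as a small state machine, which may be easier to follow. The plain-C sketch below is a swapped-in technique rather than a translation of the regex. Instead of counting the quotes that follow each comma, it tracks whether it is currently inside a quoted value. For lines whose quotes are balanced, both approaches give the same split. Field and split_csv_line are hypothetical names.

```c
#include <stddef.h>

/* One field of a CSV line: a pointer into the line plus a length. */
typedef struct { const char *start; size_t len; } Field;

/* Splits line at every comma outside a "..." quoted value, treating \" as an
   escaped quote. Like componentsSeparatedByRegex:, fields keep their quotes.
   Stores at most max fields and returns the total field count, which may
   exceed max; in that case only the first max fields were stored. */
size_t split_csv_line(const char *line, Field *fields, size_t max) {
    size_t count = 0;
    int in_quotes = 0;
    const char *field_start = line;
    for (const char *p = line; ; p++) {
        if (*p == '\\' && p[1] == '"') { p++; continue; }   /* skip escaped quote */
        if (*p == '"') { in_quotes = !in_quotes; continue; }
        if (*p == '\0' || (*p == ',' && !in_quotes)) {
            if (count < max) {
                fields[count].start = field_start;
                fields[count].len = (size_t)(p - field_start);
            }
            count++;
            if (*p == '\0') break;
            field_start = p + 1;
        }
    }
    return count;
}
```

Given the first line of the sample data, RegexKitLite,1.0,"Mar 23, 2008",27004, the function reports four fields. The comma inside "Mar 23, 2008" is skipped because it falls inside quotes.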
Network and URL
HTTP: \bhttps?://[a-zA-Z0-9\-.]+(?:(?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
HTTP, capturing protocol, host, and path: \b(https?)://([a-zA-Z0-9\-.]+)((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
HTTP, capturing protocol, user, password, host, port, and path: \b(https?)://(?:(\S+?)(?::(\S+?))?@)?([a-zA-Z0-9\-.]+)(?::(\d+))?((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)?
E-Mail: \b([a-zA-Z0-9%_.+\-]+)@([a-zA-Z0-9.\-]+?\.[a-zA-Z]{2,6})\b
Hostname: \b(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6}\b
IP: \b(?:\d{1,3}\.){3}\d{1,3}\b
IP with Optional Netmask: \b((?:\d{1,3}\.){3}\d{1,3})(?:/(\d{1,2}))?\b
IP or Hostname: \b(?:(?:\d{1,3}\.){3}\d{1,3}|(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6})\b
The following example demonstrates how to match several fields in a URL and create a NSDictionary with the extracted results. Only the capture groups that result in a successful match will create a corresponding key in the dictionary.
HTTP URL…
NSString *searchString =
  @"http://johndoe:secret@www.example.com:8080/private/mail/index.html";
NSString *regexString =
  @"\\b(https?)://(?:(\\S+?)(?::(\\S+?))?@)?([a-zA-Z0-9\\-.]+)(?::(\\d+))?((?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";

if([searchString isMatchedByRegex:regexString]) {
  // Extract each capture group: 1 protocol, 2 user, 3 password, 4 host, 5 port, 6 path.
  NSString *protocolString = [searchString stringByMatching:regexString capture:1L];
  NSString *userString     = [searchString stringByMatching:regexString capture:2L];
  NSString *passwordString = [searchString stringByMatching:regexString capture:3L];
  NSString *hostString     = [searchString stringByMatching:regexString capture:4L];
  NSString *portString     = [searchString stringByMatching:regexString capture:5L];
  NSString *pathString     = [searchString stringByMatching:regexString capture:6L];

  // Only add a key when the corresponding capture produced a string.
  NSMutableDictionary *urlDictionary = [NSMutableDictionary dictionary];
  if(protocolString) { [urlDictionary setObject:protocolString forKey:@"protocol"]; }
  if(userString)     { [urlDictionary setObject:userString     forKey:@"user"];     }
  if(passwordString) { [urlDictionary setObject:passwordString forKey:@"password"]; }
  if(hostString)     { [urlDictionary setObject:hostString     forKey:@"host"];     }
  if(portString)     { [urlDictionary setObject:portString     forKey:@"port"];     }
  if(pathString)     { [urlDictionary setObject:pathString     forKey:@"path"];     }

  NSLog(@"urlDictionary: %@", urlDictionary);
}
RegexKitLite 4.0 adds a new method, dictionaryByMatchingRegex:…, that makes the creation of NSDictionary objects like this much easier, as the following example demonstrates:
RegexKitLite ≥ 4.0 example…
NSString *searchString =
  @"http://johndoe:secret@www.example.com:8080/private/mail/index.html";
NSString *regexString =
  @"\\b(https?)://(?:(\\S+?)(?::(\\S+?))?@)?([a-zA-Z0-9\\-.]+)(?::(\\d+))?((?:/[a-zA-Z0-9\\-._?,'+\\&%$=~*!():@\\\\]*)+)?";

// The key/capture-number list is terminated by NULL.
NSDictionary *urlDictionary = [searchString dictionaryByMatchingRegex:regexString
                                                  withKeysAndCaptures:@"protocol", 1,
                                                                      @"user",     2,
                                                                      @"password", 3,
                                                                      @"host",     4,
                                                                      @"port",     5,
                                                                      @"path",     6,
                                                                      NULL];
if(urlDictionary != NULL) {
  NSLog(@"urlDictionary: %@", urlDictionary);
}
Note:
Other than the difference in mutability for the dictionary containing the result, the RegexKitLite 4.0 dictionaryByMatchingRegex:… example produces the same result as the more verbose, pre-4.0 example.
These examples can form the basis of a function or method that takes a NSString as an argument and returns a NSDictionary as a result, maybe even as a category addition to NSString. The following is the output when the examples above are compiled and run:
shell% ./http_example
2008-09-01 10:57:55.245 test_nsstring[31306:807] urlDictionary: {
    host = "www.example.com";
    password = secret;
    path = "/private/mail/index.html";
    port = 8080;
    protocol = http;
    user = johndoe;
}
shell%
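Without RegexKitLite, a comparable field extraction can be sketched in plain C with POSIX <regex.h>. This is an assumption-laden simplification of the third HTTP pattern, not a port of it. POSIX ERE has no \b, \S or (?:…), so the pattern below is anchored with ^ and $ and uses [^[:space:]] in place of \S. Its capture group numbers, named by the hypothetical URL_* constants, differ from the Objective-C example.

```c
#include <regex.h>
#include <string.h>

/* Simplified POSIX ERE version of the "HTTP with user, password, host, port,
   and path" pattern, anchored to the whole string (an assumption). */
#define URL_ERE "^(https?)://(([^:@/[:space:]]+)(:([^@/[:space:]]+))?@)?" \
                "([a-zA-Z0-9.-]+)(:([0-9]+))?(/[^[:space:]]*)?$"

/* Hypothetical group numbers for URL_ERE; every (...) group counts in ERE. */
enum { URL_PROTOCOL = 1, URL_USER = 3, URL_PASSWORD = 5, URL_HOST = 6,
       URL_PORT = 8, URL_PATH = 9, URL_NGROUPS = 10 };

/* Copies capture group `group` of url into buf. Returns 1 if the group matched.
   Returns 0 if the URL matched but the group took no part in the match, which
   corresponds to a missing dictionary key in the Objective-C example.
   Returns -1 if the URL does not match or on error. */
int url_field(const char *url, int group, char *buf, size_t bufsz) {
    regex_t re;
    regmatch_t m[URL_NGROUPS];
    if (group < 0 || group >= URL_NGROUPS) return -1;
    if (regcomp(&re, URL_ERE, REG_EXTENDED) != 0) return -1;
    int rc = regexec(&re, url, URL_NGROUPS, m, 0);
    regfree(&re);
    if (rc != 0) return -1;
    if (m[group].rm_so == -1) return 0;          /* group did not participate */
    size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
    if (len + 1 > bufsz) return -1;
    memcpy(buf, url + m[group].rm_so, len);
    buf[len] = '\0';
    return 1;
}
```

Recompiling the pattern on every call keeps the sketch short. Real code would compile it once and reuse the regex_t, which is similar to how RegexKitLite caches compiled expressions.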
Adding RegexKitLite to your Project
Note:
The following outlines a typical set of steps that one would perform. This is not the only way, nor the required way to add RegexKitLite to your application. They may not be correct for your project as each project is unique. They are an overview for those unfamiliar with adding additional shared libraries to the list of libraries your application links against.
Outline of Required Steps
The following outlines the steps required to use RegexKitLite in your project.
- Link your application to the ICU dynamic shared library.
- Add the RegexKitLite.m and RegexKitLite.h files to your project and application target.
- Import the RegexKitLite.h header.
- Xcode Build System Guide - Linking
Adding RegexKitLite using Xcode
Important:
These instructions apply to Xcode versions 2.4.1 and 3.0. Other versions should be similar, but may vary in specific details.
Unfortunately, adding additional dynamic shared libraries that your application links to is not a straightforward process in Xcode, nor is there any recommended standard way. Two options are presented below. The first is the 'easy' way, which alters your application's Xcode build settings to pass an additional command line argument directly to the linker. The second option adds the ICU dynamic shared library to the list of resources for your project and configures your executable to link against the added resource.
The 'easy' way is the recommended way to link against the ICU dynamic shared library.
The Easy Way To Add The ICU Library
- First, determine the build settings layer of your project to which the linking configuration change should be applied. The build settings in Xcode are divided into layers, and each layer inherits the build settings from the layer above it. The top, global layer is , followed by , and finally the most specific layer . If your project is large enough to have multiple targets and executables, you probably have an idea which layer is appropriate. If you are unsure or unfamiliar with the different layers, is recommended.
- Select the appropriate layer from the menu. If you are unsure, is recommended.
- Select from the tab near the top of the window. Find the Other Linker Flags build setting among the many build settings available and edit it. Add -licucore [dash ell icucore as a single word, without spaces]. If there are already other flags present, it is recommended that you add -licucore to the end of the existing flags.
Important:
If other linker flags are present, there must be at least one space separating -licucore from the other linker flags. For example, -flag1 -licucore -flag2
Note:
The drop-down menu controls which build configuration your changes are applied to. should be selected if this is the first time you are making these changes.
- Follow the Add The RegexKitLite Source Files To Your Project steps below.
- Xcode Build System Guide - Build Settings
The Hard Way To Add The ICU Library
- First, add the ICU dynamic shared library to your Xcode project. You may add the library to any group in your project; which groups are created by default depends on the template you chose when you created your project. For a typical Cocoa application project, a good choice is the Frameworks group. To add the ICU dynamic shared library, control/right-click on the Frameworks group and choose
- Next, choose the ICU dynamic shared library file to add. Exactly which file to choose depends on your project, but a fairly safe choice is /Developer/SDKs/MacOSX10.6.sdk/usr/lib/libicucore.dylib. You may have installed your developer tools in a different location than the default /Developer directory, and the Mac OS X SDK version should be the one your project is targeting, typically the latest one available.
- Then, in the dialog that follows, make sure that Copy items into… is unselected. Select the targets you will be using RegexKitLite in and then click to add the ICU dynamic shared library to your project.
- Once the ICU dynamic shared library is added to your project, you will need to add it to the libraries that your executable is linked with. To do so, expand the Targets group, and then expand the executable targets you will be using RegexKitLite in. Then select the libicucore.dylib file that you added in the previous step and drag it into the Link Binary With Libraries group for each executable target that you will be using RegexKitLite in. The order of the files within the Link Binary With Libraries group is not important, and for a typical Cocoa application the group will contain the Cocoa.framework file.
Add The RegexKitLite Source Files To Your Project
- Next, add the RegexKitLite source files to your Xcode project. In the Groups & Files outline view on the left, control/right-click on the group you would like to add the files to, then select
Note:
You can perform the following steps once for each file (RegexKitLite.h and RegexKitLite.m), or once by selecting both files from the file dialog.
- Select the RegexKitLite.h and / or RegexKitLite.m file from the file chooser dialog.
- The next dialog will present you with several options. If you have not already copied the RegexKitLite files into your project's directory, you may want to select the Copy items into… option. Select the targets that you would like to add the RegexKitLite functionality to.
- Finally, you will need to include the RegexKitLite.h header file. The best way to do this depends very much on your project. If your project consists of only half a dozen source files, you can add:
#import "RegexKitLite.h"
manually to each source file that makes use of RegexKitLite's features. If your project has grown beyond this, you have probably already organized a common "master" header that collects the headers required by nearly all source files.
Adding RegexKitLite using the Shell
Using RegexKitLite from the shell is also easy. Again, you need to add #import "RegexKitLite.h" to the appropriate source files. Then, to link to the ICU library, you typically only need to add -licucore, just as you would for any other library. Consider the following example:
File name:link_example.m
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  // "Copyright © ≅ 2008" as UTF-8 bytes: © is C2 A9, ≅ is E2 89 85.
  const char *utf8CString = "Copyright \xC2\xA9 \xE2\x89\x85 2008";
  NSString *regexString = @"Copyright (.*) (\\d+)";
  NSString *subjectString = [NSString stringWithUTF8String:utf8CString];
  NSString *matchedString = [subjectString stringByMatching:regexString capture:1L];

  NSLog(@"subject: \"%@\"", subjectString);
  NSLog(@"matched: \"%@\"", matchedString);

  [pool release];
  return(0);
}
Compiled and run from the shell:
shell% cd examples
shell% gcc -g -I.. -o link_example link_example.m ../RegexKitLite.m -framework Foundation -licucore
shell% ./link_example
2008-03-14 03:52:51.187 test[15283:807] subject: "Copyright © ≅ 2008"
2008-03-14 03:52:51.269 test[15283:807] matched: "© ≅"
shell%
RegexKitLite NSString Additions Reference
Extends by category: NSString, NSMutableString
RegexKitLite version: 4.0
Companion guide: ICU User Guide - Regular Expressions
Overview
RegexKitLite is not meant to be a full featured regular expression framework. Because of this, it provides only the basic primitives needed to create additional functionality. It is ideal for developers who:
- Are developing applications for the iPhone.
- Have modest regular expression needs.
- Require a very small footprint.
- Are unable or unwilling to add additional, external frameworks.
- Deal predominantly in UTF-16 encoded Unicode strings.
- Require the enhanced word breaking functionality provided by the ICU library.
RegexKitLite consists of only two files, the header file RegexKitLite.h and RegexKitLite.m. The only other requirement is to link with the ICU library that comes with Mac OS X. No new classes are created; all functionality is provided as a category extension to the NSString and NSMutableString classes.
- RegexKitLite Guide
- ICU Regular Expression Syntax
- Adding RegexKitLite to your Project
- License Information
- RegexKit Framework
- International Components for Unicode
- Unicode Home Page
Compile Time Preprocessor Tunables
The settings listed below are implemented using the C preprocessor. Some of the settings are simple boolean enabled or disabled settings, while others specify a value, such as the number of cached compiled regular expressions. There are several ways to alter these settings, but if you are not familiar with this style of compile time configuration and how to alter it using the C preprocessor, it is recommended that you use the default values provided.
Setting | Default | Description
NS_BLOCK_ASSERTIONS | n/a | RegexKitLite contains a number of extra run-time assertion checks that can be disabled with this flag. The standard NSException.h assertion macros are not used because of the multithreading lock. This flag is typically set for Release builds where the additional error checking is no longer necessary.
RKL_APPEND_TO_ICU_FUNCTIONS | None | This flag is useful if you are supplying your own version of the ICU library. When set, this preprocessor define causes the ICU functions used by RegexKitLite to have the value of RKL_APPEND_TO_ICU_FUNCTIONS appended to them. For example, if RKL_APPEND_TO_ICU_FUNCTIONS is set to _4_0 (i.e., -DRKL_APPEND_TO_ICU_FUNCTIONS=_4_0), it would cause uregex_find() to become uregex_find_4_0().
RKL_BLOCKS | Automatic | Enables blocks support. This feature is automatically enabled if NS_BLOCKS_AVAILABLE is set to 1, which is typically the case when support for blocks is appropriate. At the time of this writing, this typically means that the Xcode setting for the minimum supported version of Mac OS X must be 10.6. This feature may be explicitly disabled under all circumstances by setting its value to 0, or explicitly enabled under all circumstances by setting its value to 1. The behavior is undefined if RKL_BLOCKS is set to 1 and the compiler does not support the blocks language extension or the run-time does not support blocks.
RKL_CACHE_SIZE | 13 | RegexKitLite uses a 4-way set associative cache, and RKL_CACHE_SIZE controls the number of sets in the cache. The total number of compiled regular expressions that can be cached is RKL_CACHE_SIZE * 4, which is 52 for the default value. RKL_CACHE_SIZE should always be a prime number to maximize the use of the cache.
RKL_DTRACE | Disabled | This preprocessor define controls whether or not the RegexKitLite provider DTrace probe points are enabled. This feature may be explicitly disabled under all circumstances by setting its value to 0.
RKL_FAST_MUTABLE_CHECK | Disabled | Enables the use of the undocumented, private Core Foundation __CFStringIsMutable() function to determine if the string to be searched is immutable. This can significantly increase the number of matches per second that can be performed on immutable strings, since a number of mutation checks can be safely skipped.
RKL_FIXED_LENGTH | 2048 | Sets the size of the fixed length UTF-16 conversion cache buffer. Strings that need to be converted to UTF-16 and have a length less than this size will use the fixed length conversion cache. Using a fixed size buffer for all small strings means less malloc() overhead and heap fragmentation, and reduces the chances of a memory leak occurring.
RKL_METHOD_PREPEND | None | When set, this preprocessor define causes the RegexKitLite methods defined in RegexKitLite.h to have the value of RKL_METHOD_PREPEND prepended to them. For example, if RKL_METHOD_PREPEND is set to xyz_ (i.e., -DRKL_METHOD_PREPEND=xyz_), it would cause clearStringCache to become xyz_clearStringCache.
RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS | Automatic | This preprocessor define controls whether or not extra code is included that attempts to automatically register with the NSNotificationCenter for the UIApplicationDidReceiveMemoryWarningNotification notification. This feature is automatically enabled if it can be determined at compile time that the iPhone is being targeted. This feature may be explicitly disabled under all circumstances by setting its value to 0.
RKL_STACK_LIMIT | 131072 | The maximum amount of stack space that will be used before switching to heap based allocations. This can be useful for multithreaded programs where the stack size of secondary threads is much smaller than that of the main thread.
- Assertions and Logging - Using the Assertion Macros
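To make the RKL_CACHE_SIZE arithmetic concrete, here is a minimal sketch in plain C (which is also valid Objective-C) of a generic 4-way set associative cache. Everything in it is an assumption for illustration: the names CACHE_SETS, CACHE_WAYS, regex_cache and cache_lookup, the toy djb2 hash, and the round-robin eviction are not taken from RegexKitLite.m. It only shows why the capacity is sets * 4 (52 with the default of 13) and that a lookup inspects a single set of 4 entries chosen by the hash modulo the set count, which is where a prime set count helps spread the hashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: CACHE_SETS plays the role of RKL_CACHE_SIZE. */
#define CACHE_SETS 13  /* number of sets; should be prime */
#define CACHE_WAYS 4   /* entries per set ("4-way") */

typedef struct {
    bool used;
    unsigned long hash;
    char pattern[64];  /* stand-in for a compiled regular expression */
} cache_slot;

typedef struct {
    cache_slot slots[CACHE_SETS][CACHE_WAYS];
    unsigned next_victim[CACHE_SETS];  /* round-robin eviction per set */
} regex_cache;

/* Toy djb2 string hash, used only for this illustration. */
static unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Total number of patterns the cache can hold: 13 * 4 = 52 by default. */
static size_t cache_capacity(void) {
    return (size_t)CACHE_SETS * CACHE_WAYS;
}

/* Returns true on a hit. On a miss the pattern is stored in its set,
   replacing one of that set's 4 entries once the set is full. */
static bool cache_lookup(regex_cache *cache, const char *pattern) {
    if (strlen(pattern) >= sizeof cache->slots[0][0].pattern)
        return false;  /* too long for this sketch: treated as uncacheable */
    unsigned long h = toy_hash(pattern);
    size_t set = h % CACHE_SETS;  /* only this set is ever searched */
    for (size_t way = 0; way < CACHE_WAYS; way++) {
        const cache_slot *slot = &cache->slots[set][way];
        if (slot->used && slot->hash == h && strcmp(slot->pattern, pattern) == 0)
            return true;
    }
    unsigned victim = cache->next_victim[set];
    cache->next_victim[set] = (victim + 1) % CACHE_WAYS;
    cache_slot *slot = &cache->slots[set][victim];
    slot->used = true;
    slot->hash = h;
    strcpy(slot->pattern, pattern);  /* length was checked above */
    return false;
}
```

Because a pattern can only live in the set its hash selects, patterns that keep landing in the same set compete for just 4 slots even when the rest of the cache is empty.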
Fast Mutable Checks
Setting RKL_FAST_MUTABLE_CHECK allows RegexKitLite to quickly check if a string to search is immutable or not. Every call to RegexKitLite requires checking a string's hash and length values to guard against a string mutating and using invalid cached data. If the same string is searched repeatedly and it is immutable, these checks aren't necessary since the string can never change while in use. While these checks are fairly quick, they can add approximately 15 to 20 percent of extra overhead, and not performing the checks is always faster.
Since checking a string's mutability requires calling an undocumented, private Core Foundation function, RegexKitLite takes extra precautions and does not use the function directly. Instead, an internal, local stub function is created and called to determine if a string is mutable. The first time this function is called, RegexKitLite uses dlsym() to look up the address of the __CFStringIsMutable() function. If the function is found, RegexKitLite will use it from that point on to determine if a string is immutable. However, if the function is not found, RegexKitLite has no way to determine if a string is mutable or not, so it assumes the worst case that all strings are potentially mutable. This means that the private Core Foundation __CFStringIsMutable() function can go away at any time and RegexKitLite will continue to work, although with slightly less performance.
This feature is disabled by default, but should be fairly safe to enable due to the extra precautions that are taken. If this feature is enabled and the __CFStringIsMutable() function is not found for some reason, RegexKitLite falls back to its default behavior, which is the same as if this feature was not enabled.
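The lazy, fail-safe lookup described above can be sketched in C roughly as follows. This is an assumption-laden illustration, not RegexKitLite's code: the signature assumed for __CFStringIsMutable(), the helper name string_is_mutable, and the use of dlopen(NULL, RTLD_LAZY) to obtain a handle are choices made here. On a system without Core Foundation the symbol is simply not found, and the stub falls back to treating every string as mutable.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed signature for illustration: the private function takes a
   CFStringRef and returns a Boolean (an unsigned char). */
typedef unsigned char (*is_mutable_fn)(const void *string);

static is_mutable_fn cached_is_mutable = NULL;
static bool lookup_done = false;

/* Local stub: the private symbol is looked up once, on first use, and the
   result is cached. Not thread-safe; a real implementation needs a lock. */
static bool string_is_mutable(const void *string) {
    if (!lookup_done) {
        /* dlopen(NULL, ...) gives a handle for the program's global symbols;
           it is deliberately left open so the looked-up pointer stays valid. */
        void *self = dlopen(NULL, RTLD_LAZY);
        if (self != NULL)
            cached_is_mutable = (is_mutable_fn)dlsym(self, "__CFStringIsMutable");
        lookup_done = true;
    }
    if (cached_is_mutable == NULL)
        return true;  /* symbol not found: assume every string may be mutable */
    return cached_is_mutable(string) != 0;
}
```

The fallback branch is what keeps the feature safe: losing the private symbol only costs speed, never correctness.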
iPhone Low Memory Notifications
The RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS preprocessor define controls whether or not extra code is compiled in that automatically registers for the iPhone UIKit UIApplicationDidReceiveMemoryWarningNotification notification. When enabled, an initialization function tagged with __attribute__((constructor)) is executed by the dynamic linker at load time, which causes RegexKitLite to check if the low memory notification symbol is available. If the symbol is present, RegexKitLite registers to receive the notification. When the notification is received, RegexKitLite will automatically call clearStringCache to flush the caches and return the memory used to hold any cached compiled regular expressions.
This feature is normally automatically enabled if it can be determined at compile time that the iPhone is being targeted. This feature is safe to enable even if the target is Mac OS X for the desktop. It can also be explicitly disabled, even when targeting the iPhone, by setting RKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS to 0 (i.e., -DRKL_REGISTER_FOR_IPHONE_LOWMEM_NOTIFICATIONS=0).
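A minimal C sketch of the load-time registration pattern, assuming a GCC or Clang toolchain. The names registered_for_low_memory, low_memory_symbol_available and register_for_low_memory_notifications are hypothetical; real code would add an NSNotificationCenter observer instead of setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for "registered with NSNotificationCenter". */
static bool registered_for_low_memory = false;

/* Hypothetical availability check. The text above describes RegexKitLite
   testing whether the notification symbol exists, so the same binary also
   works on Mac OS X where UIKit is absent. */
static bool low_memory_symbol_available(void) {
    return true;
}

/* GCC and Clang run constructor functions when the image is loaded,
   before main() starts, so no explicit initialization call is needed. */
__attribute__((constructor))
static void register_for_low_memory_notifications(void) {
    if (low_memory_symbol_available())
        registered_for_low_memory = true;  /* real code would add an observer */
}
```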
- Memory Usage Performance Guidelines - Responding to Low-Memory Warnings in iPhone OS
Using RegexKitLite with a Custom ICU Build
The details of building and linking to a custom build of ICU will not be covered here. ICU is a very large and complex library that can be configured and packaged in countless ways. Building and linking your application to a custom build of ICU is non-trivial. Apple provides the full source to the version of ICU that they supply with Mac OS X. At the time of this writing, the latest version available was for Mac OS X 10.6.2: ICU-400.38.tar.gz.
RegexKitLite provides the RKL_APPEND_TO_ICU_FUNCTIONS preprocessor define if you would like to use RegexKitLite with a custom ICU build that you supply. A custom version of ICU will typically have the ICU version appended to all of its functions, and RKL_APPEND_TO_ICU_FUNCTIONS allows you to append that version to the ICU functions that RegexKitLite calls. For example, passing -DRKL_APPEND_TO_ICU_FUNCTIONS=_4_0 to gcc would cause the ICU function uregex_find() used by RegexKitLite to be called as uregex_find_4_0().
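The suffix mechanism described for RKL_APPEND_TO_ICU_FUNCTIONS is ordinary C preprocessor token pasting. The sketch below shows one way to implement it; the helper macros RKL_PASTE_, RKL_PASTE and RKL_ICU_FUNCTION and the stub uregex_find_4_0() are hypothetical, and RegexKitLite.m may spell this differently. The two-level macro is needed so that RKL_APPEND_TO_ICU_FUNCTIONS is expanded to _4_0 before ## joins the tokens; pasting directly would glue on the macro's name instead of its value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stub standing in for a versioned function from a custom ICU. */
static const char *uregex_find_4_0(void) { return "uregex_find_4_0"; }

/* Two-level expansion: arguments are macro-expanded before RKL_PASTE_ pastes. */
#define RKL_PASTE_(a, b) a##b
#define RKL_PASTE(a, b)  RKL_PASTE_(a, b)

#ifndef RKL_APPEND_TO_ICU_FUNCTIONS
#define RKL_APPEND_TO_ICU_FUNCTIONS _4_0  /* as if built with -DRKL_APPEND_TO_ICU_FUNCTIONS=_4_0 */
#endif

#define RKL_ICU_FUNCTION(name) RKL_PASTE(name, RKL_APPEND_TO_ICU_FUNCTIONS)

static const char *call_find(void) {
    return RKL_ICU_FUNCTION(uregex_find)();  /* expands to uregex_find_4_0() */
}
```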
Xcode 3 Integrated Documentation
This documentation is available in the Xcode DocSet format at the following URL:
feed://regexkit.sourceforge.net/RegexKitLiteDocSets.atom
For Xcode < 3.2, open the documentation window. Then, in the lower left hand corner of the documentation window, there should be a gear icon with a drop down menu indicator; select it, choose the option to add a new subscription, and enter the DocSet URL.
For Xcode ≥ 3.2, open Xcode's preferences. Then select the documentation preference group, typically the right most group, press the button to add a publisher, and enter the DocSet URL.
Once you have added the URL, a new group should appear, inside which will be the RegexKitLite documentation with a Get button. Click on the Get button and follow the prompts.
Note:
Xcode will ask you to enter an administrator's password to install the documentation.
While RegexKitLite takes steps to ensure that the information it has cached is valid for the strings it searches, there exists the possibility that out of date cached information may be used when searching mutable strings. For each compiled regular expression, RegexKitLite caches the following information about the last NSString that was searched:
- The string's length, hash value, and the pointer to the NSString object.
- The pointer to the UTF-16 buffer that contains the contents of the string, which may be an internal buffer if the string required conversion.
- The NSRange used for the inRange: parameter for the last search, and the NSRange result for capture 0 of that search.
An ICU compiled regular expression must be "set" to the text to be searched. Before a compiled regular expression is used, the pointer to the string object to search, its hash, length, and the pointer to the UTF-16 buffer is compared with the values that the compiled regular expression was last "set" to. If any of these values are different, the compiled regular expression is reset and "set" to the new string.
If a NSMutableString is mutated between two uses of the same compiled regular expression and its hash, length, or UTF-16 buffer changes between uses, RegexKitLite will automatically reset the compiled regular expression with the new values of the mutated string. The results returned will correctly reflect the mutations that have taken place between searches.
It is possible, however, for mutations to a string to go undetected. If the mutation keeps the length the same, then the only way a change can be detected is if the string's hash value changes. For most mutations the hash value will change, but it is possible for two different strings to share the same hash. This is known as a hash collision. Should this happen, the results returned by RegexKitLite may not be correct.
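The following C sketch makes the hash collision scenario concrete. It is a simplification under stated assumptions: the struct only records a pointer, length, and hash (not the UTF-16 buffer or ranges), and toy_hash() is a deliberately weak byte-sum hash chosen so that a collision is easy to produce; it is not the hash NSString uses. Swapping two characters in place keeps the pointer, length, and toy hash unchanged, so needs_reset() reports the cached state as still valid even though the contents changed, which is exactly the case where flushCachedRegexData is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values remembered about the last string a compiled regex was "set" to. */
typedef struct {
    const char *ptr;
    size_t length;
    unsigned long hash;
} cached_string_info;

/* Deliberately weak toy hash: the sum of the bytes, so anagrams collide. */
static unsigned long toy_hash(const char *s) {
    unsigned long h = 0;
    while (*s != '\0')
        h += (unsigned char)*s++;
    return h;
}

static cached_string_info remember(const char *s) {
    cached_string_info info = { s, strlen(s), toy_hash(s) };
    return info;
}

/* True if the cached state must be reset before searching s again. */
static bool needs_reset(const cached_string_info *info, const char *s) {
    return info->ptr != s || info->length != strlen(s) || info->hash != toy_hash(s);
}
```

For example, remembering the buffer "ab" and then rewriting it in place to "ba" is not detected, while rewriting it to "bc" is.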
Therefore, if you are using RegexKitLite to search NSMutableString objects, and those strings may have mutated in such a way that RegexKitLite is unable to detect that the string has changed, you must manually clear the internal cache to ensure that the results accurately reflect the mutations. To clear the cached information for a specific string you send the instance a flushCachedRegexData message:
NSMutableString *aMutableString; // assumed to point to a string that may have mutated
[aMutableString flushCachedRegexData];
To clear all of the cached information in RegexKitLite, which includes all the cached compiled regular expressions along with any cached information and UTF-16 conversions for strings that have been searched, you use the following class method:
[NSString clearStringCache];
Warning:
When searching NSMutableString objects that have mutated between searches, failure to clear the cache may result in undefined behavior. Use flushCachedRegexData to selectively clear the cached information about a NSMutableString object.
- + clearStringCache
- - flushCachedRegexData
Block-based Enumeration Methods
The RegexKitLite Block-based enumeration methods are modeled after their NSString counterparts. There are a few differences, however.
- RegexKitLite does not support mutating a NSMutableString object while it is undergoing Block-based enumeration.
- There is no support for concurrent enumeration.
While RegexKitLite may not support mutating a NSMutableString during Block-based enumeration, it does provide the means to create a new string from the NSString object returned by the block used to enumerate the matches in a string, and in the case of NSMutableString, to replace the contents of that NSMutableString with the modified string at the end of the enumeration. This functionality is available via the following methods:
- - stringByReplacingOccurrencesOfRegex:usingBlock: (NSString)
- - replaceOccurrencesOfRegex:usingBlock: (NSMutableString)
Exception to the Cocoa Memory Management Rules for Block-based Enumeration
The standard Cocoa Memory Management Rules specify that objects returned by a method, or in this case the objects passed to a Block, remain valid throughout the scope of the calling method. Due to the potentially large volume of temporary strings that are created during a Block-based enumeration, RegexKitLite makes an exception to this rule: the strings passed to a Block via capturedStrings[] are valid only until the closing brace of the Block:
[searchString enumerateStringsMatchedByRegex:regex usingBlock:
  ^(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
    // capturedStrings[] strings are valid inside the Block...
  } /* ...but are no longer valid past its closing brace. */
];
If you need to refer to a string past the closing brace of the Block, you need to send that string a retain message. Of course, it is not always necessary to explicitly send a capturedStrings[] string a retain message when you need it to exist past the closing brace of a Block: adding a capturedStrings[] string to a NSMutableDictionary will send the string a retain as a side effect of adding it to the dictionary.
Memory management during RegexKitLite Block-based enumeration is conceptually similar to the following pseudo-code:
NSInteger captureCount = [regex captureCount] + 1; // capture 0 (the entire match) plus each capture group
while(moreMatches) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
  BOOL stop = NO;
  NSRange capturedRanges[captureCount];
  NSString *capturedStrings[captureCount];
  for(NSInteger capture = 0L; capture < captureCount; capture++) {
    capturedRanges[capture] = [searchString rangeOfRegex:regex capture:capture];
    capturedStrings[capture] = [searchString stringByMatching:regex capture:capture];
  }
  // The user supplied Block is executed here...
  enumerationBlock(captureCount, capturedStrings, capturedRanges, &stop);
  // ...and releasing the pool corresponds to the closing brace of the Block.
  [pool release];
  if(stop != NO) { break; }
}
While conceptually and behaviorally similar, it is important to note that RegexKitLite does not actually use or create autorelease pools when performing Block-based enumeration. Instead, a CFMutableArray object is used to accumulate the temporary string objects during an iteration, and at the start of an iteration, any previously accumulated temporary string objects are removed from the array.
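The per-iteration lifetime of the temporary strings can be imitated in plain C with a callback in place of a Block. This is only an analogy built on assumptions: enumerate_digit_runs() matches runs of ASCII digits by hand rather than with a regular expression, and match_callback and keep_first_match() are hypothetical names. What it shows is that the string handed to the callback is freed when the next iteration starts, so a callback that wants to keep it must make its own copy, the C counterpart of sending it a retain message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Function pointer standing in for the Block: it receives the matched text,
   which is valid only until the callback returns, and may set *stop. */
typedef void (*match_callback)(const char *matched, void *context, bool *stop);

/* Enumerates runs of ASCII digits (a stand-in for regex matching). One
   temporary copy exists per iteration; it is freed when the next iteration
   starts and after the loop ends. */
static void enumerate_digit_runs(const char *s, match_callback callback, void *context) {
    char *temporary = NULL;
    bool stop = false;
    while (*s != '\0' && !stop) {
        if (!isdigit((unsigned char)*s)) { s++; continue; }
        size_t len = strspn(s, "0123456789");
        free(temporary);  /* discard the previous iteration's string */
        temporary = malloc(len + 1);
        if (temporary == NULL) return;
        memcpy(temporary, s, len);
        temporary[len] = '\0';
        callback(temporary, context, &stop);
        s += len;
    }
    free(temporary);
}

/* Example callback: keeps the first match by copying it, then stops. */
static void keep_first_match(const char *matched, void *context, bool *stop) {
    char **kept = context;
    size_t len = strlen(matched);
    *kept = malloc(len + 1);
    if (*kept != NULL) memcpy(*kept, matched, len + 1);
    *stop = true;
}
```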
- Blocks Programming Topics
- Regular Expression Enumeration Options
Usage Notes
Convenience Methods
For convenience methods where an argument is not present, the default value used is given below.
Argument | Default Value
capture: | 0
options: | RKLNoOptions
range: | The entire range of the receiver.
enumerationOptions: | RKLRegexEnumerationNoOptions
Exceptions Raised
Methods will raise an exception if their arguments are invalid, such as passing NULL for a required parameter. An invalid regular expression or RKLRegexOptions parameter will not raise an exception. Instead, a NSError object with information about the error will be created and returned via the address given with the optional error argument. If information about the problem is not required, error may be NULL. For convenience methods that do not have an error argument, the primary method is invoked with NULL passed as the argument for error.
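The optional error argument follows a familiar C pattern: an invalid pattern is reported rather than treated as fatal, and the caller may pass NULL when it does not need details. The sketch below illustrates that pattern with the POSIX regcomp()/regerror() API standing in for ICU; try_compile() and is_valid_pattern() are hypothetical helpers, and POSIX extended syntax differs from ICU's. is_valid_pattern() plays the role of a convenience method that passes NULL for the error argument.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Compiles pattern as a POSIX extended regex. On failure nothing aborts:
   if errbuf is non-NULL it receives a description of the problem. */
static bool try_compile(const char *pattern, regex_t *out, char *errbuf, size_t errlen) {
    int rc = regcomp(out, pattern, REG_EXTENDED);
    if (rc == 0)
        return true;
    if (errbuf != NULL && errlen > 0)
        regerror(rc, out, errbuf, errlen);  /* caller asked for details */
    return false;
}

/* Convenience wrapper: error details are not required, so NULL is passed. */
static bool is_valid_pattern(const char *pattern) {
    regex_t re;
    bool ok = try_compile(pattern, &re, NULL, 0);
    if (ok)
        regfree(&re);
    return ok;
}
```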
Important:
Methods raise NSInvalidArgumentException if regex is NULL, or if capture < 0 or is not valid for regex.
Important:
Methods raise NSRangeException if range exceeds the bounds of the receiver.
Important:
Methods raise NSRangeException if the receiver's length exceeds the maximum value that can be represented by a signed 32-bit integer, even on 64-bit architectures.
Important:
Search and replace methods raise RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
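The last rule above can be illustrated with a small C check that scans a replacement template for $n references. This is a simplified sketch, not ICU's parser: it assumes single-digit references and treats a backslash as escaping the next character, and replacement_refs_valid() is a hypothetical name. A return value of false corresponds to the case in which RegexKitLite raises RKLICURegexException.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Returns true if every $n in replacement names an existing capture
   (0 .. captureCount). Simplified: n is a single decimal digit. */
static bool replacement_refs_valid(const char *replacement, int captureCount) {
    for (const char *p = replacement; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;  /* skip the escaped character, e.g. \$ */
            continue;
        }
        if (*p == '$' && isdigit((unsigned char)p[1])) {
            int n = p[1] - '0';
            if (n > captureCount)
                return false;  /* reference to a capture group that does not exist */
            p++;
        }
    }
    return true;
}
```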
Tasks
- RegexKitLite:::compiledRegexCache
- RegexKitLite:::utf16ConversionCache
- + clearStringCache
- - flushCachedRegexData
- + captureCountForRegex: Deprecated in RegexKitLite 3.0
- + captureCountForRegex:options:error: Deprecated in RegexKitLite 3.0
- - captureCount
- - captureCountWithOptions:error:
- - arrayOfCaptureComponentsMatchedByRegex:
- - arrayOfCaptureComponentsMatchedByRegex:range:
- - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
- - captureComponentsMatchedByRegex:
- - captureComponentsMatchedByRegex:range:
- - captureComponentsMatchedByRegex:options:range:error:
- - componentsMatchedByRegex:
- - componentsMatchedByRegex:capture:
- - componentsMatchedByRegex:range:
- - componentsMatchedByRegex:options:range:capture:error:
- - enumerateStringsMatchedByRegex:usingBlock:
- - enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
- - componentsSeparatedByRegex:
- - componentsSeparatedByRegex:range:
- - componentsSeparatedByRegex:options:range:error:
- - enumerateStringsSeparatedByRegex:usingBlock:
- - enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
- - isMatchedByRegex:
- - isMatchedByRegex:inRange:
- - isMatchedByRegex:options:inRange:error:
- - isRegexValid
- - isRegexValidWithOptions:error:
- - rangeOfRegex:
- - rangeOfRegex:capture:
- - rangeOfRegex:inRange:
- - rangeOfRegex:options:inRange:capture:error:
- - replaceOccurrencesOfRegex:withString:
- - replaceOccurrencesOfRegex:withString:range:
- - replaceOccurrencesOfRegex:withString:options:range:error:
- - replaceOccurrencesOfRegex:usingBlock:
- - replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
- - stringByMatching:
- - stringByMatching:capture:
- - stringByMatching:inRange:
- - stringByMatching:options:inRange:capture:error:
- - stringByReplacingOccurrencesOfRegex:withString:
- - stringByReplacingOccurrencesOfRegex:withString:range:
- - stringByReplacingOccurrencesOfRegex:withString:options:range:error:
- - stringByReplacingOccurrencesOfRegex:usingBlock:
- - stringByReplacingOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
- - dictionaryByMatchingRegex:withKeysAndCaptures:
- - dictionaryByMatchingRegex:range:withKeysAndCaptures:
- - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
DTrace Probe Points
RegexKitLite:::compiledRegexCache
This probe point fires each time the compiled regular expression cache is accessed.
RegexKitLite:::compiledRegexCache(
unsigned long eventID,
const char *regexUTF8,
int options,
int captures,
int hitMiss,
int icuStatusCode,
const char *icuErrorMessage,
double *hitRate);
- arg0, eventID: The unique ID for this mutex lock acquisition.
- arg1, regexUTF8: Up to 64 characters of the regular expression encoded in UTF-8. Must be copied with copyinstr(arg1).
- arg2, options: The RKLRegexOptions options used.
- arg3, captures: The number of captures present in the regular expression, or -1 if there was an error.
- arg4, hitMiss: A boolean value that indicates whether or not this event was a cache hit, or -1 if there was an error.
- arg5, icuStatusCode: If an error occurs, this contains the error number returned by ICU.
- arg6, icuErrorMessage: If an error occurs, this contains a UTF-8 encoded string of the ICU error. Must be copied with copyinstr(arg6).
- arg7, hitRate: A pointer to a floating point value, between 0.0 and 100.0, that represents the effectiveness of the cache. Higher is better. Must be copied with copyin(arg7, sizeof(double)).
An example of how to copy the double value pointed to by hitRate:
RegexKitLite*:::compiledRegexCache {
  this->hitRate = (double *)copyin(arg7, sizeof(double));
  printf("compiledRegexCache hitRate: %6.2f%%\n", *this->hitRate);
}
- RegexKitLite:::utf16ConversionCache
- Using RegexKitLite - DTrace
- Solaris Dynamic Tracing Guide (as .PDF)
RegexKitLite:::utf16ConversionCache
This probe point fires each time the UTF-16 conversion cache is accessed.
RegexKitLite:::utf16ConversionCache(
unsigned long eventID,
unsigned int lookupResultFlags,
double *hitRate,
const void *string,
unsigned long NSRange_location,
unsigned long NSRange_length,
long length);
- arg0, eventID: The unique ID for this mutex lock acquisition.
- arg1, lookupResultFlags: A set of status flags about the result of the conversion cache lookup.
- arg2, hitRate: A pointer to a floating point value, between 0.0 and 100.0, that represents the effectiveness of the cache. Higher is better. Must be copied with copyin(arg2, sizeof(double)).
- arg3, string: A pointer to the NSString that this UTF-16 conversion cache check is being performed on.
- arg4, NSRange_location: The location value of the range argument from the invoking RegexKitLite method.
- arg5, NSRange_length: The length value of the range argument from the invoking RegexKitLite method.
- arg6, length: The length of the string.
Only strings that require a UTF-16 conversion count towards the value calculated for hitRate.
An example of how to copy the double value pointed to by hitRate:
RegexKitLite*:::utf16ConversionCache {
  this->hitRate = (double *)copyin(arg2, sizeof(double));
  printf("utf16ConversionCache hitRate: %6.2f%%\n", *this->hitRate);
}
- RegexKitLite:::compiledRegexCache
- RegexKitLite:::utf16ConversionCache arg1 Flags
- Using RegexKitLite - DTrace
- Solaris Dynamic Tracing Guide (as .PDF)
Class Methods
captureCountForRegex:
Returns the number of captures that regex contains.
Deprecated in RegexKitLite 3.0. Use captureCount instead.
+ (NSInteger)captureCountForRegex:(NSString *)regex;
Since the capture count of a regular expression does not depend on the string to be searched, this is a NSString class method. For example:
NSInteger regexCaptureCount = [NSString captureCountForRegex:@"(\\d+)\\.(\\d+)"]; // regexCaptureCount is 2
Deprecated in RegexKitLite 3.0
Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.
- + captureCountForRegex:options:error: Deprecated in RegexKitLite 3.0
- - captureCount
- - captureCountWithOptions:error:
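Counting the capture groups in a pattern, as captureCountForRegex: does, can be approximated by scanning for opening parentheses. The following C sketch is a rough approximation under stated assumptions, not ICU's parser: it counts each unescaped ( outside a bracket expression that is not followed by ?, and returns -1 for unbalanced parentheses to mirror the -1 error result. It does not handle ICU features such as \Q...\E quoting or named groups, which do capture. count_captures() is a hypothetical name.

```c
#include <assert.h>

/* Approximate capture group count; -1 if parentheses are unbalanced. */
static int count_captures(const char *pattern) {
    int captures = 0, depth = 0, in_class = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == '\\') {  /* escaped character: never a group delimiter */
            if (p[1] != '\0') p++;
            continue;
        }
        if (in_class) {  /* inside [...], parentheses are literal */
            if (*p == ']') in_class = 0;
            continue;
        }
        if (*p == '[') {
            in_class = 1;
        } else if (*p == '(') {
            depth++;
            if (p[1] != '?') captures++;  /* (?: and other (? forms are skipped */
        } else if (*p == ')') {
            if (--depth < 0) return -1;
        }
    }
    return depth == 0 ? captures : -1;
}
```

For the example pattern above it returns 2.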
captureCountForRegex:options:error:
Returns the number of captures that regex contains.
Deprecated in RegexKitLite 3.0. Use captureCountWithOptions:error: instead.
+ (NSInteger)captureCountForRegex:(NSString *)regex
options:(RKLRegexOptions)options
error:(NSError **)error;
The optional error parameter, if set and an error occurs, will contain a NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Since the capture count of a regular expression does not depend on the string to be searched, this is a NSString class method. For example:
NSInteger regexCaptureCount = [NSString captureCountForRegex:@"(\\d+)\\.(\\d+)" options:RKLNoOptions error:NULL]; // regexCaptureCount is 2
Deprecated in RegexKitLite 3.0
Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.
- + captureCountForRegex: Deprecated in RegexKitLite 3.0
- - captureCount
- - captureCountWithOptions:error:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
clearStringCache
Clears the cached information about strings and cached compiled regular expressions.
+ (void)clearStringCache;
This method clears all the cached state maintained by RegexKitLite. This includes all the cached compiled regular expressions and any cached UTF-16 conversions.
An example of clearing the cache:
[NSString clearStringCache];
Warning:
When searching NSMutableString objects that have mutated between searches, failure to clear the cache may result in undefined behavior. Use flushCachedRegexData to selectively clear the cached information about a NSMutableString object.
Note:
You do not need to call clearStringCache or flushCachedRegexData when using the NSMutableString replaceOccurrencesOfRegex:withString: methods. The cache entry for that regular expression and NSMutableString is automatically cleared as necessary.
Available in RegexKitLite 1.1 and later.
- - flushCachedRegexData
- NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings
Instance Methods
arrayOfCaptureComponentsMatchedByRegex:
Returns an array containing all the matches from the receiver that were matched by the regular expression regex. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex;
A NSArray object containing all the matches from the receiver by regex. Each match result consists of a NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex and subsequent array indexes contain the text matched by their respective capture group.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
Available in RegexKitLite 3.0 and later.
- - arrayOfCaptureComponentsMatchedByRegex:range:
- - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
arrayOfCaptureComponentsMatchedByRegex:range:
Returns an array containing all the matches from the receiver that were matched by the regular expression regex within range. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex
range:(NSRange)range;
A NSArray object containing all the matches from the receiver by regex. Each match result consists of a NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex and subsequent array indexes contain the text matched by their respective capture group.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
Available in RegexKitLite 3.0 and later.
- - arrayOfCaptureComponentsMatchedByRegex:
- - arrayOfCaptureComponentsMatchedByRegex:options:range:error:
arrayOfCaptureComponentsMatchedByRegex:options:range:error:
Returns an array containing all the matches from the receiver that were matched by the regular expression regex within range using options. Each match result consists of an array of the substrings matched by all the capture groups present in the regular expression.
- (NSArray *)arrayOfCaptureComponentsMatchedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error;
- regex: A NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain a NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
A NSArray object containing all the matches from the receiver by regex. Each match result consists of a NSArray which contains all the capture groups present in regex. Array index 0 represents all of the text matched by regex and subsequent array indexes contain the text matched by their respective capture group.
If the receiver is not matched by regex then the returned value is a NSArray that contains no items.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
The match results in the array appear in the order they did in the receiver. For example, this code fragment:
NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list arrayOfCaptureComponentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];
produces a NSArray equivalent to:
[NSArray arrayWithObjects:
  [NSArray arrayWithObjects: @"$10.23", @"10.23", @"10", @"23", NULL],       // first match
  [NSArray arrayWithObjects: @"$1024.42", @"1024.42", @"1024", @"42", NULL], // second match
  [NSArray arrayWithObjects: @"$3099", @"3099", @"3099", @"", NULL],         // third match: capture 3 did not match
  NULL];
Available in RegexKitLite 3.0 and later.
- - arrayOfCaptureComponentsMatchedByRegex:
- - arrayOfCaptureComponentsMatchedByRegex:range:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
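The shape of the result, one array of capture strings per match with "" for a group that did not participate, can be reproduced with the POSIX regex API in C. This is a swapped-in technique, not RegexKitLite: POSIX extended regular expressions have no \d and no (?:...) groups, so the pattern used here, \$([0-9]+)(\.([0-9]+))?, captures the integer part, the dot plus fraction, and the fraction separately. capture_components() and the fixed MAX_MATCHES, GROUPS and FIELD_LEN limits are hypothetical simplifications.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_MATCHES 8
#define GROUPS 4     /* capture 0 plus the 3 groups in the pattern */
#define FIELD_LEN 32

/* Stores each match as GROUPS strings; a group that did not participate
   (rm_so == -1) becomes "". Returns the match count, or -1 on a bad pattern. */
static int capture_components(const char *pattern, const char *subject,
                              char out[MAX_MATCHES][GROUPS][FIELD_LEN]) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    regmatch_t m[GROUPS];
    int count = 0;
    const char *p = subject;
    while (count < MAX_MATCHES &&
           regexec(&re, p, GROUPS, m, p == subject ? 0 : REG_NOTBOL) == 0) {
        for (int g = 0; g < GROUPS; g++) {
            size_t len = 0;
            if (m[g].rm_so != -1) {
                len = (size_t)(m[g].rm_eo - m[g].rm_so);
                if (len >= FIELD_LEN) len = FIELD_LEN - 1;  /* truncate in this sketch */
                memcpy(out[count][g], p + m[g].rm_so, len);
            }
            out[count][g][len] = '\0';
        }
        count++;
        if (m[0].rm_eo > m[0].rm_so) {
            p += m[0].rm_eo;      /* continue after this match */
        } else if (p[m[0].rm_eo] != '\0') {
            p += m[0].rm_eo + 1;  /* empty match: step over one character */
        } else {
            break;                /* empty match at the end of the subject */
        }
    }
    regfree(&re);
    return count;
}
```

Run on the same list, it yields three matches; for "$3099" groups 2 and 3 did not participate, so both come back as empty strings, like the last entry of the RegexKitLite result above.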
arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex
withKeysAndCaptures:(id)firstKey, ...;
A NSArray object containing all the matches from the receiver by regex. Each match result consists of a NSDictionary containing that match's substrings constructed from the specified set of keys and captures.
If the receiver is not matched by regex then the returned value is a NSArray that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text.
Available in RegexKitLite 4.0 and later.
- - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex within range. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex
range:(NSRange)range
withKeysAndCaptures:(id)firstKey, ...;
-
regex
A
NSString containing a regular expression.
-
range
The range of the receiver to search.
-
firstKey
The first key to add to the new dictionary.
-
...
First the
capture for
firstKey, then a
NULL-terminated list of alternating
keys and
captures.
Captures are specified using
int values.
Important:
Use of non-int sized capture arguments will result in undefined behavior. Do not append capture arguments with a L suffix.
Important:
Failure to NULL-terminate the keys and captures list will result in undefined behavior.
A
NSArray object containing all the matches from the receiver by
regex. Each match result consists of a
NSDictionary containing that matches substrings constructed from the specified set of
keys and
captures.
If the receiver is not matched by regex then the returned value is a NSArray that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text.
Available in RegexKitLite 4.0 and later.
- - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
arrayOfDictionariesByMatchingRegex:options:range:error:withKeysAndCaptures:
Returns an array containing all the matches in the receiver that were matched by the regular expression regex within range using options. Each match result consists of a dictionary containing that match's substrings constructed from the specified set of keys and captures.
- (NSArray *)arrayOfDictionariesByMatchingRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error
withKeysAndCaptures:(id)firstKey, ...;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
- firstKey: The first key to add to the new dictionary.
- ...: First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
Important: Using capture arguments that are not int-sized results in undefined behavior. Do not add an L suffix to capture arguments.
Important: Failing to NULL-terminate the keys and captures list results in undefined behavior.
An NSArray object containing all the matches of regex within range of the receiver. Each match result consists of an NSDictionary containing that match's substrings constructed from the specified set of keys and captures.
If the receiver is not matched by regex then the returned value is an NSArray that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text. It is important to note that a regular expression can successfully match zero characters:
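NSString *name = @"Name: Bob\n"
@"Name: John Smith";
NSString *regex = @"(?m)^Name:\\s*(\\w*)\\s*(\\w*)$";
NSArray *nameArray = [name arrayOfDictionariesByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// nameArray contains two dictionaries:
// { first = "Bob"; last = ""; }   (capture 2 matched zero characters)
// { first = "John"; last = "Smith"; }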
Compared to this example, where the second capture group does not match any characters:
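NSString *name = @"Name: Bob\n"
@"Name: John Smith";
NSString *regex = @"(?m)^Name:\\s*(\\w*)(?:\\s*|\\s+(\\w+))$";
NSArray *nameArray = [name arrayOfDictionariesByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// nameArray contains two dictionaries:
// { first = "Bob"; }   (capture 2 did not match, so there is no "last" key)
// { first = "John"; last = "Smith"; }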
Available in RegexKitLite 4.0 and later.
- - arrayOfDictionariesByMatchingRegex:withKeysAndCaptures:
- - arrayOfDictionariesByMatchingRegex:range:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
captureComponentsMatchedByRegex:
Returns an array containing the substrings matched by each capture group present in regex for the first match of regex in the receiver.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex;
An NSArray containing the substrings matched by each capture group present in regex for the first match of regex in the receiver. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture groups.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
Available in RegexKitLite 3.0 and later.
- - captureComponentsMatchedByRegex:range:
- - captureComponentsMatchedByRegex:options:range:error:
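A short sketch of captureComponentsMatchedByRegex: pulling the parts of a YYYY-MM-DD date out of a string. Only the first match is reported, so a second date later in the string is ignored. RegexKitLite is assumed to be part of the project (linked with libicucore), and DateComponents is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: components of the first YYYY-MM-DD date in `text`.
// Index 0 is the whole match; indexes 1-3 are year, month and day.
static NSArray *DateComponents(NSString *text) {
    return [text captureComponentsMatchedByRegex:@"(\\d{4})-(\\d{2})-(\\d{2})"];
}
```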
captureComponentsMatchedByRegex:range:
Returns an array containing the substrings matched by each capture group present in regex for the first match of regex within range of the receiver.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex
range:(NSRange)range;
An NSArray containing the substrings matched by each capture group present in regex for the first match of regex within range of the receiver. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture groups.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
Available in RegexKitLite 3.0 and later.
- - captureComponentsMatchedByRegex:
- - captureComponentsMatchedByRegex:options:range:error:
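The range variant is useful when only part of the string should be searched. This sketch assumes RegexKitLite is in the project; DateComponentsAfterMarker is a hypothetical helper that searches only the text after a marker word, so the first date in the string is skipped.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: components of the first YYYY-MM-DD date that appears
// after `marker`. Returns an empty array if `marker` is not found.
static NSArray *DateComponentsAfterMarker(NSString *text, NSString *marker) {
    NSRange markerRange = [text rangeOfString:marker];
    if (markerRange.location == NSNotFound) { return [NSArray array]; }
    // Search from the end of the marker to the end of the string.
    NSUInteger start = NSMaxRange(markerRange);
    NSRange searchRange = NSMakeRange(start, [text length] - start);
    return [text captureComponentsMatchedByRegex:@"(\\d{4})-(\\d{2})-(\\d{2})"
                                           range:searchRange];
}
```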
captureComponentsMatchedByRegex:options:range:error:
Returns an array containing the substrings matched by each capture group present in regex for the first match of regex within range of the receiver using options.
- (NSArray *)captureComponentsMatchedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
An NSArray containing the substrings matched by each capture group present in regex for the first match of regex within range of the receiver using options. Array index 0 represents all of the text matched by regex, and subsequent array indexes contain the text matched by their respective capture groups.
If the receiver is not matched by regex then the returned value is an NSArray that contains no items.
A match result array index will contain an empty string, or @"", if a capture group did not match any text.
The returned value is for the first match of regex in the receiver. For example, this code fragment:
NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list captureComponentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];
produces an NSArray equivalent to:
[NSArray arrayWithObjects:
@"$10.23",
@"10.23",
@"10",
@"23",
NULL];
Available in RegexKitLite 3.0 and later.
- - captureComponentsMatchedByRegex:
- - captureComponentsMatchedByRegex:range:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
captureCount
Returns the number of capture groups in the regular expression contained in the receiver.
- (NSInteger)captureCount;
Returns the capture count of the receiver, which should be a valid regular expression. For example:
NSInteger regexCaptureCount = [@"(\\d+)\\.(\\d+)" captureCount]; // regexCaptureCount == 2
Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.
Available in RegexKitLite 3.0 and later.
- - captureCountWithOptions:error:
captureCountWithOptions:error:
Returns the number of capture groups in the regular expression contained in the receiver, using options.
- (NSInteger)captureCountWithOptions:(RKLRegexOptions)options
error:(NSError **)error;
The optional error parameter, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Returns the capture count of the receiver, which should be a valid regular expression. For example:
NSInteger regexCaptureCount = [@"(\\d+)\\.(\\d+)" captureCountWithOptions:RKLNoOptions error:NULL]; // regexCaptureCount == 2
Returns -1 if an error occurs. Otherwise the number of captures in regex is returned, or 0 if regex does not contain any captures.
Available in RegexKitLite 3.0 and later.
- - captureCount
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
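A small sketch showing that only capturing groups are counted: the non-capturing group (?:...) adds nothing. It assumes RegexKitLite is in the project; CaptureGroupCount is a hypothetical helper that also logs the NSError when the pattern cannot be compiled (the -1 case described above).

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: number of capture groups in `regex`, or -1 if the
// pattern cannot be compiled, in which case the error is logged.
static NSInteger CaptureGroupCount(NSString *regex) {
    NSError *error = nil;
    NSInteger count = [regex captureCountWithOptions:RKLNoOptions error:&error];
    if (count == -1) { NSLog(@"Invalid regex %@: %@", regex, error); }
    return count;
}
```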
componentsMatchedByRegex:
Returns an array containing all the substrings from the receiver that were matched by the regular expression regex.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex;
An NSArray object containing all the substrings from the receiver that were matched by regex.
Available in RegexKitLite 3.0 and later.
- - componentsMatchedByRegex:capture:
- - componentsMatchedByRegex:range:
- - componentsMatchedByRegex:options:range:capture:error:
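A minimal sketch of componentsMatchedByRegex: collecting every run of digits in a string, in the order the runs appear. RegexKitLite is assumed to be in the project, and NumbersInString is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: every run of decimal digits in `text`.
static NSArray *NumbersInString(NSString *text) {
    return [text componentsMatchedByRegex:@"\\d+"];
}
```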
componentsMatchedByRegex:capture:
Returns an array containing all the substrings from the receiver that were matched by capture number capture from the regular expression regex.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex
capture:(NSInteger)capture;
An NSArray object containing all the substrings for capture group capture from the receiver that were matched by regex.
An array index will contain an empty string, or @"", if the capture group did not match any text.
Available in RegexKitLite 3.0 and later.
- - componentsMatchedByRegex:
- - componentsMatchedByRegex:range:
- - componentsMatchedByRegex:options:range:capture:error:
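A sketch of componentsMatchedByRegex:capture: picking out either the keys or the values of key=value pairs. It assumes RegexKitLite is in the project; PairParts is a hypothetical helper, and because capture: is an NSInteger parameter (not a varargs int), passing any NSInteger is fine here.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: for every "key=value" pair in `text`, the text matched
// by capture group `group` (1 = key, 2 = value, 0 = the whole pair).
static NSArray *PairParts(NSString *text, NSInteger group) {
    return [text componentsMatchedByRegex:@"(\\w+)=(\\w+)" capture:group];
}
```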
componentsMatchedByRegex:range:
Returns an array containing all the substrings from the receiver that were matched by the regular expression regex within range.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex
range:(NSRange)range;
An NSArray object containing all the substrings from the receiver that were matched by regex within range.
Available in RegexKitLite 3.0 and later.
- - componentsMatchedByRegex:
- - componentsMatchedByRegex:capture:
- - componentsMatchedByRegex:options:range:capture:error:
componentsMatchedByRegex:options:range:capture:error:
Returns an array containing all the substrings from the receiver that were matched by capture number capture from the regular expression regex within range using options.
- (NSArray *)componentsMatchedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
capture:(NSInteger)capture
error:(NSError **)error;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- capture: The capture group of regex whose matched text is returned. Use 0 for the entire string that regex matched.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
An NSArray object containing all the substrings from the receiver that were matched by capture number capture from regex within range using options.
If the receiver is not matched by regex then the returned value is an NSArray that contains no items.
An array index will contain an empty string, or @"", if a capture group did not match any text.
The match results in the array appear in the order they did in the receiver.
Example:
NSString *list = @"$10.23, $1024.42, $3099";
NSArray *listItems = [list componentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"];
// listItems == { @"$10.23", @"$1024.42", @"$3099" }
Example of extracting a specific capture group:
NSString *list = @"$10.23, $1024.42, $3099";
NSRange listRange = NSMakeRange(0UL, [list length]);
NSArray *listItems = [list componentsMatchedByRegex:@"\\$((\\d+)(?:\\.(\\d+)|\\.?))"
options:RKLNoOptions
range:listRange
capture:3L
error:NULL];
// listItems == { @"23", @"42", @"" }   ("$3099" has no cents, so capture 3 is empty)
Available in RegexKitLite 3.0 and later.
- - componentsMatchedByRegex:
- - componentsMatchedByRegex:capture:
- - componentsMatchedByRegex:range:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
componentsSeparatedByRegex:
Returns an array containing substrings from the receiver that have been divided by the regular expression regex.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex;
An NSArray object containing the substrings from the receiver that have been divided by regex.
Available in RegexKitLite 2.0 and later.
- - componentsSeparatedByRegex:range:
- - componentsSeparatedByRegex:options:range:error:
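A minimal sketch of componentsSeparatedByRegex: splitting on runs of whitespace, which a plain componentsSeparatedByString: cannot do in one call. RegexKitLite is assumed to be in the project, and WhitespaceSeparatedWords is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: splits `text` wherever one or more whitespace
// characters (spaces, tabs, newlines) occur.
static NSArray *WhitespaceSeparatedWords(NSString *text) {
    return [text componentsSeparatedByRegex:@"\\s+"];
}
```

As described for the options variant, leading or trailing whitespace would produce an empty first or last element.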
componentsSeparatedByRegex:range:
Returns an array containing substrings within range of the receiver that have been divided by the regular expression regex.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex
range:(NSRange)range;
An NSArray object containing the substrings within range of the receiver that have been divided by regex.
Available in RegexKitLite 2.0 and later.
- - componentsSeparatedByRegex:
- - componentsSeparatedByRegex:options:range:error:
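A sketch of the range variant that splits only the part of a line after a "name:" prefix. It assumes RegexKitLite is in the project; FieldsAfterColon is a hypothetical helper, and it falls back to splitting the whole line when there is no colon.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: comma-separated fields that follow the first ':' in `line`.
static NSArray *FieldsAfterColon(NSString *line) {
    NSRange colon = [line rangeOfString:@":"];
    if (colon.location == NSNotFound) {
        return [line componentsSeparatedByRegex:@",\\s*"];
    }
    // Restrict the split to the text after the colon.
    NSUInteger start = NSMaxRange(colon);
    return [line componentsSeparatedByRegex:@",\\s*"
                                      range:NSMakeRange(start, [line length] - start)];
}
```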
componentsSeparatedByRegex:options:range:error:
Returns an array containing substrings within range of the receiver that have been divided by the regular expression regex using options.
- (NSArray *)componentsSeparatedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
An NSArray object containing the substrings from the receiver that have been divided by regex.
The substrings in the array appear in the order they did in the receiver. For example, this code fragment:
NSString *list = @"Norman, Stanley, Fletcher";
NSArray *listItems = [list componentsSeparatedByRegex:@",\\s*"];
produces an array { @"Norman", @"Stanley", @"Fletcher" }.
If the receiver begins or ends with regex, then the first or last substring, respectively, is empty. For example, the string ", Norman, Stanley, Fletcher" creates an array that has these contents: { @"", @"Norman", @"Stanley", @"Fletcher" }.
If the receiver has no separators that are matched by regex (for example, "Norman"), the array contains the string itself, in this case { @"Norman" }.
If regex contains capture groups (for example, @",(\\s*)"), the text matched by each capture group is added to the array as a separate element, placed between the two substrings that the separator divided. One additional element is created for each capture group; if an individual capture group does not match any text, its element is a zero-length string, @"". For example, the regular expression @",(\\s*)" would produce the array { @"Norman", @" ", @"Stanley", @" ", @"Fletcher" }.
Available in RegexKitLite 2.0 and later.
- - componentsSeparatedByRegex:
- - componentsSeparatedByRegex:range:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
dictionaryByMatchingRegex:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex in the receiver.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex
withKeysAndCaptures:(id)firstKey, ...;
An NSDictionary containing the matched substrings constructed from the specified set of keys and captures.
The returned value is for the first match of regex in the receiver.
If the receiver is not matched by regex then the returned value is an NSDictionary that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text.
Available in RegexKitLite 4.0 and later.
- - dictionaryByMatchingRegex:range:withKeysAndCaptures:
- - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
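A minimal sketch of dictionaryByMatchingRegex:withKeysAndCaptures: turning a "First Last" name into a dictionary. RegexKitLite is assumed to be in the project; ParseFullName and the keys @"first" and @"last" are hypothetical. As with the other keys-and-captures methods, captures are int literals and the list ends with NULL.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: { first = ..., last = ... } for the first two words
// separated by whitespace; an empty dictionary if there is no match.
static NSDictionary *ParseFullName(NSString *text) {
    return [text dictionaryByMatchingRegex:@"(\\w+)\\s+(\\w+)"
                       withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
}
```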
dictionaryByMatchingRegex:range:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex within range of the receiver.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex
range:(NSRange)range
withKeysAndCaptures:(id)firstKey, ...;
- regex: An NSString containing a regular expression.
- range: The range of the receiver to search.
- firstKey: The first key to add to the new dictionary.
- ...: First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
Important: Using capture arguments that are not int-sized results in undefined behavior. Do not add an L suffix to capture arguments.
Important: Failing to NULL-terminate the keys and captures list results in undefined behavior.
An NSDictionary containing the matched substrings constructed from the specified set of keys and captures.
The returned value is for the first match of regex within range of the receiver.
If the receiver is not matched by regex then the returned value is an NSDictionary that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text.
Available in RegexKitLite 4.0 and later.
- - dictionaryByMatchingRegex:withKeysAndCaptures:
- - dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
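A sketch of the range variant: the same key=value pattern yields a different dictionary depending on which part of the string is searched. It assumes RegexKitLite is in the project; PairInRange and its keys are hypothetical names.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project, linked with libicucore

// Hypothetical helper: the first "key=value" pair found inside `range`.
static NSDictionary *PairInRange(NSString *text, NSRange range) {
    return [text dictionaryByMatchingRegex:@"(\\w+)=(\\w+)"
                                     range:range
                       withKeysAndCaptures:@"key", 1, @"value", 2, NULL];
}
```

With @"a=1\nb=2", the range {4, 3} covers only the second line, so the result should describe b=2 rather than a=1.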
dictionaryByMatchingRegex:options:range:error:withKeysAndCaptures:
Creates and returns a dictionary containing the matches constructed from the specified set of keys and captures for the first match of regex within range of the receiver using options.
- (NSDictionary *)dictionaryByMatchingRegex:(NSString *)regex
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error
withKeysAndCaptures:(id)firstKey, ...;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
- firstKey: The first key to add to the new dictionary.
- ...: First the capture for firstKey, then a NULL-terminated list of alternating keys and captures. Captures are specified using int values.
Important: Using capture arguments that are not int-sized results in undefined behavior. Do not add an L suffix to capture arguments.
Important: Failing to NULL-terminate the keys and captures list results in undefined behavior.
An NSDictionary containing the matched substrings constructed from the specified set of keys and captures.
The returned value is for the first match of regex within range of the receiver.
If the receiver is not matched by regex then the returned value is an NSDictionary that contains no items.
A dictionary will not contain a given key if its corresponding capture group did not match any text. It is important to note that a regular expression can successfully match zero characters:
NSString *name = @"Name: Joe";
NSString *regex = @"Name:\\s*(\\w*)\\s*(\\w*)";
NSDictionary *nameDictionary = [name dictionaryByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// nameDictionary == { first = "Joe"; last = ""; }   (capture 2 matched zero characters)
Compared to this example, where the second capture group does not match any characters:
NSString *name = @"Name: Joe";
NSString *regex = @"Name:\\s*(\\w*)\\s*(\\w+)?";
NSDictionary *nameDictionary = [name dictionaryByMatchingRegex:regex options:RKLNoOptions range:NSMakeRange(0UL, [name length]) error:NULL withKeysAndCaptures:@"first", 1, @"last", 2, NULL];
// nameDictionary == { first = "Joe"; }   (capture 2 did not match, so there is no "last" key)
Available in RegexKitLite 4.0 and later.
- - dictionaryByMatchingRegex:withKeysAndCaptures:
- - dictionaryByMatchingRegex:range:withKeysAndCaptures:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
enumerateStringsMatchedByRegex:usingBlock:
Enumerates the matches of the regular expression regex in the receiver and executes block for each match found.
- (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex
usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
Returns YES if there was no error, otherwise returns NO.
Available in RegexKitLite 4.0 and later.
- - enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
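A minimal sketch of enumerateStringsMatchedByRegex:usingBlock: that collects the first capture group of every match. It assumes RegexKitLite is in the project and that the calling file is compiled with manual reference counting, since RegexKitLite predates ARC and its block signature uses an array-of-objects parameter; FirstCaptureOfEachMatch is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project (MRC), linked with libicucore

// Hypothetical helper: capture group 1 of every match of `regex` in `text`.
static NSArray *FirstCaptureOfEachMatch(NSString *text, NSString *regex) {
    NSMutableArray *results = [NSMutableArray array]; // autoreleased
    [text enumerateStringsMatchedByRegex:regex
                              usingBlock:^(NSInteger captureCount,
                                           NSString * const capturedStrings[captureCount],
                                           const NSRange capturedRanges[captureCount],
                                           volatile BOOL * const stop) {
        // capturedStrings[0] is the whole match; index 1 is the first group.
        if (captureCount > 1) { [results addObject:capturedStrings[1]]; }
    }];
    return results;
}
```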
enumerateStringsMatchedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the matches of the regular expression regex within range of the receiver using options, and executes block using enumerationOptions for each match found.
- (BOOL)enumerateStringsMatchedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
inRange:(NSRange)range
error:(NSError **)error
enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions
usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
- enumerationOptions: A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
- block: The block that is executed for each match of regex in the receiver. The block takes four arguments:
  - captureCount: The number of strings that regex captured. captureCount is always at least 1.
  - capturedStrings: An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, its element is a pointer to a string equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
  - capturedRanges: An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, its element is an NSRange equal to {NSNotFound, 0}.
  - stop: A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES; otherwise the block should not touch *stop.
Returns YES if there was no error, otherwise returns NO and indirectly returns an NSError object if error is not NULL.
Available in RegexKitLite 4.0 and later.
- - enumerateStringsMatchedByRegex:usingBlock:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
- Regular Expression Enumeration Options
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
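A sketch of the full enumeration method that records where the first few matches start and then stops early via *stop. Because RKLRegexEnumerationCapturedStringsNotRequired is set, capturedStrings may be NULL, so only capturedRanges is read. It assumes RegexKitLite is in the project and the file is compiled with manual reference counting; FirstMatchLocations is a hypothetical helper, and limit is assumed to be at least 1.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project (MRC), linked with libicucore

// Hypothetical helper: start locations of at most `limit` matches (limit >= 1),
// or nil if the regex could not be used (the error is logged).
static NSArray *FirstMatchLocations(NSString *text, NSString *regex, NSUInteger limit) {
    NSMutableArray *locations = [NSMutableArray array];
    NSError *error = nil;
    BOOL ok = [text enumerateStringsMatchedByRegex:regex
                                           options:RKLNoOptions
                                           inRange:NSMakeRange(0UL, [text length])
                                             error:&error
                                enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                                        usingBlock:^(NSInteger captureCount,
                                                     NSString * const capturedStrings[captureCount],
                                                     const NSRange capturedRanges[captureCount],
                                                     volatile BOOL * const stop) {
        // capturedStrings may be NULL here, so use the range of the whole match.
        [locations addObject:[NSNumber numberWithUnsignedInteger:capturedRanges[0].location]];
        if ([locations count] >= limit) { *stop = YES; } // end the enumeration early
    }];
    if (!ok) { NSLog(@"Enumeration failed: %@", error); return nil; }
    return locations;
}
```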
enumerateStringsSeparatedByRegex:usingBlock:
Enumerates the strings of the receiver that have been divided by the regular expression regex and executes block for each divided string.
- (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex
usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
Returns YES if there was no error, otherwise returns NO.
Available in RegexKitLite 4.0 and later.
- - enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
- - componentsSeparatedByRegex:
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
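A sketch of enumerateStringsSeparatedByRegex:usingBlock: that rebuilds the componentsSeparatedByRegex: result one piece at a time. It rests on an assumption not spelled out above: that capturedStrings[0] holds the divided substring itself. It also assumes RegexKitLite is in the project and the file uses manual reference counting; PiecesSeparatedByRegex is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumption: RegexKitLite added to the project (MRC), linked with libicucore

// Hypothetical helper: the pieces of `text` between matches of `regex`.
static NSArray *PiecesSeparatedByRegex(NSString *text, NSString *regex) {
    NSMutableArray *pieces = [NSMutableArray array];
    [text enumerateStringsSeparatedByRegex:regex
                                usingBlock:^(NSInteger captureCount,
                                             NSString * const capturedStrings[captureCount],
                                             const NSRange capturedRanges[captureCount],
                                             volatile BOOL * const stop) {
        // Assumption: element 0 is the divided string itself.
        [pieces addObject:capturedStrings[0]];
    }];
    return pieces;
}
```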
enumerateStringsSeparatedByRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the strings of the receiver that have been divided by the regular expression regex within range using options, and executes block using enumerationOptions for each divided string.
- (BOOL)enumerateStringsSeparatedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
inRange:(NSRange)range
error:(NSError **)error
enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions
usingBlock:(void (^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
- enumerationOptions: A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
- block: The block that is executed for each string of the receiver divided by regex. The block takes four arguments:
  - captureCount: The number of strings that regex captured. captureCount is always at least 1.
  - capturedStrings: An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, its element is a pointer to a string equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
  - capturedRanges: An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, its element is an NSRange equal to {NSNotFound, 0}.
  - stop: A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES; otherwise the block should not touch *stop.
Returns YES if there was no error, otherwise returns NO and indirectly returns an NSError object if error is not NULL.
Available in RegexKitLite 4.0 and later.
- - enumerateStringsSeparatedByRegex:usingBlock:
- - componentsSeparatedByRegex:options:range:error:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
- Regular Expression Enumeration Options
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
flushCachedRegexData
Clears any cached information about the receiver.
- (void)flushCachedRegexData;
This method should be used when performing searches on NSMutableString objects and there is the possibility that the string has mutated in between calls to RegexKitLite.
This method clears the cached information for the receiver only. This is more selective than clearStringCache, which clears all the cached information from RegexKitLite, including all the cached compiled regular expressions.
RegexKitLite automatically detects the vast majority of string mutations and clears any cached information for the mutated string. To detect mutations, RegexKitLite records a string's length and hash value at the point in time when it caches data for that string. Cached data for a string is invalidated if either of these values changes between calls to RegexKitLite. The problem case is when a string is mutated but its length stays the same and the hash value of the mutated string is identical to the hash value of the string before it was mutated. This is known as a hash collision. Since RegexKitLite cannot detect that a string has mutated when this happens, the programmer needs to explicitly tell RegexKitLite that any cached data about the receiver must be cleared by sending flushCachedRegexData to the mutated string.
While it is possible to have "perfect mutation detection", and therefore guarantee that only valid cached data is used, it carries a significant performance penalty. The first problem is that when caching information about a string, an immutable copy of that string needs to be made. The second problem is that while determining that two strings are not identical is usually very fast and cheap (if their lengths differ, no further checks are required), the most expensive case is when two strings are identical, because guaranteeing that they are equal requires a character-by-character comparison of the entire string. The most expensive case also happens to be, by far, the most common one. To make matters worse, Cocoa provides no public way to determine whether an instance is a mutable NSMutableString or an immutable NSString object. Therefore RegexKitLite must assume the worst case: that all strings are mutable and may have mutated between calls to RegexKitLite.
RegexKitLite is optimized for the common case, in which regular expression operations are performed on strings that are not mutating. The majority of mutations to a string can be detected quickly and cheaply by RegexKitLite automatically. Since the programmer knows the context of the string being matched, and whether or not it is being mutated, RegexKitLite relies on the programmer to tell it whether the string could have mutated in a way that is undetectable.
An example of clearing a string's cached information:
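NSMutableString *mutableSearchString; // Assumed to point to a valid NSMutableString.
NSString *foundString = [mutableSearchString stringByMatching:@"\\d+"]; // RegexKitLite caches information about mutableSearchString.
[mutableSearchString replaceCharactersInRange:NSMakeRange(5UL, 10UL) withString:@"[replaced]"]; // 10 characters replaced by 10, so the length is unchanged.
[mutableSearchString flushCachedRegexData]; // Discard the possibly stale cached information before the next search.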
Warning:
Failure to clear the cached information for a NSMutableString object that has mutated between searches may result in undefined behavior.
Note:
You do not need to call clearStringCache or flushCachedRegexData when using the NSMutableString replaceOccurrencesOfRegex:withString: methods. The cached information for that NSMutableString is automatically cleared as necessary.
Available in RegexKitLite 3.0 and later.
- + clearStringCache
- NSString RegexKitLite Additions Reference - Cached Information and Mutable Strings
isMatchedByRegex:
Returns a Boolean value that indicates whether the receiver is matched by regex.
- (BOOL)isMatchedByRegex:(NSString *)regex;
Available in RegexKitLite 1.0 and later.
- - isMatchedByRegex:inRange:
- - isMatchedByRegex:options:inRange:error:
isMatchedByRegex:inRange:
Returns a Boolean value that indicates whether the receiver is matched by regex within range.
- (BOOL)isMatchedByRegex:(NSString *)regex inRange:(NSRange)range;
Available in RegexKitLite 1.0 and later.
- - isMatchedByRegex:
- - isMatchedByRegex:options:inRange:error:
isMatchedByRegex:options:inRange:error:
Returns a Boolean value that indicates whether the receiver is matched by regex within range.
- (BOOL)isMatchedByRegex:(NSString *)regex
options:(RKLRegexOptions)options
inRange:(NSRange)range
error:(NSError **)error;
The optional error parameter, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
Available in RegexKitLite 1.0 and later.
- - isMatchedByRegex:
- - isMatchedByRegex:inRange:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
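A minimal sketch of the three isMatchedByRegex: variants, assuming RegexKitLite.h and RegexKitLite.m have been added to the project and the target links against libicucore. The helpers ContainsYear, PrefixMatches, and PrefixMatchesCaseless are hypothetical names used only for illustration.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: YES if text contains a year such as 1999 or 2009.
static BOOL ContainsYear(NSString *text) {
    return [text isMatchedByRegex:@"\\b(19|20)\\d{2}\\b"];
}

// Hypothetical helper: YES if regex matches somewhere in the first prefixLength characters.
static BOOL PrefixMatches(NSString *text, NSString *regex, NSUInteger prefixLength) {
    NSRange prefix = NSMakeRange(0, MIN(prefixLength, [text length]));
    return [text isMatchedByRegex:regex inRange:prefix];
}

// Hypothetical helper: case-insensitive variant that reports an invalid pattern
// through outError (which may be NULL if the caller does not care).
static BOOL PrefixMatchesCaseless(NSString *text, NSString *regex,
                                  NSUInteger prefixLength, NSError **outError) {
    NSRange prefix = NSMakeRange(0, MIN(prefixLength, [text length]));
    return [text isMatchedByRegex:regex options:RKLCaseless inRange:prefix error:outError];
}
```

Clamping the range with MIN keeps the search range inside the receiver, so short strings do not produce an out-of-bounds range.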
isRegexValid
Returns a Boolean value that indicates whether the regular expression contained in the receiver is valid.
- (BOOL)isRegexValid;
Available in RegexKitLite 3.0 and later.
- - isRegexValidWithOptions:error:
isRegexValidWithOptions:error:
Returns a Boolean value that indicates whether the regular expression contained in the receiver is valid using options.
- (BOOL)isRegexValidWithOptions:(RKLRegexOptions)options
error:(NSError **)error;
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
This method can be used to determine if a regular expression is valid. For example:
NSError *error = NULL;
NSString *regexString = @"[a-z"; // Invalid: the character class is missing its closing ']'.
if([regexString isRegexValidWithOptions:RKLNoOptions error:&error] == NO) {
  NSLog(@"The regular expression is invalid. Error: %@", error); // error describes the problem.
}
Available in RegexKitLite 3.0 and later.
- - isRegexValid
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
rangeOfRegex:
Returns the range for the first match of regex in the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex;
An NSRange structure giving the location and length of the first match of regex in the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex or an error occurs.
Available in RegexKitLite 1.0 and later.
- - rangeOfRegex:capture:
- - rangeOfRegex:inRange:
- - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:capture:
Returns the range of capture number capture for the first match of regex in the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex
capture:(NSInteger)capture;
An NSRange structure giving the location and length of capture number capture for the first match of regex in the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex or an error occurs.
Available in RegexKitLite 1.0 and later.
- - rangeOfRegex:
- - rangeOfRegex:inRange:
- - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:inRange:
Returns the range for the first match of regex within range of the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex
inRange:(NSRange)range;
An NSRange structure giving the location and length of the first match of regex within range of the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex within range or an error occurs.
Available in RegexKitLite 1.0 and later.
- - rangeOfRegex:
- - rangeOfRegex:capture:
- - rangeOfRegex:options:inRange:capture:error:
rangeOfRegex:options:inRange:capture:error:
Returns the range of capture number capture for the first match of regex within range of the receiver.
- (NSRange)rangeOfRegex:(NSString *)regex
options:(RKLRegexOptions)options
inRange:(NSRange)range
capture:(NSInteger)capture
error:(NSError **)error;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- capture: The number of the capture group in regex whose matching range is returned. Use 0 for the entire range that regex matched.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
An NSRange structure giving the location and length of capture number capture for the first match of regex within range of the receiver. Returns {NSNotFound, 0} if the receiver is not matched by regex within range or an error occurs.
Available in RegexKitLite 1.0 and later.
- - rangeOfRegex:
- - rangeOfRegex:capture:
- - rangeOfRegex:inRange:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
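A minimal sketch of rangeOfRegex:capture: and rangeOfRegex:options:inRange:capture:error:, under the same setup assumptions (RegexKitLite added to the project, libicucore linked). NumberRangeOfFirstPair and FirstWordRange are hypothetical helpers; the "key=digits" input format is also just an illustration.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: range of the digits in the first "key=digits" pair
// (capture group 2), or {NSNotFound, 0} when there is no such pair.
static NSRange NumberRangeOfFirstPair(NSString *text) {
    return [text rangeOfRegex:@"(\\w+)=(\\d+)" capture:2];
}

// Hypothetical helper: range of the first word inside searchRange (capture 0 is
// the whole match). The result is expressed in the receiver's coordinates.
static NSRange FirstWordRange(NSString *text, NSRange searchRange, NSError **outError) {
    return [text rangeOfRegex:@"\\w+"
                      options:RKLNoOptions
                      inRange:searchRange
                      capture:0
                        error:outError];
}
```

Check the returned location against NSNotFound before passing the range to methods such as substringWithRange:.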
replaceOccurrencesOfRegex:usingBlock:
Enumerates the matches in the receiver by the regular expression regex and executes block for each match found. Replaces the characters that were matched with the contents of the string returned by block, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex
usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.
Returns -1 if there was an error; otherwise, returns the number of replacements performed.
Available in RegexKitLite 4.0 and later.
- - replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
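A minimal sketch of replaceOccurrencesOfRegex:usingBlock:, assuming RegexKitLite 4.0 or later, a compiler with blocks support, and manual reference counting (RegexKitLite predates ARC; under ARC the object-array block parameters would likely need an explicit ownership qualifier). DoubleAllNumbers is a hypothetical helper that replaces every run of digits with twice its value.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // RegexKitLite 4.0+ for the block-based methods

// Hypothetical helper: replaces every integer in text with twice its value,
// returning the number of replacements (or -1 on error).
static NSInteger DoubleAllNumbers(NSMutableString *text) {
    return [text replaceOccurrencesOfRegex:@"\\d+"
                                usingBlock:^NSString *(NSInteger captureCount,
                                                       NSString * const capturedStrings[captureCount],
                                                       const NSRange capturedRanges[captureCount],
                                                       volatile BOOL * const stop) {
        // capturedStrings[0] is the text of the whole match.
        long long value = [capturedStrings[0] longLongValue];
        // The string returned here replaces the matched characters.
        return [NSString stringWithFormat:@"%lld", value * 2];
    }];
}
```

The block computes each replacement from the match itself, which a fixed $n replacement template cannot do.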
replaceOccurrencesOfRegex:options:inRange:error:enumerationOptions:usingBlock:
Enumerates the matches in the receiver by the regular expression regex within range using options and executes block using enumerationOptions for each match found. Replaces the characters that were matched with the contents of the string returned by block, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex
options:(RKLRegexOptions)options
inRange:(NSRange)range
error:(NSError **)error
enumerationOptions:(RKLRegexEnumerationOptions)enumerationOptions
usingBlock:(NSString *(^)(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop))block;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
- enumerationOptions: A mask of options specified by combining RKLRegexEnumerationOptions flags with the C bitwise OR operator. Either 0 or RKLRegexEnumerationNoOptions may be used if no options are required.
- block: The block that is executed for each match of regex in the receiver. The block takes four arguments:
  - captureCount: The number of strings that regex captured. captureCount is always at least 1.
  - capturedStrings: An array containing the substrings matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain a pointer to a string that is equal to @"". This argument may be NULL if enumerationOptions had RKLRegexEnumerationCapturedStringsNotRequired set.
  - capturedRanges: An array containing the ranges matched by each capture group present in regex. The size of the array is captureCount. If a capture group did not match anything, it will contain an NSRange equal to {NSNotFound, 0}.
  - stop: A reference to a BOOL value that the block can use to stop the enumeration by setting *stop = YES; otherwise, it should not touch *stop.
This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.
Returns -1 if there was an error and indirectly returns an NSError object if error is not NULL; otherwise, returns the number of replacements performed.
Available in RegexKitLite 4.0 and later.
- - replaceOccurrencesOfRegex:usingBlock:
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
- Regular Expression Enumeration Options
- RegexKitLite NSString Additions Reference - Block-based Enumeration Methods
- Blocks Programming Topics
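A minimal sketch of the full block-based variant, under the same assumptions as any block-based RegexKitLite code (RegexKitLite 4.0+, blocks support, manual reference counting). UppercaseMatchesInRange is a hypothetical helper that upper-cases every case-insensitive match inside range and reports an invalid pattern through the error parameter.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // RegexKitLite 4.0+ for the block-based methods

// Hypothetical helper: upper-cases every case-insensitive match of regex inside range.
// Returns the number of replacements, or -1 with *outError set when outError is not NULL.
static NSInteger UppercaseMatchesInRange(NSMutableString *text, NSString *regex,
                                         NSRange range, NSError **outError) {
    return [text replaceOccurrencesOfRegex:regex
                                   options:RKLCaseless
                                   inRange:range
                                     error:outError
                        enumerationOptions:RKLRegexEnumerationNoOptions
                                usingBlock:^NSString *(NSInteger captureCount,
                                                       NSString * const capturedStrings[captureCount],
                                                       const NSRange capturedRanges[captureCount],
                                                       volatile BOOL * const stop) {
        // The block reads capturedStrings[0], so RKLRegexEnumerationCapturedStringsNotRequired
        // must not be passed: it would allow capturedStrings to be NULL.
        return [capturedStrings[0] uppercaseString];
    }];
}
```

Matches outside range are left alone, and the caseless option lets "Foo" match the pattern "foo".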
replaceOccurrencesOfRegex:withString:
Replaces all occurrences of the regular expression regex with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex
withString:(NSString *)replacement;
Important:
Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.
Returns -1 if there was an error; otherwise, returns the number of replacements performed.
Available in RegexKitLite 2.0 and later.
- - replaceOccurrencesOfRegex:withString:range:
- - replaceOccurrencesOfRegex:withString:options:range:error:
- ICU Replacement Text Syntax
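A minimal sketch of replaceOccurrencesOfRegex:withString: using $n capture references from the ICU replacement text syntax, assuming RegexKitLite is in the project and libicucore is linked. SwapNames is a hypothetical helper that turns "Last, First" into "First Last".

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: rewrites every "Last, First" as "First Last" in place.
static NSInteger SwapNames(NSMutableString *names) {
    // $2 and $1 refer to capture groups 2 and 1 of the pattern.
    return [names replaceOccurrencesOfRegex:@"(\\w+), (\\w+)" withString:@"$2 $1"];
}
```

The pattern has two capture groups, so a replacement such as @"$3" would trigger the RKLICURegexException described in the Important note.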
replaceOccurrencesOfRegex:withString:range:
Replaces all occurrences of the regular expression regex within range with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex
withString:(NSString *)replacement
range:(NSRange)range;
Important:
Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.
Returns -1 if there was an error; otherwise, returns the number of replacements performed.
Available in RegexKitLite 2.0 and later.
- - replaceOccurrencesOfRegex:withString:
- - replaceOccurrencesOfRegex:withString:options:range:error:
- ICU Replacement Text Syntax
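A minimal sketch of replaceOccurrencesOfRegex:withString:range:, assuming the usual RegexKitLite setup. CollapseSpacesInPrefix is a hypothetical helper that collapses runs of spaces only within the first prefixLength characters.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: collapses runs of two or more spaces to one space,
// but only inside the first prefixLength characters of text.
static NSInteger CollapseSpacesInPrefix(NSMutableString *text, NSUInteger prefixLength) {
    NSRange prefix = NSMakeRange(0, MIN(prefixLength, [text length]));
    return [text replaceOccurrencesOfRegex:@" {2,}" withString:@" " range:prefix];
}
```

Characters outside the range are left untouched. For example, with "a  b  c  d" and a prefix of 6, the double space between c and d survives.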
replaceOccurrencesOfRegex:withString:options:range:error:
Replaces all occurrences of the regular expression regex using options within range with the contents of replacement string after performing capture group substitutions, returning the number of replacements made.
- (NSInteger)replaceOccurrencesOfRegex:(NSString *)regex
withString:(NSString *)replacement
options:(RKLRegexOptions)options
range:(NSRange)range
error:(NSError **)error;
- regex: An NSString containing a regular expression.
- options: A mask of options specified by combining RKLRegexOptions flags with the C bitwise OR operator. Either 0 or RKLNoOptions may be used if no options are required.
- range: The range of the receiver to search.
- replacement: The string to use as the replacement text for matches by regex. See ICU Replacement Text Syntax for more information.
  Important: Raises RKLICURegexException if replacement contains $n capture references where n is greater than the number of capture groups in the regular expression regex.
- error: An optional parameter that, if set and an error occurs, will contain an NSError object that describes the problem. This may be set to NULL if information about any errors is not required.
This method modifies the receiver's contents. An exception will be raised if it is sent to an immutable object.
Returns -1 if there was an error and indirectly returns an NSError object if error is not NULL; otherwise, returns the number of replacements performed.
Available in RegexKitLite 2.0 and later.
- - replaceOccurrencesOfRegex:withString:
- - replaceOccurrencesOfRegex:withString:range:
- ICU Replacement Text Syntax
- RegexKitLite NSError Error Domains
- RegexKitLite NSError and NSException User Info Dictionary Keys
- Regular Expression Options
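A minimal sketch of the full-parameter replacement method, assuming the usual RegexKitLite setup. The selector order follows the method name in the heading (withString: before options:). RedactCaseless is a hypothetical helper that replaces every case-insensitive match with a fixed marker and reports invalid patterns through the error parameter.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: replaces every case-insensitive match of regex in the whole
// string with "[redacted]". Returns the count, or -1 with *outError set on error.
static NSInteger RedactCaseless(NSMutableString *text, NSString *regex, NSError **outError) {
    return [text replaceOccurrencesOfRegex:regex
                                withString:@"[redacted]"
                                   options:RKLCaseless
                                     range:NSMakeRange(0, [text length])
                                     error:outError];
}
```

Passing a non-NULL error pointer lets the caller distinguish "no matches" (a return value of 0) from "invalid pattern" (a return value of -1).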
stringByMatching:
Returns a string created from the characters of the receiver that are in the range of the first match of regex.
- (NSString *)stringByMatching:(NSString *)regex;
An NSString containing the substring of the receiver matched by regex. Returns NULL if the receiver is not matched by regex or an error occurs.
Available in RegexKitLite 1.0 and later.
- - stringByMatching:capture:
- - stringByMatching:inRange:
- - stringByMatching:options:inRange:capture:error:
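A minimal sketch of stringByMatching: and its capture: variant, assuming the usual RegexKitLite setup. FirstNumber and FirstEmailDomain are hypothetical helpers. The e-mail pattern is deliberately simplified for illustration and is not a complete address validator.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h" // assumes RegexKitLite is in the project and libicucore is linked

// Hypothetical helper: the first run of digits in text, or nil when there is none.
static NSString *FirstNumber(NSString *text) {
    return [text stringByMatching:@"\\d+"];
}

// Hypothetical helper: the domain part (capture group 1) of the first e-mail-like token.
static NSString *FirstEmailDomain(NSString *text) {
    return [text stringByMatching:@"[\\w.+-]+@([\\w-]+(?:\\.[\\w-]+)+)" capture:1];
}
```

Because these methods return NULL (nil) on no match, test the result before using it, for example before inserting it into a collection.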
stringByMatching:capture:
Returns a string created from the characters of the receiver that are in the range of the first match of regex for capture.
- (NSString *)stringByMatching:(NSString *)regex
capture:(NSInteger)capture;