Scraping HTML Page Data in an iOS App


This walkthrough uses hao123 as an example and scrapes the URLs of its funny images and GIF animations.

1. Target URL: http://www.hao123.com/gaoxiao?pn=1


2. Fetch the HTML data as follows:

NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.hao123.com/gaoxiao?pn=1"] encoding:NSUTF8StringEncoding error:nil];
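The call above is synchronous, so it blocks the calling thread until the download finishes. A minimal sketch of an asynchronous variant follows, assuming NSURLSession is available. `MYFetchHTML` and `MYDecodeHTML` are hypothetical helper names that do not appear in the original project, and the page is assumed to be UTF-8, as in the call above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes downloaded bytes as UTF-8 (the same encoding
// the synchronous call assumes). Returns nil if data is nil or not valid UTF-8.
static NSString *MYDecodeHTML(NSData *data) {
    if (data == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}

// Hypothetical helper: downloads the page in the background and passes the
// decoded HTML (or the error) to the completion block.
// The completion block runs on a background queue, not the main thread.
static void MYFetchHTML(NSURL *url, void (^completion)(NSString *html, NSError *error)) {
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithURL:url
                                    completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            if (error != nil) {
                completion(nil, error);
                return;
            }
            completion(MYDecodeHTML(data), nil);
        }];
    [task resume];
}
```

Since iOS 9, App Transport Security blocks plain http requests by default, so a target URL like the one above would likely also need an exception in the app's Info.plist.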

3. Inspect the page source and find the marker strings that appear immediately before and after each resource link you need.

For the target page, the markers before and after each resource URL are:

Before:

@"
After:
@"\" src="

4. Extract the needed strings from htmlString as follows:

Add a category to NSString:

@interface NSString (MYNSStringExtensionMethods)
- (NSArray *)componentsSeparatedFromString:(NSString *)fromString toString:(NSString *)toString;
@end

@implementation NSString (MYNSStringExtensionMethods)

- (NSArray *)componentsSeparatedFromString:(NSString *)fromString toString:(NSString *)toString
{
    // Both markers must be non-empty; otherwise there is nothing to search for.
    if (!fromString || !toString || fromString.length == 0 || toString.length == 0) {
        return nil;
    }
    NSMutableArray *subStringsArray = [[NSMutableArray alloc] init];
    NSString *tempString = self;
    NSRange range = [tempString rangeOfString:fromString];
    while (range.location != NSNotFound) {
        // Drop everything up to and including the opening marker.
        tempString = [tempString substringFromIndex:(range.location + range.length)];
        range = [tempString rangeOfString:toString];
        if (range.location != NSNotFound) {
            // Collect the text between the opening and closing markers.
            [subStringsArray addObject:[tempString substringToIndex:range.location]];
            // Look for the next opening marker in the remaining text.
            range = [tempString rangeOfString:fromString];
        }
        else
        {
            // No closing marker remains, so stop.
            break;
        }
    }
    return subStringsArray;
}

@end
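The category above creates a new substring of the remaining text on every pass through the loop. As a rough sketch of an alternative (not the method used in the original project), the same extraction can be done by moving a search range over the original string with `rangeOfString:options:range:`. The function name `MYSubstringsBetween` is hypothetical, and unlike the category it returns an empty array rather than nil for invalid input.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical alternative: returns every substring that sits between an
// occurrence of fromString and the next occurrence of toString.
static NSArray<NSString *> *MYSubstringsBetween(NSString *source,
                                                NSString *fromString,
                                                NSString *toString) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    if (source.length == 0 || fromString.length == 0 || toString.length == 0) {
        return results;
    }
    NSUInteger cursor = 0;
    while (cursor < source.length) {
        // Find the next opening marker at or after the cursor.
        NSRange searchRange = NSMakeRange(cursor, source.length - cursor);
        NSRange fromRange = [source rangeOfString:fromString options:0 range:searchRange];
        if (fromRange.location == NSNotFound) {
            break;
        }
        // Find the closing marker after the opening marker.
        NSUInteger start = NSMaxRange(fromRange);
        NSRange toRange = [source rangeOfString:toString
                                        options:0
                                          range:NSMakeRange(start, source.length - start)];
        if (toRange.location == NSNotFound) {
            break;
        }
        [results addObject:[source substringWithRange:NSMakeRange(start, toRange.location - start)]];
        // Like the category, resume searching right after the opening marker.
        cursor = start;
    }
    return results;
}
```

For example, with the made-up markers `data-src="` and `" src=`, calling this on `<img data-src="http://example.com/a.gif" src="x.png">` would collect `http://example.com/a.gif`.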

5. Extract and print the resource URLs:

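// beforeKey and afterKey are placeholder names for the "before" and "after" marker strings found in step 3.
NSArray *urls = [htmlString componentsSeparatedFromString:beforeKey toString:afterKey];
NSLog(@"find urls:%@", urls);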


Output:

find urls: (
http://img.hao123.com/data/3_a43d768470ea5785e5bbf3ca2c81e4a7_430,
http://img6.hao123.com/data/3_c87ac28d85b361b5efc9654cdb24c745_430,
http://img0.hao123.com/data/3_759c73a935eb8c3ebae5646eb71b3028_0,
http://img.hao123.com/data/3_415b7834328e6a4fc70f50854828df22_0,
http://img5.hao123.com/data/3_e84664284f1cbf59eb364d147fc1610f_430
)
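The extracted values are plain strings, so before downloading anything it may be worth checking that each one is a usable URL. The sketch below is one way to do that, under the assumption that only http and https links are wanted. `MYValidURLs` is a hypothetical helper that is not part of the original project.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: converts scraped strings to NSURL objects, keeping
// only http/https URLs and dropping duplicates while preserving order.
static NSArray<NSURL *> *MYValidURLs(NSArray<NSString *> *strings) {
    NSMutableOrderedSet<NSURL *> *urls = [NSMutableOrderedSet orderedSet];
    for (NSString *string in strings) {
        NSURL *url = [NSURL URLWithString:string];
        // If url is nil, scheme is nil too and both comparisons evaluate to NO.
        NSString *scheme = url.scheme.lowercaseString;
        if ([scheme isEqualToString:@"http"] || [scheme isEqualToString:@"https"]) {
            [urls addObject:url];
        }
    }
    return urls.array;
}
```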


The technique described in this article is used in the following project:

Project name: shakefun

Download: shakefun



