This article reads through the glibc source code to examine how strtok and strtok_r are implemented and what to watch out for when using them.
#include <string.h>
char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);
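strtok splits str into a sequence of substrings (tokens), treating every character contained in delim as a delimiter. If str is NULL, the static pointer saved inside the function, which points to the byte just after the previous split position, is used as the starting position for this call. strtok_r does the same job as strtok, but makes the pointer that strtok keeps internally explicit: it is passed in through saveptr, and the location saveptr refers to serves as the starting position for the split.
str: the source string to split
delim: the set of delimiter characters
saveptr: a pointer to a char * variable that holds the tokenizing context between calls
Each returned substring ends in '\0' (strtok overwrites the delimiter that follows it), so the substring can be printed directly:
#include <stdio.h>
#include <string.h>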
int main(void) {
    char str[] = "hello,world";
    char *token = strtok(str, ",");
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok(NULL, ",");
    }
    return 0;
}
char str[] = "hello,world";
printf("str before strtok: %s\n", str);
char *token = strtok(str, ",");
printf("str after strtok: %s\n", str);
$ str before strtok: hello,world
$ str after strtok: hello
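As the experiment above shows, the contents of str change after strtok runs on it: everything from the delimiter onward seems to disappear. What actually happens is that strtok takes the delimiter it was given (here ,), finds its first occurrence (the , before world), and overwrites it with '\0'. The rest of the buffer is still there; printf simply stops at the new terminator.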
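Because strtok modifies the source string, the first argument must not be a string literal. Writing to a string literal is undefined behavior in C, and in practice the program usually crashes with a segmentation fault.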
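When the text to split arrives as a string literal or other read-only data, one common workaround is to tokenize a writable copy instead. The sketch below illustrates that idea; the helper name count_tokens and the fixed 128-byte buffer are assumptions made for this example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in read-only text by tokenizing
   a writable copy, so the caller's string is never modified.
   Returns -1 if the text does not fit in the local buffer. */
int count_tokens(const char *text, const char *delim)
{
    assert(text != NULL && delim != NULL);

    char buf[128];                      /* illustrative fixed size */
    size_t len = strlen(text);
    int count = 0;

    if (len >= sizeof buf)
        return -1;
    memcpy(buf, text, len + 1);         /* copy including the '\0' */

    /* strtok only ever writes into buf, never into text. */
    for (char *tok = strtok(buf, delim); tok != NULL;
         tok = strtok(NULL, delim))
        count++;
    return count;
}
```

Calling count_tokens("hello,world", ",") returns 2 and leaves the literal untouched, because only the local copy is written to.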
Subsequent calls take NULL as the first argument:
char str[] = "hello,world";
char *token = strtok(str, ",");
while (token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, ",");
}
$ hello
$ world
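During the first extraction, strtok saves a pointer to the position right after the delimiter, that is, to the 'w'. Later extractions pass NULL as the first argument to strtok, which then continues splitting from the position it saved implicitly during the previous call.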
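The "string first, then NULL" calling pattern is often wrapped in a small helper that gathers the tokens into an array. A minimal sketch follows; the name split_tokens and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to `max` token pointers in `out` and
   returns how many were stored. The first call passes the string, every
   later call passes NULL so strtok resumes from its saved position.
   Tokens point into `str`, which is modified in place. */
size_t split_tokens(char *str, const char *delim, char *out[], size_t max)
{
    assert(str != NULL && delim != NULL && out != NULL);

    size_t n = 0;
    for (char *tok = strtok(str, delim); tok != NULL && n < max;
         tok = strtok(NULL, delim))
        out[n++] = tok;
    return n;
}
```

For a writable buffer holding "hello,world" and the delimiter ",", this stores two pointers: one to "hello" at the start of the buffer and one to "world" six bytes in. No new strings are allocated.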
char str[] = "hello,world";
char *token = strtok(str, ",l");
printf("%s\n", token);
$ he
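As shown above, strtok does not match the second argument as a whole separator string; it splits on any single character that belongs to the delimiter set. Here both , and l are delimiters, so the first token ends at the first l.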
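If the intent is to split on a multi-character separator as a whole, strtok is the wrong tool. One possible approach, sketched here under the assumption that in-place modification is acceptable, is to search with strstr. The helper name next_field and the caller-held cursor are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits on the whole separator string `sep`,
   not on any single character in it. *cursor is the caller-held
   position, advanced on each call; NULL marks the end of input. */
char *next_field(char **cursor, const char *sep)
{
    assert(cursor != NULL && sep != NULL && *sep != '\0');

    char *start = *cursor;
    if (start == NULL)
        return NULL;

    char *hit = strstr(start, sep);     /* match the full separator */
    if (hit == NULL) {
        *cursor = NULL;                 /* this is the last field */
    } else {
        *hit = '\0';                    /* terminate this field in place */
        *cursor = hit + strlen(sep);
    }
    return start;
}
```

With the separator ",l", the buffer "hello,world" comes back as a single field, because the two-character sequence ",l" never occurs in it. By contrast, strtok with the same second argument cuts at the first l.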
char str[] = ",hello,world";
char *token = strtok(str, ",");
printf("%s\n", token);
$ hello
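As shown above, when the string begins with a delimiter, strtok does not return an empty token; it skips the leading delimiter characters and starts the first token at the first non-delimiter character.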
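A side effect of this skipping is that empty fields are lost, which matters for data such as "a,,b" where position carries meaning. The sketch below shows one way to keep empty fields by treating every delimiter as a field boundary. The helper name next_field_keep_empty is hypothetical; the strsep function found in glibc and the BSDs follows a similar idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: like strtok_r, but every delimiter ends a field,
   so leading or adjacent delimiters produce empty strings instead of
   being skipped. *cursor becomes NULL after the last field. */
char *next_field_keep_empty(char **cursor, const char *delim)
{
    assert(cursor != NULL && delim != NULL);

    char *start = *cursor;
    if (start == NULL)
        return NULL;

    char *end = start + strcspn(start, delim);  /* first delimiter or '\0' */
    if (*end == '\0') {
        *cursor = NULL;                 /* no delimiter left: last field */
    } else {
        *end = '\0';
        *cursor = end + 1;
    }
    return start;
}
```

Applied to ",hello,,world" with the delimiter ",", this yields four fields: an empty string, "hello", another empty string, and "world". strtok would return only "hello" and "world".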
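strtok is not reentrant; strtok_r is more flexible and safer
strtok uses a static variable internally: a static pointer, invisible to the caller, stores the starting position for the next call. strtok_r instead takes that implicitly saved pointer as a parameter, so the caller passes it in, keeps it, and can even modify it, which makes the function more flexible and safer. In addition, Windows provides a comparable safe tokenizing function, strtok_s.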
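Nested tokenizing is a typical case where the single hidden pointer of strtok gets in the way. The sketch below splits groups on ';' and the items inside each group on ',', giving each loop its own save pointer. The helper name count_items and the input format are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* declares strtok_r under strict -std=c99/c11 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: text looks like "a,b;c;d,e" (groups separated by
   ';', items by ','). Stores the group count in *groups and returns the
   total item count. Each loop owns its own save pointer, so the inner
   loop cannot disturb the outer one. Modifies `text` in place. */
int count_items(char *text, int *groups)
{
    assert(text != NULL && groups != NULL);

    char *outer_save, *inner_save;
    int items = 0;
    *groups = 0;

    for (char *group = strtok_r(text, ";", &outer_save); group != NULL;
         group = strtok_r(NULL, ";", &outer_save)) {
        (*groups)++;
        for (char *item = strtok_r(group, ",", &inner_save); item != NULL;
             item = strtok_r(NULL, ",", &inner_save))
            items++;
    }
    return items;
}
```

For the input "a,b;c;d,e,f" this reports 3 groups and 6 items. If both loops used plain strtok, the inner loop would overwrite the one shared static pointer, and the outer loop would not resume where it left off.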
strtok.c:
/* Copyright (C) 1991-2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#include <string.h>
/* Parse S into tokens separated by characters in DELIM.
If S is NULL, the last string strtok() was called with is
used. For example:
char s[] = "-abc-=-def";
x = strtok(s, "-"); // x = "abc"
x = strtok(NULL, "-="); // x = "def"
x = strtok(NULL, "="); // x = NULL
// s = "abc\0=-def\0"
*/
char *
strtok (char *s, const char *delim)
{
  static char *olds;
  return __strtok_r (s, delim, &olds);
}
strtok_r.c:
/* Reentrant string tokenizer. Generic version.
Copyright (C) 1991-2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#ifdef HAVE_CONFIG_H
# include <config.h>
#endif
#include <string.h>
#ifndef _LIBC
/* Get specification. */
# include "strtok_r.h"
# define __strtok_r strtok_r
#endif
/* Parse S into tokens separated by characters in DELIM.
If S is NULL, the saved pointer in SAVE_PTR is used as
the next starting point. For example:
char s[] = "-abc-=-def";
char *sp;
x = strtok_r(s, "-", &sp); // x = "abc", sp = "=-def"
x = strtok_r(NULL, "-=", &sp); // x = "def", sp = NULL
x = strtok_r(NULL, "=", &sp); // x = NULL
// s = "abc\0-def\0"
*/
char *
__strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;
  if (s == NULL)
    s = *save_ptr;
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }
  /* Scan leading delimiters. */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }
  /* Find the end of the token. */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      *save_ptr = end;
      return s;
    }
  /* Terminate the token and make *SAVE_PTR point past it. */
  *end = '\0';
  *save_ptr = end + 1;
  return s;
}
#ifdef weak_alias
libc_hidden_def (__strtok_r)
weak_alias (__strtok_r, strtok_r)
#endif
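To tie the branches of __strtok_r to observable behavior, the following sketch replays the "-abc-=-def" example from the source comment through the public strtok_r and records what each call returns. The struct trace and the function replay_example are hypothetical names introduced only for this walkthrough.

```c
#define _POSIX_C_SOURCE 200809L  /* declares strtok_r under strict -std=c99/c11 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one replay of the "-abc-=-def" example. */
struct trace {
    const char *tok1, *tok2, *tok3;  /* results of the three calls */
    const char *rest_after_first;    /* *save_ptr after the first call */
};

/* Replays the example from the source comment. `buf` must hold the
   writable string "-abc-=-def"; it is modified in place. */
struct trace replay_example(char *buf)
{
    assert(buf != NULL);

    struct trace t;
    char *sp;

    /* strspn skips the leading '-', strcspn stops at the next '-',
       which is overwritten with '\0'. */
    t.tok1 = strtok_r(buf, "-", &sp);
    t.rest_after_first = sp;             /* one past the written '\0' */

    /* strspn skips "=-"; the token runs to the end of the string,
       so nothing is overwritten. */
    t.tok2 = strtok_r(NULL, "-=", &sp);

    /* Nothing is left, so the call returns NULL. */
    t.tok3 = strtok_r(NULL, "=", &sp);
    return t;
}
```

The three calls return "abc", "def", and NULL, and the save pointer refers to "=-def" after the first call. The buffer ends up holding -abc\0=-def: the leading '-' is skipped rather than erased, and only the delimiter that ends abc is overwritten.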