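I started doing web development when I first encountered the Internet in 1997, nine years ago now. It began with building HTML pages in FrontPage, and once I learned ASP + Access + IIS I was hooked. Since then I have used, in turn, ASP + SQL Server + IIS, JSP + Oracle + JRun (Resin/Tomcat), and PHP + Sybase (MySQL) + Apache … and I finally settled on PHP + MySQL + Apache + Linux (BSD), the architecture commonly called LAMP. There are many reasons for this, and plenty of people online debate the merits of different architectures and languages, so I won't repeat that here. Briefly, these are the main reasons I like LAMP: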
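1. It is a fully open, free platform;
2. It is simple to pick up, with abundant resources of every kind;
3. PHP, MySQL, and Apache integrate seamlessly with the underlying Linux (BSD) system and with each other, which makes them very efficient;
4. All of them are written in C/C++, highly efficient languages, so performance is reliable;
5. PHP's syntax is essentially C-style, and it has also borrowed many architectural strengths from Java and C++;
6. Most importantly, PHP extension modules can be written very conveniently in C/C++, which gives PHP virtually unlimited extensibility!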
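For these reasons I really like PHP-based architectures, above all because of the last point. I promoted this platform at both Yahoo and mop, and I have gathered some experience extending PHP in C, which I would like to share here in the hope that it draws out better ideas from others.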
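There are several ways to write a PHP extension module in C. By the final form, there are two: compile the module directly into PHP, or compile it as a shared .so extension that PHP loads. By the build method, there are also two: use the phpize tool (installed along with PHP), or use the ext_skel tool (shipped with the PHP source). The approach we use most, and the most convenient one, is to write a shared .so extension with ext_skel, so that is the approach this article covers.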
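The PHP source tree contains an ext directory (everything here refers to PHP on Linux, not Windows), and inside ext is a tool called ext_skel. It makes developing a PHP extension simple by providing a generic set of steps and a template. As an example, we will build a module that converts strings between encodings such as UTF-8, GBK/GB2312, and Big5 from inside PHP. The module will ultimately expose the following functions: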
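(1) string toplee_big52gbk(string s)
Converts the input string from Big5 to GBK
(2) string toplee_gbk2big5(string s)
Converts the input string from GBK to Big5
(3) string toplee_normalize_name(string s)
Processes the input string as follows: full-width to half-width, trim, uppercase to lowercase
(4) string toplee_fan2jian(int code, string s)
Converts an input GBK string from Traditional to Simplified Chinese; code is the code page passed to the conversion routines
(5) string toplee_decode_utf(string s)
Converts a UTF-8 string to Unicode
(6) string toplee_decode_utf_gb(string s)
Converts a UTF-8 string to GB
(7) string toplee_decode_utf_big5(string s)
Converts a UTF-8 string to Big5
(8) string toplee_encode_utf_gb(string s)
Converts an input GBK string to UTF-8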
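Step (3) begins by converting full-width characters to half-width. The author's chn_util.c, listed later, does this with lookup tables. For the most common characters there is also an arithmetic route, shown here as a minimal sketch. It assumes GBK keeps GB2312's layout, in which row 0xA3 holds the full-width forms of ASCII in ASCII order (trail byte = ASCII code + 0x80) and 0xA1 0xA1 is the full-width space. The function name gbk_fullwidth_to_ascii is hypothetical and not part of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert GBK full-width digits, Latin letters and the
 * full-width space to ASCII, in place. Returns the new length in bytes.
 * Assumes row 0xA3 mirrors ASCII (0xA3 0xB0 = full-width '0') and
 * 0xA1 0xA1 is the full-width space, as in GB2312. */
size_t gbk_fullwidth_to_ascii(unsigned char *s)
{
    unsigned char *src = s;
    unsigned char *dst = s;   /* dst never runs ahead of src */

    assert(s != NULL);
    while (*src != '\0') {
        if (src[0] >= 0x81 && src[1] != '\0') {
            /* double-byte character: read both bytes before writing */
            unsigned char lead = src[0];
            unsigned char trail = src[1];
            unsigned char ascii = (unsigned char)(trail - 0x80);

            if (lead == 0xA3 && ((ascii >= '0' && ascii <= '9') ||
                                 (ascii >= 'A' && ascii <= 'Z') ||
                                 (ascii >= 'a' && ascii <= 'z'))) {
                *dst++ = ascii;                 /* full-width alnum */
            } else if (lead == 0xA1 && trail == 0xA1) {
                *dst++ = ' ';                   /* full-width space */
            } else {
                *dst++ = lead;                  /* leave other characters alone */
                *dst++ = trail;
            }
            src += 2;
        } else {
            *dst++ = *src++;                    /* ASCII or a dangling lead byte */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Only digits, letters, and the space are handled; punctuation such as 《 or “ still needs a table like GBFull.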
First, change into the ext directory and run:
#./ext_skel --extname=toplee
ext_skel then creates a directory named toplee under ext, containing the following files:
.cvsignore
CREDITS
EXPERIMENTAL
config.m4
php_toplee.h
tests
toplee.c
toplee.php
The ones that matter most are config.m4 and toplee.c.
Next, edit config.m4:
#vi ./config.m4
Find lines similar to these:
dnl PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
dnl [ --with-toplee Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [ --enable-toplee Enable toplee support])
These lines tell the PHP build which configure option includes our toplee module. We want the --with-toplee form, so we uncomment the PHP_ARG_WITH lines as follows:
PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
[ --with-toplee Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [ --enable-toplee Enable toplee support])
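The key task after that is writing toplee.c, the main source file of the module. Even if you change nothing, you already have a working PHP extension; the generated file contains code like this: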
PHP_FUNCTION(confirm_toplee_compiled)
{
    char *arg = NULL;
    int arg_len, len;
    char string[256];

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {
        return;
    }

    len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);
    RETURN_STRINGL(string, len, 1);
}
If the new module is included when we build PHP later on, a PHP script can call confirm_toplee_compiled("toplee"), which returns the string "Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP."
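The %.78s precision in the skeleton is what keeps the 256-byte buffer safe: each string argument is cut to at most 78 bytes, so the message cannot outgrow the buffer. A minimal sketch of the same idea follows, with snprintf added as a second line of defence; format_confirm is a hypothetical helper written for illustration, not part of the ext_skel output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring confirm_toplee_compiled(): format the
 * confirmation message into buf. %.78s caps each argument at 78 bytes,
 * and snprintf never writes past size bytes. Returns the length actually
 * stored (excluding the terminating NUL), or -1 on an encoding error. */
int format_confirm(char *buf, size_t size, const char *module, const char *arg)
{
    int len;

    assert(buf != NULL && size > 0);
    len = snprintf(buf, size,
                   "Congratulations! You have successfully modified ext/%.78s/config.m4. "
                   "Module %.78s is now compiled into PHP.",
                   module, arg);
    if (len < 0)
        return -1;
    /* snprintf reports the length it wanted; clamp to what was stored */
    if ((size_t)len >= size)
        len = (int)(size - 1);
    return len;
}
```

However long the argument is, the result is the same as for a 78-byte argument, which is why sprintf into char string[256] is safe in the skeleton.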
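Below is our modified toplee.c, which supports the functions and interfaces planned above (note that inside the code the module itself is named gbk). Here is the source of toplee.c: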
/* $Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $ */

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_toplee.h"     /* header generated by ext_skel; declares gbk_module_entry */
#include "toplee_util.h"

static int le_gbk;

/* Functions exported to PHP scripts */
function_entry gbk_functions[] = {
    PHP_FE(toplee_decode_utf, NULL)
    PHP_FE(toplee_decode_utf_gb, NULL)
    PHP_FE(toplee_decode_utf_big5, NULL)
    PHP_FE(toplee_encode_utf_gb, NULL)
    PHP_FE(toplee_big52gbk, NULL)
    PHP_FE(toplee_gbk2big5, NULL)
    PHP_FE(toplee_fan2jian, NULL)
    PHP_FE(toplee_normalize_name, NULL)
    {NULL, NULL, NULL}
};

zend_module_entry gbk_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
#endif
    "gbk",
    gbk_functions,
    PHP_MINIT(gbk),
    PHP_MSHUTDOWN(gbk),
    PHP_RINIT(gbk),
    PHP_RSHUTDOWN(gbk),
    PHP_MINFO(gbk),
#if ZEND_MODULE_API_NO >= 20010901
    "0.1",
#endif
    STANDARD_MODULE_PROPERTIES
};

/* config.m4 builds this extension as "toplee", so a shared build defines COMPILE_DL_TOPLEE */
#ifdef COMPILE_DL_TOPLEE
ZEND_GET_MODULE(gbk)
#endif

/* Paths of the code-table files, set in php.ini */
PHP_INI_BEGIN()
    PHP_INI_ENTRY("gbk2uni", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2gbk", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2big5", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52uni", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52gbk", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("gbk2big5", "", PHP_INI_SYSTEM, NULL)
//  STD_PHP_INI_ENTRY("gbk.global_value", "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)
//  STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)
PHP_INI_END()
/* Code-table file paths, copied from the ini settings at startup */
char gbk2uni_file[256];
char uni2gbk_file[256];
char big52uni_file[256];
char uni2big5_file[256];
char gbk2big5_file[256];
char big52gbk_file[256];

//utf file init flag
static int initutf = 0;

PHP_MINIT_FUNCTION(gbk)
{
    REGISTER_INI_ENTRIES();

    memset(gbk2uni_file, 0, sizeof(gbk2uni_file));
    memset(uni2gbk_file, 0, sizeof(uni2gbk_file));
    memset(big52uni_file, 0, sizeof(big52uni_file));
    memset(uni2big5_file, 0, sizeof(uni2big5_file));
    memset(gbk2big5_file, 0, sizeof(gbk2big5_file));
    memset(big52gbk_file, 0, sizeof(big52gbk_file));

    strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file) - 1);
    strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file) - 1);
    strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file) - 1);
    strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file) - 1);
    strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file) - 1);
    strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file) - 1);

    //InitMMResource();
    InitResource();

    /* All six tables are required: refuse to start if any path is missing */
    if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0') ||
        (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0') ||
        (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0')) {
        return FAILURE;
    }

    /* A non-NULL return from LoadOneCodeTable() is treated as an error */
    if (gbk2uni_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_GBK2UNI, gbk2uni_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }
    if (uni2gbk_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_UNI2GBK, uni2gbk_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }
    if (big52uni_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_BIG52UNI, big52uni_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }
    if (uni2big5_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_UNI2BIG5, uni2big5_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }
    if (gbk2big5_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_GBK2BIG5, gbk2big5_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }
    if (big52gbk_file[0] != '\0') {
        if (LoadOneCodeTable(CODE_BIG52GBK, big52gbk_file) != NULL) {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    initutf = 1;
    return SUCCESS;
}
PHP_MSHUTDOWN_FUNCTION(gbk)
{
    UNREGISTER_INI_ENTRIES();
    toplee_cleanup_mmap(NULL);
    return SUCCESS;
}

PHP_RINIT_FUNCTION(gbk)
{
    return SUCCESS;
}

PHP_RSHUTDOWN_FUNCTION(gbk)
{
    return SUCCESS;
}

PHP_MINFO_FUNCTION(gbk)
{
    php_info_print_table_start();
    php_info_print_table_header(2, "gbk support", "enabled");
    php_info_print_table_end();

    DISPLAY_INI_ENTRIES();
}
PHP_FUNCTION(toplee_decode_utf)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    /* DecodePureUTF() works on the buffer in place, so decode a private copy */
    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;
    DecodePureUTF((unsigned char *)t, KEEP_UNICODE);
    RETVAL_STRING(t, 1);    /* copy the result into PHP-managed memory */
    free(t);                /* then release the strdup() buffer */
    return;
}

PHP_FUNCTION(toplee_decode_utf_gb)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;
    DecodePureUTF((unsigned char *)t, DECODE_UNICODE);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}

PHP_FUNCTION(toplee_decode_utf_big5)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;
    DecodePureUTF((unsigned char *)t, DECODE_UNICODE | DECODE_BIG5);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}
/* Convert a GBK (code page 936) string to UTF-8.
   strDst must provide at least strlen(strSrc) * 2 + 1 bytes; returns the
   number of bytes written, or 0 if the buffer is too small. nFlag is unused. */
int EncodePureUTF(unsigned char *strSrc, unsigned char *strDst, int nDstLen, int nFlag)
{
    int nRet;
    int pos;
    unsigned short c;
    unsigned short *uBuf;
    unsigned short *p;
    int nSize;
    int nLen;
    int nReturn;

    nLen = strlen((const char *)strSrc);
    if (nDstLen < nLen * 2 + 1)
        return 0;

    /* GBK -> UTF-16: at most one wide char per input byte */
    nSize = nLen + 1;
    uBuf = (unsigned short *)emalloc(sizeof(unsigned short) * nSize);
    nRet = MultiByteToWideChar(936, 0, (char *)strSrc, nLen, uBuf, nSize);

    /* UTF-16 -> UTF-8 */
    nReturn = 0;
    p = uBuf;
    for (pos = nRet; pos > 0; pos--, p++) {
        c = *p;
        if (c < 0x80) {
            strDst[nReturn++] = (unsigned char)c;
        } else if (c < 0x800) {
            strDst[nReturn++] = (unsigned char)(0xc0 | (c >> 6));
            strDst[nReturn++] = (unsigned char)(0x80 | (c & 0x3f));
        } else {
            /* c is at most 0xFFFF, so three bytes always suffice */
            strDst[nReturn++] = (unsigned char)(0xe0 | (c >> 12));
            strDst[nReturn++] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
            strDst[nReturn++] = (unsigned char)(0x80 | (c & 0x3f));
        }
    }
    strDst[nReturn] = '\0';
    efree(uBuf);    /* free through the original pointer; p was advanced */
    return nReturn;
}
PHP_FUNCTION(toplee_encode_utf_gb)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    int nBufLen;
    char *sRet;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    nBufLen = strlen(s) * 2 + 1;
    sRet = emalloc(nBufLen);
    EncodePureUTF((unsigned char *)s, (unsigned char *)sRet, nBufLen, 0);
    /* duplicate = 0: PHP takes ownership of the emalloc'd buffer and frees it */
    RETVAL_STRING(sRet, 0);
    return;
}
PHP_FUNCTION(toplee_big52gbk)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    /* convert a copy so the caller's string is left untouched */
    sRet = estrdup(s);
    if (NULL == sRet)
        RETURN_FALSE;
    BIG52GBK(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);    /* hand the emalloc'd copy to PHP without duplicating it */
    return;
}

PHP_FUNCTION(toplee_gbk2big5)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    sRet = estrdup(s);
    if (NULL == sRet)
        RETURN_FALSE;
    GBK2BIG5(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);
    return;
}
PHP_FUNCTION(toplee_normalize_name)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    /* NormalizeName() rewrites its argument in place; the buffer behind s
       belongs to the PHP variable, so normalize a copy instead */
    sRet = estrndup(s, s_len);
    NormalizeName(sRet);
    RETURN_STRING(sRet, 0);
}
PHP_FUNCTION(toplee_fan2jian)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    long code;                      /* the "l" format writes a long */
    char *pSource;
    unsigned short *pDest1 = NULL;  /* wide-character buffer */
    char *pDest2 = NULL;            /* multibyte result */
    int nSourceLen, nDestLen;

    if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)
        return;
    if (!initutf)
        RETURN_FALSE;

    pSource = s;
    nSourceLen = s_len;
    /* at most one wide character per input byte */
    pDest1 = malloc(sizeof(unsigned short) * (nSourceLen + 1));
    pDest2 = malloc(nSourceLen + 1);
    if (NULL == pDest1 || NULL == pDest2)
        goto _f2j_err;
    memset(pDest1, 0, sizeof(unsigned short) * (nSourceLen + 1));
    memset(pDest2, 0, nSourceLen + 1);

    /* multibyte -> wide characters -> multibyte, both in code page `code` */
    nDestLen = MultiByteToWideChar(code, 0, pSource, nSourceLen, pDest1, nSourceLen + 1);
    if (0 >= nDestLen)
        goto _f2j_err;
    nDestLen = WideCharToMultiByte(code, 0, pDest1, nDestLen, pDest2, nSourceLen, NULL, NULL);
    if (0 >= nDestLen)
        goto _f2j_err;

    RETVAL_STRING(pDest2, 1);
    free(pDest1);
    free(pDest2);
    return;

_f2j_err:
    if (pDest1 != NULL)
        free(pDest1);
    if (pDest2 != NULL)
        free(pDest2);
    RETURN_FALSE;
}
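The loop in EncodePureUTF is standard UTF-16 to UTF-8 bit packing. Pulled out into a standalone sketch, one character is encoded as shown below. The name utf8_encode_bmp is hypothetical; the sketch assumes code units from the Basic Multilingual Plane and does not combine surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one 16-bit code unit as UTF-8.
 * Writes 1 to 3 bytes to out and returns how many were written. */
int utf8_encode_bmp(unsigned short c, unsigned char out[3])
{
    assert(out != NULL);
    if (c < 0x80) {                                       /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx: enough for any value up to 0xFFFF */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

For example, U+4E2D (中) becomes the three bytes E4 B8 AD. Since a Chinese character takes 2 bytes in GBK and 3 in UTF-8, a destination of strlen(src) * 2 + 1 bytes is enough, which is the size toplee_encode_utf_gb allocates.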
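This file defines every interface we want to implement; what remains is the C code behind them. I won't discuss the implementation details here, but one point is worth noting: you can put all the C source and header files that implement the interfaces defined in toplee.c into the ext/toplee directory so that toplee.c can call them when it is compiled. This is all standard C. The extra .c files must also be listed in the PHP_NEW_EXTENSION line of config.m4 (for example PHP_NEW_EXTENSION(toplee, toplee.c chn_util.c toplee_util.c, $ext_shared)), otherwise they are not compiled. I (Michael) won't go further into that; here is the code we use: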
chn_util.h
#ifndef __CHN_UTIL_H__
#define __CHN_UTIL_H__

#include "common.h"

#define LANG_GB 1
#define LANG_B5 2

#define GB_FULL_COUNT (20 + 26 * 2 + 5 + 4 + 26)
#define B5_FULL_COUNT (20 + 26 * 2 + 5 + 4 + 24)

BOOL FullToHalf(char *str, int nLang);
void LowerString(char *str);
void TrimString(char *str);

#endif // __CHN_UTIL_H__
chn_util.c
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include "common.h"
#include "chn_util.h"

/* This file is GBK-encoded: GBFull holds GBK full-width characters, while
   each B5Full string is the GBK character whose two bytes equal the Big5
   code of the corresponding full-width character. */

// ０１２３４５６７８９　！＠（）－＿＋＇＜＞
static char *GBFull[GB_FULL_COUNT] = {
    "０", "１", "２", "３", "４", "５", "６", "７", "８", "９",
    "　", "＠", "（", "）", "－", "＿", "＋", "＇", "＜", "＞",
    "ａ", "ｂ", "ｃ", "ｄ", "ｅ", "ｆ", "ｇ", "ｈ", "ｉ", "ｊ", "ｋ", "ｌ", "ｍ",
    "ｎ", "ｏ", "ｐ", "ｑ", "ｒ", "ｓ", "ｔ", "ｕ", "ｖ", "ｗ", "ｘ", "ｙ", "ｚ",
    "Ａ", "Ｂ", "Ｃ", "Ｄ", "Ｅ", "Ｆ", "Ｇ", "Ｈ", "Ｉ", "Ｊ", "Ｋ", "Ｌ", "Ｍ",
    "Ｎ", "Ｏ", "Ｐ", "Ｑ", "Ｒ", "Ｓ", "Ｔ", "Ｕ", "Ｖ", "Ｗ", "Ｘ", "Ｙ", "Ｚ",
    "。", "·", "．", "﹒", "＆", "《", "〈", "〉", "》",
    "﹐", "，", "﹔", "；", "﹕", "：", "﹖", "？", "﹗", "！",
    "—", "‘", "’", "“", "”", "～", "∶", "｀", "｜",
    "［", "］", "｛", "｝", "＃", "＄", "％"
};

/* Half-width counterparts, in the same order as GBFull.
   "\?\?" avoids the "??!" trigraph. */
static char GBEnHalf[GB_FULL_COUNT + 1] =
    "0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "....&<<>>,,;;::\?\?!!-\'\'\"\"~:`|[]{}#$%";

// ⒈⒉⒊⒋⒌⒍⒎⒏⌒∨∠ˇ≌≈
/* Entries that appear as "" did not survive in this copy of the listing. */
static char *B5Full[B5_FULL_COUNT] = {
    "", "", "⒈", "⒉", "⒊", "⒋", "⒌", "⒍", "⒎", "⒏",
    "", "", "", "", "⌒", "∨", "∠", "ˇ", "≌", "≈",
    "㈤", "㈥", "㈦", "㈧", "㈨", "㈩", "", "", "Ⅰ", "Ⅱ",
    "Ⅲ", "Ⅳ", "Ⅴ", "Ⅵ", "Ⅶ", "Ⅷ", "Ⅸ", "Ⅹ", "Ⅺ", "Ⅻ",
    "", "", "", "", "", "", "⑾", "⑿", "⒀", "⒁",
    "⒂", "⒃", "⒄", "⒅", "⒆", "⒇", "①", "②", "③", "④",
    "⑤", "⑥", "⑦", "⑧", "⑨", "⑩", "", "", "㈠", "㈡",
    "㈢", "㈣", "", "", "", "", "‘", "", "", "",
    "", "", "", "", "", "", "", "", "", "",
    "", "", "ˉ", "ˇ", "¨", "〃", "°", "", "", "",
    "", "", "…", "", ""
};

static char B5EnHalf[B5_FULL_COUNT + 1] =
    "0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "....&<<>>,,;;::\?\?!!-\'\'\"\"~|[]{}#$%";
static int _bFHSortFlag = 0;

/* Sort a full-width table into descending strcmp() order, keeping the
   matching half-width characters in step. */
static void _sorttable(char *tableFull[], char *tableHalf, int nSize)
{
    int i, j;
    char *p;
    char cTemp;

    for (i = 0; i < nSize; i++) {
        for (j = i + 1; j < nSize; j++) {
            if (strcmp(tableFull[i], tableFull[j]) < 0) {
                p = tableFull[i];
                tableFull[i] = tableFull[j];
                tableFull[j] = p;
                cTemp = tableHalf[i];
                tableHalf[i] = tableHalf[j];
                tableHalf[j] = cTemp;
            }
        }
    }
}

BOOL FullToHalf(char *str, int nCodePage)
{
    char *pSrc = str;
    char *pDest = str;
    char **pFull;
    char *pEnHalf;
    int nCount;
    BOOL bContinue = FALSE;
    int nHigh, nLow, nMid, nResult;

    if (!_bFHSortFlag) {
        _sorttable(GBFull, GBEnHalf, GB_FULL_COUNT);
        _sorttable(B5Full, B5EnHalf, B5_FULL_COUNT);
        _bFHSortFlag = 1;
    }

    assert(NULL != str);
    if ((LANG_GB == nCodePage) || (936 == nCodePage)) {
        pFull = GBFull;
        pEnHalf = GBEnHalf;
        nCount = GB_FULL_COUNT;
    } else if ((LANG_B5 == nCodePage) || (950 == nCodePage)) {
        pFull = B5Full;
        pEnHalf = B5EnHalf;
        nCount = B5_FULL_COUNT;
    } else {
        assert(FALSE);
        return FALSE;
    }

    while ('\0' != *pSrc) {
        if (0x81 <= (BYTE)*pSrc) {
            // binary search over the descending table: much faster than a linear scan
            nLow = 0;
            nHigh = nCount - 1;
            while (nLow <= nHigh) {
                nMid = (nLow + nHigh) / 2;
                nResult = strncmp(pSrc, pFull[nMid], 2);
                if (0 == nResult) {
                    *pDest++ = pEnHalf[nMid];
                    pSrc += 2;
                    bContinue = TRUE;
                    break;
                }
                if (nResult > 0)
                    nHigh = nMid - 1;
                else
                    nLow = nMid + 1;
            }
            if (!bContinue) {
                // any other symbol (lead byte 0xA1-0xA9) becomes a space;
                // require a trail byte so we never step over the terminator
                if ((0xA1 <= (BYTE)*pSrc) && (0xA9 >= (BYTE)*pSrc) && ('\0' != pSrc[1])) {
                    *pDest++ = ' ';
                    pSrc += 2;
                    bContinue = TRUE;
                }
            }
            if (bContinue) {
                bContinue = FALSE;
                continue;
            }
            *pDest++ = *pSrc++;    // copy head char, and the next statement copies the tail char
            if (*pSrc == '\0')
                break;
        }
        *pDest++ = *pSrc++;        // ascii code
    }
    *pDest = '\0';
    return TRUE;
}
BOOL MyIsDBCSLeadByte(BYTE TestChar)
{
    if ((TestChar > 0x80) && (TestChar < 0xFF))
        return TRUE;
    else
        return FALSE;
}

void LowerString(char *str)
{
    while (*str) {
        if (!MyIsDBCSLeadByte(*str)) {
            if ((*str >= 'A') && (*str <= 'Z'))
                *str = (char)(*str + ('a' - 'A'));
        } else {
            str++;
            if (!*str)
                break;
        }
        str++;
    }
    return;
}
BOOL myisspace(char c)
{
    return ((c == ' ') || (c == '\t') || (c == '\r') || (c == '\n'));
}

void TrimString(char *str)
{
    char *pDst;
    char *pSrc;
    char *pLast;    /* one past the last non-space character written */
    char cCurrent;
    int nState;     /* 0: leading spaces, 1: inside text, 2: after a space */

    pLast = pDst = pSrc = str;
    nState = 0;
    while (*pSrc) {
        cCurrent = *pSrc;
        switch (nState) {
        case 0:
            if (!myisspace(cCurrent)) {
                nState = 1;
                continue;    /* reprocess this character in state 1 */
            }
            break;
        case 1:
            if (myisspace(cCurrent)) {
                nState = 2;
                *pDst = cCurrent;
            } else {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        case 2:
            if (myisspace(cCurrent)) {
                *pDst = cCurrent;
            } else {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        }
        pSrc++;
    }
    *pLast = '\0';    /* cut off trailing whitespace */
    return;
}
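FullToHalf keeps two parallel tables, sorts them once into descending order with _sorttable, and then binary-searches on the first two bytes of each character. A minimal sketch of the same lookup using the C library's qsort and bsearch follows. Storing each full-width/half-width pair in one struct is a hypothetical restructuring, not the author's code; it means a swap can never move one table without the other. The names fh_entry, fh_sort, and fh_lookup are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One mapping: a double-byte full-width character and its ASCII form. */
struct fh_entry {
    const char *full;
    char half;
};

/* Descending order, the same order _sorttable() produces. */
static int fh_cmp_desc(const void *a, const void *b)
{
    const struct fh_entry *x = (const struct fh_entry *)a;
    const struct fh_entry *y = (const struct fh_entry *)b;
    return strcmp(y->full, x->full);
}

/* bsearch comparator: compare only the first two bytes of the key.
 * Because the table is descending, a larger key sorts earlier. */
static int fh_key_cmp(const void *key, const void *elem)
{
    const struct fh_entry *e = (const struct fh_entry *)elem;
    int r = strncmp((const char *)key, e->full, 2);
    return (r > 0) ? -1 : (r < 0) ? 1 : 0;
}

void fh_sort(struct fh_entry *table, size_t n)
{
    assert(table != NULL);
    qsort(table, n, sizeof table[0], fh_cmp_desc);
}

/* Half-width form of the character at p, or '\0' if it is not in the table. */
char fh_lookup(const struct fh_entry *table, size_t n, const char *p)
{
    const struct fh_entry *hit;

    assert(table != NULL && p != NULL);
    hit = (const struct fh_entry *)bsearch(p, table, n, sizeof table[0], fh_key_cmp);
    return hit != NULL ? hit->half : '\0';
}
```

Sort once at startup, then call fh_lookup for every byte pair whose lead byte is 0x81 or above, exactly where FullToHalf runs its hand-written binary search.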
toplee_util.c
......
int ToBase64(void *pSrc, int nSrcLen, char *strBase64, int *nBase64Len)
{
    static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
.......... (the code in between runs to more than 3,000 lines and is omitted from this article) ........

void NormalizeName(char *p)
{
    FullToHalf(p, CODE_PAGE_GBK);
    TrimString(p);
    LowerString(p);
}
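NormalizeName runs FullToHalf, TrimString, and LowerString in sequence. The subtle step is LowerString: a GBK trail byte can fall in the range 0x41 to 0x5A, the same values as 'A' to 'Z', so a naive byte-by-byte tolower would corrupt Chinese text. Below is a minimal sketch of the trim and lowercase steps, using the same lead-byte rule as MyIsDBCSLeadByte (0x81 to 0xFE); trim_and_lower is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Same whitespace set as myisspace() in chn_util.c */
static int is_trim_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: trim leading/trailing whitespace, then lowercase
 * ASCII letters while skipping the trail byte of each double-byte char. */
void trim_and_lower(char *str)
{
    unsigned char *s = (unsigned char *)str;
    size_t start = 0, end, i, len;

    assert(str != NULL);
    len = strlen(str);
    while (start < len && is_trim_space(s[start]))
        start++;
    /* trail bytes are >= 0x40, never whitespace, so trimming bytewise is safe */
    end = len;
    while (end > start && is_trim_space(s[end - 1]))
        end--;
    memmove(s, s + start, end - start);
    s[end - start] = '\0';

    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] >= 0x81 && s[i] <= 0xFE) {
            if (s[i + 1] == '\0')
                break;          /* dangling lead byte at the end */
            i++;                /* skip the trail byte untouched */
        } else if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] = (unsigned char)(s[i] - 'A' + 'a');
        }
    }
}
```

In the second test case below, the byte pair 0x81 0x41 is a single GBK character whose trail byte happens to equal 'A'; it is left alone while the following 'Z' is lowered.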
toplee_util.h
#ifndef __TOPLEE_UTIL_INCLUDE__
#define __TOPLEE_UTIL_INCLUDE__ 1

#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
#ifdef LINUX
#include <time.h>
#endif
#include "common.h"
//#include "euc2uni.h"

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2

#ifdef BIG_ENDDING
#define DEFAULT_UNICODE 0x3000
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0xA140
#else
#define DEFAULT_UNICODE 0x0030
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0x40A1
#endif

#define CODE_PAGE_GBK 936
#define CODE_PAGE_BIG5 950
#define CODE_PAGE_EUC 932

#define CHARSET_DEFAULT 0
#define CHARSET_UNICODE 1
#define CHARSET_UTF8 2

// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)
#define GBK_COUNT 24066
// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)
#define BIG5_COUNT 16999

typedef struct tagMMapFile2 {
    BOOL bUsed;
    struct stat finfo;
    void *mm;
} MMapFile;

//int LoadEuc2UniTable(char *strFileName);
//void FreeEuc2UniTable(void);

int ToBase64(void *pSrc, int nSrcLen, char *strBase64, int *nBase64Len);
int FromBase64(char *strSrc, int nSrcLen, void *pDest, int *nDestLen);
int htmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int MultiByteToWideChar(unsigned int uCodePage, unsigned long lFlags, char *pMultiByteStr, int nMultiByte, unsigned short *pWideChar, int nWideChar);
int WideCharToMultiByte(unsigned int uCodePage, unsigned long dwFlags, unsigned short *pWideCharStr, int nWideChar, char *pMultiByteStr, int nMultiByte, const char *lpDefaultChar, int *lpUseDefaultChar);

#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2
void GBK2BIG5(char *lpString, int cbString);
void BIG52GBK(char *lpString, int cbString);
void LowerString(char *str);
void TrimString(char *str);
void DecodeFormString(char *str);
void DecodeUTF(char *str);

#define DECODE_UNICODE 0
#define KEEP_UNICODE 1
#define DECODE_GBK 0
#define DECODE_BIG5 2
int DecodePureUTF(unsigned char *str, int nFlag);

#define LANG_GB 1 // used by httpstrtoint and FullToHalf
#define LANG_B5 2
#define LANG_ENG 3
#define LANG_UNKNOWN 4
int httpstrtoint(char *strHttp);
void lowerhttpprefix(char *strUrl);

#define FULL_COUNT (21 + 26 * 2 + 5)
BOOL FullToHalf(char *str, int nLang);

#define URLDESCSEPCHAR '|'
char *DescriptFromUrl(char *strUrl);

#define CODE_GBK2UNI 1
#define CODE_UNI2GBK 2
#define CODE_BIG52UNI 3
#define CODE_UNI2BIG5 4
#define CODE_GBK2BIG5 5
#define CODE_BIG52GBK 6
const char *mmapOneFile(char *pFileName, MMapFile *mmapfile);
void toplee_cleanup_mmap(void *dummy);
void InitMMResource(void);
const char *LoadOneCodeTable(int nType, char *strFileName);

int getcuryear();
char *mstrncpy(char *strDest, char *strSrc, size_t nCount);
int formurlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int wmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int htmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);

#define MAX_INTERNAL_BUFF 16384
int gb2uni_encode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int unicodeencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
char *stristr(const char *big, const char *little);

typedef struct auto_string {
    int len, inc_len;
    char *strval;
} struAutoString;

#define DEF_INC_LEN (1024)
#define DEF_INT_LEN 12
void init_auto_string(struAutoString *astr, int inc_len);
int add_auto_string(struAutoString *astr, char *new_str);
void free_auto_string(struAutoString *astr);
int unistrcmp(const char *str1, int str1len, const char *str2, int str2len);
void NormalizeName(char *p);

#endif // __TOPLEE_UTIL_INCLUDE__
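DecodePureUTF is declared here, but its body is part of the omitted code. Whatever its exact behavior, it has to turn UTF-8 byte sequences back into 16-bit Unicode values before mapping them to GB or Big5. A hypothetical sketch of that first step, limited to the BMP (sequences of at most three bytes) and without overlong-form checks, might look like this; utf8_decode_bmp is not a function of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence from the NUL-terminated
 * string s into *cp. Returns the number of bytes consumed, or 0 for a
 * stray continuation byte, a truncated sequence, or a 4-byte sequence. */
int utf8_decode_bmp(const unsigned char *s, unsigned short *cp)
{
    assert(s != NULL && cp != NULL);
    if (s[0] < 0x80) {                                          /* ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {       /* 2 bytes */
        *cp = (unsigned short)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    /* && short-circuits, so s[2] is only read when s[1] is a continuation byte */
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = (unsigned short)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 3;
    }
    return 0;
}
```

A caller would loop over the string, advance by the returned count, and look each code point up in the uni2gbk or uni2big5 table.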
php_toplee.h
/* $Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $ */

#ifndef PHP_GBK_H
#define PHP_GBK_H

extern zend_module_entry gbk_module_entry;
#define phpext_gbk_ptr &gbk_module_entry

#ifdef PHP_WIN32
#define PHP_GBK_API __declspec(dllexport)
#else
#define PHP_GBK_API
#endif

#ifdef ZTS
#include "TSRM.h"
#endif

PHP_MINIT_FUNCTION(gbk);
PHP_MSHUTDOWN_FUNCTION(gbk);
PHP_RINIT_FUNCTION(gbk);
PHP_RSHUTDOWN_FUNCTION(gbk);
PHP_MINFO_FUNCTION(gbk);

PHP_FUNCTION(confirm_gbk_compiled);
PHP_FUNCTION(toplee_decode_utf);
PHP_FUNCTION(toplee_decode_utf_gb);
PHP_FUNCTION(toplee_decode_utf_big5);
PHP_FUNCTION(toplee_encode_utf_gb);
PHP_FUNCTION(toplee_big52gbk);
PHP_FUNCTION(toplee_gbk2big5);
PHP_FUNCTION(toplee_fan2jian);
PHP_FUNCTION(toplee_normalize_name);

#ifdef ZTS
#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)
#else
#define GBK_G(v) (gbk_globals.v)
#endif

#endif
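That completes the C code. The module also needs several code-table files, such as gb2b5.tab and uni2gb.tab. I won't provide them here; there is documentation on how to generate them, and many such .tab files can be downloaded online. Their paths are read from the php.ini entries gbk2uni, uni2gbk, big52uni, uni2big5, gbk2big5, and big52gbk. All six must be set, because PHP_MINIT_FUNCTION(gbk) returns FAILURE if any of them is empty.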
Now we can build and test.
Go back to the root of the PHP source tree and run:
#./buildconf
#./configure --with-toplee=shared ...
#make
#make install
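This builds the module as part of PHP. Because of the shared option, the toplee module is compiled into toplee.so, which can be loaded with extension=toplee.so in php.ini or extensions.ini, or dynamically with the dl() function from a PHP script. After that, the functions we defined earlier are available in PHP.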
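I (Michael) have limited expertise, so please point out anything in this article that is wrong. I also hope it draws out better ideas and encourages more PHP enthusiasts to share their own valuable experience!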