Developing PHP .so Extension Modules in C on Linux/FreeBSD: A Worked Example

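I got into web development in 1997, and nine years have passed since then. I started out building HTML pages with FrontPage, and once I learned ASP + Access + IIS I was hooked on web development. After that I used, in turn, ASP + SQL Server + IIS, JSP + Oracle + JRun (Resin/Tomcat), and PHP + Sybase (MySQL) + Apache ... and finally settled on PHP + MySQL + Apache + Linux (BSD), the architecture commonly called LAMP. There are many reasons for this choice, and plenty of people online have already debated the merits of the various architectures and languages, so I won't repeat that here. Briefly, these are the main reasons I like LAMP: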

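1. It is a fully open, free platform;
2. It is simple and easy to pick up, and resources of every kind are plentiful;
3. PHP, MySQL, and Apache integrate seamlessly with the underlying Linux (BSD) system and with each other, which makes the stack very efficient;
4. All of them are written in C/C++, the most efficient languages, so their performance is reliable;
5. PHP's style is basically the same as C's, and it also borrows many architectural strengths from Java and C++;
6. Most importantly, PHP makes it very easy to write extension modules in C/C++, which gives PHP practically unlimited extensibility!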

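For these reasons I really like PHP-based architectures, above all because of the last point. At both Yahoo and Mop I promoted this platform, and I gained some experience extending PHP in C along the way. I'd like to share it here in the hope that it serves as a starting point for others.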

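There are several ways to write a PHP extension module in C. In terms of the end result there are two: compiling the module directly into PHP, or compiling it as a .so extension module that PHP loads. In terms of tooling there are also two: the phpize tool (installed when PHP itself is built) and the ext_skel tool (shipped with the PHP source). The approach we use most, and the most convenient one, is to use ext_skel to write a .so extension module, and that is the approach this article covers.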

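In the PHP source directory there is an ext directory (everything said here about PHP refers to PHP on Linux, not on Windows). Inside ext is a tool called ext_skel, which makes it easy to develop PHP extension modules: it provides a generic set of development steps and a template for a PHP extension. As our example we will build a module that converts text between UTF-8, GBK/GB2312, and Big5. In the end, the module should provide the following functions: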

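(1) string toplee_big52gbk(string s)
Converts the input string from Big5 to GBK.
(2) string toplee_gbk2big5(string s)
Converts the input string from GBK to Big5.
(3) string toplee_normalize_name(string s)
Normalizes the input string: converts full-width characters to half-width, trims leading and trailing whitespace, and converts uppercase to lowercase.
(4) string toplee_fan2jian(int code, string s)
Converts an input GBK string from Traditional to Simplified Chinese (code is the code page used for the conversion).
(5) string toplee_decode_utf(string s)
Converts a UTF-8 string to Unicode.
(6) string toplee_decode_utf_gb(string s)
Converts a UTF-8 string to GB.
(7) string toplee_decode_utf_big5(string s)
Converts a UTF-8 string to Big5.
(8) string toplee_encode_utf_gb(string s)
Converts an input GBK string to UTF-8.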
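The real decoding work behind functions (5) to (7) lives in DecodePureUTF() in toplee_util.c, whose body is not shown in this article. To make "converting UTF-8 to Unicode" concrete, here is a minimal standalone sketch, assuming (like the rest of the module) that only 16-bit BMP characters matter. The name utf8_to_ucs2 is hypothetical and is not part of the module:

```c
#include <assert.h>
#include <stddef.h>

/* Decode a NUL-terminated UTF-8 string into 16-bit Unicode (UCS-2) code units.
 * Only 1- to 3-byte sequences (U+0000..U+FFFF) are decoded; anything else,
 * including truncated sequences, becomes U+FFFD. Overlong forms are not
 * rejected. Returns the number of code units written, excluding the final 0. */
size_t utf8_to_ucs2(const unsigned char *src, unsigned short *dst, size_t dst_cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dst_cap > 0);
    while (*src != '\0' && n + 1 < dst_cap) {
        unsigned int c = src[0];

        if (c < 0x80) {                                        /* 0xxxxxxx */
            dst[n++] = (unsigned short)c;
            src += 1;
        } else if ((c & 0xE0) == 0xC0 && (src[1] & 0xC0) == 0x80) {
            /* 110xxxxx 10xxxxxx */
            dst[n++] = (unsigned short)(((c & 0x1F) << 6) | (src[1] & 0x3F));
            src += 2;
        } else if ((c & 0xF0) == 0xE0 && (src[1] & 0xC0) == 0x80
                   && (src[2] & 0xC0) == 0x80) {
            /* 1110xxxx 10xxxxxx 10xxxxxx */
            dst[n++] = (unsigned short)(((c & 0x0F) << 12)
                                        | ((src[1] & 0x3F) << 6)
                                        | (src[2] & 0x3F));
            src += 3;
        } else {
            dst[n++] = 0xFFFD;                                 /* replacement char */
            src += 1;
        }
    }
    dst[n] = 0;
    return n;
}
```

Because the continuation-byte checks short-circuit, the decoder never reads past a terminating NUL in the middle of a sequence. For example, the bytes E4 B8 AD decode to the single unit 0x4E2D.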

First, go into the ext directory and run the following command:
#./ext_skel --extname=toplee
ext_skel then generates a toplee directory under ext containing the following files:
.cvsignore
CREDITS
EXPERIMENTAL
config.m4
php_toplee.h
tests
toplee.c
toplee.php

Of these, the most important files are config.m4 and toplee.c.
Next, we edit config.m4:
#vi ./config.m4
and find lines similar to these:

dnl PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
dnl [  --with-toplee             Include toplee support])

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])

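These lines tell the PHP build which option enables our toplee extension. We use --with-toplee, so we remove the dnl comment markers from the PHP_ARG_WITH block (leaving its explanatory "Make sure" line commented out), as follows: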

PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
[  --with-toplee             Include toplee support])

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])

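The key remaining task is writing toplee.c, the main source file of the module. Even if you change nothing at all, you already have a working PHP extension; the generated file contains code like this: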

PHP_FUNCTION(confirm_toplee_compiled)
{
    char *arg = NULL;
    int arg_len, len;
    char string[256];

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {
        return;
    }

    len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);
    RETURN_STRINGL(string, len, 1);
}

If we compile the new module in when we build PHP later, a PHP script can call confirm_toplee_compiled("toplee"), which returns the string "Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP."
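The %.78s conversions are what keep the skeleton's fixed 256-byte buffer safe: a precision on %s prints at most that many bytes of the argument. The fixed text is roughly 100 bytes, so even two maximal arguments (2 x 78 bytes) stay under 256. The following standalone sketch rebuilds the same message; confirm_message is a hypothetical helper, not part of the skeleton, and it uses snprintf as a second bound on top of the precisions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rebuild the ext_skel confirmation message. "%.78s" copies at most 78 bytes
 * of each argument, so the result always fits a 256-byte buffer; snprintf()
 * additionally guarantees nothing is written past cap bytes. Returns the
 * length the full message would have (snprintf semantics). */
int confirm_message(char *buf, size_t cap, const char *ext, const char *arg)
{
    assert(buf != NULL && cap > 0 && ext != NULL && arg != NULL);
    return snprintf(buf, cap,
                    "Congratulations! You have successfully modified ext/%.78s/config.m4. "
                    "Module %.78s is now compiled into PHP.",
                    ext, arg);
}
```

Calling confirm_message(buf, sizeof buf, "toplee", "toplee") produces exactly the string quoted above, and a 300-byte argument is silently cut to 78 bytes instead of overflowing the buffer.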

Below is our modified toplee.c, which implements the functions and interfaces we planned. Here is its full source:

/*
  +----------------------------------------------------------------------+
  | PHP Version 4                                                        |
  +----------------------------------------------------------------------+
  | Copyright (c) 1997-2002 The PHP Group                                |
  +----------------------------------------------------------------------+
  | This source file is subject to version 2.02 of the PHP license,      |
  | that is bundled with this package in the file LICENSE, and is        |
  | available at through the world-wide-web at                           |
  | http://www.php.net/license/2_02.txt.                                 |
  | If you did not receive a copy of the PHP license and are unable to   |
  | obtain it through the world-wide-web, please send a note to          |
  | license@php.net so we can mail you a copy immediately.                 |
  +----------------------------------------------------------------------+
  | Author:                                                              |
  +----------------------------------------------------------------------+
 
 
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/

 
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_toplee.h"    /* the module header generated by ext_skel */
#include "toplee_util.h"

/* If you declare any globals in php_toplee.h uncomment this:
ZEND_DECLARE_MODULE_GLOBALS(gbk)
*/

 
/* True global resources - no need for thread safety here */
static int le_gbk;

/* {{{ gbk_functions[]
 *
 * Every user visible function must have an entry in gbk_functions[].
 */
function_entry gbk_functions[] = {
    PHP_FE(toplee_decode_utf,      NULL)
    PHP_FE(toplee_decode_utf_gb,   NULL)
    PHP_FE(toplee_decode_utf_big5, NULL)
    PHP_FE(toplee_encode_utf_gb,   NULL)

    PHP_FE(toplee_big52gbk,        NULL)
    PHP_FE(toplee_gbk2big5,        NULL)
    PHP_FE(toplee_fan2jian,        NULL)
    PHP_FE(toplee_normalize_name,  NULL)
    {NULL, NULL, NULL}    /* Must be the last line in gbk_functions[] */
};
/* }}} */

/* {{{ gbk_module_entry
 */
zend_module_entry gbk_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
#endif
    "gbk",
    gbk_functions,
    PHP_MINIT(gbk),
    PHP_MSHUTDOWN(gbk),
    PHP_RINIT(gbk),        /* Replace with NULL if there's nothing to do at request start */
    PHP_RSHUTDOWN(gbk),    /* Replace with NULL if there's nothing to do at request end */
    PHP_MINFO(gbk),
#if ZEND_MODULE_API_NO >= 20010901
    "0.1", /* Replace with version number for your extension */
#endif
    STANDARD_MODULE_PROPERTIES
};
/* }}} */

/* The build defines COMPILE_DL_<EXTNAME> from the extension name in
   config.m4, i.e. COMPILE_DL_TOPLEE, when toplee is built as a shared module */
#ifdef COMPILE_DL_TOPLEE
ZEND_GET_MODULE(gbk)
#endif
 
/* {{{ PHP_INI
 */
/* Remove comments and fill if you need to have entries in php.ini*/
PHP_INI_BEGIN()
    PHP_INI_ENTRY("gbk2uni",  "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2gbk",  "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("uni2big5", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52uni", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("big52gbk", "", PHP_INI_SYSTEM, NULL)
    PHP_INI_ENTRY("gbk2big5", "", PHP_INI_SYSTEM, NULL)
//    STD_PHP_INI_ENTRY("gbk.global_value",      "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)
//    STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)
PHP_INI_END()
/* }}} */

/* {{{ php_gbk_init_globals
 */
/* Uncomment this function if you have INI entries
static void php_gbk_init_globals(zend_gbk_globals *gbk_globals)
{
    gbk_globals->global_value = 0;
    gbk_globals->global_string = NULL;
}
*/
/* }}} */

char gbk2uni_file[256];
char uni2gbk_file[256];
char big52uni_file[256];
char uni2big5_file[256];
char gbk2big5_file[256];
char big52gbk_file[256];

//utf file init flag
static int initutf = 0;
 
/* {{{ PHP_MINIT_FUNCTION
 */
PHP_MINIT_FUNCTION(gbk)
{
    /* If you have INI entries, uncomment these lines
    ZEND_INIT_MODULE_GLOBALS(gbk, php_gbk_init_globals, NULL);*/

    REGISTER_INI_ENTRIES();

    memset(gbk2uni_file, 0, sizeof(gbk2uni_file));
    memset(uni2gbk_file, 0, sizeof(uni2gbk_file));
    memset(big52uni_file, 0, sizeof(big52uni_file));
    memset(uni2big5_file, 0, sizeof(uni2big5_file));
    memset(gbk2big5_file, 0, sizeof(gbk2big5_file));
    memset(big52gbk_file, 0, sizeof(big52gbk_file));

    /* copy the table paths from php.ini; each buffer keeps its final NUL */
    strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file) - 1);
    strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file) - 1);
    strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file) - 1);
    strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file) - 1);
    strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file) - 1);
    strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file) - 1);

    //InitMMResource();
    InitResource();

    /* all six code tables are required */
    if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0')
      || (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0')
      || (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0'))
    {
        return FAILURE;
    }

    /* a non-NULL return from LoadOneCodeTable() means loading failed */
    if (gbk2uni_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_GBK2UNI, gbk2uni_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (uni2gbk_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_UNI2GBK, uni2gbk_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (big52uni_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_BIG52UNI, big52uni_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (uni2big5_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_UNI2BIG5, uni2big5_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (gbk2big5_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_GBK2BIG5, gbk2big5_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    if (big52gbk_file[0] != '\0')
    {
        if (LoadOneCodeTable(CODE_BIG52GBK, big52gbk_file) != NULL)
        {
            toplee_cleanup_mmap(NULL);
            return FAILURE;
        }
    }

    initutf = 1;
    return SUCCESS;
}
/* }}} */
/* }}} */
 
/* {{{ PHP_MSHUTDOWN_FUNCTION
 */
PHP_MSHUTDOWN_FUNCTION(gbk)
{
    /* uncomment this line if you have INI entries*/
    UNREGISTER_INI_ENTRIES();

    toplee_cleanup_mmap(NULL);
    return SUCCESS;
}
/* }}} */

/* Remove if there's nothing to do at request start */
/* {{{ PHP_RINIT_FUNCTION
 */
PHP_RINIT_FUNCTION(gbk)
{
    return SUCCESS;
}
/* }}} */

/* Remove if there's nothing to do at request end */
/* {{{ PHP_RSHUTDOWN_FUNCTION
 */
PHP_RSHUTDOWN_FUNCTION(gbk)
{
    return SUCCESS;
}
/* }}} */

/* {{{ PHP_MINFO_FUNCTION
 */
PHP_MINFO_FUNCTION(gbk)
{
    php_info_print_table_start();
    php_info_print_table_header(2, "gbk support", "enabled");
    php_info_print_table_end();

    /* Remove comments if you have entries in php.ini*/
    DISPLAY_INI_ENTRIES();
}
/* }}} */
 
 
/* {{{ proto string toplee_decode_utf(string s)
 */
PHP_FUNCTION(toplee_decode_utf)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    /* decode a private copy, never the caller's buffer */
    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, KEEP_UNICODE);
    RETVAL_STRING(t, 1);    /* 1: PHP copies t, so t is freed below */
    free(t);
    return;
}
/* }}} */

/* {{{ proto string toplee_decode_utf_gb(string s)
 */
PHP_FUNCTION(toplee_decode_utf_gb)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, DECODE_UNICODE);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}
/* }}} */

/* {{{ proto string toplee_decode_utf_big5(string s)
 */
PHP_FUNCTION(toplee_decode_utf_big5)
{
    char *s = NULL, *t = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    t = strdup(s);
    if (t == NULL)
        RETURN_FALSE;

    DecodePureUTF((unsigned char *)t, DECODE_UNICODE | DECODE_BIG5);
    RETVAL_STRING(t, 1);
    free(t);
    return;
}
/* }}} */

/* GBK -> UTF-8 helper used by toplee_encode_utf_gb() */
int EncodePureUTF(unsigned char *strSrc, unsigned char *strDst, int nDstLen, int nFlag)
{
    int nRet;
    int pos;
    int need;
    unsigned short c;
    unsigned short *uBuf;
    unsigned short *p;
    int nSize;
    int nLen;
    int nReturn;

    nLen = strlen((const char *)strSrc);
    if (nDstLen < nLen * 2 + 1)
        return 0;

    nSize = nLen + 1;
    uBuf = (unsigned short *)emalloc(sizeof(unsigned short) * nSize);

    /* GBK (code page 936) -> 16-bit Unicode */
    nRet = MultiByteToWideChar(936, 0, (char *)strSrc, nLen, uBuf, nSize);

    /* 16-bit Unicode -> UTF-8; a 16-bit unit never needs the 4-byte form */
    nReturn = 0;
    p = uBuf;
    for (pos = nRet; pos > 0; pos--, p++)
    {
        c = *p;
        need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
        if (nReturn + need >= nDstLen)    /* keep room for the final NUL */
            break;
        if (c < 0x80) {
            strDst[nReturn++] = (unsigned char)c;
        } else if (c < 0x800) {
            strDst[nReturn++] = (unsigned char)(0xc0 | (c >> 6));
            strDst[nReturn++] = (unsigned char)(0x80 | (c & 0x3f));
        } else {
            strDst[nReturn++] = (unsigned char)(0xe0 | (c >> 12));
            strDst[nReturn++] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
            strDst[nReturn++] = (unsigned char)(0x80 | (c & 0x3f));
        }
    }
    strDst[nReturn] = '\0';

    efree(uBuf);    /* p was advanced; free the original pointer */
    return nReturn;
}
 
/* {{{ proto string toplee_encode_utf_gb(string s)
 */
PHP_FUNCTION(toplee_encode_utf_gb)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    /* a 2-byte GBK character becomes at most 3 UTF-8 bytes, so 2 * len + 1 suffices */
    sRet = emalloc(strlen(s) * 2 + 1);
    EncodePureUTF((unsigned char *)s, (unsigned char *)sRet, strlen(s) * 2 + 1, 0);
    RETVAL_STRING(sRet, 0);    /* 0: hand the emalloc'ed buffer to PHP instead of copying it */
    return;
}
/* }}} */
 
 
/* {{{ proto string toplee_big52gbk(string s)
 */
PHP_FUNCTION(toplee_big52gbk)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    sRet = estrdup(s);
    if (NULL == sRet)
        RETURN_FALSE;

    BIG52GBK(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);    /* 0: PHP takes ownership of sRet, no leak */
    return;
}
/* }}} */

/* {{{ proto string toplee_gbk2big5(string s)
 */
PHP_FUNCTION(toplee_gbk2big5)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    sRet = estrdup(s);
    if (NULL == sRet)
        RETURN_FALSE;

    GBK2BIG5(sRet, strlen(sRet));
    RETVAL_STRING(sRet, 0);    /* 0: PHP takes ownership of sRet, no leak */
    return;
}
/* }}} */
 
/* {{{ proto string toplee_normalize_name(string s)
 */
PHP_FUNCTION(toplee_normalize_name)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    char *sRet = NULL;

    if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    /* NormalizeName() works in place, so normalize a copy rather than
       the caller's string */
    sRet = estrndup(s, s_len);
    NormalizeName(sRet);
    RETURN_STRING(sRet, 0);
}
/* }}} */
 
/* {{{ proto string toplee_fan2jian(int code, string s)
 */
PHP_FUNCTION(toplee_fan2jian)
{
    char *s = NULL;
    int argc = ZEND_NUM_ARGS();
    int s_len;
    long code;    /* "l" in zend_parse_parameters() stores a long */
    char *pSource;
    char *pDest1 = NULL, *pDest2 = NULL;
    int nSourceLen, nDestLen;

    if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)
        return;

    if (!initutf)
        RETURN_FALSE;

    pSource = s;
    nSourceLen = s_len;
    /* pDest1: room for nSourceLen 16-bit units; pDest2: converted bytes + NUL */
    pDest1 = malloc(nSourceLen * 2);
    pDest2 = malloc(nSourceLen + 1);
    if (NULL == pDest1 || NULL == pDest2)
        goto _f2j_err;

    memset(pDest1, 0, nSourceLen * 2);
    memset(pDest2, 0, nSourceLen + 1);
    /* the last argument is a count of wide characters, not bytes */
    nDestLen = MultiByteToWideChar((unsigned int)code, 0, pSource, nSourceLen,
                                   (unsigned short *)pDest1, nSourceLen);
    if (0 >= nDestLen)
        goto _f2j_err;

    nDestLen = WideCharToMultiByte((unsigned int)code, 0, (unsigned short *)pDest1, nDestLen,
                                   pDest2, nSourceLen, NULL, NULL);
    if (0 >= nDestLen)
        goto _f2j_err;

    RETVAL_STRING(pDest2, 1);
    free(pDest1);
    free(pDest2);
    return;

_f2j_err:
    free(pDest1);    /* free(NULL) is a no-op */
    free(pDest2);
    RETURN_FALSE;
}
/* }}} */
 
/*
 * Local variables:
 * tab-width: 4
 * c-basic-offset: 4
 * End:
 * vim600: noet sw=4 ts=4 fdm=marker
 * vim<600: noet sw=4 ts=4
 */

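The second half of EncodePureUTF (16-bit Unicode to UTF-8) can be tried out on its own. Here is a minimal standalone sketch of that step, assuming BMP-only input as in the module; ucs2_to_utf8 is a hypothetical name. It also shows why toplee_encode_utf_gb's 2 * len + 1 buffer is large enough: an ASCII byte stays 1 byte, and a 2-byte GBK character becomes one 16-bit unit, which needs at most 3 UTF-8 bytes, so the output grows by at most 1.5x.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n 16-bit code units (BMP only) as NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * dst_cap is too small; each unit needs at most 3 bytes. */
int ucs2_to_utf8(const unsigned short *src, size_t n, unsigned char *dst, size_t dst_cap)
{
    size_t out = 0, i;

    assert(src != NULL && dst != NULL && dst_cap > 0);
    for (i = 0; i < n; i++) {
        unsigned int c = src[i];
        size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        if (out + need + 1 > dst_cap)       /* +1 keeps room for the NUL */
            return -1;
        if (c < 0x80) {                     /* 0xxxxxxx */
            dst[out++] = (unsigned char)c;
        } else if (c < 0x800) {             /* 110xxxxx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                            /* 1110xxxx 10xxxxxx 10xxxxxx */
            dst[out++] = (unsigned char)(0xE0 | (c >> 12));
            dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return (int)out;
}
```

For example, the units 0x41, 0xE9, 0x4E2D encode to the six bytes 41 C3 A9 E4 B8 AD. Checking the space before every write, rather than only once up front, keeps the function safe even if a lookup table maps a single invalid byte to a 3-byte character.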

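In fact, this file defines every interface we want to provide; what remains is to write the C code that actually implements them. I won't discuss the technical details of that C code here, but one key point deserves attention: you can put all the C source and header files that implement the interfaces defined in toplee.c into the ext/toplee directory so that toplee.c can call them when it is compiled; this is all standard C. Each additional .c file must also be added to the extension's source-file list in config.m4 so that it is compiled and linked. I (Michael) won't say more about it; below are the implementation files we wrote: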
chn_util.h

#ifndef __CHN_UTIL_H__
#define __CHN_UTIL_H__

#include "common.h"

#define LANG_GB         1
#define LANG_B5         2

#define GB_FULL_COUNT   (20 + 26 * 2 + 5 + 4 + 26)
#define B5_FULL_COUNT   (20 + 26 * 2 + 5 + 4 + 24)

BOOL FullToHalf(char *str, int nLang);

void LowerString(char *str);

void TrimString(char *str);

#endif // __CHN_UTIL_H__


chn_util.c

#include <stdio.h>
#include <assert.h>
#include <string.h>
#include "common.h"
#include "chn_util.h"
 
 
// 0123456789 !@()-_+'<>
static char * GBFull [ GB_FULL_COUNT ] =
        
{ " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
"   " , " " , " " , " " , " " , " _ " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " · " , " " , " " , " " ,
        
" " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " "
} ;
 
static char GBEnHalf [ GB_FULL_COUNT + 1 ] =
        
" 0123456789 @()-_+ / '<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
        
" ....&<<>>,,;;:: / ? / ?!!- / ' / ' / " / " ~:`|[]{}#$% " ;
 
// ⒈⒉⒊⒋⒌⒍⒎⒏⌒∨∠ˇ≌≈
static char * B5Full [ B5_FULL_COUNT ] =
        
{ " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " ˇ " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " ,
        
" " , " " , " " , "  " , " " ,
        
" " , " " , " " , " " ,
        
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
        
" ˉ " , " ˇ " , " ¨ " , " " , " ° " , " " , " " , " " , " " , " " , " " ,
        
" " , " "
} ;
 
static char B5EnHalf [ B5_FULL_COUNT + 1 ] =
        
" 0123456789 @()-_+ / '<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
        
" ....&<<>>,,;;:: / ? / ?!!- / ' / ' / " / " ~|[]{}#$% " ;
 
 
static int _bFHSortFlag = 0;

/* Exchange sort into descending order (by strcmp), keeping each
   half-width character paired with its full-width entry */
static void _sorttable(char *tableFull[], char *tableHalf, int nSize)
{
    int i, j;
    char *p;
    char cTemp;

    for (i = 0; i < nSize; i++)
    {
        for (j = i + 1; j < nSize; j++)
        {
            if (strcmp(tableFull[i], tableFull[j]) < 0)
            {
                p = tableFull[i];
                tableFull[i] = tableFull[j];
                tableFull[j] = p;
                cTemp = tableHalf[i];
                tableHalf[i] = tableHalf[j];
                tableHalf[j] = cTemp;
            }
        }
    }
}
 
BOOL FullToHalf(char *str, int nCodePage)
{
    char *pSrc = str;
    char *pDest = str;
    char **pFull;
    char *pEnHalf;
    int nCount;
    BOOL bContinue = FALSE;
    int nHigh, nLow, nMid, nResult;

    if (!_bFHSortFlag)
    {
        _sorttable(GBFull, GBEnHalf, GB_FULL_COUNT);
        _sorttable(B5Full, B5EnHalf, B5_FULL_COUNT);
        _bFHSortFlag = 1;
    }

    assert(NULL != str);
    if ((LANG_GB == nCodePage) || (936 == nCodePage))
    {
        pFull = GBFull;
        pEnHalf = GBEnHalf;
        nCount = GB_FULL_COUNT;
    }
    else if ((LANG_B5 == nCodePage) || (950 == nCodePage))
    {
        pFull = B5Full;
        pEnHalf = B5EnHalf;
        nCount = B5_FULL_COUNT;
    }
    else
    {
        assert(FALSE);
        return FALSE;
    }

    while ('\0' != *pSrc)
    {
        if (0x81 <= (BYTE)*pSrc)
        {
            // Switched to binary search, which is much faster; the tables
            // were sorted in descending order by _sorttable()
            nLow = 0;
            nHigh = nCount - 1;
            while (nLow <= nHigh)
            {
                nMid = (nLow + nHigh) / 2;
                nResult = strncmp(pSrc, pFull[nMid], 2);
                if (0 == nResult)
                {
                    *pDest++ = pEnHalf[nMid];
                    pSrc += 2;
                    bContinue = TRUE;
                    break;
                }
                if (nResult > 0)
                    nHigh = nMid - 1;
                else
                    nLow = nMid + 1;
            }

            if (!bContinue)
            {
                // Check for other symbols: a remaining character whose lead
                // byte is in 0xA1..0xA9 becomes a single space
                if ((0xA1 <= (BYTE)*pSrc) &&
                    (0xA9 >= (BYTE)*pSrc) &&
                    ('\0' != pSrc[1]))
                {
                    *pDest++ = ' ';
                    pSrc += 2;
                    bContinue = TRUE;
                }
            }

/*            for (nIndex = 0; nIndex < nCount; nIndex++)
            {
                assert(NULL != pFull[nIndex]);
                if (NULL != pFull[nIndex])
                {
                    if (0 == strncmp(pSrc, pFull[nIndex], 2))
                    {
                        *pDest++ = pEnHalf[nIndex];    // convert full to half
                        pSrc += 2;

                        bContinue = TRUE;
                        break;
                    }
                }
            }*/

            if (bContinue)
            {
                bContinue = FALSE;
                continue;
            }

            *pDest++ = *pSrc++;    // copy head char, and the next statement copy tail char
            if (*pSrc == '\0')
                break;
        }

        *pDest++ = *pSrc++;    // ascii code
    }

    *pDest = '\0';
    return TRUE;
}
 
BOOL MyIsDBCSLeadByte(BYTE TestChar)
{
    if ((TestChar > 0x80) && (TestChar < 0xFF))
        return TRUE;
    else
        return FALSE;
}


void LowerString(char *str)
{
    while (*str)
    {
        if (!MyIsDBCSLeadByte(*str))
        {
            if ((*str >= 'A') && (*str <= 'Z'))
                *str = (char)(*str + ('a' - 'A'));
        }
        else
        {
            str++;    /* skip the trail byte of a double-byte character */
            if (!*str)
                break;
        }
        str++;
    }
    return;
}

BOOL myisspace(char c)
{
    return ((c == ' ') || (c == '\t') || (c == '\r') || (c == '\n'));
}
 
void TrimString(char *str)
{
    char *pDst;
    char *pSrc;
    char *pLast;
    char cCurrent;
    int nState;

    pLast = pDst = pSrc = str;
    nState = 0;

    /* state 0 skips leading whitespace; states 1 and 2 copy the rest.
       pLast always points just past the last non-space character copied. */
    while (*pSrc)
    {
        cCurrent = *pSrc;
        switch (nState)
        {
        case 0:
            if (!myisspace(cCurrent))
            {
                nState = 1;
                continue;    /* reprocess this character in state 1 */
            }
            break;
        case 1:
            if (myisspace(cCurrent))
            {
                nState = 2;
                *pDst = cCurrent;
            }
            else
            {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        case 2:
            if (myisspace(cCurrent))
            {
                *pDst = cCurrent;
            }
            else
            {
                *pDst = cCurrent;
                pLast = pDst + 1;
            }
            pDst++;
            break;
        }
        pSrc++;
    }

    *pLast = '\0';
    return;
}

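FullToHalf above sorts its tables once and then binary-searches them for each double-byte character. The same idea can be expressed with the standard library's qsort() and bsearch() instead of the hand-written sort and search loop. The sketch below is standalone and hedged: struct fh_pair, fh_cmp and full_to_half are hypothetical names, it omits the original's extra rule that maps other 0xA1..0xA9 symbols to a space, and the byte pairs in the usage test (0xA3B1 and 0xA3B2 for full-width "1" and "2", 0xA3C1 for full-width "A") are assumed GBK codes rather than the module's real tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One full-width (2-byte) character and its half-width ASCII replacement. */
struct fh_pair { unsigned char full[2]; char half; };

/* Order entries by their two bytes compared as unsigned values. */
static int fh_cmp(const void *a, const void *b)
{
    return memcmp(((const struct fh_pair *)a)->full,
                  ((const struct fh_pair *)b)->full, 2);
}

/* In-place full-to-half conversion. A byte >= 0x81 starts a 2-byte
 * character; pairs found in the table (which must already be sorted with
 * fh_cmp) are replaced by one ASCII byte, all other bytes are copied. */
void full_to_half(char *str, const struct fh_pair *table, size_t count)
{
    unsigned char *src = (unsigned char *)str;
    unsigned char *dst = (unsigned char *)str;

    assert(str != NULL && table != NULL);
    while (*src != '\0') {
        if (*src >= 0x81 && src[1] != '\0') {
            struct fh_pair key;
            const struct fh_pair *hit;

            key.full[0] = src[0];
            key.full[1] = src[1];
            hit = bsearch(&key, table, count, sizeof *table, fh_cmp);
            if (hit != NULL) {
                *dst++ = (unsigned char)hit->half;
            } else {
                *dst++ = src[0];
                *dst++ = src[1];
            }
            src += 2;
        } else {
            *dst++ = *src++;    /* ASCII byte (or a dangling lead byte) */
        }
    }
    *dst = '\0';
}
```

Sort the table once with qsort(table, count, sizeof table[0], fh_cmp) before the first call; like the original, each lookup then costs O(log n) comparisons. A character not in the table (for example the Chinese character 0xB0A1) passes through unchanged.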

toplee_util.c

......
 
int ToBase64(void *pSrc, int nSrcLen, char *strBase64, int *nBase64Len)
{
    static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";


.......... The code in between runs to more than 3,000 lines and is omitted from this article ........

void NormalizeName(char *p)
{
    FullToHalf(p, CODE_PAGE_GBK);
    TrimString(p);
    LowerString(p);
}

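NormalizeName chains FullToHalf, TrimString and LowerString. The subtle part is the lowercasing: a GBK trail byte can fall in 0x41..0x5A, the range of 'A'..'Z', so a plain tolower() loop would corrupt Chinese characters. LowerString avoids that by skipping both bytes of a double-byte character. A compact standalone sketch of the last two steps (dbcs_lower and trim_ascii_space are hypothetical names, not functions from the module):

```c
#include <assert.h>
#include <string.h>

/* Lowercase ASCII letters in place, skipping both bytes of any 2-byte
 * character (lead byte 0x81..0xFE, the same test as MyIsDBCSLeadByte),
 * so a trail byte such as 0x41 ('A') is never modified. */
void dbcs_lower(char *str)
{
    unsigned char *p = (unsigned char *)str;

    assert(str != NULL);
    while (*p != '\0') {
        if (*p >= 0x81 && *p <= 0xFE) {
            if (p[1] == '\0')
                break;            /* dangling lead byte at the end */
            p += 2;               /* skip lead + trail byte */
        } else {
            if (*p >= 'A' && *p <= 'Z')
                *p = (unsigned char)(*p - 'A' + 'a');
            p++;
        }
    }
}

/* Strip leading and trailing spaces, tabs, CR and LF in place. Safe for
 * GBK text because trail bytes are always >= 0x40. */
void trim_ascii_space(char *str)
{
    char *start = str, *end;

    assert(str != NULL);
    while (*start == ' ' || *start == '\t' || *start == '\r' || *start == '\n')
        start++;
    end = start + strlen(start);
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'
                           || end[-1] == '\r' || end[-1] == '\n'))
        end--;
    memmove(str, start, (size_t)(end - start));
    str[end - start] = '\0';
}
```

With these, "  Hello \xB0\x41 World\t\n" becomes "hello \xB0\x41 world": the ASCII letters are lowercased while the 0x41 trail byte of the double-byte character is left alone.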

toplee_util.h

#ifndef __TOPLEE_UTIL_INCLUDE__
#define __TOPLEE_UTIL_INCLUDE__ 1

#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
#ifdef LINUX
#include <time.h>
#endif

#include "common.h"

//#include "euc2uni.h"

/*
typedef int    BOOL;
*/

#ifndef TRUE
#define TRUE    1
#define FALSE   0
#endif

#define ASCII       0
#define HZ_HEAD     1
#define HZ_TAIL     2

#ifdef BIG_ENDDING
#define DEFAULT_UNICODE     0x3000
#define DEFAULT_GBK_CODE    0xA1A1
#define DEFAULT_BIG5_CODE   0xA140
#else
#define DEFAULT_UNICODE     0x0030
#define DEFAULT_GBK_CODE    0xA1A1
#define DEFAULT_BIG5_CODE   0x40A1
#endif

#define CODE_PAGE_GBK   936
#define CODE_PAGE_BIG5  950
#define CODE_PAGE_EUC   932

#define CHARSET_DEFAULT 0
#define CHARSET_UNICODE 1
#define CHARSET_UTF8    2


// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)
#define GBK_COUNT       24066

// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)
#define BIG5_COUNT      16999

typedef struct tagMMapFile2
{
    BOOL bUsed;
    struct stat finfo;
    void *mm;
} MMapFile;


//int LoadEuc2UniTable(char *strFileName);
//void FreeEuc2UniTable(void);

int ToBase64(void *pSrc, int nSrcLen, char *strBase64, int *nBase64Len);
int FromBase64(char *strSrc, int nSrcLen, void *pDest, int *nDestLen);
int htmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);

int MultiByteToWideChar(unsigned int uCodePage, unsigned long lFlags,
    char *pMultiByteStr, int nMultiByte,
    unsigned short *pWideChar, int nWideChar);
int WideCharToMultiByte(unsigned int uCodePage, unsigned long dwFlags,
    unsigned short *pWideCharStr, int nWideChar,
    char *pMultiByteStr, int nMultiByte,
    const char *lpDefaultChar, int *lpUseDefaultChar);

#define ASCII       0
#define HZ_HEAD     1
#define HZ_TAIL     2

void GBK2BIG5(char *lpString, int cbString);
void BIG52GBK(char *lpString, int cbString);

void LowerString(char *str);
void TrimString(char *str);
void DecodeFormString(char *str);
void DecodeUTF(char *str);

#define DECODE_UNICODE  0
#define KEEP_UNICODE    1

#define DECODE_GBK      0
#define DECODE_BIG5     2

int DecodePureUTF(unsigned char *str, int nFlag);


#define LANG_GB         1    // used by httpstrtoint and FullToHalf
#define LANG_B5         2
#define LANG_ENG        3
#define LANG_UNKNOWN    4

int httpstrtoint(char *strHttp);
void lowerhttpprefix(char *strUrl);


#define FULL_COUNT      (21 + 26 * 2 + 5)

BOOL FullToHalf(char *str, int nLang);


#define URLDESCSEPCHAR  '|'
char *DescriptFromUrl(char *strUrl);

#define CODE_GBK2UNI    1
#define CODE_UNI2GBK    2
#define CODE_BIG52UNI   3
#define CODE_UNI2BIG5   4
#define CODE_GBK2BIG5   5
#define CODE_BIG52GBK   6


const char *mmapOneFile(char *pFileName, MMapFile *mmapfile);
void toplee_cleanup_mmap(void *dummy);
void InitMMResource(void);
const char *LoadOneCodeTable(int nType, char *strFileName);

int getcuryear();

char *mstrncpy(char *strDest, char *strSrc, size_t nCount);
int formurlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);

int wmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int htmlencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);

#define MAX_INTERNAL_BUFF   16384
int gb2uni_encode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);
int unicodeencode(char *strInput, int nInputLen, char *strOutBuf, int nOutBufLen);

char *stristr(const char *big, const char *little);



typedef struct auto_string
{
    int len, inc_len;
    char *strval;
} struAutoString;
#define DEF_INC_LEN     (1024)
#define DEF_INT_LEN     12

void init_auto_string(struAutoString *astr, int inc_len);
int add_auto_string(struAutoString *astr, char *new_str);
void free_auto_string(struAutoString *astr);

int unistrcmp(const char *str1, int str1len, const char *str2, int str2len);

void NormalizeName(char *p);

#endif // __TOPLEE_UTIL_INCLUDE__

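The GBK_COUNT and BIG5_COUNT comments in toplee_util.h describe a rectangle of lead-byte by trail-byte values: 126 GBK lead bytes (0x81..0xFE) times 191 trail bytes (0x40..0xFE) gives 24066 slots. The header does not say how the .tab files are laid out, so the following is only a hypothetical sketch of how a dense table of that size could be indexed; gbk_index is not a function from the module.

```c
#include <assert.h>

/* Slots in a dense GBK table: (0xFE - 0x81 + 1) * (0xFE - 0x40 + 1) = 126 * 191 */
#define GBK_COUNT 24066

/* Map a GBK lead/trail byte pair to a 0-based slot in a dense table of
 * GBK_COUNT entries, or return -1 if the pair lies outside the rectangle. */
int gbk_index(unsigned char lead, unsigned char trail)
{
    int idx;

    if (lead < 0x81 || lead > 0xFE || trail < 0x40 || trail > 0xFE)
        return -1;
    idx = (lead - 0x81) * (0xFE - 0x40 + 1) + (trail - 0x40);
    assert(idx >= 0 && idx < GBK_COUNT);
    return idx;
}
```

Under this assumed layout, 0x8140 maps to slot 0 and 0xFEFE to slot 24065, the last one. The same scheme with lead bytes 0xA1..0xF9 gives the 16999 Big5 slots. The rectangle includes some unused trail values (such as 0x7F), which a dense table simply leaves at a default code such as DEFAULT_GBK_CODE.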

php_toplee.h

/*
  +----------------------------------------------------------------------+
  | PHP Version 4                                                        |
  +----------------------------------------------------------------------+
  | Copyright (c) 1997-2002 The PHP Group                                |
  +----------------------------------------------------------------------+
  | This source file is subject to version 2.02 of the PHP license,      |
  | that is bundled with this package in the file LICENSE, and is        |
  | available at through the world-wide-web at                           |
  | http://www.php.net/license/2_02.txt.                                 |
  | If you did not receive a copy of the PHP license and are unable to   |
  | obtain it through the world-wide-web, please send a note to          |
  | license@php.net so we can mail you a copy immediately.                 |
  +----------------------------------------------------------------------+
  | Author:                                                              |
  +----------------------------------------------------------------------+
 
 
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/

 
#ifndef PHP_GBK_H
#define PHP_GBK_H

extern zend_module_entry gbk_module_entry;
#define phpext_gbk_ptr &gbk_module_entry

#ifdef PHP_WIN32
#define PHP_GBK_API __declspec(dllexport)
#else
#define PHP_GBK_API
#endif

#ifdef ZTS
#include "TSRM.h"
#endif

PHP_MINIT_FUNCTION(gbk);
PHP_MSHUTDOWN_FUNCTION(gbk);
PHP_RINIT_FUNCTION(gbk);
PHP_RSHUTDOWN_FUNCTION(gbk);
PHP_MINFO_FUNCTION(gbk);

PHP_FUNCTION(confirm_gbk_compiled);    /* For testing, remove later. */

PHP_FUNCTION(toplee_decode_utf);
PHP_FUNCTION(toplee_decode_utf_gb);
PHP_FUNCTION(toplee_decode_utf_big5);
PHP_FUNCTION(toplee_encode_utf_gb);

PHP_FUNCTION(toplee_big52gbk);
PHP_FUNCTION(toplee_gbk2big5);
PHP_FUNCTION(toplee_fan2jian);
PHP_FUNCTION(toplee_normalize_name);

/*
    Declare any global variables you may need between the BEGIN
    and END macros here:

ZEND_BEGIN_MODULE_GLOBALS(gbk)
    int   global_value;
    char *global_string;
ZEND_END_MODULE_GLOBALS(gbk)
*/

/* In every utility function you add that needs to use variables
   in php_gbk_globals, call TSRM_FETCH(); after declaring other
   variables used by that function, or better yet, pass in TSRMLS_CC
   after the last function argument and declare your utility function
   with TSRMLS_DC after the last declared argument.  Always refer to
   the globals in your function as GBK_G(variable).  You are
   encouraged to rename these macros something shorter, see
   examples in any other php module directory.
*/

#ifdef ZTS
#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)
#else
#define GBK_G(v) (gbk_globals.v)
#endif

#endif    /* PHP_GBK_H */
 
 
/*
 * Local variables:
 * tab-width: 4
 * c-basic-offset: 4
 * indent-tabs-mode: t
 * End:
 */


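With that, all the C code is written. The module also needs several code-table files, such as gb2b5.tab and uni2gb.tab. I won't provide them here; you can look up documentation on how to generate them, and many such .tab code-table files are also available for download online.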
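toplee_util.h declares mmapOneFile(), an MMapFile struct (a struct stat plus a void *mm pointer) and toplee_cleanup_mmap(), which suggests that each .tab file is memory-mapped once in PHP_MINIT and unmapped in PHP_MSHUTDOWN. Their implementations are not shown, so here is a hedged sketch of that idea using plain POSIX calls; struct table_map, map_table and unmap_table are hypothetical names, not the module's API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for MMapFile: a read-only mapping of one table file. */
struct table_map { void *data; size_t size; };

/* Map a whole file read-only. Returns 0 on success, -1 on failure.
 * The descriptor is closed right after mmap(); the mapping stays valid. */
int map_table(const char *path, struct table_map *m)
{
    struct stat st;
    int fd;

    assert(path != NULL && m != NULL);
    m->data = NULL;
    m->size = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {   /* mmap of 0 bytes fails */
        close(fd);
        return -1;
    }
    m->data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (m->data == MAP_FAILED) {
        m->data = NULL;
        return -1;
    }
    m->size = (size_t)st.st_size;
    return 0;
}

/* Release a mapping created by map_table(); safe to call twice. */
void unmap_table(struct table_map *m)
{
    if (m->data != NULL)
        munmap(m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```

Mapping the tables read-only, instead of reading them into malloc'ed buffers, should let every Apache/PHP process share the same page-cache pages for the tables rather than each holding a private copy.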

Next, we can build and test the module.

Go back to the root of the PHP source tree and run:
#./buildconf
#./configure --with-toplee=shared ...
#make
#make install

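This completes building the module into PHP. Because we passed the shared option, the toplee module is built as toplee.so, which you can load with extension=toplee.so in php.ini (or an extensions.ini file), or load dynamically from a PHP script with the dl() function. After that, the functions we defined earlier are available in PHP. Note that PHP_MINIT returns FAILURE unless all six code-table INI entries (gbk2uni, uni2gbk, big52uni, uni2big5, gbk2big5, big52gbk) point to table files, so these also need to be set in php.ini.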

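My (Michael's) technical ability is limited, so if anything in this article is wrong, please point it out. I hope it serves as a starting point and encourages more PHP enthusiasts to share their valuable experience!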

 
