Writing a PHP Extension in C: There's a Gender Extension for PHP

Outside of our “mainstream” paid course about exploring PHP, I like to explore the weird and forgotten areas of the language.

Recently, I ventured into a section of the PHP manual which lists extensions that are used to help with Human Language and Character Encoding. I had never looked at them as a whole – while dealing with gettext, for example, I always kind of landed directly on it and ignored the rest. Well, of those others, there’s one that caught my eye – especially in this day and age given the various controversies – the Gender extension.

This extension, in short, tries to guess the gender of first names. As its introduction says:

Gender PHP extension is a port of the gender.c program originally written by Joerg Michael. The main purpose is to find out the gender of firstnames. The current database contains >40000 firstnames from 54 countries.

This is interesting beyond the fact that the author is kinda called George Michael. In fact, there are many aspects of this extension that are quite baffling.

While its last stable release was in 2015, the extension uses namespaces, which clearly indicates that it’s not some long-lost remnant of the past – a relatively recent effort was made to bring it in line with modern coding standards. Even the example code uses namespaces:

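<?php

namespace Gender;

$gender = new Gender;

$name = "Milene";
$country = Gender::FRANCE;

$result = $gender->get($name, $country);
$data = $gender->country($country);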

switch($result) {
    case Gender::IS_FEMALE:
        printf("The name %s is female in %s\n", $name, $data['country']);
    break;

    case Gender::IS_MOSTLY_FEMALE:
        printf("The name %s is mostly female in %s\n", $name, $data['country']);
    break;

    case Gender::IS_MALE:
        printf("The name %s is male in %s\n", $name, $data['country']);
    break;

    case Gender::IS_MOSTLY_MALE:
        printf("The name %s is mostly male in %s\n", $name, $data['country']);
    break;

    case Gender::IS_UNISEX_NAME:
        printf("The name %s is unisex in %s\n", $name, $data['country']);
    break;

    case Gender::IS_A_COUPLE:
        printf("The name %s is both male and female in %s\n", $name, $data['country']);
    break;

    case Gender::NAME_NOT_FOUND:
        printf("The name %s was not found for %s\n", $name, $data['country']);
    break;

    case Gender::ERROR_IN_NAME:
        echo "There is an error in the given name!\n";
    break;

    default:
        echo "An error occurred!\n";
    break;

}

While we have this code here, let’s take a look at it.

Some really confusing constant names in there – how does a name contain an error? What’s the difference between unisex and couple names? Digging deeper, we see some more curious constants.

For example, the class has short names of countries as constants (e.g. BRITAIN) which reference an array containing both an international code for the country (UK) and the full country name (GREAT BRITAIN).

$gender = new Gender\Gender;
var_dump($gender->country(Gender\Gender::BRITAIN));

array(2) {
  'country_short' =>
  string(2) "UK"
  'country' =>
  string(13) "Great Britain"
}

Only, UK isn’t the international code one would expect here – it’s GB. Why they chose this route rather than rely on an existing package of geonames or even just an accurate list of constants is anyone’s guess.
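
If the short codes need to line up with a standard list, one option is a small translation layer in userland. The sketch below assumes a hypothetical helper, `toIsoAlpha2()`, and an override table holding only the one mismatch shown above (UK where ISO 3166-1 alpha-2 uses GB). Any other entries would need to be checked against the extension's actual data first.

```php
<?php

// Hypothetical override table: extension short code => ISO 3166-1 alpha-2 code.
// Only the UK/GB mismatch comes from the output above; extend after verifying.
const SHORT_CODE_OVERRIDES = [
    'UK' => 'GB',
];

/**
 * Translate a 'country_short' value into an ISO 3166-1 alpha-2 code.
 * Codes without an override are passed through unchanged (upper-cased),
 * which is an assumption, not a guarantee that they are valid ISO codes.
 */
function toIsoAlpha2(string $shortCode): string
{
    $code = strtoupper(trim($shortCode));

    return SHORT_CODE_OVERRIDES[$code] ?? $code;
}

echo toIsoAlpha2('UK'), "\n"; // GB
```

With the extension installed, the input would be the `country_short` element of the array returned by `country()`.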

Once in use, the class uses the get method to return the gender of a name, provided we’ve given it the name and the country (optional – searches across all countries if omitted). But the country has to be the constant of the class (so you need to know it by heart or use their values when adding it to the UI because it won’t match any standard country code list) and it also returns an integer – another constant defined in the class, like so:

const integer IS_FEMALE = 70 ;
const integer IS_MOSTLY_FEMALE = 102 ;
const integer IS_MALE = 77 ;
const integer IS_MOSTLY_MALE = 109 ;
const integer IS_UNISEX_NAME = 63 ;
const integer IS_A_COUPLE = 67 ;
const integer NAME_NOT_FOUND = 32 ;
const integer ERROR_IN_NAME = 69 ;

There’s just no rhyme or reason to any of these values.
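
One pattern does appear if the values are read as ASCII codes. The snippet below copies the integers from the listing above into a plain array, so it runs without the extension, and prints the character for each one. That these characters are flags carried over from the original C program is an assumption, not something the manual states.

```php
<?php

// Values copied from the constant listing above.
$constants = [
    'IS_FEMALE'        => 70,
    'IS_MOSTLY_FEMALE' => 102,
    'IS_MALE'          => 77,
    'IS_MOSTLY_MALE'   => 109,
    'IS_UNISEX_NAME'   => 63,
    'IS_A_COUPLE'      => 67,
    'NAME_NOT_FOUND'   => 32,
    'ERROR_IN_NAME'    => 69,
];

// Map each integer to its ASCII character; string keys are preserved.
$asChars = array_map('chr', $constants);

foreach ($asChars as $name => $char) {
    printf("%-17s %3d  '%s'\n", $name, $constants[$name], $char);
}
```

The characters come out as F, f, M, m, ?, C, a space, and E, in that order.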

Another method, isNick, checks if a name is a nickname or alias for another name. This makes sense in cases like Bob vs Robert or Dick vs Richard, but can it really scale past these predictable English values? The method is doubly confusing because it says it returns an array in the signature, whereas the description says it’s a boolean.

[Image: wrong description of the method's return type in the manual]

Finally, the similarNames method will return an array of names similar to the one provided, given the name and a country (if country is omitted, then it compares names across all countries). Does this include aliases? What’s the basis for similarity? Are Mario and Maria similar despite being opposite genders? Or is Mario just similar to Marek? Is Mario similar to Marek at all? There’s no information.

I just had to find out for myself, so I installed it and tested the thing.

Installation

I tested this on an isolated environment via Homestead Improved with PECL pre-installed.

sudo pecl install gender
echo "extension=gender.so" | sudo tee /etc/php/7.1/mods-available/gender.ini
sudo phpenmod gender
pear run-scripts pecl/gender

The last command will ask where to put a dictionary. I assume this is there for the purposes of extending it. I selected ., as in “current folder”. Let’s try it out by making a simple index.php file with the example content from above and testing that first.
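
Before running the example, it can help to confirm that PHP actually loaded the extension. This is a minimal check, assuming the class is exposed as `Gender\Gender`, as in the manual's example:

```php
<?php

// True only if the gender extension was loaded by this PHP binary. The CLI and
// FPM can read different ini files, so run this with the SAPI you intend to use.
$loaded = extension_loaded('gender');

// The class is expected to exist whenever the extension is loaded.
$hasClass = class_exists('Gender\\Gender');

printf("extension loaded: %s\n", $loaded ? 'yes' : 'no');
printf("Gender\\Gender available: %s\n", $hasClass ? 'yes' : 'no');
```

If both lines say "no", the ini file written above is probably not being read by the PHP version in use.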

[Image 2: output of index.php]

Sure enough, it works. Okay, let’s change the country to $country = Gender::CROATIA;.

[Image 3: output of index.php with the country set to Croatia]

Okay, sure, it’s not a common name, and not in that format, but it’s most similar to Milena, which is a female name in Croatia. Let’s see what’s similar to Milena via similar.php:

$similar = $gender->similarNames("Milena", Gender::CROATIA);

var_dump($similar);

[Image 4: names listed as similar to Milena]

Not what I expected. Let’s see the original, Milene.

[Image 5: names listed as similar to Milene]

So Milena is listed as a name similar to Milene, but Milene isn’t similar to Milena? Additionally, there seem to be some encoding issues on two of them. And since the Croatian alphabet doesn’t even have the letter “y”, we definitely have neither of those similar names, regardless of what’s hiding under the question mark.
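
The asymmetry can be checked mechanically. The sketch below uses a hypothetical helper, `isSimilaritySymmetric()`, which takes any callable returning a list of similar names. The stand-in data only mimics the behaviour described above and is not the extension's real output.

```php
<?php

/**
 * Check whether "is similar to" holds in both directions for two names.
 * $lookup is any callable that returns an array of similar names for a name.
 */
function isSimilaritySymmetric(callable $lookup, string $a, string $b): bool
{
    $aListsB = in_array($b, $lookup($a), true);
    $bListsA = in_array($a, $lookup($b), true);

    return $aListsB === $bListsA;
}

// Stand-in data (hypothetical) mimicking the asymmetry seen above.
$stub = [
    'Milene' => ['Milena'],
    'Milena' => [],
];

$lookup = function (string $name) use ($stub): array {
    return $stub[$name] ?? [];
};

var_dump(isSimilaritySymmetric($lookup, 'Milene', 'Milena')); // bool(false)
```

With the extension installed, `$lookup` could instead wrap `$gender->similarNames($name, Gender::CROATIA)`, assuming it returns an array as the manual's signature suggests.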

Okay, let’s try something else. Let’s see if Bob is an alias of Robert in alias.php:

var_dump($gender->isNick('Bob', 'Robert', Gender::USA));

[Image 6: result of the Bob / Robert check]

Indeed, that does seem to be true. Low hanging fruit, though. Let’s see a local one.

var_dump($gender->isNick('Tea', 'Dorotea', Gender::CROATIA));

[Image 7: result of the Tea / Dorotea check]

Oh come on.

What about the Mario / Maria / Marek issue from the beginning? Let’s see similarities for them in order.

[Images 8, 9, 10: similar names for Mario, Maria, and Marek]

Not good.

A couple more tries. To make testing easier, let’s change the $name and $country lines in index.php to:

$name = $argv[1];
$country = constant(Gender::class.'::'.strtoupper($argv[2]));

Now we can test from the CLI without editing the file.
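
One caveat: `constant()` fails with a warning or an error, depending on the PHP version, when the requested constant doesn't exist. A slightly more defensive variant is sketched below. `resolveClassConstant()` is a hypothetical helper, and `DemoCountries` with its arbitrary values is a stand-in so the snippet runs without the extension.

```php
<?php

/**
 * Resolve a class constant from user input, or return null if it is not defined.
 * $class is a fully qualified class name; $name is matched case-insensitively.
 */
function resolveClassConstant(string $class, string $name)
{
    $constant = $class . '::' . strtoupper(trim($name));

    // defined() also works for class constants written as "Class::NAME".
    return defined($constant) ? constant($constant) : null;
}

// Hypothetical stand-in class; the values are arbitrary, not the extension's.
final class DemoCountries
{
    const CROATIA = 1;
    const USA = 2;
}

var_dump(resolveClassConstant(DemoCountries::class, 'croatia')); // int(1)
var_dump(resolveClassConstant(DemoCountries::class, 'tunisia')); // NULL
```

In index.php the first argument would be `Gender::class` and the second `$argv[2]`, with a null result handled by printing a usage message instead of calling `get()`.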

Final few tries. I have a female friend from Tunisia called Manel. I would assume her name would go for male in most of the world because it ends with a consonant. Let’s test hers and some other names.

[Image 11: CLI test runs for Manel and other names]

No Tunisia? Maybe it just isn’t documented in the manual. Let’s output all the defined constants and check.

// constants.php
var_dump((new \ReflectionClass(\Gender\Gender::class))->getConstants());

[Image 12: dump of the class constants]

No, looks like those docs are spot on. At this point, I stopped playing around with this tool.


The whole situation is made even more interesting by the fact that this is a simple class, and definitely doesn’t need to be an extension. No one will call this often enough to care about the performance boost of an extension vs. a package, and a package can be installed by non-sudo users, and people can contribute to it more easily.

How this extension, which is both inaccurate and incomplete, and could be a simple class, ended up in the PHP manual is unclear, but it goes to show that there’s a lot of cleaning up to be done yet in the PHP core (I include the manual as the “core”) before we get PHP’s reputation up. In the 9 years (nine!) since development on this port started, not even all countries have been added to the internal list and yet someone decided this extension should be in the manual.

Do you have more information about this extension? Do you see a point to it? Which other oddball extensions or built-in features did you find in the manual or in PHP in general?

Original article: https://www.sitepoint.com/theres-a-gender-extension-for-php/
