
TianLiangZhou/ffi-pinyin


ffi-pinyin

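This project converts Chinese characters to pinyin. It is a dynamic library built in Rust and called from PHP. It was built mainly to improve the performance of Chinese-to-pinyin conversion in PHP, especially for long articles.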

Requirements

PHP 7.4 or later is required, with the FFI extension enabled. If you want to compile the library yourself, you also need the Rust toolchain.

You also need to set ffi.enable=On in php.ini.
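The requirements above can be checked from PHP before loading the library. The following is a minimal sketch, not part of ffi-pinyin: the function name ffiEnvironmentProblems is hypothetical, and it assumes that the default ffi.enable value "preload" is acceptable when running from the CLI.

```php
<?php
// Hypothetical helper for illustration; it is not part of ffi-pinyin.
// Returns a list of human-readable problems; an empty list means the
// requirements described above appear to be met.
function ffiEnvironmentProblems(): array
{
    $problems = [];

    if (PHP_VERSION_ID < 70400) {
        $problems[] = 'PHP 7.4 or later is required, found ' . PHP_VERSION;
    }

    if (!extension_loaded('ffi')) {
        $problems[] = 'the FFI extension is not loaded';
        return $problems; // ffi.enable is meaningless without the extension
    }

    // "On" in php.ini is reported by ini_get() as "1".
    $setting = strtolower((string) ini_get('ffi.enable'));
    $enabled = in_array($setting, ['1', 'on', 'true'], true);
    // Assumption: the default value "preload" still permits FFI from the CLI.
    $cliPreload = $setting === 'preload' && PHP_SAPI === 'cli';

    if (!$enabled && !$cliPreload) {
        $problems[] = 'ffi.enable is "' . $setting . '"; set ffi.enable=On in php.ini';
    }

    return $problems;
}

foreach (ffiEnvironmentProblems() as $problem) {
    echo "Problem: ", $problem, "\n";
}
```

If the script prints nothing, the PHP side is probably ready; it does not check whether the shared library itself can be loaded.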

Usage

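The library provides four basic modes: plain (no tone marks), with tone marks, first letters, and tone marks with all readings of heteronyms (characters with more than one pronunciation).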

<?php

include __DIR__ . '/../src/Pinyin.php';

$py = FastFFI\Pinyin\Pinyin::new();

echo "Plain (no tone marks): ", $py->plain("中国人...😄😄👩", false, false), "\n";
echo "Tone marks: ", $py->tone("中国人", false, false), "\n";
echo "Tone numbers: ", $py->toneNum("中国人", false, false), "\n";
echo "Tone numbers at end: ", $py->toneNumEnd("中国人", false, false), "\n";
echo "First letters: ", $py->letter("中国人", false, false), "\n";
echo "Tone marks, skip unrecognized: ", $py->tone("中国人😄😄", true, false), "\n";
echo "Tone marks, heteronym mode: ", $py->tone("中国人", false, true), "\n";

echo "Skip unrecognized: ", $py->plain("PHP永远滴神,rust永远的神", true, false, '-'), "\n";
echo "Unrecognized not split: ", $py->plain("PHP永远滴神,rust永远的神", false, false, '-', true), "\n";


var_export($py->plainArray("PHP永远滴神,rust永远的神", false, false, true));

Output of the program above:

Plain (no tone marks): zhong guo ren . . . 😄 😄 👩
Tone marks: zhōng guó rén
Tone numbers: zho1ng guo2 re2n
Tone numbers at end: zhong1 guo2 ren2
First letters: z g r
Tone marks, skip unrecognized: zhōng guó rén
Tone marks, heteronym mode: zhōng:zhòng guó rén
Skip unrecognized: yong-yuan-di-shen-yong-yuan-de-shen
Unrecognized not split: PHP-yong-yuan-di-shen-,rust-yong-yuan-de-shen

array (
  0 => 'PHP',
  1 => 'yong',
  2 => 'yuan',
  3 => 'di',
  4 => 'shen',
  5 => ',rust',
  6 => 'yong',
  7 => 'yuan',
  8 => 'de',
  9 => 'shen',
)

The alternative readings of a heteronym are joined with a colon (:).
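Because alternative readings share one token, a caller that wants them separately has to split the result string. The sketch below shows one way to do that in plain PHP. The function splitReadings is hypothetical and not part of ffi-pinyin; it assumes syllables are separated by a single space (or another separator passed in) and that the input text contained no literal colon.

```php
<?php
// Hypothetical helper for illustration; it is not part of ffi-pinyin.
// Assumes syllables are separated by $separator and that alternative
// readings of one character are joined by ':' as in the output above.
function splitReadings(string $pinyin, string $separator = ' '): array
{
    $result = [];
    foreach (explode($separator, $pinyin) as $syllable) {
        if ($syllable === '') {
            continue; // ignore empty tokens from repeated separators
        }
        $result[] = explode(':', $syllable);
    }
    return $result;
}

$readings = splitReadings("zhōng:zhòng guó rén");
echo count($readings[0]), " readings for the first character\n";
```

Each element of the returned array holds every reading of one character, so single-reading characters become one-element arrays.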

Benchmark

The popular https://github.com/overtrue/pinyin was chosen as the baseline for comparison.

Test command:

[meshell@ffi-pinyin#] ./vendor/bin/phpbench run --report=default 

Results when the same data is processed 100 times in a loop:


\Bench

    benchFFI................................I0 [μ Mo]/r: 2.007 2.007 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNative.............................I0 [μ Mo]/r: 128.229 128.229 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNativeMemory.......................I0 [μ Mo]/r: 91.516 91.516 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNativeGenerator....................I0 [μ Mo]/r: 12,223.686 12,223.686 (ms) [μSD μRSD]/r: 0.000ms 0.00%

benchmark  subject               set  revs  iter  mem_peak    time_rev          comp_z_value  comp_deviation
Bench      benchFFI              0    1     0     569,696b    2,007.000μs       0.00σ         0.00%
Bench      benchNative           0    1     0     2,679,192b  128,229.000μs     0.00σ         0.00%
Bench      benchNativeMemory     0    1     0     2,678,544b  91,516.000μs      0.00σ         0.00%
Bench      benchNativeGenerator  0    1     0     632,680b    12,223,686.000μs  0.00σ         0.00%

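Even the fastest variant of the comparison library (benchNativeMemory, 91.516 ms) takes about 45 times as long as the FFI version (2.007 ms).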

Results for a single run:


\Bench

    benchFFI................................I0 [μ Mo]/r: 1.599 1.599 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNative.............................I0 [μ Mo]/r: 19.783 19.783 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNativeMemory.......................I0 [μ Mo]/r: 21.160 21.160 (ms) [μSD μRSD]/r: 0.000ms 0.00%
    benchNativeGenerator....................I0 [μ Mo]/r: 125.524 125.524 (ms) [μSD μRSD]/r: 0.000ms 0.00%

benchmark  subject               set  revs  iter  mem_peak    time_rev       comp_z_value  comp_deviation
Bench      benchFFI              0    1     0     569,696b    1,599.000μs    0.00σ         0.00%
Bench      benchNative           0    1     0     2,679,192b  19,783.000μs   0.00σ         0.00%
Bench      benchNativeMemory     0    1     0     2,678,544b  21,160.000μs   0.00σ         0.00%
Bench      benchNativeGenerator  0    1     0     632,680b    125,524.000μs  0.00σ         0.00%

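Here the fastest variant of the comparison library (benchNative, 19.783 ms) takes about 12 times as long as the FFI version (1.599 ms).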
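The relative slowdowns can be recomputed from the mean times in the two reports above. This is a small sketch of that arithmetic; the helper slowdownVersus is hypothetical, and the numbers are copied from the reports rather than measured again.

```php
<?php
// Mean time per run in milliseconds, copied from the two reports above.
$loop100 = [
    'benchFFI'             => 2.007,
    'benchNative'          => 128.229,
    'benchNativeMemory'    => 91.516,
    'benchNativeGenerator' => 12223.686,
];
$single = [
    'benchFFI'             => 1.599,
    'benchNative'          => 19.783,
    'benchNativeMemory'    => 21.160,
    'benchNativeGenerator' => 125.524,
];

// Hypothetical helper: how many times longer each subject takes than $baseline.
function slowdownVersus(array $timesMs, string $baseline): array
{
    $ratios = [];
    foreach ($timesMs as $subject => $ms) {
        if ($subject !== $baseline) {
            $ratios[$subject] = round($ms / $timesMs[$baseline], 1);
        }
    }
    return $ratios;
}

foreach (['100 iterations' => $loop100, 'single run' => $single] as $label => $times) {
    foreach (slowdownVersus($times, 'benchFFI') as $subject => $ratio) {
        echo $label, ": ", $subject, " takes ", $ratio, "x the time of benchFFI\n";
    }
}
```

Both reports come from a single iteration each (the deviation columns are all zero), so the ratios are best read as rough indications.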

Online conversion

FAQ

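  • Execution fails on CentOS?

    Check whether your glibc version is too old. You can run ldd lib/libffi_pinyin.so to inspect the library's dependencies. If you see /lib64/libc.so.6: version 'GLIBC_2.18' not found, the glibc on your server is too old. Download glibc, then compile and upgrade it. Download command: wget http://mirrors.ustc.edu.cn/gnu/libc/glibc-2.18.tar.gz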
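The glibc check described above can also be scripted. The sketch below scans ldd-style output for missing glibc versions. The function missingGlibcVersions is hypothetical and not part of ffi-pinyin, and the sample text is illustrative input modelled on the error message quoted above, not captured output.

```php
<?php
// Hypothetical helper for illustration; it is not part of ffi-pinyin.
// Extracts glibc symbol versions that the dynamic loader reports as missing.
function missingGlibcVersions(string $lddOutput): array
{
    preg_match_all("/version [`']?(GLIBC_[0-9.]+)'? not found/i", $lddOutput, $matches);
    return array_values(array_unique($matches[1]));
}

// Illustrative input only, modelled on the error message quoted above.
$sample = "/lib64/libc.so.6: version `GLIBC_2.18' not found";
print_r(missingGlibcVersions($sample));

// On a real machine the text could come from the command mentioned above:
// $output = (string) shell_exec('ldd lib/libffi_pinyin.so 2>&1');
```

An empty result only means no such message was found in the text; it does not prove the library will load.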