
Why does PCRE regex matching in C++ only capture 19 groups?

    zhengken · 2022-08-20 16:50:44 +08:00 · 1477 views
    This topic was created 827 days ago; the information in it may have developed or changed since.

    Problem

    My regex pattern is (a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)

    The string I need to match is abcdefghijklmnopqrstuvwxyz

    The program output is:

    i_0:0 i_1:26 i_2:0 i_3:1 i_4:1 i_5:2 i_6:2 i_7:3 i_8:3 i_9:4 i_10:4 i_11:5 i_12:5 i_13:6 i_14:6 i_15:7 i_16:7 i_17:8 i_18:8 i_19:9 i_20:9 i_21:10 i_22:10 i_23:11 i_24:11 i_25:12 i_26:12 i_27:13 i_28:13 i_29:14 i_30:14 i_31:15 i_32:15 i_33:16 i_34:16 i_35:17 i_36:17 i_37:18 i_38:18 i_39:19 i_40:0 i_41:0 i_42:0 i_43:0 i_44:0 i_45:0 i_46:0 i_47:0 i_48:0 i_49:0 i_50:0 i_51:0 i_52:0 i_53:0 i_54:0 i_55:0 i_56:0 i_57:0 i_58:0 i_59:0
    

    Question: why are only 19 groups captured?

    Relevant code

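    #include <pcre.h>
    #include <iostream>
    #include <string>
    
    pcre* _rex;
    pcre_extra* _rexEx;
    
    // Compile the pattern and JIT-study it, reporting errors instead of ignoring them.
    void CompileRexStr(const std::string& rex) {
        const char* errorinfo = NULL;
        int errpos = 0;
        _rex = NULL;
        _rexEx = NULL;
    
        _rex = pcre_compile(rex.c_str(), PCRE_UTF8, &errorinfo, &errpos, NULL);
        if (_rex == NULL) {
            std::cerr << "compile failed at offset " << errpos << ": " << errorinfo << "\n";
            return;
        }
        _rexEx = pcre_study(_rex, PCRE_STUDY_JIT_COMPILE, &errorinfo);
        if (errorinfo != NULL) {
            std::cerr << "study failed: " << errorinfo << "\n";
        }
    }
    
    int main(){
        // 26 capturing groups: (a) through (z)
        std::string rex = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)";
        CompileRexStr(rex);
        if (_rex == NULL) return 1;
    
        std::string str = "abcdefghijklmnopqrstuvwxyz";
        int result[60] = {0};  // ovector passed to pcre_exec (60 ints, as originally written)
        int cur = 0;           // start offset within str
        int pos = pcre_exec(_rex, _rexEx, str.c_str(), (int)str.length(), cur, 0, result, 60);
    
        // Dump the raw ovector: result[2n] and result[2n+1] are the start/end offsets of group n
        for(int i=0;i < 60; i++) {
            std::cout << "i_" << i << ":" << result[i] << " ";
        }
    
        pcre_free_study(_rexEx);
        pcre_free(_rex);
        return 0;
    }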
    
    1 reply

    #1 · zhengken (OP) · 2022-08-20 19:05:01 +08:00
    https://stackoverflow.com/questions/73425423/why-pcre-regex-only-capture-19-groups/73425562#73425562

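    Someone on Stack Overflow answered me. I really hadn't noticed this: the length of result needs to be (number of groups + 1) * 3. With result[60], pcre_exec can pass back only 60 / 3 = 20 offset pairs, i.e. the whole match (group 0) plus the first 19 groups, which is exactly what the output shows.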

    > The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down.