c++ – Accessing the position iterator from boost::spirit semantic actions

2019-09-27 08:05:40

Tags: boost-spirit-qi c++ boost abstract-syntax-tree


Say I have code like this (line numbers are for reference):

1:
2:function FuncName_1 {
3:    var Var_1 = 3;
4:    var  Var_2 = 4;
5:    ...

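I would like to write a grammar that parses text like this and puts the information about all identifiers (function and variable names) into a tree (utree?).
Each node should keep: line_num, column_num and the symbol value. Example: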

root: FuncName_1 (line:2,col:10)
  children[0]: Var_1 (line:3, col:8)
  children[1]: Var_2 (line:4, col:9)

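I want to put it into a tree because I plan to traverse that tree, and for each node I must know its "context" (all the parent nodes of the current node).

For example, while processing the node for Var_1, I must know that it is the name of a local variable of the function FuncName_1 (which is itself being processed as a node, but one level up).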
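
The tree and "context" idea described above could look roughly like the sketch below. It is only an illustration under assumptions: SymbolNode, walk and describe are hypothetical names that do not come from Spirit, and the tree is meant to be filled in by hand here rather than by a parser. Each node carries its name, line and column, and the traversal hands every node the chain of its parents.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type (illustrative only, not a Spirit or utree type).
struct SymbolNode {
    std::string name;                  // identifier text, e.g. "Var_1"
    unsigned line;                     // line where the identifier starts
    unsigned column;                   // column where the identifier starts
    std::vector<SymbolNode> children;  // nested symbols, e.g. local variables
};

// Depth-first walk. `visit(node, parents)` receives the ancestors of `node`,
// ordered from the root down to the direct parent.
template <typename Visitor>
void walk(SymbolNode const& node, std::vector<SymbolNode const*>& parents, Visitor visit) {
    visit(node, parents);
    parents.push_back(&node);          // `node` becomes context for its children
    for (SymbolNode const& child : node.children)
        walk(child, parents, visit);
    parents.pop_back();                // restore the context for siblings
}

// Renders every node as "Parent::Child (line:column)".
std::vector<std::string> describe(SymbolNode const& root) {
    std::vector<std::string> out;
    std::vector<SymbolNode const*> parents;
    walk(root, parents,
         [&out](SymbolNode const& n, std::vector<SymbolNode const*> const& context) {
             std::string path;
             for (SymbolNode const* p : context) path += p->name + "::";
             out.push_back(path + n.name + " (" + std::to_string(n.line) + ":" +
                           std::to_string(n.column) + ")");
         });
    assert(parents.empty());           // the walk must leave the stack balanced
    return out;
}
```

Keeping the parents in an explicit stack means the nodes themselves need no parent pointers, which keeps them copyable and easy to fill from a parser.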

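There are a few things I cannot figure out:

1. Can this be done in Spirit with semantic actions and utree? Or should I use variant<> trees?
2. How do I pass those three pieces of information (column, line, symbol_name) to a node at the same time? I know I have to use pos_iterator as the iterator type of the grammar, but how do I access that information in a semantic action?

I'm new to Boost, so I have read the Spirit documentation over and over and tried to google my problem, but I cannot put all the pieces together to find a solution. It seems nobody has had a use case like mine before (or I just cannot find it).
The only solutions I found that involve the position iterator are ones with parse error handling, but that is not the case I am interested in.
The code that merely parses the text I am working with is below, but I do not know how to move forward from it.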

  #include <boost/spirit/include/qi.hpp>
  #include <boost/spirit/include/support_line_pos_iterator.hpp>
  #include <iostream>
  #include <string>

  namespace qi = boost::spirit::qi;
  typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

  template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
  struct ParseGrammar: public qi::grammar<Iterator, Skipper>
  {
        ParseGrammar():ParseGrammar::base_type(SourceCode)
        {
           using namespace qi;
           KeywordFunction = lit("function");
           KeywordVar    = lit("var");
           SemiColon     = lit(';');

           Identifier = lexeme [alpha >> *(alnum | '_')];
           VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
           SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
        }

        qi::rule<Iterator, Skipper> SourceCode;
        qi::rule<Iterator > KeywordFunction;
        qi::rule<Iterator,  Skipper> VarAssignemnt;
        qi::rule<Iterator> KeywordVar;
        qi::rule<Iterator> SemiColon;
        qi::rule<Iterator > Identifier;
  };

  int main()
  {
     std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

     pos_iterator_t first(content.begin()), iter = first, last(content.end());
     ParseGrammar<pos_iterator_t> resolver;    //  Our parser
     bool ok = phrase_parse(iter,
                            last,
                            resolver,
                            qi::space);

     std::cout << std::boolalpha;
     std::cout << "\nok : " << ok << std::endl;
     std::cout << "full   : " << (iter == last) << std::endl;
     if(ok && iter == last)
     {
        std::cout << "OK: Parsing fully succeeded\n\n";
     }
     else
     {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
     }
     return 0;
  }

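Answer:

This was a fun exercise, and I ended up putting together a working demo that uses on_success [1] to annotate the AST nodes.

Let's assume we want an AST like this: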

namespace ast
{
struct LocationInfo {
    unsigned line, column, length;
};

struct Identifier     : LocationInfo {
    std::string name;
};

struct VarAssignment  : LocationInfo {
    Identifier id;
    int value;
};

struct SourceCode     : LocationInfo {
    Identifier function;
    std::vector<VarAssignment> assignments;
};
}

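I know, 'location info' is probably overkill for the SourceCode node, but you know... Anyway, to make it easy to assign attributes to these nodes without requiring semantic actions or lots of specially crafted constructors: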

#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

There. Now we can declare the rules to expose these attributes:

qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()>         Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;

We don't (essentially) modify the grammar at all: attribute propagation is "just automatic" [2]:

KeywordFunction = lit("function");
KeywordVar      = lit("var");
SemiColon       = lit(';');

Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment   = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon; 
SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

The magic

How do we get the source location information attached to our nodes?

auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier,    set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode,    set_location_info);

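Now, annotate is just a lazy version of a callable (in the full program below it is a phx::function wrapping annotation_f). In the on_success handler, _1 is the iterator at the start of the match and _3 the iterator just past its end. The callable is defined as: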

template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) { }
};

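Because of the way get_column works, the functor is stateful (it remembers the start iterator) [3]. As you can see, do_annotate only does work for things that derive from LocationInfo; anything else falls through to the empty do_annotate(...) overload.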
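
The overload trick behind do_annotate can be tried in isolation. The sketch below uses only the standard library; LocationInfo here is a stand-in for ast::LocationInfo, and Annotated, Plain and try_annotate are hypothetical names chosen for illustration. It assumes the fallback only ever receives trivially copyable arguments, because passing non-trivial class types through `...` is only conditionally supported and some compilers reject it.

```cpp
#include <cassert>
#include <string>

struct LocationInfo { unsigned line = 0, column = 0, length = 0; }; // stand-in for ast::LocationInfo
struct Annotated : LocationInfo { std::string name; };              // derives: gets annotated
struct Plain { int value; };                                        // does not derive: ignored

// Preferred overload: viable for anything that converts to LocationInfo&.
inline bool try_annotate(LocationInfo& li, unsigned line, unsigned column, unsigned length) {
    li.line = line;
    li.column = column;
    li.length = length;
    return true;
}

// Fallback: an ellipsis conversion ranks below every other conversion, so
// this overload is chosen only when the one above is not viable.
inline bool try_annotate(...) { return false; }
```

With these definitions, calling try_annotate on an Annotated object fills in its location and returns true, while calling it on a Plain object silently does nothing and returns false. A function template could serve as the fallback instead if non-trivial types need to be accepted.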

Now, the proof of the pudding:

std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

ast::SourceCode program;
bool ok = phrase_parse(iter,
        last,
        resolver,
        qi::space,
        program);

std::cout << std::boolalpha;
std::cout << "ok  : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
    std::cout << "OK: Parsing fully succeeded\n\n";

    std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
    for (auto const& va : program.assignments)
        std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
    int line   = get_line(iter);
    int column = get_column(first, iter);
    std::cout << "-------------------------\n";
    std::cout << "ERROR: Parsing failed or not complete\n";
    std::cout << "stopped at: " << line  << ":" << column << "\n";
    std::cout << "remaining: '" << std::string(iter, last) << "'\n";
    std::cout << "-------------------------\n";
}

This prints:

ok  : true
full: true
OK: Parsing fully succeeded

Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15

Full demo program

See it Live On Coliru

It also shows:

- Error handling, e.g.:

Error: expecting "=" in line 3: 

var  Var_2 - 4; }
           ^---- here
ok  : false
full: false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 1:1
remaining: 'function FuncName_1 {
var Var_1 = 3;
var  Var_2 - 4; }'
-------------------------

- The BOOST_SPIRIT_DEBUG macros
- A convenient way to stream the LocationInfo part of any AST node (done in a slightly hacky way, sorry :)

//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;

typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

namespace ast
{
    namespace manip { struct LocationInfoPrinter; }

    struct LocationInfo {
        unsigned line, column, length;
        manip::LocationInfoPrinter printLoc() const;
    };

    struct Identifier     : LocationInfo {
        std::string name;
    };

    struct VarAssignment  : LocationInfo {
        Identifier id;
        int value;
    };

    struct SourceCode     : LocationInfo {
        Identifier function;
        std::vector<VarAssignment> assignments;
    };

    ///////////////////////////////////////////////////////////////////////////
    // Completely unnecessary tweak to get a "poor man's" io manipulator going
    // so we can do `std::cout << x.printLoc()` on types of `x` deriving from
    // LocationInfo
    namespace manip {
        struct LocationInfoPrinter {
            LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
            LocationInfo const& ref;
            friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
                return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
            }
        };
    }

    manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
    // feel free to disregard this hack
    ///////////////////////////////////////////////////////////////////////////
}

BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

struct error_handler_f {
    typedef qi::error_handler_result result_type;
    template<typename T1, typename T2, typename T3, typename T4>
        qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
            std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n" 
                << std::string(b,e) << "\n"
                << std::setw(std::distance(b, where)) << '^' << "---- here\n";
            return qi::fail;
        }
};

template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) {}
};

template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
    ParseGrammar(Iterator first) : 
        ParseGrammar::base_type(SourceCode),
        annotate(first)
    {
        using namespace qi;
        KeywordFunction = lit("function");
        KeywordVar      = lit("var");
        SemiColon       = lit(';');

        Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
        VarAssignment   = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
        SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

        on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
        on_error<fail>(SourceCode, handler(_1, _2, _3, _4));

        auto set_location_info = annotate(_val, _1, _3);
        on_success(Identifier,    set_location_info);
        on_success(VarAssignment, set_location_info);
        on_success(SourceCode,    set_location_info);

        BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
    }

    phx::function<error_handler_f> handler;
    phx::function<annotation_f<Iterator>> annotate;

    qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
    qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
    qi::rule<Iterator, ast::Identifier()>             Identifier;
    // no skipper, no attributes:
    qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};

int main()
{
    std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 - 4; }";

    pos_iterator_t first(content.begin()), iter = first, last(content.end());
    ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

    ast::SourceCode program;
    bool ok = phrase_parse(iter,
            last,
            resolver,
            qi::space,
            program);

    std::cout << std::boolalpha;
    std::cout << "ok  : " << ok << std::endl;
    std::cout << "full: " << (iter == last) << std::endl;
    if(ok && iter == last)
    {
        std::cout << "OK: Parsing fully succeeded\n\n";

        std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
        for (auto const& va : program.assignments)
            std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
    }
    else
    {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
    }
    return 0;
}
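
The error handler above prints everything from the start of the rule to the end of the input and then positions the caret with std::setw. If only the offending line is wanted, something like the following sketch might do. It is an assumption-laden illustration, not part of the demo: caret_diagnostic is a hypothetical helper, it works on a plain std::string plus a character offset instead of Spirit iterators, and it pads with spaces, so tabs in the source would misalign the caret.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the line containing `offset`, followed by a
// second line with a caret under the character at `offset`.
std::string caret_diagnostic(std::string const& source, std::size_t offset) {
    assert(offset <= source.size());

    // Start of the line: one past the previous '\n', or 0 on the first line.
    std::size_t const nl =
        (offset == 0) ? std::string::npos : source.rfind('\n', offset - 1);
    std::size_t const begin = (nl == std::string::npos) ? 0 : nl + 1;

    // End of the line: the next '\n' at or after `offset`, or end of input.
    std::size_t end = source.find('\n', offset);
    if (end == std::string::npos) end = source.size();

    std::string const line = source.substr(begin, end - begin);
    std::string const pad(offset - begin, ' ');  // one space per preceding character
    return line + "\n" + pad + "^---- here\n";
}
```

For the failing input used in main, passing the offset of the '-' character yields the third line of the source with the caret under the '-'.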

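[1] Sadly under-documented, except for the conjure sample(s)

[2] Well, I used as_string to get proper assignment to Identifier without too much work

[3] There could be smarter ways to do this in terms of performance, but for now, let's keep it simple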
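
As one possible reading of "smarter ways" in [3], line and column lookups can be made logarithmic by indexing the line starts once. The sketch below is not from the original answer: LineIndex and Position are hypothetical names, it works on character offsets into a std::string (for a line_pos_iterator, subtracting the base() iterators should give such an offset), and it counts every character as one column without expanding tabs, so its columns are not guaranteed to match what get_column reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Position { std::size_t line, column; };  // both 1-based

// Hypothetical helper (not part of Boost): maps a character offset to a
// line/column pair using a precomputed table of line start offsets.
class LineIndex {
public:
    explicit LineIndex(std::string const& text) : size_(text.size()) {
        starts_.push_back(0);                    // line 1 starts at offset 0
        for (std::size_t i = 0; i < text.size(); ++i)
            if (text[i] == '\n') starts_.push_back(i + 1);
    }

    // O(log n): the line is the number of line starts that are <= offset.
    Position locate(std::size_t offset) const {
        assert(offset <= size_);
        std::vector<std::size_t>::const_iterator it =
            std::upper_bound(starts_.begin(), starts_.end(), offset);
        std::size_t const line = static_cast<std::size_t>(it - starts_.begin());
        return Position{line, offset - starts_[line - 1] + 1};
    }

private:
    std::size_t size_;
    std::vector<std::size_t> starts_;
};
```

Building the index costs one pass over the input; after that the annotation functor would only need to keep a reference to the index instead of rescanning from the start iterator on every match.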

Source: https://codeday.me/bug/20190927/1823165.html