Tags: python parsing grammar string-parsing pyparsing
I have a pyparsing problem that I've spent days trying to fix, without luck.
Here is the relevant pseudocode:
class Parser(object):
    def __init__(self):
        self.multilineCommands = []
        self.grammar = <pyparsing grammar>  # depends on self.multilineCommands
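So, I'm trying to get a particular set of doctests to pass. But the tests in question update self.multilineCommands after instantiation. Although the attribute itself gets set correctly, self.grammar seems blind to the change, and the tests fail.
However, if I set self.multilineCommands inside __init__(), the tests all pass.
How can I keep self.grammar in sync with self.multilineCommands?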
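A minimal reproduction of the symptom, assuming a grammar that, like the one in the question, reads the list once inside `__init__`. `StaleParser` is a hypothetical name and the one-keyword grammar is a stand-in for the real one. The point it illustrates: the `Keyword` objects are built from the strings in the list at that moment, so assigning a new list afterwards never reaches them.

```python
import pyparsing as pp


class StaleParser(object):
    """Hypothetical reproduction: the grammar is built once, in __init__."""

    def __init__(self):
        self.multilineCommands = ['multiline']
        # The list is read here exactly once; the Keyword objects keep their
        # own copies of the strings, not a reference to the list.
        command = pp.MatchFirst([pp.Keyword(c) for c in self.multilineCommands])
        self.grammar = command('command') + pp.restOfLine('args')


p = StaleParser()
p.multilineCommands = ['select']   # the attribute updates fine...
try:
    p.grammar.parseString('select * from t')
    matched = True
except pp.ParseException:
    matched = False                # ...but the grammar still only knows 'multiline'
print(matched)  # False
```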
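Follow-up
So, part of the problem here is that I'm refactoring code I didn't write. My experience with pyparsing is also limited to my work on this project.
Pyparsing author Paul McGuire posted a helpful reply, but I couldn't get it to work. That may well be my mistake, but the bigger problem is that I oversimplified the pseudocode written above.
So, I'm going to post the actual code.
Warning!
What you're about to see is uncensored. Seeing it may make you cringe... maybe even cry. In the original module, this code was just one part of an entire "god class". Splitting the following out into a Parser class is only step 1 (and apparently step 1 was enough to break the tests).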
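import cmd
import optparse

import pyparsing

# NOTE: `Cmd` (used in the doctests) and `replace_with_file_contents` are
# defined elsewhere in the original module and are not shown here.


class Parser(object):
    '''Container object for pyparsing-related parsing.
    '''
    def __init__(self, *args, **kwargs):
        r'''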
        >>> c = Cmd()
        >>> c.multilineCommands = ['multiline']
        >>> c.multilineCommands
        ['multiline']
        >>> c.parser.multilineCommands
        ['multiline']
        >>> c.case_insensitive = True
        >>> c.case_insensitive
        True
        >>> c.parser.case_insensitive
        True
        >>> print (c.parser('').dump())
        []
        >>> print (c.parser('/* empty command */').dump())
        []
        >>> print (c.parser('plainword').dump())
        ['plainword', '']
        - command: plainword
        - statement: ['plainword', '']
          - command: plainword
        >>> print (c.parser('termbare;').dump())
        ['termbare', '', ';', '']
        - command: termbare
        - statement: ['termbare', '', ';']
          - command: termbare
          - terminator: ;
        - terminator: ;
        >>> print (c.parser('termbare; suffx').dump())
        ['termbare', '', ';', 'suffx']
        - command: termbare
        - statement: ['termbare', '', ';']
          - command: termbare
          - terminator: ;
        - suffix: suffx
        - terminator: ;
        >>> print (c.parser('barecommand').dump())
        ['barecommand', '']
        - command: barecommand
        - statement: ['barecommand', '']
          - command: barecommand
        >>> print (c.parser('COMmand with args').dump())
        ['command', 'with args']
        - args: with args
        - command: command
        - statement: ['command', 'with args']
          - args: with args
          - command: command
        >>> print (c.parser('command with args and terminator; and suffix').dump())
        ['command', 'with args and terminator', ';', 'and suffix']
        - args: with args and terminator
        - command: command
        - statement: ['command', 'with args and terminator', ';']
          - args: with args and terminator
          - command: command
          - terminator: ;
        - suffix: and suffix
        - terminator: ;
        >>> print (c.parser('simple | piped').dump())
        ['simple', '', '|', ' piped']
        - command: simple
        - pipeTo: piped
        - statement: ['simple', '']
          - command: simple
        >>> print (c.parser('double-pipe || is not a pipe').dump())
        ['double', '-pipe || is not a pipe']
        - args: -pipe || is not a pipe
        - command: double
        - statement: ['double', '-pipe || is not a pipe']
          - args: -pipe || is not a pipe
          - command: double
        >>> print (c.parser('command with args, terminator;sufx | piped').dump())
        ['command', 'with args, terminator', ';', 'sufx', '|', ' piped']
        - args: with args, terminator
        - command: command
        - pipeTo: piped
        - statement: ['command', 'with args, terminator', ';']
          - args: with args, terminator
          - command: command
          - terminator: ;
        - suffix: sufx
        - terminator: ;
        >>> print (c.parser('output into > afile.txt').dump())
        ['output', 'into', '>', 'afile.txt']
        - args: into
        - command: output
        - output: >
        - outputTo: afile.txt
        - statement: ['output', 'into']
          - args: into
          - command: output
        >>> print (c.parser('output into;sufx | pipethrume plz > afile.txt').dump())
        ['output', 'into', ';', 'sufx', '|', ' pipethrume plz', '>', 'afile.txt']
        - args: into
        - command: output
        - output: >
        - outputTo: afile.txt
        - pipeTo: pipethrume plz
        - statement: ['output', 'into', ';']
          - args: into
          - command: output
          - terminator: ;
        - suffix: sufx
        - terminator: ;
        >>> print (c.parser('output to paste buffer >> ').dump())
        ['output', 'to paste buffer', '>>', '']
        - args: to paste buffer
        - command: output
        - output: >>
        - statement: ['output', 'to paste buffer']
          - args: to paste buffer
          - command: output
        >>> print (c.parser('ignore the /* commented | > */ stuff;').dump())
        ['ignore', 'the /* commented | > */ stuff', ';', '']
        - args: the /* commented | > */ stuff
        - command: ignore
        - statement: ['ignore', 'the /* commented | > */ stuff', ';']
          - args: the /* commented | > */ stuff
          - command: ignore
          - terminator: ;
        - terminator: ;
        >>> print (c.parser('has > inside;').dump())
        ['has', '> inside', ';', '']
        - args: > inside
        - command: has
        - statement: ['has', '> inside', ';']
          - args: > inside
          - command: has
          - terminator: ;
        - terminator: ;
        >>> print (c.parser('multiline has > inside an unfinished command').dump())
        ['multiline', ' has > inside an unfinished command']
        - multilineCommand: multiline
        >>> print (c.parser('multiline has > inside;').dump())
        ['multiline', 'has > inside', ';', '']
        - args: has > inside
        - multilineCommand: multiline
        - statement: ['multiline', 'has > inside', ';']
          - args: has > inside
          - multilineCommand: multiline
          - terminator: ;
        - terminator: ;
        >>> print (c.parser('multiline command /* with comment in progress;').dump())
        ['multiline', ' command /* with comment in progress;']
        - multilineCommand: multiline
        >>> print (c.parser('multiline command /* with comment complete */ is done;').dump())
        ['multiline', 'command /* with comment complete */ is done', ';', '']
        - args: command /* with comment complete */ is done
        - multilineCommand: multiline
        - statement: ['multiline', 'command /* with comment complete */ is done', ';']
          - args: command /* with comment complete */ is done
          - multilineCommand: multiline
          - terminator: ;
        - terminator: ;
        >>> print (c.parser('multiline command ends\n\n').dump())
        ['multiline', 'command ends', '\n', '\n']
        - args: command ends
        - multilineCommand: multiline
        - statement: ['multiline', 'command ends', '\n', '\n']
          - args: command ends
          - multilineCommand: multiline
          - terminator: ['\n', '\n']
        - terminator: ['\n', '\n']
        >>> print (c.parser('multiline command "with term; ends" now\n\n').dump())
        ['multiline', 'command "with term; ends" now', '\n', '\n']
        - args: command "with term; ends" now
        - multilineCommand: multiline
        - statement: ['multiline', 'command "with term; ends" now', '\n', '\n']
          - args: command "with term; ends" now
          - multilineCommand: multiline
          - terminator: ['\n', '\n']
        - terminator: ['\n', '\n']
        >>> print (c.parser('what if "quoted strings /* seem to " start comments?').dump())
        ['what', 'if "quoted strings /* seem to " start comments?']
        - args: if "quoted strings /* seem to " start comments?
        - command: what
        - statement: ['what', 'if "quoted strings /* seem to " start comments?']
          - args: if "quoted strings /* seem to " start comments?
          - command: what
        '''
        # SETTINGS
        self._init_settings()
        # GRAMMAR
        self._init_grammars()
        # PARSERS
        # For easy reference to all contained parsers.
        # Hacky, I know. But I'm trying to fix code
        # elsewhere at the moment... :P)
        self._parsers = set()
        self._init_prefixParser()
        self._init_terminatorParser()
        self._init_saveParser()
        self._init_inputParser()
        self._init_outputParser()
        # intermission! :D
        # (update grammar(s) containing parsers)
        self.afterElements = \
            pyparsing.Optional(self.pipe + pyparsing.SkipTo(self.outputParser ^ self.stringEnd, ignore=self.doNotParse)('pipeTo')) + \
            pyparsing.Optional(self.outputParser('output') + pyparsing.SkipTo(self.stringEnd, ignore=self.doNotParse).setParseAction(lambda x: x[0].strip())('outputTo'))
        self._grammars.add('afterElements')
        # end intermission
        self._init_blankLineTerminationParser()
        self._init_multilineParser()
        self._init_singleLineParser()
        self._init_optionParser()
        # Put it all together:
        self.mainParser = \
            ( self.prefixParser +
              ( self.stringEnd |
                self.multilineParser |
                self.singleLineParser |
                self.blankLineTerminationParser |
                self.multilineCommand + pyparsing.SkipTo(
                    self.stringEnd,
                    ignore=self.doNotParse)
              )
            )
        self.mainParser.ignore(self.commentGrammars)
        #self.mainParser.setDebug(True)
        # And we've got mainParser.
    #
    # SPECIAL METHODS
    #
    def __call__(self, *args, **kwargs):
        '''Call an instance for convenient parsing. Example:

            p = Parser()
            result = p('some stuff for p to parse')

        This just calls `self.parseString()`, so it's safe to
        override should you choose.
        '''
        return self.parseString(*args, **kwargs)

    def __getattr__(self, attr):
        # REMEMBER: This is only called when normal attribute lookup fails
        raise AttributeError('Could not find {0!r} in class Parser'.format(attr))

    @property
    def multilineCommands(self):
        return self._multilineCommands

    @multilineCommands.setter
    def multilineCommands(self, value):
        value = list(value) if not isinstance(value, list) else value
        self._multilineCommands = value

    @multilineCommands.deleter
    def multilineCommands(self):
        del self._multilineCommands
        self._multilineCommands = []
    #
    # PSEUDO_PRIVATE METHODS
    #
    def _init_settings(self, *args, **kwargs):
        self._multilineCommands = []
        self.abbrev = True  # recognize abbreviated commands
        self.blankLinesAllowed = False
        self.case_insensitive = True
        self.identchars = cmd.IDENTCHARS
        self.legalChars = u'!#$%.:?@_' + pyparsing.alphanums + pyparsing.alphas8bit
        self.noSpecialParse = {'ed','edit','exit','set'}
        self.redirector = '>'  # for sending output to file
        self.reserved_words = []
        self.shortcuts = {'?' : 'help' ,
                          '!' : 'shell',
                          '@' : 'load' ,
                          '@@': '_relative_load'}
        self.terminators = [';']
        self.keywords = [] + self.reserved_words

    def _init_grammars(self, *args, **kwargs):
        # Basic grammars
        self.commentGrammars = (pyparsing.pythonStyleComment|pyparsing.cStyleComment).ignore(pyparsing.quotedString).suppress()
        self.commentInProgress = '/*' + pyparsing.SkipTo( pyparsing.stringEnd ^ '*/' )
        self.doNotParse = self.commentGrammars | self.commentInProgress | pyparsing.quotedString
        self.fileName = pyparsing.Word(self.legalChars + '/\\')
        self.inputFrom = self.fileName('inputFrom')
        self.inputMark = pyparsing.Literal('<')
        self.pipe = pyparsing.Keyword('|', identChars='|')
        self.stringEnd = pyparsing.stringEnd ^ '\nEOF'
        # Complex grammars
        self.multilineCommand = pyparsing.Or([pyparsing.Keyword(c, caseless=self.case_insensitive) for c in self.multilineCommands ])('multilineCommand')
        self.multilineCommand.setName('multilineCommand')
        self.oneLineCommand = ( ~self.multilineCommand + pyparsing.Word(self.legalChars))('command')
        # Hack-y convenience access to grammars
        self._grammars = {
            # Basic grammars
            'commentGrammars',
            'commentInProgress',
            'doNotParse',
            'fileName',
            'inputFrom',
            'inputMark',
            'noSpecialParse',
            'pipe',
            'reserved_words',
            'stringEnd',
            # Complex grammars
            'multilineCommand',
            'oneLineCommand'
        }
        self.inputFrom.setParseAction(replace_with_file_contents)
        self.inputMark.setParseAction(lambda x: '')
        self.commentGrammars.addParseAction(lambda x: '')
        if not self.blankLinesAllowed:
            self.blankLineTerminator = (pyparsing.lineEnd * 2)('terminator')
        if self.case_insensitive:
            self.multilineCommand.setParseAction(lambda x: x[0].lower())
            self.oneLineCommand.setParseAction(lambda x: x[0].lower())

    def _init_all_parsers(self):
        self._init_prefixParser()
        self._init_terminatorParser()
        self._init_saveParser()
        self._init_inputParser()
        self._init_outputParser()
        # intermission! :D
        # (update grammar(s) containing parsers)
        self.afterElements = \
            pyparsing.Optional(self.pipe + pyparsing.SkipTo(self.outputParser ^ self.stringEnd, ignore=self.doNotParse)('pipeTo')) + \
            pyparsing.Optional(self.outputParser('output') + pyparsing.SkipTo(self.stringEnd, ignore=self.doNotParse).setParseAction(lambda x: x[0].strip())('outputTo'))
        self.afterElements.setName('afterElements')
        self._grammars.add('afterElements')
        # end intermission
        # FIXME:
        # For some reason it's necessary to set this again.
        # (Otherwise pyparsing results include `outputTo`, but not `output`.)
        self.outputParser('output')
        self._init_blankLineTerminationParser()
        self._init_multilineParser()
        self._init_singleLineParser()
        self._init_optionParser()

    def _init_prefixParser(self):
        self.prefixParser = pyparsing.Empty()
        self.prefixParser.setName('prefixParser')
        self._parsers.add('prefixParser')

    def _init_terminatorParser(self):
        self.terminatorParser = pyparsing.Or([ (hasattr(t, 'parseString') and t) or pyparsing.Literal(t) for t in self.terminators])('terminator')
        self.terminatorParser.setName('terminatorParser')
        self._parsers.add('terminatorParser')

    def _init_saveParser(self):
        self.saveparser = (pyparsing.Optional(pyparsing.Word(pyparsing.nums)|'*')('idx') +
                           pyparsing.Optional(pyparsing.Word(self.legalChars + '/\\'))('fname') +
                           pyparsing.stringEnd)
        self.saveparser.setName('saveParser')
        self._parsers.add('saveParser')

    def _init_outputParser(self):
        # outputParser = (pyparsing.Literal('>>') | (pyparsing.WordStart() + '>') | pyparsing.Regex('[^=]>'))('output')
        self.outputParser = self.redirector * 2 | (pyparsing.WordStart() + self.redirector) | pyparsing.Regex('[^=]' + self.redirector)('output')
        self.outputParser.setName('outputParser')
        self._parsers.add('outputParser')

    def _init_inputParser(self):
        # a not-entirely-satisfactory way of distinguishing < as in "import from" from <
        # as in "lesser than"
        self.inputParser = self.inputMark + \
            pyparsing.Optional(self.inputFrom) + \
            pyparsing.Optional('>') + \
            pyparsing.Optional(self.fileName) + \
            (pyparsing.stringEnd | '|')
        self.inputParser.ignore(self.commentInProgress)
        self.inputParser.setName('inputParser')
        self._parsers.add('inputParser')

    def _init_blankLineTerminationParser(self):
        # NoMatch must be instantiated; the bare class is not a parser element.
        self.blankLineTerminationParser = pyparsing.NoMatch()
        if not self.blankLinesAllowed:
            self.blankLineTerminationParser = ((self.multilineCommand ^ self.oneLineCommand) + pyparsing.SkipTo(self.blankLineTerminator, ignore=self.doNotParse).setParseAction(lambda x: x[0].strip())('args') + self.blankLineTerminator )
        # FIXME: Does this call *really* have to be reassigned into the variable???
        self.blankLineTerminationParser = self.blankLineTerminationParser.setResultsName('statement')
        self.blankLineTerminationParser.setName('blankLineTerminationParser')
        self._parsers.add('blankLineTerminationParser')

    def _init_multilineParser(self):
        #self.multilineParser = self.multilineParser.setResultsName('multilineParser')
        self.multilineParser = (
            (
                (self.multilineCommand('multilineCommand') ^ self.oneLineCommand)
                + pyparsing.SkipTo(self.terminatorParser, ignore=self.doNotParse).setParseAction(lambda x: x[0].strip())('args')
                + self.terminatorParser
            )('statement')
            + pyparsing.SkipTo(
                self.outputParser ^ self.pipe ^ self.stringEnd, ignore=self.doNotParse
            ).setParseAction(lambda x: x[0].strip())('suffix')
            + self.afterElements)
        self.multilineParser.ignore(self.commentInProgress)
        self.multilineParser.setName('multilineParser')
        self._parsers.add('multilineParser')

    def _init_singleLineParser(self):
        #self.singleLineParser = self.singleLineParser.setResultsName('singleLineParser')
        self.singleLineParser = ((self.oneLineCommand + pyparsing.SkipTo(self.terminatorParser ^ self.stringEnd ^ self.pipe ^ self.outputParser, ignore=self.doNotParse).setParseAction(lambda x:x[0].strip())('args'))('statement') +
                                 pyparsing.Optional(self.terminatorParser) + self.afterElements)
        self.singleLineParser.setName('singleLineParser')
        self._parsers.add('singleLineParser')

    def _init_optionParser(self):
        # Different from the other parsers.
        # This one is based on optparse.OptionParser,
        # not pyparsing.
        #
        # It's included here to keep all parsing-related
        # code under one roof.
        # TODO: Why isn't this using cmd2's OptionParser?
        self.optionParser = optparse.OptionParser()
        self._parsers.add('optionParser')

    def parseString(self, *args, **kwargs):
        '''Parses a string using `self.mainParser`.'''
        return self.mainParser.parseString(*args, **kwargs)
There you have it. The brutal truth. ☺
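The posted class already contains an `_init_all_parsers()` method that is never called, while the `multilineCommands` setter only stores the list. One way to keep the grammar in sync, different from the `Forward` approach in the answer, is to rebuild from the setter. A minimal sketch, assuming a one-rule grammar; `RebuildingParser` and `_build_grammar` are hypothetical names standing in for the real `_init_grammars()`/`_init_all_parsers()` calls:

```python
import pyparsing as pp


class RebuildingParser(object):
    """Hypothetical sketch: rebuild the grammar every time the list is set."""

    def __init__(self):
        self.multilineCommands = []          # the setter builds self.grammar

    def _build_grammar(self):
        # Stand-in for _init_grammars() + _init_all_parsers(): recreate every
        # expression that was derived from the command list.
        if self._multilineCommands:
            command = pp.MatchFirst([pp.Keyword(c) for c in self._multilineCommands])
        else:
            command = pp.NoMatch()           # no commands configured: never match
        self.grammar = command('multilineCommand') + pp.restOfLine('args')

    @property
    def multilineCommands(self):
        return self._multilineCommands

    @multilineCommands.setter
    def multilineCommands(self, value):
        self._multilineCommands = list(value)
        self._build_grammar()                # keep self.grammar in sync


p = RebuildingParser()
p.multilineCommands = ['multiline']
print(p.grammar.parseString('multiline has > inside')['multilineCommand'])  # multiline
```

A caveat of this design: any code that kept a reference to an earlier `self.grammar` (or to a sub-expression of it) keeps using the stale version, and mutating the list in place (`p.multilineCommands.append(...)`) bypasses the setter entirely.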
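Edit 2012-11-12: I mistakenly used the term "class attribute" in the original title of this question. That was a silly mistake, and I apologize for any confusion. It has now been corrected to "instance attribute".
Solution:
Define self.multilineCommands as a Forward, like this:
self.multilineCommands = Forward()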
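Then define the rest of your grammar using self.multilineCommands as usual. In your tests, "inject" different expressions for self.multilineCommands using the << operator:
self.multilineCommands << (test expression 1)
Then, when you parse with the overall grammar, your pyparsing test expression will be used wherever self.multilineCommands appears.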
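A sketch of this approach applied to the question's `multilineCommands` property, assuming a much smaller grammar than the real one (`ForwardParser`, `_multilineCommand` and the `Keyword`/`restOfLine` grammar are illustrative, not taken from the original code). The grammar is built once around a `Forward`, and the property setter re-points that `Forward`, so later assignments are seen without rebuilding `self.grammar`. It uses `<<=`, which current pyparsing releases accept; with older releases, use `<<` with a parenthesized right-hand side.

```python
import pyparsing as pp


class ForwardParser(object):
    """Hypothetical sketch: the grammar refers to a Forward placeholder."""

    def __init__(self):
        # Built once; the grammar refers to this Forward rather than to the list.
        self._multilineCommand = pp.Forward()
        self.grammar = self._multilineCommand + pp.restOfLine('args')
        self.multilineCommands = []          # goes through the setter below

    @property
    def multilineCommands(self):
        return self._multilineCommands

    @multilineCommands.setter
    def multilineCommands(self, value):
        self._multilineCommands = list(value)
        if self._multilineCommands:
            expr = pp.MatchFirst([pp.Keyword(c) for c in self._multilineCommands])
        else:
            expr = pp.NoMatch()              # no commands configured: never match
        # Re-point the placeholder; self.grammar now uses the new expression.
        self._multilineCommand <<= expr('multilineCommand')


p = ForwardParser()
p.multilineCommands = ['multiline']          # updated *after* instantiation
print(p.grammar.parseString('multiline has > inside')['multilineCommand'])  # multiline
```

The results name is attached to each expression placed into the `Forward`, so it travels with whatever the setter installs. Assigning a new list (for example `['select']`) makes `select` parse and `multiline` fail; mutating the list in place bypasses the setter.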
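(Note:
be sure to enclose the right-hand side in ()'s, to guard against operator-precedence problems caused by my unfortunate choice of << for this operator. In the next pyparsing release I will add support for <<= and deprecate << for this operation, which will resolve most of this problem.)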
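The precedence problem the note warns about, shown with two one-character literals (the names `unparenthesized`, `parenthesized`, `augmented` and `accepts` are illustrative). In Python, `<<` binds more tightly than `|`, so an unparenthesized `fwd << a | b` gives the `Forward` only `a`. The augmented form `<<=` is a statement, so its entire right-hand side is evaluated first.

```python
import pyparsing as pp

a = pp.Literal('a')
b = pp.Literal('b')

# Parsed as (unparenthesized << a) | b: the Forward receives only `a`,
# and the MatchFirst built by '| b' is thrown away.
unparenthesized = pp.Forward()
unparenthesized << a | b

# With parentheses the Forward receives the whole alternation.
parenthesized = pp.Forward()
parenthesized << (a | b)

# '<<=' evaluates the full right-hand side first; no parentheses needed.
augmented = pp.Forward()
augmented <<= a | b


def accepts(expr, text):
    """Return True if `expr` matches all of `text`."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


print(accepts(unparenthesized, 'b'))  # False
print(accepts(parenthesized, 'b'))    # True
print(accepts(augmented, 'b'))        # True
```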
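Edit
Here is a flexible parser with a write-only property that accepts a list of strings as the allowed keywords. The parser itself is a simple function-call parser that parses functions taking a single numeric argument, or the constants pi, π, or e.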
# coding=UTF-8
from pyparsing import *

class FlexParser(object):
    def __init__(self, fixedPart):
        self._dynamicExpr = Forward()
        self.parser = self._dynamicExpr + fixedPart

    def _set_keywords(self, kw_list):
        # accept a list of words, convert it to a MatchFirst of
        # Keywords defined using those words
        self._dynamicExpr << (MatchFirst([Keyword(kw) for kw in kw_list]))
    keywords = property(fset=_set_keywords)

    def parseString(self, s):
        return self.parser.parseString(s)

E = CaselessKeyword("e").setParseAction(replaceWith(2.71828))
PI = (CaselessKeyword("pi") | "π").setParseAction(replaceWith(3.14159))
numericLiteral = PI | E | Regex(r'[+-]?\d+(\.\d*)?').setParseAction(lambda t: float(t[0]))

fp = FlexParser('(' + numericLiteral + ')')
fp.keywords = "sin cos tan asin acos atan sqrt".split()

print(fp.parseString("sin(30)"))
print(fp.parseString("cos(π)"))
print(fp.parseString("sqrt(-1)"))
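Now you can change the keywords just by assigning a list of words to the keywords property. The setter method converts the list into a MatchFirst of Keywords. Note that now, parsing "sin(30)" raises an exception: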
fp.keywords = "foo bar baz boo".split()
print(fp.parseString("foo(1000)"))
print(fp.parseString("baz(e)"))
print(fp.parseString("bar(1729)"))
print(fp.parseString("sin(30)"))  # raises a ParseException
Source: https://codeday.me/bug/20190709/1412778.html