[hawkmoth] Re: [PATCH 07/13] hawkmoth: split cursor walking into separate function

From: Jani Nikula <jani@xxxxxxxxxx>
To: Bruno Santos <brunomanuelsantos@xxxxxxxxxxxxxxxxxx>, hawkmoth@xxxxxxxxxxxxx
Date: Sat, 12 Jan 2019 11:14:58 +0200

On Tue, 08 Jan 2019, Bruno Santos <brunomanuelsantos@xxxxxxxxxxxxxxxxxx> wrote:

This patch's ultimate goal is to have a recursive parser to walk the cursor
graph. Right now, in spite of the name of the new function, this promise is
yet
to be accomplished. In fact, this patch does not change behaviour at all. It
does provide some ground work to achieve that goal though.

Nitpick, it does change the behaviour for unhandled token kinds. ;)

See also comments inline.

---
hawkmoth/hawkmoth.py | 258 +++++++++++++++++++++----------------------
1 file changed, 123 insertions(+), 135 deletions(-)

diff --git a/hawkmoth/hawkmoth.py b/hawkmoth/hawkmoth.py
index e9c84bf..6b45aab 100755
--- a/hawkmoth/hawkmoth.py
+++ b/hawkmoth/hawkmoth.py
@@ -91,9 +91,29 @@ def comment_extract(tu):

     return top_level_comments, comments

+def _result(comment, cursor=None, fmt=docstr.Type.TEXT, nest=0,
+            name=None, ttype=None, args=None, compat=None):
+
+    # FIXME: docstr.generate changes the number of lines in output. This
impacts
+    # the error reporting via meta['line']. Adjust meta to take this into
+    # account.
+
+    doc = docstr.generate(text=comment.spelling, fmt=fmt,
+                          name=name, ttype=ttype, args=args, compat=compat)
+
+    doc = docstr.nest(doc, nest)
+
+    meta = {'line': comment.extent.start.line}
+    if cursor:
+        meta['cursor.kind']        = cursor.kind,
+        meta['cursor.displayname'] = cursor.displayname,
+        meta['cursor.spelling']    = cursor.spelling
+
+    return [(doc, meta)]
+
# Return None for simple macros, a potentially empty list of arguments for
# function-like macros
-def get_macro_args(cursor):
+def _get_macro_args(cursor):
     if cursor.kind != CursorKind.MACRO_DEFINITION:
         return None

@@ -117,6 +137,103 @@ def get_macro_args(cursor):

     return None

+def _recursive_parse(comments, cursor, nest, **options):
+    compat = options.get('compat')
+    comment = comments[cursor.hash]
+    name = cursor.spelling
+    ttype = cursor.type.spelling
+
+    if cursor.kind == CursorKind.MACRO_DEFINITION:
+        # FIXME: check args against comment
+        args = _get_macro_args(cursor)
+        fmt = docstr.Type.MACRO if args is None else docstr.Type.MACRO_FUNC
+
+        return _result(comment, cursor=cursor, fmt=fmt,
+                       nest=nest, name=name, args=args, compat=compat)
+
+    if cursor.kind == CursorKind.VAR_DECL:

Perhaps just a personal preference, but I do prefer having the complete
if ladder if-elif-elif-...-else instead of repeated if statements.

+        fmt = docstr.Type.VAR
+
+        return _result(comment, cursor=cursor, fmt=fmt,
+                       nest=nest, name=name, ttype=ttype, compat=compat)
+
+    if cursor.kind == CursorKind.TYPEDEF_DECL:
+        # FIXME: function pointers typedefs.
+        fmt = docstr.Type.TYPE
+
+        return _result(comment, cursor=cursor, fmt=fmt,
+                       nest=nest, name=ttype, compat=compat)
+
+    if cursor.kind in [CursorKind.STRUCT_DECL, CursorKind.UNION_DECL]:
+        fmt = docstr.Type.TYPE
+        result = _result(comment, cursor=cursor, fmt=fmt,
+                         nest=nest, name=ttype, compat=compat)
+
+        nest += 1
+        for c in cursor.get_children():
+            if c.kind != CursorKind.FIELD_DECL:
+                # FIXME: Recurse for new structure or union.
+                pass
+
+            if c.hash not in comments:
+                continue
+
+            comment = comments[c.hash]
+            if not docstr.is_doc(comment.spelling):
+                continue
+            text = comment.spelling
+            fmt = docstr.Type.MEMBER
+            name = c.spelling
+            ttype = c.type.spelling
+
+            result.extend(_result(comment, cursor=cursor, fmt=fmt, nest=nest,
+                                  name=name, ttype=ttype, compat=compat))
+
+        return result
+
+    if cursor.kind == CursorKind.ENUM_DECL:
+        # FIXME: Handle anonymous enumerators.
+        comment = comments[cursor.hash]
+        fmt = docstr.Type.TYPE
+        name = cursor.type.spelling
+        compat = options.get('compat')
+        result = _result(comment, cursor=cursor, fmt=fmt, nest=nest,
+                         name=name, compat=compat)
+
+        nest += 1
+        for c in cursor.get_children():
+            if c.hash not in comments:
+                continue
+            comment = comments[c.hash]
+            if not docstr.is_doc(comment.spelling):
+                continue
+
+            text = comment.spelling
+            fmt = docstr.Type.ENUM_VAL
+            name = c.spelling
+
+            result.extend(_result(comment, cursor=cursor, fmt=fmt,
+                                  nest=nest, name=name, compat=compat))
+
+        return result
+
+    if cursor.kind == CursorKind.FUNCTION_DECL:
+        # FIXME: check args against comment
+        # FIXME: children may contain extra stuff if the return type is a
+        # typedef, for example
+        # FIXME: handle ... params
+        args = []
+        for c in cursor.get_children():
+            if c.kind == CursorKind.PARM_DECL:
+                args.append('{ttype} {arg}'.format(ttype=c.type.spelling,
+                                                   arg=c.spelling))
+
+        fmt = docstr.Type.FUNC
+        ttype = cursor.result_type.spelling
+
+        return _result(comment, cursor=cursor, fmt=fmt, nest=nest,
+                       name=name, ttype=ttype, args=args, compat=compat)

You add the return [] here in a later patch; I think that bit should be
added already here. Or perhaps just keep what we have now for unhandled
cursor kinds; that can be improved upon later.

Otherwise, LGTM.

BR,
Jani.

+
# return a list of (comment, metadata) tuples
# options - dictionary with directive options
def parse(filename, **options):
@@ -135,21 +252,14 @@ def parse(filename, **options):

     top_level_comments, comments = comment_extract(tu)

-    # FIXME: docstr.generate changes the number of lines in output. This
impacts
-    # the error reporting via meta['line']. Adjust meta to take this into
-    # account.
-
     result = []
     compat = options.get('compat')

     for comment in top_level_comments:
-        if not docstr.is_doc(comment.spelling):
-            continue
-
-        doc = docstr.generate(comment.spelling, compat=compat)
-        meta = {'line': comment.extent.start.line}
-        result.append((doc, meta))
+        if docstr.is_doc(comment.spelling):
+            result.extend(_result(comment, compat=compat))

+    # Bootstrap the individual parsers.
     for cursor in tu.cursor.get_children():
         if cursor.hash not in comments:
             continue
@@ -157,131 +267,9 @@ def parse(filename, **options):
         if not docstr.is_doc(comment.spelling):
             continue

-        text = comment.spelling
-        name = None
-        ttype = None
-        args = None
+        result.extend(_recursive_parse(comments, cursor, 0, **options))

-        if cursor.kind == CursorKind.MACRO_DEFINITION:
-            # FIXME: check args against doc_comment
-            args = get_macro_args(cursor)
-            name = cursor.spelling
-            fmt = docstr.Type.MACRO if args is None else
docstr.Type.MACRO_FUNC
-
-        elif cursor.kind == CursorKind.VAR_DECL:
-            fmt = docstr.Type.VAR
-            ttype = cursor.type.spelling
-            name = cursor.spelling
-
-        elif cursor.kind == CursorKind.TYPEDEF_DECL:
-            fmt = docstr.Type.TYPE
-            name = cursor.spelling
-
-        elif cursor.kind == CursorKind.STRUCT_DECL or \
-             cursor.kind == CursorKind.UNION_DECL or \
-             cursor.kind == CursorKind.ENUM_DECL:
-            fmt = docstr.Type.TYPE
-            name = cursor.type.spelling
-
-        elif cursor.kind == CursorKind.FUNCTION_DECL:
-            # FIXME: check args against doc_comment
-            # FIXME: children may contain extra stuff if the return type is a
-            # typedef, for example
-            # FIXME: handle ... params
-            args = []
-            for c in cursor.get_children():
-                if c.kind == CursorKind.PARM_DECL:
-                    args.append('{ttype} {arg}'.format(
-                        ttype=c.type.spelling,
-                        arg=c.spelling))
-
-            fmt = docstr.Type.FUNC
-            name = cursor.spelling
-            ttype = cursor.result_type.spelling
-
-        else:
-            fmt = docstr.Type.TEXT
-            text = 'warning: unhandled cursor {kind} {name}\n'.format(
-                kind=str(cursor.kind),
-                name=cursor.spelling)
-
-        doc = docstr.generate(text=text, fmt=fmt, name=name,
-                              ttype=ttype, args=args, compat=compat)
-
-        meta = {
-            'line':               comment.extent.start.line,
-            'cursor.kind':        cursor.kind,
-            'cursor.displayname': cursor.displayname,
-            'cursor.spelling':    cursor.spelling
-        }
-
-        result.append((doc, meta))
-
-        # FIXME: Needs some code deduplication with the above.
-        if cursor.kind == CursorKind.STRUCT_DECL or \
-           cursor.kind == CursorKind.UNION_DECL:
-
-            for c in cursor.get_children():
-                if c.kind != CursorKind.FIELD_DECL:
-                    # FIXME: handle nested unions
-                    pass
-                    # continue
-
-                if c.hash not in comments:
-                    continue
-                comment = comments[c.hash]
-                if not docstr.is_doc(comment.spelling):
-                    continue
-                text = comment.spelling
-
-                # FIXME: this is sooo ugly, handles unnamed vs. named structs
-                # in typedefs
-                # FIXME: do this recursively and smartly
-                # FIXME: back references to parent definition
-
-                fmt = docstr.Type.MEMBER
-                name = c.spelling
-                ttype = c.type.spelling
-
-                doc = docstr.generate(text=text, fmt=fmt, name=name,
-                                      ttype=ttype, args=args, compat=compat)
-                doc = docstr.nest(doc, 1)
-
-                meta = {
-                    'line':               comment.extent.start.line,
-                    'cursor.kind':        cursor.kind,
-                    'cursor.displayname': cursor.displayname,
-                    'cursor.spelling':    cursor.spelling
-                }
-
-                result.append((doc, meta))
-
-        elif cursor.kind == CursorKind.ENUM_DECL:
-            for c in cursor.get_children():
-                if c.hash not in comments:
-                    continue
-                comment = comments[c.hash]
-                if not docstr.is_doc(comment.spelling):
-                    continue
-
-                text = comment.spelling
-                fmt = docstr.Type.ENUM_VAL
-                name = c.spelling
-
-                doc = docstr.generate(text=text, fmt=fmt, name=name,
-                                      ttype=ttype, args=args, compat=compat)
-                doc = docstr.nest(doc, 1)
-
-                meta = {
-                    'line':               comment.extent.start.line,
-                    'cursor.kind':        cursor.kind,
-                    'cursor.displayname': cursor.displayname,
-                    'cursor.spelling':    cursor.spelling
-                }
-
-                result.append((doc, meta))
-
-    # sort to interleave top level comments back in place
+    # Sort all elements by order of appearance.
     result.sort(key=lambda r: r[1]['line'])

     return result
--
2.20.1

References:
- [hawkmoth] [PATCH 00/13] Parser overhaul
  - From: Bruno Santos
- [hawkmoth] [PATCH 07/13] hawkmoth: split cursor walking into separate function
  - From: Bruno Santos

[hawkmoth] Re: [PATCH 07/13] hawkmoth: split cursor walking into separate function

Other related posts: