[hawkmoth] Re: [PATCH v2 12/15] hawkmoth: fix documentation of anonymous types

From: Jani Nikula <jani@xxxxxxxxxx>
To: Bruno Santos <brunomanuelsantos@xxxxxxxxxxxxxxxxxx>, hawkmoth@xxxxxxxxxxxxx
Date: Wed, 23 Jan 2019 20:37:19 +0200

On Sat, 19 Jan 2019, Bruno Santos <brunomanuelsantos@xxxxxxxxxxxxxxxxxx> wrote:

Instead of arbitrarily long unique identifiers issued by Clang, all
anonymous types are now marked '<anonymous>' in the documentation. Note
that there is no quirk-less solution for some cases though. This patch
tries to make fair assumptions as to how someone would actually document
the code.

Cases handled are structures, unions and enumerations. A single
approach is used for all three for simplicity and consistency, in spite
of the drawbacks: namely, anonymous and uninstantiated enumerations will
not be handled correctly. However, it's not unreasonable to require that
an enumeration is named or instantiated.

I think I'll have to digest this a bit more, and consider the cases I
definitely want covered and how.

It should be noted that the proposed solution differs from Doxygen
(which remains the de facto standard for C documentation). Doxygen
handles one particular edge case in a simplistic way that leads to
documentation not matching the intended output. In particular, in the
example:

/** Document foo without declaring a type. */
struct {
...
} foo;

Doxygen takes the documentation to belong to the structure, which is
unnamed and therefore not issued, somewhat mangling the output in the
process. With this patch, Hawkmoth will do the expected thing and
document 'foo' as a variable, nesting all the documented elements that
would otherwise have nowhere to go. Since Doxygen outputs garbage in
this scenario, this can still be considered compatible, only with added
features.
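
A minimal sketch of the attribution rule described above, for
illustration only. The function name and its arguments are hypothetical
and are not part of Hawkmoth; it only models which entity a comment
placed on a struct/union/enum declaration ends up documenting.

```python
def comment_owner(type_name, var_name):
    """Decide what a doc comment on a type declaration documents.

    type_name: tag name of the type, or None if the type is anonymous.
    var_name:  name of a variable declared together with the type, or None.
    (Both parameters are hypothetical, for illustration only.)
    """
    if type_name is not None:
        # Named type: the comment documents the type itself,
        # whether or not a variable is declared with it.
        return ('type', type_name)
    if var_name is not None:
        # Anonymous type with a variable: the comment documents the variable.
        return ('variable', var_name)
    # Anonymous and uninstantiated: the comment has nowhere to go.
    return None


print(comment_owner('foo', None))   # struct foo { ... };
print(comment_owner(None, 'foo'))   # struct { ... } foo;
print(comment_owner(None, None))    # enum { A, B };
```

The last case is the known gap mentioned earlier: an anonymous,
uninstantiated declaration yields no documented entity for its own
comment, although its documented members are still visited.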

FWIW, in general, this is my rough list of priorities:

- Handle the bog-standard, most common use cases.

- Implementation simplicity. If there's an edge case that requires a
  fair amount of code or complexity, ignore it, particularly if
  there's a reasonable workaround.

- Principle of least surprise. What would the user mean to document?

- Doxygen/Javadoc/etc. compatibility.

For example, having both the tokenizer and AST passes is extra
complexity, but it's required to properly handle /** style comments and
to document macros.

As to this specific patch, there's something that feels a bit too quirky
now. I think one of the things you've overlooked is
cursor.is_anonymous(), which will make things quite a bit nicer. I'll
try to look at other things that we could get directly from clang,
particularly for avoiding the split/join on the cursor.type.spelling.
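
To illustrate the concern, here is a rough sketch that isolates the
split/join from the patch and compares it with a flag-based variant. The
input spellings are assumed examples of what Clang might report, and the
helper names and the 'keyword' and 'name' parameters are hypothetical. In
real code the boolean would come from cursor.is_anonymous() rather than
being passed in by hand.

```python
def anonymize_by_spelling(ttype):
    """String-based approach, as in the patch: keep the first word of the
    type spelling and replace the rest with '<anonymous>'."""
    if '(anonymous ' in ttype:
        return ' '.join([ttype.split()[0], '<anonymous>'])
    return ttype


def anonymize_by_flag(keyword, name, is_anonymous):
    """Flag-based approach: no parsing of the spelling at all.
    'is_anonymous' stands in for the result of cursor.is_anonymous()."""
    return '{} {}'.format(keyword, '<anonymous>' if is_anonymous else name)


# Assumed spelling for an anonymous struct in C.
print(anonymize_by_spelling('struct (anonymous at foo.c:2:1)'))
# -> struct <anonymous>

# With a leading qualifier, the first word is no longer the tag keyword.
print(anonymize_by_spelling('const struct (anonymous at foo.c:2:1)'))
# -> const <anonymous>

print(anonymize_by_flag('struct', '', True))
# -> struct <anonymous>
```

The second call shows the kind of quirk that relying on the spelling can
introduce; deciding from a flag sidesteps it.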

BR,
Jani.

---
hawkmoth/hawkmoth.py | 87 ++++++++++++++++++++++++++++++++------------
1 file changed, 63 insertions(+), 24 deletions(-)

diff --git a/hawkmoth/hawkmoth.py b/hawkmoth/hawkmoth.py
index 130ffca..52720d9 100755
--- a/hawkmoth/hawkmoth.py
+++ b/hawkmoth/hawkmoth.py
@@ -137,12 +137,15 @@ def _get_macro_args(cursor):
     return None

def _recursive_parse(comments, cursor, nest, compat):
-    comment = comments[cursor.hash]
     name = cursor.spelling
     ttype = cursor.type.spelling

     if cursor.kind == CursorKind.MACRO_DEFINITION:
+        if cursor.hash not in comments:
+            return []
+
         # FIXME: check args against comment
+        comment = comments[cursor.hash]
         args = _get_macro_args(cursor)
         fmt = docstr.Type.MACRO if args is None else docstr.Type.MACRO_FUNC

@@ -151,55 +154,91 @@ def _recursive_parse(comments, cursor, nest, compat):

     elif cursor.kind == CursorKind.VAR_DECL:
         fmt = docstr.Type.VAR
+        c = cursor

-        return _result(comment, cursor=cursor, fmt=fmt,
-                       nest=nest, name=name, ttype=ttype, compat=compat)
+        # If the type is anonymous, then the variable was declared together with
+        # the type. We then take the documentation as if it was meant for the
+        # variable.
+        if ttype.find('(anonymous ') >= 0:
+            c = next(cursor.get_children())
+            ttype = ' '.join([ttype.split()[0], '<anonymous>'])
+
+        if c.hash not in comments:
+            # Not inheriting documentation from an anonymous type definition nor
+            # documented on its own. Do nothing.
+            return []
+
+        comment = comments[c.hash]
+        text = comment.spelling
+
+        return _result(comment, cursor=c, fmt=fmt, nest=nest,
+                       name=name, ttype=ttype, compat=compat)

     elif cursor.kind == CursorKind.TYPEDEF_DECL:
+        if cursor.hash not in comments:
+            return []
+
         # FIXME: function pointers typedefs.
         fmt = docstr.Type.TYPE
+        comment = comments[cursor.hash]

         return _result(comment, cursor=cursor, fmt=fmt,
                        nest=nest, name=ttype, compat=compat)

     elif cursor.kind in [CursorKind.STRUCT_DECL, CursorKind.UNION_DECL,
-                         CursorKind.ENUM_DECL]:
+                       CursorKind.ENUM_DECL]:
+        if cursor.hash not in comments:
+            return []

-        # FIXME:
-        # Handle cases where variables are instantiated on type declaration,
-        # including anonymous cases. Idea is that if there is a variable
-        # instantiation, the documentation should be applied to the variable if
-        # the structure is anonymous or to the type otherwise.
-        #
-        # Due to the new recursiveness of the parser, fixing this here, _should_
-        # handle all cases (struct, union, enum).
-
-        # FIXME: Handle anonymous enumerators.
-
-        fmt = docstr.Type.TYPE
-        result = _result(comment, cursor=cursor, fmt=fmt,
-                         nest=nest, name=ttype, compat=compat)
+        # Document this cursor only if it's not anonymous.
+        if cursor.type.spelling.find('(anonymous at ') < 0:
+            fmt = docstr.Type.TYPE
+            comment = comments[cursor.hash]
+            result = _result(comment, cursor=cursor, fmt=fmt,
+                             nest=nest, name=ttype, compat=compat)
+        else:
+            result = []

         nest += 1
         for c in cursor.get_children():
-            if c.hash in comments:
-                result.extend(_recursive_parse(comments, c, nest, compat))
+            result.extend(_recursive_parse(comments, c, nest, compat))

         return result

     elif cursor.kind == CursorKind.ENUM_CONSTANT_DECL:
+        if cursor.hash not in comments:
+            return []
+
         fmt = docstr.Type.ENUM_VAL
+        comment = comments[cursor.hash]

         return _result(comment, cursor=cursor, fmt=fmt,
                        nest=nest, name=name, compat=compat)

     elif cursor.kind == CursorKind.FIELD_DECL:
+        c = cursor
         fmt = docstr.Type.MEMBER

-        return _result(comment, cursor=cursor, fmt=fmt,
-                       nest=nest, name=name, ttype=ttype, compat=compat)
+        # If the type is anonymous, then the variable was declared together with
+        # the type. We then take the documentation as if it was meant for the
+        # variable.
+        if ttype.find('(anonymous ') >= 0:
+            c = next(cursor.get_children())
+            ttype = ' '.join([ttype.split()[0], '<anonymous>'])
+
+        if c.hash not in comments:
+            # Not inheriting documentation from an anonymous type definition nor
+            # documented on its own. Do nothing.
+            return []
+
+        comment = comments[c.hash]
+        return _result(comment, cursor=c, fmt=fmt, nest=nest,
+                       name=name, ttype=ttype, compat=compat)

     elif cursor.kind == CursorKind.FUNCTION_DECL:
+        if cursor.hash not in comments:
+            return []
+
         # FIXME: check args against comment
         # FIXME: children may contain extra stuff if the return type is a
         # typedef, for example
@@ -211,6 +250,7 @@ def _recursive_parse(comments, cursor, nest, compat):
                                                    arg=c.spelling))

         fmt = docstr.Type.FUNC
+        comment = comments[cursor.hash]
         ttype = cursor.result_type.spelling

         return _result(comment, cursor=cursor, fmt=fmt, nest=nest,
@@ -262,8 +302,7 @@ def parse(filename, **options):

     # Bootstrap the individual parsers.
     for cursor in tu.cursor.get_children():
-        if cursor.hash in comments:
-            result.extend(_recursive_parse(comments, cursor, 0, compat))
+        result.extend(_recursive_parse(comments, cursor, 0, compat))

     # Sort all elements by order of appearance.
     result.sort(key=lambda r: r[1]['line'])
--
2.20.1

Follow-Ups:
- [hawkmoth] Re: [PATCH v2 12/15] hawkmoth: fix documentation of anonymous types
  - From: Bruno Santos

References:
- [hawkmoth] [PATCH v2 00/15] Parser overhaul
  - From: Bruno Santos
- [hawkmoth] [PATCH v2 12/15] hawkmoth: fix documentation of anonymous types
  - From: Bruno Santos
