Skip to content

Commit

Permalink
Updated docs
Browse files Browse the repository at this point in the history
  • Loading branch information
Ed94 committed Aug 8, 2023
1 parent d2fc1d0 commit c7647ab
Show file tree
Hide file tree
Showing 8 changed files with 109 additions and 111 deletions.
12 changes: 9 additions & 3 deletions Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,11 @@ The library API is a composition of code element constructors.
These build up a code AST to then serialize with a file builder.

This code base attempts follow the [handmade philosophy](https://handmade.network/manifesto),
its not meant to be a black box metaprogramming utility, its meant for the user to extend for their project domain.
its not meant to be a black box metaprogramming utility, it should be easy to intergrate into a user's their project domain.

## Notes

The project has reached an *alpha* state, all the current functionality works for the test cases but it will most likely break in many other cases.
The [issues](https://github.com/Ed94/gencpp/issues) marked with v1.0 Feature indicate whats left before the library is considered feature complete.

A `natvis` and `natstepfilter` are provided in the scripts directory.

Expand All @@ -25,7 +24,7 @@ A metaprogram is built to generate files before the main program is built. We'll

`gen.cpp` \`s `main()` is defined as `gen_main()` which the user will have to define once for their program. There they will dictate everything that should be generated.

In order to keep the locality of this code within the same files the following pattern may be used:
In order to keep the locality of this code within the same files the following pattern may be used (although this pattern isn't required at all):

Within `program.cpp` :

Expand All @@ -41,6 +40,8 @@ u32 gen_main()
}
#endif

// "Stage" agnostic code.

#ifndef GEN_TIME
#include "program.gen.cpp"

Expand All @@ -56,6 +57,7 @@ Example using each construction interface:

### Upfront

Validation and construction through a functional interface.

```cpp
Code t_uw = def_type( name(uw) );
Expand All @@ -75,6 +77,8 @@ Code header;

### Parse

Validation through ast construction.

```cpp
Code header = parse_struct( code(
struct ArrayHeader
Expand All @@ -89,6 +93,8 @@ Code header = parse_struct( code(

### Untyped

No validation, just glorified text injection.

```cpp
Code header = code_str(
struct ArrayHeader
Expand Down
68 changes: 44 additions & 24 deletions docs/Parsing.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,36 @@
# Parsing

The library features a naive parser tailored for only what the library needs to construct the supported syntax of C++ into its AST.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept under 5000 loc.
This parser does not, and should not do the compiler's job. By only supporting this minimal set of features, the parser is kept (so far) under 5000 loc.

The parsing implementation supports the following for the user:

```cpp
CodeClass parse_class ( StrC class_def );
CodeEnum parse_enum ( StrC enum_def );
CodeBody parse_export_body ( StrC export_def );
CodeExtern parse_extern_link ( StrC exten_link_def);
CodeFriend parse_friend ( StrC friend_def );
CodeFn parse_function ( StrC fn_def );
CodeBody parse_global_body ( StrC body_def );
CodeNS parse_namespace ( StrC namespace_def );
CodeOperator parse_operator ( StrC operator_def );
CodeOpCast parse_operator_cast( StrC operator_def );
CodeStruct parse_struct ( StrC struct_def );
CodeTemplate parse_template ( StrC template_def );
CodeType parse_type ( StrC type_def );
CodeTypedef parse_typedef ( StrC typedef_def );
CodeUnion parse_union ( StrC union_def );
CodeUsing parse_using ( StrC using_def );
CodeVar parse_variable ( StrC var_def );
CodeClass parse_class ( StrC class_def );
CodeConstructor parse_constructor ( StrC constructor_def );
CodeDestructor parse_destructor ( StrC destructor_def );
CodeEnum parse_enum ( StrC enum_def );
CodeBody parse_export_body ( StrC export_def );
CodeExtern parse_extern_link ( StrC exten_link_def );
CodeFriend parse_friend ( StrC friend_def );
CodeFn parse_function ( StrC fn_def );
CodeBody parse_global_body ( StrC body_def );
CodeNS parse_namespace ( StrC namespace_def );
CodeOperator parse_operator ( StrC operator_def );
CodeOpCast parse_operator_cast( StrC operator_def );
CodeStruct parse_struct ( StrC struct_def );
CodeTemplate parse_template ( StrC template_def );
CodeType parse_type ( StrC type_def );
CodeTypedef parse_typedef ( StrC typedef_def );
CodeUnion parse_union ( StrC union_def );
CodeUsing parse_using ( StrC using_def );
CodeVar parse_variable ( StrC var_def );
```
***Parsing will aggregate any tokens within a function body or expression statement to an untyped Code AST.***
Everything is done in one pass for both the preprocessor directives and the rest of the language.
The parser performs no macro expansion as the scope of gencpp feature-set is to only support the preprocessor for the goal of having rudimentary awareness of preprocessor ***conditionals***, ***defines***, and ***includes***, and ***`pragmas`**.
The parser performs no macro expansion as the scope of gencpp feature-set is to only support the preprocessor for the goal of having rudimentary awareness of preprocessor ***conditionals***, ***defines***, and ***includes***, and ***pragmas**.
The keywords supported for the preprocessor are:
Expand All @@ -41,12 +43,30 @@ The keywords supported for the preprocessor are:
* undef
* pragma
Each directive `#` line is considered one preproecessor unit, and will be treated as one Preprocessor AST. *These ASTs will be considered members or entries of braced scope they reside within*.
All keywords except *include* are suppported as members of a scope for a class/struct, global, or namespace body.
Each directive `#` line is considered one preproecessor unit, and will be treated as one Preprocessor AST. *These ASTs will be considered members or entries of braced scope they reside within*.
If a directive is used with an unsupported keyword its will be processed as an untyped AST.
Any preprocessor definition abuse that changes the syntax of the core language is unsupported and will fail to parse if not kept within an execution scope (function body, or expression assignment).
The preprocessor lines are stored as members of their associated scope they are parsed within. ( Global, Namespace, Class/Struct )
Exceptions to the above rule (If its too hard to keep track of just follow the above notion):
Any preprocessor definition abuse that changes the syntax of the core language is unsupported and will fail to parse if not kept within an execution scope (function body, or expression assignment).
Exceptions:
* Typedefs allow of a macro exansion to be defined after the keyword; Ex: `typedef GEN_FILE_OPEN_PROC( file_open_proc );`
* function signatures are allowed for a preprocessed macro: `neverinline MACRO() { ... }`
* typedefs allow for a preprocessed macro: `typedef MACRO();`
*(See functions `parse_operator_function_or_variable` and `parse_typedef` )*
The lexing and parsing takes shortcuts from whats expected in the standard.
* Numeric literals are not checked for validity.
* The parse API treats any execution scope definitions with no validation and are turned into untyped Code ASTs.
* *This includes the assignment of variables.*
* Attributes ( `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU) )
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function
* Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`)
* As a general rule; if its not available from the upfront constructors, its not available in the parsing constructors.
* *Upfront constructors are not necessarily used in the parsing constructors, this is just a good metric to know what can be parsed.*
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
Empty lines used throughout the file are preserved for formatting purposes for ast serialization.
97 changes: 38 additions & 59 deletions docs/Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,6 @@ Otherwise the library is free of any templates.

### *WHAT IS NOT PROVIDED*

* Execution statement validation : Execution expressions are defined using the untyped AST.
* Lambdas (This naturally means its unsupported)
* Non-trivial template validation support.
* RAII : This needs support for constructors/destructor parsing
* Haven't gotten around to yet (its in the github issues)

Keywords kept from "Modern C++":

* constexpr : Great to store compile-time constants.
Expand All @@ -55,13 +49,9 @@ Keywords kept from "Modern C++":
* import : ^^
* module : ^^

When it comes to expressions:

**There is no support for validating expressions.**
Its difficult to parse without enough benefits (At the metaprogramming level).

When it comes to templates:

**Only trivial template support is provided.**
The intention is for only simple, non-recursive substitution.
The parameters of the template are treated like regular parameter AST entries.
Expand All @@ -78,7 +68,7 @@ Use at your own mental peril.

### The Data & Interface

As mentioned in [Usage](#usage), the user is provided Code objects by calling the constructor's functions to generate them or find existing matches.
As mentioned in root readme, the user is provided Code objects by calling the constructor's functions to generate them or find existing matches.

The AST is managed by the library and provided the user via its interface.
However, the user may specifiy memory configuration.
Expand All @@ -89,39 +79,44 @@ Data layout of AST struct:
union {
struct
{
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable
AST* Specs; // Function, Operator, Type symbol, Variable
AST* Attributes; // Class, Enum, Function, Struct, Typedef, Union, Using, Variable
AST* Specs; // Function, Operator, Type symbol, Variable
union {
AST* ParentType; // Class, Struct
AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable
AST* InitializerList; // Constructor, Destructor
AST* ParentType; // Class, Struct
AST* ReturnType; // Function, Operator
AST* UnderlyingType; // Enum, Typedef
AST* ValueType; // Parameter, Variable
};
AST* Params; // Function, Operator, Template
union {
AST* ArrExpr; // Type Symbol
AST* Body; // Class, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable
AST* BitfieldSize; // Varaiable (Class/Struct Data Member)
AST* Params; // Function, Operator, Template
};
union {
AST* ArrExpr; // Type Symbol
AST* Body; // Class, Constructr, Destructor, Enum, Function, Namespace, Struct, Union
AST* Declaration; // Friend, Template
AST* Value; // Parameter, Variable
};
};
StringCached Content; // Attributes, Comment, Execution, Include
StringCached Content; // Attributes, Comment, Execution, Include
SpecifierT ArrSpecs[AST::ArrSpecs_Cap]; // Specifiers
};
union {
AST* Prev;
AST* Front; // Used by CodeBody
AST* Last; // Used by CodeParam
AST* Front;
AST* Last;
};
union {
AST* Next;
AST* Back; // Used by CodeBody
AST* Back;
};
AST* Parent;
StringCached Name;
CodeT Type;
ModuleFlag ModuleFlags;
union {
b32 IsFunction; // Used by typedef to not serialize the name field.
OperatorT Op;
AccessSpec ParentAccess;
s32 NumEntries;
Expand All @@ -145,7 +140,7 @@ uw ArrSpecs_Cap =
- sizeof(StringCached)
- sizeof(CodeT)
- sizeof(ModuleFlag)
- sizeof(s32)
- sizeof(u32)
)
/ sizeof(SpecifierT) -1; // -1 for 4 extra bytes (Odd num of AST*)
```
Expand All @@ -155,7 +150,7 @@ uw ArrSpecs_Cap =
Data Notes:

* The allocator definitions used are exposed to the user incase they want to dictate memory usage
* You'll find the memory handling in `init`, `gen_string_allocator`, `get_cached_string`, `make_code`.
* You'll find the memory handling in `init`, `deinit`, `reset`, `gen_string_allocator`, `get_cached_string`, `make_code`.
* ASTs are wrapped for the user in a Code struct which is a wrapper for a AST* type.
* Both AST and Code have member symbols but their data layout is enforced to be POD types.
* This library treats memory failures as fatal.
Expand All @@ -168,15 +163,15 @@ Data Notes:
* Linked lists used children nodes on bodies, and parameters.
* Its intended to generate the AST in one go and serialize after. The constructors and serializer are designed to be a "one pass, front to back" setup.
* Allocations can be tuned by defining the folloiwng macros:
* `GEN_BUILDER_STR_BUFFER_RESERVE`
* `GEN_CODEPOOL_NUM_BLOCKS` : Number of blocks per code pool in the code allocator
* `GEN_GLOBAL_BUCKET_SIZE` : Size of each bucket area for the global allocator
* `GEN_LEX_ALLOCATOR_SIZE`
* `GEN_CODEPOOL_NUM_BLOCKS` : Number of blocks per code pool in the code allocator
* `GEN_SIZE_PER_STRING_ARENA` : Size per arena used with string caching.
* `GEN_MAX_COMMENT_LINE_LENGTH` : Longest length a comment can have per line.
* `GEN_MAX_NAME_LENGTH` : Max length of any identifier.
* `GEN_MAX_UNTYPED_STR_LENGTH` : Max content length for any untyped code.
* `GEN_SIZE_PER_STRING_ARENA` : Size per arena used with string caching.
* `GEN_TOKEN_FMT_TOKEN_MAP_MEM_SIZE` : token_fmt_va uses local_persit memory of this size for the hashtable.
* `GEN_LEX_ALLOCATOR_SIZE`
* `GEN_BUILDER_STR_BUFFER_RESERVE`

The following CodeTypes are used which the user may optionally use strong typing with if they enable: `GEN_ENFORCE_STRONG_CODE_TYPES`

Expand All @@ -187,7 +182,6 @@ The following CodeTypes are used which the user may optionally use strong typing
* CodeConstructor
* CodeDefine
* CodeDestructor
* CodePreprocessCond
* CodeEnum
* CodeExec
* CodeExtern
Expand All @@ -199,6 +193,8 @@ The following CodeTypes are used which the user may optionally use strong typing
* CodeOperator
* CodeOpCast
* CodeParam : Has support for `for-range` iterating across parameters.
* CodePreprocessCond
* CodePragma
* CodeSpecifiers : Has support for `for-range` iterating across specifiers.
* CodeStruct
* CodeTemplate
Expand All @@ -221,7 +217,7 @@ Retrieving a raw version of the ast can be done using the `raw()` function defin
### Upfront Construction

All component ASTs must be previously constructed, and provided on creation of the code AST.
The construction will fail and return Code::Invalid otherwise.
The construction will fail and return CodeInvalid otherwise.

Interface :``

Expand All @@ -231,6 +227,7 @@ Interface :``
* def_comment
* def_class
* def_constructor
* def_define
* def_destructor
* def_enum
* def_execution
Expand All @@ -245,6 +242,7 @@ Interface :``
* def_operator_cast
* def_param
* def_params
* def_preprocess_cond
* def_specifier
* def_specifiers
* def_struct
Expand Down Expand Up @@ -321,19 +319,6 @@ Interface :
* parse_using
* parse_variable
The lexing and parsing takes shortcuts from whats expected in the standard.
* Numeric literals are not check for validity.
* The parse API treats any execution scope definitions with no validation and are turned into untyped Code ASTs.
* *This includes the assignment of variables.*
* Attributes ( `[[]]` (standard), `__declspec` (Microsoft), or `__attribute__` (GNU) )
* Assumed to *come before specifiers* (`const`, `constexpr`, `extern`, `static`, etc) for a function
* Or in the usual spot for class, structs, (*right after the declaration keyword*)
* typedefs have attributes with the type (`parse_type`)
* As a general rule; if its not available from the upfront constructors, its not available in the parsing constructors.
* *Upfront constructors are not necessarily used in the parsing constructors, this is just a good metric to know what can be parsed.*
* Parsing attributes can be extended to support user defined macros by defining `GEN_DEFINE_ATTRIBUTE_TOKENS` (see `gen.hpp` for the formatting)
Usage:
```cpp
Expand All @@ -342,13 +327,6 @@ Code <name> = parse_<function name>( string with code );
Code <name> = def_<function name>( ..., parse_<function name>(
<string with code>
));
Code <name> = make_<function name>( ... )
{
<name>->add( parse_<function name>(
<string with code>
));
}
```

### Untyped constructions
Expand Down Expand Up @@ -408,12 +386,15 @@ The following are provided predefined by the library as they are commonly used:
* `access_public`
* `access_protected`
* `access_private`
* `attrib_api_export`
* `attrib_api_import`
* `module_global_fragment`
* `module_private_fragment`
* `fmt_newline`
* `pragma_once`
* `param_varaidc` (Used for varadic definitions)
* `preprocess_else`
* `preprocess_endif`
* `pragma_once`
* `spec_const`
* `spec_consteval`
* `spec_constexpr`
Expand All @@ -425,19 +406,17 @@ The following are provided predefined by the library as they are commonly used:
* `spec_internal_linkage` (internal macro)
* `spec_local_persist` (local_persist macro)
* `spec_mutable`
* `spec_neverinline`
* `spec_override`
* `spec_ptr`
* `spec_pure`
* `spec_ref`
* `spec_register`
* `spec_rvalue`
* `spec_static_member` (static)
* `spec_thread_local`
* `spec_virtual`
* `spec_volatile`
* `spec_type_signed`
* `spec_type_unsigned`
* `spec_type_short`
* `spec_type_long`
* `t_empty` (Used for varaidc macros)
* `t_auto`
* `t_void`
Expand Down
Empty file removed docs/Upfront.md
Empty file.
Loading

0 comments on commit c7647ab

Please sign in to comment.