Merge pull request #114 from Nu-SCPTheme/FTML-49

Substitute include blocks before preprocessing
Nu-SCPTheme · Jan 19, 2021 · b1e881c
2 parents 9b7923c + d6cf067

Showing 128 changed files with 4,753 additions and 430 deletions.
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -34,7 +34,9 @@ strum = "0.20"
 strum_macros = "0.20"
 tinyvec = "1"
 unicase = "2"
+void = "1"
 wikidot-normalize = "0.6"
 
 [dev-dependencies]
+maplit = "1"
 sloggers = "1"
diff --git a/README.md b/README.md
@@ -57,15 +57,27 @@ While the expanded form of the initialism is never explicitly stated, it is clea
 name similarity to HTML.
 
 ### Usage
-There are three exported functions, which correspond to each of the main steps in the wikitext process.
+There are several main exported functions, each corresponding to one of the main steps in the wikitext process.
 
-First is `preprocess`, which will perform Wikidot's various minor text substitutions.
+First is `include`, which replaces all `[[include]]` blocks with the content of the pages they reference. It returns the substituted wikitext as a new string, as well as a list of all the pages that were used. It requires an object that implements `Includer`, which handles retrieving pages and generating messages for missing pages.
 
-Second is `tokenize`, which takes the input string and returns a wrapper type. This can be `.into()`-ed into a `Vec<ExtractedToken<'t>>` should you want the token extractions it produced. This is used as the input for `parse`.
+Second is `preprocess`, which will perform Wikidot's various minor text substitutions.
+
+Third is `tokenize`, which takes the input string and returns a wrapper type. This can be `.into()`-ed into a `Vec<ExtractedToken<'t>>` should you want the token extractions it produced. This is used as the input for `parse`.
 
 Then, borrowing a slice of said tokens, `parse` consumes them and produces a `SyntaxTree` representing the full structure of the parsed wikitext.
 
+Finally, you `render` the syntax tree with whichever `Render` implementation you need. Most likely, you want `HtmlRender`.
+
 ```rust
+fn include<'t, I, E>(
+    log: &slog::Logger,
+    input: &'t str,
+    includer: I,
+) -> Result<(String, Vec<PageRef<'t>>), E>
+where
+    I: Includer<'t, Error = E>;
+
 fn preprocess(
     log: &slog::Logger,
     text: &mut String,
@@ -96,8 +108,23 @@ store the results in a `struct`.
 // journalled messages are outputted to.
 let log = slog::Logger::root(/* drain */);
 
+// Get an `Includer`.
+//
+// See trait documentation for what this requires, but
+// essentially it is some abstract handle that gets the
+// contents of a page to be included.
+//
+// Two sample includers you could try are `NullIncluder`
+// and `DebugIncluder`.
+let includer = MyIncluderImpl::new();
+
+// Get our source text
+let input = "**some** test <<string?>>";
+
+// Substitute page inclusions (this returns a `Result`)
+let (mut text, included_pages) = ftml::include(&log, input, includer).expect("Failed to substitute included pages");
+
 // Perform preprocess substitutions
-let mut text = str!("**some** test <<string?>>");
 ftml::preprocess(&log, &mut text);
 
 // Generate token from input text
@@ -121,13 +148,13 @@ let (tree, warnings) = result.into();
 See [`Serialization.md`](Serialization.md).
 
 ### Server
-If you wish to build the `ftml-server` subcrate, use the following:
+If you wish to build the `ftml-http` subcrate, use the following:
 Note that it was primarily designed for UNIX-like platforms, but with
 some minor changes could be modified to work on Windows.
 
 ```sh
-$ cargo build -p ftml-server --release
-$ cargo run -p ftml-server
+$ cargo build -p ftml-http --release
+$ cargo run -p ftml-http
 ```
 
 This will produce an HTTP server which a REST client can query to perform ftml operations.
@@ -142,12 +169,12 @@ Its usage message (produced by adding `-- --help` to the above `cargo run` invoc
 is reproduced below:
 
 ```
-ftml ftml-server v0.3.1 [8a42fccd]
+ftml ftml-http v0.3.1 [8a42fccd]
 Wikijump Team
 REST server to parse and render Wikidot text.
 
 USAGE:
-    ftml-server [FLAGS] [OPTIONS]
+    ftml-http [FLAGS] [OPTIONS]
 
 FLAGS:
     -h, --help         Prints help information.
@@ -169,6 +196,11 @@ $ curl \
     -X POST \
     -H 'Content-Type: application/json' \
     --compressed \
-    --data '{"text": "<your input here>"}' \
-    http://localhost:3865/parse
+    --data '
+{
+    "text": "<your input here>",
+    "callback-url": "http://localhost:8000/included-pages",
+    "missing-include-template": "No page {{ page }} {% if site %}on site {{ site }} {% endif %}exists!"
+}' \
+    http://localhost:3865/parse
 ```
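To make the `include` step described in the README concrete, here is a toy sketch of what include substitution does, using only the standard library. It is not ftml's implementation: `toy_include` and the `HashMap` page store are hypothetical stand-ins for an `Includer`, only the bare `[[include page-name]]` form is handled (no variables, no nesting), and missing pages get a fixed message rather than a rendered `missing-include-template`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for ftml's `include` step.
///
/// Replaces each `[[include page-name]]` block with that page's content
/// (or a fixed "missing page" message), and returns the new text along
/// with the name of every page that was referenced, found or not.
fn toy_include(input: &str, pages: &HashMap<&str, &str>) -> (String, Vec<String>) {
    const OPEN: &str = "[[include ";
    const CLOSE: &str = "]]";

    let mut output = String::with_capacity(input.len());
    let mut included = Vec::new();
    let mut rest = input;

    while let Some(start) = rest.find(OPEN) {
        // Copy the text before the include block through unchanged.
        output.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];

        let end = match after_open.find(CLOSE) {
            Some(end) => end,
            None => {
                // Unterminated block: keep the remaining text verbatim.
                output.push_str(&rest[start..]);
                return (output, included);
            }
        };

        let name = after_open[..end].trim();
        included.push(name.to_string());

        // Substitute the page content, or a missing-page message.
        match pages.get(name) {
            Some(content) => output.push_str(content),
            None => output.push_str(&format!("No page {} exists!", name)),
        }

        rest = &after_open[end + CLOSE.len()..];
    }

    output.push_str(rest);
    (output, included)
}
```

With a page store containing only `component:greeting` mapped to `**Hello**`, the input `[[include component:greeting]] then [[include missing-page]]` becomes `**Hello** then No page missing-page exists!`, and both page names are reported. Substitution happens on raw text before any tokenizing, which matches this PR's ordering of include before preprocess.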
diff --git a/ServerRoutes.md b/ServerRoutes.md
@@ -1,26 +1,173 @@
 [<< Return to the README](README.md)
 
-## ftml-server Routes
-
-Note that input text are really simple JSON objects in the following form:
-```json
-{
-    "text": "<your input string>"
-}
-```
+## ftml-http Routes
 
 The currently available API routes in the server are:
 
 | Method | Route | Input | Output | Description |
 |--------|-------|-------|--------|-------------|
 | Any | `/ping` | None | `String` | See if you're able to connect to the server. |
 | Any | `/version` | None | `String` | Outputs what version of ftml is being run. |
-| `POST` | `/preprocess` | Text | `String` | Runs the preprocessor on the given input string. |
-| `POST` | `/tokenize` | Text | `Vec<ExtractedToken>` | Runs the tokenizer on the input string and returns the extracted tokens. |
-| `POST` | `/tokenize/only` | Text | `Vec<ExtractedToken>` | Same as above, but the preprocessor is not run first. |
-| `POST` | `/parse` | Text | `ParseOutcome<SyntaxTree>` | Runs the parser on the input string and returns the abstract syntax tree. |
-| `POST` | `/parse/only` | Text | `ParseOutcome<SyntaxTree>` | Same as above, but the preprocessor is not run first. |
-| `POST` | `/render/html` | Text | `ParseOutcome<HtmlOutput>` | Performs the full rendering process, from preprocessing, tokenization, parsing, and then rendering. |
-| `POST` | `/render/html/only` | Text | `ParseOutcome<HtmlOutput>` | Same as above, but the preprocessor is not run first. |
-| `POST` | `/render/debug` | Text | `ParseOutcome<String>` | Performs rendering, as above, but uses `ftml::DebugRender`. |
-| `POST` | `/render/debug/only` | Text | `ParseOutcome<String>` | Same as above, but the preprocessor is not run first. |
+| `POST` | `/include` | `TextInput` | `Response<IncludeOutput>` | Substitutes all include blocks in the input string. |
+| `POST` | `/preprocess` | `TextInput` | `Response<PreprocessOutput>` | Runs the preprocessor on the given input string. |
+| `POST` | `/tokenize` | `TextInput` | `Response<TokenizeOutput>` | Runs the tokenizer on the input string and returns the extracted tokens. |
+| `POST` | `/parse` | `TextInput` | `Response<ParseOutput>` | Runs the parser on the input string and returns the abstract syntax tree. |
+| `POST` | `/render/html` | `TextInput` | `Response<HtmlRenderOutput>` | Performs the full rendering process, from inclusion, preprocessing, tokenization, parsing, and then rendering. |
+| `POST` | `/render/debug` | `TextInput` | `Response<DebugRenderOutput>` | Performs rendering, as above, but uses `ftml::render::DebugRender`. |
+
+The structures used are as follows:
+
+**`TextInput`** is the object describing a text input, along with the settings needed to perform include substitution.
+
+* `text` is the input wikitext to be processed.
+* `callback-url` is the URL that ftml-http will POST to with an `IncludeRequest`, to get the pages to be included.
+* `missing-include-template` is the template used to generate the "missing include" string if the `callback-url` does not return a result for a page. This allows jinja2-like syntax, backed by the crate [`tera`](https://crates.io/crates/tera). Three context variables are provided: `site` (nullable), `page`, `path`.
+
+```json
+{
+    "text": "**My** //wikitext//!",
+    "callback-url": "http://localhost:8000/includes",
+    "missing-include-template": "Page '{{ page }}' is missing!"
+}
+```
+
+**`IncludeRequest`** is the object ftml-http sends to the foreign server at `callback-url` to request the contents of each page to be included. It has a single field, `includes`, containing a list of `IncludeRef`s.
+
+**`IncludeRef`** is the object describing one particular page to be included. It has two fields: `page-ref`, which specifies the page being included, and `variables`, a map of all the variables to substitute.
+
+Page references are composed of an optional site, then the page name. For instance, `component:blah` is on-site (site is `null`), while `:scp-wiki:main` is off-site (site is `scp-wiki`, page is `main`).
+
+```json
+{
+    "page-ref": {
+        "site": null,
+        "page": "page-name"
+    },
+    "variables": {
+        "each": "variable",
+        "here!": ""
+    }
+}
+```
+
+**`IncludeResponse`** is the object the foreign server is expected to return, containing the contents of the fetched pages. It is a list of `FetchedPage` objects.
+
+**`FetchedPage`** is the object describing one retrieved page. The first field, `page-ref`, identifies which page the content is for. The second, `content`, holds the text to substitute, or `null` if the page was not found.
+
+The returned pages must match the requested pages exactly, in both count and order: the page at each index of the response must have the same `PageRef` as the page at that index of the request.
+
+```json
+{
+    "page-ref": {
+        "site": null,
+        "page": "theme:black-highlighter-theme"
+    },
+    "content": "[[module CSS]]\n...\n[[/module]]"
+}
+```
+
+**`Response`** is a wrapper to describe the state of an API call. It takes one of two forms:
+
+Success:
+```json
+{
+    "result": [ "data", "here" ]
+}
+```
+
+Error:
+```json
+{
+    "error": "Error message here"
+}
+```
+
+This is a generic type, so what is inside depends on what is being wrapped. Errors will always be strings.
+
+**`IncludeOutput`** is the object describing the result of a successful `/include` call.
+
+The `text` field holds the wikitext after substitution. The `pages-included` field is a list of `PageRef` instances describing the pages that were included in the text.
+
+```json
+{
+    "text": "Wikidot text following replacement",
+    "pages-included": [
+        {
+            "site": null,
+            "page": "some-page"
+        }
+    ]
+}
+```
+
+**`PreprocessOutput`** is the object describing the result of a successful `/preprocess` call.
+
+It has the same fields as `IncludeOutput`, except that `text` also reflects the preprocessing step applied after inclusion.
+
+```json
+{
+    "text": "My //wikitext// here!",
+    "pages-included": []
+}
+```
+
+**`ParseOutput`** is the object describing the result of a successful `/parse` call.
+
+It extends `PreprocessOutput`, with two added fields.
+
+* `syntax-tree` is the JSON representation of the abstract syntax tree (AST) created by the parser, a recursively nested series of elements which describe its structure.
+* `warnings` is a list of warning objects, describing parsing issues.
+
+```json
+{
+    "text": "My //wikitext// here!",
+    "pages-included": [],
+    "syntax-tree": {
+        "elements": [],
+        "styles": []
+    },
+    "warnings": []
+}
+```
+
+**`HtmlRenderOutput`** is the object describing the result of a successful `/render/html` call.
+
+It extends `ParseOutput`, with three new fields.
+
+* `html` is the generated HTML body, corresponding to the wikitext.
+* `style` is the full collected stylesheet, as specified through CSS in the wikitext.
+* `meta` is the list of HTML meta tags to add to the HTML document's `<head>`.
+
+```json
+{
+    "text": "My //wikitext// here!",
+    "pages-included": [],
+    "syntax-tree": {
+        "elements": [],
+        "styles": []
+    },
+    "warnings": [],
+    "html": "<strong>test</strong>",
+    "style": "a { display: none }",
+    "meta": []
+}
+```
+
+**`DebugRenderOutput`** is the object describing the result of a successful `/render/debug` call.
+
+It extends `ParseOutput`, with one new field.
+
+* `output` is the string output of the `DebugRender` implementation.
+
+```json
+{
+    "text": "My //wikitext// here!",
+    "pages-included": [],
+    "syntax-tree": {
+        "elements": [],
+        "styles": []
+    },
+    "warnings": [],
+    "output": "< Debug! >"
+}
+```
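The `page-ref` syntax and the ordering rule for callback responses described in `ServerRoutes.md` can be modelled with plain Rust types. This is a hypothetical sketch: `PageRef`, `FetchedPage`, `PageRef::parse` and `validate_response` are illustrative names mirroring the JSON fields, not ftml-http's actual code, and rejecting empty site or page names is an assumption beyond the two examples given in the text.

```rust
/// Hypothetical mirror of the `page-ref` JSON object: an optional site
/// (`null` / `None` when on-site) plus the page name.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PageRef {
    site: Option<String>,
    page: String,
}

impl PageRef {
    /// Parses `component:blah` (on-site) or `:scp-wiki:main` (off-site).
    /// Returns `None` for empty input or a malformed off-site reference.
    fn parse(s: &str) -> Option<PageRef> {
        match s.strip_prefix(':') {
            // Off-site: split the remainder at the first colon into site and page.
            Some(rest) => {
                let (site, page) = rest.split_once(':')?;
                if site.is_empty() || page.is_empty() {
                    return None;
                }
                Some(PageRef { site: Some(site.to_string()), page: page.to_string() })
            }
            // On-site: the whole string, including any category prefix, is the page name.
            None if !s.is_empty() => Some(PageRef { site: None, page: s.to_string() }),
            None => None,
        }
    }
}

/// Hypothetical mirror of `FetchedPage`: `content` is `None` when the
/// page was not found (JSON `null`).
struct FetchedPage {
    page_ref: PageRef,
    content: Option<String>,
}

/// Enforces the rule that the callback response must match the request
/// exactly, in both count and order.
fn validate_response(requested: &[PageRef], fetched: &[FetchedPage]) -> Result<(), String> {
    if requested.len() != fetched.len() {
        return Err(format!("expected {} pages, got {}", requested.len(), fetched.len()));
    }

    for (i, (req, got)) in requested.iter().zip(fetched).enumerate() {
        if *req != got.page_ref {
            return Err(format!("page mismatch at index {}: {:?} != {:?}", i, req, got.page_ref));
        }
    }

    Ok(())
}
```

For example, `PageRef::parse("component:blah")` yields an on-site reference whose page is `component:blah`, while `PageRef::parse(":scp-wiki:main")` yields site `scp-wiki` and page `main`. A response with a page missing, or with two entries swapped, is rejected by `validate_response` before any `content` is substituted.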
diff --git a/ftml-http/Cargo.toml b/ftml-http/Cargo.toml
@@ -17,12 +17,15 @@ clap = "2"
 ftml = { path = ".." }
 hostname = "0.3"
 lazy_static = "1"
+reqwest = { version = "0.11", features = ["blocking", "json"] }
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"
 slog = "2.7"
 slog-bunyan = "2"
 sloggers = "1"
 str-macro = "0.1"
+tera = "1.6"
+thiserror = "1"
 tokio = { version = "0.2", features = ["macros"] }
 users = "0.11"
 warp = { version = "0.2", features = ["compression"] }

diff --git a/ftml-http/build.rs b/ftml-http/build.rs
@@ -3,9 +3,15 @@ extern crate built;
 use std::env;
 
 fn main() {
+    // Generate build information
     if let Ok(profile) = env::var("PROFILE") {
         println!("cargo:rustc-cfg=build={:?}", &profile);
     }
 
     built::write_built_file().expect("Failed to compile build information!");
+
+    // On Unix targets, add the OpenSSL 1.0 library directory to the linker search path
+    if env::var("CARGO_CFG_UNIX").is_ok() {
+        println!("cargo:rustc-flags=-L /usr/lib/openssl-1.0");
+    }
 }