README.md
+5-2
@@ -20,9 +20,12 @@ In the `URL(s)` field, give a list of page URLs to be imported (e.g. {https://ww
20
20
21
21
### Transformation file
22
22
23
-
A default HTML-to-Markdown transformation is applied, but you can (and usually need to) provide your own. Initially the import transformation file is fetched from http://localhost:3001/tools/importer/import.js (this can be changed in the options). Create the file using the following template:
23
+
A default HTML-to-Markdown transformation is applied, but you can (and usually need to) provide your own. Initially the import transformation file is fetched from http://localhost:3001/tools/importer/import.js (this can be changed in the options). Create the file using one of the following templates:
- if you need to create a single md/docx file from each input page, you can use this template: https://gist.github.com/kptdobe/8a726387ecca80dde2081b17b3e913f7
26
+
- if you need to create multiple md/docx files from each input page, you must use this template: https://gist.github.com/kptdobe/7bf50b69194884171b12874fc5c74588
27
+
28
+
Note that in their current state, the two templates do exactly the same thing, but the second one uses the `transform` method, whose returned array can contain more than one element. See the guidelines for an example.
importer-guidelines.md
+83-3
@@ -16,15 +16,33 @@ Out of the box, the importer should be able to consume any page and output a Mar
16
16
17
17
Such a rule is very straightforward to implement: it is usually a set of DOM operations that create, move or delete DOM elements.
18
18
19
-
In your `import.js` transformation file, you can implement 2 methods:
19
+
In your `import.js` transformation file, you can implement 2 modes:
20
+
- one input / one output
21
+
- one input / multiple outputs
20
22
21
-
- `transformDOM: ({ document, url, html }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
23
+
#### one input / one output
24
+
25
+
You must implement these two methods:
26
+
27
+
- `transformDOM: ({ document, url, html, params }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
22
28
  - `document`: the incoming DOM
23
29
  - `url`: the current URL being imported
24
30
  - `html`: the original HTML source (some things are cleaned up when the DOM is loaded as a document, so having the raw original HTML is sometimes useful)
25
-
- `generateDocumentPath: ({ document, url }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`)
31
+
  - `params`: parameters provided by the importer. The only parameter so far is `originalURL`, which is the URL of the page being imported (`url` is the one pointing to the proxy)
32
+
- `generateDocumentPath: ({ document, url, html, params }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`). The parameters are the same as above.
33
+
34
+
This is the simpler version of the implementation. You can achieve the same result by implementing the `transform` method as described below.
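As a rough illustration of the one input / one output mode, here is a minimal sketch of the two methods. The object name `singleOutputImporter`, the `header, footer` cleanup rule and the path rule are assumptions made for this example; the templates linked in the README show the exact shape the importer expects (in a real `import.js` the object would be exported rather than assigned to a constant).

```javascript
// Hypothetical sketch of the two methods for the one input / one output mode.
// In a real import.js this object would be exported (see the README templates).
const singleOutputImporter = {
  // Apply the transformation rules and return the element to convert to Markdown.
  transformDOM: ({ document, url, html, params }) => {
    // Assumed rule: drop the site header and footer if they exist.
    document.querySelectorAll('header, footer').forEach((el) => el.remove());
    // Prefer the `main` element and fall back to the whole body.
    return document.querySelector('main') || document.body;
  },

  // Return the path under which the converted document should be stored.
  generateDocumentPath: ({ document, url, html, params }) => {
    const { pathname } = new URL(url);
    // Assumed rule: strip a trailing slash and a trailing `.html` extension.
    return pathname.replace(/\/$/, '').replace(/\.html$/, '');
  },
};
```

With this sketch, a page imported from a URL whose pathname is `/blog/post.html` would be stored under `/blog/post`.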
35
+
36
+
#### one input / multiple outputs
37
+
38
+
You must implement this method:
39
+
- `transform: ({ document, url, html, params }) => {}`: implement here your transformation rules and return an array of pairs `{ element, path }` where `element` is a DOM element that needs to be transformed to Markdown and `path` is the path to the exported file.
26
40
  - `document`: the incoming DOM
27
41
  - `url`: the current URL being imported
42
+
  - `html`: the original HTML source (some things are cleaned up when the DOM is loaded as a document, so having the raw original HTML is sometimes useful)
43
+
  - `params`: parameters provided by the importer. The only parameter so far is `originalURL`, which is the URL of the page being imported (`url` is the one pointing to the proxy)
44
+
45
+
The idea is simple: return a list of elements that will be converted to docx and stored at the path location.
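The `transform` contract described above could be sketched as follows. The object name `multiOutputImporter`, the `.related` selector and the path scheme are hypothetical and only serve the example; the part taken from the description is the return value, an array of `{ element, path }` pairs.

```javascript
// Hypothetical sketch of the one input / multiple outputs mode.
// In a real import.js this object would be exported (see the README templates).
const multiOutputImporter = {
  transform: ({ document, url, html, params }) => {
    const { pathname } = new URL(url);
    // Assumed rule: strip a trailing slash and a trailing `.html` extension.
    const base = pathname.replace(/\/$/, '').replace(/\.html$/, '');

    // First output: the main content of the page.
    const main = document.querySelector('main') || document.body;
    const results = [{ element: main, path: base }];

    // Extra outputs: one file per section matching the assumed `.related` selector.
    document.querySelectorAll('.related').forEach((section, index) => {
      results.push({ element: section, path: `${base}/related-${index + 1}` });
    });

    return results;
  },
};
```

Returning an array with a single pair should behave like the one input / one output mode, which is why the simpler methods can be seen as a special case of `transform`.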
- be careful with the DOM elements you are working with. You always work on the same document, so destroying elements for one output may have an impact on the other outputs.
322
+
- you may have as many outputs as you want (limit not tested yet).
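One way to limit the side effects mentioned in these notes is to work on a deep copy of an element before removing anything from it. The helper name `extractWithout` is hypothetical, and whether the importer treats a detached clone exactly like an element still attached to the document is an assumption worth verifying on a real page.

```javascript
// Hypothetical helper: build one output without mutating the shared document.
function extractWithout(root, selector) {
  // Deep-clone the subtree so the removals below only affect the copy.
  const copy = root.cloneNode(true);
  copy.querySelectorAll(selector).forEach((el) => el.remove());
  return copy;
}
```

Inside `transform`, a call such as `extractWithout(main, '.related')` would give a version of the main content without the related sections, while the original `main` element stays untouched for the other outputs.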
323
+
244
324
### More samples
245
325
246
326
Sites in the https://github.com/hlxsites/ organisation have all been imported. Their many different implementations cover a lot of use cases.