
Commit fa4610b

feat(import): multiple docx output files
1 parent 780d0a3 commit fa4610b

File tree

10 files changed

+2246
-2243
lines changed

README.md

+5-2
@@ -20,9 +20,12 @@ In the `URL(s)` field, give a list of page URLs to be imported (e.g. {https://ww
2020

2121
### Transformation file
2222

23-
A default html to Markdown is applied by you can / need to provide your own. Initially the import transformation file is fetched at http://localhost:3001/tools/importer/import.js (can be changed in the options). Create the file using the following template:
23+
A default HTML to Markdown transformation is applied, but you can / need to provide your own. Initially the import transformation file is fetched at http://localhost:3001/tools/importer/import.js (can be changed in the options). Create the file using one of the following templates:
2424

25-
https://gist.github.com/kptdobe/8a726387ecca80dde2081b17b3e913f7
25+
- if you need to create a single md/docx file from each input page, you can use this template: https://gist.github.com/kptdobe/8a726387ecca80dde2081b17b3e913f7
26+
- if you need to create multiple md/docx files from each input page, you must use this template: https://gist.github.com/kptdobe/7bf50b69194884171b12874fc5c74588
27+
28+
Note that, in their current state, the 2 templates do exactly the same thing, but the second one uses the `transform` method, whose returned array can contain more than one element. See the guidelines for an example.
2629

2730
### Guidelines
2831

css/import/import.css

+8
@@ -160,3 +160,11 @@
160160
.import #import-markdown-preview td {
161161
padding: 0 6px;
162162
}
163+
164+
.import #import-file-picker-container {
165+
width: 100%;
166+
}
167+
168+
.import #import-file-picker-container sp-picker {
169+
width: 100%;
170+
}

import.html

+1
@@ -80,6 +80,7 @@ <h3>Page preview</h3>
8080
<sp-tab label="Preview" value="import-preview"></sp-tab>
8181
<sp-tab label="Markdown" value="import-markdown"></sp-tab>
8282
<sp-tab label="HTML" value="import-html"></sp-tab>
83+
<div id="import-file-picker-container"></div>
8384
<sp-tab-panel value="import-preview">
8485
<sp-theme color="light" scale="medium">
8586
<div id="import-markdown-preview"></div>

importer-guidelines.md

+83-3
@@ -16,15 +16,33 @@ Out of the box, the importer should be able to consume any page and output a Mar
1616

1717
Such a rule is very straightforward to implement: it is usually a set of DOM operations that create, move or delete DOM elements.
1818

19-
In your `import.js` transformation file, you can implement 2 methods:
19+
In your `import.js` transformation file, you can implement 2 modes:
20+
- one input / one output
21+
- one input / multiple outputs
2022

21-
- `transformDOM: ({ document, url, html }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
23+
#### one input / one output
24+
25+
You must implement these 2 methods:
26+
27+
- `transformDOM: ({ document, url, html, params }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
2228
- `document`: the incoming DOM
2329
- `url`: the current URL being imported
2430
- `html`: the original HTML source (when loading the DOM as a document, some things are cleaned up, having the raw original HTML is sometimes useful)
25-
- `generateDocumentPath: ({ document, url }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`)
31+
- `params`: parameters provided by the importer. The only one so far is `originalURL`, the URL of the page being imported (`url` is the proxied URL)
32+
- `generateDocumentPath: ({ document, url, html, params }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`). The parameters are the same as above.
33+
34+
This is the simpler version of the implementation. You can achieve the same result by implementing the `transform` method as described below.
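As a rough illustration of the one input / one output mode, the two methods could look like the sketch below. The object name `importRules` and the path-building logic are assumptions made for this example; a real `import.js` would typically expose the object as its default export, and the importer's own default path logic may differ.

```js
// Hypothetical single-output rules, kept as a plain object so the sketch runs on its own.
const importRules = {
  // Return the element to convert to Markdown; fall back to the body when no <main> exists.
  transformDOM: ({ document }) => document.querySelector('main') || document.body,

  // Build the output path from the original page URL (params.originalURL) when available,
  // dropping a trailing slash and a trailing ".html". This is an illustrative choice only.
  generateDocumentPath: ({ url, params }) => {
    const source = (params && params.originalURL) || url;
    const { pathname } = new URL(source);
    return pathname.replace(/\/$/, '').replace(/\.html$/, '');
  },
};
```

Using `params.originalURL` rather than `url` keeps the generated path independent of the proxy address, which is the reason the `params` argument is useful here.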
35+
36+
#### one input / multiple outputs
37+
38+
You must implement this method:
39+
- `transform: ({ document, url, html, params }) => {}`: implement here your transformation rules and return an array of pairs `{ element, path }` where `element` is a DOM element that needs to be transformed to Markdown and `path` is the path to the exported file.
2640
- `document`: the incoming DOM
2741
- `url`: the current URL being imported
42+
- `html`: the original HTML source (when loading the DOM as a document, some things are cleaned up, having the raw original HTML is sometimes useful)
43+
- `params`: parameters provided by the importer. The only one so far is `originalURL`, the URL of the page being imported (`url` is the proxied URL)
44+
45+
The idea is simple: return a list of elements that will be converted to docx and stored at the path location.
2846

2947
## Rule examples
3048

@@ -241,6 +259,68 @@ Output is then:
241259
# Hello World
242260
![](https://www.sample.com/images/helloworld.png);
243261
```
262+
263+
### Multiple outputs
264+
265+
If you need to transform one page into multiple Word documents (fragments, banners, author pages...), you can use the `transform` method.
266+
267+
Input DOM:
268+
269+
```html
270+
<html>
271+
<head></head>
272+
<body>
273+
<main>
274+
<h1>Hello World</h1>
275+
<div class="hero" style="background-image: url(https://www.sample.com/images/helloworld.png);"></div>
276+
</main>
277+
</body>
278+
</html>
279+
```
280+
281+
With the following `import.js`, you will get 2 md / docx documents:
282+
283+
```js
284+
{
285+
transform: ({ document, params }) => {
286+
const main = document.querySelector('main');
287+
// keep a reference to the image
288+
const image = main.querySelector('.hero');
289+
290+
// remove the image from the main, otherwise we'll get it in the 2 documents
291+
WebImporter.DOMUtils.remove(main, [
292+
'.hero',
293+
]);
294+
295+
return [{
296+
element: main,
297+
path: '/main',
298+
}, {
299+
element: image,
300+
path: '/image',
301+
}];
302+
},
303+
}
304+
```
305+
306+
Outputs are:
307+
308+
`/main.md`
309+
310+
```md
311+
# Hello World
312+
```
313+
314+
`/image.md`
315+
316+
```md
317+
![](https://www.sample.com/images/helloworld.png);
318+
```
319+
320+
Note:
321+
- be careful with the DOM elements you are working with. You always work on the same document, so destroying or moving elements for one output may have an impact on the other outputs.
322+
- you may have as many outputs as you want (limit not tested yet).
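
One possible way to limit the side effects mentioned in the first note is to give each output its own copy of a node before modifying the shared document. The sketch below is only an illustration, not the importer's prescribed approach; it relies on the standard DOM methods `cloneNode` and `remove`, and the `.hero` selector is taken from the example above.

```js
// Sketch: copy the hero before removing it, so each output owns its own node.
const transform = ({ document }) => {
  const main = document.querySelector('main');
  const hero = main.querySelector('.hero');

  // cloneNode(true) deep-copies the element; the copy is detached from the page.
  const heroCopy = hero.cloneNode(true);

  // Removing the original from main no longer affects the second output.
  hero.remove();

  return [
    { element: main, path: '/main' },
    { element: heroCopy, path: '/image' },
  ];
};
```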
323+
244324
### More samples
245325

246326
Sites in the https://github.com/hlxsites/ organisation have all been imported. There are many different implementations covering a lot of use cases.

js/import/import.ui.js

+73-31
@@ -12,6 +12,7 @@
1212
/* global CodeMirror, showdown, html_beautify, ExcelJS */
1313
import { initOptionFields, attachOptionFieldsListeners } from '../shared/fields.js';
1414
import { getDirectoryHandle, saveFile } from '../shared/filesystem.js';
15+
import { asyncForEach } from '../shared/utils.js';
1516
import PollImporter from '../shared/pollimporter.js';
1617
import alert from '../shared/alert.js';
1718

@@ -38,6 +39,8 @@ const IS_BULK = document.querySelector('.import-bulk') !== null;
3839
const BULK_URLS_HEADING = document.querySelector('#import-result h2');
3940
const BULK_URLS_LIST = document.querySelector('#import-result ul');
4041

42+
const IMPORT_FILE_PICKER_CONTAINER = document.getElementById('import-file-picker-container');
43+
4144
const ui = {};
4245
const config = {};
4346
const importStatus = {
@@ -68,20 +71,48 @@ const setupUI = () => {
6871
ui.markdownPreview.innerHTML = ui.showdownConverter.makeHtml('Run an import to see some markdown.');
6972
};
7073

71-
const updateImporterUI = (out) => {
72-
const { md, html: outputHTML, originalURL } = out;
74+
const loadResult = ({ md, html: outputHTML }) => {
75+
ui.transformedEditor.setValue(html_beautify(outputHTML));
76+
ui.markdownEditor.setValue(md || '');
77+
78+
const mdPreview = ui.showdownConverter.makeHtml(md);
79+
ui.markdownPreview.innerHTML = mdPreview;
80+
81+
// remove existing classes and styles
82+
Array.from(ui.markdownPreview.querySelectorAll('[class], [style]')).forEach((t) => {
83+
t.removeAttribute('class');
84+
t.removeAttribute('style');
85+
});
86+
};
87+
88+
const updateImporterUI = (results, originalURL) => {
7389
if (!IS_BULK) {
74-
ui.transformedEditor.setValue(html_beautify(outputHTML));
75-
ui.markdownEditor.setValue(md || '');
90+
IMPORT_FILE_PICKER_CONTAINER.innerHTML = '';
91+
const picker = document.createElement('sp-picker');
92+
picker.setAttribute('size', 'm');
93+
94+
results.forEach((result, index) => {
95+
const { path } = result;
96+
97+
// add result to picker list
98+
const item = document.createElement('sp-menu-item');
99+
item.innerHTML = path;
100+
if (index === 0) {
101+
item.setAttribute('selected', true);
102+
picker.setAttribute('label', path);
103+
picker.setAttribute('value', path);
104+
}
105+
picker.appendChild(item);
106+
});
76107

77-
const mdPreview = ui.showdownConverter.makeHtml(md);
78-
ui.markdownPreview.innerHTML = mdPreview;
108+
IMPORT_FILE_PICKER_CONTAINER.append(picker);
79109

80-
// remove existing classes and styles
81-
Array.from(ui.markdownPreview.querySelectorAll('[class], [style]')).forEach((t) => {
82-
t.removeAttribute('class');
83-
t.removeAttribute('style');
110+
picker.addEventListener('change', (e) => {
111+
const r = results.filter((i) => i.path === e.target.value)[0];
112+
loadResult(r);
84113
});
114+
115+
loadResult(results[0]);
85116
} else {
86117
const li = document.createElement('li');
87118
const link = document.createElement('sp-link');
@@ -101,6 +132,12 @@ const clearResultPanel = () => {
101132
BULK_URLS_HEADING.innerText = 'Importing...';
102133
};
103134

135+
const clearImportStatus = () => {
136+
importStatus.imported = 0;
137+
importStatus.total = 0;
138+
importStatus.rows = [];
139+
};
140+
104141
const disableProcessButtons = () => {
105142
IMPORT_BUTTON.disabled = true;
106143
};
@@ -127,6 +164,23 @@ const getProxyURLSetup = (url, origin) => {
127164
};
128165
};
129166

167+
const postImportProcess = async (results, originalURL) => {
168+
await asyncForEach(results, async ({ docx, filename, path }) => {
169+
const data = {
170+
status: 'Success',
171+
url: originalURL,
172+
path,
173+
};
174+
175+
const includeDocx = !!docx;
176+
if (includeDocx) {
177+
await saveFile(dirHandle, filename, docx);
178+
data.docx = filename;
179+
}
180+
importStatus.rows.push(data);
181+
});
182+
};
183+
130184
const createImporter = () => {
131185
config.importer = new PollImporter({
132186
origin: config.origin,
@@ -140,25 +194,14 @@ const getContentFrame = () => document.querySelector(`${PARENT_SELECTOR} iframe`
140194
const attachListeners = () => {
141195
attachOptionFieldsListeners(config.fields, PARENT_SELECTOR);
142196

143-
config.importer.addListener(async (out) => {
197+
config.importer.addListener(async ({ results }) => {
144198
const frame = getContentFrame();
145-
out.originalURL = frame.dataset.originalURL;
146-
const includeDocx = !!out.docx;
199+
const { originalURL } = frame.dataset;
147200

148-
updateImporterUI(out, includeDocx);
201+
updateImporterUI(results, originalURL);
202+
await postImportProcess(results, originalURL);
149203

150-
const data = {
151-
status: 'Success',
152-
url: out.originalURL,
153-
path: out.path,
154-
};
155-
if (includeDocx) {
156-
const { docx, filename } = out;
157-
await saveFile(dirHandle, filename, docx);
158-
data.docx = filename;
159-
}
160-
importStatus.rows.push(data);
161-
alert.success(`Import of page ${frame.dataset.originalURL} completed.`);
204+
alert.success(`Import of page ${originalURL} completed.`);
162205
});
163206

164207
config.importer.addErrorListener(({ url, error: err }) => {
@@ -168,6 +211,8 @@ const attachListeners = () => {
168211
});
169212

170213
IMPORT_BUTTON.addEventListener('click', (async () => {
214+
clearImportStatus();
215+
171216
if (IS_BULK) {
172217
clearResultPanel();
173218
if (config.fields['import-show-preview']) {
@@ -196,9 +241,6 @@ const attachListeners = () => {
196241
}
197242
}
198243

199-
importStatus.imported = 0;
200-
importStatus.rows = [];
201-
202244
const field = IS_BULK ? 'import-urls' : 'import-url';
203245
const urlsArray = config.fields[field].split('\n').reverse().filter((u) => u.trim() !== '');
204246
importStatus.total = urlsArray.length;
@@ -242,14 +284,14 @@ const attachListeners = () => {
242284
const includeDocx = !!dirHandle;
243285

244286
window.setTimeout(async () => {
245-
const { originalURL } = frame.dataset;
246-
const { replacedURL } = frame.dataset;
287+
const { originalURL, replacedURL } = frame.dataset;
247288
if (frame.contentDocument) {
248289
try {
249290
config.importer.setTransformationInput({
250291
url: replacedURL,
251292
document: frame.contentDocument,
252293
includeDocx,
294+
params: { originalURL },
253295
});
254296
await config.importer.transform();
255297
} catch (e) {

js/shared/pollimporter.js

+30-14
@@ -73,46 +73,62 @@ export default class PollImporter {
7373
}
7474

7575
async transform() {
76+
const {
77+
includeDocx, url, document, params,
78+
} = this.transformation;
79+
7680
try {
77-
let out;
78-
if (this.transformation.includeDocx) {
79-
out = await WebImporter.html2docx(
80-
this.transformation.url,
81-
this.transformation.document,
81+
let results;
82+
if (includeDocx) {
83+
const out = await WebImporter.html2docx(
84+
url,
85+
document,
8286
this.projectTransform,
87+
params,
8388
);
8489

85-
const { path } = out;
86-
out.filename = `${path}.docx`;
90+
results = Array.isArray(out) ? out : [out];
91+
results.forEach((result) => {
92+
const { path } = result;
93+
result.filename = `${path}.docx`;
94+
});
8795
} else {
88-
out = await WebImporter.html2md(
89-
this.transformation.url,
90-
this.transformation.document,
96+
const out = await WebImporter.html2md(
97+
url,
98+
document,
9199
this.projectTransform,
100+
params,
92101
);
102+
results = Array.isArray(out) ? out : [out];
93103
}
94104

95105
this.listeners.forEach((listener) => {
96106
listener({
97-
...out,
98-
url: this.transformation.url,
107+
results,
108+
url,
99109
});
100110
});
101111
} catch (err) {
102112
this.errorListeners.forEach((listener) => {
103113
listener({
104-
url: this.transformation.url,
114+
url,
105115
error: err,
106116
});
107117
});
108118
}
109119
}
110120

111-
setTransformationInput({ url, document, includeDocx = false }) {
121+
setTransformationInput({
122+
url,
123+
document,
124+
includeDocx = false,
125+
params,
126+
}) {
112127
this.transformation = {
113128
url,
114129
document,
115130
includeDocx,
131+
params,
116132
};
117133
}
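
The normalization step in `transform()` above can be read in isolation as the sketch below: whatever the project transformation produces (a single object or an array), listeners always receive an array. The helper name `toResults` is hypothetical and does not exist in the commit; it only mirrors the logic shown in the diff.

```js
// Hypothetical helper mirroring the normalization done in PollImporter.transform().
const toResults = (out, includeDocx = false) => {
  // A single-output transformation yields one object; wrap it so callers always get an array.
  const results = Array.isArray(out) ? out : [out];

  // When docx output is requested, derive the file name from each result's path.
  if (includeDocx) {
    results.forEach((result) => {
      result.filename = `${result.path}.docx`;
    });
  }
  return results;
};
```

With this shape, the UI code can loop over `results` without caring whether the project implemented `transformDOM` or `transform`.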
118134
