Commit 2759b7e

Merge pull request #11 from adobe/guideline-typos
chore: fix guideline typos
2 parents 70bae90 + 18b7df5 commit 2759b7e

1 file changed, +10 -10 lines changed

importer-guidelines.md

@@ -2,13 +2,13 @@

 ## General idea

-The general idea of the importer is pretty straight forward: it takes a page DOM and transforms it into a Markdown file which is then converted to a docx file). For now, let's consider that the Markdown file is a one-to-one equivalent to the docx file thus next references to Markdown or docx are equivalent "to the output of the transformation process".
+The general idea of the importer is pretty straight forward: it takes a page DOM and transforms it into a Markdown file which is then converted to a docx file. For now, let's consider that the Markdown file is a one-to-one equivalent to the docx file thus the next references to Markdown or docx are equivalent to "the output of the transformation process".

-As Markdown is a pretty simple format, the DOM transformation is really basic: a `h1` becomes a `Heading 1`, a paragraph or text in a `span` or `div` becomes a paragraph, an `a` stays a link, an `img` an image... All styling, layout or `div` nesting disappears in the Markdown output. Only special case is `table` which stays a `table` HTML element in the Markdown output and become a table in Word (which is the foundation for Blocks).
+As Markdown is a pretty simple format, the DOM transformation is really basic: a `h1` becomes a `Heading 1`, a paragraph or text in a `span` or `div` becomes a paragraph, an `a` stays a link, an `img` an image... All styling, layout or `div` nesting disappears in the Markdown output. The only special case is `table` which stays a `table` HTML element in the Markdown output and become a table in Word (which is the foundation for Blocks).

-The point is really to only extract the content from the original page. And the importer objectif is to help digesting a large amount of pages from an existing website. If you have only few pages on the website, it is easy and faster to manually copy/paste the content into Word documents. But in the case of large website with pages that are structurally similar (for example a blog site with thousands of blog articles), it would be fastidious to manullay copy/paste all pages.
+The point is really to only extract the content from the original page. And the importers primary objective is to help in digesting a large amount of pages from an existing website. If you have only few pages on the website, it is easier and faster to manually copy/paste the content into Word documents. But in the case of a large website with pages that are structurally similar (for example a blog site with thousands of blog articles), it would be fastidious to manually copy/paste all pages.

-To summuarise: if a large set of pages look the same, this is when you want to use the importer and write a specific `import.js` transformation file.
+To summarize: if a large set of pages look the same, this is when you want to use the importer and write a specific `import.js` transformation file.

 ### `import.js` transformation file

@@ -33,7 +33,7 @@ You must implement those 2 methods:

 This is simpler version of the implementation. You can achieve the same by implementing the `transform` method as describe below.

-#### one input / multiple outputsw
+#### one input / multiple outputs

 You must implement this method:
 - `transform: ({ document, url, html, params }) => {}`: implement here your transformation rules and return an array of pairs `{ element, path }` where element is a DOM DOM element that needs to be transformed to Markdown and path is the path to the exported file.
@@ -104,7 +104,7 @@ export default {

 Notes on those 2 different implementations:
 - you need to return a DOM element, otherwise the `document.body` is used.
-- you can either work on the full `body` element or focus on the `main` element. This is really up to you. Sometimes removing everything not necessary can be tidious.
+- you can either work on the full `body` element or focus on the `main` element. This is really up to you. Sometimes removing everything not necessary and can be tedious.
 - you do not need to transform the `div` into a `p` to get a text paragraph.

 ### Create a block
@@ -323,7 +323,7 @@ Note:

 ### More samples

-Sites in the https://github.com/hlxsites/ organisation have all be imported. There are many different implementation cover a lot of use cases.
+Sites in the https://github.com/hlxsites/ organization have all be imported. There are many different implementations that cover a lot of use cases.

 ## Helpers

@@ -340,9 +340,9 @@ While more documentation will be written, you can already find how to use them v
 ## Security and memory

 When using this importer tool, everything happens in the browser which means the import process must be able to fetch all the resources and in some cases execute the Javascript from the page being imported.
-When running `hlx import`, a proxy is started and all requests to the host are re-written clientside and go through the proxy. This allows to control the security settings and avoid CORS and CSP issues. The target page is then loaded in an iframe and the importer access to the DOM via this iframe.
+When running `hlx import`, a proxy is started and all requests to the host are re-written client-side and go through the proxy. This allows the importer to control the security settings and avoid CORS and CSP issues. The target page is then loaded in an iframe and the importer access to the DOM via this iframe.

-That's a generic solution that might not work in some cases, some sites being pretty imaginative on how to prevent to be loaded in a iframe (like a Javascript redirect if the `window.location` is not their own host). If you face to such a problem, you can contact the Helix team and we can look at some workarounds and / or integrate more logic in the proxy to handle more of those cases.
+This is a generic solution that might not work in some cases, some sites are pretty imaginative in how to prevent being loaded in a iframe (like a Javascript redirect if the `window.location` is not their own host). If you face such a problem, you can contact the Helix team and we can look at some workarounds and potentially integrate more logic in the proxy to handle more of these cases.

 One workaround to try could be to run the browser with all security settings off. But this is getting harder and harder to do.

@@ -376,6 +376,6 @@ This simply transforms the image srcs to use the proxy: `https://www.sample.com/
 Disabling Javascript in the option is the best solution for speed and memory consumption. You can then import thousands of pages.
 With Javascript enabled, things become more complicated for the browser. It depends on the amount of code to load and execute, but in general, you can only import around one hundred pages before the browser crashes (too much memory consumed).

-Having Javascript enabled is usually required to capture content which is dynamically loaded which is 100% of the cases with SPA (React, Angular...). In this case, you need to create small set of pages to import, run the import and reload the full browser window to flush the memory and run the next batch.
+Having Javascript enabled is usually required to capture content which is dynamically loaded which is 100% of the cases with SPA (React, Angular...). In this case, you need to create a small set of pages to import, run the import and reload the full browser window to flush the memory and run the next batch.

 We are also working on a cli version of the importer (see https://github.com/adobe/helix-importer/issues/23) where memory can be handled properly.
