Commit 228d8f9

committed
Add len implementation to Doc pseudo list
1 parent 3cf3e6c commit 228d8f9

4 files changed: 213 additions and 95 deletions

docs/api.html

Lines changed: 78 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,9 @@
1313
<main>
1414
<h2>API</h2>
1515
<p></p>
16-
<h3>Usage</h3>
16+
<h3>Principles </h3>
17+
<p></p>
18+
<h4>Pure functions</h4>
1719
<p>Most functions are pure and are exposed both as basic functions and as instance methods of a Doc object: in the function signatures found in the following sections, a <code>doc</code> first argument can be read as <code>self</code>. For example, both samples are equivalent:</p>
1820
<pre>&gt;&gt;&gt; #Function pattern
1921
&gt;&gt;&gt; from pdfsyntax import readfile, metadata
@@ -26,6 +28,45 @@ <h3>Usage</h3>
2628
&gt;&gt;&gt; doc = pdf.readfile("samples/simple_text_string.pdf")
2729
&gt;&gt;&gt; m = doc.metadata()
2830
</pre>
31+
<p>Every time a function is applied to a Doc object, the function returns a new object built as a shallow copy of the input.</p>
32+
<h4>Incremental updates</h4>
33+
<p>PDFSyntax tracks the incremental updates of a document, which are made by appending new or updated objects (and the matching XREF section) to the end of the original PDF file. A revision number greater than 0 indicates that incremental updates have been appended. By default, a document newly opened by PDFSyntax is ready to write modifications in the next revision. The <code>rewind</code> function rolls back to the previous revision. The <code>commit</code> function closes the current revision and opens the next one.</p>
34+
<p>For example this file contains 2 revisions (0 and 1) and PDFSyntax has initialized the doc object to open revision 2:</p>
35+
<pre>&gt;&gt;&gt; import pdfsyntax as pdf
36+
&gt;&gt;&gt; doc = pdf.readfile("samples/add_text_annotation.pdf")
37+
&gt;&gt;&gt; doc
38+
&lt;PDF Doc in revision 2 with 0 modified object(s)&gt;
39+
</pre>
40+
<p>The <code>rewind</code> function rolls back to the previous revision. Let's rewind to revision 0:</p>
41+
<pre>&gt;&gt;&gt; doc = pdf.rewind(doc) # to revision 1
42+
&gt;&gt;&gt; doc = pdf.rewind(doc) # to revision 0
43+
&gt;&gt;&gt; doc
44+
&lt;PDF Doc in revision 0 with 7 modified object(s)&gt;
45+
</pre>
46+
<p>After one or several modifications, the <code>commit</code> function closes the current revision and opens the next one:</p>
47+
<pre>&gt;&gt;&gt; doc = pdf.rotate(doc)
48+
&gt;&gt;&gt; doc = pdf.commit(doc)
49+
&gt;&gt;&gt; doc
50+
&lt;PDF Doc in revision 1 with 0 modified object(s)&gt;
51+
</pre>
52+
<p></p>
53+
<h4>Squashing</h4>
54+
<p>By default incremental updates stack up, but it is possible to <code>squash</code> a document in order to combine all revisions into a single one. In this example the last document is equivalent to the first one (same appearance), but it is made of only one revision. Because this revision is like a document started from scratch, its revision number is 0 and all 7 of its internal objects look like new ones:</p>
55+
<pre>&gt;&gt;&gt; doc90 = pdf.rotate(doc)
56+
&gt;&gt;&gt; doc90
57+
&lt;PDF Doc in revision 1 with 1 modified object(s)&gt;
58+
&gt;&gt;&gt; docs = pdf.squash(doc90)
59+
&gt;&gt;&gt; docs
60+
&lt;PDF Doc in revision 0 with 7 modified object(s)&gt;
61+
</pre>
62+
<p></p>
63+
<h4>File I/O</h4>
64+
<p>The <code>writefile</code> function dumps the document with all the incremental updates appended at the end of the original data. </p>
65+
<pre>&gt;&gt;&gt; from pdfsyntax import readfile, writefile, rotate
66+
&gt;&gt;&gt; doc = readfile("samples/simple_text_string.pdf")
67+
&gt;&gt;&gt; doc90 = rotate(doc)
68+
&gt;&gt;&gt; writefile(doc90, "rotated_doc.pdf")
69+
</pre>
2970
<p></p>
3071
<h3>File information</h3>
3172
<p><code>structure</code> and <code>metadata</code> are functions showing general information about the document.</p>
@@ -38,7 +79,41 @@ <h3>File information</h3>
3879
{'Title': None, 'Author': None, 'Subject': None, 'Keywords': None, 'Creator': None, 'Producer': None, 'CreationDate': None, 'ModDate': None}
3980
</pre>
4081
<p></p>
41-
<h3>Low-level access to object tree</h3>
82+
<h3>High-level transformation</h3>
83+
<p><code>rotate</code> turns pages relative to their current position by multiples of 90 degrees clockwise. NB: It takes into account the attributes inherited from the page hierarchy.</p>
84+
<pre>&gt;&gt;&gt; #Default rotation applies 90 degrees to all pages
85+
&gt;&gt;&gt; doc90 = rotate(doc)
86+
87+
&gt;&gt;&gt; #Apply 180 degrees to first two pages
88+
&gt;&gt;&gt; doc180 = doc.rotate(180, [0, 1])
89+
</pre>
90+
<p><em>WARNING</em>: To REMOVE something means it still exists but it is hidden.</p>
91+
<p><code>remove_pages</code> cuts a set of pages from the document as an incremental update: they are not permanently deleted because it is still possible to revert to the previous revision.</p>
92+
<pre>&gt;&gt;&gt; #Remove first 3 pages of a 6-page doc
93+
&gt;&gt;&gt; second_half_doc = pdf.remove_pages(doc, {0, 1, 2})
94+
</pre>
95+
<p><code>keep_pages</code> does the opposite:</p>
96+
<pre>&gt;&gt;&gt; #Keep last 3 pages of a 6-page doc
97+
&gt;&gt;&gt; second_half_doc = pdf.keep_pages(doc, {3, 4, 5})
98+
</pre>
99+
<p><code>concat</code> merges documents:</p>
100+
<pre>&gt;&gt;&gt; #Concatenate doc2 pages after doc1 pages into a new doc
101+
&gt;&gt;&gt; doc = pdf.concat(doc1, doc2)
102+
</pre>
103+
<p>A Doc object can also be seen as a partial implementation of a list of pages. It is possible to use operators to slice or concatenate:</p>
104+
<pre>&gt;&gt;&gt; #Equivalent to pdf.keep_pages(doc, {3, 4, 5})
105+
&gt;&gt;&gt; last_3_pages = doc[3:]
106+
107+
&gt;&gt;&gt; #Equivalent to pdf.concat(doc1, doc2)
108+
&gt;&gt;&gt; doc = doc1 + doc2
109+
</pre>
110+
<p><code>add_text_annotation</code> inserts a simple text annotation in a page.</p>
111+
<pre>&gt;&gt;&gt; annotated_doc = add_text_annotation(doc, 0, "abcdefg", [100, 100, 100, 100])
112+
</pre>
113+
<p></p>
114+
<h3>Low-level access and modification</h3>
115+
<p></p>
116+
<h4>Objects</h4>
42117
<p><code>trailer</code> and <code>catalog</code> give access to the starting point of the object tree. </p>
43118
<pre>&gt;&gt;&gt; #Access to document trailer
44119
&gt;&gt;&gt; doc.trailer()
@@ -56,7 +131,7 @@ <h3>Low-level access to object tree</h3>
56131
{'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
57132
</pre>
58133
<p></p>
59-
<h3>Pages</h3>
134+
<h4>Pages</h4>
60135
<p>Page index is a tree structure where attributes can be inherited from parent nodes. For convenience <code>flat_page_tree</code> returns an ordered list of document pages and specifies inherited attributes that should apply to each page.</p>
61136
<pre>&gt;&gt;&gt; #Each item of the list is a tuple with the page object reference and its inherited attributes
62137
&gt;&gt;&gt; doc = pdf.readfile("samples/simple_text_string.pdf")
@@ -74,42 +149,6 @@ <h3>Pages</h3>
74149
'/Type': '/Page'}]
75150
</pre>
76151
<p></p>
77-
<h3>Incremental updates</h3>
78-
<p>PDFSyntax tracks document incremental updates made possible by appending new or updated objects at the end of an original PDF file (and the matching XREF section). The <code>Revisions</code> entry of the <code>structure</code> function result, if greater than 1, indicates that incremental updates have been appended.By default, a newly opened document by PDFSyntax is ready to write modifications in the next revision.The <code>rewind</code> function rolls back to the previous revision. The <code>commit</code> function closes the current revision and open the next one.</p>
79-
<pre>&gt;&gt;&gt; import pdfsyntax as pdf
80-
&gt;&gt;&gt; doc = pdf.readfile("samples/add_text_annotation.pdf")
81-
&gt;&gt;&gt; doc.structure()
82-
{'Version': '1.4', 'Pages': 1, 'Revisions': 2, 'Encrypted': False, 'Paper of 1st page': '215x279mm or 8.5x11.0in (US Letter)'}
83-
84-
&gt;&gt;&gt; #This file contains 2 revisions and PDFSyntax has initialized the doc object for a future revision 3
85-
86-
&gt;&gt;&gt; doc.get_object(4j)
87-
{'/Annots': 8j, '/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j}, '/Contents': 5j, '/MediaBox': [0, 0, 612, 792], '/Parent': 3j, '/Type': '/Page'}
88-
89-
&gt;&gt;&gt; #In its current state, the page (object 4) contains an annotation
90-
&gt;&gt;&gt; #Let's rewind to revision 1
91-
92-
&gt;&gt;&gt; doc = doc.rewind() # to revision 2
93-
&gt;&gt;&gt; doc = doc.rewind() # to revision 1
94-
95-
&gt;&gt;&gt; doc.get_object(4j)
96-
{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j}, '/Contents': 5j, '/MediaBox': [0, 0, 612, 792], '/Parent': 3j, '/Type': '/Page'}
97-
98-
&gt;&gt;&gt; #The annotation was not present in the initial revision of the file
99-
</pre>
100-
<p></p>
101-
<h3>High-level transformation</h3>
102-
<p><code>add_text_annotation</code> inserts a simple text annotation in a page.</p>
103-
<pre>&gt;&gt;&gt; annotated_doc = add_text_annotation(doc, 0, "abcdefg", [100, 100, 100, 100])
104-
</pre>
105-
<p><code>rotate</code> turns pages relatively to their current position by multiples of 90 degrees clockwise. NB: It takes into account the inherited attributes from the page hierarchy.</p>
106-
<pre>&gt;&gt;&gt; #Default rotation applies 90 degrees to all pages
107-
&gt;&gt;&gt; doc90 = rotate(doc)
108-
109-
&gt;&gt;&gt; #Apply 180 degrees to first two page
110-
&gt;&gt;&gt; doc180 = doc.rotate(180, [1, 2])
111-
</pre>
112-
<p></p>
113152
<blockquote><p> TO BE CONTINUED</p>
114153
</blockquote>
115154
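
The revision vocabulary introduced in this diff (`rewind`, `commit`, `squash`) can be illustrated with a toy model. The sketch below is only an assumption about the behaviour described in the prose, not PDFSyntax's implementation: `RevDoc`, `update`, and the dictionary-per-revision layout are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RevDoc:
    """Toy model of a document with revisions; NOT pdfsyntax's Doc class."""
    closed: tuple = ()   # closed revisions, each a dict {object number: value}
    pending: tuple = ()  # (object number, value) pairs modified in the open revision

    @property
    def revision(self):
        # The open revision is numbered after the closed ones.
        return len(self.closed)


def update(doc, num, value):
    """Return a new doc where object `num` is modified in the open revision."""
    merged = {**dict(doc.pending), num: value}
    return replace(doc, pending=tuple(merged.items()))


def commit(doc):
    """Close the open revision and open the next one (no pending objects)."""
    return replace(doc, closed=doc.closed + (dict(doc.pending),), pending=())


def rewind(doc):
    """Drop the open revision and reopen the previous one for modification."""
    if not doc.closed:
        raise ValueError("already at revision 0")
    *earlier, last = doc.closed
    return replace(doc, closed=tuple(earlier), pending=tuple(last.items()))


def squash(doc):
    """Merge every revision into a single revision 0 where all objects look new."""
    merged = {}
    for rev in doc.closed:
        merged.update(rev)
    merged.update(dict(doc.pending))
    return RevDoc(closed=(), pending=tuple(merged.items()))


# A file holding revisions 0 and 1 opens ready to write revision 2.
doc = RevDoc(closed=({1: "a", 2: "b"}, {2: "b2"}))
doc0 = rewind(rewind(doc))           # back to revision 0, its objects reopened
committed = commit(doc0)             # revision 1, nothing modified yet
edited = update(committed, 2, "B")   # revision 1, one modified object
flat = squash(edited)                # revision 0, every object looks new
```

Each function returns a new object and leaves its input untouched, mirroring the "pure functions" principle stated in the documentation.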

docs/api.md

Lines changed: 118 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
## API
22

3-
### Usage
3+
### Principles
4+
5+
#### Pure functions
46

57
Most functions are pure and are exposed both as basic functions and as instance methods of a Doc object: in the function signatures found in the following sections, a `doc` first argument can be read as `self`. For example, both samples are equivalent:
68

@@ -18,6 +20,66 @@ Most functions are pure and are exposed both as basic functions and as instance
1820
>>> m = doc.metadata()
1921
```
2022

23+
Every time a function is applied to a Doc object, the function returns a new object built as a shallow copy of the input.
24+
25+
#### Incremental updates
26+
27+
PDFSyntax tracks document incremental updates made possible by appending new or updated objects at the end of an original PDF file (and the matching XREF section). A revision, if greater than 0, indicates that incremental updates have been appended.
28+
By default, a newly opened document by PDFSyntax is ready to write modifications in the next revision.
29+
The `rewind` function rolls back to the previous revision. The `commit` function closes the current revision and opens the next one.
30+
31+
32+
For example this file contains 2 revisions (0 and 1) and PDFSyntax has initialized the doc object to open revision 2:
33+
34+
```Python
35+
>>> import pdfsyntax as pdf
36+
>>> doc = pdf.readfile("samples/add_text_annotation.pdf")
37+
>>> doc
38+
<PDF Doc in revision 2 with 0 modified object(s)>
39+
```
40+
41+
The `rewind` function rolls back to the previous revision. Let's rewind to revision 0:
42+
43+
```Python
44+
>>> doc = pdf.rewind(doc) # to revision 1
45+
>>> doc = pdf.rewind(doc) # to revision 0
46+
>>> doc
47+
<PDF Doc in revision 0 with 7 modified object(s)>
48+
```
49+
50+
After one or several modifications, the `commit` function closes the current revision and opens the next one:
51+
52+
```Python
53+
>>> doc = pdf.rotate(doc)
54+
>>> doc = pdf.commit(doc)
55+
>>> doc
56+
<PDF Doc in revision 1 with 0 modified object(s)>
57+
```
58+
59+
#### Squashing
60+
61+
By default incremental updates stack up, but it is possible to `squash` a document in order to combine all revisions into a single one. In this example the last document is equivalent to the first one (same appearance), but it is made of only one revision. Because this revision is like a document started from scratch, its revision number is 0 and all 7 of its internal objects look like new ones:
62+
63+
```Python
64+
>>> doc90 = pdf.rotate(doc)
65+
>>> doc90
66+
<PDF Doc in revision 1 with 1 modified object(s)>
67+
>>> docs = pdf.squash(doc90)
68+
>>> docs
69+
<PDF Doc in revision 0 with 7 modified object(s)>
70+
```
71+
72+
#### File I/O
73+
74+
The `writefile` function dumps the document with all the incremental updates appended at the end of the original data.
75+
76+
```Python
77+
>>> from pdfsyntax import readfile, writefile, rotate
78+
>>> doc = readfile("samples/simple_text_string.pdf")
79+
>>> doc90 = rotate(doc)
80+
>>> writefile(doc90, "rotated_doc.pdf")
81+
```
82+
2183
### File information
2284

2385
`structure` and `metadata` are functions showing general information about the document.
@@ -32,7 +94,60 @@ Most functions are pure and are exposed both as basic functions and as instance
3294
{'Title': None, 'Author': None, 'Subject': None, 'Keywords': None, 'Creator': None, 'Producer': None, 'CreationDate': None, 'ModDate': None}
3395
```
3496

35-
### Low-level access to object tree
97+
### High-level transformation
98+
99+
`rotate` turns pages relative to their current position by multiples of 90 degrees clockwise. NB: It takes into account the attributes inherited from the page hierarchy.
100+
101+
```Python
102+
>>> #Default rotation applies 90 degrees to all pages
103+
>>> doc90 = rotate(doc)
104+
105+
>>> #Apply 180 degrees to first two pages
106+
>>> doc180 = doc.rotate(180, [0, 1])
107+
```
108+
109+
_WARNING_: To REMOVE something means it still exists but it is hidden.
110+
111+
`remove_pages` cuts a set of pages from the document as an incremental update: they are not permanently deleted because it is still possible to revert to the previous revision.
112+
113+
```Python
114+
>>> #Remove first 3 pages of a 6-page doc
115+
>>> second_half_doc = pdf.remove_pages(doc, {0, 1, 2})
116+
```
117+
118+
`keep_pages` does the opposite:
119+
120+
```Python
121+
>>> #Keep last 3 pages of a 6-page doc
122+
>>> second_half_doc = pdf.keep_pages(doc, {3, 4, 5})
123+
```
124+
125+
`concat` merges documents:
126+
127+
```Python
128+
>>> #Concatenate doc2 pages after doc1 pages into a new doc
129+
>>> doc = pdf.concat(doc1, doc2)
130+
```
131+
132+
A Doc object can also be seen as a virtual list of pages. It is possible to use operators to slice or concatenate:
133+
134+
```Python
135+
>>> #Equivalent to pdf.keep_pages(doc, {3, 4, 5})
136+
>>> last_3_pages = doc[3:]
137+
138+
>>> #Equivalent to pdf.concat(doc1, doc2)
139+
>>> doc = doc1 + doc2
140+
```
141+
142+
`add_text_annotation` inserts a simple text annotation in a page.
143+
144+
```Python
145+
>>> annotated_doc = add_text_annotation(doc, 0, "abcdefg", [100, 100, 100, 100])
146+
```
147+
148+
### Low-level access and modification
149+
150+
#### Objects
36151

37152
`trailer` and `catalog` give access to the starting point of the object tree.
38153

@@ -58,7 +173,7 @@ You may think of the `j` as a "jump" to another object :)
58173
{'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
59174
```
60175

61-
### Pages
176+
#### Pages
62177

63178
Page index is a tree structure where attributes can be inherited from parent nodes. For convenience `flat_page_tree` returns an ordered list of document pages and specifies inherited attributes that should apply to each page.
64179

@@ -82,52 +197,6 @@ The `page` function goes further by merging inherited attributes with local attr
82197
'/Type': '/Page'}]
83198
```
84199

85-
### Incremental updates
86-
87-
PDFSyntax tracks document incremental updates made possible by appending new or updated objects at the end of an original PDF file (and the matching XREF section). The `Revisions` entry of the `structure` function result, if greater than 1, indicates that incremental updates have been appended.
88-
By default, a newly opened document by PDFSyntax is ready to write modifications in the next revision.
89-
The `rewind` function rolls back to the previous revision. The `commit` function closes the current revision and open the next one.
90-
91-
```Python
92-
>>> import pdfsyntax as pdf
93-
>>> doc = pdf.readfile("samples/add_text_annotation.pdf")
94-
>>> doc.structure()
95-
{'Version': '1.4', 'Pages': 1, 'Revisions': 2, 'Encrypted': False, 'Paper of 1st page': '215x279mm or 8.5x11.0in (US Letter)'}
96-
97-
>>> #This file contains 2 revisions and PDFSyntax has initialized the doc object for a future revision 3
98-
99-
>>> doc.get_object(4j)
100-
{'/Annots': 8j, '/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j}, '/Contents': 5j, '/MediaBox': [0, 0, 612, 792], '/Parent': 3j, '/Type': '/Page'}
101-
102-
>>> #In its current state, the page (object 4) contains an annotation
103-
>>> #Let's rewind to revision 1
104-
105-
>>> doc = doc.rewind() # to revision 2
106-
>>> doc = doc.rewind() # to revision 1
107-
108-
>>> doc.get_object(4j)
109-
{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j}, '/Contents': 5j, '/MediaBox': [0, 0, 612, 792], '/Parent': 3j, '/Type': '/Page'}
110-
111-
>>> #The annotation was not present in the initial revision of the file
112-
```
113-
114-
### High-level transformation
115-
116-
`add_text_annotation` inserts a simple text annotation in a page.
117-
118-
```Python
119-
>>> annotated_doc = add_text_annotation(doc, 0, "abcdefg", [100, 100, 100, 100])
120-
```
121-
122-
`rotate` turns pages relatively to their current position by multiples of 90 degrees clockwise. NB: It takes into account the inherited attributes from the page hierarchy.
123-
124-
```Python
125-
>>> #Default rotation applies 90 degrees to all pages
126-
>>> doc90 = rotate(doc)
127-
128-
>>> #Apply 180 degrees to first two page
129-
>>> doc180 = doc.rotate(180, [1, 2])
130-
```
131200

132201
> TO BE CONTINUED
133202
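
The "virtual list of pages" idea documented in this diff (slicing with `doc[3:]`, concatenating with `doc1 + doc2`, and the `len` support named in the commit title) relies on Python's sequence protocol. The sketch below shows one way such a pseudo list might be built; `PageList` is a hypothetical stand-in and makes no claim about how PDFSyntax's `Doc` is actually implemented.

```python
class PageList:
    """Toy stand-in for a document seen as a list of pages; NOT pdfsyntax's Doc."""

    def __init__(self, pages):
        self.pages = tuple(pages)

    def __len__(self):
        # Enables len(doc): the number of pages.
        return len(self.pages)

    def __getitem__(self, key):
        # Slices and single indexes both return a new PageList,
        # so the result is still a "document".
        if isinstance(key, slice):
            return PageList(self.pages[key])
        return PageList((self.pages[key],))

    def __add__(self, other):
        # Enables doc1 + doc2: pages of other come after pages of self.
        if not isinstance(other, PageList):
            return NotImplemented
        return PageList(self.pages + other.pages)


doc1 = PageList(["p0", "p1", "p2", "p3", "p4", "p5"])
doc2 = PageList(["q0"])

last_3_pages = doc1[3:]   # comparable to keeping pages {3, 4, 5}
merged = doc1 + doc2      # comparable to concatenating two documents
```

Returning a new object from every operator keeps the sketch in line with the documented principle that functions do not mutate their input.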

pdfsyntax/api.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -470,7 +470,8 @@ def cross_map_index(sections, index):
470470
Doc.structure = structure
471471
Doc.get_object = get_object
472472
Doc.obj = obj
473-
Doc.rewind = rewind
473+
#Doc.rewind = rewind
474+
#Doc.commit = commit
474475
Doc.rotate = rotate
475476
Doc.page_layouts = page_layouts
476477
Doc.flat_page_tree = flat_page_tree
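
The assignments in this hunk (`Doc.rotate = rotate` and its neighbours) are what let a plain function double as an instance method, with `doc` playing the role of `self`. A minimal, self-contained sketch of that pattern follows; the `Doc` dataclass and this simplified `rotate` are hypothetical and only mimic the shape of the real API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    """Toy document; NOT pdfsyntax's Doc class."""
    rotations: tuple  # one rotation angle (in degrees) per page


def rotate(doc, degrees=90, pages=None):
    """Return a new Doc with the selected pages rotated; the input is untouched."""
    targets = range(len(doc.rotations)) if pages is None else pages
    angles = list(doc.rotations)
    for i in targets:
        angles[i] = (angles[i] + degrees) % 360
    return replace(doc, rotations=tuple(angles))


# Attaching the plain function to the class makes it callable as a method:
# the instance is passed as the first argument, here named `doc`.
Doc.rotate = rotate

doc = Doc(rotations=(0, 0, 90))
by_function = rotate(doc, 180, [0, 1])
by_method = doc.rotate(180, [0, 1])
```

Under this pattern, commenting out an assignment (as the hunk does for `rewind` and `commit`) removes only the method form; the plain function stays available.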
