|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "No, PDFs really do suck!" |
| 4 | +date: 2009-06-17 |
| 5 | +blogger-link: https://chem-bla-ics.blogspot.com/2009/06/no-pdfs-really-do-suck.html |
| 6 | +doi: 10.59350/dv8xh-5dk63 |
| 7 | +tags: publishing |
| 8 | +--- |
| 9 | + |
| 10 | +A typical blog post by Peter MR, [The ICE-man: Scholary HTML not PDF](http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2102), |
| 11 | +made the point (again) that PDF is to data what a hamburger is to a cow, in reply to a blog post by Peter SF, [Scholarly HTML](http://ptsefton.com/2009/06/11/trip-report-visit-to-microsoft.htm#id3). |
| 12 | + |
| 13 | +This led to a [discussion on FriendFeed](http://friendfeed.com/petermr/767254d7/ice-man-scholary-html-not-pdf). |
| 14 | +A couple of misconceptions: |
| 15 | + |
| 16 | +**"But how are we going to cite without paaaaaaaaaaaage nuuuuuuuuuuumbers?"**<br /> |
| 17 | +We don't. Many online-only journals can do without; there is DOI. And if that is not enough, the legal business has means of |
| 18 | +identifying paragraphs, etc, which should provide us with all the methods we could possibly need in science. |
| 19 | + |
| 20 | +**Typesetting of PDFs, in most journals, is superior than HTML, which is why I prefer to read a PDF version if it is available. It is nicer to the eyes.**<br /> |
| 21 | +Ummm... this is supposed to be Science, not a California Glossy. It seems that |
| 22 | +[pretty looks are causing a major body count](http://shirleywho.wordpress.com/2009/05/11/an-open-letter-to-oprah/) in |
| 23 | +the States. Besides, HTML+CSS can likely beat the pretty looks of any PDF, or at least match them. |
| 24 | + |
| 25 | +**As I seem to be the only physicist/mathematician who comments on these sort of things, I feel like a broken record, |
| 26 | +but math support in browsers currently sucks extremely badly and this is a primary reason why we will continue to use |
| 27 | +PDF for quite some time.**<br /> |
| 28 | +HTML+[MathML](http://www.w3.org/Math/) is well established, and default Firefox browsers have no problem showing mathematical |
| 29 | +equations. The [Blue Obelisk](http://en.wikipedia.org/wiki/Blue_Obelisk) [QSAR descriptor ontology](http://qsar.sourceforge.net/dicts/qsar-descriptors/index.xhtml) |
| 30 | +has been using such a setup for years. If you use TeX to author your equations, you can |
| 31 | +[convert them to HTML](http://silas.psfc.mit.edu/mathmltalk/) too. |
| 32 | + |
| 33 | +**We can mine the data from the PDF text.**<br /> |
| 34 | +Theoretically, yes. Practically, it is money down the drain. PDF is particularly nasty here, as it breaks words at the end of a line, |
| 35 | +and can even store a word as an unlinked series of characters positioned at (x,y). PDF can contain a lot of metadata, but that is |
| 36 | +merely a hack and an unneeded workaround. Worse, it is hardly used for chemistry. PDF can contain PNG images, which in turn can |
| 37 | +contain CML; the tools are there, but they are not used, and there are more efficient technologies anyway. |
| 38 | + |
| 39 | +I, for one, agree with Peter on PDF: it really sucks as a scientific communication medium. |