
Commit 5e0e99d

ooops
1 parent 3cee1e1 commit 5e0e99d

1 file changed

Lines changed: 1 addition & 1 deletion

Lacy/2025-09-02_berlin.html

@@ -1 +1 @@
1-
<!DOCTYPE HTML><html xmlns="http://www.w3.org/1999/xhtml" xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xml:lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>"The most perfect edition of plays ever published": the Digital Lacy project</title><meta name="generator" content="Generated by TEISLIDY stylesheet"><script src="https://www.w3.org/Talks/Tools/Slidy/slidy.js" type="text/javascript"></script><link rel="stylesheet" type="text/css" media="screen, projection" href="https://www.w3.org/Talks/Tools/Slidy/show.css"><link href="../css/egXMLhandling.css" rel="stylesheet" type="text/css"><link href="../css/ODD.css" rel="stylesheet" type="text/css"/><link href="../css/tei.css" rel="stylesheet" type="text/css"></head><body class="simple" id="TOP"><div class="slide cover"><img src="media/theBestest.png" width="70%" style="float:center" alt="[Put logo here]" class="cover"><br clear="all"><h1>"The most perfect edition of plays ever published": the Digital Lacy project</h1><p>Lou Burnard (Independent Scholar)</p></div><div class="slide"><div class="frame"><div class="col"><figure class="figure"><img src="media/THLacy.jpg" alt="Thomas Hailes Lacy (1809-1873)" class="graphic" style=" width:50%; height:55%;"><h2>Thomas Hailes Lacy (1809-1873)</h2></figure></div><div class="col2"><ul><li class="item">Lacy was the leading theatrical publisher of "Acting Editions" -- practical working documents printed at 6d a copy for individual titles, or 5s for a bound volume of 15 titles. 
</li><li class="item">Between 1848 and 1873, his <span class="titlem">Lacy's Acting Edition of Plays</span>, grew to contain 100 volumes of 15 titles each: it was sold across the globe, and made him a reasonable fortune.</li><li class="item">The LAE is a unique sample, apparently covering the full range of Victorian Theatrical presentations</li><li class="item">The population it samples approximates to the titles listed in vols 4 and 5 of Allardyce Nicoll's magisterial <span class="titlem">History of English Drama</span> -- c. 24,000 distinct titles performed between 1800 and 1900.</li></ul></div></div></div><div class="slide"><h2>Research question: how representative is the LAE ?</h2><p>A corpus is a sample, hopefully representative of a known population. Initial comparisons between the LAE and Allardyce Nicoll's <span class="titlem">Handlist</span>s suggest distributions of size, age, and mode are comparable.</p><figure class="figure"><img src="media/agebyvol.png" alt="First performance dates by volume" class="graphic"><h2>First performance dates by volume</h2></figure><p class="box">"It is hard to avoid the conclusion that Lacy astutely leavened the mix for each volume, using mainly contemporary titles to complement the old favourites." (cf. <a class="link_ref" href="https://doi.org/10.58079/140cy">How old are these plays?</a>)</p></div><div class="slide"><h2>Digital Lacy project</h2><ul><li class="item">Builds on and expands data from Richard Pearson's <span class="titlem">Victorian Plays Project</span> (VPP), AHRB funded 2005-2007</li><li class="item"><ul><li class="item">The VPP produced a catalogue of the LAE, along with c. 
15,000 page images from a copy held at Birmingham Library</li><li class="item">These were OCRd, proofed, and made available in a PDF format as visually faithful as possible to the original</li><li class="item">By 2014, the project had processed 340 titles which were distributed from a site at the University of Galway</li><li class="item">Following Pearson's death in 2018, the project was frozen; by June 2022 the website had disappeared...</li></ul></li><li class="item">In 2022, with the aid of researchers who worked on the project, I recovered most of the resources it had created and transferred them to a github repository, where I continue to work on them</li><li class="item">Digital Lacy now combines : <ul><li class="item">a detailed and expanding set of metadata relating to the LAE and its authors, enhanced with links to available digital versions</li><li class="item">a slowly increasing number of TEI-XML transcripts</li></ul></li></ul><p>Proto-website at <a class="link_ref" href="http://lb42.github.io/Lacy/">http://lb42.github.io/Lacy</a></p></div><div class="slide"><div class="frame"><div class="col"><h2>Current workflow</h2><figure class="figure"><img src="media/workflow-1.png" alt="" class="graphic" style=" width:80%;"></figure></div><div class="col2"><ul><li class="item">Goal is consistent minimal encoding of a known source edition</li><li class="item"><ul><li class="item">VPP-PDF to Docx (OCR by Abby, thanks Huma-num)</li><li class="item">DocX to TEI-All (XSLT by TEI)</li><li class="item">TEI-All to Lacy XML (homegrown XSLT scripts)</li></ul></li><li class="item">Minimal markup, largely ignoring visual salience</li><li class="item">TEI schema defined by ODD very close to dracor-schema </li></ul><p>Impossible without manual intervention: this is the main bottleneck in current workflow.</p></div></div></div><div class="slide"><h2>DraCor vs Lacy: how close ?</h2><p>DraCor and Lacy have a few ideological differences...</p><ul><li class="item">In DraCor 
metadata, the digital version is primary, any source version being nested within it; in Lacy, that hierarchy is reversed. </li><li class="item">Some DraCor metadata (notably performances and identifiers) is relegated to a <span class="gi">&lt;standOff&gt;</span>; in Lacy it is imbricated in the TEI Header</li><li class="item">DraCor uses explicit scene divisions to define stage-presence, as the basis for its network analysis; a quarter of Lacy titles don't have scene divisions.</li><li class="item">Lacy uses many of the available TEI tags for the front matter of a play ; DraCor largely ignores this (for example, Lacy makes explicit that role is gendered, which is not supported by DraCor ODD)</li><li class="item">DraCor makes no attempt to support metadata such as editorial correction, variant readings, modifications for performance, etc; Lacy should, but doesn't. </li></ul><p class="box">However - the DraCor team is very responsive and helpful !</p></div><div class="slide"><h2>Just a few tagging headaches </h2><p>These texts are full of phenomena which break or strain the simple OHCO model... 
</p><ul><li class="item">speaker may be implicit</li><li class="item">speaker may be multiple person</li><li class="item">musical numbers (<span class="gi">&lt;spGrp&gt;</span>) don't tesselate and may self-nest</li><li class="item">available metadata may be missing, uncertain, inconsistent or just wrong </li></ul></div><div class="slide"><h2>Implied speaker</h2><figure class="figure"><img src="media/impliedSpkr.png" alt="" class="graphic"><img src="media/impliedSpkrX.png" alt="" class="graphic"></figure></div><div class="slide"><h2>Speeches assigned to multiple speakers</h2><figure class="figure"><img src="media/egMult.png" alt="" class="graphic" style=" width:70%;"><img src="media/egMultx.png" alt="" class="graphic" style=" width:60%;"></figure><p>(<span class="gi">&lt;stage&gt;</span> not currently permitted within <span class="gi">&lt;speaker&gt;</span>)</p></div><div class="slide"><div class="frame"><div class="col"><h2>Nesting of simultaneous speech or song</h2><figure class="figure"><img src="media/parallelExample.png" alt="" class="graphic" style=" width:65%;"></figure></div><div class="col2"><figure class="figure"><img src="media/tyrolienne.png" alt="" class="graphic" style=" width:45%;"></figure><p>The whole dance (the Tyrolienne) is contained by a <span class="gi">&lt;spGrp&gt;</span> element which contains two nested <span class="gi">&lt;spGrp&gt;</span> elements, each containing two <span class="gi">&lt;sp&gt;</span> elements to be performed in parallel. (See also <a class="link_ref" href="https://github.com/TEIC/TEI/issues/2695">TEI Issue 2695</a>)</p></div></div></div><div class="slide"><h2>Tentative suggestions and conclusions</h2><ul><li class="item">TEI conformance is crucial to the interoperability of DraCor corpora. The DraCor profile/ODD should specify which parts of the TEI model are mandatory, desirable, permissible, unsupported ... 
</li><li class="item">The documentation provided by the DraCor ODD is good, but could be improved: more examples and more discussion of edge cases would be useful ; as would simple tutorial guides showing how to use DraCor-conformant corpora with a variety of tools (not only python, plz)</li><li class="item">Provide a forum for corpus creators to compare methods and tools, and to discuss possible solutions to common encoding problems</li><li class="item">Encourage corpus creators to facilitate gap-filling in DraCor coverage, e.g. ECCO</li></ul></div></body></html>
1+
<!DOCTYPE HTML><html xmlns="http://www.w3.org/1999/xhtml" xmlns:t="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xml:lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>"The most perfect edition of plays ever published": the Digital Lacy project</title><meta name="generator" content="Generated by TEISLIDY stylesheet"><script src="https://www.w3.org/Talks/Tools/Slidy/slidy.js" type="text/javascript"></script><link rel="stylesheet" type="text/css" media="screen, projection" href="https://www.w3.org/Talks/Tools/Slidy/show.css"><link href="../css/egXMLhandling.css" rel="stylesheet" type="text/css"><link href="../css/ODD.css" rel="stylesheet" type="text/css"><link href="../css/tei.css" rel="stylesheet" type="text/css"></head><body class="simple" id="TOP"><div class="slide cover"><img src="media/theBestest.png" width="70%" style="float:center" alt="[Put logo here]" class="cover"><br clear="all"><h1>"The most perfect edition of plays ever published": the Digital Lacy project</h1><p>Lou Burnard (Independent Scholar)</p></div><div class="slide"><div class="frame"><div class="col"><figure class="figure"><img src="media/THLacy.jpg" alt="Thomas Hailes Lacy (1809-1873)" class="graphic" style=" width:50%; height:55%;"><h2>Thomas Hailes Lacy (1809-1873)</h2></figure></div><div class="col2"><ul><li class="item">Lacy was the leading theatrical publisher of "Acting Editions" -- practical working documents printed at 6d a copy for individual titles, or 5s for a bound volume of 15 titles. 
</li><li class="item">Between 1848 and 1873, his <span class="titlem">Lacy's Acting Edition of Plays</span>, grew to contain 100 volumes of 15 titles each: it was sold across the globe, and made him a reasonable fortune.</li><li class="item">The LAE is a unique sample, apparently covering the full range of Victorian Theatrical presentations</li><li class="item">The population it samples approximates to the titles listed in vols 4 and 5 of Allardyce Nicoll's magisterial <span class="titlem">History of English Drama</span> -- c. 24,000 distinct titles performed between 1800 and 1900.</li></ul></div></div></div><div class="slide"><h2>Research question: how representative is the LAE ?</h2><p>A corpus is a sample, hopefully representative of a known population. Initial comparisons between the LAE and Allardyce Nicoll's <span class="titlem">Handlist</span>s suggest distributions of size, age, and mode are comparable.</p><figure class="figure"><img src="media/agebyvol.png" alt="First performance dates by volume" class="graphic"><h2>First performance dates by volume</h2></figure><p class="box">"It is hard to avoid the conclusion that Lacy astutely leavened the mix for each volume, using mainly contemporary titles to complement the old favourites." (cf. <a class="link_ref" href="https://doi.org/10.58079/140cy">How old are these plays?</a>)</p></div><div class="slide"><h2>Digital Lacy project</h2><ul><li class="item">Builds on and expands data from Richard Pearson's <span class="titlem">Victorian Plays Project</span> (VPP), AHRB funded 2005-2007</li><li class="item"><ul><li class="item">The VPP produced a catalogue of the LAE, along with c. 
15,000 page images from a copy held at Birmingham Library</li><li class="item">These were OCRd, proofed, and made available in a PDF format as visually faithful as possible to the original</li><li class="item">By 2014, the project had processed 340 titles which were distributed from a site at the University of Galway</li><li class="item">Following Pearson's death in 2018, the project was frozen; by June 2022 the website had disappeared...</li></ul></li><li class="item">In 2022, with the aid of researchers who worked on the project, I recovered most of the resources it had created and transferred them to a github repository, where I continue to work on them</li><li class="item">Digital Lacy now combines : <ul><li class="item">a detailed and expanding set of metadata relating to the LAE and its authors, enhanced with links to available digital versions</li><li class="item">a slowly increasing number of TEI-XML transcripts</li></ul></li></ul><p>Proto-website at <a class="link_ref" href="http://lb42.github.io/Lacy/">http://lb42.github.io/Lacy</a></p></div><div class="slide"><div class="frame"><div class="col"><h2>Current workflow</h2><figure class="figure"><img src="media/workflow-1.png" alt="" class="graphic" style=" width:80%;"></figure></div><div class="col2"><ul><li class="item">Goal is consistent minimal encoding of a known source edition</li><li class="item">VPP texts: <ul><li class="item">VPP-PDF to Docx (OCR by Abby, thanks Huma-num)</li><li class="item">DocX to TEI-All (XSLT by TEI)</li><li class="item">TEI-All to Lacy XML (homegrown XSLT scripts)</li></ul></li><li class="item">Minimal markup, largely ignoring visual salience</li><li class="item">TEI schema defined by ODD very close to dracor-schema </li></ul><p>Impossible without manual intervention: this is the main bottleneck in current workflow.</p></div></div></div><div class="slide"><h2>DraCor vs Lacy: how close ?</h2><p>DraCor and Lacy have a few ideological differences...</p><ul><li class="item">In 
DraCor metadata, the digital version is primary, any source version being nested within it; in Lacy, that hierarchy is reversed. </li><li class="item">Some DraCor metadata (notably performances and identifiers) is relegated to a <span class="gi">&lt;standOff&gt;</span>; in Lacy it is imbricated in the TEI Header</li><li class="item">DraCor uses explicit scene divisions to define stage-presence, as the basis for its network analysis; a quarter of Lacy titles don't have scene divisions.</li><li class="item">Lacy uses many of the available TEI tags for the front matter of a play ; DraCor largely ignores these. </li><li class="item">DraCor makes no attempt to support metadata such as editorial correction, variant readings, modifications for performance, etc; Lacy should, but doesn't. </li></ul><p class="box">However - the DraCor team is very responsive and helpful !</p></div><div class="slide"><h2>Tagging headaches are another persistent challenge </h2><p>These texts are full of phenomena which break or strain the simple OHCO model... 
</p><ul><li class="item">speaker may be implicit or multiple</li><li class="item">musical numbers (<span class="gi">&lt;spGrp&gt;</span>) don't tesselate and may self-nest</li><li class="item">metadata may be missing, uncertain, inconsistent or just wrong </li></ul><p>For example ....</p></div><div class="slide"><h2>Implied speaker</h2><figure class="figure"><img src="media/impliedSpkr.png" alt="" class="graphic"><img src="media/impliedSpkrX.png" alt="" class="graphic"></figure></div><div class="slide"><h2>Speeches assigned to multiple speakers</h2><figure class="figure"><img src="media/egMult.png" alt="" class="graphic" style=" width:70%;"><img src="media/egMultx.png" alt="" class="graphic" style=" width:60%;"></figure><p>(<span class="gi">&lt;stage&gt;</span> not currently permitted within <span class="gi">&lt;speaker&gt;</span>)</p></div><div class="slide"><div class="frame"><div class="col"><h2>Nesting of simultaneous speech or song</h2><figure class="figure"><img src="media/parallelExample.png" alt="" class="graphic" style=" width:65%;"></figure></div><div class="col2"><figure class="figure"><img src="media/tyrolienne.png" alt="" class="graphic" style=" width:45%;"></figure><p>The whole dance (the Tyrolienne) is contained by a <span class="gi">&lt;spGrp&gt;</span> element which contains two nested <span class="gi">&lt;spGrp&gt;</span> elements, each containing two <span class="gi">&lt;sp&gt;</span> elements to be performed in parallel. (See also <a class="link_ref" href="https://github.com/TEIC/TEI/issues/2695">TEI Issue 2695</a>)</p></div></div></div><div class="slide"><h2>Tentative suggestions and conclusions</h2><ul><li class="item">TEI conformance is crucial to the interoperability of DraCor corpora. The DraCor profile/ODD should specify which parts of the TEI model are mandatory, desirable, permissible, unsupported ... 
</li><li class="item">The documentation provided by the DraCor ODD is good, but could be improved: more examples and more discussion of edge cases would be useful ; as would simple tutorial guides showing how to use DraCor-conformant corpora with a variety of tools (not only python, plz)</li><li class="item">Provide a forum for corpus creators to compare methods and tools, and to discuss possible solutions to common encoding problems</li><li class="item">Encourage corpus creators to facilitate gap-filling in DraCor coverage, e.g. ECCO</li></ul></div></body></html>
