testing maps #37

@djbpitt

The following test succeeds when run from the command line using XSpec pulled from the master branch of the GitHub repo (latest commit 4c50caa; Saxon-HE 9.9.1.4), but fails in <oXygen/> 21.1, build 2019101513, with version 1.2.1 of the XSpec add-on (in both the Framework and Helper views). The error message, from Ant, reads "FOTY0013: Maps cannot be atomized". The test is:

<x:scenario label="Scenario for testing function tokenize_input with two one-word inputs">
    <x:call function="djb:tokenize_input">
        <x:param name="s1" select="'kittens'"/>
        <x:param name="s2" select="'sitting'"/>
    </x:call>
    <x:expect label="Two one-word inputs are okay"
        select='            
        map{
        "left":("s","i","t","t","i","n","g"),
        "top":("k","i","t","t","e","n","s"),
        "type":"characters"
        }'
    />
</x:scenario>

and the function is:

<xsl:function name="djb:tokenize_input" as="map(xs:string, item()+)">
    <xsl:param name="top" as="xs:string"/>
    <xsl:param name="left" as="xs:string"/>
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- validate input                                        -->
    <!-- no null strings                                       -->
    <!-- both strings must be either single- or multiple-word  -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- normalize whitespace first                            -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <xsl:variable name="top_n" as="xs:string" select="normalize-space($top)"/>
    <xsl:variable name="left_n" as="xs:string" select="normalize-space($left)"/>

    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- check for mismatched parameter types                  -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <xsl:if test="(string-length($top_n), string-length($left_n)) = 0">
        <xsl:message select="'Null strings are not permitted'" terminate="yes"/>
    </xsl:if>
    <xsl:if
        test="
        not(
        (matches($top_n, '\s') and matches($left_n, '\s'))
        or
        not(matches($top_n, '\s')) and not(matches($left_n, '\s'))
        )">
        <xsl:message
            select="'Either both strings must be single words or both strings must be multiple words'"
            terminate="yes"/>
    </xsl:if>


    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- split the inputs                                      -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <xsl:variable name="top_out" as="xs:string+"
        select="
        if (matches($top_n, '\s')) then
        tokenize($top_n, '\s+')
        else
        for $c in string-to-codepoints($top_n)
        return
        codepoints-to-string($c)"/>
    <xsl:variable name="left_out" as="xs:string+"
        select="
        if (matches($left_n, '\s')) then
        tokenize($left_n, '\s+')
        else
        for $c in string-to-codepoints($left_n)
        return
        codepoints-to-string($c)"/>

    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- are we returning characters or words?                 -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <xsl:variable name="input_type" as="xs:string+"
        select="
        if (matches($top_n, '\s')) then
        'words'
        else
        'characters'"/>

    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <!-- return tokenized sequences and type in map            -->
    <!-- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* -->
    <xsl:sequence
        select="
        map {
        'top': $top_out,
        'left': $left_out,
        'type': $input_type
        }"
    />
</xsl:function>

Synopsis: The function accepts either two one-word inputs or two multi-word inputs (each passed as a single string; multi-word inputs are identified by internal whitespace). With one-word inputs, it splits each string into single-character strings. With multi-word inputs, it tokenizes each input into words on whitespace. It returns a map with three entries: the token sequences for the two inputs and a report on whether it split into characters or words. It should trap any input that isn't two single words or two multi-word strings, either as a type error on the parameter or through checks performed inside the function.
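
To illustrate the multi-word branch described in the synopsis, here is a sketch of a companion scenario. It is not part of my actual test suite: the input strings and labels are made up for illustration, and it assumes the same enclosing x:description (with the x: and djb: namespace bindings) as the scenario above. Because it uses the same map-valued select, I would expect it to behave the same way as the original test in each environment, but I have not run it.

```xml
<!-- Hypothetical companion scenario: two multi-word inputs.              -->
<!-- Parameters are passed positionally: first is "top", second "left".  -->
<x:scenario label="Scenario for testing function tokenize_input with two multi-word inputs">
    <x:call function="djb:tokenize_input">
        <x:param name="s1" select="'the quick'"/>
        <x:param name="s2" select="'a brown fox'"/>
    </x:call>
    <!-- Both inputs contain whitespace, so each is tokenized on '\s+' -->
    <!-- and the type entry reports "words".                           -->
    <x:expect label="Two multi-word inputs are okay"
        select='
        map{
        "left":("a","brown","fox"),
        "top":("the","quick"),
        "type":"words"
        }'
    />
</x:scenario>
```

The two inputs deliberately have different word counts, since the function only requires that both be multi-word, not that they be the same length.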
