Skip to content

Improve regular expressions per Bruce Rennie bug report.#536

Merged
Jafaral merged 2 commits intouniconproject:masterfrom
cjeffery:master
Nov 16, 2025
Merged

Improve regular expressions per Bruce Rennie bug report.#536
Jafaral merged 2 commits intouniconproject:masterfrom
cjeffery:master

Conversation

@cjeffery
Copy link
Contributor

@cjeffery cjeffery commented Oct 1, 2025

I have tweaked unigram.y (commit includes unigram.icn and unigram.u for bootstrapping purposes) and tree.icn to handle some regular expressions that were previously failing as reported by Bruce Rennie.

@Jafaral
Copy link
Member

Jafaral commented Oct 1, 2025

@brucerennie do you have any comments to add to this PR?

@brucerennie
Copy link
Collaborator

I will test it against my local copy of the public release of Unicon.

@Jafaral
Copy link
Member

Jafaral commented Oct 6, 2025

@brucerennie, let me know if you have any feedback, or if we should move ahead and merge it.

@brucerennie
Copy link
Collaborator

Jafar,

I just have to find the test programs I was using at the time and test against these changes. Should be done in the next two days.

@brucerennie
Copy link
Collaborator

brucerennie commented Oct 7, 2025

Jafar,

The changes made above allow the code as follows to compile:

procedure main()
  local identifier1, integ1, integerd1, reallit1, radint1, numdie1

  identifier1 := <[a-zA-Z0-9_]>

end

However, when attempting to execute the larger program from which this was extracted as below:

procedure main()
  local identifier1, integ1, integerd1, reallit1, radint1, numdie1

  numdie1 := <[1-9][0-9]*[dD][1-9][0-9]*>
  integ1 := <[0-9][0-9]*>
  identifier1 := <[a-zA-Z_][a-zA-Z0-9_]*>
  integerd1 := <[0-9][0-9]*([kKmMgGtTpP]|)>
  reallit1 := <([0-9][0-9]*|)("."|)([0-9][0-9]*|)(([eE]([-+]|)([0-2][0-9][0-9]|"30"[0-8]|[0-9][0-9]|[0-9]))|)>

end

This now compiles and the code generated is 

procedure main();
  local identifier1, integ1, integerd1, reallit1, radint1, numdie1;

  numdie1 := pattern_concat(Any('123456789'),  pattern_concat((Arbno('0123456789')),  pattern_concat(Any('dD'),  pattern_concat(Any('123456789'), (Arbno('0123456789'))))));
  integ1 := pattern_concat(Any('0123456789'), (Arbno('0123456789')));
  identifier1 := pattern_concat(Any('ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'), (Arbno('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_')));
  integerd1 := pattern_concat(Any('0123456789'),  pattern_concat((Arbno('0123456789')),  pattern_alternate(Any('kKmMgGtTpP'),emptyregex)));
  reallit1 := pattern_concat( pattern_alternate( pattern_concat(Any('0123456789'), (Arbno('0123456789'))),emptyregex),  pattern_concat( pattern_alternate(".",emptyregex),  pattern_concat( pattern_alternate( pattern_concat(Any('0123456789'), (Arbno('0123456789'))),emptyregex),  pattern_alternate( pattern_concat(Any('eE'),  pattern_concat( pattern_alternate(Any('-+'),emptyregex),  pattern_alternate( pattern_concat(Any('012'),  pattern_concat(Any('0123456789'), Any('0123456789'))), pattern_alternate( pattern_concat("30", Any('012345678')), pattern_alternate( pattern_concat(Any('0123456789'), Any('0123456789')),Any('0123456789')))))),emptyregex))));

end

This fails to execute with a runtime error:

test061

Run-time error 127
File test061.icn; Line 7
pattern expected
offending value: &null
Traceback:
   main()
   pattern_alternate(pattern,&null) from line 7 in test061.icn

The problem appears to be the unitialised variable emptyregex. If this is changed to "" (the empty string), there is no runtime error found.

@cjeffery
Copy link
Contributor Author

cjeffery commented Nov 10, 2025

I confirm Bruce's report. I don't recall whether emptyregex was a pure figment of my imagination, but it functions as one here and its semantics is simple enough to infer. Probably I believed the pattern type already had an epsilon of some kind and we could look it up and use it instead of the empty string here. But the empty string would seem to be OK. At least, I confirm that it removes the runtime error as per Bruce's comment.

I don't know enough about git to edit an existing pull request. I don't know whether to accept this pull request and then put in another, very small pull request to change emptyregex to "\"\"" in unigram.y (with corresponding changes to unigram.icn and unigram.y), or to reject this pull request and submit a new one. At the moment my corrected regex nonterminal in unigram.y adds a small comment and looks like:

/* The empty string "\"\"", denoting "" in the generated code, serves as an empty regex */
regex: neregex { $$ := regexp($1) }
| { $$ := "\"\"" }
;

@Jafaral
Copy link
Member

Jafaral commented Nov 11, 2025

@cjeffery you opened the PR from the master branch from your fork. It is fine for this PR, but it is good to create a different branch for every feature/fix you add in the future.

To add changes to this PR. Make those changes at your end and commit to the same branch you opened this PR from (master in this case). After that push the update of the branch to your fork. That will trigger an update to this PR.

@Jafaral Jafaral merged commit 8533518 into uniconproject:master Nov 16, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants