Description
The new regex_capture()
and regex_captures()
functions are great, but often you have to call regex_capture()
multiple times to extract different capture group from the same capture.
For example:
select
regex_capture("('ID(?P<id>\d+)Y(?P<year>\d+)ABC')", "ID123Y2023ABC", "id") as id,
regex_capture("('ID(?P<id>\d+)Y(?P<year>\d+)ABC')", "ID123Y2023ABC", "year") as year;
The regex_captures
make this a bit better, but it's still awkward and you repeat yourself a lot.
Possible solution: A "template" virtual table
A new virtual table that can take in a regex pattern as input, and infer column names based on the pattern's capture group names.
For example:
create virtual table parse_id using regex_captures_template(
'ID(?P<id>\d+)Y(?P<year>\d+)ABC'
);
Would create a parse_id
table function with id
and year
as columns. Then it can be used like so:
select id, year from parse_id('ID123Y2023ABC');
select
items.rowid,
id,
year
from items join parse_id(items.code);
This is the first "template" virtual table that I have seen, besides maybe the define function (but isn't arbitrary code execution).
Also not sure what to do with capture groups without names. Maybe c0
, c1
as column names for those? What about if a capture group has c0
as a name? Probably not too terrible tbh...