internal package
Foswiki::Query::HoistREs
Static functions to extract regular expressions from queries. The REs can
be used in caching stores that use the Foswiki standard inline meta-data
representation to pre-filter topic lists for more efficient query matching.
See Store/QueryAlgorithms/BruteForce.pm for an example of usage.
Note that this hoisting is very crude. At present the functions
don't attempt anything complicated, such as re-ordering
the query. They simply hoist up expressions on either side of an AND,
where the expressions apply to a single domain.
The ideal would be to rewrite the query for AND/OR evaluation, i.e. as an
expression of the form (A and B) or (C and D). However, this is
complicated by the fact that there are three search domains (the web
name, the topic name, and the topic text) that may be freely
intermixed in the query, but cannot be mixed in the generated search
expressions. The problem becomes one of rewriting the query to
separate these three sets. For example, a query such as:
name='Topic' OR Field='maes' OR web='Trash'
requires three searches. We have to filter on name='Topic', filter
separately on Field='maes' and on web='Trash', and then union the three sets.
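The union described above can be illustrated with a minimal Python sketch. This is not Foswiki code: the topic records, their keys, the text layout, and the `matches` helper are all hypothetical, chosen only to show why each domain needs its own search before the results are combined.

```python
import re

# Hypothetical topic records. Each has a web name, a topic name, and raw text
# that carries inline meta-data. The text layout is illustrative only.
TOPICS = [
    {"web": "Main", "name": "Topic", "text": "plain text"},
    {"web": "Main", "name": "Other", "text": 'Field="maes"'},
    {"web": "Trash", "name": "Junk", "text": "deleted"},
    {"web": "Main", "name": "Unrelated", "text": "nothing here"},
]


def matches(domain, pattern):
    """Return the (web, name) pairs whose `domain` value matches `pattern`."""
    regex = re.compile(pattern)
    return {(t["web"], t["name"]) for t in TOPICS if regex.search(t[domain])}


# One search per domain: a single regex cannot span web, name and text.
by_name = matches("name", r"^Topic$")
by_field = matches("text", r'Field="maes"')
by_web = matches("web", r"^Trash$")

# An OR across domains becomes a set union of the per-domain results.
result = by_name | by_field | by_web
```

Only the `Unrelated` topic is excluded, because it fails all three searches.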
This gets complicated when the sets are intermixed; for example,
(name='Topic' OR Field='maes') AND (web='Trash' OR Maes="field")
Because the field terms (Field= and Maes=) on each side of the AND could potentially
match any topic, we can't usefully hoist the name= or web= sub-terms.
We can, however, hoist the Field subqueries. Now, what happens when we
have an expression like this?
(name='Topic' OR Field='maes') AND (web='Trash')
Obviously we can pre-filter on the web='Trash' term, but we can't
filter on name='Topic' because it is part of an OR.
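The rule used in this last example (hoist terms that sit directly under an AND and touch a single domain, and skip an OR that mixes domains) can be sketched in Python as follows. This is a toy model under stated assumptions, not the module's implementation: the tuple-based query tree, the `domains`, `has_and`, `to_regex` and `hoist` helpers, and the simplified regexes (just the escaped value) are all hypothetical.

```python
import re

# Toy query tree (hypothetical): ("eq", domain, value), ("and", l, r), ("or", l, r).


def domains(node):
    """Return the set of search domains a (sub)query touches."""
    if node[0] == "eq":
        return {node[1]}
    return domains(node[1]) | domains(node[2])


def has_and(node):
    """True if the (sub)query contains an AND anywhere."""
    if node[0] == "eq":
        return False
    return node[0] == "and" or has_and(node[1]) or has_and(node[2])


def to_regex(node):
    """Build one regex for a single-domain sub-query that contains no AND."""
    if node[0] == "eq":
        return re.escape(node[2])
    return "(?:%s|%s)" % (to_regex(node[1]), to_regex(node[2]))


def hoist(node, out=None):
    """Collect per-domain regex lists from the terms joined by AND."""
    out = {} if out is None else out
    if node[0] == "and":
        hoist(node[1], out)
        hoist(node[2], out)
    elif len(domains(node)) == 1 and not has_and(node):
        (domain,) = domains(node)
        out.setdefault(domain, []).append(to_regex(node))
    # Anything else (for example an OR that mixes domains) is left unhoisted.
    return out


# (name='Topic' OR Field='maes') AND (web='Trash')
query = ("and",
         ("or", ("eq", "name", "Topic"), ("eq", "text", "maes")),
         ("eq", "web", "Trash"))
hoisted = hoist(query)  # only the web term is hoisted
```

In this sketch the OR on the left is skipped because it spans the name and text domains, so only the web term ends up in the result.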
If you think I'm making this too complicated, please feel free to
implement your own superior heuristics!
StaticMethod
hoist($query) → \%regex_lists
Main entry point for the hoister.
Returns a reference to a hash whose keys are the aspects to be tested
(web|name|text) and whose values are lists of regexes. Each list holds
the AND terms for that aspect, and each regex in it is one OR term.
There are also keys named "(web|name|text)_source", where the list
contains what the user entered for the corresponding term.
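As an illustration of how a caller might consume a structure of this shape, here is a minimal Python sketch. It is not the Perl API: the `hoisted` dictionary is a hand-written stand-in for the returned hash, and the candidate records and the `prefilter` helper are hypothetical. The only property taken from the description above is that every regex listed for an aspect must match.

```python
import re

# Hand-written stand-in for the returned hash (values are hypothetical).
hoisted = {
    "web": [r"^Trash$"],
    "web_source": ["Trash"],           # what the user entered for the web term
    "name": [r"Test", r"Topic"],       # two AND terms for the name aspect
    "name_source": ["Test", "Topic"],
}

# Hypothetical candidate topics to be pre-filtered.
candidates = [
    {"web": "Trash", "name": "TestTopic", "text": ""},
    {"web": "Trash", "name": "TestPage", "text": ""},
    {"web": "Main", "name": "TestTopic", "text": ""},
]


def prefilter(topics, regex_lists):
    """Keep topics for which every regex of every aspect matches."""
    kept = []
    for topic in topics:
        if all(
            re.search(rx, topic[aspect])
            for aspect in ("web", "name", "text")
            for rx in regex_lists.get(aspect, [])
        ):
            kept.append(topic)
    return kept


survivors = prefilter(candidates, hoisted)
```

Topics that survive the pre-filter still need the full query evaluated against them, since hoisting only narrows the candidate list.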