5 Commits

Author SHA1 Message Date
Abduqodiri Qurbonzoda fb31a29c39 [K/N] Fix stack overflow in regex when a quantifier is matched many times
Motivation:

Users often expect simple patterns, like `[a]+` or `[^a]+`, to be fast
and to work reliably, even on long strings.
The char class in the first pattern matches only 'a', gets wrapped into
a LeafQuantifierSet, and indeed works fine on long strings.
The char class in the second pattern, however, matches any character
except 'a', including supplementary code points, so the number of chars
it consumes is not known beforehand. There is no optimization for such
char classes, and if they are matched many times, the stack memory
gets exhausted.

Modification:

Introduce a FixedLengthQuantifierSet node.
The node represents a quantifier over constructs that consume a fixed
number of characters for a given string and index. Such constructs don't
need backtracking to find a different match. Thus, the node can avoid
recursion when matching multiple times.
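
The idea can be illustrated with a minimal sketch. It is a simplified model, not the actual FixedLengthQuantifierSet code: the function names (`matchNotA`, `previousLength`, `matchQuantified`) and the `next` callback standing in for the rest of the pattern are all hypothetical. It assumes each occurrence is one code point, so its length (1 or 2 chars) is fully determined by the string and the index.

```kotlin
// Simplified, hypothetical model of a non-recursive quantifier; the names below
// are illustrative and are not the real Kotlin/Native regex internals.

/** Length in chars (1 or 2) of a code point other than 'a' at [index], or -1 if none matches. */
fun matchNotA(input: CharSequence, index: Int): Int {
    if (index >= input.length || input[index] == 'a') return -1
    val isPair = input[index].isHighSurrogate() &&
        index + 1 < input.length && input[index + 1].isLowSurrogate()
    return if (isPair) 2 else 1
}

/** Length in chars of the occurrence that ends just before [index], never crossing [start]. */
fun previousLength(input: CharSequence, index: Int, start: Int): Int {
    val isPair = index - start >= 2 &&
        input[index - 1].isLowSurrogate() && input[index - 2].isHighSurrogate()
    return if (isPair) 2 else 1
}

/**
 * Matches `[^a]` at least [min] times, greedily, starting at [start],
 * then hands over to [next] (the rest of the pattern).
 * Returns the end index of the overall match, or -1 if there is none.
 */
fun matchQuantified(input: CharSequence, start: Int, min: Int, next: (Int) -> Int): Int {
    var index = start
    var count = 0
    // Consume as many occurrences as possible in a loop instead of recursing.
    while (true) {
        val length = matchNotA(input, index)
        if (length < 0) break
        index += length
        count++
    }
    // Give occurrences back one at a time until the rest of the pattern matches.
    while (count >= min) {
        val end = next(index)
        if (end >= 0) return end
        if (count == 0) break
        index -= previousLength(input, index, start)
        count--
    }
    return -1
}

fun main() {
    val input = "b".repeat(1_000_000) + "a"
    // Here `next` requires a literal 'a' after the quantified part.
    val end = matchQuantified(input, 0, 1) { i ->
        if (i < input.length && input[i] == 'a') i + 1 else -1
    }
    println(end) // 1000001
}
```

Because the loop keeps only an index and a counter, the stack depth in this sketch stays constant no matter how many times the char class matches.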

Result:

Fixes KT-46211, KT-35508 and probably KT-39789. The reproducer for the
last issue is no longer available, but its error stack trace resembles
those of the other issues.
2022-12-19 16:40:51 +00:00
Abduqodiri Qurbonzoda 1cb5cab28f [K/N and WASM] Fix ESCAPE and COMMENTS modes 2022-04-05 15:21:31 +00:00
Ilya Matveev f7468cf9bc [K/N][stdlib] Support \V, \v, \H, \h, \R in regex engine
Issue #KT-50742 Fixed
2022-02-21 13:20:23 +07:00
Igor Yakovlev d30a4fa4d5 [WASM/Native] Split AllCodePointsTest into two separate tests 2022-02-03 21:25:59 +01:00
Igor Yakovlev d55e16a030 [WASM] Regex std implementation 2021-12-07 21:33:31 +03:00