Strict Text wrapper unnecessarily decodes to Char for text-2.* #280

@sol

Looking at the code:

type AlexInput = (Char,           -- previous char
                  [Byte],         -- pending bytes on current char
                  Data.Text.Text) -- current input string

This can be replaced by:

type AlexInput = (Int,            -- current offset
                  Data.Text.Text) -- input string

alex/data/AlexWrappers.hs, lines 98 to 105 in e65958c:

alexGetByte :: AlexInput -> Maybe (Byte,AlexInput)
alexGetByte (c,(b:bs),s) = Just (b,(c,bs,s))
alexGetByte (_,[],s) = case Data.Text.uncons s of
    Just (c, cs) ->
      case utf8Encode' c of
        (b, bs) -> Just (b, (c, bs, cs))
    Nothing ->
      Nothing

This can be replaced by:

-- Needs Text(..) from Data.Text.Internal and unsafeIndex from Data.Text.Array.
alexGetByte :: AlexInput -> Maybe (Byte,AlexInput)
alexGetByte (cur, input) = case input of
  Text arr off len
    | cur < len -> Just (unsafeIndex arr (off + cur), (cur + 1, input))
    | otherwise -> Nothing

The only "complicated" part is alexInputPrevChar: starting from cur, you step back one byte at a time until you have seen two character boundaries, and then decode the character at that position.
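A minimal sketch of that backward scan, assuming text-2.* (where Text is a UTF-8 byte array) and the two-field AlexInput proposed above. It is only one way to read the description: it steps back from cur - 1 past continuation bytes (10xxxxxx) to the nearest lead byte and decodes by hand from there. The helper names isContinuation, findLead and decodeAt are hypothetical, and returning '\n' at offset 0 is an assumption meant to mirror the wrapper's initial previous character. The input is assumed to be valid UTF-8, so no bounds or validity checks are made.

```haskell
module Main (main) where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import qualified Data.Text as T
import Data.Text.Array (unsafeIndex)
import Data.Text.Internal (Text (..))

-- Proposed input state: byte offset into the text, plus the text itself.
type AlexInput = (Int, Text)

-- UTF-8 continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Character that ends just before byte offset cur
-- (or the one containing byte cur - 1 if cur is mid-character).
alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar (cur, Text arr off _len)
  | cur <= 0  = '\n'  -- assumption: same default as the start of input
  | otherwise = decodeAt (findLead (cur - 1))
  where
    byteAt :: Int -> Word8
    byteAt i = unsafeIndex arr (off + i)

    -- Step back over continuation bytes until a lead byte is found.
    findLead :: Int -> Int
    findLead i
      | i > 0 && isContinuation (byteAt i) = findLead (i - 1)
      | otherwise                          = i

    -- Decode the 1- to 4-byte UTF-8 sequence whose lead byte is at i.
    decodeAt :: Int -> Char
    decodeAt i
      | b0 < 0x80 = chr (w b0)
      | b0 < 0xE0 = chr (((w b0 .&. 0x1F) `shiftL` 6) .|. cont 1)
      | b0 < 0xF0 = chr (((w b0 .&. 0x0F) `shiftL` 12)
                         .|. (cont 1 `shiftL` 6) .|. cont 2)
      | otherwise = chr (((w b0 .&. 0x07) `shiftL` 18)
                         .|. (cont 1 `shiftL` 12)
                         .|. (cont 2 `shiftL` 6) .|. cont 3)
      where
        b0 = byteAt i
        w = fromIntegral :: Word8 -> Int
        cont k = w (byteAt (i + k)) .&. 0x3F

main :: IO ()
main = do
  -- 'a' is 1 byte, U+03BB is 2 bytes, U+20AC is 3 bytes: offsets 0, 1, 3, end 6.
  let input = T.pack "a\955\8364"
  print (map (\i -> alexInputPrevChar (i, input)) [0, 1, 3, 6])
```

Because the scan touches at most four bytes, the cost is constant per call, so dropping the cached previous Char from AlexInput should not make left-context checks noticeably more expensive.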