Quick one-line lexer in Perl for cherry-picking characters: https://gist.github.com/nikosvaggalis/e5ede10ffa9c4384245364c722f944b7 Just add the ordinal value (code point) of each Unicode character you want to replace to the $dispatch table, together with its replacement. For a more automated solution, try Text::Unidecode: https://metacpan.org/pod/Text::Unidecode
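
In case the dispatch-table idea is not obvious, here is a rough sketch of it in Python. This is not the code from the gist; the table entries and the replace_chars name are made up for illustration. The table maps each character's ordinal value to the text that should replace it, and anything not listed passes through untouched.

```python
# Hypothetical dispatch table: ordinal value (code point) -> replacement.
# Add one entry per character you want to cherry-pick.
dispatch = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
    0x2013: "-",  # en dash
    0x00E9: "e",  # e with acute accent
}


def replace_chars(text: str) -> str:
    """Replace only the characters listed in the dispatch table."""
    # str.translate looks up each character's ordinal in the mapping;
    # characters with no entry are left as they are.
    return text.translate(dispatch)


if __name__ == "__main__":
    print(replace_chars("\u201cCaf\u00e9\u201d \u2013 it\u2019s"))
```

The point of doing it this way rather than using a transliteration module is control: only the characters you list get touched, so accented letters you want to keep stay intact.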