Wednesday, March 12, 2014

Digits in Regular Expression

There are two common ways to match a digit with a regular expression: \d and [0-9].
[0-9] matches only the ASCII (Western Arabic) digits, i.e. 0,1,2,3,4,5,6,7,8,9;
\d matches any Unicode decimal digit (in .NET and other Unicode-aware regex engines).

In addition to the Western Arabic numerals, Unicode contains more than 300 digit characters from other writing systems. For example, the Indian (Devanagari) digits ० (0), १ (1), २ (2), etc.
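
The difference is easy to see in a few lines of C#. This is a minimal sketch, assuming .NET's System.Text.RegularExpressions; the sample string and variable names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Devanagari digits for 1, 2, 3 (sample input chosen for illustration).
string devanagari = "१२३";

// \d is Unicode-aware in .NET, so it accepts the Devanagari digits.
bool matchesBackslashD = Regex.IsMatch(devanagari, @"^\d+$");

// [0-9] lists only the ASCII digits, so the same input is rejected.
bool matchesAsciiClass = Regex.IsMatch(devanagari, "^[0-9]+$");

Console.WriteLine(matchesBackslashD);  // True
Console.WriteLine(matchesAsciiClass);  // False
```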

A simple C# script can find all Unicode digits among the first 65,536 characters (the range of a C# char):
0: 0,٠,۰,߀,०,০,੦,૦,୦,௦,౦,೦,൦,๐,໐,༠,၀,႐,០,᠐,᥆,᧐,᭐,᮰,᱀,᱐,꘠,꣐,꤀,꩐,０
1: 1,١,۱,߁,१,১,੧,૧,୧,௧,౧,೧,൧,๑,໑,༡,၁,႑,១,᠑,᥇,᧑,᭑,᮱,᱁,᱑,꘡,꣑,꤁,꩑,１
2: 2,٢,۲,߂,२,২,੨,૨,୨,௨,౨,೨,൨,๒,໒,༢,၂,႒,២,᠒,᥈,᧒,᭒,᮲,᱂,᱒,꘢,꣒,꤂,꩒,２
3: 3,٣,۳,߃,३,৩,੩,૩,୩,௩,౩,೩,൩,๓,໓,༣,၃,႓,៣,᠓,᥉,᧓,᭓,᮳,᱃,᱓,꘣,꣓,꤃,꩓,３
4: 4,٤,۴,߄,४,৪,੪,૪,୪,௪,౪,೪,൪,๔,໔,༤,၄,႔,៤,᠔,᥊,᧔,᭔,᮴,᱄,᱔,꘤,꣔,꤄,꩔,４
5: 5,٥,۵,߅,५,৫,੫,૫,୫,௫,౫,೫,൫,๕,໕,༥,၅,႕,៥,᠕,᥋,᧕,᭕,᮵,᱅,᱕,꘥,꣕,꤅,꩕,５
6: 6,٦,۶,߆,६,৬,੬,૬,୬,௬,౬,೬,൬,๖,໖,༦,၆,႖,៦,᠖,᥌,᧖,᭖,᮶,᱆,᱖,꘦,꣖,꤆,꩖,６
7: 7,٧,۷,߇,७,৭,੭,૭,୭,௭,౭,೭,൭,๗,໗,༧,၇,႗,៧,᠗,᥍,᧗,᭗,᮷,᱇,᱗,꘧,꣗,꤇,꩗,７
8: 8,٨,۸,߈,८,৮,੮,૮,୮,௮,౮,೮,൮,๘,໘,༨,၈,႘,៨,᠘,᥎,᧘,᭘,᮸,᱈,᱘,꘨,꣘,꤈,꩘,８
9: 9,٩,۹,߉,९,৯,੯,૯,୯,௯,౯,೯,൯,๙,໙,༩,၉,႙,៩,᠙,᥏,᧙,᭙,᮹,᱉,᱙,꘩,꣙,꤉,꩙,９

Code

You can verify this Unicode behavior yourself in an online regex tool.
By the way, \d in JavaScript regular expressions does not cover Unicode digits, so there \d is the same as [0-9].
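
In .NET the same ASCII-only behavior can be requested explicitly with RegexOptions.ECMAScript, which restricts \d to [0-9]. A small sketch of how a validation check could use it; the helper name IsAsciiDigits is hypothetical and the inputs are sample values:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the input consists of ASCII digits 0-9.
// RegexOptions.ECMAScript makes \d behave like [0-9], as it does in JavaScript.
static bool IsAsciiDigits(string input) =>
    Regex.IsMatch(input, @"^\d+$", RegexOptions.ECMAScript);

Console.WriteLine(IsAsciiDigits("2014"));  // True
Console.WriteLine(IsAsciiDigits("٢٠١٤"));  // False (Arabic-Indic digits)
```

Writing [0-9] in the pattern has the same effect and is arguably easier to read; the option is mainly useful when an existing pattern full of \d has to be kept as it is.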
Does your e-mail validation regular expression guard against these special Unicode digits?
Or will they end up in your company database? :)

And here is the C# code that collects all the digits:

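using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

var stringBuilder = new StringBuilder();
var digitRegex = new Regex(@"\d");

// Enumerable.Range takes (start, count), so the count must be MaxValue + 1 to include U+FFFF.
var charDigitGroups = Enumerable.Range(Char.MinValue, Char.MaxValue + 1)
                                .Select(i => (char)i)
                                .Where(ch => digitRegex.IsMatch(ch.ToString()))
                                .GroupBy(ch => Char.GetNumericValue(ch));

// One output row per numeric value, e.g. "0: 0,٠,۰,..."
foreach (var charGroup in charDigitGroups)
{
    string joinedValues = String.Join(",", charGroup);
    string rowResult = String.Concat(charGroup.Key.ToString(), ": ", joinedValues);
    stringBuilder.AppendLine(rowResult);
}

Console.WriteLine(stringBuilder.ToString());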

The idea is based on the Turkey Test.
