Picture Source: Chapter illustration created by Madalina Tantareanu for the book Eloquent JavaScript, available under a Creative Commons Attribution-NonCommercial license

A Compelling Use Case for Regex Programming

Ravindra Kondekar


Constructing regexes (regular expressions) has a charm of its own! Regex is tremendously useful in text processing, and many programming languages, text editors and applications support it. Regex programming can save you from writing lengthy code that parses text directly. As you know, less code means fewer bugs, so this is a good idea. Regex helps not only to validate input strings but also to capture substrings and perform substitutions.
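Each of the three uses named above takes only a line or two with Python's standard `re` module. The `key=value` strings here are made-up inputs chosen for illustration. The pattern name `pair` is likewise just for this sketch.

```python
import re

# A made-up format for illustration: lowercase key, '=', then digits
pair = re.compile(r"(?P<key>[a-z]+)=(?P<value>\d+)")

# Validate: does the whole string follow the format?
print(bool(pair.fullmatch("width=42")))   # True
print(bool(pair.fullmatch("width=4x2")))  # False

# Capture: pull the pieces out by group name
m = pair.fullmatch("width=42")
print(m.group("key"), m.group("value"))   # width 42

# Substitute: rewrite every pair in a longer string as "key: value"
print(pair.sub(r"\g<key>: \g<value>", "width=42, height=7"))  # width: 42, height: 7
```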

You will always need a good debugger to test and verify your regex patterns. Various free and freemium online regex debugging tools are available, and I will talk about a couple of them.

The problem:

As a break from the typical regex examples of email IDs and IPv4 addresses, let me take the example of a map naming scheme. National surveying and mapping agencies publish map indexing schemes, which users then rely on to identify and order digital or printed maps. These map names are also used in map-based software applications. The Survey of India has published such a map naming scheme for its Open Series Maps (OSM).

Figure 1 and Figure 2 show two sample strings in the OSM naming scheme:

Figure 1: OSM Coding Scheme F1-F2-F3-F4-F5
Figure 2: OSM Coding Scheme F1-F2-F3-F6

Let’s understand the coding scheme by dividing the string into six fields, as shown in the figures above. Field F1 indicates a zone. Fields F2 to F6 are recursive subdivisions, named alternately with letters and numbers, with F5 indicating the smallest subdivision, as follows:

Referring to Figure 1 above:

  • F1: 3 characters wide, formed from a row letter from B to J followed by a column number from 42 to 47. Some zones fall outside the land and into the sea, so they are invalid; I skip the details of which ones.
  • F2: 1 character wide; valid values are the letters A to X.
  • F3: 2 characters wide; valid values are the numbers 1 to 16, with single-digit numbers prefixed by a 0.
  • F4: 1 character wide; valid values are the letters A to Y.
  • F5: 2 characters wide; valid values are the numbers 1 to 25, with single-digit numbers prefixed by a 0.

Referring to Figures 1 and 2:

  • F6: 2 characters wide, named after the four directions NW, NE, SE and SW, as shown in Figure 2. It is an alternative to the F4-F5 fields.
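Zero-padded numeric ranges such as F3 and F5 are the easiest place to slip. One quick way to check a fragment is to run it against every two-digit string from 00 to 99 and compare the result with the intended range. In this sketch, the fragment names and the helper `accepted` are illustrative, not part of any library. The check shows that a plausible-looking shortcut for F5, `[0-1][1-9]|2[0-5]`, misses exactly one value: 10.

```python
import re

# Candidate fragments for the two numeric fields (two characters, zero-padded)
F3 = r"0[1-9]|1[0-6]"               # intended range: 01 to 16
F5 = r"0[1-9]|1[0-9]|2[0-5]"        # intended range: 01 to 25
SHORTCUT_F5 = r"[0-1][1-9]|2[0-5]"  # looks plausible, but is it?

def accepted(fragment, upper=99):
    """Return the integers 0..upper whose two-digit form the fragment fully matches."""
    rx = re.compile(f"(?:{fragment})")
    return [n for n in range(upper + 1) if rx.fullmatch(f"{n:02d}")]

print(accepted(F3) == list(range(1, 17)))  # True
print(accepted(F5) == list(range(1, 26)))  # True
# Values in 1..25 that the shortcut fails to accept
print(sorted(set(range(1, 26)) - set(accepted(SHORTCUT_F5))))  # [10]
```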

Thus a valid map name is formed by one of the following combinations:

  • Just F1
  • F1, F2 sequence
  • F1, F2, F3 sequence
  • F1, F2, F3, F4 sequence
  • F1, F2, F3, F4, F5 sequence
  • F1, F2, F3, F6 sequence
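The list above says a field may appear only if every field before it is present. In regex terms, that means optional groups nested from the last field outward, with F6 and F4[F5] as alternatives after F3. The sketch below builds the pattern from one fragment per field to make that nesting visible. The variable names are illustrative. The F1 zone list is copied from the pattern in the Solution section. The test names are strings I constructed to follow the rules; they are not taken from the figures or checked against real sheets.

```python
import re

# One fragment per field, following the field rules above
F1 = r"J43|I4[34]|[GH]4[2-7]|F4[2-6]|E4[3-5]|[CD]4[2-4]|[BCD]46"
F2 = r"[A-X]"
F3 = r"0[1-9]|1[0-6]"
F4 = r"[A-Y]"
F5 = r"0[1-9]|1[0-9]|2[0-5]"
F6 = r"[NS][EW]"

# Nest the optional groups from the innermost (last) field outward
tail_f4_f5 = f"(?:{F4})(?:{F5})?"                # F4, optionally followed by F5
tail_f3 = f"(?:{F3})(?:(?:{F6})|{tail_f4_f5})?"  # F3, then either F6 or F4[F5]
tail_f2 = f"(?:{F2})(?:{tail_f3})?"              # F2, then optionally the rest
pattern = re.compile(f"(?:{F1})(?:{tail_f2})?")

examples = {
    "G43": True,          # F1
    "G43A": True,         # F1 F2
    "G43A01": True,       # F1 F2 F3
    "G43A01N": True,      # F1 F2 F3 F4
    "G43A01N10": True,    # F1 F2 F3 F4 F5
    "G43A01NE": True,     # F1 F2 F3 F6
    "G4301": False,       # F3 without F2
    "G43A01NE10": False,  # F5 after F6
    "G43A1": False,       # F3 not zero-padded
}
for name, expected in examples.items():
    assert bool(pattern.fullmatch(name)) is expected, name
print("all combinations behave as expected")
```

Building the pattern this way keeps each field's rule in one place. The single literal pattern in the Solution section has the same nesting, with named groups added for extraction.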

It also helps to extract these fields into an associative array for further processing.

Note: Since the focus here is on regex programming, the meaning of each field value is not covered, for simplicity.

Solution:

I am demonstrating the regex pattern in Python. Here is a Python code fragment that uses a regex to validate and parse strings in the OSM naming scheme:

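import re

def parse_map_name(name):
    # Validate the whole name and capture each field in a named group:
    # mill = F1, quartm = F2, fiftyk = F3, tenk = F4, twok = F5, twentyfivek = F6
    comp = re.match(r"^(?:(?P<mill>J43|I4[34]|[GH]4[2-7]|F4[2-6]|E4[3-5]|[CD]4[2-4]|[BCD]46)(?:(?P<quartm>[A-X])(?:(?P<fiftyk>0[1-9]|1[0-6])(?:(?P<twentyfivek>[NS][EW])|(?:(?P<tenk>[A-Y])(?P<twok>0[1-9]|1[0-9]|2[0-5])?))?)?)?)$", name)
    if not comp:
        raise ValueError("invalid Map name")
    zone = comp.group('mill')
    alpha1 = comp.group('quartm')
    numstr1 = comp.group('fiftyk')
    alpha2 = comp.group('twentyfivek')
    alpha3 = comp.group('tenk')
    numstr2 = comp.group('twok')
    return zone, alpha1, numstr1, alpha2, alpha3, numstr2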

The variables for fields that are missing from the name automatically get the value None.
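To get the associative array mentioned in the problem statement, `Match.groupdict()` returns every named group at once, with None for groups that did not take part in the match. This sketch repeats the Solution pattern, with F5 written as `0[1-9]|1[0-9]|2[0-5]` so that the full range 01 to 25 is covered. It also uses `fullmatch`, which rejects a trailing newline that `$` alone would allow. The names `OSM_RE`, `LABELS` and `map_fields` are hypothetical, chosen for this example.

```python
import re

# The Solution pattern, split over raw-string literals that Python joins into one
OSM_RE = re.compile(
    r"^(?:(?P<mill>J43|I4[34]|[GH]4[2-7]|F4[2-6]|E4[3-5]|[CD]4[2-4]|[BCD]46)"
    r"(?:(?P<quartm>[A-X])"
    r"(?:(?P<fiftyk>0[1-9]|1[0-6])"
    r"(?:(?P<twentyfivek>[NS][EW])|(?:(?P<tenk>[A-Y])(?P<twok>0[1-9]|1[0-9]|2[0-5])?))?"
    r")?)?)$"
)

# Hypothetical mapping from the pattern's group names to the field labels F1..F6
LABELS = {"mill": "F1", "quartm": "F2", "fiftyk": "F3",
          "tenk": "F4", "twok": "F5", "twentyfivek": "F6"}

def map_fields(name):
    """Return the OSM fields of `name` as a dict; absent fields map to None."""
    # fullmatch also rejects a trailing newline, which `$` alone lets through
    comp = OSM_RE.fullmatch(name)
    if not comp:
        raise ValueError("invalid Map name")
    return {LABELS[group]: value for group, value in comp.groupdict().items()}

fields = map_fields("G43A01NE")
print(fields["F6"])  # NE
print(fields["F4"])  # None
```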

Debugging tools

Let’s now talk about tools that help you construct, test and debug such a regex. Several online regex applications are available, and new ones keep appearing. Consider one that draws a railroad diagram of your regex pattern; Debuggex is one such application. See below for how the railroad diagram for our use case appears in Debuggex.

Regex101 is another promising tool: a smart, user-friendly, fast, feature-rich and mobile-friendly web application. Its well-laid-out screen provides everything you would look for in foldable panes. It also lets you create a permalink to share your work with others. Here is the permalink for our problem.
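Whichever debugger you use, it helps to keep a list of known-good and known-bad names you can paste into it and also run locally. This sketch writes the pattern in Python's `re.VERBOSE` mode, where whitespace outside character classes is ignored and `#` starts a comment, so each field gets its own annotated line. It then checks a small table of test names. The names are my own constructions following the rules above, not official sheet numbers. `OSM_VERBOSE` and `CASES` are illustrative names.

```python
import re

# Same structure as the Solution pattern, laid out one field per line.
# In VERBOSE mode the layout and comments do not change what is matched.
OSM_VERBOSE = re.compile(r"""
    (?P<mill>J43|I4[34]|[GH]4[2-7]|F4[2-6]|E4[3-5]|[CD]4[2-4]|[BCD]46)  # F1: zone
    (?:
        (?P<quartm>[A-X])                          # F2: A to X
        (?:
            (?P<fiftyk>0[1-9]|1[0-6])              # F3: 01 to 16
            (?:
                (?P<twentyfivek>[NS][EW])          # F6: NE, NW, SE, SW
              |
                (?P<tenk>[A-Y])                    # F4: A to Y
                (?P<twok>0[1-9]|1[0-9]|2[0-5])?    # F5: 01 to 25, optional
            )?
        )?
    )?
""", re.VERBOSE)

# Constructed test names: (name, should it be accepted?)
CASES = [
    ("G43", True), ("G43A", True), ("G43A01", True),
    ("G43A01N", True), ("G43A01N10", True), ("G43A01NE", True),
    ("A43", False),        # row A is outside B to J
    ("G48", False),        # column 48 is outside 42 to 47
    ("G43Z", False),       # F2 stops at X
    ("G43A17", False),     # F3 stops at 16
    ("G43A01Z", False),    # F4 stops at Y
    ("G43A01N26", False),  # F5 stops at 25
]

for name, expected in CASES:
    # fullmatch anchors at both ends, so the pattern needs no ^ or $
    got = OSM_VERBOSE.fullmatch(name) is not None
    print(f"{name:<10} {'ok' if got == expected else 'FAIL'}")
```

Every row should print ok. A FAIL points to the field whose rule needs another look in the debugger.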

I have used this implementation in my application, which shows these map extents on Google Maps.

Regular expressions are best suited to problems like the one in this article, and we should always be on the lookout for problems that regex makes easy. You should also find regex debuggers very useful. Good luck with your regex programming tasks!
