Two New RE (regex) Functions

I have recently been spending countless hours tinkering with regular expressions (for a project that I’ll blog about within the next few days, I hope). ColdFusion has great regular expression support, but the REFind() and REFindNoCase() functions annoy me no end. I like that I can opt for a position and length to be returned instead of just a position, but why do the returned arrays contain only the first instance? Almost any regex code I write needs to include looping code to call REFind() repeatedly, shifting the start position each time. That’s just plain silly, especially when the returned values are arrays (heck, why even bother using arrays if all that is ever returned is a single element?).
So, I wrote my own REFindAll() function. It does exactly what REFind() does (taking the same first two parameters) and returns the same pos and len arrays that REFind() does, except that this version returns all instances (one array element per instance). Enjoy.
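
The original listing is not reproduced here, so what follows is only a minimal sketch of how such a UDF might look, not the author's code. The variable names results and subex follow the fragments quoted in the comments; the argument names, the empty arrays returned when nothing matches, and the guard against zero-length matches are all assumptions.

```cfm
<cffunction name="REFindAll" returntype="struct" output="false">
    <!--- Same first two parameters as REFind(); the names are assumed --->
    <cfargument name="regex" type="string" required="true">
    <cfargument name="text" type="string" required="true">
    <cfset var results = StructNew()>
    <cfset var subex = "">
    <cfset var pos = 1>
    <cfset results.len = ArrayNew(1)>
    <cfset results.pos = ArrayNew(1)>
    <cfloop condition="pos LTE Len(arguments.text)">
        <!--- The fourth argument asks for pos and len arrays instead of a number --->
        <cfset subex = REFind(arguments.regex, arguments.text, pos, true)>
        <!--- pos[1] is 0 when there are no further matches --->
        <cfif subex.pos[1] EQ 0>
            <cfbreak>
        </cfif>
        <!--- Keep only element 1, the whole match; subexpressions are dropped --->
        <cfset ArrayAppend(results.len, subex.len[1])>
        <cfset ArrayAppend(results.pos, subex.pos[1])>
        <!--- Resume after this match; Max() keeps a zero-length match from looping forever --->
        <cfset pos = subex.pos[1] + Max(subex.len[1], 1)>
    </cfloop>
    <cfreturn results>
</cffunction>
```

Under these assumptions, REFindAll("[0-9]+", "a1b22c333") should return pos values 2, 4, 7 and len values 1, 2, 3.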

And for the sake of completeness, here’s REFindNoCaseAll(), the case-insensitive version of the UDF.
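
Again, the original listing is missing, so this is a hedged sketch rather than the author's code. It assumes the case-insensitive version needs nothing more than a loop around REFindNoCase(); the argument names and the handling of the no-match and zero-length cases are assumptions.

```cfm
<cffunction name="REFindNoCaseAll" returntype="struct" output="false">
    <!--- Same first two parameters as REFindNoCase(); the names are assumed --->
    <cfargument name="regex" type="string" required="true">
    <cfargument name="text" type="string" required="true">
    <cfset var results = StructNew()>
    <cfset var subex = "">
    <cfset var pos = 1>
    <cfset results.len = ArrayNew(1)>
    <cfset results.pos = ArrayNew(1)>
    <cfloop condition="pos LTE Len(arguments.text)">
        <!--- Case-insensitive search starting at the current position --->
        <cfset subex = REFindNoCase(arguments.regex, arguments.text, pos, true)>
        <!--- Stop once nothing further matches --->
        <cfif subex.pos[1] EQ 0>
            <cfbreak>
        </cfif>
        <cfset ArrayAppend(results.len, subex.len[1])>
        <cfset ArrayAppend(results.pos, subex.pos[1])>
        <!--- Step past the match, moving at least one character --->
        <cfset pos = subex.pos[1] + Max(subex.len[1], 1)>
    </cfloop>
    <cfreturn results>
</cffunction>
```

With this sketch, REFindNoCaseAll("cold", "ColdFusion is cold") should report matches at positions 1 and 15, each of length 4.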

4 responses to “Two New RE (regex) Functions”

  1. Matthew Walker

    > why even bother using arrays if all that is ever returned is a single element?
    The array is used to return subexpressions (i.e. parenthesised bits). For example, with refind("<([[:alpha:]]+)( [^>]*)?>", myHTML, 1, true) the first array element would hold the position and length of the whole match. The second would hold the pos and len of the tag name, and the third the pos and len of the attributes.
    Nice looking UDF — I guess with a two dimensional array it could do both…

  2. Ben Forta

    Valid points, I overstated my annoyance. Having said that, supporting both would be ideal, and until then I guess I’ll be using my UDF (or some hybrid as Matthew suggested).

  3. Matthew Walker

    > perfectly reasonable
    Maybe so but I think for many who first encounter this function it’s rather counter-intuitive.
    So what happens if you change
    <CFSET ArrayAppend(results.len, subex.len[1])>
    <CFSET ArrayAppend(results.pos, subex.pos[1])>
    to
    <CFSET ArrayAppend(results.len, subex.len)>
    <CFSET ArrayAppend(results.pos, subex.pos)>
    (possibly you could add a returnsubexpressions boolean argument to control this)…

  4. seancorfield

    I like the way REFind() works and use it with multiple pattern subexpressions. Matthew’s example shows why the current behavior is perfectly reasonable.
