Find

Use the Find command to:

  • Find the location of a specific text string within a variable; or

  • Find all text matching a regular expression within a variable

What is a regular expression?

Often, when you search for data in text, you are looking for everything that matches a certain pattern rather than for one specific text string. A regular expression (often abbreviated regex) is a sequence of characters that describes the pattern you are searching for.

  • To learn more about regular expressions, see this article on the Microsoft Developer Network.

  • For an online regex tester and reference, check out this website: regular expressions 101.
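To illustrate the difference between searching for specific text and searching for a pattern, here is a minimal Python sketch. It uses Python's standard re module, not the wizard itself, and the sample sentence is invented for illustration.

```python
import re

# Invented sample text, used only for illustration.
text = "Invoice 1042 was paid in 2019; invoice 1077 is still open."

# Searching for specific text finds only that exact string.
contains_1042 = "1042" in text
print(contains_1042)

# A pattern describes a shape instead: here, any standalone run of four digits.
four_digits = re.findall(r"\b\d{4}\b", text)
print(four_digits)
```

The pattern finds every four-digit number in the sentence, including ones you did not know in advance.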

To find specific text

When you search for the location of specific text, the wizard returns a number indicating the character position of the first instance of that text encountered in the search direction you select.

  • Character position is counted from the beginning (top left) of the variable's content, and spaces are counted as characters

  • If the text you are searching for contains multiple characters, the location of the first character in the string will be returned

  • If the text you are searching for is not found, the value 0 will be returned

  1. Enter the name of the variable in which you would like to search

  2. Select Find Text

  3. Enter the:

    • text for which you would like to search (free text and/or values copied from different variables);

    • character position at which you would like to begin searching; and

    • search direction: search Start to End or End to Start

  4. Indicate whether you wish to allow close matching of the specified text and, if so, the level of accuracy required for the close match to be accepted

  5. Indicate whether letter case should be ignored when identifying matching text

    • If unchecked, only text in the same case as the text entered is considered a match

    • When close match is allowed, letter case is always ignored

  6. Enter the name of the variable into which to place the search result
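The options above can be sketched in Python as a rough model of the behavior described, not as the wizard's actual implementation. The function name find_text and its parameters are hypothetical. Two assumptions are made: positions are 1-based (consistent with 0 meaning "not found"), and for an End to Start search the starting position is the last position at which a match may begin.

```python
def find_text(text, needle, start=None, from_end=False, ignore_case=False):
    """Return the 1-based position of needle in text, or 0 if it is not found.

    Hypothetical helper: the name, parameters, and the handling of `start`
    for End to Start searches are assumptions made for illustration.
    """
    if ignore_case:
        text, needle = text.lower(), needle.lower()
    if from_end:
        # End to Start: the match must begin at or before position `start`.
        begin = len(text) if start is None else start
        index = text.rfind(needle, 0, begin - 1 + len(needle))
    else:
        # Start to End: the match must begin at or after position `start`.
        begin = 1 if start is None else start
        index = text.find(needle, begin - 1)
    # str.find and str.rfind return -1 when nothing is found, so +1 yields 0.
    return index + 1


sample = "Order A-17, order B-22"  # invented sample text
print(find_text(sample, "order"))                                   # case-sensitive
print(find_text(sample, "order", ignore_case=True))                 # first match
print(find_text(sample, "order", from_end=True, ignore_case=True))  # last match
print(find_text(sample, "refund"))                                  # not found
```

Note that the returned position always refers to the first character of the matched text, regardless of search direction.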

What is a close match?

Close match (sometimes referred to as fuzzy match) allows a certain level of flexibility in the matching of visually similar characters – such as the number 1 and a lowercase L – which can be very useful when working with scanned documents.

How close does the match need to be?

By using the Required accuracy slider, you determine how visually similar the characters need to be in order for the match to be accepted.

For example, with a high required accuracy setting:

  • The wizard would likely accept the word c1ose as matching the word close because the number 1 is highly visually similar to a lowercase L.

  • The wizard would likely NOT accept the word ad ress as matching the word address because a blank space is not highly similar to a lowercase D.

In order for the word ad ress to be accepted as matching the word address, you would need to specify a lower required accuracy setting.
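One way to picture the required accuracy setting is as a threshold on an average per-character similarity score. The sketch below is purely illustrative: the wizard's actual matching algorithm is not described here, and the similarity table, the scores in it, the function names, and the threshold values are all invented assumptions.

```python
# Hypothetical visual-similarity scores between character pairs (0.0 to 1.0).
# These numbers are invented for illustration only.
SIMILARITY = {
    ("1", "l"): 0.9,  # the digit 1 looks a lot like a lowercase L
    ("0", "o"): 0.9,  # the digit 0 looks a lot like a lowercase O
    (" ", "d"): 0.1,  # a blank space looks very little like a lowercase D
}


def char_similarity(a, b):
    """Return the assumed visual similarity of two characters."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def close_match(candidate, target, required_accuracy):
    """Accept candidate if its average similarity to target meets the threshold."""
    if not target or len(candidate) != len(target):
        return False
    # Letter case is ignored whenever close matching is allowed.
    candidate, target = candidate.lower(), target.lower()
    total = sum(char_similarity(a, b) for a, b in zip(candidate, target))
    return total / len(target) >= required_accuracy


print(close_match("c1ose", "close", 0.95))      # high accuracy: accepted
print(close_match("ad ress", "address", 0.95))  # high accuracy: rejected
print(close_match("ad ress", "address", 0.85))  # lower accuracy: accepted
```

Under these invented scores, c1ose averages 0.98 and ad ress averages about 0.87, which mirrors the behavior described above for high and lower accuracy settings.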

Example

Finding the location of specific text

quote = I think, therefore I am.

Result: find result = 3
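The example above does not state the text being searched for. Assuming, hypothetically, that the search text is think and the search runs Start to End from the first character, the position can be reproduced in Python as follows. Positions are assumed to be 1-based, so 1 is added to Python's 0-based index.

```python
quote = "I think, therefore I am."
search_text = "think"  # assumption: the original example does not state this

# str.find returns a 0-based index, or -1 if the text is absent.
index = quote.find(search_text)

# Convert to a 1-based position; "not found" (-1) becomes 0.
find_result = index + 1
print(find_result)
```

Counting by hand gives the same answer: I is position 1, the space is position 2, and the t of think is position 3.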

To find text matching a regular expression

When you search for text matching a regular expression, the wizard will return all instances of text matching the pattern, separated by a delimiter you specify.

  1. Enter the name of the variable in which you would like to search

  2. Select Find text matching a regular expression

  3. Choose the search type to use:

    • Full match is the standard search type: Each piece of text matching the regex pattern is returned, separated by a delimiter. To learn more, see the Full match example below.

    • Capture groups is a more complex search type: The regex pattern defines a group, which can be further subdivided into individual elements

      • Each group matching the regex pattern is returned, separated by the group delimiter

      • The individual elements within each group are separated by the match delimiter

      To learn more, see the Capture groups example below.

  4. Enter the regular expression pattern for which you would like to find matching text

    • The regular expression pattern can include:

      • free text and/or variables

      • line breaks

  5. Enter the delimiter to separate each matching text found

    If you are using the Capture groups search type, you must specify both the group delimiter and the match delimiter

  6. Try it out: (Optional) To ensure that your regular expression will provide the expected results, try some test data. Simply click the TEST link from within the Find command.

  7. Indicate whether letter case should be ignored when identifying matching text

    • If unchecked, only text with the same case as that entered will be considered a match

  8. Enter the name of the variable into which you'd like to place the search result

Example

FULL MATCH search type

Let's say you have copied a large block of text from a web page, and you want to extract all telephone numbers from this text. (Note: The regular expression used in this example will find all phone numbers in the following formats: 444-555-1234; 444.555.1234; 4445551234.)

regex text =

abcdefghijklmnopqrstuvwxyz

ABCDEFGHIJKLMNOPQRSTUVWXYZ

0123456789 _+-.,!@#$%^&*();\/|<>"'

12345 -98.7 3.141 .6180 9,000 +42

555.123.4567 800-555-2468

foo@demo.net bar.ba@test.co.uk

www.demo.com http://foo.co.uk/

http://regexr.com/foo.html?q=bar

https://mediatemple.net

Result: regex result = 0123456789;555.123.4567;800-555-2468
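The pattern used in this example is not shown above. As a sketch, the following Python code uses one possible pattern (an assumption, not necessarily the one used by the author) that accepts the three listed phone formats, together with a semicolon as the delimiter. It uses Python's re module rather than the wizard.

```python
import re

# The sample text from the example above.
regex_text = r"""abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789 _+-.,!@#$%^&*();\/|<>"'
12345 -98.7 3.141 .6180 9,000 +42
555.123.4567 800-555-2468
foo@demo.net bar.ba@test.co.uk
www.demo.com http://foo.co.uk/
http://regexr.com/foo.html?q=bar
https://mediatemple.net
"""

# Assumed pattern: 3 digits, optional "-" or ".", 3 digits, optional "-" or ".", 4 digits.
phone_pattern = r"\d{3}[-.]?\d{3}[-.]?\d{4}"
delimiter = ";"

# Full match: every piece of matching text, joined by the delimiter.
regex_result = delimiter.join(re.findall(phone_pattern, regex_text))
print(regex_result)
```

Note that 0123456789 is returned because any ten consecutive digits fit the unseparated format, whether or not they are really a phone number.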

Example

CAPTURE GROUPS search type

Let's say you have copied the HTML code of a table from a web page. Each row represents a different item available for sale. Within each row is a column for the item name and a column for quantity on hand.

You want to extract the data from each row (excluding the table header) as a complete row. You also want to parse the data from each row so that the item name is separated from the quantity.

regex text =

<table>

<tr><th>Item</th><th>Quantity</th></tr>

<tr><td>laptop</td><td>15</td></tr>

<tr><td>keyboard</td><td>12</td></tr>

<tr><td>mouse</td><td>5</td></tr>

<tr><td>LCD</td><td>8</td></tr>

</table>

Result:

regex result = laptop^^15%keyboard^^12%mouse^^5%LCD^^8
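The pattern and delimiters for this example are not shown above. Judging from the result, the group delimiter appears to be % and the match delimiter ^^. The Python sketch below reproduces the result under that reading, using an assumed pattern with two capture groups (one for the item name, one for the quantity). It uses Python's re module rather than the wizard.

```python
import re

# The sample HTML from the example above.
regex_text = """<table>
<tr><th>Item</th><th>Quantity</th></tr>
<tr><td>laptop</td><td>15</td></tr>
<tr><td>keyboard</td><td>12</td></tr>
<tr><td>mouse</td><td>5</td></tr>
<tr><td>LCD</td><td>8</td></tr>
</table>"""

# Assumed pattern: a row of exactly two <td> cells, each captured as a group.
# The header row uses <th> cells, so it does not match.
row_pattern = r"<tr><td>(.*?)</td><td>(.*?)</td></tr>"
group_delimiter = "%"   # separates one matched row from the next
match_delimiter = "^^"  # separates the elements captured within a row

# With capture groups, re.findall returns one tuple of elements per match.
rows = re.findall(row_pattern, regex_text)
regex_result = group_delimiter.join(match_delimiter.join(row) for row in rows)
print(regex_result)
```

Choosing delimiters that cannot occur in the data itself, as in this example, makes the result easier to split apart again later.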