Search Syntax

Regular expression searches provide a way to do simple or complex searches for strings that match a pattern, or a set of patterns (branches) separated by vertical bars "|". Although a pattern can be built to look for a word or phrase, a simple pattern consisting of one word does not look for only that word; it finds every place where that string of letters occurs. A search for "right" will return verses that contain the word "right", but also "righteous", "righteousness", "unrighteous", "upright" and even "bright". A search for "hall not" is not a search for "hall" AND "not" but for the single string "hall not", with a space between the second "l" and the "n". The search for "hall not" will therefore find occurrences of "shall not".
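The substring behaviour can be reproduced with Python's `re` module. This is only a sketch: it assumes Python's engine treats these plain patterns the same way the program does, and the word list and sample sentence are invented for illustration.

```python
import re

# Invented sample words; a plain pattern matches wherever its letters occur.
words = ["right", "righteous", "righteousness", "unrighteous",
         "upright", "bright", "rite"]

hits = [w for w in words if re.search("right", w)]
print(hits)  # every word except "rite"

# "hall not" is one string that includes the space, so "shall not" contains it.
found = re.search("hall not", "ye shall not surely die") is not None
print(found)
```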

The power of Regular Expressions lies in the patterns (or templates) used to define a search. A pattern consists of ordinary characters and special characters, which are interpreted according to a set of rules. Special characters include the period, backslash, left bracket, caret, asterisk, dollar sign, question mark, plus sign and vertical bar (. \ [ ^ * $ ? + |). Ordinary (or simple) characters are any characters that are not special. The backslash "\" is used to turn a special character into an ordinary one, and an ordinary character into a special one.

Example: the pattern "i. love\." will find "his love", "in love" or "is love" followed by a period, as at the end of a sentence. The first period in "i. love\." is a special character that allows any character in that position. The backslash in "i. love\." means that the period following it is not a special character but an ordinary period.
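A minimal sketch of this example using Python's `re` module, assuming its handling of "." and "\." matches the program's. The sample strings are made up; note that "this love." is also found, because it contains "is love.".

```python
import re

# "." allows any character; "\." must be a literal period.
pattern = r"i. love\."
samples = ["his love.", "in love.", "this love.", "his love,", "i love."]

hits = [s for s in samples if re.search(pattern, s)]
print(hits)  # ['his love.', 'in love.', 'this love.']
```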

Rules for Regular Expression Search Requests

  • . The period matches any character.

  • * The asterisk matches 0 or more occurrences of the preceding character, set or special character.

  • + The plus sign matches 1 or more occurrences of the preceding character, set or special character.

  • ? The question mark matches 0 or 1 occurrence of the preceding character, set or special character.

  • [ ] Square brackets match any one of the characters specified inside [ ].

  • ^ A caret as the first character inside [ ] means NOT.

  • ^ A caret at the beginning of a pattern anchors the match to the beginning of a line.

  • $ A dollar sign at the end of a pattern anchors the match to the end of a line.

  • | A vertical bar means logical OR.

  • ( ) Parentheses, which normally enclose expressions for grouping, are not supported!

  • \ A backslash can be used prior to any special character to match that character.

  • \ A backslash can be used prior to an ordinary character to make it a special character.

The Period

The Period "." matches any single character, even a space or other non-alphabetic character.

  • s.t matches sit, set, sot, etc., which could be found in sitting, compasseth and sottish.

  • b..t matches boot, boat and beat.

  • foot.tool matches footstool and foot tool.
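The three period examples can be checked with Python's `re` module. This sketch assumes Python's "." behaves like the program's (one character of any kind except a line break); the word list is invented.

```python
import re

words = ["sitting", "compasseth", "sottish", "sat",
         "boot", "boat", "beat", "footstool", "foot tool"]

# Each "." stands for exactly one character, including a space.
results = {p: [w for w in words if re.search(p, w)]
           for p in ["s.t", "b..t", "foot.tool"]}
for pattern, hits in results.items():
    print(pattern, hits)
```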

The Asterisk

The asterisk "*" matches zero or more occurrences of the preceding character, set or special character. Placing a period and asterisk ".*" after a commonly found pattern can make a search take a very long time, so that the program seems to freeze. be*n matches beeen, been, ben and bn, which could locate Reuben and Shebna.
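A short sketch in Python's `re` module, assuming its "*" works like the program's. The names and words are sample input only; the second list shows which part of each word actually matched.

```python
import re

words = ["beeen", "been", "ben", "bn", "Reuben", "Shebna", "bean"]

# "e*" allows any number of e's, including none, between "b" and "n".
hits = [w for w in words if re.search("be*n", w)]
matched = [re.search("be*n", w).group() for w in hits]
print(hits)     # "bean" is not found: the "a" breaks the run
print(matched)  # Reuben contributes "ben", Shebna contributes "bn"
```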

The Plus Sign

The Plus Sign "+" matches one or more occurrences of the preceding character, set or special character. Placing a period and plus sign ".+" after a commonly found pattern can make a search take a very long time, so that the program seems to freeze. be+n matches beeen, been and ben, but not bn.
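The same kind of sketch for "+", again using Python's `re` module as a stand-in for the program's engine, with invented sample words:

```python
import re

words = ["beeen", "been", "ben", "bn", "Shebna"]

# "e+" requires at least one "e" between "b" and "n".
hits = [w for w in words if re.search("be+n", w)]
print(hits)  # ['beeen', 'been', 'ben']
```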

The Question Mark

The Question Mark "?" matches zero or one occurrence of the preceding character, set or special character.

  • be?n matches ben and bn, but not been.

  • trees? matches trees or tree.
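Both question-mark examples, sketched with Python's `re` module under the assumption that its "?" quantifier behaves like the program's. The word lists are sample input.

```python
import re

# "e?" allows one optional "e"; "s?" makes the plural ending optional.
optional_e = [w for w in ["ben", "bn", "been"] if re.search("be?n", w)]
tree_forms = [w for w in ["tree", "trees", "three"] if re.search("trees?", w)]
print(optional_e)  # ['ben', 'bn']
print(tree_forms)  # ['tree', 'trees']
```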

The Square Brackets

The Square Brackets "[]" enclose a set of characters, any one of which can match. The period, asterisk, plus sign and question mark are not special inside the brackets. A minus sign can be used to indicate a range. To include a caret "^" in the set, do not place it first after the left bracket, or it will be treated as a special character. To include a "]" in the set, make it the first character (or the second, after a special "^"). To include a minus sign in the set, make it the first character (or the second, after a special "^") or the last.

  • s[eia]t matches set, sit and sat, but not sot.

  • s[eia]+t matches the same words, and also seat, seet, siet, etc.

  • [a-d] matches a, b, c or d.

  • [A-Z] matches any uppercase letter.

  • [.;:?!] matches ., ;, :, ? or !, but not a comma.

  • []^-] matches ], ^ or -.
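The bracket rules can be tried in Python's `re` module, which follows the same placement rules for "]", "^" and "-" inside a set. This is a sketch only; the program's engine may differ in other details, and all sample strings are invented.

```python
import re

words = ["set", "sit", "sat", "sot", "seat", "siet"]

one_of = [w for w in words if re.search("s[eia]t", w)]        # exactly one of e, i, a
one_or_more = [w for w in words if re.search("s[eia]+t", w)]  # a run of e, i, a
punct = re.findall("[.;:?!]", "Yea, Lord; Amen. Why? Lo!")    # comma is not in the set
upper = re.findall("[A-Z]", "In the Beginning")
specials = re.findall("[]^-]", "a]b^c-d")  # "]" first, "^" not first, "-" last
print(one_of, one_or_more, punct, upper, specials)
```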

The Caret first in Square Brackets

If the Caret is the first character after the left bracket ("[^"), it means NOT. s[^io]t matches set, sat, etc., but not sit or sot.
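A negated set still has to match exactly one character; it simply excludes the listed ones. The sketch below uses Python's `re` module with invented words, and shows that even a space satisfies [^io].

```python
import re

words = ["set", "sat", "sit", "sot", "s t"]

# [^io] accepts any single character except "i" and "o", even a space.
hits = [w for w in words if re.search("s[^io]t", w)]
print(hits)  # ['set', 'sat', 's t']
```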

The Caret as Start of Line Anchor

If the Caret is the first character in a pattern ("^xxx"), it anchors the pattern to the start of a line: any match must be at the beginning of a line. Because some texts contain unfiltered formatting characters, this feature does not always work, but it may if a few periods are placed after the caret to account for those characters. ^In the beginning matches lines that start with "In the beginning". (You may need to use ^.....In the beginning instead.)
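The effect of the anchor, and of padding it with periods, can be sketched in Python's `re` module. Two assumptions are labeled here: Python needs the `re.MULTILINE` flag for "^" to match at every line start (how the program splits lines is not stated), and "<pb> " is a hypothetical 5-character formatting code standing in for whatever a real text contains.

```python
import re

# Invented text; "<pb> " is a hypothetical 5-character formatting prefix.
text = ("In the beginning God created the heaven and the earth.\n"
        "He said, In the beginning was the Word.\n"
        "<pb> In the beginning was the Word.")

anywhere = re.findall(r"In the beginning", text)
# re.MULTILINE makes "^" match at the start of every line, not only the text.
at_start = re.findall(r"^In the beginning", text, re.MULTILINE)
# Five periods skip exactly five characters of formatting.
padded = re.findall(r"^.....In the beginning", text, re.MULTILINE)
print(len(anywhere), len(at_start), len(padded))  # 3 1 1
```

Each period stands for exactly one character, so the number of periods has to match the length of the formatting code.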

The Dollar Sign as End of Line Anchor

If the Dollar Sign is the last character in a pattern ("xxx$"), it anchors the pattern to the end of a line: any match must be at the end of a line. Because some texts contain unfiltered formatting characters, this feature does not always work, but it may if a few periods are placed before the dollar sign to account for those characters. Amen\.$ matches lines that end with "Amen." (You may need to use Amen\....$, Amen\..........$, or even Amen\....................$)
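A matching sketch for the end-of-line anchor, again in Python's `re` module with `re.MULTILINE` (an assumption about line handling). The trailing "<p>" is a hypothetical 3-character formatting code, which is why the padded pattern uses three periods after "\.".

```python
import re

# Invented text; "<p>" is a hypothetical 3-character formatting code.
text = ("And all the people said, Amen.\n"
        "Amen. So be it.\n"
        "Amen.<p>")

plain = re.findall(r"Amen\.$", text, re.MULTILINE)
padded = re.findall(r"Amen\....$", text, re.MULTILINE)  # three extra characters
print(plain)   # ['Amen.']
print(padded)  # ['Amen.<p>']
```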

The Vertical Bar

The Vertical Bar "|" between patterns means OR.

  • John|Peter matches John or Peter.

  • John .*Peter|Peter .*John matches John ... Peter or Peter ... John. (".*" slows a search.)

  • pain|suffering|sorrow matches pain, suffering or sorrow.
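Each branch separated by "|" is a complete pattern on its own, which is why "John .*Peter|Peter .*John" needs both orders spelled out. A sketch in Python's `re` module, with made-up verse fragments:

```python
import re

verses = ["Then Peter said unto John", "John answered them",
          "Peter and James", "Andrew went forth"]

either = [v for v in verses if re.search("John|Peter", v)]
# ".*" lets any text sit between the two names, in either order.
both = [v for v in verses if re.search("John .*Peter|Peter .*John", v)]
grief = re.findall("pain|suffering|sorrow", "in sorrow and pain")
print(either)  # the first three fragments
print(both)    # ['Then Peter said unto John']
print(grief)   # ['sorrow', 'pain']
```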

The Parentheses

The use of Parentheses "( )" is not supported!

The Backslash Prior to a Special Character

The Backslash prior to a special character ("\*") indicates that the character is not being used in its special meaning but simply matches itself. amen\. matches "amen." but not "ament", so it will not locate firmament.
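The difference between "\." and "." is easy to see side by side. This sketch uses Python's `re` module (assumed to behave like the program here) and invented words; `re.escape` is Python's own helper for adding such backslashes and is not part of the program.

```python
import re

words = ["amen.", "ament", "firmament"]

literal = [w for w in words if re.search(r"amen\.", w)]   # "\." is an ordinary period
wildcard = [w for w in words if re.search(r"amen.", w)]   # "." is any character
escaped = re.escape("amen.")  # Python adds the backslash before "."
print(literal)   # ['amen.']
print(wildcard)  # all three words
print(escaped)   # amen\.
```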

The Backslash Prior to an Ordinary Character

The Backslash prior to an ordinary character ("\o") indicates that the character is not being used to match itself, but has special meaning.

  • \b used outside [ ] means word boundary; used inside [ ] it means backspace. \brighteous\b matches righteous but not unrighteous or righteousness.

  • \B means non-word boundary. \Brighteous\B matches unrighteousness and unrighteously but not righteous, unrighteous or righteousness.

  • \d means digit; same as [0-9].

  • \D means non-digit; same as [^0-9].

  • \s means space.

  • \S means not a space.

  • \w means a word character (letter, digit or underscore); same as [a-zA-Z0-9_].

  • \W means not a word character; same as [^a-zA-Z0-9_].
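The boundary and class escapes above can be sketched with Python's `re` module. Caveats, since Python may not match the program exactly: in Python `\s` matches any whitespace (tabs and line breaks too), and for ordinary string patterns `\d` and `\w` also accept non-ASCII digits and letters unless the `re.ASCII` flag is given. The words and reference string are sample input.

```python
import re

words = ["righteous", "unrighteous", "righteousness",
         "unrighteousness", "unrighteously"]

# \b requires a word edge on that side; \B requires a word character on both sides.
whole_word = [w for w in words if re.search(r"\brighteous\b", w)]
inside_word = [w for w in words if re.search(r"\Brighteous\B", w)]

numbers = re.findall(r"\d+", "Genesis 1:1 and John 3:16")  # runs of digits
word_runs = re.findall(r"\w+", "Amen, amen.")              # runs of word characters
print(whole_word, inside_word, numbers, word_runs)
```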