An Introduction to Wildcard Searches with Power Search
This guide gives a brief introduction to using wildcard matching within Power Search.
What are Wildcard Searches?
Wildcard searches are a simple way to match patterns of text. A pattern can be very straightforward or very complex. We've included wildcard support in Inspyder Finder to make it easier for you to find complicated items, or groups of items, on your site.
Basic Wildcards
The basic wildcard character ('*') matches zero or more characters at its position in the search pattern. For example, 'a*.html' matches:
- a.html
- ab.html
- abc.html
- etc.
To match exactly one character, we use the question mark character ('?'). For example, 'a?.html' will match the following:
- aa.html
- ab.html
- ac.html
- etc.
It will NOT match words like:
- a.html
- abc.html
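The two basic wildcards can be tried out in a short Python sketch. It uses the standard fnmatch module as a stand-in, since its '*' and '?' follow the same rules described above. This is not Power Search's own engine, and the list of file names is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names to test against (not taken from a real site).
names = ["a.html", "ab.html", "abc.html", "b.html"]

# '*' stands for zero or more characters.
star_matches = [n for n in names if fnmatchcase(n, "a*.html")]

# '?' stands for exactly one character.
question_matches = [n for n in names if fnmatchcase(n, "a?.html")]

print(star_matches)      # ['a.html', 'ab.html', 'abc.html']
print(question_matches)  # ['ab.html']
```

Note that fnmatchcase compares the pattern against the whole name, which keeps the example easy to follow.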
Ranges
To match a character within a range (for example, a numeric character, 0 to 9), we can use ranges. A range is enclosed in square brackets, [], and contains the set of characters to match. For example, to match 'inventoryX.html' where 'X' is any single digit, we could write inventory[0123456789].html. This will match the following:
- inventory1.html
- inventory2.html
- inventory3.html
- etc.
It will NOT match words like:
- inventory12.html
As a shorthand, the following is also valid: inventory[0-9].html
Multiple ranges within a single set of brackets are acceptable (for example, the letters A to F and the digits 0 to 9). For example, inventory[A-F0-9].html will match:
- inventory1.html
- inventoryA.html
- inventoryB.html
- inventoryF.html
It will NOT match words like:
- inventoryG.html
It's important to note that a range matches only a single character. To find inventoryXX.html, where each 'X' is a digit, we'd use two ranges: inventory[0-9][0-9].html. If the search is case sensitive, then [A-Z] and [a-z] are different ranges.
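The range rules can be checked with a small Python sketch. It again uses the standard fnmatch module as a stand-in for Power Search (an assumption, not the product's own engine), with case-sensitive matching and a made-up list of file names.

```python
from fnmatch import fnmatchcase


def matching(pattern, names):
    """Return the names that match the pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical file names for illustration.
names = [
    "inventory1.html",
    "inventory12.html",
    "inventoryA.html",
    "inventoryG.html",
    "inventorya.html",
]

# One range matches exactly one character.
single_digit = matching("inventory[0-9].html", names)

# Several ranges can share one set of brackets.
letters_and_digits = matching("inventory[A-F0-9].html", names)

# Two ranges in a row match two characters.
two_digits = matching("inventory[0-9][0-9].html", names)

print(single_digit)        # ['inventory1.html']
print(letters_and_digits)  # ['inventory1.html', 'inventoryA.html']
print(two_digits)          # ['inventory12.html']
```

Because the comparison is case sensitive here, inventorya.html is not matched by [A-F0-9].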
Alternates
We can specify matches that take the form "one or the other" by using alternates. Alternates are enclosed in curly brackets, {}, and separated by commas. For example, to find the words that start with 'Ins' and end with 'pyder' or 'ite', we would use the following syntax: Ins{pyder,ite}. This will match the following:
- Inspyder
- Insite
But it will NOT match:
- Inspect
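One way to picture alternates is as a shorthand for a list of plain patterns. The Python sketch below expands each {a,b} group into every combination. The function name expand_alternates is hypothetical, and the sketch assumes groups are not nested and contain no escaped characters; it illustrates the idea and is not how Power Search works internally.

```python
import re
from itertools import product


def expand_alternates(pattern):
    """Expand non-nested {a,b} groups into a list of plain patterns."""
    # Splitting on a capturing group keeps the group contents:
    # even indexes hold literal text, odd indexes hold the alternates.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [
        part.split(",") if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


expanded = expand_alternates("Ins{pyder,ite}")
print(expanded)               # ['Inspyder', 'Insite']
print("Inspect" in expanded)  # False
```

An empty alternate, as in {ite,}, expands to an empty string, so the group becomes optional.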
Literals
To match a special character 'as is' (such as '*' or '?'), precede it with a backslash ('\') in the query text. For example, 'question\?' will match the following:
- question?
But it will NOT match:
- question!
The same holds true for the '{', '}', '[' and ']' characters.
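The effect of the backslash can be sketched in Python by translating a wildcard pattern into a regular expression. The function name wildcard_to_regex is hypothetical, and the sketch handles only '*', '?' and the backslash (ranges and alternates are left out to keep it short), so it is an illustration under those assumptions and not Power Search's actual implementation.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*', '?' and backslash escapes into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # zero or more characters
        elif ch == "?":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


literal = wildcard_to_regex(r"question\?")
wild = wildcard_to_regex("question?")

print(bool(literal.fullmatch("question?")))  # True
print(bool(literal.fullmatch("question!")))  # False
print(bool(wild.fullmatch("question!")))     # True
```

Without the backslash, the '?' acts as a wildcard again, which is why the last line also matches "question!".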
Extracting Data
In addition to Wildcard matches, Power Search provides a powerful mechanism for extracting data from websites. By including "#DataName#" in your search pattern (where "DataName" is the name of the column you wish to extract data into), any text found in place of "#DataName#" will be captured and saved.
For example, the title text of a webpage is stored between HTML title tags as follows:
<title>Inspyder Software Inc.</title>
We could create a Wildcard Match like:
<title>#WebpageTitle#</title>
(Remember to check "Include HTML".) When we run this search on the website, each page's title will be extracted and stored in a column called "WebpageTitle".
It's possible to include multiple fields in the Wildcard Search. For example:
<IMG*src="#Source#"*alt="#AltText#"*>
This search will extract the "src" and "alt" attributes from all images (where the alt attribute is present). We could enhance this pattern to make the "alt" attribute optional:
<IMG*src="#Source#"*{alt="#AltText#",}*>
By adding the alternate syntax, {alt="#AltText#",}, the match includes either alt="#AltText#" or nothing (since nothing follows the comma).
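To make the extraction idea concrete, here is a minimal Python sketch that turns a pattern containing #DataName# fields into a regular expression with named groups. The function name extraction_regex and the sample HTML are hypothetical, matching is assumed to be case insensitive, and only '*' and #Name# fields are handled (no ranges, alternates or escapes). It approximates the behaviour described above and is not Power Search's own code.

```python
import re


def extraction_regex(pattern):
    """Build a regex in which each #Name# field becomes a named group."""
    # Even indexes hold literal text; odd indexes hold field names.
    parts = re.split(r"#([A-Za-z_]\w*)#", pattern)
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            out.append(f"(?P<{part}>.*?)")
        else:
            # '*' in the literal text matches any run of characters.
            pieces = (re.escape(piece) for piece in part.split("*"))
            out.append(".*?".join(pieces))
    # Assumption: the search ignores case, so <IMG also matches <img.
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)


# Made-up page content for illustration.
page = (
    "<html><head><title>Inspyder Software Inc.</title></head>"
    '<body><img class="logo" src="/logo.png" alt="Company logo"></body></html>'
)

title = extraction_regex("<title>#WebpageTitle#</title>").search(page)
image = extraction_regex('<IMG*src="#Source#"*alt="#AltText#"*>').search(page)

print(title.groupdict())  # {'WebpageTitle': 'Inspyder Software Inc.'}
print(image.groupdict())  # {'Source': '/logo.png', 'AltText': 'Company logo'}
```

Each key in the resulting dictionary corresponds to a column name, and each value is the text captured in place of that field.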
Minimum Requirements
- Windows XP SP2 or Higher
- 32-bit and 64-bit supported
- 1GB RAM
- 1GB of available hard disk space