Question: I want to extract data from a website. What is the best way to create a Wildcard Match to extract the text I want?
Answer: In our experience, the easiest way to construct a Wildcard Match is to start from the page’s HTML source. Follow these steps:
- Load the page you want to extract data from in your web browser.
- If you’re using Firefox, select “View | Page Source” from the main menu bar. If you’re using Internet Explorer, select “View | Source”.
- Locate the text you wish to extract in the HTML source.
- Select the text you wish to extract along with the surrounding HTML tags. For example, to capture the title text of this page, we’d select:
<TITLE>Inspyder - I want to extract data from a website, what is the best way to create a Wildcard Match?</TITLE>
- Now replace the text you want to extract (everything between the tags) with the name of the field you wish to capture the data into. To call the page title text “TitleText”, the query would look like:
<TITLE>#TitleText#</TITLE>
- Finally, make sure “Include HTML” is checked in the Query options (otherwise the HTML will be ignored and the query will never match anything).
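To make the idea behind the steps above concrete, here is a minimal Python sketch of how a `#FieldName#` template could be matched against HTML source. It is only an illustration and not how the product implements Wildcard Matches: the helper name `wildcard_to_regex`, the sample HTML, and the choice of case-insensitive, shortest-possible matching are all assumptions made for this example.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a '#FieldName#' template into a regex with one named group per field.

    Hypothetical helper for illustration only; case-insensitive, non-greedy
    matching is an assumption, not documented product behavior.
    """
    # Splitting on a pattern with one capture group alternates
    # literal text (even indexes) with field names (odd indexes).
    parts = re.split(r"#([A-Za-z_]\w*)#", pattern)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            # Literal HTML around the field must match exactly.
            pieces.append(re.escape(part))
        else:
            # The field captures whatever sits between the literals.
            pieces.append(f"(?P<{part}>.*?)")
    return re.compile("".join(pieces), re.IGNORECASE | re.DOTALL)

# Sample page source (made up for this example).
html = "<html><head><TITLE>Inspyder - Example page</TITLE></head></html>"

match = wildcard_to_regex("<TITLE>#TitleText#</TITLE>").search(html)
title = match.group("TitleText") if match else None
print(title)
```

The sketch also shows why the surrounding tags matter: the literal `<TITLE>` and `</TITLE>` text anchors the match, so if the HTML were stripped before matching, the template would find nothing.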