Extract from HTML Table/List
Extract all or selected data from a web-based table or list and place it into a new or existing variable.
Throughout this topic, the term "table" is used as a shortcut to refer to both tables and lists. However, as the name of this command suggests, it is designed to extract data from both, and the instructions in this topic refer equally to both.
It's easiest to open your web browser and navigate to the desired page before using this command. Once your browser is open to the page you want to work with, return to Studio and select the Extract from HTML Table/List command to get started.
Support and Prerequisite: See HTML Commands Support and Prerequisites
Selecting the data to extract
This command employs a "specialized" version of the HTML object selector that is designed to recognize patterns.
Most web pages are composed of a series of HTML building blocks (or objects). Because a table presents data in a structured format, the HTML objects within a table generally appear in a regular, repeating pattern. By using the Extract from HTML Table/List selector, you "teach" the robot to recognize this pattern so that it can extract exactly the data you are looking for from the table.
Throughout this command, the following terms are used:
-
Item: refers to each "row" in a table or list
-
Element: refers to each piece of data presented about each item (in table terminology, this would be called a "column")
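To make the item/element terminology concrete, here is a minimal Python sketch that reads a small HTML table using only the standard library. It is an illustration of the concept, not how the command works internally; the `TableExtractor` class, the `extract_items` helper, and the sample markup are all made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects each <tr> as an item and each <td>/<th> inside it as an element."""

    def __init__(self):
        super().__init__()
        self.items = []    # one list of elements per item (row)
        self._row = None   # elements of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.items.append(self._row)
            self._row = None


# Hypothetical sample page content.
SAMPLE_HTML = """
<table>
  <tr><td>Widget</td><td>4</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>1</td><td>24.50</td></tr>
</table>
"""


def extract_items(html):
    """Return a list of items, each item being a list of element strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


items = extract_items(SAMPLE_HTML)
```

In this sketch, `items` holds two items (the two rows), and each item holds three elements (the three columns).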
Selecting items
When you first open the command, you will be prompted to select items. By selecting just 2 equivalent items (at the same level of the HTML hierarchy), you will teach the robot the pattern for identifying items.
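One way to picture why two equivalent items are enough is the following sketch, which generalizes two object paths into a single pattern. It rests on assumptions: the slash-separated path notation, the `[*]` wildcard, and the `generalize` function are hypothetical and do not describe the selector's actual algorithm.

```python
import re

# Matches a trailing positional index such as "[2]" in a step like "tr[2]".
INDEX = re.compile(r"\[(\d+)\]$")


def generalize(path_a, path_b):
    """Return a pattern covering both paths, or None if they are not equivalent.

    Paths are hypothetical slash-separated steps, for example
    "html/body/table/tr[2]".
    """
    steps_a, steps_b = path_a.split("/"), path_b.split("/")
    if len(steps_a) != len(steps_b):
        return None  # different depth: not at the same level of the hierarchy
    pattern = []
    for a, b in zip(steps_a, steps_b):
        if a == b:
            pattern.append(a)  # identical step: keep it as-is
        elif INDEX.sub("", a) == INDEX.sub("", b):
            # Same tag, different position: this is the repeating part.
            pattern.append(INDEX.sub("", a) + "[*]")
        else:
            return None  # different tags: the two objects are not equivalent
    return "/".join(pattern)


first = "html/body/table/tr[2]"
second = "html/body/table/tr[5]"
pattern = generalize(first, second)
```

Here the two selections differ only in their position under the same parent, so the sketch produces `html/body/table/tr[*]`, a pattern that also covers every other row. Two objects at different depths, or with different tags, yield no pattern.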
-
Click Select item to invoke the selector.
-
The selector will appear with your web browser open behind it.
-
As you roll over the open page in the web browser, the various HTML objects on the page will be highlighted.
-
Click on the object that represents an item in the table you want to work with.
-
You will be prompted to select a second (equivalent) item.
-
After you select the second item, an item representing the selected pattern will appear in the Select 2 items field.
-
The following additional options will be enabled, allowing you to work with your item selection:
-
When you are satisfied with your selection, click Next to move on to selecting elements.
If the selector window interferes with viewing or selecting the object you need, simply drag the dialog window to a more convenient location on the screen.
|
Briefly highlights the pattern of selected items in the browser |
|
Opens a dialog box allowing you to directly edit selectors used to identify the relevant object |
Selecting elements
-
Now that you've shown the robot what items to select, the selector will prompt you to select the elements you want to extract from each item.
-
Click, one by one, on each element that you want to extract from within the item.
As you click on each element, it will be added to the selector window:
-
As you add elements, you can work with them within the selector window:
-
using the icon at the far left of each element to drag them into the desired order; or
-
clicking the button at the far right of each element to move it up one position
-
Click on the button at any time to see a sample of the data that will be extracted based on your selections:
-
When you are satisfied with your selections, click Finish to return to the main Extract from HTML Table/List dialog.
|
Expands selected element to the next higher object (the "parent object") in the HTML hierarchy, if one exists |
|
Narrows selected element to the next lower object (the "child object") in the HTML hierarchy, if one exists |
|
Briefly highlights the selected element in the browser |
|
Deletes the selected element from the list |
|
Opens a dialog box allowing you to directly edit selectors used to identify the element |
Recognize your elements
Highly recommended: Click within the Column name field of each element and change the default name (i.e., Element #) to a name that will be easy to recognize within the extracted data.
Keep your elements in order
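By default, elements will appear in the list in the order you selected them. You can change the order in which they appear either by dragging them into the desired order using the icon at the far left of each element, or by clicking the button at the far right of an element to move it up one position.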
Finalize command options
After item/element selections have been made, the Extract from HTML Table/List dialog will appear as follows:
-
Preview a thumbnail image of a typical selected item.
-
Review the list of elements that will be extracted from each item.
-
Click to restart the selection process (if necessary).
-
Enter the name of the variable into which you would like to place the extracted data; and
the delimiters to use to separate each column and row in the returned data.
-
Instruct the wizard how to handle any errors encountered. Read more about error handling.
This command has built-in timeout monitoring of 5 seconds. See Timeout Monitoring Support for Advanced Commands.
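The column and row delimiters determine how the extracted items are flattened into the text stored in the variable. The sketch below shows the general idea under stated assumptions: the function names and the default delimiters are hypothetical, and the exact format the command writes is not specified in this topic.

```python
def join_extracted(items, column_delimiter=",", row_delimiter="\n"):
    """Flatten a list of items (each a list of element strings) into one string."""
    return row_delimiter.join(
        column_delimiter.join(elements) for elements in items
    )


def split_extracted(text, column_delimiter=",", row_delimiter="\n"):
    """Reverse join_extracted, assuming no element contains either delimiter."""
    return [row.split(column_delimiter) for row in text.split(row_delimiter)]


items = [["Widget", "4", "9.99"], ["Gadget", "1", "24.50"]]
text = join_extracted(items, column_delimiter="|", row_delimiter=";")
# text == "Widget|4|9.99;Gadget|1|24.50"
restored = split_extracted(text, column_delimiter="|", row_delimiter=";")
```

The round trip only works when the delimiters never occur inside the data itself, so it is worth choosing delimiters that are unlikely to appear in the table you are extracting.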
Editing selectors (optional)
After you have selected items and/or elements, you can directly edit two types of selectors that the robot uses to identify the relevant objects:
-
Object selectors
-
Inner text selectors (for item selections only)
When you are finished editing the selectors, ensure that the correct object is identified by clicking the button. The selected object will be briefly highlighted in your browser.
Object selectors
The Object selectors section allows advanced users to directly edit the HTML object selectors identified during the select items/elements process. By default, the robot will run through these selectors in the order they appear until a single matching object is detected.
You can use this dialog to:
-
Select/deselect selectors: tick/untick the checkbox of any selector to choose whether it should or should not be used in identifying the relevant object
-
Change the order in which the selectors will be processed: reorder the selectors by dragging them into the desired order using the icon at the far left of each selector
-
Modify the selectors altogether: click in a selector's field to directly edit/overwrite its text (can be free text and/or values copied from variables)
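The "run through the selectors in order until a single matching object is detected" behavior, together with the effect of unticking or reordering selectors, can be sketched roughly as follows. This is an assumption-laden illustration: the `Selector` class, the `find_object` function, and the toy `PAGE` lookup are hypothetical stand-ins and not the product's API.

```python
from dataclasses import dataclass


@dataclass
class Selector:
    text: str
    enabled: bool = True  # corresponds to the selector's checkbox being ticked


def find_object(selectors, query):
    """Try enabled selectors in order; stop at the first with exactly one match."""
    for selector in selectors:
        if not selector.enabled:
            continue  # unticked selectors are skipped
        matches = query(selector.text)
        if len(matches) == 1:
            return selector.text, matches[0]
    return None  # no selector identified a single object


# Toy stand-in for a page: maps selector text to the objects it would match.
PAGE = {
    "#orders": ["<table id=orders>"],
    "table": ["<table id=orders>", "<table id=totals>"],
    ".missing": [],
}


def query(selector_text):
    return PAGE.get(selector_text, [])


selectors = [Selector(".missing"), Selector("table"), Selector("#orders")]
result = find_object(selectors, query)
```

In this sketch the first selector matches nothing and the second is ambiguous, so the third one identifies the object. Moving `#orders` to the top of the list would find the same object on the first attempt, while unticking it would leave no selector able to identify a single object.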
Inner text selectors (for item selections only, not applicable to element selections)
The robot can also identify an HTML object by matching the text inside it. By default, Nintex's visual algorithm will determine whether matching inner text will improve accuracy. The Inner text section allows you to override the default setting of Decide for me by choosing:
-
Don't match using text; or
-
Match using this text
When this option is selected, you will be prompted to specify the text and the operator to be used for matching.
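As a rough illustration of matching on inner text with an operator, consider the sketch below. The operator names (`equals`, `contains`, `starts_with`) are assumptions chosen for the example; this topic does not list the operators the dialog actually offers.

```python
# Hypothetical operators; the names are illustrative only.
OPERATORS = {
    "equals": lambda inner, text: inner == text,
    "contains": lambda inner, text: text in inner,
    "starts_with": lambda inner, text: inner.startswith(text),
}


def matches_inner_text(inner_text, text, operator="equals"):
    """Return True if the object's inner text matches `text` under `operator`."""
    # Surrounding whitespace is stripped before comparing (an assumption).
    return OPERATORS[operator](inner_text.strip(), text)


candidates = ["  Order total ", "Order date", "Customer"]
kept = [c for c in candidates if matches_inner_text(c, "Order", "starts_with")]
```

Only the objects whose inner text satisfies the chosen operator are kept, which is how a text match can narrow an otherwise ambiguous selection.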