XPath
💡 language for locating XML nodes
🕷️ XPath stands for XML Path Language. 📝 It is a query language for selecting nodes from an XML document. 👩💻 It allows you to navigate elements and attributes in an XML document.
💻 How do you use XPath? 🤓 You use XPath to locate or select elements in an XML document. 🔢 XPath uses path expressions to identify nodes. ✅ A path expression evaluates to a node set; other XPath expressions can evaluate to a string, a number, or a boolean.
🔎 How does XPath work? 📂 XPath selects nodes in an XML document. 📜 It does this using path expressions that resemble the path to a file on a file system: /parent/child
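As a minimal sketch of the /parent/child idea, the following Java snippet evaluates that path with the JDK's built-in javax.xml.xpath API. The sample XML, the class name PathExpressionSketch, and the helper select are assumptions made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathExpressionSketch {
    // Hypothetical sample document; the element names mirror /parent/child.
    static final String XML =
            "<parent><child>first</child><child>second</child><other/></parent>";

    // Parses the XML text and evaluates the expression to a node set.
    static NodeList select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Each step of the path names a child of the previous step.
        NodeList children = select(XML, "/parent/child");
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getTextContent());
        }
    }
}
```

The path matches both child elements and skips other, much as a file path names entries inside one directory.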
XPath Expressions:
Tag-name[@attribute='value']
Example:
<input type="text" placeholder="Username">
Expression: //input[@placeholder='Username']
Explanation: Selects an input element with the placeholder attribute set to "Username".
Tag-name[@attribute='value'][index]
Example:
<button class="submit">Submit</button> <button class="submit">Confirm</button>
Expression: //button[contains(@class,'submit')][1]
Explanation: Selects a button element whose class contains "submit" and that is the first such button inside its parent. To get the first match in the whole document, use (//button[contains(@class,'submit')])[1].
Parent-Child Relationship
Example:
<header><div><button>Submit</button><button>Cancel</button></div></header>
Expression: header/div/button[1]/following-sibling::button[1]
Explanation: Selects the first following sibling button element of the first button element inside a div element, which is a child of a header element.
Parent Axis
Example:
<header><div><button>Submit</button></div></header>
Expression: header/div/button[1]/parent::div
Explanation: Selects the parent div element of the first button element inside a div element, which is a child of a header element.
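The four expressions above can be tried out with a small Java sketch. The page below merges the example snippets into one well-formed document; that combined markup, the class name LocatorPatternsSketch, and the helper first are assumptions for illustration. The expressions without a leading slash are evaluated with the body element as the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LocatorPatternsSketch {
    // Hypothetical page assembled from the example snippets.
    static final String PAGE =
            "<body>"
            + "<header><div><button>Submit</button><button>Cancel</button></div></header>"
            + "<form><input type=\"text\" placeholder=\"Username\"/>"
            + "<button class=\"submit\">Submit</button>"
            + "<button class=\"submit\">Confirm</button></form>"
            + "</body>";

    // Evaluates the expression with <body> as the context node
    // and returns a single matched node.
    static Node first(String expression) {
        try {
            Element body = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)))
                    .getDocumentElement();
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, body, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Attribute match
        System.out.println(first("//input[@placeholder='Username']").getNodeName());
        // Attribute match plus index
        System.out.println(first("//button[contains(@class,'submit')][1]").getTextContent());
        // Sibling axis, relative to <body>
        System.out.println(
                first("header/div/button[1]/following-sibling::button[1]").getTextContent());
        // Parent axis, relative to <body>
        System.out.println(first("header/div/button[1]/parent::div").getNodeName());
    }
}
```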
Relative vs Absolute XPath
The main difference between relative and absolute XPath lies in how they locate elements based on their position in the HTML document.
Let's explore these differences with Java examples:
Relative XPath Example:
In this example, we use a relative XPath expression, //input[@id='username'], to locate an input element with the ID attribute equal to "username". The XPath begins with "//", so it matches the desired element anywhere in the HTML document, whatever its ancestors are. Relative XPath expressions are flexible and adaptable to changes in the HTML structure, making them a preferred choice in many scenarios.
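A minimal Java sketch of such a relative lookup follows. It uses the JDK's javax.xml.xpath API as a stand-in for whatever browser automation library a real test would use; the login page markup, the class name RelativeXPathSketch, and the helper find are assumptions, and only the id value "username" comes from the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RelativeXPathSketch {
    // Hypothetical login page written as well-formed XML.
    static final String PAGE =
            "<html><body><div><form>"
            + "<input id=\"username\" type=\"text\"/>"
            + "<input id=\"password\" type=\"password\"/>"
            + "</form></div></body></html>";

    // Returns one element matched by the expression.
    static Element find(String page, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The leading // means the locator never mentions html, body, div or form.
        Element input = find(PAGE, "//input[@id='username']");
        System.out.println(input.getAttribute("type"));
    }
}
```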
Absolute XPath Example:
Here, we use an absolute XPath expression to locate the same input element. The XPath starts from the root node (/html) and specifies the complete path to the element, including its position within the HTML structure. Absolute XPath expressions provide the full path from the root node to the target element, making them less flexible and more prone to breaking if there are changes in the HTML structure.
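For comparison, here is a sketch of the absolute form, again using the JDK's javax.xml.xpath API. The page markup, the exact path /html/body/div/form/input[1], the class name AbsoluteXPathSketch, and the helper find are assumptions chosen for illustration, not the original listing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AbsoluteXPathSketch {
    // Hypothetical login page written as well-formed XML.
    static final String PAGE =
            "<html><body><div><form>"
            + "<input id=\"username\" type=\"text\"/>"
            + "<input id=\"password\" type=\"password\"/>"
            + "</form></div></body></html>";

    // Returns one element matched by the expression.
    static Element find(String page, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Every ancestor is spelled out, and [1] picks the first input in the form.
        Element input = find(PAGE, "/html/body/div/form/input[1]");
        System.out.println(input.getAttribute("id"));
    }
}
```

The locator works only while html, body, div and form stay nested exactly this way and the username field stays first in the form.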
Key Differences:
Flexibility: Relative XPath expressions are more flexible as they search for elements relative to a specific context, making them adaptable to changes in the HTML structure. Absolute XPath expressions, on the other hand, rely on the complete path from the root node and are less flexible.
Maintainability: Relative XPath expressions are generally more maintainable because they are less affected by changes in the HTML structure. Absolute XPath expressions can become brittle and require updating if there are any modifications to the HTML structure.
Reusability: Relative XPath expressions can be reused for different elements as they start from a specific element and navigate from there. Absolute XPath expressions are specific to the element's exact position in the HTML structure and may not be easily reusable.
Preferred Approach: Relative XPath is often the preferred approach in most scenarios due to its flexibility and adaptability to changes in the HTML structure. Absolute XPath is typically used when there is a fixed and known HTML structure.
Here's an example to illustrate the syntax differences between relative and absolute XPaths:
Suppose we have an HTML page whose first div under body has class='container' and wraps a ul list of links.
Syntax
Relative: //div[@class='container']//li/a
Absolute: /html/body/div[1]/ul/li/a
Flexibility
Relative: Can combine abbreviations and axes to navigate the document tree in different ways, such as // to select descendants at any depth, . to select the current node, .. to select the parent node, and @ to select an attribute. For example, //div[@class='container']/h1 selects the h1 element that is a child of the div element with class='container'.
Absolute: Relies on the absolute position of elements in the HTML structure. For example, /html/body/div[1]/ul/li[2]/a selects the a element that is a child of the second li element that is a child of the ul element that is a child of the first div element that is a child of the body element.
Readability
Relative: Uses short, descriptive expressions to identify elements. For example, //div[@class='container']//li/a selects all a elements that are children of li elements that are descendants of the div element with class='container'.
Absolute: Requires long expressions that specify the full path to an element. For example, /html/body/div[1]/ul/li[2]/a specifies the full path to the a element in the second list item.
Maintainability
Relative: Adapts to changes in the document tree by using paths that are independent of the absolute location of elements. For example, if we add a new div element around the ul element, the relative XPath //div[@class='container']//li/a would still work.
Absolute: May require updates when changes occur. For example, if we add a new div element around the ul element, the absolute XPath /html/body/div[1]/ul/li[2]/a would need to be updated to /html/body/div[1]/div[1]/ul/li[2]/a.
Performance
Relative: Not inherently faster. A leading // asks the engine to consider elements at every depth, so //div[@class='container']//li/a may have to scan the whole document before filtering by class.
Absolute: Follows one child step at a time, so /html/body/div[1]/ul/li[2]/a only visits nodes along that path and is typically at least as fast. On ordinary pages the difference is rarely noticeable, so choose locators for robustness rather than speed.
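The maintainability point can be checked with a short Java sketch that runs both styles against a page before and after a wrapper div is added around the ul. The markup, the link texts, the class name RelativeVsAbsoluteSketch, and the helper count are assumptions made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeVsAbsoluteSketch {
    // Hypothetical list of links shared by both versions of the page.
    static final String LIST =
            "<ul><li><a href=\"/one\">One</a></li><li><a href=\"/two\">Two</a></li></ul>";
    static final String ORIGINAL =
            "<html><body><div class=\"container\">" + LIST + "</div></body></html>";
    // Same page after a new div is added around the ul.
    static final String WRAPPED =
            "<html><body><div class=\"container\"><div>" + LIST + "</div></div></body></html>";

    static final String RELATIVE = "//div[@class='container']//li/a";
    static final String ABSOLUTE = "/html/body/div[1]/ul/li/a";
    static final String ABSOLUTE_UPDATED = "/html/body/div[1]/div[1]/ul/li/a";

    // Counts the nodes that the expression matches in the page.
    static int count(String page, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("original, relative: " + count(ORIGINAL, RELATIVE));
        System.out.println("original, absolute: " + count(ORIGINAL, ABSOLUTE));
        System.out.println("wrapped, relative:  " + count(WRAPPED, RELATIVE));
        System.out.println("wrapped, absolute:  " + count(WRAPPED, ABSOLUTE));
        System.out.println("wrapped, updated:   " + count(WRAPPED, ABSOLUTE_UPDATED));
    }
}
```

On the wrapped page the relative expression still finds both links, the old absolute path finds none, and the updated absolute path finds them again.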
What are the differences between single slash (/) and double slash (//) in XPath?
Single slash (/) - Selects from the root node:
At the start of an expression, specifies an absolute XPath that starts from the root node
Between steps, selects only direct children of the preceding node
Example: /html/body/div
This selects the div that is a direct child of the body, which is a direct child of the html root element.
Double slash (//) - Selects from anywhere in the DOM:
Specifies a relative XPath that selects from any node in the HTML document
Does not have to match at the root but can match at any depth in the DOM tree
Example: //div
This selects every div element anywhere in the HTML document, not just direct children.
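The difference shows up as soon as a div is nested inside another div. The Java sketch below counts matches for both forms; the page markup, the class name SlashSketch, and the helper count are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SlashSketch {
    // Hypothetical page with one div nested inside another.
    static final String PAGE =
            "<html><body><div id=\"outer\"><div id=\"inner\"/></div></body></html>";

    // Counts the nodes that the expression matches in PAGE.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Single slashes: only the div directly under body.
        System.out.println(count("/html/body/div"));
        // Double slash: both divs, at any depth.
        System.out.println(count("//div"));
        // The root element is html, so there is no div directly under the root.
        System.out.println(count("/div"));
    }
}
```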
📏 Rules of XPath 📏
XPath is used to navigate and locate XML/HTML elements.
🔀 Absolute XPath Rules
Absolute XPath provides the full path from the root element.
Starts with single slash / for root element
Use double slash // to select descendant elements
Specify the full path from the root to the target node
Can be prone to issues if HTML changes
📐 Relative XPath Rules
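Relative XPath locates a node without specifying the full path from the root.
Starts with double slash //, which matches at any depth in the document
Describes the target by its own attributes and nearby nodes, not by its position under the root
More resilient to changes
Can use dot . to refer to the current node, for example .//a to search only below it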
✅ Best Practices
Prefer relative XPath over absolute
Use unique ID attribute if available
If several nodes match, pick one by index, for example (//li)[2]
Avoid complex long XPath expressions
Following these XPath rules and best practices helps create resilient and maintainable locators for test automation.
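A few of these rules can be seen side by side in a short Java sketch: locating by a unique id, searching from the current node with a dot, and indexing a set of matches. The page markup, the class name ContextNodeSketch, and the helpers page and select are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextNodeSketch {
    // Hypothetical page: one link in the nav, two links in the main div.
    static final String PAGE =
            "<html><body><nav><a>Home</a></nav>"
            + "<div id=\"main\"><a>Docs</a><a>Blog</a></div></body></html>";

    static Document page() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample page", e);
        }
    }

    // Evaluates the expression with the given node as the context node.
    static NodeList select(Node context, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = page();
        // A unique id gives a short, stable locator.
        Node main = select(doc, "//div[@id='main']").item(0);
        // .//a searches only below the context node.
        System.out.println(select(main, ".//a").getLength());
        // //a always searches the whole document, whatever the context node is.
        System.out.println(select(main, "//a").getLength());
        // (//a)[2] is the second link in the document.
        System.out.println(select(doc, "(//a)[2]").item(0).getTextContent());
        // //a[2] is a link that is the second a child of its own parent.
        System.out.println(select(doc, "//a[2]").item(0).getTextContent());
    }
}
```

Parentheses matter when indexing: without them the index applies within each parent, not across the whole result.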