Overview
XPath Injection is an attack technique targeting applications that use user-supplied input to construct XPath (XML Path Language) queries for searching or accessing XML documents. If the input is not properly sanitized, attackers can inject special characters (like', " , =, |, ) to alter the XPath expression. This can allow them to bypass access controls, extract sensitive information from the XML document, or cause denial of service by crafting complex queries. 📜💉
Business Impact
XPath Injection can lead to:- Information Disclosure: Attackers can retrieve parts of the XML document they shouldn’t have access to, potentially exposing sensitive configuration data, user details, or business information.
- Authentication/Authorization Bypass: If XPath queries are used to check credentials or permissions stored in XML, injection can bypass these checks.
- Data Structure Discovery: Attackers can infer the structure of the underlying XML document.
Reference Details
CWE ID: CWE-643
OWASP Top 10 (2021): A03:2021 - Injection
Severity: High (depending on the data stored in XML)
Framework-Specific Analysis and Remediation
This vulnerability is not tied to a specific web framework but rather to the XML parsing and XPath query libraries used. The core issue is string concatenation to build queries with untrusted input. Defenses include:- Strict Input Validation: Validate user input against an expected format (e.g., allow only alphanumeric characters if searching for a username).
- Escaping/Quoting: Carefully escape quotes within user input before embedding it in string literals within the XPath query. Single quotes (
') are often replaced with',"'", and double quotes (") with,"", within the appropriate string literal. However, this is complex and error-prone. - Parameterized Queries (If Supported): Some libraries might support parameterized XPath queries or variable bindings, which is the most robust solution. This separates the query structure from the user data.
- Python
- Java
- .NET(C#)
- PHP
- Node.js
- Ruby
Framework Context
Using libraries likelxml or Python’s built-in xml.etree.ElementTree and constructing XPath query strings manually.Vulnerable Scenario 1: User Search
Searching an XML user database based on a username from a GET request.Vulnerable Scenario 2: Product Lookup
Looking up a product by ID, where the ID might contain quotes.Mitigation and Best Practices
Avoid constructing XPath queries via string formatting if possible. If you must, strictly validate the input format first. For embedding in string literals, carefully escape quotes. The safest approach forlxml is often to use parameterized XPath queries if the structure allows, or find elements by tag and filter in Python code.Secure Code Example
Testing Strategy
Identify all inputs used in XPath queries. Submit values containing single quotes ('), double quotes ("), pipe (|), equals (=), spaces, and XPath expressions like ' or '1'='1 or '] | /* | /foo[bar='. Observe if the query logic changes, unexpected data is returned, or errors occur.
