serp.fast
← Glossary

XPath

XPath is a query language for navigating XML and HTML documents. Where CSS selectors are tuned for matching by class, ID, and structure, XPath supports navigating in any direction (parent, sibling, ancestor), matching by text content, applying boolean predicates, and computing values inline. An XPath expression like `//div[contains(@class, 'product')]/h2[text()='In Stock']/..` selects every product div whose H2 contains the text "In Stock" — the kind of query that requires JavaScript or multiple steps with CSS selectors. For scraping, XPath is the heavier tool. Use it when you need text-content matching, axis navigation, or complex predicates. Reach for CSS selectors first because they are easier to read and maintain; reach for XPath when the task demands its expressiveness. Most modern scraping libraries (lxml, Scrapy, Playwright, Selenium) support both. For AI builders, XPath shows up most in legacy scraping codebases and in tools that target sites with messy or non-semantic HTML. It also remains useful for scraping XML payloads — RSS feeds, sitemaps, government open-data formats — where CSS selectors do not apply. Learning the basic axis navigation (parent::, ancestor::, following-sibling::) and predicate syntax repays itself the first time you need to reach upward in the DOM tree.

Related tools