How to Filter and Traverse the DOM Tree with JavaScript
Did you know there's a JavaScript API whose sole purpose is to filter and iterate through the nodes we want from a DOM tree? In fact, there are two such APIs: NodeIterator and TreeWalker. They're quite similar to one another, with some useful differences. Both let us step through the nodes under a given root node, subject to any predefined and/or custom filter rules we apply.
The predefined filters target different kinds of nodes, such as text nodes or element nodes, and custom filters (added by us) can narrow the selection further, for instance by looking for nodes with specific contents. Neither API returns an array; each returns an object that we advance one node at a time, so we can loop through the matching nodes and work with each one individually.
How to use the NodeIterator API
A NodeIterator object can be created using the createNodeIterator() method of the document interface. This method takes three arguments. The first one is required; it's the root node whose subtree we want to iterate over.
The second and third arguments are optional. They are the predefined and custom filters, respectively. The predefined filters are available as constants of the NodeFilter object; when the second argument is omitted, all node types are shown.
For example, if the NodeFilter.SHOW_TEXT constant is passed as the second argument, the method returns an iterator over all the text nodes under the root node. NodeFilter.SHOW_ELEMENT restricts the iterator to element nodes. See the NodeFilter documentation for a full list of the available constants.
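The second argument is a bitmask, so several SHOW_* constants can be combined with the bitwise OR operator. The sketch below illustrates this; the names show and elementsAndText are only illustrative, and the numeric fallback values (1 for SHOW_ELEMENT, 4 for SHOW_TEXT) are the ones given in the DOM standard, included here only so the arithmetic can be checked outside a browser.

```javascript
// Use the real NodeFilter in a browser; otherwise fall back to the numeric
// values from the DOM standard so the snippet also runs elsewhere.
const show = typeof NodeFilter !== 'undefined'
  ? NodeFilter
  : { SHOW_ELEMENT: 0x1, SHOW_TEXT: 0x4 };

// Combine two predefined filters: visit element nodes and text nodes.
const elementsAndText = show.SHOW_ELEMENT | show.SHOW_TEXT;
console.log(elementsAndText); // 5

// In a browser, pass the combined mask as the second argument.
if (typeof document !== 'undefined') {
  const iterator = document.createNodeIterator(document.body, elementsAndText);
  let node;
  while ((node = iterator.nextNode())) {
    console.log(node.nodeName);
  }
}
```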
The third argument (the custom filter) is a function that receives each candidate node and returns a NodeFilter constant saying whether that node should be accepted.
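The custom filter can be written either as a plain function or as an object with an acceptNode() method. Here is a minimal sketch of both forms, assuming we only want paragraph elements; onlyParagraphs and paragraphFilter are hypothetical names, and the fallback values (1 for FILTER_ACCEPT, 2 for FILTER_REJECT) are the DOM standard's, used only so the filter can be tried outside a browser.

```javascript
// Use the real NodeFilter in a browser, or the standard numeric values elsewhere.
const filterResult = typeof NodeFilter !== 'undefined'
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2 };

// Form 1: a plain function that receives the candidate node.
function onlyParagraphs(node) {
  return node.nodeName === 'P'
    ? filterResult.FILTER_ACCEPT
    : filterResult.FILTER_REJECT;
}

// Form 2: an object whose acceptNode() method does the same job.
const paragraphFilter = { acceptNode: onlyParagraphs };

// In a browser, either form can be passed as the third argument.
if (typeof document !== 'undefined') {
  document.createNodeIterator(document.body, NodeFilter.SHOW_ELEMENT, onlyParagraphs);
  document.createNodeIterator(document.body, NodeFilter.SHOW_ELEMENT, paragraphFilter);
}
```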
Here is the sample HTML document used in the examples that follow:
<!doctype html>
<html lang='en'>
<head>
  <meta charset='UTF-8'>
  <title>Document</title>
</head>
<body>
  <header><h1>title</h1></header>
  <div id='wrapper'>
    this is the page wrapper
    <p>Hello</p>
    <p>How are you?</p>
  </div>
  <span>txt</span>
  <a href='#'>some link</a>
  <footer>copyrights</footer>
</body>
</html>
Assuming we want to extract the contents of all the text nodes that are inside the #wrapper div, this is how we go about it using NodeIterator:
var div = document.querySelector('#wrapper');
var nodeIterator = document.createNodeIterator(
  div,
  NodeFilter.SHOW_TEXT
);

while (nodeIterator.nextNode()) {
  console.log(nodeIterator.referenceNode.nodeValue.trim());
}

/* console output
[Log] this is the page wrapper
[Log] Hello
[Log]
[Log] How are you?
[Log]
*/
The nextNode() method of the NodeIterator API returns the next node in the sequence of text nodes, or null once there are no more nodes, which is what ends the while loop. Inside the loop, we log the trimmed contents of every text node into the console. The referenceNode property of NodeIterator returns the node the iterator is currently attached to.
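Because the iterator is consumed by calling nextNode() until it returns null, it can be handy to gather the nodes into an ordinary array before working with them. The helper below is a sketch of that idea; collectNodes is a hypothetical name, not part of the DOM API, and it works with any object that exposes a nextNode() method.

```javascript
// Drain an iterator-like object into an array by calling nextNode()
// until it signals the end of the sequence with null.
function collectNodes(iterator) {
  const nodes = [];
  let node;
  while ((node = iterator.nextNode()) !== null) {
    nodes.push(node);
  }
  return nodes;
}

// In a browser, collect the text nodes under #wrapper (if it exists)
// and map them to their trimmed contents.
if (typeof document !== 'undefined') {
  const wrapper = document.querySelector('#wrapper');
  if (wrapper) {
    const textNodes = collectNodes(
      document.createNodeIterator(wrapper, NodeFilter.SHOW_TEXT)
    );
    const contents = textNodes.map((textNode) => textNode.nodeValue.trim());
  }
}
```

Once the nodes are in an array, the usual array methods such as map() and filter() become available.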
As the output of the NodeIterator example shows, some text nodes contain nothing but whitespace. We can avoid showing these empty contents using a custom filter:
var div = document.querySelector('#wrapper');
var nodeIterator = document.createNodeIterator(
  div,
  NodeFilter.SHOW_TEXT,
  function(node) {
    return (node.nodeValue.trim() !== "") ?
      NodeFilter.FILTER_ACCEPT :
      NodeFilter.FILTER_REJECT;
  }
);

while (nodeIterator.nextNode()) {
  console.log(nodeIterator.referenceNode.nodeValue.trim());
}

/* console output
[Log] this is the page wrapper
[Log] Hello
[Log] How are you?
*/
The custom filter function returns the constant NodeFilter.FILTER_ACCEPT if the text node is not empty, which includes that node among the nodes the iterator visits. Conversely, it returns the NodeFilter.FILTER_REJECT constant to exclude the empty text nodes.
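The same pattern extends to filters that look for specific contents. The sketch below builds such a filter from a search term; makeContainsFilter is a hypothetical helper, and the numeric fallbacks mirror the DOM standard's values for FILTER_ACCEPT (1) and FILTER_REJECT (2) so the function can be exercised outside a browser.

```javascript
// Use the real NodeFilter in a browser, or the standard numeric values elsewhere.
const verdict = typeof NodeFilter !== 'undefined'
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2 };

// Build a custom filter that accepts nodes whose text contains `term`
// (case-insensitive) and rejects everything else.
function makeContainsFilter(term) {
  const needle = term.toLowerCase();
  return function (node) {
    const text = (node.nodeValue || '').toLowerCase();
    return text.includes(needle)
      ? verdict.FILTER_ACCEPT
      : verdict.FILTER_REJECT;
  };
}

// In a browser, log the text nodes under #wrapper that mention "hello".
if (typeof document !== 'undefined') {
  const wrapper = document.querySelector('#wrapper');
  if (wrapper) {
    const iterator = document.createNodeIterator(
      wrapper,
      NodeFilter.SHOW_TEXT,
      makeContainsFilter('hello')
    );
    let node;
    while ((node = iterator.nextNode())) {
      console.log(node.nodeValue.trim());
    }
  }
}
```

With the sample document, only the text node inside the first paragraph contains the search term.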
How to use the TreeWalker API
As I mentioned before, the NodeIterator and TreeWalker APIs are similar to each other.
A TreeWalker object can be created using the createTreeWalker() method of the document interface. This method, just like createNodeIterator(), takes three arguments: the root node, a predefined filter, and a custom filter.
If we use the TreeWalker API instead of NodeIterator, the previous code snippet looks like the following:
var div = document.querySelector('#wrapper');
var treeWalker = document.createTreeWalker(
  div,
  NodeFilter.SHOW_TEXT,
  function(node) {
    return (node.nodeValue.trim() !== "") ?
      NodeFilter.FILTER_ACCEPT :
      NodeFilter.FILTER_REJECT;
  }
);

while (treeWalker.nextNode()) {
  console.log(treeWalker.currentNode.nodeValue.trim());
}

/* output
[Log] this is the page wrapper
[Log] Hello
[Log] How are you?
*/
Instead of referenceNode, the currentNode property of the TreeWalker API is used to access the node the walker is currently positioned on. In addition to the nextNode() method, TreeWalker has other useful methods. The previousNode() method (also present in NodeIterator) returns the node that comes before the one the iterator is currently anchored to.
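TreeWalker can also move relative to the current node with the parentNode(), firstChild(), lastChild(), previousSibling(), and nextSibling() methods. Each one moves the walker to the corresponding node and returns it, or returns null if no such node exists. These methods are only available in the TreeWalker API.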
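As an illustration of these navigation methods, the sketch below lists the visible children of the walker's current node by calling firstChild() once and then nextSibling() until it returns null. collectChildren is a hypothetical helper name; note that it moves the walker, leaving currentNode on the last child it found.

```javascript
// Gather the children of the walker's current node, as seen through
// the walker's filters, by stepping from the first child across siblings.
function collectChildren(walker) {
  const children = [];
  let node = walker.firstChild();
  while (node !== null) {
    children.push(node);
    node = walker.nextSibling();
  }
  return children;
}

// In a browser, list the element children of #wrapper (if it exists).
if (typeof document !== 'undefined') {
  const wrapper = document.querySelector('#wrapper');
  if (wrapper) {
    const walker = document.createTreeWalker(wrapper, NodeFilter.SHOW_ELEMENT);
    console.log(collectChildren(walker).map((element) => element.nodeName));
  }
}
```

With the sample document, this should log the node names of the two paragraph elements inside the wrapper.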
Here’s a code example that outputs the last child of the node the iterator is anchored to:
var div = document.querySelector('#wrapper');
var treeWalker = document.createTreeWalker(
  div,
  NodeFilter.SHOW_ELEMENT
);

console.log(treeWalker.lastChild());

/* output
[Log] <p>How are you?</p>
*/
Which API to choose
Choose the NodeIterator API when you need just a simple iterator to filter and loop through the selected nodes. Pick the TreeWalker API when you need to access the filtered nodes' family, such as their parents, children, or immediate siblings.