Add more SEO to user-created HTML with PHP DOM methods

PHP can edit the DOM, so we can use this feature to apply a few SEO tricks to user-submitted HTML, for example a blog post. If we find images, let's add title and alt attributes to them. We can do the same for links: add rel="nofollow" and target="_blank" to external links. Let's do that!

PHP DOM: A quick introduction with simple examples

Working with PHP DOM is similar to working with other DOM adapters in PHP: you must load the document into a variable before you start editing. If you are familiar with PHP SimpleXML, PHP DOM will be nothing new for you; only the commands are different.

In this post I will edit this markup:


<p>As you can see, this site uses Highlight.js too and you can check how will looks our result. To get lib code, you can visit <a rel="nofollow" href="https://github.com/SashaDesigN/highlight.js">GitHub repo</a>. Or you can download customized version of lib on it <a href="https://highlightjs.org/download/">website</a>.</p>

<img data-after="images/1444298151.png">

<p>The code above do the injection of Highlight.js and init of it class, which do code more beautiful. Let's looking more deeper: this code highlights lib has more than 30 different styles, which you can find in <code>highlight/styles</code> a folder.</p>

I will store this HTML in the $_POST['text'] variable, as if it were sent from a form. Now we can load it into PHP:


$dom = new DOMDocument();
$html = $_POST['text'];
$dom->loadHTML($html);

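We create a new instance of the DOMDocument class and load our markup. There is one painful thing here: loadHTML does not assume UTF-8 input, so non-Latin characters come out garbled. To fix this, you can use mb_convert_encoding to turn those characters into HTML entities before passing the text to loadHTML. The @ operator suppresses the warnings that loadHTML raises for markup it considers invalid. Note that the 'HTML-ENTITIES' target encoding is deprecated as of PHP 8.2.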


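$dom = new DOMDocument();
# convert non-ASCII characters to HTML entities before parsing
$html = mb_convert_encoding($_POST['text'], 'HTML-ENTITIES', "UTF-8");
# @ hides the parser warnings raised for invalid markup
@$dom->loadHTML($html);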
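On newer PHP versions the same loading step can be written without the deprecated 'HTML-ENTITIES' encoding and without the @ operator. The following is only a sketch under assumptions: the sample string stands in for $_POST['text'], and mb_encode_numericentity plus libxml_use_internal_errors are swapped in for the calls used above.

```php
<?php
// Hypothetical sample input standing in for $_POST['text'].
$html = '<p>Привет, <a href="https://example.com/">мир</a></p>';

// Turn every non-ASCII character into a numeric entity, so loadHTML()
// does not have to guess the input encoding.
$encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

// Collect parser messages instead of hiding them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($encoded);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The text comes back as regular UTF-8.
$linkText = $dom->getElementsByTagName('a')->item(0)->textContent;
echo $linkText, "\n";
echo count($errors), " parser message(s)\n";
```

Keeping the messages in $errors lets you log broken markup from users instead of silently ignoring it.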

Once the DOM is loaded, we can do anything with our object. As I said before, in this post we will make our content more SEO-friendly. To do this, let's start with external links.

PHP DOM: Add nofollow and _blank to links

The idea is pretty simple: get all links, loop over them, and set the rel and target attributes on every external link; after the loop, save the changes from the DOM.


# select all links
$links = $dom->getElementsByTagName('a');
if($links instanceof DOMNodeList){
  foreach($links as $a){
    # check if this is an external link (absolute http:// or https:// URL);
    # getAttribute returns '' when href is missing
    if(preg_match('#^https?://#i', $a->getAttribute('href'))){
      $a->setAttribute('rel','nofollow');
      $a->setAttribute('target','_blank');
    }
  }
}
# save changes
$text = $dom->saveHTML($dom);

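In fact, $links is not a copied array of elements. It is a live DOMNodeList that points into the document, a bit like a reference (&) in PHP, so any attribute we set inside the loop is applied to $dom directly. getElementsByTagName returns a DOMNodeList even when nothing matches, so the instanceof DOMNodeList test is only a safety guard: with an empty list the loop simply does not run.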
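To see what getElementsByTagName gives back when there is nothing to select, here is a small sketch. The markup and the variable names $before and $after are made up for the illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>No links here</p>');

$links = $dom->getElementsByTagName('a');
// Still a DOMNodeList, just an empty one.
$isList = $links instanceof DOMNodeList;
$before = $links->length;

// Add a link to the document after the list was created.
$a = $dom->createElement('a', 'new link');
$dom->getElementsByTagName('p')->item(0)->appendChild($a);

// The same $links object now sees the new element.
$after = $links->length;

var_dump($isList, $before, $after);
```

Because the list follows the document, be careful when removing nodes inside a foreach over it: the list shrinks while you iterate.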

Inside the loop, we check whether the current link is external, and if it is, we set the rel and target attributes. The check does not compare hosts, so an absolute link to your own site also counts as external.
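If links to your own domain should stay untouched, the external check can compare hosts instead. This is a sketch, not part of the original code: isExternalLink is a hypothetical helper, and 'example.com' is a placeholder for your own host name.

```php
<?php
/**
 * Hypothetical helper: true when $href points to a host other than $ownHost.
 */
function isExternalLink(string $href, string $ownHost): bool
{
    $host = parse_url($href, PHP_URL_HOST);
    // Relative URLs, anchors and mailto: links have no host.
    if (!is_string($host) || $host === '') {
        return false;
    }
    return strcasecmp($host, $ownHost) !== 0;
}

$html = '<p><a href="/about">About</a> <a href="https://highlightjs.org/download/">site</a></p>';
$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('a') as $a) {
    // 'example.com' is an assumed value for the site's own host.
    if (isExternalLink($a->getAttribute('href'), 'example.com')) {
        $a->setAttribute('rel', 'nofollow');
        $a->setAttribute('target', '_blank');
    }
}

echo $dom->saveHTML($dom);
```

The comparison is exact apart from letter case, so www.example.com and example.com count as different hosts; adjust the helper if your site answers on both.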

PHP DOM: Add alt and title to images

Now we can do the same for all images in our DOM:


# $ex['title'] is not defined in this snippet; it is assumed to hold
# the fallback text (for example, the post title)
$images = $dom->getElementsByTagName('img');
if($images instanceof DOMNodeList){
  foreach($images as $i){
    # getAttribute returns '' when the attribute is missing or empty
    if($i->getAttribute('alt') === ''){
      $i->setAttribute('alt', $ex['title']);
    }
    if($i->getAttribute('title') === ''){
      $i->setAttribute('title', $ex['title']);
    }
  }
}

After saveHTML, $text contains extra HTML tags that you didn't send, for example <html> and <body>. To strip them, I like to use this PHP code, which cuts our markup using PHP string functions:


$a = $dom->saveHTML($dom);
$a = substr($a,strpos($a, '<body>')+6,strlen($a));
$a = substr($a,0,strpos($a, '</body>'));
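An alternative to cutting the string is to serialize only the children of <body>. This is a sketch with made-up sample markup, not the method used above; it relies on saveHTML accepting a single node as its argument.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>First</p><p>Second <b>bold</b></p>');

// loadHTML wraps fragments in <html><body>; take the body element.
$body = $dom->getElementsByTagName('body')->item(0);

$inner = '';
if ($body !== null) {
    // Serialize each child of <body> on its own, skipping the wrapper tags.
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}

echo $inner, "\n";
```

This does not depend on the exact spelling of the <body> tag in the output, which the substr approach does.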

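Now our content is more SEO-friendly and we can save it to the database. Don't forget to escape this HTML before inserting it into your table. mysql_real_escape_string belongs to the old mysql extension, which was removed in PHP 7.0, so on current PHP versions use a prepared statement with PDO or mysqli instead.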

PHP DOM is simple to use, but it gives us real power to reshape our content as we need, and this post is a simple working example of that.