DJ Mike's Tutorials: PHP

Working With Files

Javascript Ad Stripper

Once you have the source code of a webpage stored as a variable, you can do more than just view the unaltered source code. You can use regular expressions to remove or replace parts you don't want, or extract parts that you do want. One of the problems that WebTV users face is slow-loading, bloated ads on web pages. Advertising people are like monkeys. They are fascinated by flashing, moving objects and they assume that you too would just love to see a webpage cluttered up with strobing animations running around your screen. For a computer user, they are just irritations, but for a WebTV user, they make a webpage take forever to load and sometimes crash our browser. Enter PHP to the rescue. The ads that are a problem to WebTV users are powered by Javascript. Remove the Javascript, remove the ads.

To make an ad stripper, start with the source code viewer on the previous page and make a few modifications. First, remove the lines that replace the <'s and >'s and the lines that echo out the textarea used to display the code. The script will then output the source exactly as it was fetched, and the browser will render it as HTML.
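
The viewer script itself is not shown on this page, so the following is only a rough sketch of the difference between the two kinds of output. The function names viewer_output and stripper_output, and the textarea size, are made up for illustration.

```php
<?php
// Assumed shape of the viewer: escape the angle brackets so the
// browser shows the markup as text inside a textarea.
function viewer_output(string $source): string
{
    $escaped = str_replace(['<', '>'], ['&lt;', '&gt;'], $source);
    return '<textarea rows="20" cols="60">' . $escaped . '</textarea>';
}

// The stripper skips both steps and hands the markup back untouched,
// so the browser renders it as a normal page.
function stripper_output(string $source): string
{
    return $source;
}

$source = '<b>Hello</b>';
echo viewer_output($source), "\n";   // markup shown as text
echo stripper_output($source), "\n"; // markup rendered as HTML
```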

The next step is to remove any pattern, matched case-insensitively, that starts with <script, ends in /script> and has any kind of character any number of times in between.

$source = preg_replace('%<script[.\s\S\n\r]*?/script>%is', '', $source);

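Inside the square brackets, \s matches any whitespace character and \S matches any non-whitespace character, so together they match every character, including new lines and returns. That matters because you need to catch multi-line Javascript. (The . and the \n\r are redundant: inside brackets a . is only a literal dot, and \s already covers new lines and returns.) The *? makes the match lazy, so it stops at the first /script> rather than the last one on the page. The i modifier after the % delimiter makes the search case insensitive so it also matches <SCRIPT. The s modifier makes a . outside of brackets match new lines too; it does no harm here but is not strictly needed. The same technique is used to remove some event handlers.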
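
Here is a minimal sketch of the script removal wrapped in a function, using the same pattern as above. The function names are made up for illustration. The event handler pattern is only my guess at what "the same technique" could look like, since the page does not show it, and it only covers a handful of quoted handlers.

```php
<?php
// Remove every <script ... /script> block, case-insensitively.
function strip_scripts(string $source): string
{
    $result = preg_replace('%<script[.\s\S\n\r]*?/script>%is', '', $source);
    // preg_replace returns null on a PCRE error; keep the input in that case.
    return $result ?? $source;
}

// Assumed pattern: drop a few quoted on... attributes such as onload="...".
// It is a blunt tool and would also hit matching text outside of tags.
function strip_event_handlers(string $source): string
{
    $pattern = '%\s+on(?:load|unload|click|mouseover)\s*=\s*(?:"[^"]*"|\'[^\']*\')%i';
    $result = preg_replace($pattern, '', $source);
    return $result ?? $source;
}

$page = "<p>Before</p>\n<SCRIPT type=\"text/javascript\">\nshowAd();\n</SCRIPT>\n<p>After</p>";
echo strip_scripts($page), "\n";
echo strip_event_handlers('<body onload="popup()" bgcolor="white">'), "\n";
```

Because the match is lazy, two separate script blocks on one page are removed one at a time and the text between them is kept.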

After copying the source code of the problem page, all the relative links are broken, so you lose some images and some links do not work. To fix that problem, I use regular expressions to find the </head> tag and replace it with a </head> that has a <base href="$url"> before it, since a base tag belongs inside the head of the page. This has a side effect on the result page. If your form has no action attribute, or if its action is a relative URL, the form data will be submitted to the page that you are removing Javascript from instead of to your script. To prevent that, the form needs a full URL for its action.
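
A sketch of both fixes follows. It puts the base tag just before the closing </head> tag, on the assumption that the fetched page has one, because HTML only allows a base element inside the head. The function names, the form fields and the example.com addresses are placeholders, not the real script.

```php
<?php
// Insert <base href="..."> before the first closing </head> tag so that
// relative links and images resolve against the original page's URL.
function add_base_href(string $source, string $url): string
{
    $base = '<base href="' . htmlspecialchars($url, ENT_QUOTES) . '">';
    // A callback is used so that a $ or \ in the URL is never treated
    // as a backreference in the replacement text.
    $result = preg_replace_callback(
        '%</head>%i',
        function ($m) use ($base) { return $base . $m[0]; },
        $source,
        1 // only the first </head>
    );
    return $result ?? $source;
}

// Build the stripper's own form with a full URL for its action, so the
// base tag cannot redirect the submission to the stripped page.
function stripper_form(string $scriptUrl): string
{
    $action = htmlspecialchars($scriptUrl, ENT_QUOTES);
    return '<form action="' . $action . '" method="get">'
         . '<input type="text" name="url">'
         . '<input type="submit" value="Strip ads">'
         . '</form>';
}

$html = '<html><head><title>T</title></head><body><img src="pic.gif"></body></html>';
echo add_base_href($html, 'http://example.com/news/'), "\n";
echo stripper_form('http://example.com/tools/stripper.php'), "\n";
```

If the page has no </head> tag at all, this sketch leaves it unchanged, so a real script would need a fallback.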

Example | Source Code Viewer | Source Code
Example | Ad Stripper | Source Code

Using the techniques on this page, you can make your own version of Simplify. The method I used to remove Javascript can also be used to remove CSS, tables and divs. The method I used to add a base href can also be used to insert your own style sheet.
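
As a rough sketch of those variations, the functions below reuse the same two ideas. All names, the list of tags and the style sheet URL are assumptions for illustration. The tag stripper removes only the table and div tags themselves and keeps the text inside them, which is one of several ways to read "remove tables and divs".

```php
<?php
// Same idea as the script remover, aimed at <style ... /style> blocks.
function strip_styles(string $source): string
{
    $result = preg_replace('%<style[\s\S]*?/style>%i', '', $source);
    return $result ?? $source;
}

// Remove opening and closing table and div tags but keep their contents.
function strip_layout_tags(string $source): string
{
    $result = preg_replace('%</?(?:table|tbody|tr|td|th|div)\b[^>]*>%i', '', $source);
    return $result ?? $source;
}

// Same idea as the base href insert: add a link to your own style sheet
// before the first closing </head> tag.
function insert_stylesheet(string $source, string $cssUrl): string
{
    $link = '<link rel="stylesheet" href="' . htmlspecialchars($cssUrl, ENT_QUOTES) . '">';
    $result = preg_replace_callback(
        '%</head>%i',
        function ($m) use ($link) { return $link . $m[0]; },
        $source,
        1
    );
    return $result ?? $source;
}

$html = '<head><style>b { color: red }</style></head><div class="x"><table><tr><td>Hi</td></tr></table></div>';
$html = strip_layout_tags(strip_styles($html));
echo insert_stylesheet($html, 'http://example.com/plain.css'), "\n";
```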


Created by DJ Mike from Santa Barbara
