Mirror A Website (Or Part Of A Website) In A Split Second!


Occasionally, developers need to mirror a website. Fortunately, it is super easy to do and takes almost no time.

1. Getting A Copy Of Original Site:

You can use the powerful tool wget for this. If you are familiar with *nix, you might have used it to download a single page or file to your system. But wget can do a lot more than that. To download an entire site, you can use:
wget --recursive --page-requisites --convert-links www.avilpage.com
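# or, if you are too lazy to type, use the short form:
# wget -rpk www.avilpage.com
This downloads the pages of www.avilpage.com along with everything needed to display them (images, stylesheets, sounds, etc.) and rewrites the links in the downloaded HTML/CSS to point to the local files. Note that --recursive follows links only up to 5 levels deep by default; add --level=inf if you really need the whole site.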
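If you would rather drive this from a script, here is a minimal Python sketch that builds the same wget command and runs it with subprocess. The helper names wget_mirror_args and mirror are hypothetical, and the sketch assumes wget is installed and on your PATH.

```python
import shutil
import subprocess


def wget_mirror_args(url: str, no_parent: bool = False) -> list[str]:
    """Build the wget mirror command from this post, using the long-form flags."""
    args = ["wget", "--recursive", "--page-requisites", "--convert-links"]
    if no_parent:
        # Only fetch URLs below the starting directory.
        args.append("--no-parent")
    args.append(url)
    return args


def mirror(url: str, no_parent: bool = False) -> int:
    """Run wget (assumed to be on PATH) and return its exit status."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget is not installed or not on PATH")
    return subprocess.run(wget_mirror_args(url, no_parent)).returncode


# Example (wget saves the site into ./www.avilpage.com/):
#   mirror("www.avilpage.com")
```

Returning the argument list separately from running it makes it easy to print or log the exact command before anything is downloaded.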

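If you want to mirror only part of a website, say only the November 2014 archives, pass the --no-parent argument as well.
wget --recursive --page-requisites --convert-links --no-parent www.avilpage.com/2014/11/
# wget -rpknp www.avilpage.com/2014/11/
This downloads only the URLs under www.avilpage.com/2014/11/. Keep the trailing slash: without it, wget treats 11 as a file rather than a directory, so --no-parent would only keep it from climbing above /2014/.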
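To make the trailing-slash rule concrete, here is a small Python model of how --no-parent picks the "parent directory". It is a simplified illustration of the behaviour described in the wget manual, not wget's actual code: the function name allowed_by_no_parent is hypothetical, it only compares host and path, and the December post URL is a made-up example.

```python
from urllib.parse import urlsplit


def allowed_by_no_parent(start_url: str, candidate_url: str) -> bool:
    """Simplified model of wget's --no-parent check (not wget's real code).

    The "directory" of the start URL is its path up to and including the
    last '/'. Without a trailing slash, the last segment counts as a file,
    so the directory is one level higher.
    """
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return candidate.netloc == start.netloc and candidate.path.startswith(base_dir)


with_slash = "https://www.avilpage.com/2014/11/"
without_slash = "https://www.avilpage.com/2014/11"
december = "https://www.avilpage.com/2014/12/some-post.html"

print(allowed_by_no_parent(with_slash, december))     # False: outside /2014/11/
print(allowed_by_no_parent(without_slash, december))  # True: directory is /2014/
```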

In addition to that, there are a whole lot of other options you can pass, such as the number of retries (--tries), the time to wait between requests (--wait), and many HTTP and FTP options. See man wget for the full list.
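For example, when mirroring someone else's site it is polite to limit retries and pause between requests. This Python sketch assembles such a command; --tries, --wait and --random-wait are real wget options, while the helper polite_command and its default values are illustrative assumptions.

```python
import shlex


def polite_command(url: str, tries: int = 3, wait: float = 1.0, random_wait: bool = True) -> str:
    """Return a shell-ready wget mirror command with retry and delay options."""
    args = ["wget", "-rpk", f"--tries={tries}", f"--wait={wait:g}"]
    if random_wait:
        # Vary the delay between requests (wget uses 0.5x to 1.5x of --wait).
        args.append("--random-wait")
    args.append(url)
    return shlex.join(args)  # quotes any argument that needs it


print(polite_command("www.avilpage.com"))
# wget -rpk --tries=3 --wait=1 --random-wait www.avilpage.com
```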

2. Serving The Mirror Website:

If you are familiar with Python, you might have heard of http.server. It starts a basic web server with a single command:
python -m http.server 8000  # Python 3
or
python -m SimpleHTTPServer 8000  # Python 2
Run it from inside the downloaded folder (wget saves the site into a directory named after the host, e.g. www.avilpage.com/) and it serves those files on port 8000. Since it listens on all network interfaces by default, anyone on the same LAN can browse the mirror by entering your machine's IP address and port in their browser, e.g. http://<your-ip>:8000/.
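If you want to start the server from a script instead, Python 3.7+ lets you point the handler at a specific folder through its directory argument, so you don't need to cd into the mirror first. A minimal sketch, assuming the mirror lives in ./www.avilpage.com; make_server is a hypothetical helper name. The command-line equivalent is python -m http.server 8000 --directory www.avilpage.com.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000, bind: str = "") -> http.server.ThreadingHTTPServer:
    """Create (but don't start) a server for `directory`.

    bind="" listens on all interfaces, so other machines on the LAN can
    connect; use "127.0.0.1" to keep it local to this machine.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((bind, port), handler)


# Usage (blocks until Ctrl+C):
#   server = make_server("www.avilpage.com", 8000)
#   server.serve_forever()
```

ThreadingHTTPServer handles each request in its own thread, so one slow client doesn't block everyone else browsing the mirror.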

However, this is only a basic server and is not recommended for production use. For that, serve the mirrored files with Apache or Nginx instead.

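You don't even have to wait for wget to finish: start the server while wget is still fetching pages, and you can already browse the pages downloaded so far. Keep in mind that --convert-links rewrites links only after the whole download completes, so until then some links may still point to the live site.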
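Here is a minimal Python sketch of that workflow: it starts the server in a background thread, then launches the download as a separate process, so you can browse while it runs. The helper name serve_while_downloading is hypothetical; cmd would typically be the wget command from step 1, and directory the folder wget writes into (www.avilpage.com by default for this URL).

```python
import functools
import http.server
import os
import subprocess
import threading


def serve_while_downloading(cmd: list[str], directory: str, port: int = 8000):
    """Start an HTTP server for `directory`, then launch `cmd` (e.g. wget) alongside it.

    Returns (process, server) so the caller decides when to stop each one.
    """
    # wget creates this folder on its first write; make sure it exists now.
    os.makedirs(directory, exist_ok=True)
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    process = subprocess.Popen(cmd)  # runs concurrently with the server
    return process, server


# Usage (assumes wget is installed; run from the folder wget should write into):
#   proc, server = serve_while_downloading(
#       ["wget", "-rpk", "www.avilpage.com"], "www.avilpage.com")
#   proc.wait()  # browse http://localhost:8000/ in the meantime
#   input("Download finished. Press Enter to stop the server.")
#   server.shutdown()
```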


I am Chillar Anand. I daydream a lot and write about the things that interest me here. You can read more about this blog here.