How excited are you? After countless hours of hacking away at your new website, you are ready for launch. You even installed Google Analytics and Mint to track all visitors and clicks. A week goes by and the only people who have gone to your site are your parents, your little sister Jenny, and your crazy Uncle Larry. They are proud of you, but of those 200 hits you have registered, 196 of them are you.
So what’s the problem? Well, that is easy, my friend: the big boys don’t know you exist yet! You need a sitemap. Lucky for you, GadElKareem created a little script written in PHP called Sitemap Creator. So what does it do? Once you log into the sitemap admin section and click “Crawl yourdomain”, it will crawl your entire site and then alert Google, Yahoo!, MSN and MoreOver to hop on over and download your sitemap. One of its best features is that you do not have to do this manually every time you make a change on your site. Simply set up a Cron Job as described in the README file and off you go.
One last thing: be nice to your little sister, and make sure YOU take the blame for breaking your mother’s favorite lamp in the living room.
**Tech Note**
I was having problems with the Adding Reference to Robot.txt command writing or detecting the robots.txt file. I kept getting the following error: “Robots.txt does not exist or is not writable, please chmod 666”. Knowing that the file did exist and the permissions were correct, I needed to dig a little deeper. I tested the script on two Plesk VPS servers on different web hosts and on my personal Ubuntu box (Hardy Heron). It seems that the utime call, which the PHP function touch() relies on, is not available by default under Plesk (v. 8.4.0) yet works on Ubuntu (LAMP install). I have found a workaround which resolves the problem as far as I can see. You need to change one line of code in ../sitemap/.function.inc.php, line 662, to make it work. YMMV. If you are afraid to get your hands dirty in the code, you can download the hacked version here (remove the .txt extension and upload it to your /sitemap directory).
Original Code
if(!@touch($robots))
Modified Code
if(!@is_writable($robots))
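One thing to keep in mind with this change: touch() creates the file when it is missing, while is_writable() only reports on a file that already exists. So with the modified line, robots.txt has to be in place before you run the command. The snippet below is a minimal sketch for checking that from a standalone test script. The helper name robots_status() and the example path are my own assumptions for illustration and are not part of Sitemap Creator.

```php
<?php
// Hypothetical helper (not part of Sitemap Creator): reports whether
// robots.txt can be updated, without ever calling touch().
function robots_status($robots)
{
    // is_writable() returns false for a missing file, so check existence
    // first to tell the two failure cases apart.
    if (!file_exists($robots)) {
        return 'missing';
    }
    if (!is_writable($robots)) {
        return 'not writable';
    }
    return 'ok';
}

// Example path (an assumption); point it at your own document root.
echo robots_status(__DIR__ . '/robots.txt'), PHP_EOL;
```

If this reports “missing”, create an empty robots.txt first. If it reports “not writable”, the chmod hint from the error message is the place to start.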

Hi,
I’m trying your script, and a few minutes after clicking on “crawl” I get this response:
“Crawler Timed out after 200 seconds while crawling http://www.edaje.com, Crawled 2481 links
Took 200.31 Seconds, using 41.5MB of memory”
Then I have to click on “resume crawling” to make the program keep crawling the site. It crawls a little bit more and stops with the same result as above, and so on.
Am I doing something wrong? Did I miss anything? Is there a way for the script to do its job all the way to the end without my intervention?
Thanks a lot in advance