From sklav at sklav.com  Tue Jul 15 05:31:11 2003
From: sklav at sklav.com (Nick Sklavenitis)
Date: Tue Jul 15 00:31:13 2003
Subject: [SearchEngine] Issues indexing website
Message-ID: <1058243468.1485.4.camel@localhost.localdomain>

Hi List,

I'm having issues indexing a website with the steps documented on the
website. I'm currently in the data directory issuing the following
command:

Search index.xml -i http://www.sklav.com -d www.sklav.com

Below is the error message I receive:

Indexing http://www.sklav.com/...canceled.

Any reason why this is failing?

PS: I have the exact same setup on an internal server and it works
flawlessly. The method I use in that case is the following:

Search index.xml -i http://192.168.1.2 -d 192.168.1.2

This works perfectly. The only difference I see is that one system is
Red Hat 8.0 fully updated and the other is Red Hat 9.0 fully updated.
It fails on the 8.0 system and works on the 9.0 system.

Thanks in advance.
-- 
Nick Sklavenitis
Sklav Networks

From douglaswth at earthlink.net  Wed Jul 16 21:13:42 2003
From: douglaswth at earthlink.net (Douglas William Thrift)
Date: Wed Jul 16 23:13:58 2003
Subject: [SearchEngine] Issues indexing website
References: <1058243468.1485.4.camel@localhost.localdomain>
Message-ID: <014101c34c11$6e019ba0$0100a8c0@mshome.net>

Hello Nick,

This occurs because the Search Engine obeys the Robots META Tag as
specified at http://www.robotstxt.org/wc/exclusion.html#meta . You will
need to remove the following line from the HTML of the page to get it
indexed:

    [HTML line not preserved in the archive]

If you need to block other spiders, I would suggest you use the Robots
Exclusion Protocol so you can specify which user agents to allow.

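As a rough illustration of the Robots META Tag check described above, here is a minimal Python sketch. It is not the Search Engine's own code. The names `RobotsMetaParser`, `may_index`, and `SAMPLE_PAGE` are hypothetical, and the sample tag is an assumption, since the actual line from Nick's page is not preserved in the archive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for item in attributes.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def may_index(html):
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & parser.directives)


# Assumed example page; the real tag on www.sklav.com is unknown.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body>Hello</body></html>"
)

print(may_index(SAMPLE_PAGE))                             # False
print(may_index("<html><head></head><body></body></html>"))  # True
```

A page without any robots META tag is treated as indexable in this sketch, which matches the default behaviour described at robotstxt.org.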
Here is a sample of a robots.txt file that would allow my Search
Engine and block everything else (note: the real file should not have
indentation):

    # robots.txt

    # block everything
    User-agent: *
    Disallow: /

    # allow Doug's Search Engine
    User-agent: Douglas Thrift's Search Engine
    Disallow:

_______________________________________________________________________
Douglas William Thrift

----- Original Message -----
From: "Nick Sklavenitis"
To: "Search Engine Mailing List"
Sent: Monday, July 14, 2003 9:31 PM
Subject: [SearchEngine] Issues indexing website

> Hi List
>
> I'm having issues indexing a website with the steps documented on the
> website. I'm currently in the data directory issuing the following
> command.
>
> Search index.xml -i http://www.sklav.com -d www.sklav.com
>
> below is the error message I receive.
>
> Indexing http://www.sklav.com/...canceled.
>
> Any reason why this is failing.
>
> Ps: I have the exact same setup on an internal server and it works
> flawlessly? the method I use in that case is the following.
>
> Search index.xml -i http://192.168.1.2 -d 192.168.1.2
>
> This works perfectly? The only issue I see is that one system is redhat
> 8.0 fully updated and the other is redhat 9.0 fully updated. Systems it
> fails on is 8.0 systems it works on is 9.0.
>
> Thanks in advance.
> --
> Nick Sklavenitis
> Sklav Networks
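To see how the sample robots.txt from Douglas's reply would be interpreted, here is a small Python sketch using the standard library's `urllib.robotparser`. It only illustrates the Robots Exclusion Protocol rules; it is not how the Search Engine itself reads the file. The helper names and the `SomeOtherBot` user agent are hypothetical, and the user agent string for the Search Engine is taken from the sample file rather than verified against the program.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from the reply, written without indentation.
SAMPLE_ROBOTS_TXT = """\
# robots.txt

# block everything
User-agent: *
Disallow: /

# allow Doug's Search Engine
User-agent: Douglas Thrift's Search Engine
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Ask whether the given user agent may fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(SAMPLE_ROBOTS_TXT)
site = "http://www.sklav.com/"

# An empty Disallow line means nothing is off limits for that agent.
print(is_allowed(rules, "Douglas Thrift's Search Engine", site))  # True

# Any other agent falls back to the "*" record, which blocks everything.
print(is_allowed(rules, "SomeOtherBot", site))                    # False
```

The blank line between the two records matters: it is what separates the catch-all rule from the record that applies only to the named user agent.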