
[arachne] Websites - longish

  • From: Mel Evans <arachne4dos@xxxxxxxxxxxxx>
  • To: <arachne@xxxxxxxxxxxxx>, <arachne4dos@xxxxxxxxxxx>
  • Date: Mon, 4 Oct 2004 11:12:26 +0100
Arachne at FreeLists---The Arachne Fan Club!


Hi Guys and Gals,

For those interested, the BCC Scotland (British Caravanners Club)
website is running its "beta" 2005 version at

http://www.bccscotland.org.uk

and you are welcome to visit and comment. Comments and link requests
to

mel@xxxxxxxxxx

please, so I can add them in when the website goes fully live.

NOW! For those who have websites of any ilk or description, some
good news/bad news, depending on where you are with your website and
how you've built it.

It appears (I have no direct confirmation of this, only personal
observation and rumour on a couple of webmaster sites I visit) that
ALL of the major search engines, Yahoo, Google, AltaVista, whatever,
either have changed or are in the process of changing the parameters
they use on their webspiders so that ONLY domain-level pages are
spidered. This is supposedly due to the vast number of pages now on
the web, the many squillions that are out there and maybe abandoned,
written and forgotten.

They will now only look at or spider those pages with absolute paths
and/or domain levels such as

http://www.xetronella.com
http://www.clicreports.co.uk

What this means is that those of us who have websites hosted at
"freebie" ISPs such as Tiscali, Freeserve or wherever could find
ourselves un-spidered or not looked at, in favour of those with full
domains, and additionally, those with domains that are "parked" on
free providers will find that ONLY their index page will be looked
at.

The workaround is to use absolute paths to all URLs at all levels.
When you submit your website to any of the search engines, you should
use the full path

http://www.domain.co.uk

but for all internal links in that page, and internal links on all
other pages on your website, to the rest of your site, you use AGAIN
the full absolute path of the "actual" location, such as

http://myweb.tiscali.co.uk/arachne4dos/about.htm(l)

and NOT what most HTML editors put in or advise

./arachne4dos/about.htm(l)

which is called the relative path, i.e. the leading "./" stands in
for the location of the current page, so the browser or spider has
to fill in the primary bit itself, the

http://myweb.tiscali.co.uk
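
For anyone with a lot of pages to convert, here is a minimal sketch
in Python of the absolute-path workaround described above. It is only
an illustration under some assumptions: the page address in PAGE_URL
is made up from the example address in this note, the function name
absolutise_links is my own, and the pattern only catches href values
written in double quotes, so check the result by hand before
uploading anything.

```python
import re
from urllib.parse import urljoin

# Assumed location of the page being converted (hypothetical, modelled
# on the example address in this note).
PAGE_URL = "http://myweb.tiscali.co.uk/arachne4dos/index.htm"

# Matches href="..." with a double-quoted value only.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def absolutise_links(html, page_url):
    """Return html with every double-quoted href made absolute.

    Relative values such as ./about.htm are resolved against page_url,
    the same way a browser or spider would resolve them. Values that
    are already absolute come back unchanged.
    """
    def fix(match):
        return 'href="%s"' % urljoin(page_url, match.group(1))

    return HREF.sub(fix, html)


if __name__ == "__main__":
    sample = '<a href="./about.htm">About</a> <a href="../index.htm">Up</a>'
    print(absolutise_links(sample, PAGE_URL))
```

The standard library's urljoin does the real work here, so "./" is
resolved against the directory of the current page and "../" against
the directory one level up.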

It looks like this is also being applied to domains that you host
yourself (those of you clever enough to run your own server!) and the
workaround is the same: use the absolute path to force the
robot/spider to follow the link and drill down into the lower levels
of your website. Then you can use robot "meta" tags to include or
exclude spiders and robots from individual pages.
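
As a rough illustration of the robot "meta" tags mentioned above, the
sketch below (Python again) reads the robots tag out of a page and
lists its directives, which is handy for checking what you have
actually told the spiders. The sample page and the names
RobotsMetaParser and robots_directives are my own inventions for the
example, and the usual directive words (index, noindex, follow,
nofollow) are only advice to a robot, which may or may not honour
them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for word in (attributes.get("content") or "").split(","):
            word = word.strip().lower()
            if word:
                self.directives.add(word)


def robots_directives(html):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page that asks robots not to index it but to follow
    # its links.
    page = ('<html><head>'
            '<meta name="robots" content="noindex, follow">'
            '</head><body></body></html>')
    print(sorted(robots_directives(page)))
```

A page with no robots tag at all gives back an empty set, which
robots normally treat as permission to index the page and follow its
links.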

I'm going to expand this note and archive it onto my basic HTML
pages. This will be at

http://www.xetronella.com/xcom/

(see, using absolute paths) as soon as I can get there with the 
information.

To those on the list without an interest in this, apologies for using
bandwidth, but I just don't know how many of you have websites,
possibly un-connected with the list content, and this may affect you
in some way.

Regards

Mel



Arachne at FreeLists
-- Arachne, The Web Browser/Suite for DOS and Linux --




