Showing posts with label search engine. Show all posts

Tuesday, February 19, 2008

Sitemap

What are Sitemaps?
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
A Sitemap does not affect the actual ranking of your pages. However, if it helps get more of your site crawled (by notifying us of URLs we didn't previously know about, and/or by helping us prioritize the URLs on your site), that can lead to increased presence and visibility of your site in our index.


Sitemap File Format
XML is the most popular Sitemap format. It is supported by nearly all search engines, and Google in particular encourages webmasters to use it.
The XML Sitemap must:


  1. Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  2. Specify the namespace (protocol standard) within the <urlset> tag.
  3. Include a <url> entry for each URL, as a parent XML tag.
  4. Include a <loc> child entry for each <url> parent tag.

All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.
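The four requirements above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not an official generator; the function name build_sitemap and the example URLs are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap (as a string) for the given URLs."""
    # Register the namespace as the default so tags are written without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")      # requirements 1 and 2
    for page_url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")  # requirement 3
        loc_el = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")  # requirement 4
        loc_el.text = page_url  # special characters such as & are escaped on output
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, used only for illustration.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/catalog?item=12&desc=vacation",
])
print(sitemap_xml)
```

Optional tags are left out on purpose, since support for them may vary among search engines.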


Other Sitemap formats
The Sitemap protocol enables you to provide details about your pages to search engines, and we encourage its use since you can provide additional information about site pages beyond just the URLs. However, in addition to the XML protocol, we support RSS feeds and text files, which provide more limited information.
Syndication feed
You can provide an RSS (Really Simple Syndication) 2.0 or Atom 0.3 or 1.0 feed. Generally, you would use this format only if your site already has a syndication feed. Note that this method may not let search engines know about all the URLs in your site, since the feed may only provide information on recent URLs, although search engines can still use that information to find out about other pages on your site during their normal crawling processes by following links inside pages in the feed. Make sure that the feed is located in the highest-level directory you want search engines to crawl. Search engines extract the information from the feed as follows:

  1. <link> field - indicates the URL
  2. Modified date field (the <pubDate> field for RSS feeds and the <updated> date for Atom feeds) - indicates when each URL was last modified. Use of the modified date field is optional.
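As a rough illustration of that extraction for an RSS 2.0 feed, the Python sketch below reads the link and the optional pubDate of each item. The sample feed and the function name extract_feed_entries are made up for this example, and real search engines may process feeds differently.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed, used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://www.example.com/</link>
    <item>
      <title>First post</title>
      <link>http://www.example.com/first-post</link>
      <pubDate>Tue, 19 Feb 2008 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/second-post</link>
    </item>
  </channel>
</rss>
"""


def extract_feed_entries(rss_text):
    """Return (url, last_modified) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # without a link field there is no URL to report
        pub_date = item.findtext("pubDate")
        # The modified date is optional, so a missing pubDate becomes None.
        modified = parsedate_to_datetime(pub_date) if pub_date else None
        entries.append((link.strip(), modified))
    return entries


feed_entries = extract_feed_entries(SAMPLE_RSS)
for url, modified in feed_entries:
    print(url, modified)
```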

Text file
You can provide a simple text file that contains one URL per line. The text file must follow these guidelines:

  1. The text file must have one URL per line. The URLs cannot contain embedded new lines.
  2. You must fully specify URLs, including the http.
  3. Each text file can contain a maximum of 50,000 URLs. If your site includes more than 50,000 URLs, you can separate the list into multiple text files and add each one separately.
  4. The text file must use UTF-8 encoding. You can specify this when you save the file (for instance, in Notepad, this is listed in the Encoding menu of the Save As dialog box).
  5. The text file should contain no information other than the list of URLs.
  6. The text file should contain no header or footer information.
  7. You can name the text file anything you wish.

You should upload the text file to the highest-level directory you want search engines to crawl and make sure that you don't list URLs in the text file that are located in a higher-level directory.
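The guidelines above translate fairly directly into code. The following Python sketch writes a list of URLs into one or more UTF-8 text files of at most 50,000 URLs each; the function name, the file naming scheme (sitemap1.txt, sitemap2.txt, ...) and the example URLs are assumptions, since the file can be named anything.

```python
import os
import tempfile

MAX_URLS_PER_FILE = 50_000  # limit stated in the guidelines above


def write_text_sitemaps(urls, directory, basename="sitemap"):
    """Write URLs to one or more UTF-8 text files, one URL per line."""
    for url in urls:
        if "\n" in url or "\r" in url:
            raise ValueError(f"URL contains an embedded new line: {url!r}")
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL is not fully specified: {url!r}")

    paths = []
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        chunk = urls[start:start + MAX_URLS_PER_FILE]
        path = os.path.join(directory, f"{basename}{len(paths) + 1}.txt")
        # Only the URLs are written: no header, footer or other information.
        with open(path, "w", encoding="utf-8", newline="\n") as handle:
            handle.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# Hypothetical site with 50,001 pages, so the list is split into two files.
example_urls = [f"http://www.example.com/page{i}.html" for i in range(50_001)]
with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_text_sitemaps(example_urls, tmp_dir)
    line_counts = []
    for path in written:
        with open(path, encoding="utf-8") as handle:
            line_counts.append(sum(1 for _ in handle))
print(line_counts)
```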

Sitemap Location
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
If you have the permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.xml include:

http://example.com/catalog/show?item=23
http://example.com/catalog/show?item=233&user=3453

URLs not considered valid in http://example.com/catalog/sitemap.xml include:

http://example.com/image/show?item=23
http://example.com/image/show?item=233&user=3453
https://example.com/catalog/page1.php
Note that this means that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com.
URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml. In certain cases, you may need to produce different Sitemaps for different paths (e.g., if security permissions in your organization compartmentalize write access to different directories).
If you submit a Sitemap using a path with a port number, you must include that port number as part of the path in each URL listed in the Sitemap file. For instance, if your Sitemap is located at http://www.example.com:100/sitemap.xml, then each URL listed in the Sitemap must begin with http://www.example.com:100.
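The location rules above (same protocol, same host and port, and a path under the Sitemap's directory) can be approximated with a small Python check. This is a simplified sketch: the function name is hypothetical, and it compares host and port as written, so it treats an explicit default port such as :80 as different from no port at all.

```python
from urllib.parse import urlsplit


def is_url_in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory that contains the Sitemap file, e.g. "/catalog/".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        sitemap.scheme == page.scheme                        # same protocol
        and sitemap.netloc.lower() == page.netloc.lower()    # same host and port
        and page.path.startswith(base_dir)                   # same directory or below
    )


catalog_sitemap = "http://example.com/catalog/sitemap.xml"
print(is_url_in_sitemap_scope(catalog_sitemap, "http://example.com/catalog/show?item=23"))
print(is_url_in_sitemap_scope(catalog_sitemap, "http://example.com/image/show?item=23"))
print(is_url_in_sitemap_scope(catalog_sitemap, "https://example.com/catalog/page1.php"))
```

With the examples from this section, the first call reports the URL as valid and the other two report it as not valid.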


Validating Sitemap
The following XML schemas define the elements and attributes that can appear in your Sitemap file. You can download these schemas from the links below:
For Sitemaps: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd
There are a number of tools available to help you validate the structure of your Sitemap based on this schema. You can find a list of XML-related tools at each of the following locations:

http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html
Another site that can help you validate your Sitemap is:
http://www.xml-sitemaps.com/
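If you only want a quick sanity check before running a real schema validator, something like the Python sketch below may help. It is not a substitute for validating against sitemap.xsd: it only checks that the file is well-formed XML, that the root is a urlset element in the Sitemap 0.9 namespace, and that every url entry has a loc child. The function name and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap_structure(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return ["root element is not urlset in the Sitemap 0.9 namespace"]
    problems = []
    url_entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if not url_entries:
        problems.append("no url entries found")
    for number, url_el in enumerate(url_entries, start=1):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"url entry {number} has no loc child")
    return problems


# Hypothetical sample documents, used only for illustration.
GOOD_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc></url></urlset>"
)
BAD_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><lastmod>2008-02-19</lastmod></url></urlset>"
)
print(check_sitemap_structure(GOOD_SITEMAP))
print(check_sitemap_structure(BAD_SITEMAP))
```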

Submitting Sitemap
Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location. You can do this by:
· submitting it to them via the search engine's submission interface
· specifying the location in your site's robots.txt file
· sending an HTTP request
The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.
Submitting your Sitemap via the search engine's submission interface
To submit your Sitemap directly to a search engine, which will enable you to receive status information and any processing errors, refer to each search engine's documentation.
Specifying the Sitemap location in your robots.txt file
You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line:
Sitemap: <sitemap_location>
The <sitemap_location> should be the complete URL to the Sitemap, such as: http://www.example.com/sitemap.xml
This directive is independent of the user-agent line, so it doesn't matter where you place it in your file. If you have a Sitemap index file, you can include the location of just that file. You don't need to list each individual Sitemap listed in the index file.
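To illustrate that the Sitemap line is independent of the user-agent groups, here is a small Python sketch that collects every Sitemap location from a robots.txt file, wherever the line appears. The function name and the sample robots.txt are hypothetical, and the parsing is deliberately simplified (for instance, anything after a # is treated as a comment).

```python
def find_sitemap_locations(robots_txt):
    """Return the URLs given on Sitemap lines, regardless of their position."""
    locations = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            locations.append(value.strip())
    return locations


# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow: /private/
"""
sitemap_locations = find_sitemap_locations(SAMPLE_ROBOTS_TXT)
print(sitemap_locations)
```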
Submitting your Sitemap via an HTTP request
To submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine), issue your request to the following URL:
<searchengine_URL>/ping?sitemap=sitemap_url
For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will become:
<searchengine_URL>/ping?sitemap=http://www.example.com/sitemap.gz
URL encode everything after the /ping?sitemap=:
<searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.gz
You can issue the HTTP request using wget, curl, or another mechanism of your choosing. A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that the search engine has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid. An easy way to do this is to set up an automated job to generate and submit Sitemaps on a regular basis.
Note: If you are providing a Sitemap index file, you only need to issue one HTTP request that includes the location of the Sitemap index file; you do not need to issue individual requests for each Sitemap listed in the index.
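For an automated job, the ping URL can be built and requested with Python's standard library. The sketch below is only an illustration: searchengine.example.com is a placeholder rather than a real endpoint, the function names are made up, and you should use the ping URL documented by each search engine.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_ping_url(search_engine_url, sitemap_url):
    """Return the ping URL with everything after /ping?sitemap= URL encoded."""
    encoded = quote(sitemap_url, safe="")  # also encodes ':' and '/'
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


def submit_sitemap(search_engine_url, sitemap_url):
    """Send the ping and report whether the response code was HTTP 200.

    urlopen raises an exception for error responses, so a caller that wants
    to resubmit on failure should catch urllib.error.URLError as well.
    """
    with urlopen(build_ping_url(search_engine_url, sitemap_url), timeout=30) as response:
        return response.status == 200


# Placeholder search engine URL, used only for illustration (no request is sent here).
ping_url = build_ping_url("http://searchengine.example.com", "http://www.example.com/sitemap.gz")
print(ping_url)
```

Remember that an HTTP 200 response only means the request was received, not that the Sitemap or its URLs were valid.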

Excluding Content
The Sitemaps protocol enables you to let search engines know what content you would like indexed. To tell search engines the content you don't want indexed, use a robots.txt file or robots meta tag.
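As a small illustration of the robots.txt side, the Python sketch below uses the standard library's robots.txt parser to check which URLs a crawler that follows the rules would be allowed to fetch. The rules and URLs are hypothetical examples, not a recommendation for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one directory for all crawlers.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(EXAMPLE_ROBOTS_TXT.splitlines())

private_allowed = robots.can_fetch("*", "http://www.example.com/private/report.html")
catalog_allowed = robots.can_fetch("*", "http://www.example.com/catalog/index.html")
print(private_allowed, catalog_allowed)
```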


Monday, February 18, 2008

Introduction to SEO Tutorials

Before learning the techniques and theory behind search engine optimization, it is important to understand the basics. Below are some tutorials that explain those basics, along with a brief explanation of the search engines upon which SEO is built. Also included are some technical papers about the search engine algorithms, which the more advanced student of SEO may find useful.

What is search engine optimization?

Taking the search engine point of view: How do the search engines determine relevance?

Basic link terminology


Technical papers relating to Google’s Algorithm:
PageRank
Topic Specific PageRank
LocalRank
Hilltop
Latent Semantic Indexing
TrustRank

What is search engine optimization (aka SEO)?

Imagine that you’re a cell phone vendor and that a potential customer is searching on Google for the phrase “free cell phone”. That customer will most likely reach a page like this:


There are three parts of this page that we need to note. The first is the search term (in this case “Free Cell Phones”), the second is the paid ads, and the third is the natural search results. See image below:

For these tutorials we are only interested in the search term and the natural search results. We can forget about the paid ads, as they have nothing to do with search engine optimization. You don’t optimize your site for the paid ads; you bid for them. They are part of Google’s AdWords program and will be discussed in our Paid Search (aka pay-per-click) tutorials.

Search engine optimization only deals with what are called “natural” or “organic” search results. Nobody pays to show up in the natural results. The results displayed in this section are there only because Google “decided” that they are the best pages on the Web to display for the search term “free cell phones”. If you entered a different search term you would see different results.

Now, with that said, let’s return to our imaginary cell phone site. As a cell phone vendor, your goal would be to have your web site show up in the top 10 - 20 search results for all searches related to cell phones (most people don’t scroll down beyond the top 10 - 20 search results). By showing up in the organic search results for the major search engines (Google, Yahoo, MSN, and Ask.com) you can bring in large numbers of potential customers to your website for free!

On the other hand, imagine the business you’re missing out on by ranking poorly in the search engines. This is why so many people spend a great deal of time, money, and effort trying to get their sites to rank well in the search engines.

You’re probably wondering: what do I have to do to rank well for the terms and phrases related to my site? Simple: you have to optimize your website for the search engines. That means you have to figure out how the search engines determine their results for any given search and then build your website accordingly. That’s what we mean by “optimization”: building and marketing your website to rank well within the major search engines.

This brings us to our next article: How do search engines rank and determine the order of their search results?

Next Post/Tutorial: Taking the search engine point of view