Creating and Using SWISH Site-Indices

SWISH is a search tool that allows clients to search your site. To use SWISH, you must:
  1. Create a SWISH index file, and possibly a description cache file, of your site (or of a portion of your site)
  2. Create an HTML document containing a FORM that will search this file
  3. Make this document available to the world

For most sites, you can accomplish steps 1 and 2 with one of the following forms -- the quick mode if you are willing to accept some default parameters, or the custom mode if you want to set these parameters.

Or, you can select from a list of currently available indices.


Quick Mode

Use this Quick Mode to create a search index, and the necessary tools for searching your site, using default options. All you need to do is specify the "web directory" to search (and, optionally, whether to create a "description cache")!

Create a "default options" SWISH index, and a front end, for:

You can also create a description cache file for files included in this index (highly recommended if you want short descriptions to be returned with each match):
 No files
 HTML documents
 All text documents


Custom Mode

Use this mode to set a number of parameters, including file exclusion rules and the name of the output file.
If you leave a field empty, a default value will be used.

Parameter Description
Directory & file names
Web directories to index:

Root directory of the web-tree:
Create an index of all files in and under the web directory (or web directories).
The web directories should NOT be fully qualified names.
They should be relative to the root directory of the web tree.
Enter web directories as a space-delimited list.
SWISH index name (fully qualified):
(a relative file name is assumed to be relative to your SWISH directory)

Search-form document to create:
(a relative file name is assumed to be relative to your web directory)

The name of the SWISH index file to create. If not specified, a randomly named index file will be created in your "SWISH" directory. The SWISH directory is specified with the INDEX_DIR parameter in GOSWISH.CMD.

The search-form document is an HTML document that will contain a link to the search mode of GOSWISH.

Indexing Rules
A "replace with" rule: This should contain two quoted strings. It is mainly used to convert filenames into URLs. For example:
"\www\samples" "http://www.mysite.net/samples"
By default, a replace rule that generates URLs pointing back to each file will be used.
We HIGHLY recommend use of this default!
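
As an illustration of how a two-string replace rule turns a file name into a URL, here is a minimal Python sketch. It is not GoSwish's or SWISH's own code: the function names are hypothetical, the match is case-sensitive, and converting the remaining backslashes to forward slashes is an assumption made so that the result looks like a URL.

```python
import re


def parse_replace_rule(rule):
    """Return the (old, new) pair from a rule made of two quoted strings."""
    parts = re.findall(r'"([^"]*)"', rule)
    if len(parts) != 2:
        raise ValueError("a replace rule needs exactly two quoted strings")
    return parts[0], parts[1]


def filename_to_url(filename, rule):
    """Apply the rule once, then turn leftover backslashes into slashes.

    Assumption: the first occurrence of the old string is the one to replace.
    """
    old, new = parse_replace_rule(rule)
    return filename.replace(old, new, 1).replace("\\", "/")


rule = r'"\www\samples" "http://www.mysite.net/samples"'
print(filename_to_url(r"\www\samples\cars\index.htm", rule))
```

A file name that does not contain the first string is returned with only its backslashes converted, which is one reason the default rule is the safer choice.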
Files to index:
Files with these extensions will be indexed. Both the file name and the contents of the file will be indexed.
Do not index contents of these files: Files with these names will not have their contents indexed (just the filenames will be indexed). This list must be a subset of the files to index list.
Do not index rules:
PathName:
Directory:
Filename:
Title:
These file rules are used to limit which directories and files are indexed. The first word should be contains, followed by a space-delimited list of strings.
  • PathName: If the pathname (to the file, or to the directory) contains any of these strings, do not index.
  • Directory: If one of these files is in the directory, do not index any file in the directory.
  • Filename: If the filename contains one of these strings, do not index the file.
  • Title: If the title contains one of these strings, do not index.
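
The four rule types can be sketched as a single check, shown below in Python. This is only an illustration under stated assumptions: the function names and the rules dictionary are hypothetical, matching is case-insensitive, and the Directory rule is treated as an exact file-name match against the directory listing.

```python
import ntpath  # Windows-style path handling, usable on any platform


def parse_contains_rule(rule):
    """Return the strings that follow the leading 'contains' keyword."""
    words = rule.split()
    if not words:
        return []
    if words[0].lower() != "contains":
        raise ValueError("rule must start with 'contains'")
    return [w.lower() for w in words[1:]]


def should_skip(path, title, dir_listing, rules):
    """Return True if any do-not-index rule matches this file."""
    path_l = path.lower()
    name_l = ntpath.basename(path_l)
    title_l = title.lower()
    listing_l = {f.lower() for f in dir_listing}
    if any(s in path_l for s in parse_contains_rule(rules.get("pathname", ""))):
        return True
    # Directory rule: a marker file in the directory excludes every file in it.
    if any(f in listing_l for f in parse_contains_rule(rules.get("directory", ""))):
        return True
    if any(s in name_l for s in parse_contains_rule(rules.get("filename", ""))):
        return True
    return any(s in title_l for s in parse_contains_rule(rules.get("title", "")))


rules = {
    "pathname": "contains tmp backup",
    "directory": "contains .noindex",
    "filename": "contains ~ .bak",
    "title": "contains Draft",
}
print(should_skip(r"D:\WWW\SAMPLES\old.bak", "Home", ["old.bak"], rules))
```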
More Options
Two limits (percent #_files) to use to identify common words:
After indexing, SWISH can automatically determine which words are the most common and omit them from the index according to these parameters. For example:
IgnoreLimit 75 250 -- ignore all words that occur in over 75% of the files and that also occur in over 250 different files.
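
To make the two limits concrete, the following Python sketch applies the same test to a small in-memory table. The function name and the word-to-files mapping are hypothetical, and treating both limits as strict "over" comparisons follows the wording of the example above; SWISH's internal bookkeeping may differ.

```python
def common_words(word_files, total_files, percent, min_files):
    """Return the words that exceed BOTH limits.

    word_files maps each word to the set of files that contain it.
    """
    return {
        word
        for word, files in word_files.items()
        if len(files) > min_files and 100 * len(files) > percent * total_files
    }


word_files = {
    "the": {"a.htm", "b.htm", "c.htm", "d.htm"},
    "swish": {"a.htm", "b.htm", "c.htm"},
    "pickup": {"d.htm"},
}
# Scaled-down limits (75 percent, 2 files) so the tiny sample shows an effect.
print(common_words(word_files, total_files=4, percent=75, min_files=2))
```

With these scaled-down limits only "the" is dropped: "swish" appears in exactly 75% of the files, which is not over the limit.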
Common words (to be ignored):
Ignore these "commonly occurring" words. If you leave this blank (or enter SwishDefault), a default set (of about a thousand words) will be used.
The name of this index:
the administrator:
the description:
a pointer:
These are strictly optional items used to identify the index. Leave them blank and some basic client, server, and selector information will be used.
Description Cache
Create a description cache file for:
 No files
 HTML documents
 All text documents
For every match found during a search of a SWISH index, the URL of the matching document, its TITLE, and a relevancy score are displayed. You can also display a description that is generated from the contents of the document.
List of HTML document extensions:
Files ending with these extensions are treated as HTML documents (descriptions will use meta elements, headers, etc.)
Name of directory specific description file:
Filename.ext only: do not include a path or drive.
To allow you to specify your own descriptions (say, for image files), you can create a (set of) directory specific description files.

Do you want to monitor SWISH while it runs: Yes || No


Expert Mode

You can always write your own SWISH configuration file (see GOSWISH.DOC for more information)!

Some further definitions

Web Directories and Root Directory
The web directories are the relative directories to index. They are relative to (subdirectories of) the root directory of the web tree.

If you do not specify the root directory of the web tree, a default value (set in the GOSWISH.CMD program file) is used.

Some examples might help (the following assume that the default root directory of the web tree is D:\WWW).

  1. If:
    web directory = /samples
    root directory is not specified (it's left blank)

    then files in D:\WWW\SAMPLES will be indexed (this includes files in subdirectories of D:\WWW\SAMPLES)
  2. If:
    web directory = /samples /customs
    root directory is not specified (it's left blank)

    then files in/under D:\WWW\SAMPLES and in/under D:\WWW\CUSTOMS will be indexed.
  3. If:
    web directory = cars/pickups
    root directory = E:\ALTW3

    then files in E:\ALTW3\CARS\PICKUPS will be indexed

  4. If:
    web directory = e:\mydocs\set1
    root directory is not specified (it's left blank)

    then an error will occur, since E:\MYDOCS\SET1 is not a relative directory.

    Notes:
    • If you specify a root directory, you should also specify the search form document.
    • On some sites (such as those using the SRE-http server), the web directory can point to a virtual directory.
    • Leading and trailing / (or \) characters in a web directory entry can be omitted (they will be added and converted as needed).
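
The resolution rules in the examples above can be sketched in Python as follows. This is an illustration, not GoSwish's code: the function name is hypothetical, D:\WWW is the assumed default root taken from the examples, virtual directories are not handled, and the result is upper-cased only to match the way paths are written in this document.

```python
import ntpath  # Windows-style path handling, usable on any platform

DEFAULT_ROOT = r"D:\WWW"  # assumed default root, as in the examples above


def resolve_web_dirs(web_dirs, root=""):
    """Turn a space-delimited list of web directories into full paths."""
    root = root.strip() or DEFAULT_ROOT
    resolved = []
    for entry in web_dirs.split():
        drive, _ = ntpath.splitdrive(entry)
        if drive:
            # A drive letter means the entry is not a relative directory.
            raise ValueError(f"{entry} is not a relative directory")
        # Leading and trailing slashes are optional; normalize them away.
        relative = entry.replace("/", "\\").strip("\\")
        resolved.append(ntpath.join(root, relative).upper())
    return resolved


print(resolve_web_dirs("/samples"))
print(resolve_web_dirs("cars/pickups", r"E:\ALTW3"))
```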

    Search form document
    GoSwish will generate a search form document -- an HTML document that can be used as a front-end for the SWISH search engine. In other words, this document can be used as-is (though you might want to customize it) by clients interested in searching the index you are about to create!

    Typically, you should enter a file name (with no path information). Or, you can leave this field blank, in which case a randomly derived name (e.g., SEARCH2.HTM) will be used. In either case, the search form document will be created in the web directory.

    Alternatively, you can specify a fully qualified file name. If you enter a fully qualified name, you should follow the file name with a URL that points to this fully qualified name. For example:

        D:\WWW\GIANTS\FOOBAR.HTM   /GIANTS/FOOBAR.HTM 
    Generating descriptions
    When displaying "hits", GoSwish can also display a short file description. To do this, a set of descriptions must first be created (which are then stored in a separate description cache file).

    GoSwish has two means of generating these descriptions: either by examining the contents of text files, or by explicitly defining a description in a directory specific description file.

    • By examining contents of a file
      GoSwish has two modes for generating descriptions:
        • Generate descriptions for all HTML documents: descriptions will be created for all HTML documents (as identified by the List of HTML document extensions). This description is drawn from a META NAME="DESCRIPTION" element in the HTML document's <HEAD> section, and from <H1> and <H2> elements.
        • Generate descriptions for all text documents (including HTML documents): the first several lines of the text file are used as a description.
    • From a directory specific description file
      Before attempting to create a description from the contents of a file, the appropriate directory description file (typically named DESCRIBE.TXT) will first be checked.
      Notes:
    • The syntax of these files is:
       filename.ext  a description
       filenam2.ext  another description
       filenam3.ext  another description, this one 
        | on 2 lines (continuation of filenam3.ext)
      Note the use of | as a continuation character.

    • "Directory specific" means that the description file in the directory containing the document is used.
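
A directory specific description file with the syntax shown above could be read with a short routine like the Python sketch below. The function name is hypothetical, and lower-casing the file names and joining continuation lines with a single space are assumptions, not documented GoSwish behavior.

```python
def parse_description_file(text):
    """Map each file name to its description, joining '|' continuation lines."""
    descriptions = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("|"):
            # Continuation of the previous entry's description.
            if current is not None:
                descriptions[current] += " " + line[1:].strip()
            continue
        parts = line.split(None, 1)
        current = parts[0].lower()
        descriptions[current] = parts[1].strip() if len(parts) > 1 else ""
    return descriptions


sample = """\
filename.ext  a description
filenam2.ext  another description
filenam3.ext  another description, this one
 | on 2 lines (continuation of filenam3.ext)
"""
for name, description in parse_description_file(sample).items():
    print(name, "->", description)
```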