Smart Cache Loader uses the loader.cnf
configuration file by default.
An alternate config file can be given on the command line by using the
#filename.ext syntax. You can use more than one configuration
file.
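The #filename.ext convention can be illustrated with a short Python sketch. This is not Loader's own code: the function name split_args is hypothetical, and the sketch assumes that loader.cnf is used only when no #file argument is present, which the text above does not state explicitly.

```python
DEFAULT_CONFIG = "loader.cnf"

def split_args(argv):
    """Separate '#file' arguments (config files) from all other arguments.

    Assumption: the default config file is used only when no '#file'
    argument was given on the command line.
    """
    configs = []
    others = []
    for arg in argv:
        if arg.startswith("#") and len(arg) > 1:
            configs.append(arg[1:])   # drop the leading '#'
        else:
            others.append(arg)
    if not configs:
        configs.append(DEFAULT_CONFIG)
    return configs, others
```

Called with ["#a.cnf", "#b.cnf", "x"], the sketch returns both config file names in order and leaves "x" as an ordinary argument.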
Statements in the configuration file fall into the groups described below.
Global definitions apply to all servers, whether they are listed in the configuration file or specified on the command line. They can appear anywhere in the configuration file.
Global definitions are:
Specifies how many concurrent network/local connections will be made. The default value is 4.
http_proxy <proxy address> <port>
Loader will use this proxy server to service requests.
How many times an unsuccessful request is retried before giving up.
Priority for serving requests that failed at least once.
What to do with downloaded data? Local store can be.
These settings apply to all servers or locations processed by Loader unless one of the nodefault server options is set. Default masks are processed after any server-specific masks.
Sets the default server priority (a floating-point number). For the program to work correctly, it should be > 0.
Sets the default depth when scanning web sites.
Every server or URL location can have its own setup. A new location starts with the keyword Location (Section 3.3.1).
The keyword Location starts a new per-location section in the config file. The syntax is Location |Common URL part|. It is recommended to end the location URL with / to avoid possible problems.
Location http://slashdot.org/
If documents at a Location (Section 3.3.1) can also be accessed by an alternative URL, add that URL as an alias. For example, server abc.com can also be accessed as www.abc.com. You can have multiple aliases.
Location http://www.abc.com/
Alias http://abc.com/
URL at which fetching starts. It can be outside Location, and you can have more than one. If undefined, it defaults to Location.
Name of the location. Used in program command-line options (Chapter 2).
Server priority.
How many link levels will be scanned. If set to 0, only current page and images will be loaded. If set to -1, only starturl is loaded.
Server options. A comma-separated list of options; you can invert an option by prepending !. The list of options follows:
Turns auto-fetching on for this site. Auto-fetching means that when a config file with active sites is processed, Loader starts loading these sites even if the command line is empty.
Turns auto-fetching off for this site.
Use default masks and action.
Do not use default action and masks.
If none of the masks match, load the URL.
If none of the masks match, ignore the URL.
Do not process default URL masks after the masks specific to this location.
Do not use the defaultaction setting; use systemdefault instead.
Load any URL within the location; ignore all mask statements.
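The comma-separated option list with ! inversion can be read as a set of on/off flags. The Python sketch below only illustrates that reading and is not Loader's parser. The function name parse_options and the option names opt_a and opt_b are hypothetical placeholders, not real Loader options, and tolerating spaces around commas is an assumption.

```python
def parse_options(text):
    """Return {option_name: enabled} for a comma-separated option list.

    A leading '!' inverts the option, so it is recorded as False.
    Assumption: whitespace around commas is ignored.
    """
    flags = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue                  # skip empty entries such as "a,,b"
        if item.startswith("!"):
            flags[item[1:]] = False   # inverted option
        else:
            flags[item] = True
    return flags
```

With the placeholder input "opt_a,!opt_b", the sketch marks opt_a as enabled and opt_b as disabled. How Loader itself interprets an inverted option is not covered here.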
Actions is a whitespace-separated list of name=value pairs. These values are merged with defaultactions (Section 3.2.4) unless nodefaultactions (Section 3.3.7.8) is specified. Actions are used as defaults for mask (Section 3.3.10).
Similar to actions (Section 3.3.8), but it is merged with actions first, then with defaultactions.
The main and most powerful SC Loader config command. It checks a URL against the specified input conditions and performs the specified action (load, reject URL ...). A mask command is a list of space-separated pairs name=value[, value ...]. The list of names follows:
Sets the URL queue priority. If you want the URL to be loaded, a priority > 0 is advised.
Matches the URL by regexp. The URL is stripped (see strip, Section 3.3.10.7) before comparison. The URL can have the special value 'any' or '*', which matches any given URL. If you specify multiple URLs separated by ',', they are ORed.
Matches filename extensions. Same as url (Section 3.3.10.2), except that no stripping is done. Using ext is faster than url.
Name of the HTML tag in which the link is found. SRC is mostly IMG or A. It is in upper case and can contain a regex. The magic words any or * match any SRC tag. You can also prepend ! to the tag to negate it.
mask src=!A act=reject
Rejects all non-anchor links.
Changes the fetch depth if the URL is matched. You cannot increase the nesting depth level with this command.
Checks whether the size of the URL is at least xxx bytes. You can also use the magic words 'known', 'unknown', or 'any'.
Strips the URL before comparison. Can have the value 'none', 'server', 'location', or 'auto'.
Location of destination URL targeting. Known values are: any, world, known, server, location, directory, subdir, or auto. You can have more than one target, separated by ','. Special targets are site (everything on this server), me (everything on this location), and auto (guess the target from the URL mask used).
What to do with a matched URL. Possible values are reject (ignore), load, noparse (load but do not parse HTML), fastclose (close after sending request), close (close on reply from server), nosave (do not save it to disk), direct (do not use proxy for this request).
Which parts of URL processing are logged. Parts are queue, load, parse, store, ioerr, fatalerr. Special names are none, server (use server default from actions command), all (log everything), url (log only url).
The update strategy for a URL can be ONE of: load (load it), norefresh (do not load it if already loaded), reload (force re-loading from cache), update (check the time difference in hours; example: upd=update,24), forceupdate (force the proxy server to update with xx hours), noreparse (do not reparse already loaded HTML documents).
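The name=value[, value ...] shape of a mask statement can be shown with a small Python sketch. It is an illustration only, not Loader's parser. The function name parse_mask is hypothetical, and the sketch assumes that values contain no spaces and that commas always separate values, as in upd=update,24. A regexp that itself contains a comma is not handled.

```python
def parse_mask(line):
    """Split a mask statement into {name: [value, ...]}.

    Assumptions: pairs are separated by whitespace, values are
    separated by ',' with no surrounding spaces, and the keyword
    is matched case-insensitively.
    """
    words = line.split()
    if not words or words[0].lower() != "mask":
        raise ValueError("not a mask statement")
    pairs = {}
    for word in words[1:]:
        name, sep, value = word.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {word!r}")
        pairs[name] = value.split(",")
    return pairs
```

Applied to the statement mask src=!A act=reject, the sketch yields one value list for src and one for act. The ! negation stays part of the value and is left for later interpretation.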
Smart Cache Loader Manual
0.25
hsn@cybermail.net