
Smart Cache Loader Manual
Chapter 3 - Program configuration file


Smart Cache Loader uses the loader.cnf configuration file by default. An alternate configuration file can be given on the command line using the #filename.ext syntax. You can use more than one configuration file.

Statements in the configuration file fall into three groups: global definitions, default server settings, and per-server configuration.


3.1 Global definitions

Global definitions apply to all servers, whether they are defined in the configuration file or specified on the command line. They can appear anywhere in the configuration file.

Global definitions are:


3.1.1 threads

Specifies how many concurrent network/local connections will be made. The default value is 4.


3.1.2 http_proxy

http_proxy <proxy address> <port>

Loader will use this proxy server to service requests.
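
As a rough illustration, the two global definitions above might be written as follows. The proxy host proxy.example.com and port 3128 are hypothetical, and the "threads <number>" layout is an assumption made by analogy with the http_proxy syntax; it is not confirmed by this manual.

     threads 8
     http_proxy proxy.example.com 3128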


3.1.3 user_agent

You can define a custom User-Agent string.


3.1.4 retry

How many times an unsuccessful request can be repeated before giving up.


3.1.5 retrypriority

Priority for serving requests that failed at least once.


3.1.6 localstore

Specifies what to do with downloaded data, that is, which kind of local store is used.


3.2 Default server settings

These settings apply to all servers or locations processed by Loader unless one of the nodefault server options is set. Default masks are processed after any server-specific masks.


3.2.1 defaultserverpriority

Sets the default server priority (a floating-point number). For the program to work correctly, it should be > 0.


3.2.2 defaultscandepth

Sets the default depth when scanning web sites.


3.2.3 defaultserveroptions

See options, Section 3.3.8.


3.2.4 defaultactions

See actions, Section 3.3.9.


3.2.5 defaultmask

See mask, Section 3.3.11.


3.3 Per server configuration

Every server or URL location can have its own setup. A new location starts with the keyword Location, Section 3.3.1.


3.3.1 Location

The keyword Location defines the start of a new per-location section in the config file. The syntax is Location |Common URL part|. It is recommended to end the location URL with / to avoid possible problems.

     Location http://slashdot.org/

3.3.2 Alias

If documents at Location, Section 3.3.1, can also be accessed by an alternative URL, add this URL as an alias. For example, server abc.com can also be accessed as www.abc.com. You can have multiple aliases.

     Location http://www.abc.com/
     Alias http://abc.com/

3.3.3 starturl

URL at which fetching starts. It can be outside Location, and you can have more than one. If undefined, it defaults to Location.


3.3.4 referer

Default Referer: header for this location. This option affects only starting URLs, because real referers are used for other URLs.


3.3.5 name

Name of the location. Used with Program command line options, Chapter 2.


3.3.6 priority

Server priority.


3.3.7 scandepth

How many link levels will be scanned. If set to 0, only the current page and its images will be loaded. If set to -1, only starturl is loaded.
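
The per-location keywords described so far could be combined roughly as in the sketch below. Only the Location line follows syntax given in this manual; the "keyword value" layout of the other lines, the name slashdot, and the start URL are assumptions chosen for illustration.

     Location http://slashdot.org/
     name slashdot
     starturl http://slashdot.org/index.html
     scandepth 1

Under these assumptions, Loader would start at the given page and follow links one level deep.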


3.3.8 options

Server options: a comma-separated list of options. You can invert an option by prepending !. The list of options follows:


3.3.8.1 active

Turns auto-fetching on for this site. Auto-fetching means that when a config file with active sites is processed, Loader starts loading these sites even if the command line is empty.


3.3.8.2 passive

Turns auto-fetching off for this site.


3.3.8.3 default

Use default masks and actions.


3.3.8.4 nodefault

Do not use default actions and masks.


3.3.8.5 accept

If none of the masks matches, load the URL.


3.3.8.6 reject

If none of the masks matches, ignore the URL.


3.3.8.7 nodefaulturlmasks

Do not process default URL masks after the masks specific to this location.


3.3.8.8 nodefaultactions

Do not use the defaultactions setting; use systemdefault instead.


3.3.8.9 anyurl

Load any URL within the location; ignore all mask statements.
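
A possible options line, assuming the keyword is followed by a space and then the comma-separated list described in options, Section 3.3.8:

     Location http://www.abc.com/
     options active,reject,nodefaulturlmasks

This would mark the site for auto-fetching, ignore URLs that no mask matches, and skip the default URL masks. An inverted form such as !active uses the ! prefix described above; that it behaves exactly like passive is an assumption, not something this manual states.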


3.3.9 actions

Actions is a whitespace-separated list of name=value pairs. These values are merged with defaultactions, Section 3.2.4, unless nodefaultactions, Section 3.3.8.8, is specified. Actions are used as defaults for mask, Section 3.3.11.
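
A sketch of an actions line, assuming that actions accepts the same names as mask (only log is implied by the description of log, Section 3.3.11.10; the use of upd here is a guess):

     actions log=url upd=norefresh

With these assumed defaults, masks in this location that do not set log or upd themselves would log only the URL and would not reload documents that are already loaded.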


3.3.10 addactions

Similar to actions, Section 3.3.9, but it is merged with actions first, and then with defaultactions.


3.3.11 mask

The main and most powerful SC Loader config command. It checks a URL against the specified input conditions and performs the specified action (load, reject URL ...). A mask command is a list of space-separated pairs name=value[, value ...]. The list of names follows:


3.3.11.1 q

Sets the URL queue priority. If you want the URL to be loaded, a priority > 0 is advised.


3.3.11.2 url

Matches the URL by regexp. The URL is stripped (see strip, Section 3.3.11.7) before comparison. The value can be the special word 'any' or '*', which matches any given URL. If multiple URLs are specified, separated by ',', they are ORed.


3.3.11.3 ext

Matches filename extensions. Same as url, Section 3.3.11.2, except that no stripping is done. Using ext is faster than url.
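
Two hedged examples of url and ext conditions follow. The path fragments /ads/ and /banners/ are hypothetical, and this manual does not say whether a regexp must match the whole stripped URL or only part of it, nor whether ext values include the leading dot; both lines assume the simplest reading.

     mask url=/ads/,/banners/ act=reject
     mask ext=zip,exe act=reject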


3.3.11.4 src

Name of the HTML tag in which the link is found. SRC is mostly IMG or A. It is in upper case and can contain a regex. The magic words any or * match any SRC tag. You can also prepend ! to the tag to negate the match.

     mask src=!A act=reject

Rejects all non-anchor links.


3.3.11.5 depth

Changes the fetch depth if the URL is matched. You cannot increase the nesting depth level with this command.


3.3.11.6 size

Checks whether the size of the document at the URL is at least xxx bytes. You can also use the magic words 'known', 'unknown', or 'any'.


3.3.11.7 strip

Strips the URL before comparison. Can have the value 'none', 'server', 'location', or 'auto'.


3.3.11.8 target

Where the destination URL points. Known values are: any, world, known, server, location, directory, subdir, or auto. You can have more than one target, separated by ','. Special targets are site (everything on this server), me (everything on this location), and auto (guess the target from the URL mask used).


3.3.11.9 act

What to do with a matched URL. Possible values are reject (ignore), load, noparse (load but do not parse HTML), fastclose (close after sending request), close (close on reply from server), nosave (do not save it to disk), and direct (do not use proxy for this request).


3.3.11.10 log

Which parts of URL processing are logged. Parts are queue, load, parse, store, ioerr, and fatalerr. Special names are none, server (use the server default from the actions command), all (log everything), and url (log only the URL).


3.3.11.11 upd

Update strategy for the URL. It can be ONE of load (load it), norefresh (do not load it if already loaded), reload (force re-loading from cache), update (check time difference in hours; example: upd=update,24), forceupdate (force the proxy server to update within xx hours), or noreparse (do not reparse already loaded HTML documents).
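
To show how the pieces of a per-location section might fit together, here is a minimal sketch. The Location, Alias, and mask forms follow the syntax given in this chapter; the layout of the options line is an assumption, and the order in which Loader evaluates masks is not described here.

     Location http://www.abc.com/
     Alias http://abc.com/
     options active,reject
     mask src=IMG act=load
     mask src=A target=location act=load upd=update,24

Read this way, images are loaded, anchor links that stay within the location are loaded and checked against a 24-hour time difference, and any URL that no mask matches is ignored because of the reject option.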



0.26
Radim Kolar hsn/at/netmag.cz