Mirror was written by Lee
for use by archive maintainers but can be used by anyone wanting to transfer
a lot of files via FTP. Although originally only available on Un*x,
with version 2.9 mirror will also run on Wind*ws 95 and Wind*ws
The latest version of mirror can always be found at either:
mirror [flags] -gsite:pathname
mirror [flags] [package-files]
The first method is used to retrieve a remote file or directory into the current directory. If you are mirroring a directory, it is best either to end the pathname in a slash ('/'), as this makes the remote recursive listing smaller, or to use the -r flag to suppress recursion (see -g below). The mirror.defaults file is not used.
In the second method given above, a minimal number of arguments are required and mirror is controlled by keyword=value lines read from the package files. If a file named mirror.defaults is found in either the directory containing the mirror executable or in the PERLLIB path, then it is loaded before any of the package-files. mirror.defaults normally just contains the package of keyword settings called defaults that is used to provide common defaults for all package-files. If no mirror.defaults file is found the default settings built into mirror are used.
Each package file is read in turn, looking for named packages. If the package is not named defaults, then mirror will perform the following steps.
If mirror is already connected to a site other than the target site, it will disconnect from that site. It then changes to the given local directory, creating it if necessary, and scans it to get the details of the local files that are already there. Mirror then attempts to connect to the remote site's FTP daemon. It then logs in using the given remote_user and remote_password. The remote directory is then scanned. Mirror does this by changing to the remote directory (remote_dir) and running the FTP LIST command, passing the flags_recursive or flags_nonrecursive options depending on the value of recursive. Alternatively a file containing the directory listing may be retrieved (see ls_lR_file and local_ls_lR_file). Each remote pathname will have any required mappings performed on it to create a local pathname. Then any checks specified by the exclude_patt, max_days, get_newer and get_size_change keywords are applied to names of files or symlinks. max_days, get_newer and get_size_change are not applied to directories. This creates a list of all required remote files and the local pathnames to store them in.
Local versions of all required directories are then created. Then all required files are fetched from the remote site into their local pathnames. This is done by retrieving the file into a temporary file in the target directory. The transfer is normally done in binary mode (see vms_xfer_text). If required the temporary file may be compressed, gzip'ed or split. The file's time-stamps are reset to match those of the remote file. Finally the temporary file is renamed to have the correct name.
Once all files have been transferred, any required symbolic links are created (where supported by your operating system) and any unnecessary pathnames in the mirror are deleted.
Unless an internal failure is detected, any error will cause the current package to be skipped and the next one tried.
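The selection checks described in the steps above can be sketched roughly as follows. This is an illustration in Python, not mirror's own (Perl) code: the `Entry` type, the `wants_file` function and the `cfg` dictionary are hypothetical names, and the exact way mirror combines the checks is an assumption based on the keyword descriptions in this document.

```python
import re
from dataclasses import dataclass

DAY = 86400  # seconds in a day


@dataclass
class Entry:
    """Size and time-stamp of one file (hypothetical helper type)."""
    size: int
    mtime: float  # seconds since the epoch


def wants_file(path, remote, local, cfg, now):
    """Return True if the remote file looks like it should be fetched.

    local is None when there is no local copy yet.
    """
    # get_patt selects pathnames, exclude_patt then removes some of them.
    if not re.search(cfg.get("get_patt", "."), path):
        return False
    exclude = cfg.get("exclude_patt")
    if exclude and re.search(exclude, path):
        return False
    # max_days > 0 means: ignore remote files older than that many days.
    max_days = cfg.get("max_days", 0)
    if max_days > 0 and now - remote.mtime > max_days * DAY:
        return False
    # A missing local copy, or force=true, skips the comparisons below.
    if local is None or cfg.get("force", False):
        return True
    # Assumption: either a newer time-stamp or a changed size is enough.
    if cfg.get("get_newer", True) and remote.mtime > local.mtime:
        return True
    if cfg.get("get_size_change", True) and remote.size != local.size:
        return True
    return False
```

In this sketch force only bypasses the size and time-stamp comparisons, so a pathname rejected by exclude_patt stays rejected even when force is set.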
Mirror can handle symbolic links but not hard links. It does not duplicate owner or group information as usually this is meaningless over a network (but see user and group). If you require any of these options and you are on Un*x use rdist(1) instead.
Mirror was written to mirror remote Un*x archives, but has grown (like topsy).
The only flags you should use often are -n and, if you like to see what mirror is up to, -d.
|-d||Enable debugging. If this argument is given more than once (e.g. -d -d) the debugging level will increase. Currently the maximum useful level is four.|
|-n||Do nothing except compare local and remote directories, no file transfers are done. Sets debug level to two, so that you are shown a trace of what would be done.|
|-g site:path||Get all files matching path, which is a regexp, on the given site. If path matches .*/.+ (e.g. /fred or /fred/bloggs) then it is the name of the directory and everything after the last / is the pattern of filenames to get. If path ends with / then it is the name of a directory and all its contents are retrieved. One note of caution. If you use host:/fred, a full directory listing of / on the remote host will be done. If all you wanted was the contents of the directory /fred then specify host:/fred/|
|-p package||When using multiple package files only mirror the given package. This option may be given multiple times in which case all the given packages will be mirrored. Without this option, all packages will be mirrored. Package is a regexp matched against the package name following the -p.|
|-R package||Similar to -p but skips all packages until it reaches the given package. Useful for restarting failed mirror runs from where they left off.|
|-F||Use temporary dbm files for the information about files. This is useful if you mirror a very large directory. See the variable use_files.|
|-r||Equivalent to -k recursive=false|
|-v||Print the version details of mirror and exit.|
|-T||Do not do any file transfers; just force the time-stamps of any local files to be reset to be the same as the remote files. Normally only used when initialising a mirror that already contains files retrieved another way (e.g. from CDROM).|
|-Ufilename||Record all files transferred by mirror into the given filename. Remember that mirror changes into local_dir to do its work, so it should be a full pathname. If no filename is given, it defaults to upload_log.day.month.year.|
|-k key=value||Override any default key/value. See below|
|-m||Equivalent to -k mode_copy=true|
|-t||Equivalent to -k text_mode=true|
|-f||Equivalent to -k force=true|
|-s site||Equivalent to -k site=site|
|-u user||Equivalent to -k remote_user=user You are then prompted for a password, with echo turned off. The password is used as the remote_password.|
|-L||Just generate a pretty printed version of the input and exit.|
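The way the -g argument is split into a directory and a filename pattern can be illustrated with a short Python sketch. The function name `split_get_arg` is hypothetical, and the handling of a path with no slash at all is an assumption, since that case is not described above.

```python
def split_get_arg(arg):
    """Split a -g argument 'site:path' into (site, remote_dir, pattern)."""
    site, sep, path = arg.partition(":")
    if not sep or not site or not path:
        raise ValueError(f"expected site:path, got {arg!r}")
    # A trailing slash names a directory; all of its contents are wanted.
    if path.endswith("/"):
        return site, path, "."
    # Otherwise everything after the last slash is the filename pattern.
    head, slash, tail = path.rpartition("/")
    if slash:
        return site, head + "/", tail
    # Assumption: with no slash, treat the whole path as a pattern.
    return site, "", path


print(split_get_arg("host:/fred"))   # lists / and looks for fred
print(split_get_arg("host:/fred/"))  # lists /fred/ and takes everything
```

The first call shows why host:/fred causes a full listing of / on the remote host, while host:/fred/ only lists the directory itself.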
Package files are parsed as a series of statements. Blank lines and lines beginning with a hash are ignored. Each statement is of the form keyword=value. The sample package files later in this document also use keyword+value, which appends to the value already set (for example local_dir+gnu).
A statement can be continued over multiple lines by ending every line except the last with an ampersand ('&'). The line following the ampersand is appended to the current line with all leading whitespace removed.
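As a rough illustration of these parsing rules, here is a minimal Python sketch. It is not mirror's own parser: the names `read_statements` and `build_packages` are hypothetical, comment lines with leading whitespace are accepted as an assumption, and the behaviour of keyword+value (appending to the value inherited from the defaults package) is inferred from the local_dir+gnu sample in this document.

```python
import re

# keyword, then '=' (set) or '+' (append), then the value
STATEMENT = re.compile(r"^\s*(\w+)([=+])(.*)$")


def read_statements(text):
    """Yield (keyword, operator, value) tuples from package-file text."""
    pending = ""
    for raw in text.splitlines():
        # A continued line is joined with leading whitespace removed.
        line = pending + (raw.lstrip() if pending else raw)
        pending = ""
        if line.rstrip().endswith("&"):
            pending = line.rstrip()[:-1]
            continue
        # Blank lines and comment lines are ignored.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a keyword statement: {line!r}")
        keyword, operator, value = match.groups()
        yield keyword, operator, value.strip()


def build_packages(text):
    """Group statements into {package name: {keyword: value}}."""
    packages = {}
    current = None
    for keyword, operator, value in read_statements(text):
        if keyword == "package":
            # Each new package starts from a copy of the defaults package.
            current = dict(packages.get("defaults", {}))
            current["package"] = value
            packages[value] = current
            continue
        if current is None:
            raise ValueError("statement found before any package= line")
        if operator == "+":
            value = current.get(keyword, "") + value
        current[keyword] = value
    return packages
```

Given a defaults package with local_dir=/public/ and a later package containing local_dir+gnu, this sketch yields /public/gnu for that package, which matches the comment in the gnu sample.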
Although there are a lot of keywords that can be set, the built-in defaults will handle most cases. Normally only package, site, remote_dir and local_dir need to be set.
    # Sample mirror.defaults
    package=defaults
    # The LOCAL hostname - if not the same as `hostname` returns
    # (I advertise the name sunsite.org.uk but the machine is
    # really swallow.doc.ic.ac.uk.)
    hostname=sunsite.org.uk
    # Keep all local_dirs relative to here
    local_dir=/public/
    email@example.com
|package||none||A name for the package to be mirrored. Should be different from all other package names you use.|
|site||none||Hostname or IP address of the remote site to mirror from.|
|remote_dir||none||Remote directory to mirror. See also recurse_hard.|
|remote_user||anonymous||Username to use at remote site.|
|remote_password||localuser@localhostname||Password to use at remote site. Note: localuser will be your name and localhostname will be the name of the local machine (if it can be found, see hostname).|
|remote_account||none||Account name/password to use at remote site, after logging in anonymously (for systems that require it).|
|remote_group||none||If present set the remote 'site group'.|
|remote_gpass||none||If present set the remote 'site gpass'.|
|timeout||40||Timeout FTP requests after this many seconds.|
|failed_gets_excl||none||Regexp of error messages to skip reporting, when the FTP GET command fails. (E.g. permission denied.)|
|ftp_port||21||Port number of remote FTP daemon.|
|proxy||false||Set to true to use proxy FTP service.|
|proxy_ftp_port||4514||Port number of proxy-service FTP daemon. This value should be changed depending on which proxy library you are using.|
|proxy_gateway||internet-gateway||Name of proxy-service, may also be supplied by the environment variable INTERNET_HOST.|
|using_socks||false||Set to true if you are using a SOCKS version of Perl.|
|passive_ftp||false||Set to true if you want to use the PASV extension of the FTP protocol. Especially useful with firewalls, other proxy FTP servers, and the variable using_socks.|
|retry_call||true||If initial connect fails, retry ONCE after ONE minute. This is to handle sites which reverse lookup the incoming host but sometimes timeout on the first attempt.|
|disconnect||false||Disconnect from remote site at end of package. Normally only disconnects if the next package specifies a different site. (Some sites will not let you change to certain directories except when first connecting in.)|
|remote_idle||none||If set try and set the remote idle timer to this.|
|get_patt||.||Regexp of remote pathnames to retrieve.|
|exclude_patt||none||Regexp of remote pathnames to ignore.|
|local_ignore||none||Regexp of local pathnames to ignore. Useful to skip restricted local directories.|
|get_newer||true||Get the remote file if it is more recent than the local file.|
|get_size_change||true||Get the file if the size is different from local. If the file is to be compressed after being fetched get_size_change is automatically set to false.|
|make_bad_symlinks||false||If true, symlinks will be made to invalid (non-existent) pathnames. (In older versions of mirror this defaulted to true.)|
|follow_local_symlinks||none||Regexp of pathnames of local symbolic links. Rather than treating them as symlinks the target files or directories they reference are used instead. This makes local symlinks invisible to mirror.|
|get_missing||true||Really get files. When set to false, only deletions and symlinking will be done. Used to delete expired files older than max_days without retrieving older files.|
|get_file||true||Get files. If set to false mirror will try to put files.|
|text_mode||false||If true, all files are transferred in TEXT mode. Un*x prefers binary so that is the default.|
|strip_cr||false||Strip carriage returns from any file as it is retrieved.|
|vms_keep_versions||true||When mirroring VMS files, keep the version numbers. If false, the versions are stripped off and only the base filenames are kept.|
|vms_xfer_text||(readme|info|listing|\.c)$||Pattern of VMS files to transfer in TEXT mode (case insensitive).|
|name_mappings||none||Remote to local pathname mappings (a Perl substitute command, e.g. s:old:new:).|
|external_mapping||none||Specifies a file that should contain a Perl module called extmap containing at least a function called map. This function is used as the name_mappings function.|
|update_local||false||Set get_patt to be all the files and directories already present in local_dir.|
|max_days||0||If >0, ignore files older than this many days. Any ignored files will not be transferred or deleted.|
|max_size||0||If >0, do not transfer any files any larger than this many bytes.|
|chmod||true||By default try and set the file attributes (e.g. time-stamps) of the copied file. If false do not set attributes.|
|Local File Attributes|
|user||none||User name or uid to give to local pathnames.|
|group||none||Group name or gid to give to local pathnames.|
|mode_copy||false||Flag indicating if we need to copy the file/dir modes. If this is false then file_mode and dir_mode will be used instead.|
|file_mode||0444||Mode to give files created locally if mode_copy is false.|
|dir_mode||0755||Mode to give directories created locally if mode_copy is false.|
|force||false||If true, all files will be transferred regardless of the results from size or time-stamp comparisons.|
|umask||07000||Do not create setuid files by default (see the chmod(1) on Un*x).|
|use_timelocal||true||Time-stamp files to local time zone. If false, the time zone is set to GMT (older versions of mirror had a bug setting all files to GMT).|
|force_times||yes||Force local times to match remote times.|
|do_deletes||false||Delete destination files if not in source tree.|
|delete_patt||.||Regexp of local pathnames to check for deletions. Names that are not matched are not checked. The match by delete_excl is done to all files selected by this pattern.|
|delete_get_patt||false||Set delete_patt to be get_patt.|
|delete_excl||none||Regexp of local pathnames that mirror will not delete.|
|max_delete_files||10%||If this is set to just a number and there are more than this many files to delete, do not delete, just warn. If this is set to number% and the percentage of files that would be deleted is greater than the number, do not delete, just warn.|
|max_delete_dirs||10%||As max_delete_files except applies to directories.|
|save_deletes||false||Instead of deleting local files move them into save_dir .|
|save_dir||Old||Where local files no longer on remote site are moved to. Either begins with / or is relative to local_dir. Only used when save_deletes is true.|
|store_remote_listing||none||Local pathname where remote listings are kept. Useful if you have a slow network or want to perform several operations on the same package without retrieving the index every time.|
|compress_patt||none||Regexp of files to compress before storing locally. See get_size_change.|
|compress_excl||\.(z|gz)$||Regexp of files not to compress (case insensitive).|
|compress_prog||compress||Program to compress files. If set to the word compress or gzip, the full pathname for the program and correct compress_suffix will automatically be set. When using gzip, level -9 is used. Note that compress_suffix can be reset to a non-standard value by setting it after compress_prog.|
|compress_suffix||none||Character(s) the compress program appends to files. If compress_prog is compress, this defaults to .Z. If compress_prog is gzip, this defaults to .gz.|
|compress_conv_patt||(\.Z|\.taz)$||If compress_prog is gzip, files matching this pattern are uncompressed and gzip'ed before storing locally. Compression conversion is only meant to do compress to gzip conversion.|
|Perl expression to convert suffix from compress to gzip style. Change .Z to .gz and .taz to .tgz.|
|compress_size_floor||0||Do not compress files smaller than this size, in bytes.|
|split_max||0||If >0 and the size of the file is greater than this many bytes, the file is split up to be stored locally (filename must also match split_patt). The name of the file being split up is used as the directory name and each part is stored in a file called part1, part2... in that directory.|
|split_patt||none||Regexp of remote pathnames to split up before storing locally.|
|split_chunk||102400||Size, in bytes, of chunks to split files into.|
|remote_fs||unix||File store type. Currently can be one of unix, dls, netware, vms, dosftp, macos, lsparse and infomac. See the Filestores section for more details.|
|ls_lR_file||none||Remote file containing ls-lR (result of running ls -lR on that machine), otherwise run remote ls command.|
|local_ls_lR_file||none||Local file containing ls-lR, otherwise use remote ls_lR_file. This is useful when first mirroring a large package.|
|recursive||true||Mirror both the contents of local_dir and sub directories of local_dir.|
|recurse_hard||false||Generate remote ls by doing CWD and ls for each sub directory. In this case remote_dir must be absolute (begin with a /) not relative. Use the CWD command in FTP to find the path for the start of the remote archive area. (Not available if remote_fs is VMS.)|
|flags_recursive||-lRat||Flags to send to remote ls to do a recursive listing.|
|flags_nonrecursive||-lat||Flags to send to remote ls to do a non-recursive listing.|
|ls_fix_mappings||none||Edit pathnames in remote directory listings (a Perl substitute command, e.g. s:/usr/spool/pub:/:).|
|update_log||none||Filename, relative to local_dir, where mirror will write a report of all it does to maintain a package.|
|mail_to||none||Mail a log of the work done to this comma separated list of addresses (currently only supported on Un*x).|
|mail_prog||none||Program called to send to the mail_to list. May be passed the argument mail_subject. Defaults to mailx, Mail, or mail. (Not supported under Wind*ws)|
|mail_subject||-s "mirror update"||This can contain $keyword. These will be replaced by the current value for that keyword (e.g.: -s "mirror update: $package")|
|hostname||none||Mirror automatically skips packages whose site variable matches this host. Defaults to the local hostname. This is normally only ever set in the defaults package. Useful if you are sharing mirror package files with others.|
|comment||none||Used in reports.|
|use_files||false||Put the associative arrays that mirror uses into temporary files (currently only supported on Un*x). The files are created in /var/tmp with names: local_map and remote_map. The suffixes will depend on which DBM library was set as default when Perl was installed on your machine.|
|interactive||false||A non-batch transfer. Implied by -g flag.|
|skip||none||If set causes this package to be skipped. The value is reported as the reason for skipping.|
|algorithm||0||Sets the basic algorithm that mirror uses.
Algorithm=0 mirrors an entire site at a time. This is very friendly on the remote site as it uses few of its resources. However it can chew up a lot of memory on the local machine.
Algorithm=1 mirrors a site directory-by-directory. Should ONLY be used for true mirrors (i.e.: no differences between this mirror copy and the original). This uses up a lot less local resources. However it is very unfriendly to the remote site as it requires the remote site to run an ls command for each directory mirrored. Mirror will only "see" the one directory it is mirroring, so it will not know that files outside this directory exist, and symlinks pointing outside this directory are considered bad, see make_bad_symlinks. Deletions are done on a directory by directory basis so be extra careful about the settings of max_delete_files and max_delete_dirs. get_patt is applied to just the filename in this directory not the full path, as are other name checks. You will almost certainly need to set remote_dir to be an absolute pathname (beginning with /).
|local_dir_check||false||If true and the local_dir does not exist, skip this package. By default the local_dir will be created if it does not already exist.|
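The max_delete_files and max_delete_dirs limits described in the table above can be read as the following check. This is a Python sketch only: the function name `deletions_allowed` is hypothetical, and measuring the percentage against the total number of local files (or directories) is an assumption, since the text does not say what the percentage is taken of.

```python
def deletions_allowed(limit, to_delete, total):
    """Return True if deleting to_delete items stays within the limit.

    limit is either a plain count such as "20" or a percentage such as "10%".
    """
    limit = limit.strip()
    if limit.endswith("%"):
        if total == 0:
            return True  # nothing exists locally, so nothing can be deleted
        percent = 100.0 * to_delete / total
        return percent <= float(limit[:-1])
    return to_delete <= int(limit)


# With the default of 10%, deleting 11 out of 100 files would only warn.
print(deletions_allowed("10%", 11, 100))
```

When the check fails, the described behaviour is to warn rather than delete, which protects a mirror from being emptied by a bad remote listing.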
    total 65
    -rw-r--r-- 1 nobody nobody 2245 Jan 28 20:06 README
    -rw-r--r-- 1 nobody nobody 45881 Jan 29 19:13 mirror.html
This is the default and you should not normally have to reset any other related variables.
    00index.txt 189916
    0readme 5793
    1_x/ = OS/2 1.x-specific files
This is an ls variant used on some Un*x archives. It provides descriptions of known items in the listing. Set flags_recursive to -dtR.
    - [R----F--] jrd 1646 May 07 21:43 index
    d [R----F--] jrd 512 Sep 09 10:52 netwire
    d [R----F--] jrd 512 Sep 02 01:31 pktdrvr
    d [RWCE-F--] jrd 512 Sep 04 10:55 incoming
or
    -[R----F--] 1 jrd 1646 May 07 21:43 index
    d[R----F--] 1 jrd 512 Sep 09 10:52 netwire
    d[R----F--] 1 jrd 512 Sep 02 01:31 pktdrvr
This is used by Novell archives. Set recurse_hard to true and set flags_nonrecursive to be nothing. See also remote_dir.
    00-index.txt 6,471 13:54 7/20/93
    alabama.txt 1,246 23:29 5/08/97
    alaska.txt 873 23:29 5/08/92
    alberta.txt 2,162 23:29 5/08/97
dosftp is for an FTP daemon on D*S boxes. Set recurse_hard to true and set flags_nonrecursive to nothing. See also remote_dir.
    -------r-- 0 127 127 Aug 27 13:53 !Gopher Links
    drwxrwxr-x folder 32 Sep 9 16:30 FAQ
    drwxrwx-wx folder 0 Sep 9 09:59 incoming
macos is for one of the Macintosh FTP daemon variants. Although the output is similar to Un*x, the Un*x remote_fs type cannot cope with it because there are three file sizes for each file. Set recurse_hard to true, flags_nonrecursive to nothing, get_size_change to false and compress_patt to nothing (this last setting is due to the unusual file names upsetting the shell used to run compress). See also remote_dir.
    USERS:[ANONYMOUS.PUBLIC]
    1-README.FIRST;13 9 14-JUN-1993 13:09 [ANONYMOUS] (RWE,RWE,RE,RE)
    PALTER.DIR;1 1 18-JAN-1993 11:56 [ANONYMOUS] (RWE,RWE,RE,RE)
    PRESS-RELEASES.DIR;1 1 11-AUG-1992 20:05 [ANONYMOUS] (RWE,RWE,,)
alternatively:
    [VMSSERV.FILES]ALARM.DIR;1 1/3 5-MAR-1993 18:09
    [VMSSERV.FILES]ALARM.TXT;1 1/3 4-FEB-1993 12:20
Set flags_recursive to '[...]' and get_size_change to false. recurse_hard is not available with VMS. See also the vms_keep_versions and vms_xfer_text variables.
    -r 1974 Jul 21 00:06 00readme.txt
    lr 3 Sep 8 08:34 AntiVirus -> vir
This is a special case just meant to handle the sumex-aim.stanford.edu info-mac directory listing stored on that archive in help/all-files. recurse_hard should be set to true.
    03-04-94 08:45PM <DIR> .
    03-04-94 08:45PM <DIR> ..
    03-04-94 09:58AM 9718 Conduit
    03-04-94 09:59AM 8745 Eve
recurse_hard should be set to true and flags_nonrecursive to nothing.
    # This is the default mirror settings used by my site:
    # sunsite.org.uk (22.214.171.124)
    package=defaults
    # The LOCAL hostname - if not the same as `hostname`
    # (I advertise the name sunsite.org.uk but the machine is
    # really swallow.sunsite.org.uk)
    hostname=sunsite.org.uk
    # Keep all local_dirs relative to here
    local_dir=/public/Mirrors
    firstname.lastname@example.org
    mail_to=
    # Don't mirror file modes. Set all dirs/files to these
    dir_mode=0755
    file_mode=0444
    # By default, files are owned by root.zero
    user=0
    group=0
    # # Keep a log file in each updated directory
    # update_log=.mirror
    update_log=
    # Don't overwrite my mirror log with the remote one.
    # Don't retrieve any of their mirror temporary files.
    # Don't touch anything whose name begins with a space!
    # nor any FSP or gopher files...
    exclude_patt=(^|/)(\.mirror$|\.in\..*\.$|MIRROR.LOG|#.*#|\.FSP|\.cache|\.zipped|lost+found/|)
    # Try to compress everything
    compress_patt=.
    compress_prog=compress
    # Don't compress information files, files that don't benefit from
    # being compressed, files that tell ftpd, gopher, wais... to do things,
    # the sources for compression programs...
    # (Note this is the only regexp that is case insensitive.)
    compress_excl+|^\.notar$|-z|\.gz$|\.taz$|\.tar.Z|\.arc$|\.zip$|\.lzh$|\.zoo$|\.exe$|\.lha$|\.zom$|\.gif$|\.jpeg$|\.jpg$|\.mpeg$|\.au$|read.*me|index|\.message|info|faq|gzip|compress
    # Don't delete own mirror log or any .notar files (incl in subdirs)
    delete_excl=(^|/)\.(mirror|notar)$
    # Ignore any local readme files
    local_ignore=README.doc.ic
    # Automatically delete local copies of files that the
    # remote site has zapped
    do_deletes=true
Here are some sample package descriptions:
    package=gnu
    comment=Powerful and free Un*x utilities
    site=prep.ai.mit.edu
    remote_dir=/pub/gnu
    # Local_dir+ causes gnu to be appended to the default local_dir
    # so making /public/gnu
    local_dir+gnu
    exclude_patt+|^ListArchives/|^lost+found/|^scheme-7.0/|^\.history
    # I tend to only keep the latest couple of versions of things
    # this stops mirror from retrieving the older versions I've removed
    max_days=30
    do_deletes=false

    package=X11R6
    comment=X Windows (windowing graphics system for Un*x)
    site=ftp.x.org
    remote_dir=/pub/R6
    local_dir+ftp.x.org/pub/R6
    # This is a local symlink to the free-for-all contrib area
    # and is mirrored elsewhere
    local_ignore=^contrib$
    # Don't compress a thing. It is already compressed
    # but doesn't look it.
    compress_patt=

    # THIS IS JUST A TEST
    package=test vms site
    site=vmsbox.somewhere.ac.uk
    local_dir=/tmp/copy4
    remote_dir=vmsserv/files
    remote_fs=vms
    # Must do these settings for VMS
    flags_recursive=[...]
    get_size_change=false

    # and on, and on ...
    LIMITED NAMELEN
which is about 75% of the way through mirror.pl, for a note on how to reduce temporary filename length. I only know of one site using this.
A regular expression, or regexp, is a way of using matching patterns in text strings. For example the regexp:
    ^s
would match any string that begins with an s. The ^ is a special character that means beginning of string. There are a number of specials possible in a regexp; everything that is not special is taken as a literal character, such as the s in the example above. To turn off a special character put a backslash, \, in front of it. This only affects the special character immediately following it.
A word of warning: although regexps look very similar to Un*x shell (and D*S COMMAND) wildcards, there are differences. For example, both Un*x and D*S would treat *.ZIP as any filename ending in .ZIP, but *.ZIP as a regular expression is an error! The * is a special that must follow something (see below).
|^||beginning of string|
|$||end of string|
|[r]||a range of characters, either as a list abcef or a hyphen separated range a-f|
|[^r]||anything not in the given list or range|
|(p1|p2|p3...)||patterns p1 or p2 or p3 ... (the patterns may be specials)|
|*||zero or more of the preceding item (which may be a special)|
|+||one or more of the preceding item (which may be a special)|
|\d||any digit (same as [0-9])|
|\D||any non-digit (same as [^0-9])|
|\s||any whitespace character|
|\S||any non-whitespace character|
|abc||matches abc, also xxxabcyyy but not xabbcy|
|^abc$||matches only abc|
|a.*z||matches a, then any string, then z, e.g. asdkjfhaksdjfhz|
|index.html||matches index.html AND indexXhtml, index/html (. matches any character)|
|index\.html||matches index.html (the backslash stops . matching any character)|
|[rR][eE][aA][dD][mM][eE]||matches readme, Readme, README ...|
|\.(gz|Z)$||matches strings ending in .gz or .Z|
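The examples in the table above can be tried out with a short script. This sketch uses Python's re module rather than Perl; for the simple patterns shown here the two behave the same way, but that equivalence is an assumption that does not extend to every Perl regexp feature. The names `EXAMPLES`, `check_examples` and `is_valid_regexp` are only for illustration.

```python
import re

# (pattern, test string, whether a match is expected)
EXAMPLES = [
    (r"abc", "xxxabcyyy", True),
    (r"abc", "xabbcy", False),
    (r"^abc$", "abc", True),
    (r"^abc$", "xabc", False),
    (r"a.*z", "asdkjfhaksdjfhz", True),
    (r"index.html", "indexXhtml", True),
    (r"index\.html", "indexXhtml", False),
    (r"[rR][eE][aA][dD][mM][eE]", "ReadMe", True),
    (r"\.(gz|Z)$", "mirror.tar.gz", True),
    (r"\.(gz|Z)$", "mirror.zip", False),
]


def check_examples():
    """Return one boolean per example: True if it behaved as expected."""
    return [bool(re.search(p, s)) == expected for p, s, expected in EXAMPLES]


def is_valid_regexp(pattern):
    """Return False for patterns the regexp engine rejects outright."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(all(check_examples()))
print(is_valid_regexp("*.ZIP"))       # shell wildcard, not a valid regexp
print(is_valid_regexp(r".*\.ZIP$"))   # the regexp equivalent
```

The last two calls show the wildcard warning in practice: the shell pattern *.ZIP is rejected because * has nothing to repeat, while .*\.ZIP$ expresses the same intent as a regexp.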
If you are adding to an existing archive that was not created by mirror (perhaps you copied the files from a CDROM) then it is usually best to force the time-stamps of the existing local files so