科学家 · 科研服务:0571-87885727
  医   生 · 精准医疗:0571-87885730
  大   众 · 健康检测:0571-87181268
当前所在位置 >lftp下载公共数据库

lftp下载公共数据库

由于高性能计算(HPC)和云基础设施的硬件部署,生物信息行业的服务器一般都是linux的,而生物信息行业所依赖的databases非常的繁多且极其庞大,所以数据库下载成了生物信息基建之一,笔者今天主要介绍一款linux下的下载工具lftp。

 

linux下有很多种下载工具,包括系统自带的wget等,但是操作起来不太方便,尤其是如果需要大规模并行下载的时候还需要配置额外的脚本,而ltfp操作非常简便。

 

操作环境CentOS(可执行yum install lftp安装该软件),以下载jgi为例。

—————————————————————-

# 链接ftp

$ lftp ftp.jgi-psf.org/pub

cd ok, cwd=/pub

# 执行 ls 命令

lftp ftp.jgi-psf.org:/pub> ls

drwxrwxr-x 154 1996     124           169 Apr 09 2013 JGI_data

-rw-r–r–   1 0       0           1184 Mar 17 2011 README

drwxr-xr-x   2 60006   124           13 Aug 31 2009 annotation

drwxrwxr-x   4 24799   124             5 Nov 26 2013 compgen

drwxrwx—   22 1996     65534       4096 Jun 16 20:30 incoming

drwxr-xr-x   2 0       0           4096 Mar 17 2011 nolan

drwxrwxr-x   15 22744   124           15 Feb 27 2013 rnd

drwxr-xr-x   3 0       0           4096 Mar 17 2011 rnd2

# 查看 mirror 函数使用方法

lftp ftp.jgi-psf.org:/pub> help mirror

Usage: mirror [OPTS] [remote [local]]

 

Mirror specified remote directory to local directory

 

-c, –continue         continue a mirror job if possible

-e, –delete           delete files not present at remote site

–delete-first     delete old files before transferring new ones

-s, –allow-suid       set suid/sgid bits according to remote site

–allow-chown     try to set owner and group on files

–ignore-time     ignore time when deciding whether to download

-n, –only-newer       download only newer files (-c won’t work)

-r, –no-recursion     don’t go to subdirectories

-p, –no-perms         don’t set file permissions

–no-umask         don’t apply umask to file modes

-R, –reverse         reverse mirror (put files)

-L, –dereference     download symbolic links as files

-N, –newer-than=SPEC download only files newer than specified time

-P, –parallel[=N]     download N files in parallel

-i RX, –include RX   include matching files

-x RX, –exclude RX   exclude matching files

RX is extended regular expression

-v, –verbose[=N]     verbose operation

–log=FILE         write lftp commands being executed to FILE

–script=FILE     write lftp commands to FILE, but don’t execute them

–just-print, –dry-run   same as –script=-

 

When using -R, the first directory is local and the second is remote.

If the second directory is omitted, basename of first directory is used.

If both directories are omitted, current local and remote directories are used.

# 续传且并行下载 JGI_data,这里parallel设为10

lftp ftp.jgi-psf.org:/pub> mirror -c -P 10 JGI_data &

[0] mirror -c -P 10 JGI_data &

cd `JGI_data’ [Waiting for response...]

# 退出

lftp ftp.jgi-psf.org:/pub> bye

—————————————————————-

操作类似linux命令,这里会默认将JGI_data里的所有数据都下载下来,而且&可以后台自动下载,bye退出会默认进入nohup模式,关机任务也不会掉。据笔者经验,能极大利用宽带资源,有时候迅雷搞不定的,这个都能搞定。

 

相同的秘籍,不同的身手,祝你也成功!

 

 

(工作日:8:30-17:30)