Moving data

Local computer <-> supercomputer

Web Interface

Graphical data transfer tools on local computer

  • For example: FileZilla, WinSCP and Cyberduck

  • For medium amounts of data, < 1 TB.

  • Easy drag-and-drop transfers, but the tool must be installed on the local computer.

  • WinSCP is slower than the others.

  • CSC Docs: Graphical data transfer tools


Command line tools on local computer

  • Suitable for any amount of data; in practice required when the data size exceeds 1 TB.

  • Using them requires knowing the commands.

scp

  • The most common Linux tool for moving files.

  • Works even in Windows PowerShell.

  • CSC Docs: scp

# One file:
scp /path/to/a_file cscusername@puhti.csc.fi:/scratch/project_200xxxx/data_dir

# One folder:
scp -r /path/to/directory cscusername@puhti.csc.fi:/scratch/project_200xxxx/directory 
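The commands above upload from the local computer to Puhti. Copying results back works the same way with source and destination swapped, run on the local computer. The sketch below wraps the download direction in a small shell function; the names `CSC_USER`, `puhti_download` and `DRY_RUN` are hypothetical and used only here, and setting `DRY_RUN=1` prints the scp command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: copy data from Puhti back to the local computer with scp.
# CSC_USER, puhti_download and DRY_RUN are hypothetical names for this example.
CSC_USER="cscusername"   # replace with your own CSC username

# puhti_download REMOTE_PATH LOCAL_DIR
# -r copies folders recursively and also works for a single file.
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
puhti_download() {
  ${DRY_RUN:+echo} scp -r "${CSC_USER}@puhti.csc.fi:$1" "$2"
}

# Example usage on the local computer:
#   puhti_download /scratch/project_200xxxx/data_dir/a_file /path/to/local_dir
#   puhti_download /scratch/project_200xxxx/directory /path/to/local_dir
```

Without the helper, the second usage line is simply `scp -r cscusername@puhti.csc.fi:/scratch/project_200xxxx/directory /path/to/local_dir`.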

rsync

  • Best for large data transfers: files that are already up to date at the destination are not copied again, and an interrupted transfer can be resumed by running the same command again.

  • Can help avoid accidental overwrites, for example by previewing a transfer with --dry-run before running it.

  • Available on Linux, macOS and Windows Subsystem for Linux (WSL).

  • Windows PowerShell does not include rsync. MobaXterm has rsync, but it removes write permissions from the copied files.

  • CSC Docs: rsync

# One file:
rsync --info=progress2 -a /path/to/a_file cscusername@puhti.csc.fi:/scratch/project_200xxxx/data_dir

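# One folder (no trailing slash on the source, so this creates /scratch/project_200xxxx/directory):
rsync --info=progress2 -a /path/to/directory cscusername@puhti.csc.fi:/scratch/project_200xxxx/
  • --info=progress2 shows the overall progress of the transfer as a percentage and the estimated time remaining (requires rsync 3.1 or newer).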

Firewall limitations

Some organizations, for example research institutes that get their IT services from Valtori, have stricter firewall rules and must connect to CSC servers through a proxy. In this case, ask your IT service or other Puhti users in your organization for instructions.

External data services -> supercomputer

  • When downloading data from external services, try to download it directly to CSC servers instead of via your local computer, so the data is not transferred twice.

  • Check what APIs/tools the service supports:

    • Standard APIs: OGC APIs, STAC

    • Custom service APIs

    • FTP, rsync

    • wget/curl if there is a URL for the data

wget

# One file, depth contours from the SYKE open spatial data service:
wget http://wwwd3.ymparisto.fi/d3/gis_data/spesific/syvyyskayra.zip

# One folder, forest mask from the Finnish Forest Centre:
# -r downloads recursively, -nc skips files that already exist locally,
# --cut-dirs=2 drops the first two remote directory levels (Metsamaski/Maakunta) from the local path
wget -r -nc ftp://ftp.aineistot.metsaan.fi/Metsamaski/Maakunta/ --cut-dirs=2

# Via API, 10 m DEM from Geoportti GeoCubes
# API URL generation service: https://vm0160.kaj.pouta.csc.fi/geocubes/apiaccess/
# -O sets the output file name
wget https://vm0160.kaj.pouta.csc.fi/geocubes/clip/10/km10/kuntajako:235/2018 -O kauniainen_dem10m.tif
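The single-file downloads can also be done with curl, the other tool mentioned above. The sketch below wraps it in a hypothetical helper, `download_with_curl`; `-f` makes curl fail on HTTP errors instead of saving an error page, `-L` follows redirects and `-o` names the output file (like `wget -O`).

```shell
#!/usr/bin/env bash
# Sketch: single-file downloads with curl instead of wget.
# download_with_curl is a hypothetical helper name used only in this example.
#   -f  fail on HTTP errors instead of saving the server's error page
#   -L  follow redirects
#   -o  write to the given file name (like wget -O)
download_with_curl() {
  local url="$1" out="$2"
  curl -fL -o "$out" "$url"
}

# Example usage on Puhti (same data as the wget examples above):
#   download_with_curl http://wwwd3.ymparisto.fi/d3/gis_data/spesific/syvyyskayra.zip syvyyskayra.zip
#   download_with_curl https://vm0160.kaj.pouta.csc.fi/geocubes/clip/10/km10/kuntajako:235/2018 kauniainen_dem10m.tif
```

For recursive FTP downloads like the forest mask example, wget -r remains the simpler choice, since curl has no recursive mode. Both tools can resume an interrupted single-file download: `wget -c` and `curl -C -`.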