Psst, remember the cheatsheet!

Disk areas

Like your own computer, supercomputers have several directories readily available, though their names and uses are a bit different:

Puhti disk areas

Name      Access    Path                       Cleaning  Capacity  Number of files  Use
--------  --------  -------------------------  --------  --------  ---------------  ---------------------------
home      Personal  /users/cscusername         No        10 GiB    100 000 files    personal settings and files
projappl  Project   /projappl/project_200xxxx  No        50 GiB    100 000 files    installation files
scratch   Project   /scratch/project_200xxxx   180 days  1 TiB     1 000 000 files  main working area

Temporary fast disks

  • CSC Docs: Login node local tmp $TMPDIR for compiling, cleaned frequently.

  • CSC Docs: NVMe - $LOCAL_SCRATCH in batch jobs.

    • NVMe is accessible only during your job allocation, including interactive jobs.

    • You must copy data in and out during your batch job.

    • If your job reads or writes a lot of small files, using NVMe can give a 10x performance boost.
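The copy-in, compute, copy-out pattern described above can be sketched as a batch script. This is a minimal illustration, not CSC's official template: the partition name, the NVMe size, the function name stage_and_run, and the assumption that the input archive contains an input/ directory of .txt files are all hypothetical. Outside an NVMe job, where $LOCAL_SCRATCH is not set, the sketch falls back to $TMPDIR or /tmp.

```shell
#!/bin/bash
#SBATCH --account=project_200xxxx   # placeholder project name, as in the table above
#SBATCH --partition=small           # assumption: a partition that offers NVMe nodes
#SBATCH --time=00:15:00
#SBATCH --gres=nvme:10              # assumption: request 10 GB of local NVMe disk
set -eu

# stage_and_run INPUT_ARCHIVE OUTPUT_DIR   (hypothetical helper)
# Unpacks the archive onto the fast local disk, processes the files there,
# and copies one result archive back to the shared file system.
stage_and_run() {
    local input_archive=$1 output_dir=$2 fast_dir f

    # $LOCAL_SCRATCH is set inside NVMe jobs; fall back to a temp dir elsewhere.
    fast_dir=$(mktemp -d "${LOCAL_SCRATCH:-${TMPDIR:-/tmp}}/job.XXXXXX")

    # 1. Copy data in: one large read from the shared disk, small files land on NVMe.
    tar -xf "$input_archive" -C "$fast_dir"

    # 2. Work on the local copy (placeholder analysis: count lines in each file).
    mkdir -p "$fast_dir/results"
    for f in "$fast_dir"/input/*.txt; do
        wc -l < "$f" | tr -d ' ' > "$fast_dir/results/$(basename "$f" .txt).count"
    done

    # 3. Copy results out as a single archive before the job ends.
    mkdir -p "$output_dir"
    tar -cf "$output_dir/results.tar" -C "$fast_dir" results
    rm -rf "$fast_dir"
}

# Run only when both arguments are given, e.g.
#   sbatch job.sh /scratch/project_200xxxx/in.tar /scratch/project_200xxxx/out
if [ "$#" -eq 2 ]; then
    stage_and_run "$1" "$2"
fi
```

Packing the results into one archive before copying them back keeps the number of files written to scratch small, which matters because the local disk disappears when the job allocation ends.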

Avoid unnecessary reading and writing

Avoid unnecessary reads and writes of data to improve I/O performance

  • Read and write in big chunks and avoid reading/writing lots of small files
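As a rough illustration of the advice above, the sketch below shows two shell habits that reduce the number of small I/O operations: redirecting a whole loop into one file instead of reopening the file on every iteration, and packing a directory of small files into a single archive. The function names write_records and bundle_dir are hypothetical, and tar is used here only as one possible archiver.

```shell
set -eu

# write_records N OUTFILE   (hypothetical helper)
# Writes N records with a single open/close of OUTFILE: the redirect sits
# after "done", so the whole loop output goes to the file as one stream.
write_records() {
    local n=$1 outfile=$2 i=1
    while [ "$i" -le "$n" ]; do
        printf 'record %d\n' "$i"
        i=$((i + 1))
    done > "$outfile"
}

# bundle_dir DIR ARCHIVE   (hypothetical helper)
# Packs a directory of small files into one tar archive, so the shared file
# system stores one large file instead of many small ones.
# ARCHIVE must be located outside DIR.
bundle_dir() {
    local dir=$1 archive=$2
    tar -cf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

For example, `write_records 1000 all.txt` produces one file with 1000 lines, where a `>>` redirect inside the loop would have opened and closed the file 1000 times.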

LUMI disk areas

Disk status

  • Display usage and quota of all your disk areas: csc-workspaces

  • Show how much space a directory and its subdirectories use: lue

module load lue
lue --display-level=2 /scratch/project_200xxxx/

path, total size, in dir size, % of total, % of dir
---------------------------------------------------
/scratch/project_200xxxx/dirA        8.4GB  356KB 100.0 100.0
    results                            3.7GB  458MB 44.15 44.15
        simu1                              2.8GB  522MB 32.84 74.38 NOSIZE:1
        simu2                              521MB  521MB 6.02  13.64 NOSIZE:1
    installation                       1.4GB  48KB  16.2  16.2 
        gcc10                              351MB  351MB 4.05  25.02
        clang15                            351MB  351MB 4.05  25.02
        intel                              350MB  350MB 4.04  24.94

Some best practice tips

  • Take backups of important files. Data on Puhti disks is not backed up.

    • Allas is the best CSC option for backups.

    • GitHub or similar for code.

  • Supercomputer disks do not work well with too many small files (see the file limits above).

    • Plan your analysis so that it does not need too many files.

    • Keep small files in one zip file and unzip it only on the fast local disks during the analysis.

    • Don’t create a lot of files in one folder.

  • CSC Docs: Best practice performance tips for using Lustre

  • Databases:

    • Only file databases (SQLite, GeoPackage) can be kept on supercomputer disks.

    • For PostgreSQL (but not PostGIS), use CSC Docs: Database-as-service.

    • For any other database, set up a virtual machine in cPouta.
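To keep an eye on the small-files tip above, a directory's file count can be compared against a limit before a large analysis is started. The sketch below is only an illustration with hypothetical function names (count_files, check_file_limit). It does not replace csc-workspaces, which reports the actual quota, and a recursive find over a very large directory tree is itself a heavy operation on a shared disk, so it is best run sparingly.

```shell
set -eu

# count_files DIR   (hypothetical helper)
# Prints the number of regular files below DIR.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# check_file_limit DIR LIMIT   (hypothetical helper)
# Prints a status line and returns 1 when DIR holds more than LIMIT files.
check_file_limit() {
    local dir=$1 limit=$2 n
    n=$(count_files "$dir")
    if [ "$n" -gt "$limit" ]; then
        echo "WARNING: $dir has $n files (limit $limit)"
        return 1
    fi
    echo "OK: $dir has $n files (limit $limit)"
}
```

A call such as `check_file_limit /scratch/project_200xxxx/results 100000 || echo "consider packing files into an archive"` would then flag a results directory that has grown past a self-chosen threshold.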