Best practice tips for data#
Take backups of important files. Data on Puhti disks is not backed up.
Allas is the best option for backups at CSC.
GitHub or similar for code.
Supercomputer disks do not work well with too many small files
Plan your analysis in a way that too many files are not needed.
Keep the small files in one zip-file, unzip it only on local fast disks during the analysis.
Don’t create a lot of files in one folder
Keep data that is needed longer also in Allas.
Databases:
Only file databases (SQLite, GeoPackage) can be kept on supercomputer disks.
For PostgreSQL and PostGIS use CSC Pukki Database-as-a-service.
For any other database set up virtual machine in cPouta.
Disk status#
Display usage and quota of all your disk areas:
csc-workspaces
Display the amount of data and number of files within a given folder:
LUE
module load lue
lue --display-level=2 /scratch/project_200xxxx/
path, total size, in dir size, % of total, % of dir
---------------------------------------------------
/scratch/project_200xxxx/dirA 8.4GB 356KB 100.0 100.0
results 3.7GB 458MB 44.15 44.15
simu1 2.8GB 522MB 32.84 74.38 NOSIZE:1
simu2 521MB 521MB 6.02 13.64 NOSIZE:1
installation 1.4GB 48KB 16.2 16.2
gcc10 351MB 351MB 4.05 25.02
clang15 351MB 351MB 4.05 25.02
intel 350MB 350MB 4.04 24.94