
Data Transfer

Overview

Transferring data between the HPC cluster and your local computer is an essential operation in high-performance computing workflows. This guide presents the most common methods for securely and efficiently transferring files.

Prerequisites

Before transferring data, make sure you:

  • Have working SSH access to the cluster (the same credentials you use to log in)
  • Know the destination path on the cluster, e.g. /scratch/projetos/<my_project>/
  • Have enough quota and free space available at the destination

Transfer Methods

SCP (Secure Copy)

scp is the simplest tool for transferring files via SSH. It works similarly to the cp command but allows copying between different machines.

Download file from cluster to your computer

scp <username>@<cluster>:/path/on/cluster/file.txt ~/Downloads/

Example:

scp john@ogun-login.senaicimatec.com.br:/scratch/projetos/climate_analysis/results.csv ~/Downloads/

Download entire directory (recursive)

scp -r <username>@<cluster>:/path/on/cluster/folder/ ~/Projects/

Example:

scp -r john@ogun-login.senaicimatec.com.br:/scratch/projetos/fluid_simulation/ ~/Projects/

Upload file from your computer to the cluster

scp ~/Documents/script.py <username>@<cluster>:/scratch/projetos/<my_project>/

Example:

scp ~/Documents/analysis.py john@ogun-login.senaicimatec.com.br:/scratch/projetos/climate_analysis/

Upload directory to the cluster

scp -r ~/Projects/data/ <username>@<cluster>:/scratch/projetos/<my_project>/

Custom SSH port

If the cluster uses a different SSH port than the default (22), use the -P option:

scp -P 2222 file.txt <username>@<cluster>:/scratch/projetos/<my_project>/

Rsync (Efficient Synchronization)

rsync is more efficient than scp for large transfers or when you need to synchronize directories: it transfers only the differences between files, saving time and bandwidth.

Download data from cluster

rsync -avz --progress <username>@<cluster>:/scratch/projetos/<my_project>/data/ ~/Projects/data/

Options explained:

  • -a: archive mode (preserves permissions, timestamps, symlinks, etc.)
  • -v: verbose (shows details)
  • -z: compresses data during transfer
  • --progress: shows transfer progress

Upload data to the cluster

rsync -av --progress ~/Data/input/ <username>@<cluster>:/scratch/projetos/<my_project>/input/

Synchronization with deletions

If you've deleted files locally and want them deleted on the destination as well:

rsync -av --progress --delete ~/Projects/ <username>@<cluster>:/scratch/projetos/<my_project>/

Caution with --delete

The --delete option removes files on the destination that don't exist in the source. Use with care!
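
Before running a real --delete, a --dry-run shows what would be removed without changing anything. A small local sketch (directory names are illustrative):

```shell
# Local sketch: stale.txt exists only on the destination side.
mkdir -p source_dir dest_dir
echo "keep" > source_dir/keep.txt
echo "old"  > dest_dir/stale.txt

# --dry-run prints "deleting stale.txt" but removes nothing.
rsync -av --delete --dry-run source_dir/ dest_dir/

# Only run the real --delete once the preview looks right:
# rsync -av --delete source_dir/ dest_dir/
```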

Transfer with pattern exclusions

rsync -av --progress --exclude='*.tmp' --exclude='__pycache__' ~/project/ <username>@<cluster>:/scratch/projetos/<my_project>/

SFTP (Interactive Transfer)

sftp provides an interactive interface similar to FTP, but secure via SSH.

Connect to cluster via SFTP

sftp <username>@<cluster>

Basic SFTP commands

After connecting, you can use the following commands:

# List files on the cluster
ls

# List files on your local computer
lls

# Change directory on the cluster
cd /scratch/projetos/<my_project>/

# Change local directory
lcd ~/Downloads/

# Download file
get file.txt

# Download directory (recursive)
get -r folder/

# Upload file
put my_file.txt

# Upload directory
put -r my_folder/

# Exit
exit

Example session:

$ sftp john@ogun-login.senaicimatec.com.br
sftp> cd /scratch/projetos/climate_analysis/
sftp> ls
results_2024/  data/  scripts/
sftp> get -r results_2024/
Fetching /scratch/projetos/climate_analysis/results_2024/ to results_2024
sftp> exit
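
For scripted, non-interactive transfers, sftp can read its commands from a batch file with the -b option (it requires key-based authentication; the file name and paths below are illustrative):

```shell
# Write the SFTP commands to a batch file.
cat > fetch_results.txt <<'EOF'
cd /scratch/projetos/<my_project>/
get -r results/
EOF

# Run them non-interactively (needs a configured SSH key):
# sftp -b fetch_results.txt <username>@<cluster>
```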

Graphical Clients

For users who prefer graphical interfaces, several options are available:

Windows

  • MobaXterm: Integrated interface with file browser (recommended)
  • WinSCP: Dedicated SFTP/SCP client
  • FileZilla: Supports SFTP

macOS

  • Cyberduck: Free and open-source SFTP client
  • FileZilla: Supports SFTP
  • scp, rsync, and sftp are available natively in the Terminal

Linux

  • FileZilla: Available on most distributions
  • Native file managers (Nautilus, Dolphin) with SFTP support via sftp://

Best Practices

Data Organization

  1. Use appropriate directories:
     • /scratch/projetos/<project>/ for project job input/output data
     • /home/$USER/ only for small personal scripts and configurations
     • See File Management for details on organization

  2. Organize by project:

    /scratch/projetos/
    ├── climate_analysis/
    │   ├── data/
    │   ├── scripts/
    │   ├── results/
    │   └── logs/
    └── fluid_simulation/
        ├── input/
        ├── output/
        └── logs/
    

Transfer Optimization

  1. Compress large files before transferring:

    # On the cluster
    tar -czf results.tar.gz /scratch/projetos/<my_project>/results/
    
    # Transfer the compressed file
    scp <username>@<cluster>:/scratch/projetos/<my_project>/results.tar.gz ~/
    
    # On your computer
    tar -xzf results.tar.gz
    

  2. Use rsync to resume interrupted transfers:

    rsync -avz --progress --partial <username>@<cluster>:/scratch/projetos/<my_project>/large_data/ ~/
    
    The --partial option keeps partially transferred files so the next run can resume them (rsync's -P flag is shorthand for --partial --progress).

  3. Transfer during off-peak hours when possible.
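
Step 1's compress-then-transfer can also be done as a single streamed pipeline, avoiding the intermediate .tar.gz on disk. A local pipe demonstrates the pattern; over SSH the extracting tar runs on the other end (paths illustrative):

```shell
# Local demonstration of the streaming pattern.
mkdir -p results restored
echo "sample" > results/run1.csv

# Compress to stdout, extract from stdin: no intermediate archive file.
tar -czf - results/ | tar -xzf - -C restored/

# The same idea across SSH (download from the cluster):
# ssh <username>@<cluster> 'tar -czf - -C /scratch/projetos/<my_project> results/' | tar -xzf -
```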

Integrity Verification

After important transfers, verify data integrity:

# Generate checksum on the cluster
md5sum file.dat > file.md5

# Transfer both
scp <username>@<cluster>:/scratch/projetos/<my_project>/file.* ~/

# Verify locally
md5sum -c file.md5
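
For many files, the same pattern extends to a whole directory: generate one checksum file covering every file, transfer it alongside the data, and verify everything with a single command. Run locally here as a sketch with sample data:

```shell
# Create sample data and checksum every file in the directory.
mkdir -p results
echo "a" > results/a.dat
echo "b" > results/b.dat
( cd results && md5sum *.dat > checksums.md5 )

# After transferring the directory, verify all files at once;
# md5sum -c exits non-zero if any file is corrupted or missing.
( cd results && md5sum -c checksums.md5 )
```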

Limitations and Quotas

Respect storage quotas

Each directory has storage limits. Contact hpc@fieb.org.br for information about quotas.

  • Bandwidth: Very large transfers can affect other users. Be considerate.
  • Connection timeout: SSH connections can expire. Use tools like screen or tmux for long sessions.

Checking Available Space

Before transferring data, check available space:

# Check disk usage in a project directory
du -sh /scratch/projetos/<my_project>/

# Check quota (if available)
quota -s

# Available space on filesystem
df -h /scratch

Common Problems

Permission denied

If you receive "Permission denied":

  1. Verify that your SSH key is correctly configured
  2. Confirm that you have write permission in the destination directory
  3. Check that you have not exceeded your storage quota

Slow transfer

If transfers are slow:

  1. Try compressing data before transferring
  2. Use rsync with compression (-z)
  3. Check your internet connection
  4. Avoid peak hours if possible

Interrupted connection

For long transfers that may be interrupted:

# Use rsync with --partial to resume
rsync -avz --progress --partial <source> <destination>

# Or use screen/tmux on the server
screen
rsync -avz --progress <source> <destination>
# Press Ctrl+A, then D to detach
# Reconnect later with: screen -r

Support

If you encounter problems transferring data, see our support page or contact hpc@fieb.org.br.