File and Directory Management¶
Overview¶
Efficient file and directory management is essential for working on the HPC cluster. This guide covers the core commands and practices for creating, organizing, and managing your data on the cluster.
Basic Linux concepts
If you're new to the Linux environment, we recommend familiarizing yourself with basic terminal commands before proceeding.
Cluster Directory Structure¶
Before creating files, understand where to store them:
| Directory | Purpose | Size | Best use |
|---|---|---|---|
| /home/$USER | Personal files | 20 GB | Personal scripts, configurations |
| /scratch/projetos/<project> | Project work | 10 TB (per group) | Job input/output, project data |
| /tmp | Local temporary | 1.7 TB | Temporary files during job execution |
Storage quotas
For information about storage limits and data retention, contact hpc@fieb.org.br.
Basic Commands¶
Navigate between directories¶
# Check which directory you're in
pwd
# List contents of current directory
ls
# List with details (permissions, size, date)
ls -lh
# List including hidden files
ls -la
# Go to a directory
cd /scratch/projetos/<my_project>
# Go back to previous directory
cd -
# Go to your home directory
cd ~
# or simply
cd
Create directories¶
# Create a directory
mkdir my_project
# Create multiple levels of directories at once
mkdir -p /scratch/projetos/<my_project>/project_2024/data/results
# Create multiple directories at the same time
mkdir experiment1 experiment2 experiment3
Example of organized structure:
# Create complete structure for a project
mkdir -p /scratch/projetos/<my_project>/analysis_project/{data,scripts,results,logs}
# Result:
# /scratch/projetos/<my_project>/analysis_project/
# ├── data/
# ├── scripts/
# ├── results/
# └── logs/
Copy files and directories¶
# Copy file
cp original_file.py copy_file.py
# Copy file to another directory
cp script.py /scratch/projetos/<my_project>/scripts/
# Copy entire directory (recursive)
cp -r original_folder/ copy_folder/
# Copy preserving metadata (timestamps, permissions)
cp -a important_data/ backup_data/
# Copy multiple files to a directory
cp file1.txt file2.txt file3.txt /scratch/projetos/<my_project>/data/
Move and rename¶
# Rename file
mv old_name.py new_name.py
# Move file to another directory
mv result.csv /scratch/projetos/<my_project>/results/
# Move directory
mv old_folder/ /scratch/projetos/<my_project>/new_location/
# Move multiple files
mv *.txt /scratch/projetos/<my_project>/texts/
Remove files and directories¶
Caution: Removal is permanent
There is no recycle bin on the cluster. Removed files cannot be recovered.
# Remove file
rm file.txt
# Remove with confirmation (recommended)
rm -i important_file.dat
# Remove multiple files
rm file1.txt file2.txt
# Remove all .tmp files from current directory
rm *.tmp
# Remove empty directory
rmdir empty_folder/
# Remove directory with contents (recursive)
rm -r folder_with_files/
# Force remove (use with GREAT care!)
rm -rf folder/ # NOT recommended without prior verification
Safety tip:
# Always check before removing
ls *.tmp # See what will be removed
rm -i *.tmp # Remove with confirmation for each file
Space Management¶
Check disk usage¶
# Check size of a directory
du -sh /scratch/projetos/<my_project>/project/
# List size of all subdirectories
du -h --max-depth=1 /scratch/projetos/<my_project>/
# Find the 10 largest directories
du -h /scratch/projetos/<my_project>/ | sort -rh | head -10
# Check available space on filesystem
df -h /scratch
# Check your quota (if configured)
quota -s
Example output:
$ du -h --max-depth=1 /scratch/projetos/fluid_simulation/
45G /scratch/projetos/fluid_simulation/data
12G /scratch/projetos/fluid_simulation/results
2.3G /scratch/projetos/fluid_simulation/scripts
59G /scratch/projetos/fluid_simulation/
Find and clean large files¶
# Find files larger than 1GB in your directory
find /scratch/projetos/<my_project>/ -type f -size +1G -exec ls -lh {} \;
# Find files modified more than 30 days ago
find /scratch/projetos/<my_project>/ -type f -mtime +30
# Find and list large .log files
find /scratch/projetos/<my_project>/ -name "*.log" -size +100M -exec ls -lh {} \;
# Find old temporary files to remove
find /scratch/projetos/<my_project>/ -name "*.tmp" -mtime +7 -ls
Job Data Organization¶
Recommended project structure¶
# Create organized structure
mkdir -p /scratch/projetos/<my_project>/project_2024/{input,output,logs,scripts}
# Resulting structure:
# /scratch/projetos/<my_project>/project_2024/
# ├── input/      # Input data
# ├── output/     # Job results
# ├── logs/       # Log files and SLURM output
# └── scripts/    # Submission and analysis scripts
Manage SLURM job outputs¶
In your SLURM script, specify where output files should go:
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --output=/scratch/projetos/<my_project>/logs/job_%j.out
#SBATCH --error=/scratch/projetos/<my_project>/logs/job_%j.err
# Create output directory if it doesn't exist
mkdir -p /scratch/projetos/<my_project>/output/
# Run your program
python my_script.py --output /scratch/projetos/<my_project>/output/results.csv
Organize results by date¶
# Create directory with timestamp
DATE=$(date +%Y-%m-%d_%H-%M-%S)
mkdir -p "/scratch/projetos/<my_project>/results/$DATE"
# Or in SLURM script:
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --output=/scratch/projetos/<my_project>/logs/analysis_%j.out
OUTPUT_DIR="/scratch/projetos/<my_project>/results/$(date +%Y-%m-%d_%H-%M-%S)"
mkdir -p "$OUTPUT_DIR"
python analysis.py --output "$OUTPUT_DIR/"
File Permissions¶
Understanding permissions¶
# View file permissions
ls -l
# Example output:
# -rw-r--r-- 1 john users 1024 Jan 15 10:30 file.txt
# drwxr-xr-x 2 john users 4096 Jan 15 10:31 folder/
#
# First column explains permissions:
# d = directory, - = regular file
# rwx = read, write, execute for: owner, group, others
Modify permissions¶
# Make script executable
chmod +x my_script.sh
# Give write permission to group
chmod g+w file.txt
# Remove read permission for others
chmod o-r private_data.csv
# Set specific permissions (numeric mode)
chmod 755 script.sh # rwxr-xr-x
chmod 644 data.txt # rw-r--r--
chmod 700 private/ # rwx------ (owner only)
File Compression¶
Compress data to save space¶
# Compress individual file
gzip large_file.txt
# Results in: large_file.txt.gz
# Decompress
gunzip large_file.txt.gz
# Compress keeping the original
gzip -k file.txt
# Compress entire directory (tar + gzip)
tar -czf project.tar.gz project/
# Decompress tar.gz file
tar -xzf project.tar.gz
# View contents without decompressing
tar -tzf project.tar.gz
# Compress with bzip2 (more compression, slower)
tar -cjf data.tar.bz2 data/
Practical example:
# Before transferring large results
cd /scratch/projetos/<my_project>/
tar -czf results_2024.tar.gz results_2024/
# Now transfer results_2024.tar.gz (much smaller)
Finding Files¶
Find files by name¶
# Search for file by exact name
find /scratch/projetos/<my_project>/ -name "result.csv"
# Search with pattern (case-insensitive)
find /scratch/projetos/<my_project>/ -iname "*.txt"
# Search only directories
find /scratch/projetos/<my_project>/ -type d -name "logs"
# Search only files
find /scratch/projetos/<my_project>/ -type f -name "*.py"
Search by content¶
# Search for text in files
grep "error" log.txt
# Search recursively in all files
grep -r "ERROR" /scratch/projetos/<my_project>/logs/
# Search ignoring case
grep -i "warning" *.log
# Search and show line number
grep -n "success" result.txt
# Search in specific files
grep "temperature" *.csv
Productivity Tips¶
Using wildcards¶
# * matches any sequence of characters
ls *.py # All Python files
rm temp_* # Remove files starting with temp_
# ? matches a single character
ls data_?.csv # data_1.csv, data_2.csv, etc.
# [] matches one character from the set
ls file_[abc].txt # file_a.txt, file_b.txt, file_c.txt
ls data_[0-9].csv # data_0.csv through data_9.csv
Useful aliases¶
Add to your ~/.bashrc:
# Aliases for common commands
alias ll='ls -lh'
alias la='ls -lah'
alias ..='cd ..'
alias project='cd /scratch/projetos/<my_project>'
alias du1='du -h --max-depth=1'
alias clean_tmp='find /scratch/projetos/<my_project>/ -name "*.tmp" -mtime +7 -delete'
After adding, reload your shell configuration:
source ~/.bashrc
Working with multiple files¶
# Rename multiple files (add prefix)
for file in *.txt; do
mv "$file" "processed_$file"
done
# Copy directory structure (without files)
# Run find from inside the source so {} expands to relative paths
cd /scratch/projetos/<my_project>/project/
find . -type d -exec mkdir -p /scratch/projetos/<my_project>/backup/{} \;
# Process all .csv files
for csv in *.csv; do
echo "Processing $csv..."
python analysis.py "$csv"
done
Symbolic Links¶
Symbolic links are shortcuts to files/directories in other locations:
# Create symbolic link
ln -s /scratch/projetos/<my_project>/project/data ~/project_data
# Now you can access:
ls ~/project_data
# Instead of:
ls /scratch/projetos/<my_project>/project/data
# Check where a link points
ls -l ~/project_data
# Remove link (doesn't remove the original)
rm ~/project_data
Practical use:
# Link to frequently accessed data
ln -s /scratch/projetos/<my_project>/main_data ~/data
# Link to current results directory
ln -s /scratch/projetos/<my_project>/current_experiment ~/experiment
Best Practices¶
- Organize from the start: Create a logical directory structure before beginning
- Name clearly: Use descriptive names (temperature_analysis_2024.py is better than script.py)
- Use dates: Include timestamps in result directory names
- Clean regularly: Remove unnecessary temporary and intermediate files
- Document: Keep a README in each project explaining the structure
- Check quotas: Monitor your disk usage regularly
- Backup important data: Transfer important data to your local computer
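Several of these practices can be combined in a few lines. The sketch below is illustrative only: it uses a temporary directory created with `mktemp` in place of your real project path, and the experiment name is a made-up example.

```shell
# Illustrative sketch: a timestamped results directory plus a short
# README documenting the run. WORKDIR stands in for your project path,
# e.g. /scratch/projetos/<my_project>.
WORKDIR=$(mktemp -d)
RUN_DIR="$WORKDIR/results/$(date +%Y-%m-%d_%H-%M-%S)"
mkdir -p "$RUN_DIR"

# Document the run so the directory is self-explaining later
cat > "$RUN_DIR/README.txt" << EOF
Experiment: temperature_analysis_2024 (example name)
Created:    $(date)
Contents:   job outputs for this run
EOF

ls "$RUN_DIR"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

On the cluster, replace `$WORKDIR` with your project directory under `/scratch/projetos/`.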
Complete Workflow Example¶
# 1. Create project structure
mkdir -p /scratch/projetos/<my_project>/climate_analysis/{data,scripts,results,logs}
# 2. Go to the directory
cd /scratch/projetos/<my_project>/climate_analysis
# 3. Copy input data
cp ~/raw_data/*.csv data/
# 4. Create submission script
cat > scripts/submit.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=climate
#SBATCH --output=/scratch/projetos/<my_project>/climate_analysis/logs/job_%j.out
#SBATCH --error=/scratch/projetos/<my_project>/climate_analysis/logs/job_%j.err
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=4
module load python/3.10
python /scratch/projetos/<my_project>/climate_analysis/scripts/analysis.py
EOF
# 5. Make executable
chmod +x scripts/submit.sh
# 6. Submit job
cd /scratch/projetos/<my_project>/climate_analysis
sbatch scripts/submit.sh
# 7. Check results after execution
ls -lh results/
# 8. Compress and transfer
tar -czf climate_results.tar.gz results/
# Then transfer using scp (see transfer guide)
Common Problems¶
Quota exceeded¶
If you can't create files:
# Check usage
du -sh /scratch/projetos/<my_project>/
quota -s
# Find and remove large files
find /scratch/projetos/<my_project>/ -type f -size +1G -ls
Permission denied¶
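A "Permission denied" error usually means the file's permission bits or ownership do not allow the operation. A minimal sketch of diagnosing and fixing it, using a temporary file (real cluster paths will differ; if the file belongs to another user, you will need that user or support to change it):

```shell
# Illustrative only: reproduce and fix "Permission denied" on a file
WORKDIR=$(mktemp -d)
touch "$WORKDIR/script.sh"
chmod 000 "$WORKDIR/script.sh"    # remove all permissions to simulate the problem

# Diagnose: check the permission bits, owner, and group
ls -l "$WORKDIR/script.sh"

# Fix: restore read/write for the owner and make the script executable
chmod u+rwx "$WORKDIR/script.sh"
ls -l "$WORKDIR/script.sh"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

If the owner shown by `ls -l` is not you, `chmod` will fail; in that case ask the file's owner or the support team to adjust the permissions.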
Directory not empty¶
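`rmdir` only removes empty directories; on a directory that still has contents it fails with "Directory not empty". Check the contents first, then remove recursively with `rm -r`. A minimal sketch in a temporary directory (paths are illustrative):

```shell
# Illustrative only: rmdir fails on a non-empty directory
WORKDIR=$(mktemp -d)
mkdir "$WORKDIR/results"
touch "$WORKDIR/results/output.csv"

# This fails because the directory is not empty
rmdir "$WORKDIR/results" 2>/dev/null || echo "rmdir failed: directory not empty"

# Check what is inside before deleting anything
ls "$WORKDIR/results"

# Remove the directory together with its contents (permanent!)
rm -r "$WORKDIR/results"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

Remember that removal on the cluster is permanent: always inspect the contents with `ls` before running `rm -r`.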
Support¶
For more information or help with problems, contact hpc@fieb.org.br.