File and Directory Management¶
Overview¶
Efficient file and directory management is essential for working on the HPC cluster. This guide covers the core commands and practices for creating, organizing, and managing your data on the cluster.
Basic Linux concepts
If you're new to the Linux environment, we recommend familiarizing yourself with basic terminal commands before proceeding.
Cluster Directory Structure¶
Before creating files, understand where to store them:
| Directory | Purpose | Size | Best use |
|---|---|---|---|
| /home/$USER | Personal files | 20 GB | Personal scripts, configurations |
| /scratch/projetos/<project> | Project work | 10 TB (per group) | Job input/output, project data |
| /tmp | Local temporary | 1.7 TB | Temporary files during job execution |
Storage quotas
For information about storage limits and data retention, contact hpc@fieb.org.br.
Basic Commands¶
Navigate between directories¶
# Check which directory you're in
pwd
# List contents of current directory
ls
# List with details (permissions, size, date)
ls -lh
# List including hidden files
ls -la
# Go to a directory
cd /scratch/projetos/<my_project>
# Go back to previous directory
cd -
# Go to your home directory
cd ~
# or simply
cd
Create directories¶
# Create a directory
mkdir my_project
# Create multiple levels of directories at once
mkdir -p /scratch/projetos/<my_project>/project_2024/data/results
# Create multiple directories at the same time
mkdir experiment1 experiment2 experiment3
Example of organized structure:
# Create complete structure for a project
mkdir -p /scratch/projetos/<my_project>/analysis_project/{data,scripts,results,logs}
# Result:
# /scratch/projetos/<my_project>/analysis_project/
# ├── data/
# ├── scripts/
# ├── results/
# └── logs/
Copy files and directories¶
# Copy file
cp original_file.py copy_file.py
# Copy file to another directory
cp script.py /scratch/projetos/<my_project>/scripts/
# Copy entire directory (recursive)
cp -r original_folder/ copy_folder/
# Copy preserving metadata (timestamps, permissions)
cp -a important_data/ backup_data/
# Copy multiple files to a directory
cp file1.txt file2.txt file3.txt /scratch/projetos/<my_project>/data/
Move and rename¶
# Rename file
mv old_name.py new_name.py
# Move file to another directory
mv result.csv /scratch/projetos/<my_project>/results/
# Move directory
mv old_folder/ /scratch/projetos/<my_project>/new_location/
# Move multiple files
mv *.txt /scratch/projetos/<my_project>/texts/
Remove files and directories¶
Caution: Removal is permanent
There is no recycle bin on the cluster. Removed files cannot be recovered.
# Remove file
rm file.txt
# Remove with confirmation (recommended)
rm -i important_file.dat
# Remove multiple files
rm file1.txt file2.txt
# Remove all .tmp files from current directory
rm *.tmp
# Remove empty directory
rmdir empty_folder/
# Remove directory with contents (recursive)
rm -r folder_with_files/
# Force remove (use with GREAT care!)
rm -rf folder/ # NOT recommended without prior verification
Safety tip:
# Always check before removing
ls *.tmp # See what will be removed
rm -i *.tmp # Remove with confirmation for each file
Space Management¶
Check disk usage¶
# Check size of a directory
du -sh /scratch/projetos/<my_project>/project/
# List size of all subdirectories
du -h --max-depth=1 /scratch/projetos/<my_project>/
# Find the 10 largest directories
du -h /scratch/projetos/<my_project>/ | sort -rh | head -10
# Check available space on filesystem
df -h /scratch
# Check your quota (if configured)
quota -s
Example output:
$ du -h --max-depth=1 /scratch/projetos/fluid_simulation/
45G /scratch/projetos/fluid_simulation/data
12G /scratch/projetos/fluid_simulation/results
2.3G /scratch/projetos/fluid_simulation/scripts
59G /scratch/projetos/fluid_simulation/
Find and clean large files¶
# Find files larger than 1GB in your directory
find /scratch/projetos/<my_project>/ -type f -size +1G -exec ls -lh {} \;
# Find files modified more than 30 days ago
find /scratch/projetos/<my_project>/ -type f -mtime +30
# Find and list large .log files
find /scratch/projetos/<my_project>/ -name "*.log" -size +100M -exec ls -lh {} \;
# Find old temporary files to remove
find /scratch/projetos/<my_project>/ -name "*.tmp" -mtime +7 -ls
Job Data Organization¶
Recommended project structure¶
# Create organized structure
mkdir -p /scratch/projetos/<my_project>/project_2024/{input,output,logs,scripts}
# Resulting structure:
# /scratch/projetos/<my_project>/project_2024/
# ├── input/      # Input data
# ├── output/     # Job results
# ├── logs/       # Log files and SLURM output
# └── scripts/    # Submission and analysis scripts
Manage SLURM job outputs¶
In your SLURM script, specify where output files should go:
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --output=/scratch/projetos/<my_project>/logs/job_%j.out
#SBATCH --error=/scratch/projetos/<my_project>/logs/job_%j.err
# Create output directory if it doesn't exist
mkdir -p /scratch/projetos/<my_project>/output/
# Run your program
python my_script.py --output /scratch/projetos/<my_project>/output/results.csv
Organize results by date¶
# Create directory with timestamp
DATE=$(date +%Y-%m-%d_%H-%M-%S)
mkdir -p "/scratch/projetos/<my_project>/results/$DATE"
# Or in SLURM script:
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --output=/scratch/projetos/<my_project>/logs/analysis_%j.out
OUTPUT_DIR="/scratch/projetos/<my_project>/results/$(date +%Y-%m-%d_%H-%M-%S)"
mkdir -p "$OUTPUT_DIR"
python analysis.py --output "$OUTPUT_DIR/"
File Permissions¶
Understanding permissions¶
# View file permissions
ls -l
# Example output:
# -rw-r--r-- 1 john users 1024 Jan 15 10:30 file.txt
# drwxr-xr-x 2 john users 4096 Jan 15 10:31 folder/
#
# First column explains permissions:
# d = directory, - = regular file
# rwx = read, write, execute for: owner, group, others
Modify permissions¶
# Make script executable
chmod +x my_script.sh
# Give write permission to group
chmod g+w file.txt
# Remove read permission for others
chmod o-r private_data.csv
# Set specific permissions (numeric mode)
chmod 755 script.sh # rwxr-xr-x
chmod 644 data.txt # rw-r--r--
chmod 700 private/ # rwx------ (owner only)
File Compression¶
Compress data to save space¶
# Compress individual file
gzip large_file.txt
# Results in: large_file.txt.gz
# Decompress
gunzip large_file.txt.gz
# Compress keeping the original
gzip -k file.txt
# Compress entire directory (tar + gzip)
tar -czf project.tar.gz project/
# Decompress tar.gz file
tar -xzf project.tar.gz
# View contents without decompressing
tar -tzf project.tar.gz
# Compress with bzip2 (more compression, slower)
tar -cjf data.tar.bz2 data/
Practical example:
# Before transferring large results
cd /scratch/projetos/<my_project>/
tar -czf results_2024.tar.gz results_2024/
# Now transfer results_2024.tar.gz (much smaller)
Finding Files¶
Find files by name¶
# Search for file by exact name
find /scratch/projetos/<my_project>/ -name "result.csv"
# Search with pattern (case-insensitive)
find /scratch/projetos/<my_project>/ -iname "*.txt"
# Search only directories
find /scratch/projetos/<my_project>/ -type d -name "logs"
# Search only files
find /scratch/projetos/<my_project>/ -type f -name "*.py"
Search by content¶
# Search for text in files
grep "error" log.txt
# Search recursively in all files
grep -r "ERROR" /scratch/projetos/<my_project>/logs/
# Search ignoring case
grep -i "warning" *.log
# Search and show line number
grep -n "success" result.txt
# Search in specific files
grep "temperature" *.csv
Productivity Tips¶
Using wildcards¶
# * matches any sequence of characters
ls *.py # All Python files
rm temp_* # Remove files starting with temp_
# ? matches a single character
ls data_?.csv # data_1.csv, data_2.csv, etc.
# [] matches one character from the set
ls file_[abc].txt # file_a.txt, file_b.txt, file_c.txt
ls data_[0-9].csv # data_0.csv through data_9.csv
Useful aliases¶
Add to your ~/.bashrc:
# Aliases for common commands
alias ll='ls -lh'
alias la='ls -lah'
alias ..='cd ..'
alias project='cd /scratch/projetos/<my_project>'
alias du1='du -h --max-depth=1'
alias clean_tmp='find /scratch/projetos/<my_project>/ -name "*.tmp" -mtime +7 -delete'
After adding, reload your shell configuration:
source ~/.bashrc
Working with multiple files¶
# Rename multiple files (add prefix)
for file in *.txt; do
mv "$file" "processed_$file"
done
# Copy directory structure (without files)
# Run find from inside the source so {} expands to relative paths
cd /scratch/projetos/<my_project>/project/
find . -type d -exec mkdir -p /scratch/projetos/<my_project>/backup/{} \;
# Process all .csv files
for csv in *.csv; do
echo "Processing $csv..."
python analysis.py "$csv"
done
Symbolic Links¶
Symbolic links are shortcuts to files/directories in other locations:
# Create symbolic link
ln -s /scratch/projetos/<my_project>/project/data ~/project_data
# Now you can access:
ls ~/project_data
# Instead of:
ls /scratch/projetos/<my_project>/project/data
# Check where a link points
ls -l ~/project_data
# Remove link (doesn't remove the original)
rm ~/project_data
Practical use:
# Link to frequently accessed data
ln -s /scratch/projetos/<my_project>/main_data ~/data
# Link to current results directory
ln -s /scratch/projetos/<my_project>/current_experiment ~/experiment
Best Practices¶
- Organize from the start: Create a logical directory structure before beginning
- Name clearly: Use descriptive names (temperature_analysis_2024.py is better than script.py)
- Use dates: Include timestamps in result directory names
- Clean regularly: Remove unnecessary temporary and intermediate files
- Document: Keep a README in each project explaining the structure
- Check quotas: Monitor your disk usage regularly
- Backup important data: Transfer important data to your local computer
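Several of these practices can be combined in a few lines. The sketch below is illustrative only: it uses a temporary directory created with `mktemp` in place of your real project path, and the experiment name is a made-up example.

```shell
# Illustrative sketch: a timestamped results directory plus a short
# README documenting the run. WORKDIR stands in for your project path,
# e.g. /scratch/projetos/<my_project>.
WORKDIR=$(mktemp -d)
RUN_DIR="$WORKDIR/results/$(date +%Y-%m-%d_%H-%M-%S)"
mkdir -p "$RUN_DIR"

# Document the run so the directory is self-explaining later
cat > "$RUN_DIR/README.txt" << EOF
Experiment: temperature_analysis_2024 (example name)
Created:    $(date)
Contents:   job outputs for this run
EOF

ls "$RUN_DIR"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

On the cluster, replace `$WORKDIR` with your project directory under `/scratch/projetos/`.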
Complete Workflow Example¶
# 1. Create project structure
mkdir -p /scratch/projetos/<my_project>/climate_analysis/{data,scripts,results,logs}
# 2. Go to the directory
cd /scratch/projetos/<my_project>/climate_analysis
# 3. Copy input data
cp ~/raw_data/*.csv data/
# 4. Create submission script
cat > scripts/submit.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=climate
#SBATCH --output=/scratch/projetos/<my_project>/climate_analysis/logs/job_%j.out
#SBATCH --error=/scratch/projetos/<my_project>/climate_analysis/logs/job_%j.err
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=4
module load python/3.10
python /scratch/projetos/<my_project>/climate_analysis/scripts/analysis.py
EOF
# 5. Make executable
chmod +x scripts/submit.sh
# 6. Submit job
cd /scratch/projetos/<my_project>/climate_analysis
sbatch scripts/submit.sh
# 7. Check results after execution
ls -lh results/
# 8. Compress and transfer
tar -czf climate_results.tar.gz results/
# Then transfer using scp (see transfer guide)
Common Problems¶
Quota exceeded¶
If you can't create files:
# Check usage
du -sh /scratch/projetos/<my_project>/
quota -s
# Find and remove large files
find /scratch/projetos/<my_project>/ -type f -size +1G -ls
Permission denied¶
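A "Permission denied" error usually means the file's permission bits or ownership do not allow the operation. A minimal sketch of diagnosing and fixing it, using a temporary file (real cluster paths will differ; if the file belongs to another user, you will need that user or support to change it):

```shell
# Illustrative only: reproduce and fix "Permission denied" on a file
WORKDIR=$(mktemp -d)
touch "$WORKDIR/script.sh"
chmod 000 "$WORKDIR/script.sh"    # remove all permissions to simulate the problem

# Diagnose: check the permission bits, owner, and group
ls -l "$WORKDIR/script.sh"

# Fix: restore read/write for the owner and make the script executable
chmod u+rwx "$WORKDIR/script.sh"
ls -l "$WORKDIR/script.sh"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

If the owner shown by `ls -l` is not you, `chmod` will fail; in that case ask the file's owner or the support team to adjust the permissions.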
Directory not empty¶
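`rmdir` only removes empty directories; on a directory that still has contents it fails with "Directory not empty". Check the contents first, then remove recursively with `rm -r`. A minimal sketch in a temporary directory (paths are illustrative):

```shell
# Illustrative only: rmdir fails on a non-empty directory
WORKDIR=$(mktemp -d)
mkdir "$WORKDIR/results"
touch "$WORKDIR/results/output.csv"

# This fails because the directory is not empty
rmdir "$WORKDIR/results" 2>/dev/null || echo "rmdir failed: directory not empty"

# Check what is inside before deleting anything
ls "$WORKDIR/results"

# Remove the directory together with its contents (permanent!)
rm -r "$WORKDIR/results"

# Clean up the demonstration directory
rm -rf "$WORKDIR"
```

Remember that removal on the cluster is permanent: always inspect the contents with `ls` before running `rm -r`.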
Support¶
For more information or help with problems, contact hpc@fieb.org.br.