File Storage Guide

Data Storage Locations

SRILS Server provides two main storage locations, each suited to different types of data and use cases:

Home Directory

  • Path: /home/username/
  • Purpose: Personal files and private work
  • Access: Only accessible by you
  • Usage: Store personal scripts, notebooks, and small datasets
  • Backup: Regularly backed up
  • Quota: Contact admin for quota information

Shared Reference and Tools

  • Path: /share/
  • Purpose: Shared reference data and pre-built tools
  • Access: Read-only access for all users
  • Usage: Access reference genomes, databases, and common tools
  • Subdirectories:
    • /share/reference/ - Reference data including genomes, databases, and pre-built indices
      • genome/ - Reference genomes and pre-built index files
      • contamination/ - Contamination reference databases
      • genes/ - Gene annotation databases
      • igv/ - IGV reference files
    • /share/tools/ - Pre-built tools and software packages
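
The access rules listed above can be checked from Python before a job starts. The following is a minimal sketch, not part of any SRILS tooling: the helper name describe_access is hypothetical, and it only wraps the standard library call os.access.

```python
import os


def describe_access(path):
    """Report whether the current user can read and write `path`.

    Hypothetical helper for illustration; it only wraps os.access().
    """
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# Paths taken from this guide; on other systems they may simply not exist.
for location in (os.path.expanduser("~"), "/share/reference/", "/share/tools/"):
    print(location, describe_access(location))
```

Given the layout described above, the /share/ locations would be expected to report "writable" as False for regular users, while the home directory should be both readable and writable.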

Accessing Files from Applications

From Jupyter Notebook

import os

import pandas as pd

# Check current directory
print(os.getcwd())

# List files in current directory
print(os.listdir('.'))

# Change to another directory (replace 'username' with your account name)
os.chdir('/home/username/data/')
print(os.listdir('.'))

# Load data from different locations
# From home directory
df1 = pd.read_csv('/home/username/data/dataset.csv')

# Access reference data and tools
reference_genome = '/share/reference/genome/hg38.fa'
igv_tool = '/share/tools/IGV_Linux_2.16.0/igv.sh'

# List available reference databases
print(os.listdir('/share/reference/'))
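
The paths above use 'username' as a placeholder. One way to avoid hard-coding it is to build paths from the home directory of the running user. This is a minimal sketch using the standard pathlib module; the helper name home_data_path and its home argument are hypothetical, and the data/ subfolder simply mirrors the examples in this guide.

```python
from pathlib import Path


def home_data_path(filename, home=None):
    """Build a path under <home>/data/ without hard-coding the username.

    `home` is an optional override added for illustration and testing;
    by default the current user's home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / "data" / filename


dataset = home_data_path("dataset.csv")
print(dataset)            # e.g. /home/<your username>/data/dataset.csv
print(dataset.exists())   # True only if the file is actually present
```

The resulting Path object can be passed directly to pd.read_csv, which accepts path-like objects.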

From RStudio

# Check current directory
getwd()

# List files
list.files()

# Set working directory to home data folder
setwd("/home/username/data/")

# Load data from different locations
# From home directory
data1 <- read.csv("/home/username/data/dataset.csv")

# Access reference data
reference_path <- "/share/reference/genome/"
tools_path <- "/share/tools/"

# List available reference files
list.files("/share/reference/genome/")

From Terminal/SSH

# Navigate between directories
cd /home/username/
cd /share/reference/  # Access reference genomes and databases
cd /share/tools/      # Access pre-built tools

# List files and directories
ls -la
ls -lh  # Human readable file sizes

# Copy files between locations
cp /home/username/script.py /home/username/scripts/

# Move files
mv /tmp/processed_data.csv /home/username/data/

# Create symbolic links to shared resources
ln -s /share/reference/genome/hg38.fa ~/data/reference_genome.fa

# Run a tool directly from /share/tools/
/share/tools/IGV_Linux_2.16.0/igv.sh

Storage Quotas and Limits

Quota Information

  • Home Directory: Limited quota per user; contact an administrator for the exact limit
  • Shared Reference: Read-only access, managed by administrators

Check Usage

# Check disk usage in your home directory
du -sh ~/

# Check usage of specific directories
du -sh ~/data/

# Check overall disk space
df -h
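
The same checks can be made from a notebook. The sketch below is an approximation of the du and df commands above, using only the Python standard library; the function name directory_size is hypothetical. It sums file lengths rather than allocated disk blocks, so its result can differ somewhat from what du reports.

```python
import os
import shutil


def directory_size(path):
    """Sum the sizes in bytes of regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue                      # skip symlinks, e.g. links into /share/
            try:
                total += os.path.getsize(full)
            except OSError:
                pass                          # file vanished or is unreadable
    return total


home = os.path.expanduser("~")
usage = shutil.disk_usage(home)               # filesystem totals, similar to df
print(f"filesystem: {usage.used / 1e9:.1f} GB used of {usage.total / 1e9:.1f} GB")

data_dir = os.path.join(home, "data")         # folder name taken from this guide
print(f"{data_dir}: {directory_size(data_dir) / 1e9:.2f} GB")
```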

Managing Large Files

  1. Compress data: Use .gz, .zip, or .bz2 formats
  2. Use efficient formats: HDF5, Parquet instead of CSV for large datasets
  3. Archive old data: Move completed work to archive directories
  4. External storage: For very large datasets, consult with administrators
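
As an illustration of the first recommendation, the sketch below compresses a file to .gz with the Python standard library. The function name gzip_file and its remove_original flag are hypothetical, and the sketch assumes the file fits the usual case of a single regular file in your home directory.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source, remove_original=False):
    """Write <source>.gz next to `source` and return the new path."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)          # streams in chunks, low memory use
    if remove_original:
        source.unlink()                       # delete only after the copy succeeded
    return target
```

For example, gzip_file(Path.home() / "data" / "dataset.csv") would produce dataset.csv.gz in the same folder. pandas.read_csv can read such a file directly, because it infers gzip compression from the .gz extension.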

Data Security and Privacy

Sensitive Data Guidelines

  1. Personal Data: Store in home directory only
  2. Confidential Research: Use appropriate access controls in home directory
  3. Reference Data: Available in /share/ for read-only access
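  4. Temporary Processing: Use temporary directories such as /tmp with caution; files there may be readable by other users, so move results to your home directory and delete the temporary copies when processing finishes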
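
One way to apply access controls to confidential files in a home directory is to remove all group and other permissions. The following is a minimal sketch for POSIX systems using the standard stat module; the helper name make_private is hypothetical, and your institution's data policies take precedence over anything shown here.

```python
import stat
from pathlib import Path


def make_private(path):
    """Remove group and other permissions from `path`; return the new mode."""
    path = Path(path)
    current = stat.S_IMODE(path.stat().st_mode)
    path.chmod(current & stat.S_IRWXU)        # keep only the owner's rwx bits
    return stat.S_IMODE(path.stat().st_mode)
```

For instance, a file with mode 664 would become 600, and a directory with mode 755 would become 700. The function changes a single path only; it does not recurse into subdirectories.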

Access Controls

  • Respect file permissions and access controls
  • Don’t share access credentials
  • Report unauthorized access attempts
  • Follow institutional data policies

Data Backup

  • Automated Backups: Home directories are backed up regularly
  • Version Control: Use Git for tracking changes
  • External Backup: For critical data, maintain additional backups
  • Recovery: Contact administrators for data recovery needs
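
When keeping an additional backup of critical data, it can help to confirm that the copy matches the original. The sketch below compares SHA-256 checksums using the standard hashlib module; the function names sha256_of and copies_match are hypothetical, and this is only one possible way to verify a copy.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, backup):
    """Return True when both files have identical contents."""
    return sha256_of(original) == sha256_of(backup)
```

Reading in chunks keeps memory use low, which matters for large files such as reference genomes.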

Getting Help

File System Issues

  • Quota exceeded: Clean up files, or contact an administrator to request a larger quota
  • Permission denied: Check file permissions or contact admin
  • File corruption: Restore from backup or contact admin
  • Performance issues: Large file operations may be slow during peak hours
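
When a quota is exceeded, a first step is usually to find which files take the most space. The following is a minimal sketch using the Python standard library; the function name largest_files is hypothetical, and it only lists candidates without deleting anything.

```python
import heapq
import os


def largest_files(path, count=10):
    """Return up to `count` (size_in_bytes, file_path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):  # ignore symlinks to shared data
                    sizes.append((os.path.getsize(full), full))
            except OSError:
                continue                      # unreadable or removed meanwhile
    return heapq.nlargest(count, sizes)
```

Calling largest_files(os.path.expanduser("~")) would list the ten largest files in your home directory, which can then be compressed, archived, or removed.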

Support Contacts

  • Technical Issues: Contact SRILS Server administration team
  • Data Management: Consult with data management team
  • Backup/Recovery: Contact system administrators
  • Storage Requests: Submit requests for additional storage space

Need help? Contact the SRILS Server administration team for storage and file management support.


Copyright © 2025 SRILS Server Documentation. Distributed under the MIT License.