
Malware Scanning Implementation Guide

This guide describes how the malware scanning system was implemented across seven phases, the key architectural decisions behind it, and its integration points.

Implementation Overview​

The malware scanning system was built in 7 phases:

  1. Phase 1: ClamAV Client Library - Low-level TCP client for ClamAV daemon
  2. Phase 2: Database Schema - Tables for scans, quarantine, and audit logs
  3. Phase 3: Core Scanning Logic - Document scanner orchestration layer
  4. Phase 4: ClamAV Adapter - Production-ready adapter pattern integration
  5. Phase 5: Console Management UI - Staff interface for quarantine management
  6. Phase 6: Monitoring & Compliance - Metrics dashboard and reporting
  7. Phase 7: Testing & Production Rollout - E2E tests and deployment

Architecture​

System Components​

┌─────────────────────────────────────────────────────────────┐
│                    Platform App (Vercel)                    │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Upload     │───▶│   Document   │───▶│  Quarantine  │   │
│  │   Handler    │    │   Scanner    │    │   Manager    │   │
│  └──────────────┘    └──────┬───────┘    └──────────────┘   │
│                             │                               │
└─────────────────────────────┼───────────────────────────────┘
                              │
                              ▼
                   ┌─────────────────────┐
                   │   ClamAV Daemon     │
                   │   (Railway)         │
                   │   - Port: 3310      │
                   │   - INSTREAM Proto  │
                   │   - Virus DB (CVD)  │
                   └─────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                Supabase (Database + Storage)                │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  document_   │    │ quarantine_  │    │  Quarantine  │   │
│  │    scans     │    │   events     │    │    Bucket    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Console App (Vercel)                     │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  Quarantine  │    │   Metrics    │    │  Compliance  │   │
│  │  Documents   │    │  Dashboard   │    │   Reports    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
└─────────────────────────────────────────────────────────────┘

Data Flow​

Document Upload & Scan Flow​

1. User uploads document
↓
2. Platform API creates document record (upload_status: PENDING)
↓
3. File uploaded to Supabase Storage
↓
4. Upload finalized → emits DocumentFinalized event
↓
5. DocumentScanner receives event
↓
6. Creates scan record (status: PENDING)
↓
7. Downloads file from storage
↓
8. Sends file to ClamAV via INSTREAM protocol
↓
9. ClamAV scans file against virus database
↓
10. Updates scan record (status: CLEAN/INFECTED/FAILED)
↓
11. If INFECTED → Quarantine file
↓
12. If CLEAN → Mark document as available
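
The branching in steps 10 to 12 can be sketched as a small mapping from scan status to follow-up action. This is an illustration only: the `NextAction` names and the `nextAction` function are hypothetical, and treating FAILED as "manual review" is an assumption taken from the graceful-degradation note later in this guide.

```typescript
// Hypothetical sketch: names below are illustrative, not from scanner.ts.
type ScanStatus = 'PENDING' | 'CLEAN' | 'INFECTED' | 'FAILED';
type NextAction = 'wait' | 'mark_available' | 'quarantine' | 'manual_review';

// Decide what the platform does once a scan record reaches a given status.
function nextAction(status: ScanStatus): NextAction {
  switch (status) {
    case 'PENDING':
      return 'wait'; // scan still running
    case 'CLEAN':
      return 'mark_available'; // step 12
    case 'INFECTED':
      return 'quarantine'; // step 11
    case 'FAILED':
      return 'manual_review'; // assumption: failed scans are reviewed by staff
  }
}
```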

Quarantine Flow​

1. Scanner detects infection
↓
2. Calls QuarantineManager.quarantine()
↓
3. Copies file to quarantine bucket (YYYY-MM/DD/docId-timestamp.ext)
↓
4. Updates document record:
- is_quarantined = true
- quarantined_at = NOW()
- quarantine_reason = "Malware detected: [virus names]"
↓
5. Creates quarantine event audit log:
- event_type = 'quarantined'
- actor_user_id = NULL (automated)
- metadata = { virus_names, scan_id }
↓
6. Document inaccessible to all users
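
Steps 3 and 4 can be illustrated with two small helpers that build the quarantine object path and the reason string. This is a sketch under stated assumptions: the function names are hypothetical, dates are taken in UTC, the timestamp is epoch milliseconds, and the extension comes from the original file name. The real `QuarantineManager` may differ on each of these points.

```typescript
// Hypothetical helpers; the actual QuarantineManager may format these differently.

// Build a path of the form YYYY-MM/DD/docId-timestamp.ext (UTC, epoch ms assumed).
function buildQuarantinePath(docId: string, originalName: string, now: Date): string {
  const yyyy = now.getUTCFullYear();
  const mm = String(now.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(now.getUTCDate()).padStart(2, '0');
  const dot = originalName.lastIndexOf('.');
  const ext = dot > 0 ? originalName.slice(dot) : ''; // keep ".pdf", drop nothing else
  return `${yyyy}-${mm}/${dd}/${docId}-${now.getTime()}${ext}`;
}

// Build the quarantine_reason value from the detected signature names.
function buildQuarantineReason(viruses: string[]): string {
  return `Malware detected: ${viruses.join(', ')}`;
}
```

For example, a document `doc-123` uploaded as `report.pdf` and quarantined on 17 October 2025 (UTC) would land under `2025-10/17/`.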

Phase-by-Phase Implementation​

Phase 1: ClamAV Client Library​

File: apps/platform/lib/malware-scanning/clam-client.ts

Key Features:

  • Low-level TCP socket communication with ClamAV daemon
  • INSTREAM protocol implementation for buffer scanning
  • Support for PING, VERSION, SCAN commands
  • Custom error handling with ClamAVError class
  • Configurable timeouts and connection management

Design Decisions:

  • Why TCP sockets? The ClamAV daemon accepts commands over TCP, and its INSTREAM command lets a remote client stream file contents for scanning
  • Why buffer scanning? Allows scanning in-memory data without filesystem access
  • Why custom error class? Distinguishes ClamAV errors from network/system errors

Code Example:

const client = new ClamAVClient({
  host: 'localhost',
  port: 3310,
  timeout: 5000, // milliseconds
});

const result = await client.scanBuffer(fileBuffer);
// result: { isInfected: boolean, viruses: string[], durationMs: number }
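
To make the INSTREAM protocol concrete, the sketch below shows the framing a client sends and how a reply line can be interpreted. It is not taken from `clam-client.ts`: the function names and the 64 KiB default chunk size are assumptions. The wire format itself follows the clamd protocol: a `zINSTREAM\0` command, then chunks each prefixed by a 4-byte big-endian length, then a zero-length chunk to end the stream.

```typescript
// Hypothetical helpers illustrating INSTREAM framing; not the actual client code.

// Build the byte frames for one INSTREAM session.
function buildInstreamFrames(data: Uint8Array, chunkSize = 64 * 1024): Uint8Array[] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  const frames: Uint8Array[] = [new TextEncoder().encode('zINSTREAM\0')];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    const chunk = data.subarray(offset, offset + chunkSize);
    const frame = new Uint8Array(4 + chunk.length);
    // 4-byte big-endian length prefix, followed by the chunk bytes
    new DataView(frame.buffer).setUint32(0, chunk.length, false);
    frame.set(chunk, 4);
    frames.push(frame);
  }
  frames.push(new Uint8Array(4)); // zero-length chunk terminates the stream
  return frames;
}

type ClamdReply = { isInfected: boolean; viruses: string[] };

// Parse a reply such as "stream: OK" or "stream: <signature> FOUND".
function parseClamdReply(reply: string): ClamdReply {
  const line = reply.replace(/\0/g, '').trim();
  if (line.endsWith(' ERROR')) throw new Error(`clamd error: ${line}`);
  const found = /^stream: (.+) FOUND$/.exec(line);
  if (found) return { isInfected: true, viruses: [found[1]] };
  if (line === 'stream: OK') return { isInfected: false, viruses: [] };
  throw new Error(`unexpected clamd reply: ${line}`);
}
```

A real client would write these frames to the TCP socket in order and read the reply until the terminating null byte.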

Phase 2: Database Schema​

Files: supabase/migrations/20251017_add_quarantine_fields.sql

Tables Created:

  1. document_scans - Scan results and audit trail

    - id (uuid, PK)
    - document_id (uuid, FK to documents)
    - vault_id (uuid, FK to vaults)
    - status (enum: PENDING, CLEAN, INFECTED, FAILED)
    - scanned_at (timestamptz)
    - engine (text) - "clamav"
    - scan_duration_ms (int)
    - viruses (text[]) - Array of detected virus names
    - error_message (text)
    - meta (jsonb) - ClamAV version, raw response
  2. quarantine_events - Audit log for quarantine actions

    - id (uuid, PK)
    - document_id (uuid)
    - vault_id (uuid)
    - event_type (enum: quarantined, released, deleted)
    - reason (text)
    - actor_user_id (uuid, nullable) - NULL = automated
    - metadata (jsonb)
    - created_at (timestamptz)
  3. documents table - Added quarantine fields

    - is_quarantined (boolean, default false)
    - quarantined_at (timestamptz, nullable)
    - quarantine_reason (text, nullable)

Design Decisions:

  • Separate scan table: Allows multiple scans per document, retains history
  • Event sourcing: Quarantine events provide complete audit trail
  • Nullable actor_user_id: Distinguishes automated vs. manual quarantine actions
  • JSONB metadata: Flexible structure for future extensions
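
As an illustration of the nullable `actor_user_id` decision, the sketch below mirrors the `quarantine_events` columns in a TypeScript type and classifies an event as automated or manual. The type and function names are hypothetical, and the nullability of `reason` is an assumption since the schema listing does not state it.

```typescript
// Hypothetical row type mirroring the quarantine_events columns listed above.
type QuarantineEventType = 'quarantined' | 'released' | 'deleted';

interface QuarantineEventRow {
  id: string;
  document_id: string;
  vault_id: string;
  event_type: QuarantineEventType;
  reason: string | null; // assumption: nullability is not stated in the schema
  actor_user_id: string | null; // NULL = automated action
  metadata: Record<string, unknown>;
  created_at: string; // timestamptz serialized as an ISO string
}

// An event with no actor was produced by the scanner, not by a staff member.
function isAutomated(event: QuarantineEventRow): boolean {
  return event.actor_user_id === null;
}

// One-line description suitable for an audit view.
function describeEvent(event: QuarantineEventRow): string {
  const actor = isAutomated(event) ? 'system' : `user ${event.actor_user_id}`;
  return `${event.created_at} ${event.event_type} by ${actor}`;
}
```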

Phase 3: Core Scanning Logic​

File: apps/platform/lib/malware-scanning/scanner.ts

Key Components:

  1. DocumentScanner Class:

    class DocumentScanner {
      async scanDocument(documentId, fileBuffer, vaultId): Promise<ScanResult>
      async scanDocumentWithRetry(...): Promise<ScanResult>
      async isAvailable(): Promise<boolean>
    }
  2. Scan Workflow:

    • Create scan record (status: PENDING)
    • Perform ClamAV scan
    • Update scan record with results
    • Auto-quarantine if infected (configurable)
    • Handle errors with retry logic
  3. Retry Mechanism:

    • Exponential backoff (2^attempt * 1000ms)
    • Configurable max retries (default: 3)
    • Skips retry if infection detected (success case)
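
The retry mechanism can be sketched as a generic wrapper. This is not the code of `scanDocumentWithRetry`: the names `withRetry`, `backoffDelayMs`, and the injectable `sleep` parameter are hypothetical, and counting attempts from 0 (so delays run 1s, 2s, 4s) is an assumption about how `2^attempt` is applied.

```typescript
// Hypothetical retry wrapper; attempt numbering from 0 is an assumption.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff: 2^attempt * 1000 ms.
function backoffDelayMs(attempt: number): number {
  return 2 ** attempt * 1000;
}

// Run op, retrying on thrown errors up to maxRetries additional times.
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Any resolved value ends the loop, including an "infected" result.
      return await op();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Because an infection is reported as a resolved result rather than a thrown error, it returns immediately and is never retried, which matches the "success case" note above.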

Design Decisions:

  • Why separate scanner class? Encapsulates scan lifecycle management
  • Why auto-quarantine option? Allows testing without side effects
  • Why retry logic? Handles transient network/ClamAV issues gracefully

Phase 4: ClamAV Adapter​

Files:

  • apps/platform/lib/scan/clamd.ts - Adapter implementation
  • apps/platform/lib/scan/adapter.ts - Integration point

Integration Pattern:

// adapter.ts
export function getScanner(): ScannerAdapter {
  if (process.env.CLAMAV_HOST) {
    return new ClamAVScannerAdapter();
  }
  return new MockScannerAdapter();
}

ClamAVScannerAdapter:

  • Downloads file from Supabase Storage
  • Scans with DocumentScanner
  • Returns normalized result for platform
  • Handles errors with fallback to MOCK

Design Decisions:

  • Why adapter pattern? Allows swapping scanning implementations
  • Why fallback to mock? Graceful degradation if ClamAV unavailable
  • Why download file? Scanner needs buffer, not storage key

Phase 5: Console Management UI​

Files:

  • apps/console/app/(app)/admin/quarantine/page.tsx - Documents list
  • apps/console/lib/data/quarantine.ts - Data access layer
  • apps/console/app/api/admin/quarantine/route.ts - API endpoint

Features:

  • Documents List: Table/card views of quarantined documents
  • Filters: By vault, date range, virus signature
  • Badges: Visual indicators for INFECTED/FAILED status
  • Actions: View details, release (future), delete (future)
  • Authorization: Requires security_admin role

Design Decisions:

  • Server-side rendering: Better performance, no client-side data loading
  • Suspense boundaries: Progressive loading for better UX
  • RLS policies: Database-level security even if UI bypassed

Phase 6: Monitoring & Compliance​

Files:

  • apps/console/app/(app)/admin/quarantine/metrics/page.tsx - Dashboard
  • apps/console/lib/data/quarantine-metrics.ts - Metrics calculation
  • apps/console/app/api/admin/quarantine/compliance/route.ts - Reports API

Metrics Dashboard:

type QuarantineMetrics = {
  totalQuarantined: number;
  totalInfected: number;
  totalFailed: number;
  quarantinedLast24h: number;
  quarantinedLast7d: number;
  quarantinedLast30d: number;
  topViruses: Array<{ virus: string; count: number }>;
  scansByStatus: Array<{ status: string; count: number }>;
  infectionsByVault: Array<{ vaultId: string; count: number }>;
};

Compliance Reports:

type ComplianceReport = {
  generatedAt: string;
  reportPeriod: { start: string; end: string };
  summary: {
    totalScans: number;
    cleanScans: number;
    infectedScans: number;
    failedScans: number;
    successRate: number;
  };
  topThreats: Array<{
    signature: string;
    count: number;
    firstSeen: string;
    lastSeen: string;
  }>;
  vaultBreakdown: Array<{
    vaultId: string;
    totalScans: number;
    infections: number;
    infectionRate: number;
  }>;
};

Design Decisions:

  • In-memory aggregation: Flexible calculations without complex SQL
  • Generic report format: Supports SOC 2, ISO 27001, GDPR
  • Rate limiting: Lower limits for expensive compliance reports (20 req/min)
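
The in-memory aggregation decision can be illustrated with a sketch that derives `topViruses` and `scansByStatus` from fetched scan rows. The `ScanRow` shape and both function names are hypothetical; the actual code in `quarantine-metrics.ts` may aggregate differently.

```typescript
// Hypothetical aggregation helpers operating on rows already fetched from document_scans.
type ScanRow = {
  status: 'PENDING' | 'CLEAN' | 'INFECTED' | 'FAILED';
  viruses: string[] | null;
};

// Count occurrences of each signature and return the most frequent ones.
function topViruses(rows: ScanRow[], limit = 10): Array<{ virus: string; count: number }> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    for (const virus of row.viruses ?? []) {
      counts.set(virus, (counts.get(virus) ?? 0) + 1);
    }
  }
  return Array.from(counts, ([virus, count]) => ({ virus, count }))
    .sort((a, b) => b.count - a.count || a.virus.localeCompare(b.virus)) // ties sorted by name
    .slice(0, limit);
}

// Count scans per status, in first-seen order.
function scansByStatus(rows: ScanRow[]): Array<{ status: string; count: number }> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.status, (counts.get(row.status) ?? 0) + 1);
  }
  return Array.from(counts, ([status, count]) => ({ status, count }));
}
```

The trade-off is that every row in the reporting window is loaded into application memory, which stays simple at modest volumes but would eventually favor SQL-side grouping.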

Phase 7: Testing & Production Rollout​

Files:

  • apps/platform/lib/malware-scanning/__tests__/workflow.e2e.test.ts - E2E tests
  • apps/platform/lib/malware-scanning/__tests__/*.test.ts - Unit/integration tests

Test Coverage:

  • 124 total tests (121 passing, 3 intentionally skipped)
  • E2E workflow tests: Upload → Scan → Quarantine
  • Performance benchmarks: < 2s for small files, < 5s for 1MB files
  • Error handling: Corrupted files, empty files, connection failures
  • Concurrent scanning: 5 parallel scans complete within 10s

Production Deployment:

  • Railway: ClamAV daemon (4GB RAM, 1 vCPU)
  • Vercel: Platform + Console apps with environment variables
  • Supabase: Database + Storage (quarantine bucket)
  • Rollback: < 5 minutes via environment variable toggle

Code Organization​

apps/platform/lib/malware-scanning/
├── clam-client.ts                        # ClamAV TCP client (Phase 1)
├── scanner.ts                            # Document scanner orchestration (Phase 3)
├── quarantine.ts                         # Quarantine manager (Phase 3)
├── handler.ts                            # Event handler for finalized docs (Phase 4)
├── types.ts                              # TypeScript type definitions
├── __tests__/                            # Test suite (Phase 7)
│   ├── clam-client.test.ts
│   ├── scanner.integration.test.ts
│   ├── quarantine.integration.test.ts
│   ├── handler.integration.test.ts
│   └── workflow.e2e.test.ts
└── README.md                             # Developer quick start

apps/platform/lib/scan/
├── adapter.ts                            # Scanner adapter interface
├── clamd.ts                              # ClamAV adapter implementation (Phase 4)
└── __tests__/
    └── clamd.integration.test.ts

apps/console/lib/data/
├── quarantine.ts                         # Quarantine data access (Phase 5)
└── quarantine-metrics.ts                 # Metrics data access (Phase 6)

apps/console/app/(app)/admin/quarantine/
├── page.tsx                              # Documents list page (Phase 5)
└── metrics/
    └── page.tsx                          # Metrics dashboard (Phase 6)

apps/console/app/api/admin/quarantine/
├── route.ts                              # Documents API endpoint (Phase 5)
├── metrics/
│   └── route.ts                          # Metrics API endpoint (Phase 6)
└── compliance/
    └── route.ts                          # Compliance reports API (Phase 6)

Key Design Patterns​

1. Adapter Pattern​

Allows swapping scanning implementations without changing platform code:

// Platform uses generic interface
const scanner = getScanner(); // Returns ClamAVScannerAdapter or MockScannerAdapter
const result = await scanner.scan(document);

// Each adapter implements same interface
interface ScannerAdapter {
  scan(document: Document): Promise<ScanResult>;
}

2. Event-Driven Architecture​

Document finalization triggers async scanning:

// Upload handler emits event
await emitDocumentFinalized({ documentId, vaultId });

// Scanner listens for event
handleDocumentFinalized(async (event) => {
  await scanner.scanDocument(event.documentId, ...);
});

3. Database-First Security​

Row-Level Security policies enforce access control:

-- Only security_admin can view quarantined documents
CREATE POLICY "quarantine_admin_only" ON documents
  FOR SELECT USING (
    NOT is_quarantined OR
    (auth.jwt() ->> 'user_role') = 'security_admin'
  );

4. Graceful Degradation​

System remains functional even if ClamAV is unavailable:

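try {
  const result = await clamav.scanBuffer(buffer);
  return { status: result.isInfected ? 'INFECTED' : 'CLEAN', viruses: result.viruses };
} catch (error) {
  // Log error, mark as FAILED, allow manual review
  // (a caught value is `unknown` in TypeScript, so narrow it before reading .message)
  const errorMessage = error instanceof Error ? error.message : String(error);
  return { status: 'FAILED', errorMessage };
}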

Integration Points​

1. Document Upload Flow​

File: apps/platform/lib/documents/finalize.ts (or similar)

// After document upload completes
await finalizeUpload(documentId, storageKey);

// Emit event for async scanning
if (process.env.SCAN_EMIT_ON_FINALIZE === 'true') {
  await emitDocumentFinalized({ documentId, vaultId });
}

2. Storage Integration​

Access Pattern:

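// Download from Supabase Storage
const { data, error } = await supabase.storage
  .from('documents')
  .download(storageKey);

// Stop early if the download failed; data is null in that case
if (error || !data) {
  throw new Error(`Download failed for ${storageKey}: ${error?.message ?? 'no data'}`);
}

// Scan buffer
const fileBuffer = Buffer.from(await data.arrayBuffer());
const result = await scanner.scanDocument(documentId, fileBuffer, vaultId);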

3. Authorization Integration​

Middleware:

// Check security_admin role
const { email } = await getIdentityFromRequestHeaders(request.headers);
const roles = await getUserRolesByEmail(email, supabase);
const hasSecurityAdmin = roles.some(role => role.toLowerCase() === 'security_admin');

if (!hasSecurityAdmin) {
  return authorizationError('Security admin role required');
}

Testing Strategy​

Unit Tests​

Test individual components in isolation:

describe('ClamAVClient', () => {
  it('should detect EICAR test virus', async () => {
    const client = new ClamAVClient({ host: 'localhost' });
    const result = await client.scanBuffer(eicarBuffer);
    expect(result.isInfected).toBe(true);
    expect(result.viruses[0]).toContain('Eicar');
  });
});

Integration Tests​

Test component interactions:

describe('DocumentScanner', () => {
  it('should create scan record and update on completion', async () => {
    const scanner = new DocumentScanner({ clamav, supabase });
    const result = await scanner.scanDocument(docId, buffer, vaultId);

    const scanRecord = await supabase
      .from('document_scans')
      .select()
      .eq('id', result.scanId)
      .single();

    expect(scanRecord.data.status).toBe('CLEAN');
  });
});

E2E Tests​

Test complete workflows:

describe('Complete Workflow', () => {
  it('should upload → scan → quarantine infected file', async () => {
    const eicarBuffer = Buffer.from(EICAR_SIGNATURE);

    // Step 1: Scan document
    const result = await scanner.scanDocument(docId, eicarBuffer, vaultId);
    expect(result.isInfected).toBe(true);
    expect(result.wasQuarantined).toBe(true);

    // Step 2: Verify quarantine
    const document = await getDocument(docId);
    expect(document.is_quarantined).toBe(true);

    // Step 3: Verify audit log
    const events = await getQuarantineEvents(docId);
    expect(events[0].event_type).toBe('quarantined');
  });
});

Performance Considerations​

Scan Performance​

  • Target latency: < 2s average, < 5s P95
  • Throughput: ~50-100 files/second
  • Bottleneck: ClamAV CPU processing, not network

Optimization strategies:

  • File streaming (chunked INSTREAM protocol)
  • Connection pooling (reuse TCP connections)
  • Async/non-blocking I/O
  • Resource limits (max file size: 100MB)

Database Performance​

Efficient queries:

// Use count with head: true
const { count } = await supabase
  .from('document_scans')
  .select('*', { count: 'exact', head: true })
  .eq('status', 'INFECTED');

// Cap the rows fetched for in-memory TOP-N aggregation.
// Note: limit(10) bounds the rows returned, not the number of distinct signatures.
const { data: virusRows } = await supabase
  .from('document_scans')
  .select('viruses')
  .not('viruses', 'is', null)
  .limit(10);

Indexes (recommended):

CREATE INDEX idx_document_scans_status ON document_scans(status);
CREATE INDEX idx_document_scans_scanned_at ON document_scans(scanned_at);
CREATE INDEX idx_documents_is_quarantined ON documents(is_quarantined);

Memory Management​

ClamAV requirements:

  • Base memory: ~800MB (virus database)
  • Per-scan overhead: ~50-100MB
  • Recommended: 4GB RAM for production
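
The figures above imply a rough ceiling on concurrent scans, which the sketch below works out. It is back-of-envelope arithmetic only: the function name is hypothetical, 4GB is taken as 4096MB, and OS overhead and any extra memory used while signatures reload are ignored.

```typescript
// Hypothetical capacity estimate based on the memory figures listed above.
function maxConcurrentScans(totalMb: number, baseMb: number, perScanMb: number): number {
  if (perScanMb <= 0) throw new RangeError('perScanMb must be positive');
  // Memory left after the virus database, divided by the per-scan overhead
  return Math.max(0, Math.floor((totalMb - baseMb) / perScanMb));
}

// Worst case (100MB per scan): (4096 - 800) / 100 = 32.96, so 32 scans
const worstCase = maxConcurrentScans(4096, 800, 100);
// Best case (50MB per scan): (4096 - 800) / 50 = 65.92, so 65 scans
const bestCase = maxConcurrentScans(4096, 800, 50);
```

In practice the usable ceiling is lower, so treat these numbers as an upper bound when sizing a concurrency limit.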

File handling:

import { createReadStream } from 'node:fs';

// Stream large files instead of loading into memory
const stream = createReadStream(filePath);
const result = await client.scanStream(stream);

Security Considerations​

Threat Model​

Protected against:

  • Known malware (ClamAV signature database: 8M+ signatures)
  • Some previously unseen threats (heuristic analysis; detection is best-effort)
  • File-borne attacks (macros, exploits)

Not protected against:

  • Encrypted malware (cannot scan encrypted files)
  • Polymorphic malware (may evade signatures)
  • Social engineering (outside scan scope)

Signature Updates​

Automatic updates:

  • Frequency: Every 2-4 hours
  • Update size: ~50-100MB incremental
  • Downtime: None (updates in background)

Manual update:

railway run -- freshclam

Isolation​

Quarantined files:

  • Separate storage bucket (quarantine/)
  • No public access (RLS policies)
  • Restricted to security_admin role
  • Audit trail for all access

Troubleshooting​

Common Issues​

Issue: ClamAV connection timeout

// Solution: Increase timeout
const client = new ClamAVClient({
  host: 'localhost',
  timeout: 10000, // 10 seconds
});

Issue: Large files timing out

// Solution: Increase download timeout
const CLAMAV_DOWNLOAD_TIMEOUT = 60000; // 60 seconds

Issue: False positives

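// Solution: Review the quarantined file and release it if safe.
// The release flow is listed as a future feature, so this call shows the intended usage.
await releaseQuarantinedDocument(documentId, {
  reason: 'False positive - verified safe',
  actor_user_id: adminUserId,
});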

Future Enhancements​

Planned Features​

  1. Quarantine Release UI: Allow security admins to restore false positives
  2. Custom Signature Rules: Upload custom ClamAV signatures
  3. Multi-Engine Scanning: Integrate additional AV engines (VirusTotal API)
  4. ML-Based Detection: Train model on file metadata patterns
  5. Real-Time Alerts: Notify security team on infections
  6. Automated Reporting: Schedule daily/weekly compliance reports
  7. File Analysis: Deep inspection of quarantined files (sandboxing)

Scalability​

Current capacity:

  • Single ClamAV instance: ~50-100 files/second
  • Suitable for: < 100K uploads/day
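
To see how the two capacity figures relate, the sketch below converts a daily upload count into an average per-second rate and compares it with the stated single-instance throughput. The function names are hypothetical, and the calculation assumes evenly spread traffic, so real peaks will use more of the headroom.

```typescript
// Hypothetical helpers relating daily volume to per-second scan capacity.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Average uploads per second for a given daily volume.
function averageRatePerSecond(uploadsPerDay: number): number {
  return uploadsPerDay / SECONDS_PER_DAY;
}

// How many times the average load fits into the stated capacity.
function headroomFactor(capacityPerSecond: number, uploadsPerDay: number): number {
  return capacityPerSecond / averageRatePerSecond(uploadsPerDay);
}

// 100K uploads/day averages roughly 1.16 files/second
const avgRate = averageRatePerSecond(100_000);
// Against the lower capacity bound of 50 files/second, that is about 43x headroom
const headroom = headroomFactor(50, 100_000);
```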

Scaling options:

  • Horizontal: Multiple ClamAV instances behind load balancer
  • Vertical: Larger Railway instance (8GB RAM)
  • Queue-based: SQS/RabbitMQ for async processing at scale


Last Updated: October 2025