Malware Scanning Implementation Guide
This guide describes how the malware scanning system was implemented across 7 phases and covers the key architectural decisions and integration points.
Implementation Overview
The malware scanning system was built in 7 phases:
- Phase 1: ClamAV Client Library - Low-level TCP client for ClamAV daemon
- Phase 2: Database Schema - Tables for scans, quarantine, and audit logs
- Phase 3: Core Scanning Logic - Document scanner orchestration layer
- Phase 4: ClamAV Adapter - Production-ready adapter pattern integration
- Phase 5: Console Management UI - Staff interface for quarantine management
- Phase 6: Monitoring & Compliance - Metrics dashboard and reporting
- Phase 7: Testing & Production Rollout - E2E tests and deployment
Architecture
System Components
┌──────────────────────────────────────────────────────────────┐
│                    Platform App (Vercel)                     │
│                                                              │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │    Upload    │───▶│   Document   │───▶│  Quarantine  │   │
│   │   Handler    │    │   Scanner    │    │   Manager    │   │
│   └──────────────┘    └──────┬───────┘    └──────────────┘   │
│                              │                               │
└──────────────────────────────┼───────────────────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  ClamAV Daemon      │
                    │  (Railway)          │
                    │  - Port: 3310       │
                    │  - INSTREAM Proto   │
                    │  - Virus DB (CVD)   │
                    └─────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                Supabase (Database + Storage)                 │
│                                                              │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │   document_  │    │  quarantine_ │    │  Quarantine  │   │
│   │    scans     │    │    events    │    │    Bucket    │   │
│   └──────────────┘    └──────────────┘    └──────────────┘   │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                     Console App (Vercel)                     │
│                                                              │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │  Quarantine  │    │   Metrics    │    │  Compliance  │   │
│   │  Documents   │    │  Dashboard   │    │   Reports    │   │
│   └──────────────┘    └──────────────┘    └──────────────┘   │
└──────────────────────────────────────────────────────────────┘
Data Flow
Document Upload & Scan Flow
1. User uploads document
↓
2. Platform API creates document record (upload_status: PENDING)
↓
3. File uploaded to Supabase Storage
↓
4. Upload finalized → emits DocumentFinalized event
↓
5. DocumentScanner receives event
↓
6. Creates scan record (status: PENDING)
↓
7. Downloads file from storage
↓
8. Sends file to ClamAV via INSTREAM protocol
↓
9. ClamAV scans file against virus database
↓
10. Updates scan record (status: CLEAN/INFECTED/FAILED)
↓
11. If INFECTED → Quarantine file
↓
12. If CLEAN → Mark document as available
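Steps 10 to 12 amount to a mapping from scan status to a follow-up action. The sketch below is illustrative only: `NextAction` and `nextActionFor` are hypothetical names, and routing FAILED scans to manual review is an assumption based on the Graceful Degradation pattern described later in this guide.

```typescript
// Status values come from the flow above; everything else is a hypothetical sketch.
type ScanStatus = 'PENDING' | 'CLEAN' | 'INFECTED' | 'FAILED';
type NextAction = 'wait' | 'mark_available' | 'quarantine' | 'manual_review';

// Decide what happens to a document once its scan record has a given status.
function nextActionFor(status: ScanStatus): NextAction {
  switch (status) {
    case 'PENDING':
      return 'wait'; // scan still running
    case 'CLEAN':
      return 'mark_available'; // step 12
    case 'INFECTED':
      return 'quarantine'; // step 11
    case 'FAILED':
      return 'manual_review'; // assumption: failed scans are reviewed by staff
    default: {
      // Compile-time exhaustiveness check: adding a status forces an update here.
      const unreachable: never = status;
      throw new Error(`Unhandled scan status: ${String(unreachable)}`);
    }
  }
}
```

Keeping the mapping in one exhaustive `switch` means a newly added status cannot silently fall through to "available".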
Quarantine Flow
1. Scanner detects infection
↓
2. Calls QuarantineManager.quarantine()
↓
3. Copies file to quarantine bucket (YYYY-MM/DD/docId-timestamp.ext)
↓
4. Updates document record:
- is_quarantined = true
- quarantined_at = NOW()
- quarantine_reason = "Malware detected: [virus names]"
↓
5. Creates quarantine event audit log:
- event_type = 'quarantined'
- actor_user_id = NULL (automated)
- metadata = { virus_names, scan_id }
↓
6. Document inaccessible to all users
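The bucket key layout from step 3 (YYYY-MM/DD/docId-timestamp.ext) can be sketched as a small pure function. This is not the QuarantineManager implementation: `buildQuarantinePath` is a hypothetical helper, and both the use of UTC and of epoch milliseconds for the timestamp are assumptions.

```typescript
// Hypothetical helper illustrating the YYYY-MM/DD/docId-timestamp.ext layout.
// Assumptions: dates are taken in UTC and "timestamp" is epoch milliseconds.
function buildQuarantinePath(docId: string, fileName: string, at: Date): string {
  const yyyy = at.getUTCFullYear();
  const mm = String(at.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(at.getUTCDate()).padStart(2, '0');
  // Keep the original extension (including the dot) when there is one.
  const dot = fileName.lastIndexOf('.');
  const ext = dot > 0 ? fileName.slice(dot) : '';
  return `${yyyy}-${mm}/${dd}/${docId}-${at.getTime()}${ext}`;
}
```

Including the timestamp in the key keeps repeated quarantines of the same document from overwriting each other.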
Phase-by-Phase Implementation
Phase 1: ClamAV Client Library
File: apps/platform/lib/malware-scanning/clam-client.ts
Key Features:
- Low-level TCP socket communication with ClamAV daemon
- INSTREAM protocol implementation for buffer scanning
- Support for PING, VERSION, SCAN commands
- Custom error handling with ClamAVError class
- Configurable timeouts and connection management
Design Decisions:
- Why TCP sockets? The ClamAV daemon (clamd) accepts commands over a TCP socket, and its INSTREAM command streams file content in chunks over that same connection
- Why buffer scanning? Allows scanning in-memory data without filesystem access
- Why custom error class? Distinguishes ClamAV errors from network/system errors
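For readers unfamiliar with INSTREAM, the wire format is simple: a null-terminated `zINSTREAM` command, then any number of chunks each prefixed with a 4-byte big-endian length, then a zero-length chunk to end the stream. The sketch below only builds those frames; `frameInstream` and `asciiBytes` are hypothetical names, the 64 KiB chunk size is an arbitrary assumption, and this is not the code in clam-client.ts.

```typescript
// Encode an ASCII string as bytes without relying on Node- or DOM-specific APIs.
function asciiBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) bytes[i] = text.charCodeAt(i);
  return bytes;
}

// Build the sequence of frames to write to the clamd socket for one payload.
function frameInstream(payload: Uint8Array, chunkSize: number = 64 * 1024): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  // "z" prefix selects null-terminated command and reply framing.
  const frames: Uint8Array[] = [asciiBytes('zINSTREAM\0')];
  for (let offset = 0; offset < payload.length; offset += chunkSize) {
    const chunk = payload.subarray(offset, offset + chunkSize);
    const frame = new Uint8Array(4 + chunk.length);
    // 4-byte unsigned length in network byte order (big-endian), then the data.
    new DataView(frame.buffer).setUint32(0, chunk.length, false);
    frame.set(chunk, 4);
    frames.push(frame);
  }
  // A zero-length chunk tells clamd the stream is complete.
  frames.push(new Uint8Array(4));
  return frames;
}
```

Each returned frame would be written to the socket in order; clamd rejects streams larger than its configured StreamMaxLength, so the client-side size limit should not exceed that value.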
Code Example:
const client = new ClamAVClient({
  host: 'localhost',
  port: 3310,
  timeout: 5000
});
const result = await client.scanBuffer(fileBuffer);
// result: { isInfected: boolean, viruses: string[], durationMs: number }
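To show how a result of that shape could be derived, here is a minimal sketch of parsing clamd's textual reply to an INSTREAM scan. clamd answers with `stream: OK`, `stream: <signature> FOUND`, or a message ending in `ERROR`. The function name `parseClamdReply` is hypothetical, `durationMs` is left out because it is measured by the caller, and the real client may handle more cases.

```typescript
type ParsedReply = { isInfected: boolean; viruses: string[] };

// Parse a single clamd reply line (illustrative sketch, not the library's code).
function parseClamdReply(raw: string): ParsedReply {
  // Replies to "z" commands are NUL-terminated; strip that and surrounding whitespace.
  const reply = raw.replace(/\0/g, '').trim();
  if (reply.endsWith(' ERROR')) {
    throw new Error(`clamd error: ${reply}`);
  }
  if (reply.endsWith(' FOUND')) {
    // Format: "stream: <signature name> FOUND"
    const sep = reply.indexOf(': ');
    const start = sep >= 0 ? sep + 2 : 0;
    const name = reply.slice(start, reply.length - ' FOUND'.length);
    return { isInfected: true, viruses: [name] };
  }
  if (reply.endsWith('OK')) {
    return { isInfected: false, viruses: [] };
  }
  throw new Error(`Unrecognized clamd reply: ${reply}`);
}
```

Throwing on `ERROR` replies, rather than returning a clean result, keeps a daemon-side failure from being mistaken for a clean file.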
Phase 2: Database Schema
Files: supabase/migrations/20251017_add_quarantine_fields.sql
Tables Created:
- document_scans - Scan results and audit trail
  - id (uuid, PK)
  - document_id (uuid, FK to documents)
  - vault_id (uuid, FK to vaults)
  - status (enum: PENDING, CLEAN, INFECTED, FAILED)
  - scanned_at (timestamptz)
  - engine (text) - "clamav"
  - scan_duration_ms (int)
  - viruses (text[]) - Array of detected virus names
  - error_message (text)
  - meta (jsonb) - ClamAV version, raw response
- quarantine_events - Audit log for quarantine actions
  - id (uuid, PK)
  - document_id (uuid)
  - vault_id (uuid)
  - event_type (enum: quarantined, released, deleted)
  - reason (text)
  - actor_user_id (uuid, nullable) - NULL = automated
  - metadata (jsonb)
  - created_at (timestamptz)
- documents table - Added quarantine fields
  - is_quarantined (boolean, default false)
  - quarantined_at (timestamptz, nullable)
  - quarantine_reason (text, nullable)
Design Decisions:
- Separate scan table: Allows multiple scans per document, retains history
- Event sourcing: Quarantine events provide complete audit trail
- Nullable actor_user_id: Distinguishes automated vs. manual quarantine actions
- JSONB metadata: Flexible structure for future extensions
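On the application side, the two tables above might be mirrored by row types like the following. This is a sketch: the type and function names are hypothetical, timestamps are assumed to arrive as ISO strings, and apart from `actor_user_id` the nullability of each column is an assumption, since the schema listing does not state it.

```typescript
type ScanStatus = 'PENDING' | 'CLEAN' | 'INFECTED' | 'FAILED';
type QuarantineEventType = 'quarantined' | 'released' | 'deleted';

// Mirrors document_scans (nullability assumed where the schema does not say).
interface DocumentScanRow {
  id: string;
  document_id: string;
  vault_id: string;
  status: ScanStatus;
  scanned_at: string | null;
  engine: string; // "clamav"
  scan_duration_ms: number | null;
  viruses: string[] | null;
  error_message: string | null;
  meta: Record<string, unknown> | null;
}

// Mirrors quarantine_events.
interface QuarantineEventRow {
  id: string;
  document_id: string;
  vault_id: string;
  event_type: QuarantineEventType;
  reason: string | null;
  actor_user_id: string | null; // NULL = automated
  metadata: Record<string, unknown> | null;
  created_at: string;
}

// A NULL actor marks an action taken by the scanner rather than by a staff member.
function isAutomatedEvent(event: Pick<QuarantineEventRow, 'actor_user_id'>): boolean {
  return event.actor_user_id === null;
}
```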
Phase 3: Core Scanning Logic
File: apps/platform/lib/malware-scanning/scanner.ts
Key Components:
- DocumentScanner Class:
  // Signature outline (method bodies omitted); async methods return Promises
  class DocumentScanner {
    async scanDocument(documentId, fileBuffer, vaultId): Promise<ScanResult>
    async scanDocumentWithRetry(...): Promise<ScanResult>
    async isAvailable(): Promise<boolean>
  }
- Scan Workflow:
  - Create scan record (status: PENDING)
  - Perform ClamAV scan
  - Update scan record with results
  - Auto-quarantine if infected (configurable)
  - Handle errors with retry logic
- Retry Mechanism:
  - Exponential backoff (2^attempt * 1000ms)
  - Configurable max retries (default: 3)
  - Skips retry if infection detected (success case)
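The retry mechanism described above could look roughly like this. It is a sketch rather than the scanner's actual code: `backoffDelayMs` and `withRetry` are hypothetical names, and counting attempts from 0 (so the waits are 1s, 2s, 4s) is an assumption, because the guide does not say where the attempt counter starts.

```typescript
// Delay before the next try: 2^attempt * 1000ms (attempt assumed to start at 0).
function backoffDelayMs(attempt: number, baseMs: number = 1000): number {
  return Math.pow(2, attempt) * baseMs;
}

// Run an operation, retrying on thrown errors up to maxRetries times.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // An INFECTED result is a normal return value, so it is never retried.
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break; // out of retries
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter lets tests substitute an instant no-op instead of waiting out the real delays.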
Design Decisions:
- Why separate scanner class? Encapsulates scan lifecycle management
- Why auto-quarantine option? Allows testing without side effects
- Why retry logic? Handles transient network/ClamAV issues gracefully
Phase 4: ClamAV Adapter
Files:
- apps/platform/lib/scan/clamd.ts - Adapter implementation
- apps/platform/lib/scan/adapter.ts - Integration point
Integration Pattern:
// adapter.ts
export function getScanner(): ScannerAdapter {
  if (process.env.CLAMAV_HOST) {
    return new ClamAVScannerAdapter();
  }
  return new MockScannerAdapter();
}
ClamAVScannerAdapter:
- Downloads file from Supabase Storage
- Scans with DocumentScanner
- Returns normalized result for platform
- Handles errors with fallback to MOCK
Design Decisions:
- Why adapter pattern? Allows swapping scanning implementations
- Why fallback to mock? Graceful degradation if ClamAV unavailable
- Why download file? Scanner needs buffer, not storage key
Phase 5: Console Management UI
Files:
- apps/console/app/(app)/admin/quarantine/page.tsx - Documents list
- apps/console/lib/data/quarantine.ts - Data access layer
- apps/console/app/api/admin/quarantine/route.ts - API endpoint
Features:
- Documents List: Table/card views of quarantined documents
- Filters: By vault, date range, virus signature
- Badges: Visual indicators for INFECTED/FAILED status
- Actions: View details, release (future), delete (future)
- Authorization: Requires security_admin role
Design Decisions:
- Server-side rendering: Better performance, no client-side data loading
- Suspense boundaries: Progressive loading for better UX
- RLS policies: Database-level security even if UI bypassed
Phase 6: Monitoring & Compliance
Files:
- apps/console/app/(app)/admin/quarantine/metrics/page.tsx - Dashboard
- apps/console/lib/data/quarantine-metrics.ts - Metrics calculation
- apps/console/app/api/admin/quarantine/compliance/route.ts - Reports API
Metrics Dashboard:
type QuarantineMetrics = {
  totalQuarantined: number;
  totalInfected: number;
  totalFailed: number;
  quarantinedLast24h: number;
  quarantinedLast7d: number;
  quarantinedLast30d: number;
  topViruses: Array<{ virus: string; count: number }>;
  scansByStatus: Array<{ status: string; count: number }>;
  infectionsByVault: Array<{ vaultId: string; count: number }>;
};
Compliance Reports:
type ComplianceReport = {
  generatedAt: string;
  reportPeriod: { start: string; end: string };
  summary: {
    totalScans: number;
    cleanScans: number;
    infectedScans: number;
    failedScans: number;
    successRate: number;
  };
  topThreats: Array<{
    signature: string;
    count: number;
    firstSeen: string;
    lastSeen: string;
  }>;
  vaultBreakdown: Array<{
    vaultId: string;
    totalScans: number;
    infections: number;
    infectionRate: number;
  }>;
};
Design Decisions:
- In-memory aggregation: Flexible calculations without complex SQL
- Generic report format: Supports SOC 2, ISO 27001, GDPR
- Rate limiting: Lower limits for expensive compliance reports (20 req/min)
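As an illustration of the in-memory aggregation decision, the `topViruses` metric could be computed from fetched scan rows along these lines. The function and type names are hypothetical, and the tie-break by signature name is an assumption added to keep the output deterministic.

```typescript
type ScanRow = { viruses: string[] | null };
type VirusCount = { virus: string; count: number };

// Count signature occurrences across scan rows and return the most frequent ones.
function topViruses(rows: ScanRow[], limit: number = 10): VirusCount[] {
  const counts = new Map<string, number>();
  rows.forEach((row) => {
    (row.viruses ?? []).forEach((virus) => {
      counts.set(virus, (counts.get(virus) ?? 0) + 1);
    });
  });
  const result: VirusCount[] = [];
  counts.forEach((count, virus) => result.push({ virus, count }));
  // Highest count first; ties broken alphabetically (assumption).
  result.sort((a, b) => b.count - a.count || (a.virus < b.virus ? -1 : a.virus > b.virus ? 1 : 0));
  return result.slice(0, limit);
}
```

This trades database work for application memory, so it is only accurate when the aggregation sees every relevant row for the reporting period.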
Phase 7: Testing & Production Rollout
Files:
- apps/platform/lib/malware-scanning/__tests__/workflow.e2e.test.ts - E2E tests
- apps/platform/lib/malware-scanning/__tests__/*.test.ts - Unit/integration tests
Test Coverage:
- 124 total tests (121 passing, 3 intentionally skipped)
- E2E workflow tests: Upload → Scan → Quarantine
- Performance benchmarks: < 2s for small files, < 5s for 1MB files
- Error handling: Corrupted files, empty files, connection failures
- Concurrent scanning: 5 parallel scans complete within 10s
Production Deployment:
- Railway: ClamAV daemon (4GB RAM, 1 vCPU)
- Vercel: Platform + Console apps with environment variables
- Supabase: Database + Storage (quarantine bucket)
- Rollback: < 5 minutes via environment variable toggle
Code Organization
apps/platform/lib/malware-scanning/
├── clam-client.ts                 # ClamAV TCP client (Phase 1)
├── scanner.ts                     # Document scanner orchestration (Phase 3)
├── quarantine.ts                  # Quarantine manager (Phase 3)
├── handler.ts                     # Event handler for finalized docs (Phase 4)
├── types.ts                       # TypeScript type definitions
├── __tests__/                     # Test suite (Phase 7)
│   ├── clam-client.test.ts
│   ├── scanner.integration.test.ts
│   ├── quarantine.integration.test.ts
│   ├── handler.integration.test.ts
│   └── workflow.e2e.test.ts
└── README.md                      # Developer quick start
apps/platform/lib/scan/
├── adapter.ts                     # Scanner adapter interface
├── clamd.ts                       # ClamAV adapter implementation (Phase 4)
└── __tests__/
    └── clamd.integration.test.ts
apps/console/lib/data/
├── quarantine.ts                  # Quarantine data access (Phase 5)
└── quarantine-metrics.ts          # Metrics data access (Phase 6)
apps/console/app/(app)/admin/quarantine/
├── page.tsx                       # Documents list page (Phase 5)
└── metrics/
    └── page.tsx                   # Metrics dashboard (Phase 6)
apps/console/app/api/admin/quarantine/
├── route.ts                       # Documents API endpoint (Phase 5)
├── metrics/
│   └── route.ts                   # Metrics API endpoint (Phase 6)
└── compliance/
    └── route.ts                   # Compliance reports API (Phase 6)
Key Design Patterns
1. Adapter Pattern
Allows swapping scanning implementations without changing platform code:
// Platform uses generic interface
const scanner = getScanner(); // Returns ClamAVScannerAdapter or MockScannerAdapter
const result = await scanner.scan(document);
// Each adapter implements same interface
interface ScannerAdapter {
  scan(document: Document): Promise<ScanResult>;
}
2. Event-Driven Architecture
Document finalization triggers async scanning:
// Upload handler emits event
await emitDocumentFinalized({ documentId, vaultId });
// Scanner listens for event
handleDocumentFinalized(async (event) => {
  await scanner.scanDocument(event.documentId, ...);
});
3. Database-First Security
Row-Level Security policies enforce access control:
-- Only security_admin can view quarantined documents
CREATE POLICY "quarantine_admin_only" ON documents
  FOR SELECT USING (
    NOT is_quarantined OR
    auth.jwt() ->> 'user_role' = 'security_admin'
  );
4. Graceful Degradation
System remains functional even if ClamAV is unavailable:
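try {
  const result = await clamav.scanBuffer(buffer);
} catch (error) {
  // Log error, mark as FAILED, allow manual review
  // (a caught value is not guaranteed to be an Error instance)
  const errorMessage = error instanceof Error ? error.message : String(error);
  return { status: 'FAILED', errorMessage };
}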
Integration Points
1. Document Upload Flow
File: apps/platform/lib/documents/finalize.ts (or similar)
// After document upload completes
await finalizeUpload(documentId, storageKey);
// Emit event for async scanning
if (process.env.SCAN_EMIT_ON_FINALIZE === 'true') {
  await emitDocumentFinalized({ documentId, vaultId });
}
2. Storage Integration
Access Pattern:
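// Download from Supabase Storage
const { data, error } = await supabase.storage
  .from('documents')
  .download(storageKey);
// Stop early if the download failed; data is null when error is set
if (error || !data) {
  throw new Error(`Download failed for ${storageKey}: ${error?.message ?? 'no data returned'}`);
}
// Scan buffer
const fileBuffer = Buffer.from(await data.arrayBuffer());
const result = await scanner.scanDocument(documentId, fileBuffer, vaultId);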
3. Authorization Integration
Middleware:
// Check security_admin role
const { email } = await getIdentityFromRequestHeaders(request.headers);
const roles = await getUserRolesByEmail(email, supabase);
const hasSecurityAdmin = roles.some(role => role.toLowerCase() === 'security_admin');
if (!hasSecurityAdmin) {
  return authorizationError('Security admin role required');
}
Testing Strategy
Unit Tests
Test individual components in isolation:
describe('ClamAVClient', () => {
  it('should detect EICAR test virus', async () => {
    const client = new ClamAVClient({ host: 'localhost' });
    const result = await client.scanBuffer(eicarBuffer);
    expect(result.isInfected).toBe(true);
    expect(result.viruses[0]).toContain('Eicar');
  });
});
Integration Tests
Test component interactions:
describe('DocumentScanner', () => {
  it('should create scan record and update on completion', async () => {
    const scanner = new DocumentScanner({ clamav, supabase });
    const result = await scanner.scanDocument(docId, buffer, vaultId);
    const scanRecord = await supabase
      .from('document_scans')
      .select()
      .eq('id', result.scanId)
      .single();
    expect(scanRecord.data.status).toBe('CLEAN');
  });
});
E2E Tests
Test complete workflows:
describe('Complete Workflow', () => {
  it('should upload → scan → quarantine infected file', async () => {
    const eicarBuffer = Buffer.from(EICAR_SIGNATURE);
    // Step 1: Scan document
    const result = await scanner.scanDocument(docId, eicarBuffer, vaultId);
    expect(result.isInfected).toBe(true);
    expect(result.wasQuarantined).toBe(true);
    // Step 2: Verify quarantine
    const document = await getDocument(docId);
    expect(document.is_quarantined).toBe(true);
    // Step 3: Verify audit log
    const events = await getQuarantineEvents(docId);
    expect(events[0].event_type).toBe('quarantined');
  });
});
Performance Considerations
Scan Performance
- Target latency: < 2s average, < 5s P95
- Throughput: ~50-100 files/second
- Bottleneck: ClamAV CPU processing, not network
Optimization strategies:
- File streaming (chunked INSTREAM protocol)
- Connection pooling (reuse TCP connections)
- Async/non-blocking I/O
- Resource limits (max file size: 100MB)
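The resource limit in the last bullet is cheapest to enforce before a file is downloaded or streamed to ClamAV. A minimal sketch follows, with hypothetical names (`MAX_SCAN_BYTES`, `checkScanSize`) and the assumption that "100MB" means 100 * 1024 * 1024 bytes.

```typescript
// Assumption: the guide's 100MB limit is interpreted as 100 MiB.
const MAX_SCAN_BYTES = 100 * 1024 * 1024;

type SizeCheck = { ok: true } | { ok: false; reason: string };

// Reject files that exceed the scan size limit before any bytes are sent to ClamAV.
function checkScanSize(sizeBytes: number, maxBytes: number = MAX_SCAN_BYTES): SizeCheck {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    return { ok: false, reason: 'invalid size' };
  }
  if (sizeBytes > maxBytes) {
    return { ok: false, reason: `file is ${sizeBytes} bytes; limit is ${maxBytes}` };
  }
  return { ok: true };
}
```

Because clamd enforces its own StreamMaxLength, the application limit is best kept at or below the daemon's configured value so oversized files fail fast with a clear reason.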
Database Performance
Efficient queries:
// Use count with head: true
const { count } = await supabase
  .from('document_scans')
  .select('*', { count: 'exact', head: true })
  .eq('status', 'INFECTED');
// Limit TOP-N queries
const topViruses = await supabase
  .from('document_scans')
  .select('viruses')
  .not('viruses', 'is', null)
  .limit(10);
Indexes (recommended):
CREATE INDEX idx_document_scans_status ON document_scans(status);
CREATE INDEX idx_document_scans_scanned_at ON document_scans(scanned_at);
CREATE INDEX idx_documents_is_quarantined ON documents(is_quarantined);
Memory Management
ClamAV requirements:
- Base memory: ~800MB (virus database)
- Per-scan overhead: ~50-100MB
- Recommended: 4GB RAM for production
File handling:
import { createReadStream } from 'node:fs';
// Stream large files instead of loading into memory
const stream = createReadStream(filePath);
const result = await client.scanStream(stream);
Security Considerations
Threat Model
Protected against:
- Known malware (ClamAV signature database: 8M+ signatures)
- Some previously unseen threats (limited coverage via ClamAV's heuristic checks)
- File-borne attacks (macros, exploits)
Not protected against:
- Encrypted malware (cannot scan encrypted files)
- Polymorphic malware (may evade signatures)
- Social engineering (outside scan scope)
Signature Updates
Automatic updates:
- Frequency: Every 2-4 hours
- Update size: ~50-100MB incremental
- Downtime: None (updates in background)
Manual update:
railway run -- freshclam
Isolation
Quarantined files:
- Separate storage bucket (quarantine/)
- No public access (RLS policies)
- Restricted to security_admin role
- Audit trail for all access
Troubleshooting
Common Issues
Issue: ClamAV connection timeout
// Solution: Increase timeout
const client = new ClamAVClient({
  host: 'localhost',
  timeout: 10000, // 10 seconds
});
Issue: Large files timing out
// Solution: Increase download timeout
const CLAMAV_DOWNLOAD_TIMEOUT = 60000; // 60 seconds
Issue: False positives
// Solution: Review quarantined file, release if safe
await releaseQuarantinedDocument(documentId, {
  reason: 'False positive - verified safe',
  actor_user_id: adminUserId,
});
Future Enhancements
Planned Features
- Quarantine Release UI: Allow security admins to restore false positives
- Custom Signature Rules: Upload custom ClamAV signatures
- Multi-Engine Scanning: Integrate additional AV engines (VirusTotal API)
- ML-Based Detection: Train model on file metadata patterns
- Real-Time Alerts: Notify security team on infections
- Automated Reporting: Schedule daily/weekly compliance reports
- File Analysis: Deep inspection of quarantined files (sandboxing)
Scalability
Current capacity:
- Single ClamAV instance: ~50-100 files/second
- Suitable for: < 100K uploads/day
Scaling options:
- Horizontal: Multiple ClamAV instances behind load balancer
- Vertical: Larger Railway instance (8GB RAM)
- Queue-based: SQS/RabbitMQ for async processing at scale
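To relate the capacity figures above to each other, a quick back-of-the-envelope conversion helps. The helpers below are hypothetical and only do unit conversion; they assume uploads are spread evenly over the day, which real traffic is not, so peak load will be higher than the average shown.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Average arrival rate implied by a daily upload volume.
function averagePerSecond(uploadsPerDay: number): number {
  return uploadsPerDay / SECONDS_PER_DAY;
}

// Files per day a scanner could process if it ran at a sustained rate.
function dailyCapacity(filesPerSecond: number): number {
  return filesPerSecond * SECONDS_PER_DAY;
}

// 100K uploads/day averages a little over 1 file/second, while a sustained
// 50 files/second corresponds to 4,320,000 files/day.
const averageRate = averagePerSecond(100_000);
const sustainedCapacity = dailyCapacity(50);
```

The gap between the two numbers is headroom for bursts, large files, and retries, not a guarantee of throughput under real traffic.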
Related Documentation
- Malware Scanning Overview - High-level system overview
- Deployment Guide - Production deployment instructions
- Security Best Practices - General security guidance
Support & Resources
- Source Code: apps/platform/lib/malware-scanning/
- Tests: apps/platform/lib/malware-scanning/__tests__/
- ClamAV Docs: https://docs.clamav.net/
- Security Team: security@torvussecurity.com
Last Updated: October 2025