Programming • Jan 15, 2026 • Cliff
LogSieve: A Technical Deep Dive into Building a Powerful, Offline Log Analysis Tool
Introduction
LogSieve started as a personal tool. I was tired of bouncing between grep, awk, less, and various command-line utilities whenever I needed to analyze log files. I wanted something visual, instant, and searchable, but I didn't want to set up a server, manage dependencies, or worry about where my sensitive log data was going.
So I built a single-file web app that runs entirely in the browser. No backend. No database. No external API calls. Everything happens on your machine, in real-time.
Then I kept adding features.
What started as a simple text search became a full-featured log analysis platform with multi-line event detection, structured field extraction, saved filters, and statistical summaries. The challenge was doing all of this without a backend while keeping performance snappy. This post walks through how LogSieve works under the hood—the architecture decisions, the technical patterns, and the clever optimizations that make it possible.
Part 1: The Architecture
Why No Backend?
Before diving into the code, let's talk about why the no-backend approach matters:
- Privacy: Your logs never leave your machine. No upload, no transmission, no third-party access.
- Offline operation: Works perfectly without an internet connection.
- Instant performance: No network latency. Filtering happens immediately as you type.
- Zero setup: Open a browser, drag a file, start analyzing. No Docker, no installation, no configuration.
- Completely free: No servers to host, no API calls to charge for.
The tradeoff is that the browser becomes your computational engine, which introduces challenges around performance and responsiveness.
The Three-Tier JavaScript Architecture
LogSieve is built on three layers of JavaScript, each with a specific job:
┌─────────────────────────────────────────┐
│        index.html (UI Container)        │
└────────────────────┬────────────────────┘
                     │
          ┌──────────┴──────────┐
          │                     │
   ┌──────▼──────┐       ┌──────▼────────┐
   │ logsieve.js │       │ logsieve-     │
   │ (Main       │       │ worker.js     │
   │  Thread)    │       │ (Web Worker)  │
   └──────┬──────┘       └───────┬───────┘
          │                      │
          └──────────┬───────────┘
                     │
              ┌──────▼──────┐
              │  shared.js  │
              │ (Utilities) │
              └─────────────┘
logsieve.js handles the UI, event listeners, and orchestration. It doesn't block the browser thread when processing large datasets.
logsieve-worker.js runs in a separate thread and handles heavy lifting: parsing files, applying filters, sorting, and computing statistics.
shared.js contains utilities shared by both (timestamp parsing, field detection, extraction algorithms, query parsing).
This separation is critical. When you're processing a 500MB log file, you don't want the main thread frozen. Users can still interact with the UI while the worker grinds through data.
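The wiring between these layers is small. Here's a minimal sketch of the main-thread startup, under the assumption that the worker is a classic Web Worker that pulls in shared.js itself via importScripts; the actual startup code isn't shown in this post, so these exact names are illustrative:

```javascript
// Pending request-id -> resolve callback, used by the async
// request/response pattern described below.
const pendingRequests = new Map();

function createWorkerBridge() {
  // logsieve-worker.js is assumed to load shared.js internally
  // via importScripts('shared.js').
  const worker = new Worker('logsieve-worker.js');
  worker.onmessage = handleWorkerMessage; // defined later in this post
  worker.onerror = (err) => console.error('Worker failed:', err.message);
  return worker;
}
```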
How the Worker Communication Works
The main thread sends messages to the worker like this:
function sendToWorker(type, data, waitForResponse = false) {
  const id = Date.now() + Math.random();
  // Post to the Worker instance; self.postMessage would loop the
  // message back to the main thread itself
  worker.postMessage({ type, data, id });
  if (waitForResponse) {
    return new Promise((resolve) => {
      pendingRequests.set(id, resolve);
    });
  }
}
The worker receives the message, does work, and sends back results:
self.onmessage = function (e) {
  const { type, data, id } = e.data;
  try {
    let result;
    if (type === 'parse_log') {
      // Reset parser state only at the start of a new file, so
      // multi-line entries can span chunk boundaries
      if (data.isFirst) resetParserState();
      for (const chunk of data.chunks) {
        parseLogChunk(chunk, false, (progress, msg) => {
          self.postMessage({
            type: 'progress',
            data: { progress, message: msg }
          });
        });
      }
      // Flush the final buffered entry once the last chunk arrives
      if (data.isLast) parseLogChunk('', true);
      result = { rows, fieldNames: Array.from(fieldNames) };
    }
    self.postMessage({
      type: 'response',
      id,
      data: result
    });
  } catch (error) {
    self.postMessage({
      type: 'error',
      id,
      error: error.message
    });
  }
};
The main thread listens for responses and resolves the pending promise:
function handleWorkerMessage(e) {
  const { type, id, data, error } = e.data;
  // Errors come back with type 'error'; resolve those too,
  // otherwise the pending promise would never settle
  if ((type === 'response' || type === 'error') && id) {
    const resolve = pendingRequests.get(id);
    if (resolve) {
      resolve(error ? { error } : data);
      pendingRequests.delete(id);
    }
  }
  if (type === 'progress') {
    updateProgressBar(data.progress, data.message);
  }
}
This async request-response pattern keeps the UI responsive even during heavy operations.
Part 2: Parsing & Multi-Line Event Detection
The Challenge
Log files aren't always one-line-per-event. Python tracebacks, Java stack traces, and exception messages span multiple lines:
2025-11-13T10:30:10.789Z ERROR Failed to process request
Traceback (most recent call last):
File "/app/main.py", line 45, in process_request
result = compute(item)
File "/app/compute.py", line 12, in compute
return item['price'] * tax_rate
AttributeError: 'NoneType' object has no attribute 'price'
If you treat each line as a separate event, you lose context. The solution: detect continuation lines and merge them.
Chunked Parsing
For large files, LogSieve doesn't parse all at once. It reads in chunks to avoid memory spikes and show progress:
async function handleFile(file) {
  const CHUNK_SIZE = 1024 * 1024; // 1MB chunks
  let offset = 0;
  while (offset < file.size) {
    // Note: slicing by byte offset can split a multibyte UTF-8
    // character across chunks; the worker's line buffer hides this
    // for ASCII logs, but a streaming TextDecoder is the robust fix
    const chunk = file.slice(offset, offset + CHUNK_SIZE);
    const text = await readFileAsText(chunk);
    // Tell worker to parse this chunk
    const isFirst = offset === 0;
    const isLast = offset + CHUNK_SIZE >= file.size;
    const result = await sendToWorker('parse_log', {
      chunks: [text],
      isFirst,
      isLast
    }, true);
    if (result.error) {
      showError(result.error);
      return;
    }
    offset += CHUNK_SIZE;
  }
}
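The readFileAsText helper used above isn't shown elsewhere in this post; a typical implementation wraps the callback-based FileReader API in a promise so handleFile can await each chunk:

```javascript
// Promise wrapper around FileReader so chunk reads can be awaited.
function readFileAsText(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsText(blob);
  });
}
```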
In the worker, the parser maintains state across chunks:
let parserState = {
  buffer: '',          // Remainder from previous chunk
  currentEntry: null,  // Current multi-line entry being built
  id: 1,               // Next row ID
  totalBytes: 0
};
function parseLogChunk(chunk, isLast, progressCallback) {
  parserState.buffer += chunk;
  parserState.totalBytes += chunk.length;
  let lastNewlineIndex = parserState.buffer.lastIndexOf('\n');
  if (!isLast && lastNewlineIndex === -1) {
    // No complete line yet, wait for more data
    return;
  }
  let textToProcess;
  if (isLast) {
    textToProcess = parserState.buffer;
    parserState.buffer = '';
  } else {
    // Process up to the last complete line
    textToProcess = parserState.buffer.substring(0, lastNewlineIndex);
    parserState.buffer = parserState.buffer.substring(lastNewlineIndex + 1);
  }
  const lines = textToProcess.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (!line.trim()) continue;
    // Is this a continuation of the previous entry?
    if (parserState.currentEntry && isContinuationLine(line)) {
      // Append to the current entry
      parserState.currentEntry.raw += '\n' + line;
      parserState.currentEntry.message += '\n' + line;
      continue;
    }
    // Save the previous entry
    if (parserState.currentEntry) {
      rows.push(parserState.currentEntry);
    }
    // Start a new entry
    const ts = tryTs(line);
    const level = guessLevel(line);
    const msg = stripPrefix(line) || line;
    parserState.currentEntry = {
      id: parserState.id++,
      ts,
      level,
      message: msg,
      raw: line,
      fields: {},
      _lc: (line + " " + msg).toLowerCase()
    };
  }
  if (isLast && parserState.currentEntry) {
    rows.push(parserState.currentEntry);
    parserState.currentEntry = null;
  }
}
Detecting Continuation Lines
The parser uses heuristics to detect continuation lines:
function isContinuationLine(line) {
  const trimmed = line.trimStart();
  // Python traceback markers can appear at column 0, so check them
  // before requiring indentation
  if (trimmed.startsWith('File "') || trimmed.startsWith('Traceback')) return true;
  if (/^\s*(at\s|\w*(Error|Exception):)/.test(line)) return true;
  // Otherwise a continuation must start with whitespace
  if (trimmed === line) return false;
  // General rule: indented lines (leading spaces or a tab)
  if (line.startsWith(' ') || line.startsWith('\t')) return true;
  return false;
}
The result: multi-line exceptions are stored as single entries, preserving their full context:
{
  id: 42,
  ts: "2025-11-13T10:30:10.789Z",
  level: "ERROR",
  message: "Failed to process request\nTraceback (most recent call last):\n ...",
  raw: "2025-11-13T10:30:10.789Z ERROR Failed to process request\nTraceback...",
  fields: {},
  _lc: "2025-11-13t10:30:10.789z error failed to process request..."
}
Part 3: Filtering & Search
The Search Index
Every log entry has a _lc field—a lowercase copy of all searchable text:
row._lc = (row.raw + " " + row.message + " " + Object.values(row.fields).flat().join(" ")).toLowerCase();
Text search is dead simple:
if (searchQuery) {
  // Lowercase the query so terms match the precomputed _lc index
  const terms = searchQuery.toLowerCase().split(/\s+/).filter(Boolean);
  view = view.filter(row =>
    terms.every(term => row._lc.includes(term))
  );
}
This tokenized AND-search is fast because it's just substring matching. Searching across 100,000 log entries for "error database" takes milliseconds.
The Query Builder: Structured Filtering
But you often need more than text search. You need "ERROR level AND timestamp after 2026-01-01." That's where the Query Builder comes in.
The builder lets users create rules without touching any query syntax:
function createEmptyRule() {
  return {
    id: generateUUID(),
    field: 'level',
    operator: 'equals',
    value: '',
    logic: 'AND',
    enabled: true
  };
}
Each rule has a field, operator, value, and logic connector. The UI renders these as dropdowns:
<div class="builder-rule">
  <select class="rule-field">
    <option value="level">Log Level</option>
    <option value="ts">Timestamp</option>
    <option value="message">Message</option>
    <option value="field:userId">User ID</option>
  </select>
  <select class="rule-operator">
    <option value="equals">equals</option>
    <option value="contains">contains</option>
    <option value="after">after</option>
    <option value="before">before</option>
  </select>
  <input type="text" class="rule-value" placeholder="value" />
  <select class="rule-logic">
    <option value="AND">AND</option>
    <option value="OR">OR</option>
  </select>
</div>
When the user clicks "Apply", the rules are converted to a query object:
function rulesToQuery(rules) {
  if (rules.length === 0) return null;
  const groups = [];
  let currentGroup = [];
  for (const rule of rules) {
    if (!rule.enabled) continue;
    currentGroup.push({
      field: rule.field,
      operator: rule.operator,
      value: rule.value
    });
    // An OR connector after this rule closes the current AND group
    if (rule.logic === 'OR') {
      groups.push(currentGroup);
      currentGroup = [];
    }
  }
  if (currentGroup.length > 0) {
    groups.push(currentGroup);
  }
  return { type: 'builder', groups };
}
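The applyFilterConfig function referenced below isn't shown in this post, but evaluating the grouped structure is straightforward: groups are OR-ed together, and the rules inside each group are AND-ed. A minimal sketch, where matchesRule handles just two of the operators (the real operator set is larger):

```javascript
// Evaluate one rule against a row; only 'equals' and 'contains'
// are sketched here as an illustration.
function matchesRule(row, rule) {
  const val = rule.field.startsWith('field:')
    ? row.fields[rule.field.slice(6)]
    : row[rule.field];
  const str = String(val ?? '').toLowerCase();
  const target = String(rule.value).toLowerCase();
  if (rule.operator === 'equals') return str === target;
  if (rule.operator === 'contains') return str.includes(target);
  return false;
}

// OR across groups, AND within each group.
function matchesQuery(row, query) {
  return query.groups.some(group =>
    group.every(rule => matchesRule(row, rule))
  );
}
```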
Then this config is applied to the data:
function applyFilters(sortConfig) {
  let view = rows;
  // Apply builder rules
  if (appliedFilterConfig && appliedFilterConfig.rules.length > 0) {
    view = applyFilterConfig(view, appliedFilterConfig);
  }
  // Sort a copy so the master rows array keeps its original order
  const sort = sortConfig.field;
  const order = sortConfig.order;
  view = view.slice().sort((a, b) => {
    const valA = sort.startsWith('field:') ? a.fields[sort.slice(6)] : a[sort];
    const valB = sort.startsWith('field:') ? b.fields[sort.slice(6)] : b[sort];
    if (valA === valB) return 0;
    return order === 'asc'
      ? (valA < valB ? -1 : 1)
      : (valA > valB ? -1 : 1);
  });
  return view;
}
Advanced Query Syntax
For power users, LogSieve also supports a compact text syntax:
level:ERROR AND (status:500 OR status:503) AND timestamp:>2025-01-13
This is parsed into a tree and applied the same way. The parser handles:
- Field search: level:ERROR
- Boolean operators: AND, OR, NOT
- Grouping: (level:ERROR OR level:WARN) AND app:backend
- Wildcards: user:admin*
- Regex: message:/error \d+/
- Existence checks: has:field (field is not empty) or missing:field (field is empty)
- Comparisons: latency>100, code!=200
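As one example of how these terms break down, a comparison like latency>100 can be split with a single regex. This sketch handles only the comparison operators, not the full grammar, and the function names are illustrative rather than LogSieve's actual parser internals:

```javascript
// Parse a comparison term (latency>100, code!=200) into
// { field, op, value }. Returns null if the term isn't a comparison.
function parseComparison(term) {
  const m = /^(\w+)(>=|<=|!=|>|<|=)(.+)$/.exec(term);
  if (!m) return null;
  return { field: m[1], op: m[2], value: m[3] };
}

// Evaluate a parsed comparison numerically against a row.
function evalComparison(cmp, row) {
  const raw = row.fields[cmp.field] ?? row[cmp.field];
  const a = parseFloat(raw);
  const b = parseFloat(cmp.value);
  switch (cmp.op) {
    case '>':  return a > b;
    case '<':  return a < b;
    case '>=': return a >= b;
    case '<=': return a <= b;
    case '=':  return a === b;
    case '!=': return a !== b;
  }
}
```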
Part 4: Field Extraction & Dynamic Columns
Named-Group Regex
One of LogSieve's most powerful features is automatic field extraction using named regex groups. If you have Apache access logs like:
192.168.1.1 - user [13/Nov/2025:10:30:10 +0000] "GET /api/users HTTP/1.1" 200 1234
You can create an extractor with this pattern:
^(?<ip>\S+) - (?<user>\S+) \[(?<ts>[^\]]+)\] "(?<method>\w+) (?<path>\S+) HTTP/[\d.]+" (?<status>\d+) (?<bytes>\d+)
The extraction logic (in shared.js) uses JavaScript's matchAll:
function runSingleExtractor(lines, extractor) {
  const pattern = new RegExp(extractor.pattern, 'g');
  for (const row of lines) {
    const matches = [...row.raw.matchAll(pattern)];
    if (matches.length === 0) continue;
    for (const match of matches) {
      if (!match.groups) continue;
      for (const [groupName, groupValue] of Object.entries(match.groups)) {
        // Optional groups that didn't participate come back undefined
        if (groupValue === undefined) continue;
        if (groupName === 'ts') {
          // Override detected timestamp if extractor provides one
          row.ts = parseTimestampToISO(groupValue);
        } else if (groupName === 'level') {
          row.level = groupValue.toUpperCase();
        } else if (groupName === 'message') {
          row.message = groupValue;
        } else {
          // Store as extracted field
          if (!row.fields[groupName]) {
            row.fields[groupName] = [];
          }
          row.fields[groupName].push(groupValue);
        }
      }
    }
  }
  // Flatten single-item arrays to scalars
  for (const row of lines) {
    for (const [key, value] of Object.entries(row.fields)) {
      if (Array.isArray(value) && value.length === 1) {
        row.fields[key] = value[0];
      }
    }
  }
}
After extraction, the parsed fields appear as columns in the results table:
| ip | user | ts | method | path | status | bytes |
|---|---|---|---|---|---|---|
| 192.168.1.1 | user | 2025-11-13T10:30:10Z | GET | /api/users | 200 | 1234 |
Column Visibility & Reordering
As extractors add fields, the UI gets cluttered. That's why LogSieve lets you show/hide and reorder columns:
let visibleColumns = new Set(); // Which columns the user wants to see
let columnOrder = [];           // Order to display them

function isColumnVisible(colKey) {
  // If no visibility preferences are set, show all columns
  if (visibleColumns.size === 0) return true;
  return visibleColumns.has(colKey);
}

function setColumnVisibility(colKey, visible) {
  if (visible) {
    visibleColumns.add(colKey);
  } else {
    visibleColumns.delete(colKey);
  }
  saveVisibleColumnsToPrefs();
}
function renderColumnsPanel() {
  const cols = ['id', 'ts', 'level', 'message', 'raw', ...Array.from(fieldNames)];
  // Fall back to the default list if the user hasn't reordered yet
  const ordered = columnOrder.length > 0 ? columnOrder : cols;
  let html = '<div class="columns-list">';
  for (const col of ordered) {
    const visible = isColumnVisible(col);
    html += `
      <div class="column-row" draggable="true" data-col="${col}">
        <span class="drag-handle">≡</span>
        <input type="checkbox" ${visible ? 'checked' : ''}
               onchange="setColumnVisibility('${col}', this.checked)">
        <span>${col}</span>
      </div>
    `;
  }
  html += '</div>';
  document.getElementById('columnsPanel').innerHTML = html;
  // Add drag handlers
  enableColumnDragAndDrop();
}
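The enableColumnDragAndDrop call above isn't shown in this post. Its core is just an array move plus the HTML5 drag events; here's a sketch with the reorder logic separated out so it's easy to test, where the event wiring assumes the .column-row markup rendered above:

```javascript
// Move the item at `from` to position `to`, returning a new array.
function moveColumn(order, from, to) {
  const next = order.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// Wire up drag events on the rendered .column-row elements.
function enableColumnDragAndDrop() {
  let dragged = null;
  document.querySelectorAll('.column-row').forEach(el => {
    el.addEventListener('dragstart', () => { dragged = el.dataset.col; });
    el.addEventListener('dragover', e => e.preventDefault()); // allow drop
    el.addEventListener('drop', () => {
      const from = columnOrder.indexOf(dragged);
      const to = columnOrder.indexOf(el.dataset.col);
      if (from === -1 || to === -1 || from === to) return;
      columnOrder = moveColumn(columnOrder, from, to);
      Storage.saveColumnOrder(columnOrder);
      renderColumnsPanel(); // re-render in the new order
    });
  });
}
```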
Preferences are saved to localStorage so your column settings persist across sessions:
const Storage = {
  KEYS: {
    VISIBLE_COLUMNS: 'logsieve-visible-columns',
    COLUMN_ORDER: 'logsieve-column-order'
  },
  saveVisibleColumns(cols) {
    localStorage.setItem(this.KEYS.VISIBLE_COLUMNS, JSON.stringify(Array.from(cols)));
  },
  getVisibleColumns() {
    const stored = localStorage.getItem(this.KEYS.VISIBLE_COLUMNS);
    return stored ? new Set(JSON.parse(stored)) : new Set();
  },
  saveColumnOrder(order) {
    localStorage.setItem(this.KEYS.COLUMN_ORDER, JSON.stringify(order));
  },
  getColumnOrder() {
    const stored = localStorage.getItem(this.KEYS.COLUMN_ORDER);
    return stored ? JSON.parse(stored) : [];
  }
};
Part 5: Summary Statistics
The Problem
When you have 100,000 log entries with 20 extracted fields, how do you understand what you're looking at? That's where Summary Statistics comes in.
The statistics panel shows:
- Detected field types (text, number, date, array)
- Value distributions (top 10 most common values)
- Basic stats (min, max, avg for numeric fields; cardinality for text)
Computing Statistics in the Background
Statistics computation is expensive, but you can't block the UI while crunching numbers on 100k rows. So it runs in the worker:
function computeSummaryStats(view, fieldRegistry) {
  const allFields = ['id', 'ts', 'level', 'message', 'raw', ...Array.from(fieldNames)];
  const result = {};
  for (const field of allFields) {
    const fieldStats = computeFieldStats(view, field, fieldRegistry.get(field));
    result[field] = fieldStats;
  }
  return result;
}
function computeFieldStats(view, fieldName, fieldMeta) {
  const values = [];
  // Extract all values for this field
  for (const row of view) {
    let val;
    if (['id', 'ts', 'level', 'message', 'raw'].includes(fieldName)) {
      val = row[fieldName];
    } else {
      val = row.fields[fieldName];
    }
    if (val !== undefined && val !== null && val !== '') {
      values.push(val);
    }
  }
  const withValue = values.length;
  const withoutValue = view.length - withValue;
  // Count unique values
  const uniqueMap = new Map();
  for (const val of values) {
    let key = val;
    if (Array.isArray(val)) {
      key = JSON.stringify(val);
    }
    uniqueMap.set(key, (uniqueMap.get(key) || 0) + 1);
  }
  const unique = uniqueMap.size;
  const type = fieldMeta?.type || 'unknown';
  const stats = {
    type,
    withValue,
    withoutValue,
    unique
  };
  // Compute numeric stats in a single pass; spreading a huge array
  // into Math.min/Math.max can exceed the engine's argument limit
  if (type === 'numeric') {
    const nums = values.map(v => parseFloat(v)).filter(v => !isNaN(v));
    if (nums.length > 0) {
      let min = nums[0], max = nums[0], sum = 0;
      for (const n of nums) {
        if (n < min) min = n;
        if (n > max) max = n;
        sum += n;
      }
      stats.min = min;
      stats.max = max;
      stats.avg = sum / nums.length;
    }
  }
  // Top 10 most common values
  if (type !== 'array' && type !== 'object') {
    stats.topValues = [...uniqueMap.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, 10)
      .map(([val, count]) => ({ value: val, count }));
  }
  return stats;
}
The main thread requests these stats when the user expands the summary panel:
const summaryDetails = document.getElementById('summary-details');
summaryDetails.addEventListener('toggle', () => {
  if (summaryDetails.open) {
    sendToWorker('compute_summary_stats', {
      fieldRegistry: Array.from(fieldRegistry.entries())
    }, true).then(stats => {
      renderStatsFromWorker(stats);
    });
  }
});
Part 6: Real-World Use Cases
Use Case 1: Debugging a Python Application
You have a production Python app that crashed. The logs are 50MB of mixed INFO, WARNING, and ERROR entries. You need to find the error, understand the traceback, and see what was happening before.
1. Drag the log file into LogSieve
2. Filter: level:ERROR
3. LogSieve automatically groups the 7-line Python traceback as a single entry
4. Click the entry to see the full stack trace
5. Use the Query Builder to narrow to errors between 10:00-10:30 AM
6. Use the Advanced Query to search: error AND (database OR connection)
7. Extract user_id and request_id with regex
8. Pivot to see which users/requests had errors most frequently
Use Case 2: Analyzing Apache Access Logs
You have 500,000 Apache access log lines and you want to understand traffic patterns:
1. Import sample-extractors.json to get the Apache extractor
2. Run the extractor to pull out: ip, user, timestamp, method, path, status, bytes
3. Use the Columns panel to hide raw and keep only extracted fields
4. Sort by status code to find 5xx errors
5. Summary Statistics shows:
- status distribution (mostly 200, some 301, 15 404s, 2 500s)
- top 10 paths by frequency
- average bytes served
- 87 unique IP addresses
6. Click a 500 error, copy the path, use Advanced Query: status:500 AND path:exact_path
Use Case 3: Ad-hoc CSV Data Exploration
You downloaded a CSV export of database records and want to spot-check it:
1. Drag the CSV into LogSieve
2. LogSieve auto-parses columns
3. Summary Statistics shows:
- 10,234 rows
- user_id: 50 unique values, min 1, max 502
- created_at: dates range from 2025-01-01 to 2025-01-13
- status: text values = ['active', 'inactive', 'pending']
4. Filter: status:active AND created_at:>2025-01-10
5. Find the 47 rows you need, export as JSON for further processing
Part 7: Storage & State Management
LocalStorage for Persistence
LogSieve saves everything locally so your work isn't lost when you close the browser:
const Storage = {
  KEYS: {
    EXTRACTORS: 'logsieve-extractors',
    FILTERS: 'logsieve-filters',
    ACTIVE_EXTRACTORS: 'logsieve-active-extractors',
    VISIBLE_COLUMNS: 'logsieve-visible-columns',
    COLUMN_ORDER: 'logsieve-column-order',
    THEME: 'logsieve-theme'
  },
  saveExtractors(extractors) {
    localStorage.setItem(
      this.KEYS.EXTRACTORS,
      JSON.stringify(extractors)
    );
  },
  getExtractors() {
    const stored = localStorage.getItem(this.KEYS.EXTRACTORS);
    return stored ? JSON.parse(stored) : [];
  },
  saveSavedFilters(filters) {
    localStorage.setItem(
      this.KEYS.FILTERS,
      JSON.stringify(filters)
    );
  }
  // ... etc
};
This means you can save 10 custom extractors for your log format, come back tomorrow, and they're still there.
Export & Import
You can also export your extractor library and filters as JSON, share them with teammates:
function exportLibrary() {
  const library = {
    extractors: Storage.getExtractors(),
    filters: Storage.getSavedFilters(),
    exportedAt: new Date().toISOString()
  };
  const json = JSON.stringify(library, null, 2);
  const blob = new Blob([json], { type: 'application/json' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = `logsieve-library-${Date.now()}.json`;
  a.click();
  URL.revokeObjectURL(url);
}
async function importLibrary(file) {
  const text = await readFileAsText(file);
  const library = JSON.parse(text);
  // Guard against partial exports that omit one of the arrays
  Storage.saveExtractors([
    ...Storage.getExtractors(),
    ...(library.extractors || [])
  ]);
  Storage.saveSavedFilters([
    ...Storage.getSavedFilters(),
    ...(library.filters || [])
  ]);
  location.reload();
}
Part 8: Performance Optimizations
Pagination
With 100,000+ log entries, rendering them all at once is a non-starter. LogSieve uses pagination:
const per = 50; // items per page
let page = 1;   // mutated by previousPage() / nextPage()

function paginate(list) {
  const start = (page - 1) * per;
  return list.slice(start, start + per);
}
function renderPage(pageData) {
  // columnOrder is initialized to the full column list at load time;
  // filtering it respects both ordering and visibility (an empty
  // visibleColumns set means "show all")
  const cols = columnOrder.filter(isColumnVisible);
  let html = '<table><thead>...';
  for (const row of pageData) {
    html += '<tr>';
    for (const col of cols) {
      html += `<td>${escapeHtml(getFieldValue(row, col))}</td>`;
    }
    html += '</tr>';
  }
  html += '</table>';
  // Pagination controls
  html += `
    <div class="pagination">
      <button onclick="previousPage()">← Previous</button>
      <span>Page ${page} of ${Math.ceil(totalRows / per)}</span>
      <button onclick="nextPage()">Next →</button>
    </div>
  `;
  document.getElementById('results').innerHTML = html;
}
Lazy Evaluation
The filtered view is recomputed only when a filter, query, or sort setting actually changes. Paging through results reuses the cached view, and only the current page of rows is ever rendered to the DOM.
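One way to sketch this kind of caching (the variable and function names here are illustrative, not LogSieve's actual internals):

```javascript
// Recompute the filtered view only when the filter inputs change;
// paging reuses the cached result. `computeView` stands in for the
// real filter/sort pipeline.
let cachedView = null;
let cachedKey = null;

function getView(rows, filters, computeView) {
  const key = JSON.stringify(filters);
  if (cachedView === null || key !== cachedKey) {
    cachedView = computeView(rows, filters);
    cachedKey = key;
  }
  return cachedView;
}
```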
Worker Thread Isolation
All blocking operations (parsing, filtering, sorting) run in the worker. The main thread stays responsive for UI interactions.
Part 9: The Tech Stack (Or Lack Thereof)
Here's what's not in LogSieve:
- No React, Vue, or Angular
- No npm packages or dependencies
- No build step
- No TypeScript compilation
- No webpack or bundler
- No API client library
Here's what is:
- Vanilla JavaScript (modern features like destructuring, arrow functions, template literals, Promises, and async/await)
- Web Workers API (for threading)
- FileReader API (for file uploads)
- localStorage API (for persistence)
- Regex (for parsing and extraction)
- CSS Variables & Grid (for responsive design)
This is intentional. Zero dependencies means:
- Fewer security vulnerabilities
- No dependency hell or version conflicts
- Smaller codebase (easier to audit, maintain, fork)
- Faster load times
- Works on older browsers (with graceful degradation)
Part 10: Lessons Learned
Building LogSieve taught me several things:
1. Client-side computing is viable, with caveats
Web Workers let you do serious processing without freezing the UI. The trick is breaking work into chunks and giving feedback via progress messages.
2. The browser is a surprisingly capable OS
FileReader, localStorage, Web Workers, IndexedDB, ServiceWorkers—browsers have gotten powerful. You can do a lot without a backend.
3. Heuristics matter
Multi-line event detection isn't perfect, but it's good enough 95% of the time.
4. State management is hard, even in the browser
Keeping the UI in sync with the worker thread's data, handling undo/redo, dealing with stale filters after extractors run...these problems aren't unique to backend development.
5. UX is the bottleneck, not performance
A 100,000-line filter takes 200ms. The user doesn't notice. But if the UI doesn't respond for 500ms, they think it's broken. Responsiveness beats speed.
Conclusion
LogSieve exists because I got frustrated with command-line tools. It grew because the problems I solved—multi-line events, field extraction, dynamic filtering—are real problems that others face too.
The architecture is a careful balance: use Web Workers to keep the browser responsive, break large operations into chunks with progress feedback, and leverage browser APIs (FileReader, localStorage, matchAll) to do things that would normally require a server.
If you're analyzing logs, give it a try. If you're building tools or evaluating engineering work, I hope this deep dive showed you what's possible with vanilla JavaScript and a clear architectural vision.
Get Started
Try it out: LogSieve Online
Want to contribute? Fork the repository on GitHub or browse the issue tracker to find areas you can help with.
Looking for similar engineering services? I build tools like this. Things that solve real problems, run efficiently, and just work. Reach out if you need help with your project.
Last updated: January 2026
We build software the same way we write about it: Robust. Tested. Correct.
At McIndi Solutions, we specialize in mission-critical modernization and high-security platforms for healthcare and finance. Whether you need a fractional CTO to guide your architecture or a senior engineering team to unblock a complex automation challenge, we are available for advisory and hands-on engagements.
Email us at sales@mcindi.com to discuss your project.