GoSQLX Architecture Documentation
Version: v1.10.0 | Last Updated: 2026-03-13
Table of Contents
- System Overview
- Package Structure
- Component Architecture
- Token Type System (ARCH-002)
- Data Flow
- Memory Management
- Concurrency Model
- Design Patterns
- Performance Architecture
- LSP Architecture
- Linter Architecture
- Security Scanner Architecture
- MCP Architecture
System Overview
GoSQLX is a production-ready, high-performance SQL parsing library with comprehensive dialect support, security scanning, linting capabilities, and full Language Server Protocol (LSP) integration for IDE tooling.
Core Design Principles
- Zero-Copy Operations: Minimize memory allocations by working directly with byte slices
- Object Pooling: Reuse expensive objects through sync.Pool (60-80% memory reduction)
- Immutable Tokens: Tokens are immutable once created - safe for concurrent access
- Stateless Parsing: Parser maintains no global state - enables horizontal scaling
- Unicode-First: Full UTF-8 support throughout (8 languages validated)
- Multi-Dialect: Support for PostgreSQL, MySQL, SQL Server, Oracle, SQLite
- Type-Safe Dispatch: O(1) integer-based token type comparisons (14x faster)
- Lock-Free Metrics: Atomic counters for production observability
High-Level Architecture (v1.6.0)
┌────────────────────────────────────────────────────────────────────────────────┐
│ Application Layer & Tools │
│ ┌─────────────┬──────────────┬──────────────┬─────────────┬───────────────┐ │
│ │ CLI Tool │ LSP Server │ Linter │ Security │ MCP Server │ │
│ │ (validate, │ (JSON-RPC │ (10 rules: │ Scanner │ (7 tools, │ │
│ │ format, │ handler, │ L001-L010, │ (injection │ HTTP, │ │
│ │ analyze, │ rate limit, │ whitespace,│ detection)│ bearer auth,│ │
│ │ parse) │ doc mgmt) │ style) │ │ streaming) │ │
│ └─────────────┴──────────────┴──────────────┴─────────────┴───────────────┘ │
└────────────────────────────────────────────────────────────────────────────────┘
▼
┌───────────────────────────────────────────────────────────────────┐
│ GoSQLX API (pkg/gosqlx) │
│ High-level API providing SQL parsing, validation, formatting │
└───────────────────────────────────────────────────────────────────┘
▼
┌──────────────┬────────────────────────┬──────────────────────────┐
│ Tokenizer │ Parser │ AST │
│ (zero-copy, │ (recursive descent, │ (14 statement types, │
│ 120+ token │ 14 statement types, │ pooled nodes, │
│ types, │ PostgreSQL features, │ visitor pattern) │
│ dialect │ window functions) │ │
│ support) │ │ │
├──────────────┼────────────────────────┼──────────────────────────┤
│ Object Pool │ Token Stream │ Node Factory │
│ (tokenizer, │ (position tracking, │ (statement/expression │
│ parser, │ conversion layer) │ pooling) │
│ AST pools) │ │ │
└──────────────┴────────────────────────┴──────────────────────────┘
▼
┌───────────────────────────────────────────────────────────────────┐
│ Core Models, Error Handling, Metrics & Keywords │
│ ┌──────────┬──────────┬──────────┬──────────┬─────────────────┐ │
│ │ Models │ Errors │ Metrics │ Keywords │ Configuration │ │
│ │ (Token, │ (codes, │ (atomic │ (5 SQL │ (YAML-based, │ │
│ │ Span, │ pos. │ counters│ dialects│ linter, format │ │
│ │ Loc.) │ track) │ lock- │ 120+ │ security opts) │ │
│ │ 100% │ 91.9% │ free) │ kwds) │ 81.8% coverage │ │
│ │ coverage│ coverage│ 73.9% │ 100% │ │ │
│ └──────────┴──────────┴──────────┴──────────┴─────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
Package Structure
The codebase is organized into focused packages with clear responsibilities and high test coverage:
Core Packages (Foundation Layer)
- pkg/models (100% coverage): Core data structures
- Token, TokenType with 120+ types and helper methods (IsKeyword, IsOperator, etc.)
- Span, Location for position tracking
- Whitespace types (space, newline, tab, comments)
- O(1) token type categorization
- pkg/errors (91.9% coverage): Structured error handling
- Error codes (PARSE-001 through PARSE-010)
- Position tracking with line/column information
- Context preservation for debugging
- Integration with LSP diagnostics
- pkg/metrics (73.9% coverage): Performance monitoring
- Atomic counters (lock-free, race-free)
- Pool hit rate tracking
- Query/token/byte counters
- Production observability integration
- pkg/config (81.8% coverage): Configuration management
- YAML-based configuration (.gosqlx.yml)
- Format options (indent, keyword case, line length)
- Linter rule configuration
- Security scanning options
- Multi-dialect settings
SQL Processing (Core Engine)
- pkg/sql/tokenizer (75.3% coverage): Zero-copy SQL lexer
- 120+ token types with range-based categorization
- Multi-dialect keyword recognition (PostgreSQL, MySQL, SQL Server, Oracle, SQLite)
- Unicode-aware identifier processing
- Position tracking (line, column, byte offset)
- Object pooling for performance
- pkg/sql/parser (76.1% coverage): Recursive descent parser
- Modular architecture (9 files: parser.go, select.go, dml.go, cte.go, expressions.go, window.go, grouping.go, alter.go, ddl.go)
- 14 SQL statement types
- PostgreSQL-specific features (LATERAL, DISTINCT ON, FILTER, JSON operators)
- Window functions (OVER, PARTITION BY, frame clauses)
- CTEs and set operations (UNION, EXCEPT, INTERSECT)
- GROUPING SETS, ROLLUP, CUBE support
- Max recursion depth protection (100 levels)
- pkg/sql/ast (80.3% coverage): Abstract Syntax Tree
- 14 statement types, 20+ expression types
- Visitor pattern for tree traversal
- Object pooling for nodes
- Immutable design for concurrent access
- pkg/sql/token (68.8% coverage): Token type definitions
- Internal token representation for parser
- Token conversion layer (models.Token → token.Token)
- pkg/sql/keywords (100% coverage): SQL keyword categorization
- 5 SQL dialect support
- 120+ keywords organized by category
- O(1) keyword lookup via maps
- Reserved/non-reserved classification
- pkg/sql/security (90.2% coverage): SQL injection detection
- 8 pattern types (tautologies, comment bypasses, UNION-based, stacked queries, time-based, out-of-band, dangerous functions, boolean-based)
- 4 severity levels (CRITICAL, HIGH, MEDIUM, LOW)
- Pre-compiled regex patterns (sync.Once initialization)
- AST-based analysis
- Integration with CLI analyze command
- pkg/sql/monitor (98.6% coverage): Query monitoring
- Query pattern tracking
- Duration metrics
- Error rate calculation
- Production performance analysis
Tools & Integration (Application Layer)
- pkg/gosqlx (65.6% coverage): Main API surface
- High-level API for parsing, validation, formatting
- Convenience wrappers around core components
- Error handling and recovery
- pkg/lsp (70.2% coverage): Language Server Protocol implementation
- Architecture: Server → Handler → Documents
- server.go: JSON-RPC 2.0 message handling over stdio
- handler.go: Request/notification dispatcher (8 methods: initialize, hover, completion, formatting, documentSymbol, signatureHelp, codeAction, shutdown)
- documents.go: Document manager with incremental sync
- protocol.go: LSP type definitions (requests, responses, diagnostics)
- Features: Rate limiting (100 req/sec), content limits (10MB messages, 5MB documents), UTF-8 safe position handling
- Integration: Used by VSCode extension and other LSP clients
- pkg/mcp: MCP server — Config, BearerAuthMiddleware, 7 tool handlers, Server (thin adapter over pkg/gosqlx, pkg/linter, pkg/sql/security)
- pkg/linter (96.7% coverage): SQL linting and style checking
- Architecture: Linter → Rules → Context
- linter.go: Linting engine with file/directory support
- rule.go: Rule interface (Check, Fix, CanAutoFix methods)
- context.go: Linting context with tokens, AST, and SQL content
- 10 Built-in Rules (L001-L010):
- Whitespace (5 rules): L001 (trailing), L002 (mixed indentation), L003 (long lines), L004 (consecutive blank lines), L005 (redundant whitespace)
- Keywords (1 rule): L006 (keyword case consistency)
- Style (4 rules): L007 (comma placement), L008 (aliasing consistency), L009 (column alignment), L010 (indentation depth)
- Auto-fix Support: 7/10 rules support automatic fixes
- Integration: Used by CLI lint command and LSP code actions
- pkg/compatibility: Compatibility layer for API evolution
Command-Line Interface
- cmd/gosqlx: Production-ready CLI tool
- Commands: validate, format, analyze, parse, lsp, lint, config
- Multi-file and directory support
- Exit codes for CI/CD integration
- Progress indicators and colored output
- Configuration file support (.gosqlx.yml)
Supported SQL Statements (14 types)
DML (6): SELECT, INSERT, UPDATE, DELETE, MERGE, TRUNCATE
DDL (8): CREATE TABLE, CREATE VIEW, CREATE MATERIALIZED VIEW, CREATE INDEX, ALTER TABLE, ALTER (generic), DROP, REFRESH MATERIALIZED VIEW
Query Composition (2): WITH (CTEs), Set Operations (UNION/EXCEPT/INTERSECT)
Token Type System (ARCH-002)
Design Decision: v1.6.0 introduces a comprehensive token type system with 120+ distinct token types for O(1) categorization and 14x performance improvement over string comparisons.
Token Type Architecture
// TokenType represents the type of a SQL token
type TokenType int
// Range-based categorization for O(1) type checking
const (
// Token ranges for O(1) category dispatch (data types occupy a subrange of the keyword range)
TokenRangeBasicStart TokenType = 10 // Basic tokens
TokenRangeBasicEnd TokenType = 30
TokenRangeStringStart TokenType = 30 // String literals
TokenRangeStringEnd TokenType = 50
TokenRangeOperatorStart TokenType = 50 // Operators
TokenRangeOperatorEnd TokenType = 150
TokenRangeKeywordStart TokenType = 200 // SQL keywords
TokenRangeKeywordEnd TokenType = 500
TokenRangeDataTypeStart TokenType = 430 // Data types
TokenRangeDataTypeEnd TokenType = 450
)
Token Categories (120+ types)
- Special Tokens (2)
- EOF, Unknown
- Basic Tokens (6)
- Word, Number, Char, Whitespace, Identifier, Placeholder
- String Literals (10)
- SingleQuotedString, DoubleQuotedString, TripleSingleQuoted, TripleDoubleQuoted
- DollarQuotedString, ByteString, NationalString, EscapedString, UnicodeString, HexString
- Operators (75)
- Arithmetic: Plus, Minus, Mul, Div, Mod, DuckIntDiv
- Comparison: Eq, DoubleEq, Neq, Lt, Gt, LtEq, GtEq, Spaceship
- Logical: And, Or, Not
- Bitwise: Ampersand, Pipe, Caret, ShiftLeft, ShiftRight
- JSON/JSONB (PostgreSQL): Arrow (->), LongArrow (->>), HashArrow (#>), HashLongArrow (#>>), AtArrow (@>), ArrowAt (<@), QuestionPipe (?|), QuestionAnd (?&)
- String: StringConcat (||)
- Punctuation: Comma, Period, Colon, DoubleColon, Semicolon, LParen, RParen, LBracket, RBracket, LBrace, RBrace
- SQL Keywords (100+)
- DML: SELECT, INSERT, UPDATE, DELETE, FROM, WHERE, JOIN, GROUP BY, ORDER BY
- DDL: CREATE, ALTER, DROP, TABLE, INDEX, VIEW, COLUMN, DATABASE
- CTE/Set Ops: WITH, RECURSIVE, UNION, EXCEPT, INTERSECT, ALL
- Window Functions: OVER, PARTITION, ROWS, RANGE, UNBOUNDED, PRECEDING, FOLLOWING, CURRENT, FILTER
- Joins: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, LATERAL, USING
- Constraints: PRIMARY, KEY, FOREIGN, REFERENCES, UNIQUE, CHECK, DEFAULT
- Aggregates: COUNT, SUM, AVG, MIN, MAX
- Data Types: INT, VARCHAR, TEXT, TIMESTAMP, BOOLEAN, JSON, JSONB
Helper Methods (O(1) categorization)
// Fast token type classification (14x faster than string comparisons)
func (t TokenType) IsKeyword() bool {
return t >= TokenRangeKeywordStart && t < TokenRangeKeywordEnd
}
func (t TokenType) IsOperator() bool {
return t >= TokenRangeOperatorStart && t < TokenRangeOperatorEnd
}
func (t TokenType) IsLiteral() bool {
return (t >= TokenRangeStringStart && t < TokenRangeStringEnd) ||
t == TokenTypeNumber
}
func (t TokenType) IsDMLKeyword() bool {
return t == TokenTypeSelect || t == TokenTypeInsert ||
t == TokenTypeUpdate || t == TokenTypeDelete
}
func (t TokenType) IsDDLKeyword() bool {
return t == TokenTypeCreate || t == TokenTypeAlter || t == TokenTypeDrop
}
func (t TokenType) IsJoinKeyword() bool {
return t >= TokenTypeJoin && t <= TokenTypeUsing
}
func (t TokenType) IsWindowKeyword() bool {
return t >= TokenTypeOver && t <= TokenTypeExclude
}
func (t TokenType) IsAggregateFunction() bool {
return t >= TokenTypeCount && t <= TokenTypeMax
}
func (t TokenType) IsDataType() bool {
return t >= TokenRangeDataTypeStart && t < TokenRangeDataTypeEnd
}
func (t TokenType) IsConstraint() bool {
return t >= TokenTypePrimary && t <= TokenTypeNullable
}
func (t TokenType) IsSetOperation() bool {
return t == TokenTypeUnion || t == TokenTypeExcept || t == TokenTypeIntersect
}
Performance Benefits
- 14x Faster Type Checking: Integer comparisons vs string matching
- O(1) Categorization: Range checks for all categories
- Jump Table Optimization: Compiler optimizes switch statements on integers
- Cache Friendly: Integer comparisons have better cache locality
- Type Safety: Compile-time type checking prevents errors
PostgreSQL Extension Tokens (v1.6.0)
// JSON/JSONB operators
TokenTypeArrow // -> (field access, returns JSON)
TokenTypeLongArrow // ->> (field access, returns text)
TokenTypeHashArrow // #> (path access, returns JSON)
TokenTypeHashLongArrow // #>> (path access, returns text)
TokenTypeAtArrow // @> (contains)
TokenTypeArrowAt // <@ (contained by)
TokenTypeHashMinus // #- (delete at path)
TokenTypeQuestionPipe // ?| (key exists any)
TokenTypeQuestionAnd // ?& (key exists all)
// Keywords
TokenTypeLateral // LATERAL (correlated subquery in FROM)
TokenTypeFilter // FILTER (conditional aggregation)
TokenTypeDistinct // DISTINCT (with ON support)
Component Architecture
Tokenizer Component
The tokenizer is responsible for lexical analysis, converting raw SQL bytes into tokens.
┌──────────────────────────────────────┐
│ Tokenizer │
├──────────────────────────────────────┤
│ ┌──────────────────────────────┐ │
│ │ Input Buffer ([]byte) │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Scanner State Machine │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Token Recognition │ │
│ │ - Keywords │ │
│ │ - Identifiers │ │
│ │ - Literals │ │
│ │ - Operators │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Position Tracking │ │
│ │ - Line/Column │ │
│ │ - Byte Offset │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Token Stream Output │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────┘
Key Components:
- Scanner State Machine
- Handles character-by-character processing
- Manages state transitions
- Optimized fast paths for common tokens
- Token Recognition Engine
- Map-based keyword lookup (O(1))
- Unicode-aware identifier processing
- Multi-dialect operator support
- Position Tracker
- Maintains current position (line, column, byte offset)
- Handles newline detection
- Provides error location information
Parser Component
The parser builds Abstract Syntax Trees from token streams.
┌──────────────────────────────────────┐
│ Parser │
├──────────────────────────────────────┤
│ ┌──────────────────────────────┐ │
│ │ Token Stream Input │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Recursive Descent Parser │ │
│ │ - Statement Parser │ │
│ │ - Expression Parser │ │
│ │ - Clause Parser │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ AST Node Construction │ │
│ │ - Node Factory │ │
│ │ - Type Validation │ │
│ └──────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ AST Output │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────┘
Parser Modular Architecture:
The parser is organized into focused modules for maintainability:
pkg/sql/parser/
├── parser.go # Core parser, entry points, and utilities
├── select.go # SELECT statement parsing
├── dml.go # INSERT, UPDATE, DELETE, MERGE parsing
├── cte.go # Common Table Expressions (WITH clause)
├── expressions.go # Expression parsing (BETWEEN, IN, LIKE, etc.)
├── window.go # Window functions (OVER, PARTITION BY)
├── grouping.go # GROUPING SETS, ROLLUP, CUBE
├── alter.go # ALTER TABLE statements
├── ddl.go # DDL statements (CREATE, DROP, REFRESH, TRUNCATE)
└── token_conversion.go # Internal token conversion (unexported)
Statement Parsing Methods:
The parser supports 14 SQL statement types via these entry points:
parseStatement() # Fast ModelType (int) dispatch with O(1) switching
├── parseWithStatement() # WITH (CTEs)
├── parseSelectWithSetOperations() # SELECT + UNION/EXCEPT/INTERSECT
├── parseInsertStatement() # INSERT (with RETURNING)
├── parseUpdateStatement() # UPDATE (with RETURNING)
├── parseDeleteStatement() # DELETE (with RETURNING)
├── parseMergeStatement() # MERGE
├── parseCreateStatement() # CREATE (TABLE, VIEW, MATERIALIZED VIEW, INDEX)
├── parseAlterTableStmt() # ALTER TABLE
├── parseDropStatement() # DROP
├── parseRefreshStatement() # REFRESH MATERIALIZED VIEW
└── parseTruncateStatement() # TRUNCATE
PostgreSQL-Specific Features (v1.6.0):
parseSelectExpression()
├── parseDistinctOnClause() # DISTINCT ON (col1, col2, ...)
└── parseLateralTableReference() # LATERAL (subquery)
parseFunctionCall()
└── parseAggregateOrderBy() # ORDER BY inside aggregates
├── STRING_AGG(expr, delim ORDER BY col)
├── ARRAY_AGG(expr ORDER BY col)
└── JSON_AGG(expr ORDER BY col)
parseExpression()
├── parseJSONOperator() # ->, ->>, #>, #>>, @>, <@, #-, ?, ?|, ?&
└── parseFilterClause() # FILTER (WHERE condition)
parseReturningClause() # RETURNING * | col1, col2 | expr AS alias
AST Component
The Abstract Syntax Tree provides structured representation of SQL statements.
Core Interfaces:
- Node: Base interface for all AST nodes (TokenLiteral, Children methods)
- Statement: Extends Node for SQL statements
- Expression: Extends Node for SQL expressions
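A minimal sketch of these interfaces, assuming unexported marker methods to distinguish statements from expressions (only TokenLiteral and Children are documented above):
// Node is the base interface implemented by every AST node.
type Node interface {
    TokenLiteral() string // literal of the token that begins this node
    Children() []Node     // child nodes, enabling generic traversal
}
// Statement marks nodes that represent complete SQL statements.
type Statement interface {
    Node
    statementNode() // marker method (assumed)
}
// Expression marks nodes usable in expression positions.
type Expression interface {
    Node
    expressionNode() // marker method (assumed)
}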
Statement Types (14 total):
// DML Statements
SelectStatement // SELECT with JOINs, window functions, CTEs
InsertStatement // INSERT with multiple value sets
UpdateStatement // UPDATE with WHERE clause
DeleteStatement // DELETE with WHERE clause
MergeStatement // MERGE with MATCHED/NOT MATCHED clauses
TruncateStatement // TRUNCATE TABLE
// DDL Statements
CreateTableStatement // CREATE TABLE with columns, constraints
CreateViewStatement // CREATE VIEW
CreateMaterializedViewStatement // CREATE MATERIALIZED VIEW
CreateIndexStatement // CREATE INDEX
AlterTableStatement // ALTER TABLE (add/drop columns, constraints)
AlterStatement // Generic ALTER (roles, policies, etc.)
DropStatement // DROP (tables, views, indexes)
RefreshMaterializedViewStatement // REFRESH MATERIALIZED VIEW
// Query Composition
WithClause // WITH (CTEs) - can contain any statement
SetOperation // UNION/EXCEPT/INTERSECT
Expression Types:
// Basic Expressions
Identifier // Table/column names
Literal // String, number, boolean literals
BinaryExpression // a + b, a = b, etc.
UnaryExpression // NOT, -x, etc.
// Complex Expressions
FunctionCall // func(args) with optional OVER clause
WindowSpec // Window function specification
BetweenExpression // x BETWEEN a AND b
InExpression // x IN (values) or x IN (subquery)
LikeExpression // x LIKE pattern
IsNullExpression // x IS NULL / IS NOT NULL
CaseExpression // CASE WHEN ... THEN ... END
// Grouping
GroupingSet // Individual grouping set
RollupGrouping // ROLLUP(columns)
CubeGrouping // CUBE(columns)
Data Flow
End-to-End Processing Pipeline
SQL Text ([]byte)
│
▼
┌─────────────────────┐
│ Tokenizer │ ← tokenizer.GetTokenizer()
├─────────────────────┤
│ Lexical Analysis │
│ - Zero-copy ops │
│ - Position tracking │
└──────────┬──────────┘
│ []models.TokenWithSpan (with Location)
▼
┌─────────────────────┐
│ Token Converter │
├─────────────────────┤
│ Models → AST tokens │
│ Position mapping │
└──────────┬──────────┘
│ []token.Token + positions
▼
┌─────────────────────┐
│ Parser │ ← parser.GetParser()
├─────────────────────┤
│ Recursive Descent │
│ AST Construction │
└──────────┬──────────┘
│ *ast.AST
▼
┌─────────────────────┐
│ Application Use │
│ - Validation │
│ - Transformation │
│ - Code Generation │
└──────────┬──────────┘
│
▼
Return to Pools
(PutTokenizer, PutParser, ReleaseAST)
Token Flow Example
// 1. Input SQL
sql := "SELECT * FROM users"
// 2. Tokenization (pkg/sql/tokenizer)
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte(sql))
// Returns: []models.TokenWithSpan with position info
// [{Token: SELECT, Start: {Line:1, Col:1}, End: {Line:1, Col:6}}, ...]
// 3. Token conversion (pkg/sql/parser/token_conversion.go)
// Handled internally by ParseFromModelTokens, which converts
// []models.TokenWithSpan into the parser's []token.Token; no separate call is needed.
// 4. Parsing (pkg/sql/parser)
p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseFromModelTokens(tokens)
// Returns: *ast.AST containing statements
// 5. Access parsed structure
selectStmt := astObj.Statements[0].(*ast.SelectStatement)
// selectStmt.Columns contains parsed column expressions
// selectStmt.From contains table references
Memory Management
Object Pool Architecture
GoSQLX uses sync.Pool extensively for performance:
Pooled Components:
- Tokenizer Pool (pkg/sql/tokenizer)
- Pre-allocated token buffers
- Reusable scanners
- Access: tokenizer.GetTokenizer() / tokenizer.PutTokenizer()
- Parser Pool (pkg/sql/parser)
- Parser instances with state
- Access: parser.GetParser() / parser.PutParser()
- AST Pool (pkg/sql/ast)
- AST container objects
- Access: ast.NewAST() / ast.ReleaseAST()
Critical Usage Pattern:
// CORRECT - Always use defer for cleanup
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz) // MANDATORY
p := parser.GetParser()
defer parser.PutParser(p) // MANDATORY
astObj := ast.NewAST()
defer ast.ReleaseAST(astObj) // MANDATORY
// Use the objects...
Performance Benefits:
- 60-80% memory reduction vs non-pooled
- 95%+ pool hit rate in production
- Zero race conditions (validated via race detector)
Memory Optimization Strategies
- Zero-Copy Tokenization
// Tokens reference original input bytes
type Token struct {
Type TokenType
Value string // Points to original input
}
// No copying during tokenization
func (t *Tokenizer) readIdentifier() Token {
start := t.pos
// Scan without copying
for isIdentifierChar(t.peek()) {
t.advance()
}
// Reference original bytes
return Token{
Type: TokenTypeIdentifier,
Value: string(t.input[start:t.pos]), // Single allocation
}
}
- Buffer Reuse
type Tokenizer struct {
    // Reusable buffers
    buffer []byte
    tokens []TokenWithSpan
    pos    int // current scan position (reset between uses)
}
func (t *Tokenizer) Reset() {
// Clear without deallocating
t.buffer = t.buffer[:0]
t.tokens = t.tokens[:0]
t.pos = 0
}
- Capacity Pre-allocation
func ProcessLargeQuery(sql string) {
estimatedTokens := len(sql) / 5 // Heuristic
tokens := make([]Token, 0, estimatedTokens)
// Avoids reallocation during append
}
Concurrency Model
Thread Safety
- Pool Operations: Thread-safe via sync.Pool
- Tokenizer/Parser Instances: Not thread-safe - use one per goroutine
- Tokens & AST: Immutable after creation - safe for concurrent reads
- Metrics: Atomic operations - safe for concurrent updates
Concurrent Usage Pattern
// Process multiple queries concurrently
func ProcessConcurrent(queries []string) []Result {
results := make([]Result, len(queries))
var wg sync.WaitGroup
for i, sql := range queries {
wg.Add(1)
go func(idx int, query string) {
defer wg.Done()
// Each goroutine gets its own instances from pool
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
p := parser.GetParser()
defer parser.PutParser(p)
tokens, _ := tkz.Tokenize([]byte(query))
ast, _ := p.ParseFromModelTokens(tokens)
results[idx] = Result{AST: ast}
}(i, sql)
}
wg.Wait()
return results
}
Key Points:
- Lock-free design: Each goroutine uses its own pooled instances
- Zero race conditions: Validated with 20,000+ concurrent operations
- High throughput: 1.38M+ ops/sec sustained
Design Patterns
The codebase employs several design patterns for maintainability:
1. Object Pool Pattern
- Purpose: Reduce allocation overhead
- Implementation: sync.Pool for Tokenizer, Parser, AST
- Benefit: 60-80% memory reduction
2. Visitor Pattern
- Purpose: AST traversal and transformation
- Location: pkg/sql/ast/visitor.go
- Use Cases: Query analysis, optimization, code generation
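A self-contained sketch of the pattern over the Children()-based Node interface described above (names are local to the example, not the exact visitor.go API):
// Visitor is invoked for every node; returning false prunes the subtree.
type Visitor interface {
    Visit(n Node) (descend bool)
}
// Walk performs the depth-first traversal at the heart of the pattern.
func Walk(v Visitor, n Node) {
    if n == nil || !v.Visit(n) {
        return
    }
    for _, child := range n.Children() {
        Walk(v, child)
    }
}
// literalCollector gathers every token literal in a tree.
type literalCollector struct{ literals []string }
func (c *literalCollector) Visit(n Node) bool {
    c.literals = append(c.literals, n.TokenLiteral())
    return true // keep descending
}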
3. Recursive Descent
- Purpose: Predictive parsing with lookahead
- Implementation: Parser methods call each other recursively
- Safety: Max recursion depth limit (100) to prevent stack overflow
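A minimal sketch of the depth-guard idiom, with illustrative names rather than the parser's actual fields:
import "fmt"
const maxRecursionDepth = 100
type parser struct {
    depth int
}
func (p *parser) enter() error {
    p.depth++
    if p.depth > maxRecursionDepth {
        return fmt.Errorf("maximum recursion depth %d exceeded", maxRecursionDepth)
    }
    return nil
}
func (p *parser) leave() { p.depth-- }
// Every recursive production wraps itself in the guard.
func (p *parser) parseExpression() error {
    if err := p.enter(); err != nil {
        return err // fail fast instead of overflowing the stack
    }
    defer p.leave()
    // ... descend into operands, subqueries, nested parentheses ...
    return nil
}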
4. Multi-Dialect Strategy
- Purpose: Support multiple SQL dialects
- Location: pkg/sql/keywords/
- Dialects: PostgreSQL, MySQL, SQL Server, Oracle, SQLite
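A hedged sketch of per-dialect keyword tables behind an O(1) map lookup (the Dialect type and table contents here are illustrative assumptions):
import "strings"
type Dialect int
const (
    PostgreSQL Dialect = iota
    MySQL
    SQLServer
    Oracle
    SQLite
)
// One keyword set per dialect; shared keywords simply appear in several sets.
var keywordsByDialect = map[Dialect]map[string]bool{
    PostgreSQL: {"SELECT": true, "LATERAL": true, "ILIKE": true},
    MySQL:      {"SELECT": true, "STRAIGHT_JOIN": true},
}
// IsKeyword costs two map accesses regardless of how many keywords exist.
func IsKeyword(d Dialect, word string) bool {
    return keywordsByDialect[d][strings.ToUpper(word)]
}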
Performance Architecture
Optimization Techniques
- Fast Path Token Recognition
- Single-character tokens: O(1) switch-case
- Keywords: O(1) map lookup
- Common patterns optimized first
- Zero-Copy Operations
- Tokenizer works on byte slices directly
- Token values reference original input
- No intermediate string allocations
- ModelType Fast Dispatch (see the sketch after this list)
- Parser uses integer token types for O(1) switching
- Jump table compilation for statement routing
- Avoids string comparisons in hot path
- Pre-allocation Strategies
- Estimate buffer sizes based on input length
- Reuse slices with the slice[:0] pattern
- Pool warming for common object sizes
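A sketch of the ModelType dispatch referenced above (the receiver field and token constants mirror documented names but are assumptions here). Because the cases are small contiguous integers, the compiler can emit a jump table instead of a comparison chain:
import "fmt"
func (p *Parser) parseStatement() (ast.Statement, error) {
    switch p.current.Type { // integer TokenType: O(1) dispatch
    case TokenTypeWith:
        return p.parseWithStatement()
    case TokenTypeSelect:
        return p.parseSelectWithSetOperations()
    case TokenTypeInsert:
        return p.parseInsertStatement()
    case TokenTypeUpdate:
        return p.parseUpdateStatement()
    default:
        return nil, fmt.Errorf("unexpected token type %d at statement start", p.current.Type)
    }
}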
Benchmarking
GoSQLX includes comprehensive benchmarks (6 benchmark files):
- Component-level: Tokenizer, Parser, AST operations
- Integration: Full pipeline end-to-end
- Memory profiling: Allocation tracking with -benchmem
- Concurrency: Race detection with the -race flag
Run with: go test -bench=. -benchmem ./pkg/...
Performance Monitoring
GoSQLX includes built-in metrics collection for production observability.
Metrics Package (pkg/metrics)
The metrics package provides atomic counters for lock-free performance tracking:
// Available metrics
type Metrics struct {
QueriesParsed int64 // Total queries processed
TokensGenerated int64 // Total tokens created
BytesProcessed int64 // Total SQL bytes scanned
Errors int64 // Parsing errors encountered
PoolHits int64 // Pool reuse count
PoolMisses int64 // Pool allocations
}
// Usage
snapshot := metrics.GetSnapshot()
fmt.Printf("Queries: %d, Pool hit rate: %.2f%%\n",
snapshot.QueriesParsed,
float64(snapshot.PoolHits) / float64(snapshot.PoolHits + snapshot.PoolMisses) * 100)
Security Scanning (pkg/sql/security)
Built-in SQL injection detection with severity classification:
// Scan for security issues
issues := security.ScanQuery(sqlBytes)
for _, issue := range issues {
fmt.Printf("Security: %s - %s (severity: %s)\n",
issue.Type, issue.Description, issue.Severity)
}
Query Monitoring (pkg/sql/monitor)
Track query patterns and performance characteristics:
// Monitor query execution
monitor := monitor.New()
monitor.RecordQuery(sql, duration, err)
stats := monitor.GetStats()
fmt.Printf("Avg duration: %v, Error rate: %.2f%%\n",
stats.AvgDuration, stats.ErrorRate*100)
LSP Architecture
Language Server Protocol (LSP) Implementation (v1.6.0) provides real-time IDE integration for SQL editing.
LSP Component Design
┌─────────────────────────────────────────────────────────────┐
│ LSP Server (pkg/lsp) │
├─────────────────────────────────────────────────────────────┤
│ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ Server │─────▶│ Handler │─────▶│ Documents │ │
│ │ (server.go)│ │(handler.go)│ │(documents.go)│ │
│ └────────────┘ └────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ JSON-RPC 2.0 Request/Notify Document │
│ over stdio Dispatcher Manager │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Protocol Types (protocol.go) │ │
│ │ - Request/Response │ │
│ │ - Diagnostics (errors with positions) │ │
│ │ - Completion (keywords, functions, snippets) │ │
│ │ - Hover (documentation) │ │
│ │ - Formatting (indent, keyword case) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Core SQL Components (Tokenizer/Parser) │
└─────────────────────────────────────────────────────────────┘
LSP Server Component (server.go)
Responsibilities:
- JSON-RPC 2.0 message handling over stdio
- Content-Length header parsing
- Rate limiting (100 requests/second)
- Content size validation (10MB max per message, 5MB max per document)
- Concurrent request handling with write mutex
- Graceful shutdown on exit/shutdown requests
Key Features:
type Server struct {
reader *bufio.Reader // stdio reader
writer io.Writer // stdio writer with mutex
writeMu sync.Mutex // Thread-safe write
documents *DocumentManager // Open document tracking
handler *Handler // Request dispatcher
logger *log.Logger // Optional debug logging
// Rate limiting (atomic counters)
requestCount int64
lastReset time.Time
rateMu sync.Mutex
}
Message Flow:
- Read Content-Length header from stdin
- Read JSON-RPC message body
- Unmarshal and validate JSON-RPC structure
- Check rate limits (100 req/sec window)
- Dispatch to handler based on method
- Send response with Content-Length header
- Log operations (if logger enabled)
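A hedged sketch of that framing loop (not the exact server.go code): read headers until the blank line, then exactly Content-Length bytes of JSON-RPC payload:
import (
    "bufio"
    "fmt"
    "io"
    "strconv"
    "strings"
)
func readMessage(r *bufio.Reader) ([]byte, error) {
    contentLength := 0
    for {
        line, err := r.ReadString('\n')
        if err != nil {
            return nil, err
        }
        line = strings.TrimRight(line, "\r\n")
        if line == "" {
            break // blank line terminates the header section
        }
        if v, ok := strings.CutPrefix(line, "Content-Length: "); ok {
            if contentLength, err = strconv.Atoi(v); err != nil {
                return nil, fmt.Errorf("bad Content-Length: %w", err)
            }
        }
    }
    if contentLength <= 0 {
        return nil, fmt.Errorf("missing or invalid Content-Length header")
    }
    body := make([]byte, contentLength)
    _, err := io.ReadFull(r, body)
    return body, err // raw JSON-RPC message, ready to unmarshal
}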
LSP Handler Component (handler.go)
Responsibilities:
- Request/notification routing to appropriate handlers
- SQL parsing and validation
- Keyword documentation lookup
- Code completion generation
- SQL formatting
- Diagnostic generation from parser errors
Supported LSP Methods (8 requests + 6 notifications):
Requests (expect response):
- initialize - Server capabilities negotiation
- shutdown - Graceful shutdown preparation
- textDocument/hover - Keyword/function documentation (70+ keywords)
- textDocument/completion - Autocomplete (100+ keywords, 23 snippets)
- textDocument/formatting - SQL formatting with indent/case options
- textDocument/documentSymbol - Statement outline for navigation
- textDocument/signatureHelp - Function signature hints (15+ functions)
- textDocument/codeAction - Quick fixes (add semicolon, uppercase keywords)
Notifications (no response):
- initialized - Client initialization complete
- exit - Server shutdown
- textDocument/didOpen - Document opened in editor
- textDocument/didChange - Document content changed (incremental sync)
- textDocument/didClose - Document closed
- textDocument/didSave - Document saved
Completion Features:
- Keywords: 100+ SQL keywords (SELECT, FROM, WHERE, JOIN, etc.)
- Functions: Aggregate (COUNT, SUM), window (ROW_NUMBER, RANK), string (CONCAT, SUBSTRING)
- Snippets: 23 templates (SELECT statement, JOIN, CTE, window function, etc.)
- Caching: 575x faster with LRU cache (100 entry limit)
Hover Documentation:
- 70+ SQL keywords with descriptions
- Function signatures and usage examples
- PostgreSQL-specific features (LATERAL, DISTINCT ON, FILTER)
Document Manager Component (documents.go)
Responsibilities:
- Track open SQL documents by URI
- Incremental document synchronization
- UTF-8 safe position handling
- Line splitting and caching for fast lookups
Data Structures:
type Document struct {
URI string // file:// URI
LanguageID string // "sql" language identifier
Version int // Document version (increments on change)
Content string // Full SQL content
Lines []string // Cached line splits for position lookup
}
type DocumentManager struct {
mu sync.RWMutex // Thread-safe access
documents map[string]*Document // URI → Document mapping
}
Incremental Sync:
- Supports range-based edits (efficient for large files)
- Full document sync fallback
- Position → offset conversion for UTF-8 safety
- Line boundary handling (preserves newlines)
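A hedged sketch of rune-safe position mapping (the LSP spec counts characters in UTF-16 code units by default; this simplified version treats the character field as a rune count, matching the rune-based handling described above):
// positionToOffset converts a zero-based line/character position into a
// byte offset into the document, never splitting a multi-byte rune.
func positionToOffset(lines []string, line, character int) (int, bool) {
    if line < 0 || line >= len(lines) {
        return 0, false
    }
    offset := 0
    for i := 0; i < line; i++ {
        offset += len(lines[i]) + 1 // +1 for the newline separator
    }
    runeCount := 0
    for byteIdx := range lines[line] { // range yields rune start offsets
        if runeCount == character {
            return offset + byteIdx, true
        }
        runeCount++
    }
    return offset + len(lines[line]), true // clamp to end of line
}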
Thread Safety:
- RWMutex for concurrent reads
- Copy-on-Get prevents race conditions
- Safe for multi-threaded LSP server
LSP Performance Characteristics
- Rate Limiting: 100 requests/second (DoS protection)
- Content Limits: 10MB messages, 5MB documents (memory protection)
- Caching: 575x faster keyword suggestions with LRU cache
- Incremental Sync: Efficient updates for large SQL files
- UTF-8 Safe: Rune-based position handling (international support)
LSP Integration Example
# Start LSP server via CLI
$ gosqlx lsp
$ gosqlx lsp --log /tmp/lsp.log   # with debug logging
// VSCode integration (.vscode/settings.json)
{
"gosqlx.lsp.enabled": true,
"gosqlx.format.indent": 2,
"gosqlx.format.uppercaseKeywords": true
}
Linter Architecture
SQL Linting Engine (v1.6.0) provides style checking and automatic fixing with 10 built-in rules.
Linter Component Design
┌─────────────────────────────────────────────────────────────┐
│ Linter (pkg/linter) │
├─────────────────────────────────────────────────────────────┤
│ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ Linter │─────▶│ Rules │─────▶│ Context │ │
│ │(linter.go) │ │ (rule.go) │ │(context.go) │ │
│ └────────────┘ └────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ File/Dir/String Rule Interface SQL + Tokens │
│ Linting Engine (Check/Fix) + AST Context │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 10 Built-in Rules │ │
│ │ Whitespace (5): L001-L005 (trailing, mixed, long, │ │
│ │ blank lines, redundant) │ │
│ │ Keywords (1): L006 (case consistency) │ │
│ │ Style (4): L007-L010 (comma, aliasing, alignment, │ │
│ │ indentation depth) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Tokenizer → Parser → AST (best-effort parsing) │
└─────────────────────────────────────────────────────────────┘
Linter Engine (linter.go)
Responsibilities:
- File, directory, and string linting
- Rule orchestration and execution
- Violation aggregation and formatting
- Multi-file batch processing
API:
type Linter struct {
rules []Rule // Configured linting rules
}
// Create linter with specific rules
linter := linter.New(
whitespace.NewTrailingWhitespace(),
keywords.NewKeywordCase(true), // uppercase
style.NewCommaPlacement("trailing"),
)
// Lint operations
fileResult := linter.LintFile("query.sql")
dirResult := linter.LintDirectory("./sql", "*.sql")
stringResult := linter.LintString(sqlContent, "inline.sql")
Output:
type Result struct {
Files []FileResult // Per-file results
TotalFiles int // File count
TotalViolations int // Total violations across all files
}
type FileResult struct {
Filename string // File path
Violations []Violation // All violations found
Error error // Fatal error (if any)
}
Rule Interface (rule.go)
Design:
type Rule interface {
ID() string // L001, L002, etc.
Name() string // Human-readable name
Description() string // What the rule checks
Severity() Severity // error/warning/info
Check(ctx *Context) ([]Violation, error) // Find violations
CanAutoFix() bool // Supports auto-fix?
Fix(content string, v []Violation) (string, error) // Apply fixes
}
type Violation struct {
Rule string // Rule ID (e.g., "L001")
RuleName string // Human-readable rule name
Severity Severity // error/warning/info
Message string // Violation description
Location models.Location // Line/column position
Line string // Actual line content
Suggestion string // How to fix
CanAutoFix bool // Auto-fix available?
}
BaseRule Helper:
type BaseRule struct {
id, name, description string
severity Severity
canAutoFix bool
}
// Embed BaseRule to avoid boilerplate
type TrailingWhitespace struct {
BaseRule
}
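A hedged sketch of a concrete Check/Fix pair on top of BaseRule, written as if it lived inside the linter's rules package (the models.Location field names are assumptions):
import "strings"
func (r *TrailingWhitespace) Check(ctx *Context) ([]Violation, error) {
    var violations []Violation
    for i, line := range ctx.Lines {
        trimmed := strings.TrimRight(line, " \t")
        if trimmed != line {
            violations = append(violations, Violation{
                Rule:       "L001",
                Message:    "line has trailing whitespace",
                Location:   models.Location{Line: i + 1, Column: len(trimmed) + 1}, // assumed fields
                Line:       line,
                Suggestion: "remove trailing spaces/tabs",
                CanAutoFix: true,
            })
        }
    }
    return violations, nil
}
func (r *TrailingWhitespace) Fix(content string, _ []Violation) (string, error) {
    lines := strings.Split(content, "\n")
    for i, l := range lines {
        lines[i] = strings.TrimRight(l, " \t")
    }
    return strings.Join(lines, "\n"), nil
}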
Context (context.go)
Linting Context:
type Context struct {
SQL string // Raw SQL content
Filename string // Source filename
Lines []string // Line-by-line split
Tokens []models.TokenWithSpan // Tokenizer output (optional)
AST *ast.AST // Parsed AST (optional)
ParseErr error // Parser error (if any)
}
// Context builders
ctx := NewContext(sql, filename)
ctx.WithTokens(tokens) // Add token stream
ctx.WithAST(astObj, parseErr) // Add AST (best-effort)
Best-Effort Parsing:
- Tokenization always attempted
- Parsing attempted (failures don’t stop linting)
- Token-only rules work without AST
- AST-aware rules skip on parse failure
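As a sketch, the best-effort pipeline composes the pooled APIs documented earlier: tokenize always, and record a parse failure on the Context instead of aborting:
func buildContext(sql, filename string) *Context {
    ctx := NewContext(sql, filename)
    tkz := tokenizer.GetTokenizer()
    defer tokenizer.PutTokenizer(tkz)
    tokens, err := tkz.Tokenize([]byte(sql))
    if err != nil {
        return ctx // token-only rules still see the raw SQL and lines
    }
    ctx.WithTokens(tokens)
    p := parser.GetParser()
    defer parser.PutParser(p)
    astObj, parseErr := p.ParseFromModelTokens(tokens)
    ctx.WithAST(astObj, parseErr) // parseErr recorded; AST-aware rules skip on failure
    return ctx
}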
Built-in Rules (10 total)
Whitespace Rules (5)
L001: Trailing Whitespace (auto-fix)
- Detects spaces/tabs at end of lines
- Severity: warning
- Fix: Remove trailing whitespace
L002: Mixed Indentation (auto-fix)
- Detects mixed tabs and spaces
- Severity: error
- Fix: Convert tabs to spaces (or vice versa)
L003: Long Lines (info)
- Detects lines exceeding max length (default: 120)
- Severity: info
- No auto-fix (requires manual reflow)
L004: Consecutive Blank Lines (auto-fix)
- Detects 3+ consecutive blank lines
- Severity: warning
- Fix: Collapse to 2 blank lines max
L005: Redundant Whitespace (auto-fix)
- Detects multiple consecutive spaces
- Severity: warning
- Fix: Collapse to single space
Keyword Rules (1)
L006: Keyword Case Consistency (auto-fix)
- Detects inconsistent keyword casing
- Severity: warning
- Options: uppercase, lowercase, or consistent
- Fix: Normalize all keywords to chosen case
Style Rules (4)
L007: Comma Placement (auto-fix)
- Detects inconsistent comma placement
- Severity: warning
- Options: trailing (after item), leading (before item)
- Fix: Move commas to preferred position
L008: Aliasing Consistency (info)
- Detects inconsistent AS usage in aliases
- Severity: info
- Options: always-as, never-as, or consistent
- No auto-fix (semantic changes required)
L009: Column Alignment (info)
- Detects misaligned columns in SELECT list
- Severity: info
- No auto-fix (requires complex reformatting)
L010: Indentation Depth (info)
- Detects excessive nesting depth
- Severity: info
- Default max: 4 levels
- No auto-fix (requires query refactoring)
Linter Usage Example
# CLI usage
$ gosqlx lint query.sql
$ gosqlx lint --fix query.sql # Auto-fix violations
$ gosqlx lint --rules L001,L006 *.sql # Specific rules only
$ gosqlx lint --config .gosqlx.yml ./ # With config file
# Configuration (.gosqlx.yml)
linter:
rules:
L001: { enabled: true }
L006: { enabled: true, uppercase: true }
L007: { enabled: true, style: "trailing" }
Linter Performance
- Best-Effort Parsing: Token-only rules work without AST
- Object Pooling: Reuses tokenizer/parser instances
- Parallel File Processing: Multi-file linting uses goroutines
- Incremental Fixes: Auto-fix applies changes in single pass
Security Scanner Architecture
SQL Injection Detection (v1.6.0) provides comprehensive pattern-based and AST-based security analysis.
Security Scanner Design
┌─────────────────────────────────────────────────────────────┐
│ Security Scanner (pkg/sql/security) │
├─────────────────────────────────────────────────────────────┤
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Scanner (scanner.go) │ │
│ ├────────────────────────────────────────────────────────┤ │
│ │ Pattern Detection (8 types): │ │
│ │ 1. Tautologies (1=1, 'a'='a') │ │
│ │ 2. Comment Bypasses (--, /**/, #) │ │
│ │ 3. UNION-based (UNION SELECT, information_schema) │ │
│ │ 4. Stacked Queries (; DROP, ; DELETE) │ │
│ │ 5. Time-based (SLEEP, WAITFOR, pg_sleep) │ │
│ │ 6. Out-of-band (xp_cmdshell, LOAD_FILE) │ │
│ │ 7. Dangerous Functions (EXEC, sp_executesql) │ │
│ │ 8. Boolean-based (conditional logic exploitation) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Detection Methods (dual-layer): │ │
│ │ - Regex Pattern Matching (pre-compiled, sync.Once) │ │
│ │ - AST Analysis (structure-aware detection) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Severity Classification: │ │
│ │ CRITICAL: Definite injection (OR 1=1 --) │ │
│ │ HIGH: Likely injection (suspicious patterns) │ │
│ │ MEDIUM: Potentially unsafe (needs review) │ │
│ │ LOW: Informational findings │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ AST (for structure analysis) │
└─────────────────────────────────────────────────────────────┘
Scanner Component (scanner.go)
Responsibilities:
- Pattern-based injection detection (8 pattern types)
- AST-based structure analysis
- Severity classification (4 levels)
- Pre-compiled regex patterns (performance optimization)
API:
type Scanner struct {
minSeverity Severity // Filter findings by severity
}
// Create scanner
scanner := security.NewScanner()
scanner.SetMinSeverity(security.SeverityHigh) // Filter to HIGH+
// Scan AST
results := scanner.Scan(astObj)
for _, finding := range results.Findings {
fmt.Printf("[%s] %s at line %d: %s\n",
finding.Severity, finding.Pattern, finding.Line, finding.Description)
}
Output:
type ScanResult struct {
Findings []Finding // All detected issues
CriticalCount int // Count by severity
HighCount int
MediumCount int
LowCount int
ScannedAt time.Time // Scan timestamp
}
type Finding struct {
Pattern PatternType // Injection pattern type
Severity Severity // CRITICAL/HIGH/MEDIUM/LOW
Description string // Human-readable description
Line int // Source line number
Column int // Source column number
Evidence string // Matched SQL fragment
Suggestion string // How to fix
}
Pattern Detection (8 types)
1. Tautologies (CRITICAL)
- Always-true conditions: 1=1, 'a'='a', 1<2
- Detection: AST-based literal comparison
- Example: SELECT * FROM users WHERE 1=1 OR username='admin'
2. Comment Bypasses (HIGH/MEDIUM)
- SQL comments to bypass validation: --, /* */, #
- Detection: Regex pattern matching
- Patterns: trailing comments, comment after quote, MySQL conditional comments
- Example: SELECT * FROM users WHERE username='admin'--' AND password='x'
3. UNION-based Extraction (HIGH)
- UNION SELECT for data exfiltration
- Detection: Regex + AST analysis
- Patterns: UNION SELECT, information_schema access
- Example: SELECT * FROM users WHERE id=1 UNION SELECT password FROM admin_users
4. Stacked Queries (CRITICAL)
- Multiple statements (destructive operations)
- Detection: Regex for semicolon + dangerous keywords
- Patterns: ; DROP, ; DELETE, ; UPDATE, ; EXEC
- Example: SELECT * FROM users WHERE id=1; DROP TABLE users--
5. Time-based Blind (HIGH)
- Timing attacks for blind injection
- Detection: Regex for sleep functions
- Functions: SLEEP(), WAITFOR DELAY, pg_sleep(), BENCHMARK(), DBMS_LOCK.SLEEP()
- Example: SELECT * FROM users WHERE id=1 AND SLEEP(5)
6. Out-of-band (CRITICAL)
- OS command execution, file access
- Detection: Regex for dangerous functions
- Functions: xp_cmdshell, LOAD_FILE(), INTO OUTFILE, UTL_HTTP, DBMS_LDAP
- Example: SELECT * FROM users WHERE id=1 AND xp_cmdshell('whoami')
7. Dangerous Functions (HIGH)
- Dynamic SQL execution risks
- Detection: Regex for exec functions
- Functions: EXEC(), EXECUTE IMMEDIATE, sp_executesql, PREPARE ... FROM
- Example: EXEC('DROP TABLE ' + @tableName)
8. Boolean-based (MEDIUM)
- Conditional logic exploitation
- Detection: AST-based conditional analysis
- Complex boolean expressions with suspicious patterns
Pre-compiled Patterns (Performance)
Optimization Strategy:
// Package-level pattern compilation (sync.Once)
var (
compiledPatterns map[PatternType][]*regexp.Regexp
compiledPatternsOnce sync.Once
)
func initCompiledPatterns() {
compiledPatternsOnce.Do(func() {
compiledPatterns = make(map[PatternType][]*regexp.Regexp)
// Time-based patterns
compiledPatterns[PatternTimeBased] = []*regexp.Regexp{
regexp.MustCompile(`(?i)\bSLEEP\s*\(`),
regexp.MustCompile(`(?i)\bWAITFOR\s+DELAY\b`),
// ... more patterns
}
// ... other pattern types
})
}
Benefits:
- Patterns compiled once at package initialization
- Thread-safe via sync.Once
- Zero allocation per scan
- Regex engine optimizations applied once
Dual-Layer Detection
1. Regex Layer (Fast Path)
- Pre-compiled patterns
- Quick elimination of safe queries
- Low false positive rate
2. AST Layer (Deep Analysis)
- Structure-aware detection
- Context-sensitive analysis
- Accurate tautology detection
- Boolean expression analysis
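A hedged sketch of how the two layers compose (findTautologies is a hypothetical helper, not the scanner's real internals):
func (s *Scanner) scan(sql string, astObj *ast.AST) []Finding {
    var findings []Finding
    // Layer 1: pre-compiled regexes over the raw SQL (fast path).
    initCompiledPatterns()
    for patternType, regexps := range compiledPatterns {
        for _, re := range regexps {
            if loc := re.FindStringIndex(sql); loc != nil {
                findings = append(findings, Finding{
                    Pattern:  patternType,
                    Evidence: sql[loc[0]:loc[1]],
                })
            }
        }
    }
    // Layer 2: AST analysis for structure-sensitive patterns such as
    // tautologies (literal = literal) and suspicious boolean expressions.
    if astObj != nil {
        findings = append(findings, s.findTautologies(astObj)...) // hypothetical helper
    }
    return findings
}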
Security Scanner Usage
# CLI usage
$ gosqlx analyze query.sql
$ gosqlx analyze --security-only query.sql
$ gosqlx analyze --min-severity HIGH query.sql
// Programmatic usage (Go)
scanner := security.NewScanner()
scanner.SetMinSeverity(security.SeverityHigh)
results := scanner.Scan(ast)
for _, finding := range results.Findings {
log.Printf("[%s] %s: %s", finding.Severity, finding.Pattern, finding.Description)
}
Integration with CLI
The security scanner integrates with the analyze command:
$ gosqlx analyze suspicious.sql
Security Findings:
[CRITICAL] Tautology at line 3: Always-true condition '1=1'
[HIGH] Comment Bypass at line 5: Trailing comment may indicate injection
Suggestions:
- Use parameterized queries instead of string concatenation
- Validate and sanitize all user inputs
- Implement prepared statements
Scalability Characteristics
The architecture supports high-throughput production workloads:
- Stateless Design: Enables horizontal scaling across multiple instances
- Lock-Free Operations: Each goroutine uses its own pooled instances
- Concurrent Safety: Zero race conditions (validated with race detector)
- Memory Efficiency: Object pooling reduces GC pressure (60-80% reduction)
- Performance: 1.38M+ operations/sec sustained, 1.5M peak throughput
- LSP Rate Limiting: 100 req/sec prevents DoS attacks
- Atomic Metrics: Lock-free counters for production observability
Production Validation
This architecture has been validated for production use with comprehensive testing:
- Concurrency: 20,000+ concurrent operations (race detection)
- Real-world SQL: 115+ queries from production databases
- Unicode Support: 8 international languages (full UTF-8 compliance)
- Load Testing: Extended runs with stable memory profiles
- LSP Stress: 1000+ requests/min sustained (rate limited to 100/sec)
- Security: 50+ injection patterns tested across 8 attack categories
MCP Architecture
The MCP server (pkg/mcp/) is a thin HTTP adapter with no business logic — every tool handler delegates to existing GoSQLX packages.
Component Overview
cmd/gosqlx-mcp/main.go
└─ LoadConfig() + New() + Start()
└─ pkg/mcp.Server
├─ BearerAuthMiddleware (optional, wraps next http.Handler)
└─ StreamableHTTPServer (mark3labs/mcp-go)
└─ 7 registered tool handlers
├─ validate_sql → pkg/gosqlx.Validate / ParseWithDialect
├─ format_sql → pkg/gosqlx.Format
├─ parse_sql → pkg/gosqlx.Parse
├─ extract_metadata→ pkg/gosqlx.Parse + ExtractMetadata
├─ security_scan → pkg/sql/security.NewScanner().ScanSQL
├─ lint_sql → pkg/linter.New(10 rules).LintString
└─ analyze_sql → concurrent fan-out of all 6 above
Tool → Package Mapping
| MCP Tool | Package | Key Functions |
|---|---|---|
| validate_sql | pkg/gosqlx | Validate(), ParseWithDialect() |
| format_sql | pkg/gosqlx | Format(), FormatOptions{} |
| parse_sql | pkg/gosqlx | Parse() |
| extract_metadata | pkg/gosqlx | Parse(), ExtractMetadata() |
| security_scan | pkg/sql/security | NewScanner(), ScanSQL() |
| lint_sql | pkg/linter | New(rules...), LintString() |
| analyze_sql | all above | sync.WaitGroup concurrent fan-out |
Concurrency
analyze_sql launches 6 goroutines via sync.WaitGroup, one per tool, collecting results through a buffered channel. Partial failures are surfaced under an "errors" key; successful results are always returned.
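A hedged sketch of that fan-out shape using only the standard library (tool wiring elided; function names are illustrative):
import "sync"
type toolResult struct {
    name string
    out  any
    err  error
}
func analyzeSQL(sql string, tools map[string]func(string) (any, error)) (map[string]any, map[string]string) {
    ch := make(chan toolResult, len(tools)) // buffered: senders never block
    var wg sync.WaitGroup
    for name, run := range tools {
        wg.Add(1)
        go func(name string, run func(string) (any, error)) {
            defer wg.Done()
            out, err := run(sql)
            ch <- toolResult{name, out, err}
        }(name, run)
    }
    wg.Wait()
    close(ch)
    results, errs := map[string]any{}, map[string]string{}
    for r := range ch {
        if r.err != nil {
            errs[r.name] = r.err.Error() // partial failures surface under "errors"
        } else {
            results[r.name] = r.out
        }
    }
    return results, errs
}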