rspamd

Commit Graph

Author	SHA1	Message	Date
Vsevolod Stakhov	ecc587a3df	[Project] Add tests and fix stuff	2 months ago
Vsevolod Stakhov	346ec89803	[Feature] Allow selectors in regexp maps expressions	2 months ago
Vsevolod Stakhov	e8b90e0e10	[Minor] Update plugins that are using headers modifications	2 months ago
Vsevolod Stakhov	9848f16510	[Project] Implement backoff for upstreams revival	2 months ago
Vsevolod Stakhov	c132ade98b	[Project] Start to implement better revive strategy for upstreams	2 months ago
Vsevolod Stakhov	c0a16d53b0	[Minor] Optimise re-resolving for known IPs	2 months ago
Vsevolod Stakhov	3ffc665243	[Feature] Resolve DNS nameservers names using getaddrinfo	2 months ago
Vsevolod Stakhov	01a1032107	[Fix] Fix l= calculations again	3 months ago
Vsevolod Stakhov	2f52cdf2e1	Fix DKIM relaxed body canonicalization and optimize performance
This PR addresses critical issues in DKIM relaxed body canonicalization and modernizes the codebase by replacing GLib types with standard C types.
- RFC Compliance: Fixed incorrect canonicalization of lines containing only whitespace. Previously, such lines were not properly handled according to RFC 6376, which could lead to DKIM signature verification failures.
- Memory Safety: Fixed incorrect pointer dereference in `rspamd_dkim_skip_empty_lines` that could cause undefined behavior.
- Zero-copy Optimization: Reimplemented `rspamd_dkim_relaxed_body_step` to avoid unnecessary memory copies. The new implementation:
  - Processes input data directly without intermediate buffers
  - Reduces the number of `EVP_DigestUpdate` calls by processing larger chunks
  - Improves CPU cache efficiency
  - Results in significantly better performance for large email bodies
- Replaced all GLib types with standard C equivalents:
  - `gsize` → `size_t`
  - `gssize` → `ssize_t`
  - `gboolean` → `bool`
  - `TRUE/FALSE` → `true/false`
  - And other GLib-specific types
- Added necessary standard headers (`stdbool.h`, `stdint.h`, `limits.h`)
- Added comprehensive debug logging for:
  - Chunk processing with size information
  - Empty line detection and skipping
  - Space collapsing operations
Issue: #5590	3 months ago
Vsevolod Stakhov	69db7c992f	[Project] Add tests for LLM provider, fix various issues with metatokens	3 months ago
Vsevolod Stakhov	597ae4da82	[Project] Rework rspamc to allow training of different neural types	3 months ago
Vsevolod Stakhov	e6fdc3b42f	[Fix] GPT: Fix occasional damage	3 months ago
Vsevolod Stakhov	8c59bd7c2f	[Minor] Move common stuff to a separate function	3 months ago
Vsevolod Stakhov	f60a55f6c5	[Minor] Don't use coroutines	3 months ago
Vsevolod Stakhov	60e1b843b6	Neural module rework: provider-based feature fusion, LLM embeddings, normalization, and v3 schema
This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.

- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Ensure consistent train/infer behavior with stored normalization and provider metadata.
- Improve operability with caching, digest checks, and safer rollouts.

- Provider architecture
  - Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
  - New LLM provider: `lualib/plugins/neural/providers/llm.lua` using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
  - Symbols provider extracted: `lualib/plugins/neural/providers/symbols.lua`.
- Normalization and PCA
  - Configurable fusion normalization: none/unit/zscore.
  - Trained normalization stats computed during training and applied at inference.
  - Existing global PCA preserved; loaded/saved alongside ANN.
- Schema and compatibility
  - `plugin_ver` bumped to '3' to isolate from earlier profiles.
  - Redis save/load extended:
    - Profiles include `providers_digest`.
    - ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
  - ANN load validates provider digest and skips apply on mismatch.
- Performance and reliability
  - LLM embeddings cached in Redis (content+model keyed).
  - Graceful fallback to symbols if providers not configured or fail.
  - Basic provider configuration validation.

- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.

- Enable LLM alongside symbols:
```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings"; cache_ttl = 86400; weight = 1.0; }
      ];
      fusion {
        normalization = "zscore";
      }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```
- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.

- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.

- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.

- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no content leaves unless configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.

- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.

- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`	3 months ago
René Draaisma	c80d6b3cfd	Updated gpt.lua to set default gpt-5-mini as model, fix issue when GPT max_completion_tokens exceeded and returned empty reason field, Set default group to GPT for Symbols, group is now also configurable in settings with extra_symbols, fix issue when no score is defined in settings at extra_symbols, default score is now 0	3 months ago
René Draaisma	89e55435bf	Updated gpt.lua to provide model parameters with the settings	3 months ago
hunter-nl	19679a3664	Update gpt.lua to make use of lua_util.deepcopy function.	3 months ago
Vsevolod Stakhov	65b52ce843	[Fix] Bayes: Try to be bug-to-bug compatible	3 months ago
hunter-nl	d795beb024	Update gpt.lua to fix spaces on empty lines To fix luacheck "line contains only whitespace"	3 months ago
hunter-nl	588e74931c	Update gpt.lua to get fresh body for each model iteration	3 months ago
hunter-nl	a05810040f	Update gpt.lua to remove unneeded body_base.model Not needed in the body_base structure.	3 months ago
hunter-nl	ba7df736e4	Update gpt.lua to handle OpenAI parallel old and new models When in rspamd_config is specified multiple models (old/new), this is handled now correctly to set the required attributes for each model request.	3 months ago
hunter-nl	5efcf514b8	Update gpt.lua to support newer models without temperature attribute Newer models do not support temperature attribute anymore.	3 months ago
hunter-nl	d09b5d24fd	Update gpt.lua to support newer models with max_completion_tokens Newer and reasoning models requires max_completion_tokens instead of max_tokens attribute.	3 months ago
Andrew Lewis	b4e72dd243	[Minor] Drop overzealous regex from hfilter	3 months ago
Vsevolod Stakhov	b6a3d5c9a6	[Minor] Add specific calculations for binary classification case	3 months ago
Andrew Lewis	558e5cfa86	[Minor] Fix implicit declaration	3 months ago
Vsevolod Stakhov	d0c2a24ddc	[Fix] Try to fix learned order	3 months ago
Vsevolod Stakhov	9a713d0607	[Fix] Fix double free in the client...	3 months ago
Vsevolod Stakhov	aa34dd8ad0	[Minor] Fix 'Compression' header logic	3 months ago
Vsevolod Stakhov	81417aeec8	[Minor] Some more logic fixes	3 months ago
Vsevolod Stakhov	dc23bd1b20	[Minor] More cleanups for compression stuff	3 months ago
Vsevolod Stakhov	c6c53357b8	[Fix] Fix end-to-end proxy compression Issue: #5561	3 months ago
Vsevolod Stakhov	338f9ca1f5	[Fix] Fix whitelist options in the arc module Issue: #5558	4 months ago
Vsevolod Stakhov	23ed80bf78	[Fix] Check skip_hashes for the returned hashes	4 months ago
Vsevolod Stakhov	a22fbdc1ae	[Fix] Use a more straightforward approach for learn cache	4 months ago
Vsevolod Stakhov	44ee3d8b0a	[Fix] Fix various corner cases and tests	4 months ago
Vsevolod Stakhov	533b9f676e	[Minor] Add --log-tag option for rspamc	4 months ago
Vsevolod Stakhov	e3e85f617f	[Minor] Fix single class fallback	4 months ago
Vsevolod Stakhov	e4a78fdab2	[Project] Apply changes to bayes_expiry plugin	4 months ago
Vsevolod Stakhov	bfdd04e653	[Project] Fix unlearn stuff	4 months ago
Vsevolod Stakhov	7c2fd27461	[Project] Fix more calculation issues	4 months ago
Vsevolod Stakhov	e013a3bdc5	[Feature] Add some convenience options to rspamc	4 months ago
Vsevolod Stakhov	3428e63e3a	[Minor] Further adjustments	4 months ago
Vsevolod Stakhov	16985fecee	[Fix] Fix statfiles ordering	4 months ago
Vsevolod Stakhov	94e0e6a533	[Fix] Fix probabilities overflow	4 months ago
Vsevolod Stakhov	c0eca31a3a	[Minor] Fix stupid change to call Redis for each class	4 months ago
Vsevolod Stakhov	a9fbcf9d49	[Project] Fix various issues	4 months ago
Vsevolod Stakhov	e989261050	[Minor] Reduce debug level verbosity	4 months ago
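
For readers unfamiliar with the rule that commit 2f52cdf2e1 refers to, the following is a minimal Python sketch of relaxed body canonicalization as described in RFC 6376, section 3.4.4. It is not rspamd's implementation (which is zero-copy C code); the function name `relaxed_body` is hypothetical and chosen for illustration only. It shows why a line containing only whitespace must end up treated as an empty line.

```python
import re


def relaxed_body(body: bytes) -> bytes:
    """Illustrative relaxed body canonicalization (RFC 6376, section 3.4.4).

    Hypothetical helper, not taken from rspamd.
    """
    out = []
    for line in body.split(b"\r\n"):
        # Reduce every run of spaces/tabs inside the line to a single space.
        line = re.sub(rb"[ \t]+", b" ", line)
        # Ignore whitespace at the end of the line; a whitespace-only
        # line therefore becomes an empty line.
        out.append(line.rstrip(b" "))
    # Ignore all empty lines at the end of the body.
    while out and out[-1] == b"":
        out.pop()
    if not out:
        return b""
    # A non-empty body always ends with exactly one CRLF.
    return b"\r\n".join(out) + b"\r\n"
```

With the example body from RFC 6376, `relaxed_body(b" C \r\nD \t E\r\n\r\n\r\n")` yields `b" C\r\nD E\r\n"`, and a trailing line made only of spaces or tabs is dropped along with the other trailing empty lines.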


14421 Commits (f39430660d48ccea4f5b32cfa1cc7ac3900e12fd)