rspamd

Commit Graph

Author	SHA1	Message	Date
Vsevolod Stakhov	8c59bd7c2f	[Minor] Move common stuff to a separate function	3 months ago
Vsevolod Stakhov	f60a55f6c5	[Minor] Don't use coroutines	3 months ago
Vsevolod Stakhov	60e1b843b6	Neural module rework: provider-based feature fusion, LLM embeddings, normalization, and v3 schema

This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.

- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Ensure consistent train/infer behavior with stored normalization and provider metadata.
- Improve operability with caching, digest checks, and safer rollouts.

- Provider architecture
  - Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
  - New LLM provider: `lualib/plugins/neural/providers/llm.lua` using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
  - Symbols provider extracted: `lualib/plugins/neural/providers/symbols.lua`.
- Normalization and PCA
  - Configurable fusion normalization: none/unit/zscore.
  - Trained normalization stats computed during training and applied at inference.
  - Existing global PCA preserved; loaded/saved alongside ANN.
- Schema and compatibility
  - `plugin_ver` bumped to '3' to isolate from earlier profiles.
  - Redis save/load extended:
    - Profiles include `providers_digest`.
    - ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
  - ANN load validates provider digest and skips apply on mismatch.
- Performance and reliability
  - LLM embeddings cached in Redis (content+model keyed).
  - Graceful fallback to symbols if providers not configured or fail.
  - Basic provider configuration validation.

- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.

- Enable LLM alongside symbols:

```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings"; cache_ttl = 86400; weight = 1.0; }
      ];
      fusion { normalization = "zscore"; }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```

- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.

- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.

- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.

- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no content leaves unless configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.

- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.

- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`	3 months ago
Vsevolod Stakhov	72b9261537	Merge pull request #5572 from hunter-nl/master Update GPT plugin to support OpenAI GPT-5 and other newer models	3 months ago
René Draaisma	1ee8c119d9	Merge branch 'master' of https://github.com/hunter-nl/rspamd	3 months ago
René Draaisma	c80d6b3cfd	Updated gpt.lua to set default gpt-5-mini as model, fix issue when GPT max_completion_tokens exceeded and returned empty reason field, Set default group to GPT for Symbols, group is now also configurable in settings with extra_symbols, fix issue when no score is defined in settings at extra_symbols, default score is now 0	3 months ago
hunter-nl	632cf52764	Merge branch 'master' into master	3 months ago
René Draaisma	89e55435bf	Updated gpt.lua to provide model parameters with the settings	3 months ago
hunter-nl	19679a3664	Update gpt.lua to make use of lua_util.deepcopy function.	3 months ago
Vsevolod Stakhov	65b52ce843	[Fix] Bayes: Try to be bug-to-bug compatible	3 months ago
Vsevolod Stakhov	873e89e0cb	Merge pull request #5574 from moisseev/e2e-playwright [Test] Display browser version in HTML report and console	3 months ago
Vsevolod Stakhov	2f9bb6fe9c	Merge pull request #5570 from fatalbanana/be_kind_to_comcast [Minor] Drop overzealous regex from hfilter	3 months ago
Vsevolod Stakhov	07126f5ea2	Merge branch 'master' into be_kind_to_comcast	3 months ago
Alexander Moisseev	100d74013a	[Minor] Enable multimap module in WebUI E2E workflow	3 months ago
Alexander Moisseev	cc0dd23046	[Test] Display browser version in HTML report and console	3 months ago
hunter-nl	d795beb024	Update gpt.lua to fix spaces on empty lines To fix luacheck "line contains only whitespace"	3 months ago
hunter-nl	588e74931c	Update gpt.lua to get fresh body for each model iteration	3 months ago
hunter-nl	a05810040f	Update gpt.lua to remove unneeded body_base.model Not needed in the body_base structure.	3 months ago
hunter-nl	ba7df736e4	Update gpt.lua to handle OpenAI parallel old and new models When in rspamd_config is specified multiple models (old/new), this is handled now correctly to set the required attributes for each model request.	3 months ago
hunter-nl	5efcf514b8	Update gpt.lua to support newer models without temperature attribute Newer models do not support temperature attribute anymore.	3 months ago
hunter-nl	d09b5d24fd	Update gpt.lua to support newer models with max_completion_tokens Newer and reasoning models requires max_completion_tokens instead of max_tokens attribute.	3 months ago
Andrew Lewis	b4e72dd243	[Minor] Drop overzealous regex from hfilter	3 months ago
Vsevolod Stakhov	03c75e1c47	Merge pull request #5569 from moisseev/e2e-playwright [Test] Add WebUI E2E workflow with Playwright	3 months ago
Alexander Moisseev	22046fed3f	[Test] Add WebUI E2E workflow with Playwright Add a GitHub Actions workflow to run WebUI E2E tests with Playwright on legacy and latest browser versions against rspamd binaries built in the pipeline.	3 months ago
Vsevolod Stakhov	b6a3d5c9a6	[Minor] Add specific calculations for binary classification case	3 months ago
Vsevolod Stakhov	4591b921f4	Merge pull request #5566 from fatalbanana/el10rpm [Minor] Build on EL10	3 months ago
Andrew Lewis	ac4c6ec421	[Minor] Use clang for build on EL10	3 months ago
Andrew Lewis	b57a57ec02	[Minor] Use embedded vectorscan on EL10	3 months ago
Andrew Lewis	558e5cfa86	[Minor] Fix implicit declaration	3 months ago
Vsevolod Stakhov	d0c2a24ddc	[Fix] Try to fix learned order	3 months ago
Vsevolod Stakhov	82f0d1eae7	Merge pull request #5562 from rspamd/vstakhov-proxy-compression [Fix] Fix end-to-end proxy compression	3 months ago
Vsevolod Stakhov	9a713d0607	[Fix] Fix double free in the client...	3 months ago
Vsevolod Stakhov	aa34dd8ad0	[Minor] Fix 'Compression' header logic	3 months ago
Vsevolod Stakhov	81417aeec8	[Minor] Some more logic fixes	3 months ago
Vsevolod Stakhov	dc23bd1b20	[Minor] More cleanups for compression stuff	3 months ago
Vsevolod Stakhov	c6c53357b8	[Fix] Fix end-to-end proxy compression Issue: #5561	3 months ago
Vsevolod Stakhov	d5fd71dfce	Merge pull request #5547 from rspamd/vstakhov-multi-class-bayes [Project] Multi-class classification	3 months ago
Vsevolod Stakhov	ff840f96a0	Merge pull request #5559 from rspamd/vstakhov-arc-fixes [Fix] Fix whitelist options in the arc module	3 months ago
Vsevolod Stakhov	3789ff947a	Merge pull request #5556 from rspamd/vstakhov-skip-hashes-fuzzy [Fix] Check skip_hashes for the returned hashes	3 months ago
Vsevolod Stakhov	338f9ca1f5	[Fix] Fix whitelist options in the arc module Issue: #5558	3 months ago
Vsevolod Stakhov	23ed80bf78	[Fix] Check skip_hashes for the returned hashes	3 months ago
Vsevolod Stakhov	af5e83da54	Merge pull request #5555 from heptalium/meissner-fix-dmarc-reports Use Redis write servers for write commands while generating DMARC reports	3 months ago
Jens Meißner	55990170a3	Use Redis write servers for write commands while generating DMARC reports.	3 months ago
Vsevolod Stakhov	a22fbdc1ae	[Fix] Use a more straightforward approach for learn cache	3 months ago
Vsevolod Stakhov	44ee3d8b0a	[Fix] Fix various corner cases and tests	3 months ago
Vsevolod Stakhov	496d57d63b	[Test] Add logic to match test id and logs id	3 months ago
Vsevolod Stakhov	533b9f676e	[Minor] Add --log-tag option for rspamc	3 months ago
Vsevolod Stakhov	e3e85f617f	[Minor] Fix single class fallback	3 months ago
Vsevolod Stakhov	e4a78fdab2	[Project] Apply changes to bayes_expiry plugin	3 months ago
Vsevolod Stakhov	6c6056c895	[Test] Some more adjustments to the tests	3 months ago
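
Commit 60e1b843b6 describes concatenating provider vectors with optional weights and then applying `none`, `unit`, or `zscore` normalization, with statistics computed during training and reused at inference. The Python sketch below illustrates that idea under stated assumptions; it is not the rspamd implementation (which is Lua, in `lualib/plugins/neural.lua`). The function names `fuse`, `train_stats`, and `normalize` are hypothetical, `unit` is assumed to mean L2 (unit-length) scaling, z-score uses the population standard deviation, and normalization is assumed to apply to the whole fused vector rather than per provider.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical provider output: (provider name, feature vector, weight).
ProviderOutput = Tuple[str, Sequence[float], float]


def fuse(outputs: Sequence[ProviderOutput]) -> List[float]:
    """Concatenate provider vectors in order, scaling each by its weight."""
    fused: List[float] = []
    for _name, vec, weight in outputs:
        fused.extend(weight * x for x in vec)
    return fused


def train_stats(samples: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-dimension mean and population std over fused training vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    std = [
        math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / n)
        for i in range(dim)
    ]
    return {"mean": mean, "std": std}


def normalize(
    vec: Sequence[float],
    mode: str,
    stats: Optional[Dict[str, List[float]]] = None,
    eps: float = 1e-9,
) -> List[float]:
    """Apply 'none', 'unit' (assumed L2) or 'zscore' (trained stats)."""
    if mode == "none":
        return list(vec)
    if mode == "unit":
        norm = math.sqrt(sum(x * x for x in vec))
        # Leave an all-zero vector unchanged instead of dividing by zero.
        return [x / norm for x in vec] if norm > eps else list(vec)
    if mode == "zscore":
        if stats is None:
            raise ValueError("zscore normalization requires trained stats")
        # Dimensions that were constant during training map to 0.0.
        return [
            (x - m) / s if s > eps else 0.0
            for x, m, s in zip(vec, stats["mean"], stats["std"])
        ]
    raise ValueError(f"unknown normalization mode: {mode}")
```

For example, fusing a symbols vector with weight 0.5 and an LLM vector with weight 1.0 yields one longer vector; statistics from the training set are stored once and reused for every inference call, which is what keeps train and infer behavior consistent.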

22254 Commits (8c59bd7c2fa08b3445ce334a234955ecf65ad2fd)