Vsevolod Stakhov
8c59bd7c2f
[Minor] Move common stuff to a separate function
3 months ago
Vsevolod Stakhov
f60a55f6c5
[Minor] Don't use coroutines
3 months ago
Vsevolod Stakhov
60e1b843b6
Neural module rework: provider-based feature fusion, LLM embeddings, normalization, and v3 schema
This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.
- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Training and inference must behave consistently, which requires stored normalization stats and provider metadata.
- Operability should improve through caching, digest checks, and safer rollouts.
- Provider architecture
- Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
- New LLM provider: `lualib/plugins/neural/providers/llm.lua` using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
- Symbols provider extracted: `lualib/plugins/neural/providers/symbols.lua`.
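
As a rough illustration of the fusion step, the sketch below concatenates weighted provider vectors in order. It is written in Python for brevity and is not the module's Lua code: `fuse_features`, the provider callables, and the rule of skipping a provider that returns `None` are all assumptions made for this example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical provider shape: takes a task, returns a vector or None on failure.
Provider = Callable[[dict], Optional[Sequence[float]]]


def fuse_features(task: dict,
                  providers: List[Tuple[Provider, float]]) -> List[float]:
    """Concatenate provider vectors in order, scaling each by its weight."""
    fused: List[float] = []
    for provider, weight in providers:
        vec = provider(task)
        if vec is None:
            # Assumed fallback: a failed provider contributes nothing.
            continue
        fused.extend(weight * x for x in vec)
    return fused


if __name__ == "__main__":
    symbols = lambda task: [1.0, 0.0]   # stand-in for the symbols provider
    llm = lambda task: [0.5, 0.25]      # stand-in for an LLM embedding
    print(fuse_features({}, [(symbols, 0.5), (llm, 1.0)]))
```

Skipping a failed provider changes the length of the fused vector; how the real module keeps input dimensions aligned in that case is not described in this PR text.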
- Normalization and PCA
- Configurable fusion normalization: none/unit/zscore.
- Trained normalization stats computed during training and applied at inference.
- Existing global PCA preserved; loaded/saved alongside ANN.
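
The three normalization modes can be sketched as follows. This is a Python illustration under assumptions, not the module's implementation: it assumes z-score statistics are per-dimension means and standard deviations computed over the training set, and the names `fit_zscore` and `normalize` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def fit_zscore(rows: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Compute per-dimension mean and (population) std over training rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(dim)]
    stds = [math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / n)
            for i in range(dim)]
    return {"mean": means, "std": stds}


def normalize(vec: Sequence[float], mode: str,
              stats: Optional[Dict[str, List[float]]] = None,
              eps: float = 1e-9) -> List[float]:
    """Apply 'none', 'unit' (L2 norm) or 'zscore' (trained stats)."""
    if mode == "none":
        return list(vec)
    if mode == "unit":
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)
    if mode == "zscore":
        if stats is None:
            raise ValueError("zscore requires stats computed during training")
        # Constant dimensions (std ~ 0) are only centered, not scaled.
        return [(x - m) / (s if s > eps else 1.0)
                for x, m, s in zip(vec, stats["mean"], stats["std"])]
    raise ValueError(f"unknown normalization mode: {mode}")
```

The point of storing the statistics with the model is that inference reuses exactly the values seen in training instead of recomputing them per message.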
- Schema and compatibility
- `plugin_ver` bumped to '3' to isolate from earlier profiles.
- Redis save/load extended:
- Profiles include `providers_digest`.
- ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
- ANN load validates provider digest and skips apply on mismatch.
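
One way to picture the digest check is below. The sketch is Python and purely illustrative: the hashing scheme (SHA-256 over canonical JSON) and the helper names `providers_digest` and `should_apply_ann` are assumptions, not the format the module actually writes to Redis.

```python
import hashlib
import json
from typing import Any, Dict, List


def providers_digest(providers: List[Dict[str, Any]]) -> str:
    """Hash the provider list; key order inside an entry does not matter,
    but the order of entries does, since it fixes the fused vector layout."""
    canonical = json.dumps(providers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_apply_ann(stored_profile: Dict[str, Any],
                     current_providers: List[Dict[str, Any]]) -> bool:
    """Apply a loaded ANN only if it was trained with the same provider set."""
    return stored_profile.get("providers_digest") == providers_digest(current_providers)
```

A model trained on one provider set expects a specific input layout, so skipping the apply on mismatch avoids feeding it vectors it was never trained on.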
- Performance and reliability
- LLM embeddings cached in Redis (content+model keyed).
- Graceful fallback to symbols if providers not configured or fail.
- Basic provider configuration validation.
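
A content+model keyed cache might look roughly like this. The sketch is Python with a plain dict standing in for Redis; the key layout, the `neural_llm` prefix, and the `CachedEmbedder` class are hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher: performs the HTTP request, returns None on failure.
Fetch = Callable[[bytes, str], Optional[List[float]]]


def embedding_cache_key(content: bytes, model: str,
                        prefix: str = "neural_llm") -> str:
    """Derive a key from both model name and message content."""
    h = hashlib.sha256()
    h.update(model.encode("utf-8"))
    h.update(b"\0")  # separator so model/content boundaries stay unambiguous
    h.update(content)
    return f"{prefix}:{h.hexdigest()}"


class CachedEmbedder:
    def __init__(self, fetch: Fetch,
                 cache: Optional[Dict[str, List[float]]] = None):
        self.fetch = fetch
        self.cache = {} if cache is None else cache  # dict stands in for Redis

    def embed(self, content: bytes, model: str) -> Optional[List[float]]:
        key = embedding_cache_key(content, model)
        if key in self.cache:
            return self.cache[key]
        vec = self.fetch(content, model)
        if vec is not None:
            # Failures are not cached, so a later call can retry.
            self.cache[key] = vec
        return vec
```

Including the model in the key means that switching embedding models never serves vectors produced by a different model.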
- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.
- Enable LLM alongside symbols:
```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings";
          cache_ttl = 86400; weight = 1.0; }
      ];
      fusion { normalization = "zscore"; }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```
- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.
- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.
- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.
- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no message content is sent to an external endpoint unless an LLM provider is configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.
- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.
- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`
3 months ago
Vsevolod Stakhov
72b9261537
Merge pull request #5572 from hunter-nl/master
Update GPT plugin to support OpenAI GPT-5 and other newer models
3 months ago
René Draaisma
1ee8c119d9
Merge branch 'master' of https://github.com/hunter-nl/rspamd
3 months ago
René Draaisma
c80d6b3cfd
Updated gpt.lua: set gpt-5-mini as the default model; fix the empty reason field returned when GPT max_completion_tokens is exceeded; set the default symbol group to GPT; make the group configurable in settings via extra_symbols; default the score to 0 when extra_symbols defines none
3 months ago
hunter-nl
632cf52764
Merge branch 'master' into master
3 months ago
René Draaisma
89e55435bf
Updated gpt.lua to provide model parameters with the settings
3 months ago
hunter-nl
19679a3664
Update gpt.lua to make use of lua_util.deepcopy function.
3 months ago
Vsevolod Stakhov
65b52ce843
[Fix] Bayes: Try to be bug-to-bug compatible
3 months ago
Vsevolod Stakhov
873e89e0cb
Merge pull request #5574 from moisseev/e2e-playwright
[Test] Display browser version in HTML report and console
3 months ago
Vsevolod Stakhov
2f9bb6fe9c
Merge pull request #5570 from fatalbanana/be_kind_to_comcast
[Minor] Drop overzealous regex from hfilter
3 months ago
Vsevolod Stakhov
07126f5ea2
Merge branch 'master' into be_kind_to_comcast
3 months ago
Alexander Moisseev
100d74013a
[Minor] Enable multimap module in WebUI E2E workflow
3 months ago
Alexander Moisseev
cc0dd23046
[Test] Display browser version in HTML report and console
3 months ago
hunter-nl
d795beb024
Update gpt.lua to fix spaces on empty lines
To fix luacheck "line contains only whitespace"
3 months ago
hunter-nl
588e74931c
Update gpt.lua to get fresh body for each model iteration
3 months ago
hunter-nl
a05810040f
Update gpt.lua to remove unneeded body_base.model
Not needed in the body_base structure.
3 months ago
hunter-nl
ba7df736e4
Update gpt.lua to handle OpenAI parallel old and new models
When multiple models (old and new) are specified in rspamd_config, the required attributes are now set correctly for each model request.
3 months ago
hunter-nl
5efcf514b8
Update gpt.lua to support newer models without temperature attribute
Newer models no longer support the temperature attribute.
3 months ago
hunter-nl
d09b5d24fd
Update gpt.lua to support newer models with max_completion_tokens
Newer and reasoning models require the max_completion_tokens attribute instead of max_tokens.
3 months ago
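The gpt.lua commits above (per-model attributes, a fresh body for each model iteration, no temperature and max_completion_tokens for newer models) can be pictured with the following Python sketch. It is not the plugin's Lua code: `build_request_body` is a hypothetical helper, and the decision of whether a model counts as "newer" is passed in as a flag rather than guessed from the model name.

```python
import copy
from typing import Any, Dict


def build_request_body(body_base: Dict[str, Any], model: str,
                       max_tokens: int, temperature: float,
                       uses_new_params: bool) -> Dict[str, Any]:
    """Build one request body per model from a shared base.

    The base is deep-copied so that attributes set for one model
    never leak into the request for the next one.
    """
    body = copy.deepcopy(body_base)
    body["model"] = model
    if uses_new_params:
        # Newer/reasoning models: token limit under the new name, no temperature.
        body["max_completion_tokens"] = max_tokens
    else:
        body["max_tokens"] = max_tokens
        body["temperature"] = temperature
    return body
```

Copying the base for every model is what keeps a mixed list of old and new models from sending each other's attributes.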
Andrew Lewis
b4e72dd243
[Minor] Drop overzealous regex from hfilter
3 months ago
Vsevolod Stakhov
03c75e1c47
Merge pull request #5569 from moisseev/e2e-playwright
[Test] Add WebUI E2E workflow with Playwright
3 months ago
Alexander Moisseev
22046fed3f
[Test] Add WebUI E2E workflow with Playwright
Add a GitHub Actions workflow to run WebUI E2E tests
with Playwright on legacy and latest browser versions
against rspamd binaries built in the pipeline.
3 months ago
Vsevolod Stakhov
b6a3d5c9a6
[Minor] Add specific calculations for binary classification case
3 months ago
Vsevolod Stakhov
4591b921f4
Merge pull request #5566 from fatalbanana/el10rpm
[Minor] Build on EL10
3 months ago
Andrew Lewis
ac4c6ec421
[Minor] Use clang for build on EL10
3 months ago
Andrew Lewis
b57a57ec02
[Minor] Use embedded vectorscan on EL10
3 months ago
Andrew Lewis
558e5cfa86
[Minor] Fix implicit declaration
3 months ago
Vsevolod Stakhov
d0c2a24ddc
[Fix] Try to fix learned order
3 months ago
Vsevolod Stakhov
82f0d1eae7
Merge pull request #5562 from rspamd/vstakhov-proxy-compression
[Fix] Fix end-to-end proxy compression
3 months ago
Vsevolod Stakhov
9a713d0607
[Fix] Fix double free in the client...
3 months ago
Vsevolod Stakhov
aa34dd8ad0
[Minor] Fix 'Compression' header logic
3 months ago
Vsevolod Stakhov
81417aeec8
[Minor] Some more logic fixes
3 months ago
Vsevolod Stakhov
dc23bd1b20
[Minor] More cleanups for compression stuff
3 months ago
Vsevolod Stakhov
c6c53357b8
[Fix] Fix end-to-end proxy compression
Issue: #5561
3 months ago
Vsevolod Stakhov
d5fd71dfce
Merge pull request #5547 from rspamd/vstakhov-multi-class-bayes
[Project] Multi-class classification
3 months ago
Vsevolod Stakhov
ff840f96a0
Merge pull request #5559 from rspamd/vstakhov-arc-fixes
[Fix] Fix whitelist options in the arc module
3 months ago
Vsevolod Stakhov
3789ff947a
Merge pull request #5556 from rspamd/vstakhov-skip-hashes-fuzzy
[Fix] Check skip_hashes for the returned hashes
3 months ago
Vsevolod Stakhov
338f9ca1f5
[Fix] Fix whitelist options in the arc module
Issue: #5558
3 months ago
Vsevolod Stakhov
23ed80bf78
[Fix] Check skip_hashes for the returned hashes
3 months ago
Vsevolod Stakhov
af5e83da54
Merge pull request #5555 from heptalium/meissner-fix-dmarc-reports
Use Redis write servers for write commands while generating DMARC reports
3 months ago
Jens Meißner
55990170a3
Use Redis write servers for write commands while generating DMARC reports.
3 months ago
Vsevolod Stakhov
a22fbdc1ae
[Fix] Use a more straightforward approach for learn cache
3 months ago
Vsevolod Stakhov
44ee3d8b0a
[Fix] Fix various corner cases and tests
3 months ago
Vsevolod Stakhov
496d57d63b
[Test] Add logic to match test id and logs id
3 months ago
Vsevolod Stakhov
533b9f676e
[Minor] Add --log-tag option for rspamc
3 months ago
Vsevolod Stakhov
e3e85f617f
[Minor] Fix single class fallback
3 months ago
Vsevolod Stakhov
e4a78fdab2
[Project] Apply changes to bayes_expiry plugin
3 months ago
Vsevolod Stakhov
6c6056c895
[Test] Some more adjustments to the tests
3 months ago