Vsevolod Stakhov
ecc587a3df
[Project] Add tests and fix stuff
2 months ago
Vsevolod Stakhov
346ec89803
[Feature] Allow selectors in regexp maps expressions
2 months ago
Vsevolod Stakhov
e8b90e0e10
[Minor] Update plugins that are using headers modifications
2 months ago
Vsevolod Stakhov
9848f16510
[Project] Implement backoff for upstreams revival
2 months ago
Vsevolod Stakhov
c132ade98b
[Project] Start to implement better revive strategy for upstreams
2 months ago
Vsevolod Stakhov
c0a16d53b0
[Minor] Optimise re-resolving for known IPs
2 months ago
Vsevolod Stakhov
3ffc665243
[Feature] Resolve DNS nameservers names using getaddrinfo
2 months ago
Vsevolod Stakhov
01a1032107
[Fix] Fix l= calculations again
3 months ago
Vsevolod Stakhov
2f52cdf2e1
Fix DKIM relaxed body canonicalization and optimize performance
This PR addresses critical issues in DKIM relaxed body canonicalization and modernizes the codebase by replacing GLib types with standard C types.
- **RFC Compliance**: Fixed incorrect canonicalization of lines containing only whitespace. Previously, such lines were not properly handled according to RFC 6376, which could lead to DKIM signature verification failures.
- **Memory Safety**: Fixed incorrect pointer dereference in `rspamd_dkim_skip_empty_lines` that could cause undefined behavior.
- **Zero-copy Optimization**: Reimplemented `rspamd_dkim_relaxed_body_step` to avoid unnecessary memory copies. The new implementation:
  - Processes input data directly without intermediate buffers
  - Reduces the number of `EVP_DigestUpdate` calls by processing larger chunks
  - Improves CPU cache efficiency
  - Results in significantly better performance for large email bodies
- Replaced all GLib types with standard C equivalents:
  - `gsize` → `size_t`
  - `gssize` → `ssize_t`
  - `gboolean` → `bool`
  - `TRUE/FALSE` → `true/false`
  - And other GLib-specific types
- Added necessary standard headers (`stdbool.h`, `stdint.h`, `limits.h`)
- Added comprehensive debug logging for:
  - Chunk processing with size information
  - Empty line detection and skipping
  - Space collapsing operations
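
For reference, the relaxed body canonicalization rules from RFC 6376 (section 3.4.4) that the fix above targets can be sketched as follows. This is a minimal illustration in Python, not the zero-copy C implementation from this PR; the function name `relaxed_body` is hypothetical, and the sketch assumes the body is already split into CRLF-terminated lines.

```python
import re


def relaxed_body(body: bytes) -> bytes:
    """Illustrative RFC 6376 relaxed body canonicalization (not rspamd code)."""
    canonical = []
    for line in body.split(b"\r\n"):
        # Collapse every run of spaces/tabs into a single space.
        line = re.sub(rb"[ \t]+", b" ", line)
        # Drop whitespace at the end of the line; a whitespace-only
        # line therefore becomes an empty line.
        canonical.append(line.rstrip(b" "))
    # Ignore all empty lines at the end of the body.
    while canonical and canonical[-1] == b"":
        canonical.pop()
    if not canonical:
        return b""
    # A non-empty canonical body always ends with exactly one CRLF.
    return b"\r\n".join(canonical) + b"\r\n"
```

The case relevant to this fix is a line containing only whitespace: it is reduced to an empty line first, so a trailing run of such lines is dropped together with genuinely empty ones.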
Issue: #5590
3 months ago
Vsevolod Stakhov
69db7c992f
[Project] Add tests for LLM provider, fix various issues with metatokens
3 months ago
Vsevolod Stakhov
597ae4da82
[Project] Rework rspamc to allow training of different neural types
3 months ago
Vsevolod Stakhov
e6fdc3b42f
[Fix] GPT: Fix occasional damage
3 months ago
Vsevolod Stakhov
8c59bd7c2f
[Minor] Move common stuff to a separate function
3 months ago
Vsevolod Stakhov
f60a55f6c5
[Minor] Don't use coroutines
3 months ago
Vsevolod Stakhov
60e1b843b6
Neural module rework: provider-based feature fusion, LLM embeddings, normalization, and v3 schema
This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.
- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Ensure consistent train/infer behavior with stored normalization and provider metadata.
- Improve operability with caching, digest checks, and safer rollouts.
- Provider architecture
  - Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
  - New LLM provider: `lualib/plugins/neural/providers/llm.lua` using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
  - Symbols provider extracted: `lualib/plugins/neural/providers/symbols.lua`.
- Normalization and PCA
  - Configurable fusion normalization: none/unit/zscore.
  - Trained normalization stats computed during training and applied at inference.
  - Existing global PCA preserved; loaded/saved alongside ANN.
- Schema and compatibility
  - `plugin_ver` bumped to '3' to isolate from earlier profiles.
  - Redis save/load extended:
    - Profiles include `providers_digest`.
    - ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
    - ANN load validates provider digest and skips apply on mismatch.
- Performance and reliability
  - LLM embeddings cached in Redis (content+model keyed).
  - Graceful fallback to symbols if providers not configured or fail.
  - Basic provider configuration validation.
- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.
- Enable LLM alongside symbols:
```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings";
          cache_ttl = 86400; weight = 1.0; }
      ];
      fusion { normalization = "zscore"; }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```
- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.
- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.
- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.
- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no content leaves unless configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.
- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.
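
The weighted fusion and the `none`/`unit`/`zscore` normalization described in this PR can be illustrated roughly as below. This is a sketch in Python rather than the module's Lua; the helper names (`fuse`, `fit_zscore`, `normalize`), the layout of the stats dictionary, and the choice to apply weights before normalization are all assumptions made for illustration, not details taken from `neural.lua`.

```python
import math


def fuse(vectors, weights=None):
    """Concatenate per-provider vectors, scaling each by its weight."""
    weights = weights or [1.0] * len(vectors)
    fused = []
    for vec, weight in zip(vectors, weights):
        fused.extend(weight * x for x in vec)
    return fused


def fit_zscore(rows):
    """Compute per-dimension mean/std over training rows (the trained stats)."""
    n, dims = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(dims)]
    std = [math.sqrt(sum((r[i] - mean[i]) ** 2 for r in rows) / n)
           for i in range(dims)]
    return {"mean": mean, "std": std}


def normalize(vec, mode="none", stats=None):
    """Apply the configured normalization; zscore needs stats from training."""
    if mode == "none":
        return list(vec)
    if mode == "unit":
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)
    if mode == "zscore":
        return [(x - m) / s if s > 0 else 0.0
                for x, m, s in zip(vec, stats["mean"], stats["std"])]
    raise ValueError("unknown normalization mode: " + mode)
```

Storing the output of `fit_zscore` next to the trained network is what keeps training and inference consistent: inference reuses the stored mean and standard deviation instead of recomputing them from a single message.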
- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`
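
The `providers_digest` check mentioned in this PR (a stored profile is not applied when the configured provider set differs) might look roughly like the sketch below. The hash function, the set of fields that feed the digest, and both function names are assumptions for illustration; the PR text does not specify them.

```python
import hashlib
import json

# Hypothetical: fields assumed to change the feature layout of a provider.
DIGEST_FIELDS = ("type", "model", "weight")


def providers_digest(providers):
    """Stable digest over the provider settings assumed to affect features."""
    reduced = [{k: p[k] for k in DIGEST_FIELDS if k in p} for p in providers]
    canonical = json.dumps(reduced, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_apply(profile, providers):
    """Apply a stored ANN only if it was trained with the same provider set."""
    return profile.get("providers_digest") == providers_digest(providers)
```

A profile trained with one embedding model is then skipped after the `model` setting changes, because the input vector no longer means what the network was trained on.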
3 months ago
René Draaisma
c80d6b3cfd
Updated gpt.lua: set gpt-5-mini as the default model; fix the empty reason field returned when GPT max_completion_tokens is exceeded; set the default symbol group to GPT; make the group configurable via extra_symbols in settings; default the score to 0 when extra_symbols defines no score
3 months ago
René Draaisma
89e55435bf
Updated gpt.lua to provide model parameters with the settings
3 months ago
hunter-nl
19679a3664
Update gpt.lua to make use of lua_util.deepcopy function.
3 months ago
Vsevolod Stakhov
65b52ce843
[Fix] Bayes: Try to be bug-to-bug compatible
3 months ago
hunter-nl
d795beb024
Update gpt.lua to fix spaces on empty lines
Fixes the luacheck warning "line contains only whitespace".
3 months ago
hunter-nl
588e74931c
Update gpt.lua to get fresh body for each model iteration
3 months ago
hunter-nl
a05810040f
Update gpt.lua to remove unneeded body_base.model
Not needed in the body_base structure.
3 months ago
hunter-nl
ba7df736e4
Update gpt.lua to handle OpenAI parallel old and new models
When multiple models (old and new) are specified in rspamd_config, the required attributes are now set correctly for each model request.
3 months ago
hunter-nl
5efcf514b8
Update gpt.lua to support newer models without temperature attribute
Newer models no longer support the temperature attribute.
3 months ago
hunter-nl
d09b5d24fd
Update gpt.lua to support newer models with max_completion_tokens
Newer and reasoning models require the max_completion_tokens attribute instead of max_tokens.
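
Taken together with the neighbouring gpt.lua commits (a fresh copy of the body per model, no `temperature` for newer models), the per-model request handling can be sketched as follows. This is an illustration in Python, not the Lua code from gpt.lua; `build_request_body` and the caller-supplied `newer_models` set are hypothetical, and which models count as "newer" is deliberately left to the caller.

```python
import copy


def build_request_body(base, model, max_tokens, temperature, newer_models):
    """Build one request body per model from a shared template."""
    # Deep-copy the template so changes for one model never leak into the next.
    body = copy.deepcopy(base)
    body["model"] = model
    if model in newer_models:
        # Newer/reasoning models: different token limit key, no temperature.
        body["max_completion_tokens"] = max_tokens
    else:
        body["max_tokens"] = max_tokens
        body["temperature"] = temperature
    return body
```

Calling this once per configured model yields independent bodies, so old and new models can be queried in parallel from the same template.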
3 months ago
Andrew Lewis
b4e72dd243
[Minor] Drop overzealous regex from hfilter
3 months ago
Vsevolod Stakhov
b6a3d5c9a6
[Minor] Add specific calculations for binary classification case
3 months ago
Andrew Lewis
558e5cfa86
[Minor] Fix implicit declaration
3 months ago
Vsevolod Stakhov
d0c2a24ddc
[Fix] Try to fix learned order
3 months ago
Vsevolod Stakhov
9a713d0607
[Fix] Fix double free in the client...
3 months ago
Vsevolod Stakhov
aa34dd8ad0
[Minor] Fix 'Compression' header logic
3 months ago
Vsevolod Stakhov
81417aeec8
[Minor] Some more logic fixes
3 months ago
Vsevolod Stakhov
dc23bd1b20
[Minor] More cleanups for compression stuff
3 months ago
Vsevolod Stakhov
c6c53357b8
[Fix] Fix end-to-end proxy compression
Issue: #5561
3 months ago
Vsevolod Stakhov
338f9ca1f5
[Fix] Fix whitelist options in the arc module
Issue: #5558
4 months ago
Vsevolod Stakhov
23ed80bf78
[Fix] Check skip_hashes for the returned hashes
4 months ago
Vsevolod Stakhov
a22fbdc1ae
[Fix] Use a more straightforward approach for learn cache
4 months ago
Vsevolod Stakhov
44ee3d8b0a
[Fix] Fix various corner cases and tests
4 months ago
Vsevolod Stakhov
533b9f676e
[Minor] Add --log-tag option for rspamc
4 months ago
Vsevolod Stakhov
e3e85f617f
[Minor] Fix single class fallback
4 months ago
Vsevolod Stakhov
e4a78fdab2
[Project] Apply changes to bayes_expiry plugin
4 months ago
Vsevolod Stakhov
bfdd04e653
[Project] Fix unlearn stuff
4 months ago
Vsevolod Stakhov
7c2fd27461
[Project] Fix more calculation issues
4 months ago
Vsevolod Stakhov
e013a3bdc5
[Feature] Add some convenience options to rspamc
4 months ago
Vsevolod Stakhov
3428e63e3a
[Minor] Further adjustments
4 months ago
Vsevolod Stakhov
16985fecee
[Fix] Fix statfiles ordering
4 months ago
Vsevolod Stakhov
94e0e6a533
[Fix] Fix probabilities overflow
4 months ago
Vsevolod Stakhov
c0eca31a3a
[Minor] Fix stupid change to call Redis for each class
4 months ago
Vsevolod Stakhov
a9fbcf9d49
[Project] Fix various issues
4 months ago
Vsevolod Stakhov
e989261050
[Minor] Reduce debug level verbosity
4 months ago