📊 Data & Models
This section covers how behavioral data is used to train machine learning models in bezpieczenstwoPwr, as well as how those models are integrated and run in the browser.
📁 Data Source
The machine learning models are trained using anonymized user behavior data, primarily from:
- Kaggle datasets related to mouse movement, bot detection, or human-computer interaction
- Custom-collected data from real users and simulation tools (e.g., puppeteer)
Typical features include:
- Mouse speed and acceleration
- Click patterns and hesitation times
- Pointer trajectory shapes
- Keystroke intervals and input dynamics
🧠 All data is anonymized before any storage or model training. No personally identifiable information is collected.
🛠 Data Preprocessing
Before feeding data into the model, we perform several preprocessing steps:
- Normalization of coordinates and timing
- Sequence padding or trimming to ensure consistent input shape
- Feature extraction such as velocity, direction changes, or dwell time
- Noise filtering to remove accidental or edge-case inputs
You can find preprocessing code inside the training pipeline scripts (not yet published, ask us if you want early access).
🧠 Model Architecture
Models are designed for lightweight, in-browser inference using TensorFlow.js.
We experimented with:
- LSTM / GRU networks for time-sequence analysis
- 1D CNNs for spatial-temporal patterns
- Ensemble methods trained offline and distilled into TensorFlow.js format
The final model is selected based on size, accuracy, and inference speed in the browser.
⚠️ Models are kept under
public/models/for now. Expect a proper versioning system and CDN-hosted models in future updates.
📦 Loading Models
The bbotd package automatically loads a model from a given URL.