Methodology

CreativeDynamics Library v0.9.8.1

The library applies techniques from rough path theory to time-series data. The core idea is to compute the mathematical signature of a data path over sliding windows and to measure the distance between consecutive signatures in order to detect change points.

Rough path signatures

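A rough path signature is a mathematical object that captures the geometric features of a path (here, a time series) through a hierarchy of iterated integrals; the library works with its log-signature form, expressed as Lie increments. It provides a rich, non-linear summary of how the path evolves and serves as a feature extraction tool.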

Key properties:

Invariant to re-parameterisation: Depends on the geometric shape of the path, not on traversal speed
Faithful representation: Under mild conditions, uniquely determines the path up to tree-like equivalence
Universal approximation: Linear functionals of the truncated signature can approximate any continuous function on a compact set of paths

Rough path essentials (formal)

This section summarises the minimum rough path notation needed to understand the library. It follows the notation and framing used in the accompanying paper (arXiv-2509.09758v3/main.tex).

Paths

Let $X : [a,b] \to \mathbb{R}^m$ be a continuous path of bounded variation. In our application, we typically use $m=2$ and embed a time series $\{y_t\}$ as the path $X_t := (t, y_t)$, then normalise both the time and the metric values to $[0,1]$ within each analysis window.

Signature (iterated integrals)

The signature of $X$ over $[a,b]$ is the sequence of iterated integrals

$$
S(X)_{a,b} := \left(1, S^1(X)_{a,b}, S^2(X)_{a,b}, \dots\right),
$$

where, for $k \ge 1$,

$$
S^k(X)_{a,b} = \int_{a < t_1 < \cdots < t_k < b} dX_{t_1} \otimes \cdots \otimes dX_{t_k}.
$$

In practice we truncate at depth $d$ to obtain a finite-dimensional feature vector.
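As a worked example (a standard fact, stated here for orientation and not taken from the library's code), consider a single straight-line segment with increment $\Delta = X_b - X_a$. Its iterated integrals reduce to scaled tensor powers of the increment:

$$
S^k(X)_{a,b} = \frac{\Delta^{\otimes k}}{k!}, \qquad \text{so} \qquad S(X)_{a,b} = \left(1, \Delta, \tfrac{1}{2}\,\Delta \otimes \Delta, \tfrac{1}{6}\,\Delta^{\otimes 3}, \dots\right).
$$

Level $k$ has $m^k$ coordinates, so for $m=2$ and depth $d=4$ the truncated signature has $2 + 4 + 8 + 16 = 30$ coordinates beyond the leading constant.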

Concatenation (Chen identity)

If $a < c < b$, the signature satisfies a multiplicative property (Chen’s identity)

$$
S(X)_{a,b} = S(X)_{a,c} \otimes S(X)_{c,b},
$$

which is the algebraic reason signatures are useful for analysing local changes along a path.
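Chen's identity can be checked numerically at depth 2. The following is a minimal pure-Python sketch, not the library's implementation (which relies on roughpy): `signature_depth2` and `chen_product` are illustrative names, and the path is assumed to be piecewise linear through the given points.

```python
from typing import List, Sequence, Tuple

# (level 1 vector, level 2 matrix) of a depth-2 truncated signature
Signature2 = Tuple[List[float], List[List[float]]]


def signature_depth2(points: Sequence[Sequence[float]]) -> Signature2:
    """Levels 1 and 2 of the signature of the piecewise-linear path through `points`."""
    m = len(points[0])
    start = points[0]
    level1 = [points[-1][i] - start[i] for i in range(m)]
    level2 = [[0.0] * m for _ in range(m)]
    for p, q in zip(points, points[1:]):
        delta = [q[i] - p[i] for i in range(m)]
        for i in range(m):
            for j in range(m):
                # Exact integral of (X^i - X^i_a) dX^j over one linear segment.
                level2[i][j] += (p[i] - start[i]) * delta[j] + 0.5 * delta[i] * delta[j]
    return level1, level2


def chen_product(left: Signature2, right: Signature2) -> Signature2:
    """Tensor product of two depth-2 signatures, truncated at level 2."""
    a1, a2 = left
    b1, b2 = right
    m = len(a1)
    c1 = [a1[i] + b1[i] for i in range(m)]
    c2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(m)] for i in range(m)]
    return c1, c2


# Split a short path at its third point and recombine the two halves.
points = [(0.0, 0.0), (0.5, 0.2), (1.0, 1.0), (1.5, 0.4)]
whole = signature_depth2(points)
combined = chen_product(signature_depth2(points[:3]), signature_depth2(points[2:]))
```

Here `whole` and `combined` agree up to floating-point rounding, which is the depth-2 statement of the identity: the level-2 term of the full path is the sum of the two partial level-2 terms plus the cross term formed from their level-1 increments.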

Log-signatures (Lie increments)

The library computes log-signatures (Lie increments) for efficiency. Conceptually, this amounts to working in the free Lie algebra while preserving the geometric information of the truncated signature.

Mathematical foundation:

The library implements signatures using Lie increments rather than full tensor products, for computational efficiency whilst maintaining mathematical rigour. For an $m$-dimensional path $X:[0,T] \to \mathbb{R}^m$, the log-signature truncated at depth $d$ is an element of the free Lie algebra that carries the same information as the signature truncated at depth $d$.
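To give a sense of the saving, the number of coordinates in a truncated log-signature can be counted with Witt's formula for the dimensions of the free Lie algebra. The sketch below is illustrative only: the function names are not part of the library, and it counts coordinates without computing any signature.

```python
def mobius(n: int) -> int:
    """Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def lie_level_dim(m: int, k: int) -> int:
    """Witt's formula: dimension of level k of the free Lie algebra on m generators."""
    total = sum(mobius(d) * m ** (k // d) for d in range(1, k + 1) if k % d == 0)
    return total // k


def logsig_dim(m: int, depth: int) -> int:
    """Number of coordinates in a log-signature truncated at `depth`."""
    return sum(lie_level_dim(m, k) for k in range(1, depth + 1))


def sig_dim(m: int, depth: int) -> int:
    """Number of coordinates in a truncated signature, excluding the leading 1."""
    return sum(m ** k for k in range(1, depth + 1))


# Two-dimensional path (time, metric) at the default depth of 4.
print(logsig_dim(2, 4), sig_dim(2, 4))
```

For the two-dimensional paths and default depth used here, the log-signature has 8 coordinates (2 + 1 + 2 + 3 across levels 1 to 4) against 30 for the truncated signature.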

Implementation details:

Uses roughpy library with Lie increment computation
Paths normalised to [0,1] interval before signature computation
Signature depth controls geometric detail level (default depth=4)
Computational complexity: O(T·d²) for fixed window size w

Signature calculation

Uses roughpy library for path signature calculation:

Accuracy: Well-tested library providing correct signature computations
Efficiency: Optimised C++ backend for performance
Standardisation: Standard, community-accepted tool

Primary module: creativedynamics.core.signature_calculator.

Path construction and normalisation

A fixed normalisation procedure ensures numerical stability and consistent signature computation:

Two-dimensional path construction: For each metric, constructs 2D path X(t) = (t_norm, y_norm) where:
- t_norm ∈ [0,1]: normalised time coordinate
- y_norm ∈ [0,1]: normalised metric value
Normalisation procedure:
```
t_norm = (t - t_min) / (t_max - t_min)
y_norm = (y - y_min) / (y_max - y_min + ε)
```
where ε = 10^-8 prevents division by zero for constant metrics.
Signature parameters:
- Depth: Truncation level of the Lie increments (default=4)
- Window Size (w): Consecutive data points per window (default=7)
- Sliding step: Windows slide by one time point for detailed analysis

Normalisation ensures signatures from different time periods and metrics are comparable, essential for distance-based change point detection.
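The path construction above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code in creativedynamics.core.signature_calculator: the names `normalise_window` and `sliding_windows` are hypothetical, and time stamps within a window are assumed to be distinct so that `t_max - t_min` is non-zero.

```python
from typing import Iterator, List, Sequence, Tuple

EPS = 1e-8  # guards against division by zero for constant metrics


def normalise_window(
    times: Sequence[float], values: Sequence[float], eps: float = EPS
) -> List[Tuple[float, float]]:
    """Build the 2D path (t_norm, y_norm) for one analysis window."""
    t_min, t_max = min(times), max(times)
    y_min, y_max = min(values), max(values)
    t_norm = [(t - t_min) / (t_max - t_min) for t in times]
    y_norm = [(y - y_min) / (y_max - y_min + eps) for y in values]
    return list(zip(t_norm, y_norm))


def sliding_windows(
    times: Sequence[float], values: Sequence[float], w: int = 7
) -> Iterator[List[Tuple[float, float]]]:
    """Yield normalised paths for windows of w points, sliding by one point."""
    for start in range(len(values) - w + 1):
        yield normalise_window(times[start:start + w], values[start:start + w])


# Ten daily observations give 10 - 7 + 1 = 4 windows.
days = list(range(10))
clicks = [12.0, 15.0, 11.0, 14.0, 18.0, 17.0, 16.0, 9.0, 8.0, 7.0]
paths = list(sliding_windows(days, clicks))
```

Because each window is rescaled independently, a constant metric maps to `y_norm = 0` throughout, and the largest value in a window maps to slightly less than 1 because of ε.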

Signature distance and change point detection

Sliding window approach detects changes in time series patterns:

Window-based signature computation: For time series of length T, computes signatures for overlapping windows of size w.
Distance calculation: Euclidean distance between consecutive window signatures:
```
d_t = ||S_t - S_{t-1}||_2
```
where S_t is the signature of window t.
Statistical thresholding: Change points detected when distance exceeds:
```
threshold = μ_d + k·σ_d
```
where μ_d and σ_d are the mean and standard deviation of all distances and k is the threshold multiplier (default k=1.5).
Computational efficiency: Overall complexity O(T·d²) for fixed window size w, i.e. linear in the series length T, which suits real-time analysis.
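The distance and thresholding steps above can be sketched as follows. The per-window feature vectors are taken as given (in the library they would be log-signatures computed with roughpy), the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def signature_distances(features: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance d_t = ||S_t - S_{t-1}||_2 between consecutive windows."""
    return [math.dist(prev, curr) for prev, curr in zip(features, features[1:])]


def detect_change_points(distances: Sequence[float], k: float = 1.5) -> Tuple[List[int], float]:
    """Flag window indices t whose distance exceeds mu_d + k * sigma_d."""
    mu = statistics.fmean(distances)
    sigma = statistics.pstdev(distances)  # assumption: population standard deviation
    threshold = mu + k * sigma
    # distances[i] compares window i with window i + 1, so it is reported as t = i + 1.
    flagged = [i + 1 for i, d in enumerate(distances) if d > threshold]
    return flagged, threshold


# Toy feature vectors: six identical windows, then a jump to a different pattern.
features = [(0.0, 0.0)] * 6 + [(3.0, 4.0)] * 6
distances = signature_distances(features)
change_points, threshold = detect_change_points(distances)
```

In this toy input the only non-zero distance is the jump between the sixth and seventh windows, so it is the only index flagged. Note that the threshold is computed from the same distances it is applied to, so a single large jump also raises the threshold for every other window.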

Applications of signatures in the library

Primary built-in application within creativedynamics.core.analyzer module: change-point detection.

Four-phase analysis process

Detailed four-phase analysis pipeline:

Phase 1: Change point detection
- Computes sliding window signatures across time series
- Calculates signature distances between consecutive windows
- Identifies statistically significant change points using adaptive thresholding
- Output: List of change points segmenting time series
Phase 2: Segment analysis
- Divides time series into segments based on detected change points
- Computes segment statistics (mean, variance, trend)
- Classifies segment trends as “Stable”, “Improving”, or “Declining”
- Output: Characterised segments with trend classifications
Phase 3: Benchmark calculation
- Identifies longest stable or improving segment
- Computes benchmark values from optimal performance periods
- Validates benchmark reliability based on segment duration
- Output: Benchmark values for impact calculation
Phase 4: Impact quantification
- Calculates impact during declining periods
- Quantifies actual_overspend_gbp (financial inefficiency) and engagement_gap_clicks (operational impact)
- Provides correlation risk context; metrics are reported separately and not combined
- Output: Operational and financial impact of performance degradation (reported separately)

Implemented in creativedynamics.core.analyzer module with configurable parameters for each phase.
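Phases 2 and 3 can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the least-squares slope as the trend measure, the `tolerance` that separates "Stable" from a trend, and the segment mean as the benchmark value are not taken from creativedynamics.core.analyzer.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def split_segments(n: int, change_points: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open index ranges [lo, hi) between consecutive change points."""
    bounds = [0] + sorted(change_points) + [n]
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if hi > lo]


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def classify(values: Sequence[float], tolerance: float = 0.01) -> str:
    """Hypothetical rule: a higher metric is better; small slopes count as stable."""
    s = slope(values)
    if abs(s) <= tolerance:
        return "Stable"
    return "Improving" if s > 0 else "Declining"


def benchmark(values: Sequence[float], change_points: Sequence[int]) -> Optional[Tuple[int, int, float]]:
    """Longest stable or improving segment and its mean, or None if all decline."""
    candidates = [
        (hi - lo, lo, hi)
        for lo, hi in split_segments(len(values), change_points)
        if classify(values[lo:hi]) != "Declining"
    ]
    if not candidates:
        return None
    _, lo, hi = max(candidates)
    return lo, hi, statistics.fmean(values[lo:hi])


# A flat period followed by a decline, with one detected change point at index 5.
series = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4]
result = benchmark(series, [5])
```

With this input the first segment is classified as stable and the second as declining, so the benchmark is drawn from the first five observations.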

Visual representation

Visual reports for change-point analysis include:

Upper chart: Original time-series metric(s)
Lower chart: Calculated signature distances over time with significance threshold line and vertical markers for detected change points

Theoretical properties and advantages

Signature-based approach provides theoretical guarantees and practical advantages:

Theoretical properties:

Consistency: Change point detection is statistically consistent under mild conditions
Convergence: Signature distances converge to true pattern distance as window size increases
Invariance: With time normalised per window, detection is invariant to shifts and uniform rescalings of the time axis

Practical advantages:

Early detection: Captures subtle pattern changes before manifesting in aggregate metrics
Non-linearity: Naturally handles non-linear dynamics and complex interactions
Robustness: Resistant to outliers due to integral-based computation
Interpretability: Signature distances have clear geometric interpretation

Performance characteristics:

Precision-recall trade-off: Controlled by threshold multiplier k
Default settings: k=1.5 provides balanced precision (~0.7) and recall (~0.6)
Computational efficiency: Linear in time series length for fixed window size

Multi-dimensionality

Multi-dimensionality enters the method in two ways:

Path dimensionality: Input data is often multi-dimensional (e.g., time, metric A, metric B)
Signature dimensionality: Signature is a high-dimensional vector (or tensor), where each term captures different aspects of path geometry

This multi-dimensional approach allows a more detailed characterisation of a time series than methods that analyse each metric in isolation or consider only simple trends.

Data preparation and column naming conventions

The library converts all column names to lowercase throughout the processing pipeline, which eliminates case-sensitivity issues and simplifies the codebase:

Column names transformed to lowercase immediately after CSV ingestion
Only lowercase column names used throughout entire analysis pipeline
Standard names include day, link_clicks, amount_spent_gbp, impressions, cpc, and ctr
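The lowercase convention can be sketched with the standard csv module. This is an illustration only: `read_lowercase_rows` is a hypothetical helper, and the library's own ingestion code may differ (for example, it may use a data-frame library).

```python
import csv
import io
from typing import Dict, List, TextIO


def read_lowercase_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows and lowercase every column name immediately after ingestion."""
    return [
        {name.lower(): value for name, value in row.items()}
        for row in csv.DictReader(handle)
    ]


# Mixed-case headers as they might arrive in an export file.
sample = "Day,Link_Clicks,Amount_Spent_GBP\n2024-01-01,12,3.50\n"
rows = read_lowercase_rows(io.StringIO(sample))
```

After this step, downstream code can refer to `rows[0]["link_clicks"]` regardless of how the source file capitalised its headers.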

Library entry points

Two primary entry points for analysis:

CLI entry point (cli.py): Recommended for production use. Uses YAML configuration files with nested column mapping structure for flexible and maintainable configuration.
Script entry point (run_analysis.py): Alternative entry point using flat JSON mapping files. Maintained for backward compatibility but may be deprecated in future versions.

For new implementations, CLI entry point with YAML configuration is the standard approach.