
Methodology

CreativeDynamics Library v0.9.8.1

The library applies techniques from rough path theory to time-series data. The core idea is to compute the signature of a data path over sliding windows and to measure the distance between consecutive signatures; unusually large distances indicate change points.

Rough path signature: a mathematical object that captures the geometric features of a path (here, a time series) through a hierarchy of iterated integrals. It provides a rich, non-linear summary of how the path evolves, which makes it a powerful feature-extraction tool.

Key properties:

  • Invariant to re-parameterisation: Depends on the geometric shape of the path, not on traversal speed
  • Faithful representation: Under mild conditions, uniquely determines the path up to tree-like equivalence
  • Universal approximation: Linear functionals of the signature can approximate any continuous function on a compact set of paths arbitrarily well

This section summarises the minimum rough path notation needed to understand the library. It follows the notation and framing used in the accompanying paper (arXiv-2509.09758v3/main.tex).

Let $X : [a,b] \to \mathbb{R}^m$ be a continuous path of bounded variation. In our application, we typically use $m = 2$ and embed a time series $\{y_t\}$ as a path $X_t := (t, y_t)$, then normalise time and metric values to $[0,1]$ over each analysis window.

The signature of $X$ over $[a,b]$ is the sequence of iterated integrals

$$S(X)_{a,b} := \left(1, S^1(X)_{a,b}, S^2(X)_{a,b}, \dots\right),$$

where, for $k \ge 1$,

$$S^k(X)_{a,b} = \int_{a < t_1 < \cdots < t_k < b} dX_{t_1} \otimes \cdots \otimes dX_{t_k}.$$

In practice we truncate at depth $d$ to obtain a finite-dimensional feature vector.
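To make the definition concrete, the following is a minimal sketch that evaluates the first two signature levels of a piecewise-linear path in plain Python. It is a hand-rolled illustration only: the function name `signature_depth2` is hypothetical, and the library itself delegates this computation to roughpy.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `points`.

    Hypothetical helper for illustration; not part of the library's API.
    """
    m = len(points[0])
    level1 = [0.0] * m                       # S^1: total increment so far
    level2 = [[0.0] * m for _ in range(m)]   # S^2: iterated integrals S^{ij}
    for start, end in zip(points, points[1:]):
        delta = [e - s for s, e in zip(start, end)]
        for i in range(m):
            for j in range(m):
                # level1[i] is the increment accumulated before this segment;
                # the 0.5 term is the integral over the segment itself.
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(m):
            level1[i] += delta[i]
    return level1, level2


# A small (t, y) path: up to a peak, then back down.
path = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
level1, level2 = signature_depth2(path)
levy_area = 0.5 * (level2[0][1] - level2[1][0])
print(level1, level2, levy_area)
```

Level 1 is just the total increment of the path; level 2 additionally records the order in which the coordinates moved, which is what the antisymmetric part (the Lévy area) measures.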

If $a < c < b$, the signature satisfies a multiplicative property (Chen’s identity)

$$S(X)_{a,b} = S(X)_{a,c} \otimes S(X)_{c,b},$$

which is the algebraic reason signatures are useful for analysing local changes along a path.
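Chen’s identity can be checked numerically at depth 2. The sketch below is self-contained and uses hypothetical helper names (`signature_depth2`, `chen_product`); it assumes a piecewise-linear path and is not taken from the library.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of a piecewise-linear path (illustrative)."""
    m = len(points[0])
    level1 = [0.0] * m
    level2 = [[0.0] * m for _ in range(m)]
    for start, end in zip(points, points[1:]):
        delta = [e - s for s, e in zip(start, end)]
        for i in range(m):
            for j in range(m):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(m):
            level1[i] += delta[i]
    return level1, level2


def chen_product(sig_a, sig_b):
    """Tensor product of two signatures, truncated at depth 2."""
    (a1, a2), (b1, b2) = sig_a, sig_b
    m = len(a1)
    level1 = [a1[i] + b1[i] for i in range(m)]
    # The cross term a1[i] * b1[j] couples the two sub-intervals.
    level2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(m)]
              for i in range(m)]
    return level1, level2


path = [(0.0, 0.0), (0.25, 0.4), (0.5, 0.1), (0.75, 0.9), (1.0, 0.6)]
whole = signature_depth2(path)          # signature over [a, b]
left = signature_depth2(path[:3])       # signature over [a, c]
right = signature_depth2(path[2:])      # signature over [c, b]
combined = chen_product(left, right)

max_error = max(
    [abs(x - y) for x, y in zip(whole[0], combined[0])]
    + [abs(x - y) for rw, rc in zip(whole[1], combined[1]) for x, y in zip(rw, rc)]
)
print(max_error)
```

The two sub-paths share the point at index 2, mirroring the shared endpoint $c$ in the identity.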

The library computes log-signatures (Lie increments) for efficiency. Conceptually, this amounts to working in the free Lie algebra while preserving the geometric information of the truncated signature.

Mathematical foundation:

The library represents signatures by Lie increments rather than full tensor products, for computational efficiency whilst maintaining mathematical rigour. For an $m$-dimensional path $X : [0,T] \to \mathbb{R}^m$, the log-signature truncated at depth $d$ is an element of the free Lie algebra that captures the same information as the signature truncated at depth $d$.
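As a worked illustration (a standard rough path fact, not a description of the library's internals), take $m = 2$ and depth $2$. Writing $e_1, e_2$ for the standard basis of $\mathbb{R}^2$ and $S^{ij}$ for the $(i,j)$ component of $S^2$, the log-signature over $[a,b]$ is

$$\log S(X)_{a,b} = \Delta^1 e_1 + \Delta^2 e_2 + A\,[e_1, e_2] + (\text{terms of depth} \ge 3),$$

$$\Delta^i = X^i_b - X^i_a, \qquad A = \tfrac{1}{2}\left(S^{12}(X)_{a,b} - S^{21}(X)_{a,b}\right).$$

Here $A$ is the Lévy area of the path. The symmetric part of $S^2$ equals $\tfrac{1}{2}\, S^1 \otimes S^1$ and is therefore already determined by level 1, so three coordinates $(\Delta^1, \Delta^2, A)$ carry the same information as the $1 + 2 + 4 = 7$ components of the depth-2 signature.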

Implementation details:

  • Uses roughpy library with Lie increment computation
  • Paths normalised to [0,1] interval before signature computation
  • Signature depth controls geometric detail level (default depth=4)
  • Computational complexity: O(T·d²) for fixed window size w

Uses roughpy library for path signature calculation:

  • Accuracy: Well-tested library providing correct signature computations
  • Efficiency: Optimised C++ backend for performance
  • Standardisation: Standard, community-accepted tool

Primary module: creativedynamics.core.signature_calculator.

A specific normalisation procedure ensures numerical stability and consistent signature computation:

  1. Two-dimensional path construction: For each metric, constructs 2D path X(t) = (t_norm, y_norm) where:

    • t_norm ∈ [0,1]: normalised time coordinate
    • y_norm ∈ [0,1]: normalised metric value
  2. Normalisation procedure:

    t_norm = (t - t_min) / (t_max - t_min)
    y_norm = (y - y_min) / (y_max - y_min + ε)

    where ε = 10^-8 prevents division by zero for constant metrics.

  3. Signature parameters:

    • Depth: Truncation level of the Lie increments (default=4)
    • Window Size (w): Consecutive data points per window (default=7)
    • Sliding step: Windows slide by one time point for detailed analysis

Normalisation ensures signatures from different time periods and metrics are comparable, essential for distance-based change point detection.
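The normalisation formulas above can be sketched as follows. The function name `normalise_window` is hypothetical; only the formulas and the value of ε come from this page. The sketch assumes the window contains at least two distinct time stamps, since the time formula has no ε term.

```python
EPSILON = 1e-8  # value of ε stated above


def normalise_window(times, values, eps=EPSILON):
    """Map one analysis window to a 2D path of (t_norm, y_norm) points.

    Hypothetical helper; assumes max(times) > min(times).
    """
    t_min, t_max = min(times), max(times)
    y_min, y_max = min(values), max(values)
    t_norm = [(t - t_min) / (t_max - t_min) for t in times]
    # eps keeps the division defined when the metric is constant in the window.
    y_norm = [(y - y_min) / (y_max - y_min + eps) for y in values]
    return list(zip(t_norm, y_norm))


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]            # one window of w = 7 points
values = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 20.0]
path = normalise_window(times, values)

flat = normalise_window(times, [3.0] * 7)              # constant metric
print(path[0], path[-1], flat[0])
```

Because of ε, the largest normalised value is very slightly below 1, and a constant metric maps to a path with y_norm = 0 throughout rather than raising a division error.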

Signature distance and change point detection


Sliding window approach detects changes in time series patterns:

  1. Window-based signature computation: For time series of length T, computes signatures for overlapping windows of size w.

  2. Distance calculation: Euclidean distance between consecutive window signatures:

    d_t = ||S_t - S_{t-1}||_2

    where S_t is the signature of window t.

  3. Statistical thresholding: Change points detected when distance exceeds:

    threshold = μ_d + k·σ_d

    where μ_d and σ_d are mean and standard deviation of all distances, k is threshold multiplier (default k=1.5).

  4. Computational efficiency: Overall complexity O(T·d²) for fixed window size w, efficient for real-time analysis.
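Steps 2 and 3 above can be sketched in a few lines. The function names are hypothetical, the input is a list of already-computed window signature vectors (toy numbers here), and the use of the population standard deviation is an assumption, since this page does not say which estimator is used.

```python
import math
from statistics import mean, pstdev


def signature_distances(signatures):
    """d_t = ||S_t - S_{t-1}||_2 for consecutive window signatures."""
    return [math.dist(prev, curr) for prev, curr in zip(signatures, signatures[1:])]


def detect_change_points(distances, k=1.5):
    """Flag window indices t whose distance d_t exceeds mean + k * std.

    Assumption: population standard deviation (pstdev).
    """
    threshold = mean(distances) + k * pstdev(distances)
    # distances[i] compares window i + 1 with window i, hence the + 1.
    flagged = [i + 1 for i, d in enumerate(distances) if d > threshold]
    return flagged, threshold


# Toy signature vectors: a jump in pattern between windows 3 and 4.
signatures = [
    [0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [0.2, 0.1],
    [2.0, 1.5], [2.1, 1.5], [2.1, 1.6],
]
distances = signature_distances(signatures)
flagged, threshold = detect_change_points(distances, k=1.5)
print(flagged, round(threshold, 3))
```

Because the threshold is computed from all distances in the series, a larger k flags fewer windows and a smaller k flags more.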

Change-point detection is the primary built-in application; it lives in the creativedynamics.core.analyzer module.

The analysis pipeline has four phases:

  1. Phase 1: Change point detection

    • Computes sliding window signatures across time series
    • Calculates signature distances between consecutive windows
    • Identifies statistically significant change points using adaptive thresholding
    • Output: List of change points segmenting time series
  2. Phase 2: Segment analysis

    • Divides time series into segments based on detected change points
    • Computes segment statistics (mean, variance, trend)
    • Classifies segment trends as “Stable”, “Improving”, or “Declining”
    • Output: Characterised segments with trend classifications
  3. Phase 3: Benchmark calculation

    • Identifies longest stable or improving segment
    • Computes benchmark values from optimal performance periods
    • Validates benchmark reliability based on segment duration
    • Output: Benchmark values for impact calculation
  4. Phase 4: Impact quantification

    • Calculates impact during declining periods
    • Quantifies actual_overspend_gbp (financial inefficiency) and engagement_gap_clicks (operational impact)
    • Provides correlation risk context; metrics are reported separately and not combined
    • Output: Operational and financial impact of performance degradation (reported separately)

Implemented in creativedynamics.core.analyzer module with configurable parameters for each phase.
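Phase 2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of a least-squares slope as the trend measure, the `slope_tolerance` parameter, and the convention that a rising metric is "Improving" (which suits metrics such as link_clicks or ctr; for cost metrics such as cpc the labels would be reversed). The library's actual classification rules are not described on this page.

```python
from statistics import mean, pvariance


def least_squares_slope(values):
    """Slope of the least-squares line through (0, y0), (1, y1), ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    denom = sum((x - x_mean) ** 2 for x in range(n))
    if denom == 0:  # single-point segment has no trend
        return 0.0
    return sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / denom


def classify_segments(values, change_points, slope_tolerance=0.05):
    """Split `values` at `change_points` and label each segment's trend.

    Hypothetical helper; assumes higher metric values are better.
    """
    boundaries = [0] + sorted(change_points) + [len(values)]
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        chunk = values[start:end]
        if not chunk:
            continue
        slope = least_squares_slope(chunk)
        if slope > slope_tolerance:
            trend = "Improving"
        elif slope < -slope_tolerance:
            trend = "Declining"
        else:
            trend = "Stable"
        segments.append({
            "start": start, "end": end,
            "mean": mean(chunk), "variance": pvariance(chunk),
            "slope": slope, "trend": trend,
        })
    return segments


series = [5.0, 5.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 8.0, 7.0, 6.0, 5.0]
segments = classify_segments(series, change_points=[4, 8])
trends = [s["trend"] for s in segments]
print(trends)
```

The per-segment dictionaries are the kind of input that Phases 3 and 4 would consume, for example to pick the longest "Stable" or "Improving" segment as a benchmark.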

Visual reports for change-point analysis include:

  1. Upper chart: Original time-series metric(s)
  2. Lower chart: Calculated signature distances over time with significance threshold line and vertical markers for detected change points

Signature-based approach provides theoretical guarantees and practical advantages:

Theoretical properties:

  • Consistency: Change point detection is statistically consistent under mild conditions
  • Convergence: Signature distances converge to true pattern distance as window size increases
  • Invariance: Detection invariant to monotonic time transformations

Practical advantages:

  • Early detection: Captures subtle pattern changes before manifesting in aggregate metrics
  • Non-linearity: Naturally handles non-linear dynamics and complex interactions
  • Robustness: Resistant to outliers due to integral-based computation
  • Interpretability: Signature distances have clear geometric interpretation

Performance characteristics:

  • Precision-recall trade-off: Controlled by threshold multiplier k
  • Default settings: k=1.5 provides balanced precision (~0.7) and recall (~0.6)
  • Computational efficiency: Linear in time series length for fixed window size

“Multi-dimensionality” enters in two ways:

  1. Path dimensionality: Input data is often multi-dimensional (e.g., time, metric A, metric B)
  2. Signature dimensionality: Signature is a high-dimensional vector (or tensor), where each term captures different aspects of path geometry

The multi-dimensional approach allows a more detailed characterisation of a time series than methods that analyse each metric in isolation or consider only simple trends.
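To give a sense of how signature dimensionality grows, the sketch below counts the components of a truncated signature and of the corresponding log-signature for a path of a given width (number of channels) and depth. The function names are hypothetical; the log-signature count uses Witt's formula for the dimensions of the free Lie algebra.

```python
def signature_dim(width, depth):
    """Number of components of the truncated signature, including the leading 1."""
    return sum(width ** k for k in range(depth + 1))


def mobius(n):
    """Möbius function via trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # squared prime factor
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def log_signature_dim(width, depth):
    """Number of components of the truncated log-signature (Witt's formula)."""
    total = 0
    for k in range(1, depth + 1):
        level = sum(mobius(d) * width ** (k // d)
                    for d in range(1, k + 1) if k % d == 0)
        total += level // k
    return total


# The 2D (time, metric) path at the default depth of 4.
print(signature_dim(2, 4), log_signature_dim(2, 4))
# Adding a second metric channel (width 3) at the same depth.
print(signature_dim(3, 4), log_signature_dim(3, 4))
```

The gap between the two counts is one reason for working with log-signatures: they drop components that are algebraically determined by lower levels.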

Data preparation and column naming conventions


The library standardises all column names to lowercase throughout the processing pipeline, which eliminates case-sensitivity issues and simplifies the codebase:

  1. Column names transformed to lowercase immediately after CSV ingestion
  2. Only lowercase column names used throughout entire analysis pipeline
  3. Standard names include day, link_clicks, amount_spent_gbp, impressions, cpc, and ctr
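The convention can be illustrated with the standard library's csv module. This is a sketch only: the function name is hypothetical, and the library's own ingestion code may use a different CSV reader.

```python
import csv
import io


def read_rows_with_lowercase_columns(handle):
    """Read CSV rows as dicts whose keys are lowercased column names.

    Hypothetical helper for illustration; values are left as strings.
    """
    reader = csv.DictReader(handle)
    # Accessing fieldnames reads the header row; replace it with lowercase names.
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    return list(reader)


raw = io.StringIO(
    "Day,Link_Clicks,Amount_Spent_GBP\n"
    "2024-01-01,120,35.50\n"
    "2024-01-02,98,31.20\n"
)
rows = read_rows_with_lowercase_columns(raw)
print(rows[0])
```

Lowercasing once at ingestion means every later step can refer to `day`, `link_clicks`, or `amount_spent_gbp` without checking for alternative capitalisations.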

Two primary entry points for analysis:

  1. CLI entry point (cli.py): Recommended for production use. Uses YAML configuration files with nested column mapping structure for flexible and maintainable configuration.

  2. Script entry point (run_analysis.py): Alternative entry point using flat JSON mapping files. Maintained for backward compatibility but may be deprecated in future versions.

For new implementations, the CLI entry point with YAML configuration is the standard approach.