Non-parametric estimation of archimedean's radial parts -- a summer story
Background
The estimation of archimedean copulas is an issue that bothers me since… a while now. My first shot at reasearch, while still being a master’s student, was a project with a few friends involving those models into a solvency 2 and reinsurence pricing problem. I was always amazed by the fact that archimedean models are very constraints, often one-or-two parametrical, while the class itself is pretty wide – after all, generators are functions, infinite-dimensional. So, this summer, spending some time in a white zone, I was looking once again at the non-parametric estimation problem… This blog post summarizes my findings.
Theoretical setup
Let $C$ be a $d$-dimensional Archimedean copula $(d\ge 2)$ with $d$-monotone generator $\varphi$. By the Williamson representation, there exists a nonnegative random variable $R$ such that
$$ \varphi(t) = \mathbb{E}\left[\left(1-\frac{t}{R}\right)_{+}^{d-1}\right] =: \varphi_R(t). $$
Let $U\sim C$, and define the Kendall distribution
$$ K_R(x) = \mathbb{P}\left(C(U)\le x\right), \qquad x\in[0,1]. $$
For empirical work, given a sample ${U_i}_{i=1}^n$ from the copula and given the Deheuvels empirical copula $C_n$, the Kendall distribution can be estimated by
$$K_n(x) = \frac{1}{n}\sum_{i=1}^n I\left(C_n(U_i)\le x\right).$$
Assume now that $R$ is discrete, with $R \sim \sum_{j=1}^N w_j \delta_{r_j}$ with ordered atoms $0<r_1<\cdots<r_N$. The associated generator becomes of course
$$ \varphi_R(t) = \sum_{j=1}^N w_j \left(1-\frac{t}{r_j}\right)_{+}^{d-1}, $$
Knowing that, for archimedean copulas,
$$ K_R(x) = \mathbb P\left(\varphi_R(R) \le x\right), $$
a path toward estimation would be to invert this equation to find $R$ in terms of $K_n$. Remark that $K_R$ has jumps precisely at the points $x_j = \varphi_R(r_j)$ with masses $w_j$.
Smart inversion
From the Kendall pseudo-sample ${C_n(U_i)}_{i=1}^n$, collect the distinct values and their empirical frequencies, defining
$$ x_1 > x_2 > \cdots > x_N,\qquad w_j = \frac{1}{n}\sum_{i=1}^n \mathbf{1}{C_n(U_i) = x_j}, j=1,\dots,N, $$
so that $\sum_{j=1}^N w_j = 1$ and $K_n(x) = \sum_{j=1}^N w_j \mathbf{1}{x_j\le x}$. Recover support points $0<r_1<\cdots<r_N$ by solving the system $x_j = \varphi_R(r_j)$. Because positive parts in $\varphi_R$ switch on only when $r_j$ exceeds the evaluation point, the system is triangular:
$$ r_N := 1, $$
$$ r_{N-1} := 1 - \left(\frac{x_{N-1}}{w_N}\right)^{1/(d-1)}, $$
$$ \text{and, for } k = N-2,\dots,1:\quad x_k = \sum_{j=k+1}^N w_j\left(1-\frac{r_k}{r_j}\right)^{d-1}. $$
The last equation can be solved by 1D root finding in $r_k\in[0, r_{k+1})$. The estimate is then
$$ \hat{R} \sim \sum_{j=1}^N w_j \delta_{r_j}. $$
The associated generator $\varphi_{\hat{R}}$ is continuous and piecewise polynomial between the recovered radii.
Examples
We generate $n=2000$ samples from Archimedean copulas with the following radial parts:
- $R \sim \delta_1$, the dirac at point $1$, corresponding to the lower-bound Clayton copula, in dimension $d=3$.
- $R \sim \frac{\delta_1 + \delta_4 +\delta_8}{3}$, a simple mixture of point masses, in dimensions $d=2$ and $d=3$.
- $R \sim \texttt{Gamma}(0.2)$ in dimension $d=10$.
- $R \sim \texttt{LogNormal}(1,3)$, which is heavy tailed, in dimension $d=10$.
- $R \sim \texttt{Pareto}(1/2)$, which does not even have an expectation, in dimension $d=10$.
From each sample, we estimate $\hat{R}$ using the proposed procedure. The graphs below gives a graphical comparison of the empirical Kendall function and the fitted one, scatterplot samples from the original copula and the fitted copula, and histograms of samples from the estimated vs. true $R$.
Each estimator took only a few miliseconds to compute on a standard laptop, thanks to the power of Julia :)
It already exists!
When comming back from my white zone, I started looking around to see if something like that already exists, and in fact it does. The method was proposed in 2011 by the following paper:
@article{,
title = {Inference in Multivariate {{Archimedean}} Copula Models},
author = {Genest, Christian and Ne{\v s}lehov{\'a}, Johanna and Ziegel, Johanna},
year = {2011},
month = aug,
journal = {TEST},
volume = {20},
number = {2},
pages = {223--256},
issn = {1133-0686, 1863-8260}
}
The paper also contains proofs of functionality that are a bit better worded than mines!
Anyway, I ended up recycling my code as an example page of the Copulas.jl’s docs, and the code used to produce the upper graphics is there too. Soon, this code will become the main non-parametric estimator of archimedean generators in the package, while for Liouville models the dirichlet part will be estimated separately… this will provide a fast and efficient nonparametric archimedean estimator in Julia.