Exact chainrules derivatives for `beta_inc` and `beta_inc_inv`
Motivation
In many probabilistic and statistical models, the (regularized) incomplete beta function shows up in places where you do not expect it at first glance: CDFs of Beta and Student distributions, tail probabilities of binomial models, Bayesian posteriors, or parameterizations of the Student copula.
More concretely, let $I_x(a,b)$ denote the regularized incomplete beta function:
$$ I_x(a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} , dt, \qquad a>0,; b>0,; x \in [0,1]. $$
And let $I^{-1}(p; a,b)$ its inverse in $x$. They appear in several common constructions:
- the Beta CDF is $x \mapsto I_x(a,b)$;
- the Beta quantile is $p \mapsto I^{-1}(p; a,b)$;
- the Student-$t$ CDF with $\nu$ degrees of freedom can be written, for $t\in\mathbb R$,
$$
F_{t_\nu}(t)
= \frac{1}{2}
- \frac{\operatorname{sign}(t)}{2} I_{\frac{t^2}{t^2+\nu}}!\left(\frac{1}{2}, \frac{\nu}{2}\right); $$
- the multivariate Student-$t$ radial CDF for $X \sim t_\nu^d(\mu,\Sigma)$ and squared Mahalanobis radius $q = (x-\mu)^\top \Sigma^{-1} (x-\mu)$ is $$ \mathbb P\big((X-\mu)^\top \Sigma^{-1} (X-\mu) \le q\big) = I_{\frac{dq}{q+\nu}}!\left(\frac{d}{2}, \frac{\nu}{2}\right). $$
Whenever these objects appear inside a differentiable model, we need derivatives of $I_x(a,b)$ and of its inverse with respect to all of their arguments.
In Julia, these functions are exposed as beta_inc and beta_inc_inv in SpecialFunctions.jl. If you are just evaluating them pointwise, the story ends there. But as soon as you start differentiating through them with automatic differentiation—say, inside a log-likelihood or a loss function—the details of the derivatives become important.
This became very concrete for me in several occasions, while dealing with the MvTDist multivariate Student distribution in Distributions.jl, that has no density, and building the TCopula Student copula on it (through the simple inversion method). I was getting a model without a propper pdf, while it clearly exists !
A bit later,
a Discourse thread surfaced an error in a Turing model that used Gaussian copula priors with Beta marginals. The model relied on the Beta quantile, which in turn calls the inverse incomplete beta under the hood, and ForwardDiff quite reasonably complained that beta_inc_inv did not know what to do with dual numbers.
The same lack of derivatives had already bitten me in a few other places:
- Fitting multivariate Student distributions in Distributions.jl (the multivariate Student CDF depends on the incomplete beta).
- Building Gaussian copula and Student copula models in Copulas.jl with Beta marginals.
- Generally, anywhere the incomplete beta or its inverse sneaks into a probability transform.
All of this led to the SpecialFunctions.jl
#506 pull request titled “Exact chainrules derivatives for beta_inc and beta_inc_inv”. This post is a short note telling the story of this pull request, which will be hopefully already merged when you read this.
Why not just “let AD handle it”?
In principle, if your special function implementation were written in very pure Julia, fully AD-friendly, and free of branches, continued fractions, or asymptotic expansions, you could just rely on automatic differentiation through the primal.
In practice:
- Implementations of
beta_incandbeta_inc_invmake heavy use of piecewise algorithms (series vs. continued fractions, different asymptotic expansions) to remain accurate across the entire parameter space. - These branches tend to produce noisy, unstable gradients, particularly near region boundaries or for extreme parameter values.
- Reverse-mode AD through a complex special function can be significantly more expensive than evaluating a small number of carefully chosen closed-form expressions.
These reasons pushed the original developpers of the routine to (rightfully) restrict them to ::Float64, which does not allow dual numbers to pass through. To get the automatic differentiation system to work, we therefore need to provide specialised analytic rrules/frules, and implement these derivatives by hand.
Once these rules live in the SpecialFunctionsChainRulesCoreExt.jl extension, any
AD backend that goes through ChainRules — ForwardDiff via
DifferentiationInterface, Zygote, Enzyme+ChainRules, Mooncake, etc. — can pick
them up without further work on the user side.
The problem is not new…
When I first ran into these issues, I was not the only one. Back in 2020, @arzwa opened a Discourse thread titled “Differentiation of incomplete Beta function”, motivated by exactly the same pain point: using Beta CDFs and quantiles inside Turing models and running into missing or slow derivatives.
The problem of implementing the derivatives of the incomplete beta function w.r.t. its shape parameter is not new. In fact, you’ll easily find the paper by Boik & Robinson-Cox (1998) (see the BibTeX entry at the end of this post) that solves the problem 28 years ago. Very roughly, that paper derives explicit series and recurrence formulas for the partial derivatives of the incomplete beta with respect to $a$ and $b$, together with a discussion of how to evaluate them stably over a wide range of parameter values. The key idea is that one should not try to differentiate through an arbitrary special-function implementation, but instead implement the derivative formulas directly from the mathematical description.
The original article even came with a small S+/Matlab reference implementation. I can read S+ but not matlab, so my first instinct was simply to translate that code to Julia with the help of AI, wire it up to ChainRules, and add tests manually. Technically, this worked surprisingly well: the translated routine passed a broad grid of numerical checks and immediately unblocked the Turing and copula examples. The full translation, including the fixing to make the tests passes, took me less than 2 hours. Of course, the quality of the obtained code was not very readable (not very Julian), but it worked very well !
However, together with reviewers on the PR we quikly understood that this raised uncomfortable licensing issues: a near line‑by‑line translation of the S+/Matlab code is still a derivative work of the original implementation, which we could not simply drop into an MIT‑licensed package.
I then switched things up and tried to adapt, again using AI,
@arzwa‘s independent implementation that is in
this repository. Since the testing framework was already there, it was very easy and did go quite smoothly. After a quick discussion, arzwa kindly added an MIT license to that repository, which made reusing the code in SpecialFunctions.jl straightforward from a legal point of view.
The SpecialFunctions.jl PR simply glued these pieces together in a way that is compatible with the rest of the ecosystem: after abandoning the AI‑translated S+/Matlab code for licensing reasons, I switched to arzwa’s independent Julia port in IncBetaDer as the numerical core, wrapped it in ChainRules rrules for beta_inc (both the three‑argument beta_inc(a,b,x) and the four‑argument beta_inc(a,b,x,y) with x + y ≈ 1) and beta_inc_inv, and then tested everything on a broad grid of $(a,b,x)$ values and a few realistic model snippets.
An important practical detail is that the Boik & Robinson-Cox’s method does not differentiate through the special-function implementation itself. Instead, it re-implement the partial fraction expansion of the derivatives from scratch, which makes it very efficient, including in some corners of the parameter space.
On the testing side, the PR also includes a fairly broad grid of $(a,b,x)$ values, in both Float64 and Float32, designed to exercise the same branches as the primal beta_inc tests. These ChainRules tests are not cheap time‑wise, but maintainers were happy to pay that cost for the extra confidence that the rules behave well across the parameter ranges that appear in real models, including those Bayesian copula priors that started the story.
Takeaways
- The regularized incomplete beta and its inverse are central in many statistical models and are increasingly used inside differentiable programs.
- Exact ChainRules for
beta_incandbeta_inc_invcan be derived from straightforward calculus (density formulas and implicit differentiation) and implemented in terms oflogbeta,digamma, and related primitives. - These rules provide smooth access to derivatives that were not available to users before in the eco-system.
References
For the mathematical details, the main reference is:
@article{boik_robinson_cox_1998,
title = {Derivatives of the incomplete beta function with respect to its parameters},
author = {Boik, Robert J. and Robinson-Cox, James F.},
journal = {Computational Statistics \& Data Analysis},
volume = {27},
number = {1},
pages = {85--106},
year = {1998}
}