*This post was originally published by Georgi Tancev at Towards Data Science*

## The importance of recognizing non-identifiability.

In statistics, **identifiability** is a property that a **model** must satisfy for precise **inference** to be possible. When developing probabilistic or deterministic models, it is up to the scientist how the model equations are parametrized. To estimate the parameters of a model from data, an **inverse problem** is solved. Inverse problems are among the most important problems in science and mathematics because they provide information about parameters that cannot be observed directly, for example, the coefficients of a **generalized linear model**. Such an inverse problem is well-posed if

- a solution exists,
- the solution is unique,
- and the solution’s behavior changes continuously with the initial conditions.

Aspects two and three, in particular, are difficult to fulfill. Imagine parametrizing a model as

```
y = a**b * x = c * x
```

where *x* and *y* are data, and *a* and *b* are parameters. This model is clearly overparametrized, i.e., it has more parameters than can be estimated from the data, which makes the problem ill-posed. The global optimum of an objective function such as the mean squared error is not a point in parameter space but a trajectory; only the lumped parameter *c* can be estimated, and infinitely many combinations of *a* and *b* are equally optimal. Depending on the starting point of gradient descent, different solutions are obtained.
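This behavior is easy to reproduce numerically. The sketch below (illustrative, not from the original post; all names and values are my own) runs gradient descent on the mean squared error of the model `y = a**b * x` from different starting points: the runs converge to different `(a, b)` pairs that all share the same lumped parameter `c = a**b`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=50)
y = 3.0 * x  # noise-free data; the true lumped parameter is c = 3

def fit(a, b, lr=0.01, steps=20000):
    """Gradient descent on the MSE of y = a**b * x, starting from (a, b)."""
    for _ in range(steps):
        r = a**b * x - y                          # residuals
        grad_a = np.mean(2 * r * b * a**(b - 1) * x)   # d(MSE)/da
        grad_b = np.mean(2 * r * a**b * np.log(a) * x)  # d(MSE)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

for a0, b0 in [(1.5, 1.0), (2.0, 2.0), (4.0, 0.5)]:
    a, b = fit(a0, b0)
    print(f"start ({a0}, {b0}) -> a = {a:.3f}, b = {b:.3f}, c = a**b = {a**b:.3f}")
```

Each run reaches a point on the optimal trajectory, but which point depends on the initialization: the individual parameters are not recoverable, only their combination.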

A model that is not identifiable is said to be **non-identifiable** or **unidentifiable**, i.e., two or more parametrizations are equivalent. A more familiar case is the quadratic equation, in which the sign of *x* cannot be identified: the square root of four is two or minus two, so the solution is not unique.

```
y = a * x**2 <=> sqrt(y) = sqrt(a) * abs(x)
```

This kind of non-identifiability is called **structural**. Once recognized, it can be removed by substituting the non-identifiable parameters with a combination of them, as in the example above. By exploring the degrees of freedom of the model, non-identifiability can be discovered *a priori*, i.e., before inference. In particular, computing the **rank** of the **sensitivity matrix** (the matrix of partial derivatives of the model output with respect to the parameters) is informative: if the rank is lower than the number of parameters, the model is structurally non-identifiable.
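For the overparametrized model from above, this check takes a few lines. The sketch below (my own illustration; the nominal parameter values are arbitrary) builds the sensitivity matrix `S[i, j] = ∂y_i/∂θ_j` for `y = a**b * x` and computes its rank.

```python
import numpy as np

x = np.linspace(1.0, 2.0, 20)
a, b = 2.0, 1.5  # nominal parameter values (arbitrary choice)

# Sensitivity matrix: one column of partial derivatives per parameter.
S = np.column_stack([
    b * a**(b - 1) * x,     # dy/da
    a**b * np.log(a) * x,   # dy/db
])

print(np.linalg.matrix_rank(S))
```

Both columns are proportional to *x*, so the rank is 1 while the model has 2 parameters: the structure is non-identifiable, regardless of how much data is collected.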

Another issue arises from a lack of data or from noisy data. Variance in the data is transferred to variance in the parameter estimates (**Cramér–Rao bound**). If the confidence interval of a parameter includes zero, that parameter could also be redundant. In a case like this, more or better data is needed. This is called **practical identifiability**.
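A minimal sketch of this situation (illustrative; the data and noise level are my own assumptions): a straight-line model `y = m * x` with a weak true slope and heavy noise. The least-squares estimate comes with a standard error so large that the resulting confidence interval is typically wide enough to cover zero, so the data cannot rule out that the parameter is redundant.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 0.2 * x + rng.normal(scale=2.0, size=x.size)  # weak effect, heavy noise

# Least-squares slope and its standard error for the model y = m * x + e
m_hat = np.sum(x * y) / np.sum(x**2)
residuals = y - m_hat * x
sigma2 = np.sum(residuals**2) / (x.size - 1)  # residual variance estimate
se = np.sqrt(sigma2 / np.sum(x**2))           # standard error of the slope

lo, hi = m_hat - 1.96 * se, m_hat + 1.96 * se  # approximate 95% interval
print(f"slope = {m_hat:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
print("CI includes zero:", lo < 0.0 < hi)
```

Shrinking the noise or adding data tightens the interval; with the settings above, the parameter is practically non-identifiable even though the model structure is fine.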

In both cases, a solution to the optimization problem might be obtained, but it may not be the best one, and predictions will fail. It is therefore important to recognize situations of non-identifiability: any downstream steps are worthless otherwise, and the time is better invested in a better-parametrized model or in better data.
