.. title: Representing Discrete Probabilities in RDF
.. slug: representing-discrete-probabilities-in-rdf
.. date: 2019-11-13 23:33:21 UTC-05:00
.. tags: software, rdf
.. category: software
.. link:
.. description:
.. type: text

Many RDF datasets are meant to be interpreted as simply "statements in the graph
are true". In order to record more nuanced interpretations, *we'll* make
meta-statements: statements about statements and graphs.

We'll use a predicate that assigns a probability to a statement or graph. When
its subject is a statement, it means that statement is true with that
probability. When its subject is a graph, it means all statements in that graph
are true with *at least* that probability.

Assume we have a dataset with a variety of statements: some we think are true,
some we think are false, and some we think are true only with some probability.
We can't just start inferring new statements from this dataset, since we have no
idea what they would mean.

First create a graph of "assumed true" statements. Let's call it |G1|. Add to it
a statement "|G1| probability 1". We can add to our "true" graph probabilities
for other statements not present in that graph itself. Within this graph we can
safely let loose an inference process to find more true statements.

Next comes the fun part. We want to allow inferences about statements with a
known, non-1 probability. To do that, let's say we have a statement outside
|G1|, |Sa|, and there's a statement in |G1| "|Sa| probability .9". We'll create
a new supergraph of |G1| consisting of |G1| plus |Sa|. Let's call that |Ga|. We
can add to |G1| a statement "|Ga| probability .9". Then, we can safely let an
inference process run in that new supergraph.
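
A minimal sketch of this construction in Python, assuming statements are
modelled as plain tuples and graphs as frozen sets of tuples. The names
``Statement``, ``build_supergraph``, and the example triples are hypothetical
and only illustrate the idea; they are not part of any RDF library.

.. code-block:: python

   from typing import Dict, FrozenSet, Tuple

   Statement = Tuple[str, str, str]   # (subject, predicate, object)
   Graph = FrozenSet[Statement]

   def build_supergraph(base: Graph, extra: Statement) -> Graph:
       """Return a new graph holding every statement of `base` plus `extra`."""
       return base | {extra}

   # Hypothetical "assumed true" graph G1 and an uncertain statement Sa.
   g1: Graph = frozenset({("ex:alice", "ex:knows", "ex:bob")})
   s_a: Statement = ("ex:bob", "ex:knows", "ex:carol")

   # Probabilities asserted in G1, keyed by the statement or graph they
   # describe. A side table keeps the sketch short (an assumption; real RDF
   # would need some way to make statements about statements).
   probabilities: Dict[object, float] = {g1: 1.0, s_a: 0.9}

   g_a = build_supergraph(g1, s_a)
   # "Ga probability .9": everything in G1 is assumed true, so the
   # supergraph is exactly as likely as Sa.
   probabilities[g_a] = probabilities[g1] * probabilities[s_a]

An inference process could then be run over ``g_a`` on its own, with anything
it derives carrying the probability recorded for that graph.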

Let's say we have another statement, called S\ :sub:`~a`\ , known to be
the opposite of |Sa|, either because a human manually added a statement to |G1|
asserting that relationship, or because an automated process was able to
infer it. We can infer that its probability is .1 and carry out the same
supergraph process, this time producing a graph with probability .1.

Let's say we have another statement, |Sb|, also with probability .9. We could
perform the supergraph process with it on |G1| to produce yet another graph with
probability .9. We could also perform the supergraph process on |Ga|, producing
a graph that represents everything in |G1| being true, plus |Sa|, plus |Sb|.
We'll call that graph G\ :sub:`ab`\ . We can then add to |G1| a statement "G\
:sub:`ab` probability .81". This assumes the two statements are independent, so
their probabilities multiply (.9 × .9).
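
The arithmetic can be sketched as follows, assuming independence and a base
graph with probability 1. The helper ``stack_probability`` is a hypothetical
name, and ``Fraction`` is used only to avoid floating-point rounding.

.. code-block:: python

   from fractions import Fraction

   def stack_probability(base_probability, *statement_probabilities):
       """Probability of a supergraph built by adding independent statements."""
       result = Fraction(base_probability)
       for p in statement_probabilities:
           result *= Fraction(p)
       return result

   p_g1 = Fraction(1)        # "G1 probability 1"
   p_sa = Fraction("0.9")    # "Sa probability .9"
   p_sb = Fraction("0.9")    # "Sb probability .9"

   p_ga = stack_probability(p_g1, p_sa)          # G1 plus Sa
   p_gab = stack_probability(p_g1, p_sa, p_sb)   # G1 plus Sa plus Sb
   print(p_ga, p_gab)  # prints "9/10 81/100"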

To represent dependent probabilities, instead of making statements about
probability in |G1|, we can make them in the graphs that suppose those
statements are true. For example, if |Sa| being true implies |Sb| has a
probability of .99, then we add a statement to |Ga| saying as much. The
supergraph construction then proceeds the same, but the resulting graph has a
probability of 0.891.
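
As a worked equation, reading the statement added to |Ga| as the conditional
probability of |Sb| given |Sa| (an interpretation assumed here, with *P* as
shorthand for the probability predicate):

.. math::

   P(G_{ab}) = P(G_1) \cdot P(S_a) \cdot P(S_b \mid S_a)
             = 1 \times 0.9 \times 0.99 = 0.891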

-------

Let's try a more formal approach.

We'll define three predicates\:

*n* hasProbability *p*
  *n* is a statement or graph, *p* is a number between 0 and 1 (inclusive).

*n* hasProbabilityAtLeast *p*
  *n* is a graph or statement, *p* is a number between 0 and 1 (inclusive).

*s* isOppositeOf *t*
  *s* and *t* are statements.

Several inferences follow directly from these definitions; for example, *n*
hasProbability *p* implies *n* hasProbabilityAtLeast *p*. In addition:

* if *n* isOppositeOf *m* and *n* hasProbability *p* then *m* hasProbability
  1 − *p*
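
The rule above can be sketched in Python. This is an illustration only:
``infer_opposites`` and the string names for statements are hypothetical, and
treating isOppositeOf as symmetric is an assumption of the sketch.

.. code-block:: python

   from fractions import Fraction

   def infer_opposites(probabilities, opposites):
       """Apply: n isOppositeOf m, n hasProbability p => m hasProbability 1 - p.

       `probabilities` maps a statement name to its probability.
       `opposites` is a collection of (n, m) pairs, applied in both
       directions (assumed symmetric).
       """
       inferred = dict(probabilities)
       for n, m in opposites:
           for known, other in ((n, m), (m, n)):
               if known in inferred and other not in inferred:
                   inferred[other] = 1 - inferred[known]
       return inferred

   known = {"Sa": Fraction(9, 10)}
   result = infer_opposites(known, [("Sa", "S~a")])
   print(result["S~a"])  # prints "1/10"

The sketch makes a single pass and never overwrites a probability that is
already known, so conflicting inputs are left for the caller to detect.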

Let's think about the largest possible graph, the graph that contains all possible
statements. Let's call it |Guniverse|. Assigning meaning to |Guniverse| is
pretty much impossible; it contains every possible contradiction! You could
apply a certain kind of inference engine to it that just constructs subgraphs that
contain no contradictions, but it would have an infinite amount of work to do. So
instead, we'll carve out subgraphs that we can assign some meaning to. We can
also nicely represent subjectivity while we're at it.

Let's say I want to mark some statement (|Sa|) as true. I create a new graph of
statements I think are true, and add the statement to it. Let's call it |Gm1|. I
can also add to |Gm1| the statement "|Gm1| hasProbability 1". I can let an
inference engine loose in this graph and have it add everything it can derive
from statements already in the graph to it.

Now let's say I want to state that another statement (|Sb|) in |Guniverse| (but not
|Gm1|) has probability 0.8. I add a statement to |Gm1|: "|Sb| hasProbability
0.8". This can trigger the automatic creation of a new graph |Gmb|, which
contains |Sb|, and is also a supergraph of |Gm1|. We can then infer in |Gm1|
"|Gmb| hasProbabilityAtLeast 0.8".
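
A rough sketch of that trigger, assuming statements are identified by short
strings and the base graph has probability 1. The function ``expand`` and the
``+`` naming scheme for derived graphs are hypothetical.

.. code-block:: python

   def expand(base_name, base_graph, meta_statements):
       """Build one supergraph per uncertain statement named in the base graph.

       `meta_statements` holds (subject, predicate, probability) tuples found
       in the base graph. Returns the new graphs and the lower bounds to
       record back in the base graph.
       """
       new_graphs = {}
       bounds = []
       for subject, predicate, p in meta_statements:
           if predicate != "hasProbability":
               continue
           if subject == base_name or subject in base_graph:
               continue  # about the graph itself, or already assumed true
           name = f"{base_name}+{subject}"
           new_graphs[name] = frozenset(base_graph | {subject})
           bounds.append((name, "hasProbabilityAtLeast", p))
       return new_graphs, bounds

   g_m1 = frozenset({"Sa"})
   meta = [
       ("G_morgan:1", "hasProbability", 1.0),
       ("Sb", "hasProbability", 0.8),
   ]
   graphs, bounds = expand("G_morgan:1", g_m1, meta)

Here ``graphs`` holds a single new graph containing both ``Sa`` and ``Sb``, and
``bounds`` holds the matching hasProbabilityAtLeast statement with value 0.8.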

.. |G1| replace:: G\ :sub:`1`
.. |Gm1| replace:: G\ :sub:`morgan:1`
.. |Gmb| replace:: G\ :sub:`morgan:b`
.. |Ga| replace:: G\ :sub:`a`
.. |Sa| replace:: S\ :sub:`a`
.. |Sb| replace:: S\ :sub:`b`
.. |Guniverse| replace:: G\ :sub:`🌌`