.. title: Representing Discrete Probabilities in RDF
.. slug: representing-discrete-probabilities-in-rdf
.. date: 2019-11-13 23:33:21 UTC-05:00
.. tags: software, rdf
.. category: software
.. link:
.. description:
.. type: text

Many RDF datasets are meant to be interpreted as simply "statements in the graph are true". In order to record more nuanced interpretations, *we'll* make meta-statements: statements about statements and graphs.

We'll use a predicate that assigns a probability to a statement or graph. When its subject is a statement, it means that statement is true with that probability. When its subject is a graph, it means all statements in that graph are true with *at least* that probability.

Assume we have a dataset with a variety of statements, some of which we think are true, some false, and some true only with some probability. We can't just start inferring new statements from this dataset, since we have no idea what they would mean.

First create a graph of "assumed true" statements. Let's call it |G1|. Add to it a statement "|G1| probability 1". We can also add to this "true" graph probability statements about other statements that are not themselves in the graph. Within this graph we can safely let loose an inference process to find more true statements.

Next comes the fun part. We want to allow inferences about statements with a known, non-1 probability. To do that, let's say we have a statement outside |G1|, |Sa|, and there's a statement in |G1| "|Sa| probability .9". We'll create a new supergraph of |G1| consisting of |G1| plus |Sa|. Let's call that |Ga|. We can add to |G1| a statement "|Ga| probability .9". Then, we can safely let an inference process run in that new supergraph.

Let's say we have another statement, called S\ :subscript:`~a`\ , known to be the inverse of |Sa|, either because a human manually added a statement to |G1| creating that relationship between them, or because an automated process was able to infer it.
We can infer its probability is .1, and carry out the same supergraph process, this time producing a graph with probability .1.

Let's say we have another statement, |Sb|, also with probability .9. We could perform the supergraph process with it on |G1| to produce yet another graph with probability .9. We could also perform the supergraph process on |Ga|, producing a graph that represents everything in |G1| being true, plus |Sa|, plus |Sb|. We'll call that graph G\ :sub:`ab`\ . We can then add to |G1| a statement "G\ :sub:`ab` probability .81" (.9 × .9). This assumes the probabilities of the two statements are independent.

To represent dependent probabilities, instead of making statements about probability in |G1|, we can make them in the graphs that suppose those statements are true. E.g. if |Sa| being true implies |Sb| has a probability of .99, then we add a statement to |Ga| saying as much. The supergraph construction then proceeds the same way, but the resulting graph has a probability of 0.891 (0.9 × 0.99).

-------

Let's try a more formal approach. We'll define three predicates\:

*n* hasProbability *p*
    *n* is a statement or graph, *p* is a number between 0 and 1 (inclusive).

*n* hasProbabilityAtLeast *p*
    *n* is a graph or statement, *p* is a number between 0 and 1 (inclusive).

*s* isOppositeOf *t*
    *s* and *t* are statements.

You can make several inferences that are obvious. In addition:

* if *n* isOppositeOf *m* and *n* hasProbability *p* then *m* hasProbability 1 − *p*

Let's think about the largest possible graph, the graph that contains all possible statements. Let's call it |Guniverse|. Assigning meaning to |Guniverse| is pretty much impossible; it contains every possible contradiction! You could apply a certain kind of inference engine to it that just constructs subgraphs that contain no contradictions, but it would have an infinite amount of work to do.

So instead, we'll carve out subgraphs that we can assign some meaning to. We can also nicely represent subjectivity while we're at it.
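
The arithmetic behind the worked example and the isOppositeOf rule can be sketched in a few lines of Python. This is only an illustration of the numbers, not an RDF implementation: the helper names ``opposite_probability`` and ``supergraph_probability`` are made up for this sketch, and probabilities are plain floats rather than RDF literals.

```python
def opposite_probability(p):
    """If n isOppositeOf m and n hasProbability p, then m hasProbability 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1 (inclusive)")
    return 1.0 - p


def supergraph_probability(graph_probability, statement_probability):
    """Probability of the supergraph made by adding one statement to a graph.

    statement_probability is the probability of the statement given that
    everything in the graph is true. For independent statements, that is
    simply the statement's own probability.
    """
    return graph_probability * statement_probability


p_g1 = 1.0                                   # "G1 probability 1"
p_ga = supergraph_probability(p_g1, 0.9)     # G1 plus Sa
p_not_a = opposite_probability(0.9)          # the inverse of Sa

# Independent case: Sb has probability .9 regardless of Sa.
p_gab_independent = supergraph_probability(p_ga, 0.9)

# Dependent case: the statement inside Ga says Sb has probability .99.
p_gab_dependent = supergraph_probability(p_ga, 0.99)

print(round(p_gab_independent, 3), round(p_gab_dependent, 3), round(p_not_a, 3))
```

The only difference between the independent and dependent cases is where the second factor comes from: the base graph in the first case, the graph that supposes |Sa| in the second.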
Let's say I want to mark some statement (|Sa|) as true. I create a new graph of statements I think are true, and add the statement to it. Let's call it |Gm1|. I can also add to |Gm1| the statement "|Gm1| hasProbability 1". I can let an inference engine loose in this graph and have it add everything it can derive from statements already in the graph to it.

Now let's say I want to say another statement (|Sb|) in |Guniverse| (but not |Gm1|) has probability 0.8. I add a statement to |Gm1|: "|Sb| hasProbability 0.8". This can trigger the automatic creation of a new graph |Gmb|, which contains |Sb|, and is also a supergraph of |Gm1|. We can then infer in |Gm1| "|Gmb| hasProbabilityAtLeast 0.8".

.. |G1| replace:: G\ :sub:`1`
.. |Gm1| replace:: G\ :sub:`morgan:1`
.. |Gmb| replace:: G\ :sub:`morgan:b`
.. |Ga| replace:: G\ :sub:`a`
.. |Sa| replace:: S\ :sub:`a`
.. |Sb| replace:: S\ :sub:`b`
.. |Guniverse| replace:: G\ :sub:`🌌`
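
The supergraph step described above can be sketched with plain Python sets standing in for graphs. Everything here is an assumption made for illustration: statements are 3-tuples, a meta-statement simply uses a statement tuple or a graph name as its subject, and the ``suppose`` helper, the ``ex:`` terms, and the graph names are hypothetical. A real implementation would need an actual RDF store and some way to name statements.

```python
HAS_PROBABILITY = "hasProbability"
HAS_PROBABILITY_AT_LEAST = "hasProbabilityAtLeast"


def suppose(graphs, base_name, statement, new_name):
    """Create a supergraph of graphs[base_name] that also contains `statement`.

    The base graph must already hold "<statement> hasProbability p". The
    inferred "<new graph> hasProbabilityAtLeast p" is recorded in the base
    graph first, so the new graph is a true superset of the base graph.
    """
    base = graphs[base_name]
    p = next(
        (o for s, pred, o in base if s == statement and pred == HAS_PROBABILITY),
        None,
    )
    if p is None:
        raise KeyError("no hasProbability statement found for %r" % (statement,))
    base.add((new_name, HAS_PROBABILITY_AT_LEAST, p))
    graphs[new_name] = set(base) | {statement}
    return graphs[new_name]


# Made-up example statements.
s_a = ("ex:sky", "ex:color", "ex:blue")
s_b = ("ex:rain", "ex:expected", "ex:tomorrow")

graphs = {
    "G_morgan:1": {
        s_a,
        ("G_morgan:1", HAS_PROBABILITY, 1.0),
        (s_b, HAS_PROBABILITY, 0.8),
    }
}

g_mb = suppose(graphs, "G_morgan:1", s_b, "G_morgan:b")
```

After the call, ``graphs["G_morgan:1"]`` still does not contain the supposed statement itself, only the probability statement about it and the inferred statement about the new graph.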