
Build

Creating GCM

Suppose we want to create the following Graphical Causal Model (GCM).

The graph object is created by passing a string that describes the graph to the class gcm.DAG() from the gcm module of causalinf. The string syntax supports several notations, and any combination of the options below will work.

from causalinf import gcm

# option 1: group the parents of each node
# --------
dag = """
Y <- {X1, X2, D}
D <- {X1, Z}
Z <- X2
"""
# option 2: group the children of each node
# --------
dag = """
Z -> D
X1 -> {D, Y}
X2 -> {Z, Y}
D -> Y
"""
# option 3: one or more arrows per line
# --------
dag = """
Y <- X2 -> Z -> D -> Y
D <- X1 -> Y
"""
# or
dag = """
Y <- X2 -> Z -> D -> Y
X1 -> Y
X1 -> D
"""

G = gcm.DAG(dag)
print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Observed: Z, X1, Y, X2, D
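
To see why the syntax options above describe the same graph, the following minimal sketch parses each string into a set of directed edges and compares the results. It is not how causalinf parses the string; the function names directed_edges and parse_group and the regular expression are hypothetical and written only for this illustration.

```python
import re

# Hypothetical tokenizer: splits a line on the arrow operators and keeps them.
ARROW = re.compile(r"\s*(<->|->|<-|--)\s*")


def parse_group(token):
    """Turn 'X1' or '{X1, X2}' into a list of node names."""
    token = token.strip()
    if token.startswith("{") and token.endswith("}"):
        token = token[1:-1]
    return [name.strip() for name in token.split(",") if name.strip()]


def directed_edges(text):
    """Collect (parent, child) pairs from a graph string (directed arrows only)."""
    edges = set()
    for line in text.strip().splitlines():
        parts = ARROW.split(line.strip())  # [group, op, group, op, group, ...]
        for i in range(1, len(parts), 2):
            left, op, right = parse_group(parts[i - 1]), parts[i], parse_group(parts[i + 1])
            if op == "->":
                edges.update((a, b) for a in left for b in right)
            elif op == "<-":
                edges.update((b, a) for a in left for b in right)
    return edges


option_1 = """
Y <- {X1, X2, D}
D <- {X1, Z}
Z <- X2
"""
option_2 = """
Z -> D
X1 -> {D, Y}
X2 -> {Z, Y}
D -> Y
"""
option_3 = """
Y <- X2 -> Z -> D -> Y
D <- X1 -> Y
"""

# All three notations yield the same six directed edges.
same = directed_edges(option_1) == directed_edges(option_2) == directed_edges(option_3)
print(same)
```

Comparing sets rather than lists ignores the order in which edges are written, which is why the printed edge order of a graph need not match the order in the input string.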

If no further information is provided, all variables (nodes) are assumed to be observed.

from causalinf import gcm 
dag = """
Y <- {X1, X2, D}
D <- {X1, Z}
Z <- X2
"""
G = gcm.DAG(dag)
print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Observed: Z, X1, Y, X2, D

There are different ways to set the type of variable as Exposure, Outcome, or Latent using a dictionary.

var_types = {
    "Exposure": "D",
    "Outcome": "Y",
    "Latent": "X1"
}
# set the roles when creating the DAG:
G = gcm.DAG(dag, nodes_role=var_types)

# or afterwards, using set_nodes_role():
G = G.set_nodes_role(var_types)

print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Observed: Z, X2
Exposure: D
Outcome: Y
Latent: X1

Note, however, that resetting the variable types replaces all of them: any variable omitted from the new dictionary is assumed to be of the Observed type. For instance:

# reset the roles using set_nodes_role()
G = G.set_nodes_role({"Outcome": "D"})

print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Observed: Z, X1, Y, X2
Outcome: D
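
The replace-all behaviour shown above can be mimicked in plain Python. This is a minimal sketch of the semantics only, not the library's implementation: assign_roles is a hypothetical helper, and accepting either a string or a list of node names is an assumption of this sketch.

```python
def assign_roles(nodes, roles):
    """Hypothetical helper: map every node to a role, defaulting to 'Observed'."""
    assigned = {node: "Observed" for node in nodes}
    for role, members in roles.items():
        if isinstance(members, str):  # a single node name
            members = [members]
        for node in members:
            assigned[node] = role
    return assigned


nodes = ["Z", "X1", "Y", "X2", "D"]
var_types = {"Exposure": "D", "Outcome": "Y", "Latent": "X1"}

first = assign_roles(nodes, var_types)

# A second assignment starts from scratch: Y and X1 fall back to 'Observed'.
reset = assign_roles(nodes, {"Outcome": "D"})

# To change one role and keep the others, build the full dictionary first.
merged = {**var_types, "Latent": "X2"}
updated = assign_roles(nodes, merged)

print(reset)
print(updated)
```

Under these semantics, the safest way to adjust a single role is to keep the original dictionary around, modify a copy, and pass the complete dictionary again.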

Arbitrary types can also be used, but they affect plotting only. For analysis (identification and estimation), only the four basic types (Outcome, Exposure, Latent, and Observed) are used.

# assign a custom type using set_nodes_role()
G = G.set_nodes_role({"My favorite var": "D"})

print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Observed: Z, X1, Y, X2
My favorite var: D

Bidirected/undirected edges

Undirected and bidirected edges can also be defined, using -- and <-> respectively. In causal inference with DAGs, a bidirected edge indicates an omitted common cause of the two variables it connects. An undirected edge represents the skeleton of the DAG or an observationally equivalent edge: one whose direction cannot be inferred from observational data unless further parametric assumptions are adopted.

from causalinf import gcm 
dag = """
Y <- {X1, X2, D}
D <- {X1, Z}
Z <- X2
Z2 <-> X2
X1 -- Y
"""
G = gcm.DAG(dag)
print(G)
Graph:
Z -> D
X2 -> Y
X1 -> D
X1 -> Y
X2 -> Z
D -> Y
Z2 <-> X2
X1 -- Y
Observed: Z, X1, Z2, Y, X2, D
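
The reading of a bidirected edge as an omitted common cause can be made concrete by rewriting each bidirected pair as two directed edges from an explicit latent node. The sketch below is plain Python and independent of causalinf; expand_bidirected and the U_... naming scheme are hypothetical choices made for this illustration.

```python
def expand_bidirected(bidirected_pairs):
    """Replace each A <-> B with U -> A and U -> B, where U is a new latent node."""
    edges = []
    for a, b in bidirected_pairs:
        latent = f"U_{a}_{b}"  # hypothetical name for the omitted common cause
        edges.append((latent, a))
        edges.append((latent, b))
    return edges


explicit = expand_bidirected([("Z2", "X2")])
print(explicit)  # [('U_Z2_X2', 'Z2'), ('U_Z2_X2', 'X2')]
```

Written this way, Z2 <-> X2 carries the same causal meaning as a graph in which an unobserved variable points into both Z2 and X2.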

GCM Object

The Graphical Causal Model (GCM) object exposes its nodes, node information, and directed, bidirected, and undirected edges as properties.

from causalinf import gcm 
dag = """
Y <- {X1, X2, D}
D <- {X1, Z}
Z <- X2
Z2 <-> X2
X1 -- Y
"""
G = gcm.DAG(dag)
print(f"""
Nodes:
{G.nodes}
Nodes info:
{G.nodes_info}
Directed edges:
{G.directed}
Bidirected edges:
{G.bidirected}
Undirected edges:
{G.undirected}
""")
Nodes:
{'Z', 'X1', 'Z2', 'Y', 'X2', 'D'}
Nodes info:
{'Z': {'role': 'Observed', 'label': 'Z'}, 'X1': {'role': 'Observed', 'label': 'X1'}, 'Z2': {'role': 'Observed', 'label': 'Z2'}, 'Y': {'role': 'Observed', 'label': 'Y'}, 'X2': {'role': 'Observed', 'label': 'X2'}, 'D': {'role': 'Observed', 'label': 'D'}}
Directed edges:
[('Z', 'D'), ('X2', 'Y'), ('X1', 'D'), ('X1', 'Y'), ('X2', 'Z'), ('D', 'Y')]
Bidirected edges:
[(('Z2', 'X2'), ('X2', 'Z2'))]
Undirected edges:
[{'X1', 'Y'}]
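
Because the directed edges are printed as a list of (parent, child) tuples, they are easy to post-process with standard Python. As a minimal sketch, the snippet below copies the list printed above into a literal (so it runs without causalinf) and builds parent and child lookups; the variable names are illustrative only.

```python
from collections import defaultdict

# Directed edges as printed above, each tuple read as (parent, child).
directed = [("Z", "D"), ("X2", "Y"), ("X1", "D"), ("X1", "Y"), ("X2", "Z"), ("D", "Y")]

parents = defaultdict(set)
children = defaultdict(set)
for source, target in directed:
    parents[target].add(source)
    children[source].add(target)

print(sorted(parents["Y"]))    # ['D', 'X1', 'X2']
print(sorted(children["X2"]))  # ['Y', 'Z']
```

The same loop would work on the list held in G.directed, assuming it has the tuple layout shown in the output above.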