Setvis Documentation Index

Setvis is a Python package for exploring and visualizing data missingness: the presence, number and pattern of missing values in a dataset.
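
To make the three aspects of missingness concrete, here is a minimal sketch using plain pandas (it does not use setvis itself). The toy data and its column names are hypothetical, chosen only for illustration.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [34, np.nan, 51, np.nan, 29],
        "diag_1": ["A10", "B20", None, "D40", "E50"],
        "diag_2": [None, None, None, None, "C30"],
    }
)

# Presence: a boolean mask, True wherever a value is missing.
missing = df.isna()

# Number: how many values are missing in each column.
missing_per_column = missing.sum()

# Pattern: for each row, the combination of columns that are missing.
patterns = [
    tuple(col for col, is_na in zip(missing.columns, row) if is_na)
    for row in missing.to_numpy()
]

# How many rows share each missingness pattern.
pattern_counts = Counter(patterns)
```

Here `pattern_counts` maps each combination of missing columns to the number of rows that exhibit it. Counts of this kind are what a missingness visualization summarizes.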

It can also be used to visualize set membership, of which data missingness is a special case.
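
One way to see why missingness is a special case of set membership is to treat each column as defining the set of records in which that column is missing. The sketch below uses plain pandas rather than the setvis API, and all names and data in it are hypothetical.

```python
import pandas as pd

# General set membership: rows are items, columns are sets,
# and True means "this item belongs to this set".
membership = pd.DataFrame(
    {"A": [True, True, False, False], "B": [True, False, True, False]},
    index=["w", "x", "y", "z"],
)

# Items that belong to both A and B (the intersection of the two sets).
in_both = membership.index[membership["A"] & membership["B"]].tolist()

# Missingness as set membership: each column defines the set of
# record labels for which that column's value is missing.
data = pd.DataFrame({"height": [1.7, None, 1.6], "weight": [None, None, 60.0]})
missing_sets = {col: set(data.index[data[col].isna()]) for col in data.columns}

# Records missing both height and weight.
missing_both = missing_sets["height"] & missing_sets["weight"]
```

The boolean mask returned by `data.isna()` has the same shape as the `membership` table, so any tool that works on a membership table can in principle be applied to a missingness mask.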

It is designed to work particularly well when used interactively from a notebook, but can also be used non-interactively.

At the moment, setvis can load data from pandas DataFrames and CSV files, and it also supports a Postgres database backend. It is designed with large datasets in mind: setvis may be able to load the missingness information from a dataset even if the dataset itself does not fit in memory.
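
The idea that missingness information can be much smaller than the data itself can be illustrated with plain pandas. The sketch below is not how setvis is implemented; it only shows, under that assumption, that a CSV file can be read in chunks while keeping just a boolean mask for each chunk. The CSV content is a made-up in-memory stand-in for a large file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
csv_text = "a,b,c\n1,,x\n,,y\n3,4,\n5,,z\n"

mask_chunks = []
# Read a few rows at a time; only the boolean mask of each chunk is kept,
# so the full values never need to be held in memory together.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    mask_chunks.append(chunk.isna())

# One boolean per cell, for the whole dataset.
missing = pd.concat(mask_chunks)

# Number of missing values in each column.
missing_counts = missing.sum()
```

A boolean mask needs one byte per cell in pandas regardless of the original column type, which is typically far less than wide string or floating-point columns require.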

Contents

Introductory Example

A quick example showing the output of setvis when run in a notebook.

See the tutorials and example notebooks for further examples, and an explanation of the plots and user interface.

The output is an interactive widget in the notebook. In the screen capture, the user has chosen the ‘combination heatmap’ view (see setvis.plots.IntersectionHeatmap()). The selected data can be refined by choosing elements of interest on the plot.

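import pandas as pd
from setvis.plots import PlotSession

# Load the example dataset into a pandas DataFrame
df = pd.read_csv("Synthetic_APC_DIAG_Fields.csv")

# Create a plotting session from the DataFrame
session = PlotSession(df)

# Add a plot to the session; in a notebook, the output is the interactive widget
session.add_plot(name="initial plot")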

[Figure: screen capture of the setvis widget in a notebook]

Acknowledgements

The development of the setvis software was supported by funding from the Engineering and Physical Sciences Research Council (EP/N013980/1; EP/K503836/1) and the Alan Turing Institute.
