CanvasXpress Data Parameter Guide

Understanding data structures and formats for effective scientific visualization with CanvasXpress.

Introduction to CanvasXpress Data Parameters

The CanvasXpress data parameter accepts many data types and structures, each suited to different visualizations. This guide covers the most common structures, along with the specialized ones used in scientific and biomedical research visualizations.

General JSON Data Structures

For most numerical graphs (bar charts, scatter plots, heatmaps, and so on), CanvasXpress supports two primary JSON structures.

Simple 2D Array

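This is a direct and simple format in which the first row is a header. In the example below, the header lists the sample names and each following row starts with a variable name; the last column, Tissue, holds a text annotation rather than a measurement.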

[
  ["Variable", "Sample1", "Sample2", "Sample3", "Tissue"],
  ["Variable1", 10, 20, 30, "Kidney"],
  ["Variable2", 35, 25, 15, "Lung"]
]

While CanvasXpress uses an object format internally, your source data can arrive in different shapes. A common task is determining whether your data is "long" (multiple rows per item) or "wide" (one row per item with multiple measurement columns).

As a rule of thumb, CanvasXpress prefers the "short and wide" format for plots that compare a numerical column against a categorical one (such as Bar graphs or Boxplots). For plots comparing two numerical columns (such as a Scatter plot), the data must be in a "long and skinny" format, where each row represents a complete observation with both an X and a Y coordinate.
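
If your source data is long and the plot needs it wide, the reshaping can be done before the data reaches CanvasXpress. The following is a minimal sketch in plain JavaScript, assuming hypothetical records with sample, variable, and value fields; the function name longToWide is also an assumption and is not part of the CanvasXpress API.

```javascript
// Hypothetical long-format records: one row per (sample, variable) measurement.
const longRows = [
  { sample: "Sample1", variable: "Variable1", value: 10 },
  { sample: "Sample2", variable: "Variable1", value: 20 },
  { sample: "Sample1", variable: "Variable2", value: 35 },
  { sample: "Sample2", variable: "Variable2", value: 25 }
];

// Pivot long records into a 2D array with one row per variable and
// one column per sample (the layout of the Simple 2D Array example).
function longToWide(rows) {
  const samples = [...new Set(rows.map((r) => r.sample))];
  const variables = [...new Set(rows.map((r) => r.variable))];
  const lookup = new Map(
    rows.map((r) => [`${r.variable}\u0000${r.sample}`, r.value])
  );
  const header = ["Variable", ...samples];
  const body = variables.map((v) => [
    v,
    ...samples.map((s) => {
      const key = `${v}\u0000${s}`;
      // null marks a missing measurement in this sketch (an assumption).
      return lookup.has(key) ? lookup.get(key) : null;
    })
  ]);
  return [header, ...body];
}

console.log(JSON.stringify(longToWide(longRows)));
// [["Variable","Sample1","Sample2"],["Variable1",10,20],["Variable2",35,25]]
```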

Structured Dataframes (y, x, z)

For more complex datasets, you can use structured dataframes with separate containers for the core data (y), sample metadata (x), and variable metadata (z).

y container: Holds the core numerical data, with separate vectors for variable (vars) and sample (smps) names.
x container: Stores metadata for the samples, using key-value pairs.
z container: Holds metadata for the variables.

{
  "y": {
    "vars": [ "Variable1", "Variable2" ],
    "smps": [ "Sample1", "Sample2", "Sample3" ],
    "data": [
      [ 10, 20, 30 ],
      [ 35, 25, 15 ]
    ]
  },
  "x": {
    "Tissue": [ "Kidney", "Lung", "Heart" ]
  },
  "z": {
    "Symbol": [ "AAA", "BBB" ],
    "Pathway": [ "P1", "P2" ]
  }
}
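
The example above implies some shape rules: y.data has one row per entry in vars and one column per entry in smps, each x array has one value per sample, and each z array has one value per variable. A minimal sketch of a consistency check under those assumptions follows; checkStructuredData is a hypothetical helper, not a CanvasXpress function.

```javascript
// Check that a y/x/z object is internally consistent.
// Returns a list of problems; an empty list means the shapes line up.
function checkStructuredData(d) {
  const problems = [];
  const { vars, smps, data } = d.y;
  if (data.length !== vars.length) {
    problems.push(`y.data has ${data.length} rows but y.vars has ${vars.length} names`);
  }
  data.forEach((row, i) => {
    if (row.length !== smps.length) {
      problems.push(`y.data[${i}] has ${row.length} values but y.smps has ${smps.length} names`);
    }
  });
  // Sample metadata: one value per sample (assumed from the example).
  for (const [key, values] of Object.entries(d.x || {})) {
    if (values.length !== smps.length) problems.push(`x.${key} should have ${smps.length} entries`);
  }
  // Variable metadata: one value per variable (assumed from the example).
  for (const [key, values] of Object.entries(d.z || {})) {
    if (values.length !== vars.length) problems.push(`z.${key} should have ${vars.length} entries`);
  }
  return problems;
}

const example = {
  y: {
    vars: ["Variable1", "Variable2"],
    smps: ["Sample1", "Sample2", "Sample3"],
    data: [[10, 20, 30], [35, 25, 15]]
  },
  x: { Tissue: ["Kidney", "Lung", "Heart"] },
  z: { Symbol: ["AAA", "BBB"], Pathway: ["P1", "P2"] }
};

console.log(checkStructuredData(example).length); // 0
```

Running a check like this before plotting makes mismatched metadata lengths easier to trace back to their source.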

When to Use Each Format

Simple 2D Array: Best for quick prototyping, simple visualizations, small to medium datasets, and cases that need little metadata.
Structured Dataframes: Best for complex scientific datasets, cases where sample or variable metadata matters, and larger datasets with many variables.

Loading Data from External Sources

CanvasXpress isn't limited to inline data. It can also load data from external files or from JavaScript functions.

Remote Files (CSV, JSON, PNG)

You can directly specify a URL to load files like CSV or JSON.

// Loading a CSV file from a URL
new CanvasXpress({
  "renderTo": "canvasId",
  "data": "https://raw.githubusercontent.com/datasets/sample.csv"
});
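
Passing the URL is the shortest route. If a file needs inspection or clean-up first, one option is to parse the text yourself and pass the resulting 2D array as inline data. The sketch below is a deliberately naive parser: the function name csvTo2DArray is hypothetical, quoted fields and embedded commas are not handled, and it does not reflect how CanvasXpress reads CSV internally.

```javascript
// Naive CSV parser: splits on newlines and commas, no support for quoted fields.
function csvTo2DArray(text) {
  return text
    .trim()
    .split(/\r?\n/)
    .map((line) =>
      line.split(",").map((cell) => {
        const trimmed = cell.trim();
        // Convert cells that look numeric; keep everything else as a string.
        return trimmed !== "" && !Number.isNaN(Number(trimmed))
          ? Number(trimmed)
          : trimmed;
      })
    );
}

// In a browser the text would typically come from fetch(url).then((r) => r.text()).
const csvText = "Variable,Sample1,Sample2\nVariable1,10,20\nVariable2,35,25";
console.log(JSON.stringify(csvTo2DArray(csvText)));
// [["Variable","Sample1","Sample2"],["Variable1",10,20],["Variable2",35,25]]
```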

JavaScript Function

For dynamic or programmatically generated data, you can provide a JavaScript function that returns the data object.

new CanvasXpress("canvasFunc",
  function () {
    return {
      "y": {
        "vars": ["Var1"],
        "smps": ["Smp0", "Smp1"],
        "data": [[11, 22]]
      }
    };
  }
);
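
A function is most useful when the data is computed rather than typed out. As a rough sketch, the hypothetical generator below (makeData and its placeholder values are assumptions for illustration) builds the same y container for any number of samples.

```javascript
// Hypothetical generator: builds a y container with one variable and n samples.
function makeData(n) {
  const smps = Array.from({ length: n }, (_, i) => `Smp${i}`);
  // Placeholder values 11, 22, 33, ... stand in for real measurements.
  const values = smps.map((_, i) => (i + 1) * 11);
  return { y: { vars: ["Var1"], smps, data: [values] } };
}

console.log(JSON.stringify(makeData(2)));
// {"y":{"vars":["Var1"],"smps":["Smp0","Smp1"],"data":[[11,22]]}}

// In a page, the function form shown above would then read:
// new CanvasXpress("canvasFunc", function () { return makeData(2); });
```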

Venn Diagram Data Formats

There are two main ways to supply data for a Venn diagram. The 2D array format is straightforward and similar to the structure used for many other graph types.

[
  ["Id", "Value"],
  ["A",    340],
  ["AB",   639],
  ["ABC",  552],
  ["ABCD", 148],
  ["ABD",  578],
  ["AC",   456],
  ["ACD",  298],
  ["AD",   257],
  ["B",    562],
  ["BC",   915],
  ["BCD",  613],
  ["BD",   354],
  ["C",    620],
  ["CD",   143],
  ["D",    592]
]

Alternatively, you can use a JSON object with a venn property. This format includes a data object for the intersection values and a legend for the set names.

{
  "venn": {
    "data": {
      "A": 340, "B": 562, "C": 620,
      "AB": 639, "AC": 456, "BC": 915,
      "ABC": 552
    },
    "legend": {
      "A": "List1", "B": "List2", "C": "List3"
    }
  }
}

Venn Diagram Data Structure Best Practices

Use the 2D array format for simpler Venn diagrams with fewer sets.
Use the JSON object format when you need custom set names or more complex intersections.
For the 2D array format, the "Id" column should contain set identifiers (e.g., "A", "AB", "ABC").
For the JSON object format, the keys in the "data" object represent set intersections.
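
Because both formats carry the same identifier and value pairs, one can be derived from the other. The sketch below converts the 2D array form into the object form; vennArrayToObject is a hypothetical helper, and the legend must be supplied separately because the array form has no set names.

```javascript
// Convert [["Id", "Value"], ["A", 340], ...] into { venn: { data, legend } }.
function vennArrayToObject(rows, legend) {
  const [, ...body] = rows; // skip the ["Id", "Value"] header row
  const data = {};
  for (const [id, value] of body) {
    data[id] = value;
  }
  return { venn: { data, legend } };
}

const vennRows = [
  ["Id", "Value"],
  ["A", 340],
  ["B", 562],
  ["AB", 639]
];

const vennObject = vennArrayToObject(vennRows, { A: "List1", B: "List2" });
console.log(JSON.stringify(vennObject));
// {"venn":{"data":{"A":340,"B":562,"AB":639},"legend":{"A":"List1","B":"List2"}}}
```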

Network Data Formats

For network visualizations, data can be defined in either JSON or XML format.

JSON Format

The JSON format is defined by two properties: nodes and edges.

{
  "nodes": [
    { "id": "Node1", "color": "red" },
    { "id": "Node2", "color": "green" }
  ],
  "edges": [
    { "id1": "Node1", "id2": "Node2", "color": "yellow" }
  ]
}
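
In the example above, each edge's id1 and id2 match the id of a node. Assuming that is the intended relationship, a quick check for edges that point at missing nodes might look like the following; findDanglingEdges is a hypothetical helper, not part of CanvasXpress.

```javascript
// Return the edges whose id1 or id2 does not match any node id.
function findDanglingEdges(network) {
  const ids = new Set(network.nodes.map((n) => n.id));
  return network.edges.filter((e) => !ids.has(e.id1) || !ids.has(e.id2));
}

const network = {
  nodes: [
    { id: "Node1", color: "red" },
    { id: "Node2", color: "green" }
  ],
  edges: [
    { id1: "Node1", id2: "Node2", color: "yellow" },
    { id1: "Node1", id2: "Node3", color: "blue" } // Node3 is not defined
  ]
};

console.log(findDanglingEdges(network).map((e) => e.id2)); // [ 'Node3' ]
```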

Specialized XML Formats

In addition to a generic XML structure, CanvasXpress natively supports several widely used XML formats from the scientific community. This allows for seamless integration with data from major bioinformatics tools and databases:

GPML: The standard format for biological pathways used by WikiPathways.
KGML: The XML-based format for representing biological pathways from the KEGG database.
XGMML: An XML format for describing graphs, commonly used for data exchange with network analysis software like Cytoscape.

CanvasXpress also supports the specific XML formats for pathways used by Gephi and Metabase.