Understanding data structures and formats for effective scientific visualization with CanvasXpress.
The CanvasXpress data parameter accepts several data shapes, depending on the visualization. This guide covers the most common structures, along with the specialized ones used in scientific and biomedical research visualizations.
For most numerical graphs like bar charts, scatter plots, heatmaps, etc., CanvasXpress supports two primary JSON structures.
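This is a direct and simple format where the first row holds the column headers and the first column holds the row names. In the example, the headers are sample names plus a Tissue annotation column, and the row names are variable names.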
[
["Variable", "Sample1", "Sample2", "Sample3", "Tissue"],
["Variable1", 10, 20, 30, "Kidney"],
["Variable2", 35, 25, 15, "Lung"]
]
While CanvasXpress uses an object format internally, your source data can be in different shapes. A common task is determining if your data is "long" (multiple rows per item) or "wide" (one row per item with multiple measurement columns).
As a rule of thumb, CanvasXpress prefers the "short and wide" format for plots that compare a numerical column against a categorical one (such as bar graphs or boxplots). For plots that compare two numerical columns (such as a scatter plot), the data must be in a "long and skinny" format, where each row represents a complete observation with both an X and a Y coordinate.
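If your source data arrives long and needs to be wide, the reshaping can be sketched in plain JavaScript. The helper below, `longToWide`, is a hypothetical name and not part of CanvasXpress; the record fields (`id`, `measure`, `value`) and the use of `null` for missing cells are assumptions made for illustration.

```javascript
// Hypothetical helper: pivot "long" records into a "wide" array-of-arrays.
// Field names (id, measure, value) are assumptions for this sketch.
function longToWide(records) {
  // Preserve first-seen order of row ids and measurement names.
  const ids = [...new Set(records.map((r) => r.id))];
  const measures = [...new Set(records.map((r) => r.measure))];

  // Group values per id, keyed by measurement name.
  const byId = new Map(ids.map((id) => [id, new Map()]));
  for (const r of records) {
    byId.get(r.id).set(r.measure, r.value);
  }

  const header = ["Id", ...measures];
  const rows = ids.map((id) => {
    const cells = byId.get(id);
    // Missing combinations become null rather than being dropped.
    return [id, ...measures.map((m) => (cells.has(m) ? cells.get(m) : null))];
  });
  return [header, ...rows];
}

const long = [
  { id: "Sample1", measure: "Variable1", value: 10 },
  { id: "Sample1", measure: "Variable2", value: 35 },
  { id: "Sample2", measure: "Variable1", value: 20 },
  { id: "Sample2", measure: "Variable2", value: 25 },
];
const wide = longToWide(long);
```

Here `wide` has one header row followed by one row per `id`, with one column per measurement.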
For more complex datasets, you can use structured dataframes with separate containers for the core data (y), sample metadata (x), and variable metadata (z).
y container: Holds the core numerical data, with separate vectors for variable (vars) and sample (smps) names.
x container: Stores metadata for the samples, using key-value pairs.
z container: Holds metadata for the variables.
{
"y": {
"vars": [ "Variable1", "Variable2" ],
"smps": [ "Sample1", "Sample2", "Sample3" ],
"data": [
[ 10, 20, 30 ],
[ 35, 25, 15 ]
]
},
"x": {
"Tissue": [ "Kidney", "Lung", "Heart" ]
},
"z": {
"Symbol": [ "AAA", "BBB" ],
"Pathway": [ "P1", "P2" ]
}
}
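Because the three containers must agree with each other, a quick shape check before rendering can catch mismatches early. The sketch below is plain JavaScript, and `checkDataShape` is a hypothetical helper, not a CanvasXpress function. It assumes, as the example above suggests, that each row of y.data belongs to one variable, each column to one sample, each x entry has one value per sample, and each z entry has one value per variable.

```javascript
// Hypothetical helper: report shape mismatches in a { y, x, z } object.
// Returns an array of messages; an empty array means the shapes line up.
function checkDataShape(obj) {
  const problems = [];
  const { vars, smps, data } = obj.y;

  // Assumption: one row of y.data per variable.
  if (data.length !== vars.length) {
    problems.push(`y.data has ${data.length} rows but y.vars has ${vars.length} names`);
  }
  // Assumption: one value per sample in every row.
  data.forEach((row, i) => {
    if (row.length !== smps.length) {
      problems.push(`y.data[${i}] has ${row.length} values but y.smps has ${smps.length} names`);
    }
  });
  // Sample metadata: one value per sample.
  for (const [key, values] of Object.entries(obj.x || {})) {
    if (values.length !== smps.length) {
      problems.push(`x.${key} has ${values.length} values but y.smps has ${smps.length} names`);
    }
  }
  // Variable metadata: one value per variable.
  for (const [key, values] of Object.entries(obj.z || {})) {
    if (values.length !== vars.length) {
      problems.push(`z.${key} has ${values.length} values but y.vars has ${vars.length} names`);
    }
  }
  return problems;
}

const example = {
  y: {
    vars: ["Variable1", "Variable2"],
    smps: ["Sample1", "Sample2", "Sample3"],
    data: [[10, 20, 30], [35, 25, 15]],
  },
  x: { Tissue: ["Kidney", "Lung", "Heart"] },
  z: { Symbol: ["AAA", "BBB"], Pathway: ["P1", "P2"] },
};
const problems = checkDataShape(example);
```

For the example object, `problems` is empty; dropping one entry from `x.Tissue` would produce a single message naming that key.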
CanvasXpress isn't limited to in-line data. You can also load data from external files or JavaScript functions.
You can directly specify a URL to load files like CSV or JSON.
// Load a CSV file from a URL
new CanvasXpress({
"renderTo": "canvasId",
"data": "https://raw.githubusercontent.com/datasets/sample.csv"
});
For dynamic or programmatically generated data, you can provide a JavaScript function that returns the data object.
new CanvasXpress("canvasFunc",
function () {
return {
"y": {
"vars": ["Var1"],
"smps": ["Smp0", "Smp1"],
"data": [[11, 22]]
}
};
}
);
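When the values are computed rather than typed by hand, it can help to separate building the object from passing it in. The sketch below is plain JavaScript; `makeData` and its placeholder formula are hypothetical and stand in for whatever computation produces your values.

```javascript
// Hypothetical data source: builds a { y } object for n samples.
// The values are a deterministic placeholder, (i + 1) * 11; swap in a real computation.
function makeData(n) {
  const smps = Array.from({ length: n }, (_, i) => `Smp${i}`);
  const values = smps.map((_, i) => (i + 1) * 11);
  return {
    y: {
      vars: ["Var1"],
      smps,
      data: [values], // one row, because there is a single variable
    },
  };
}

const generated = makeData(2);
```

With `n = 2` this reproduces the object returned in the example above, so a wrapper such as `function () { return makeData(2); }` could be supplied in its place.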
For creating a Venn diagram, you have two main options. The 2D array format is straightforward and similar to the structure used for many other graph types.
[
["Id", "Value"],
["A", 340],
["AB", 639],
["ABC", 552],
["ABCD", 148],
["ABD", 578],
["AC", 456],
["ACD", 298],
["AD", 257],
["B", 562],
["BC", 915],
["BCD", 613],
["BD", 354],
["C", 620],
["CD", 143],
["D", 592]
]
Alternatively, you can use a JSON object with a venn property. This format includes a data object for the intersection values and a legend for the set names.
{
"venn": {
"data": {
"A": 340, "B": 562, "C": 620,
"AB": 639, "AC": 456, "BC": 915,
"ABC": 552
},
"legend": {
"A": "List1", "B": "List2", "C": "List3"
}
}
}
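If you start from raw lists of items rather than precomputed counts, the venn object can be derived with a short script. The sketch below is plain JavaScript, and `buildVenn` is a hypothetical helper, not a CanvasXpress function. It assumes each key ("A", "AB", and so on) counts the items that belong to exactly that combination of sets, which is consistent with the sample values above, where AB is larger than A.

```javascript
// Hypothetical helper: derive { venn: { data, legend } } from named lists.
// Assumption: each key counts items found in exactly that combination of sets.
function buildVenn(lists) {
  const names = Object.keys(lists);
  // Assign letters "A", "B", "C", ... in the order the lists are given.
  const letters = names.map((_, i) => String.fromCharCode(65 + i));

  // Map every item to the letters of the sets that contain it.
  const membership = new Map();
  names.forEach((name, i) => {
    // Deduplicate within a list so an item counts once per set.
    for (const item of new Set(lists[name])) {
      membership.set(item, (membership.get(item) || "") + letters[i]);
    }
  });

  // Count items per combination key.
  const data = {};
  for (const key of membership.values()) {
    data[key] = (data[key] || 0) + 1;
  }

  const legend = Object.fromEntries(letters.map((l, i) => [l, names[i]]));
  return { venn: { data, legend } };
}

const vennInput = buildVenn({
  List1: ["g1", "g2", "g3"],
  List2: ["g2", "g3", "g4"],
  List3: ["g3", "g5"],
});
```

Combinations that contain no items are simply absent from `data` in this sketch; add explicit zeros if you need them.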
For network visualizations, data can be defined in both JSON and XML formats.
The JSON format is defined by two properties: nodes and edges.
{
"nodes": [
{ "id": "Node1", "color": "red" },
{ "id": "Node2", "color": "green" }
],
"edges": [
{ "id1": "Node1", "id2": "Node2", "color": "yellow" }
]
}
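Network data often starts as a simple list of connections. The sketch below, in plain JavaScript, shows one way to turn such a list into the nodes and edges structure. `buildNetwork` is a hypothetical helper, not a CanvasXpress function, and the input shape (an array of two-element pairs) is an assumption. Styling properties such as color are left out here and would be added per node or edge as needed.

```javascript
// Hypothetical helper: build { nodes, edges } from [source, target] pairs.
function buildNetwork(pairs) {
  // Collect each node id once, in first-seen order.
  const ids = new Set();
  for (const [a, b] of pairs) {
    ids.add(a);
    ids.add(b);
  }
  return {
    nodes: [...ids].map((id) => ({ id })),
    // Property names id1 and id2 follow the example above.
    edges: pairs.map(([id1, id2]) => ({ id1, id2 })),
  };
}

const network = buildNetwork([
  ["Node1", "Node2"],
  ["Node2", "Node3"],
]);
```

Every node mentioned in a pair appears exactly once in `nodes`, so the two arrays cannot drift apart.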
In addition to a generic XML structure, CanvasXpress natively supports several widely used XML formats from the scientific community. This allows for seamless integration with data from major bioinformatics tools and databases: