{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "78539fd7-a71f-4027-8b1b-c7fd75526a22",
   "metadata": {},
   "source": [
    "# Scattermatrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c092a604-c386-4dda-bc90-0ff11a8f880f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from hvplot.plotting import scatter_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1793d412-0d1e-4c8e-b470-ca978d5a53e2",
   "metadata": {},
   "source": [
    "`scatter_matrix` shows all the pairwise relationships between the columns of your data. Each non-diagonal entry plots the corresponding columns against another, while the diagonal plot shows the distribution of the data within each individual column.\n",
    "\n",
    "This function is closely modelled on [pandas.plotting.scatter_matrix](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.scatter_matrix.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d6cb81d-7f65-41de-8386-26479e41ddd4",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])\n",
    "\n",
    "scatter_matrix(df, alpha=0.2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91efa178-7cb4-4dbe-8730-59a2864cdf9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_sub = df[['A', 'B']].copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38913fbf-4982-4b82-bf81-01bb33faf8a0",
   "metadata": {},
   "source": [
    "The `chart` parameter allows to change the type of the *off-diagonal* plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4f72f5f-b9a1-41ce-b00d-59fad1b01b54",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11943cd6-7d10-4990-9de6-18402c4c42a8",
   "metadata": {},
   "source": [
    "The `diagonal` parameter allows to change the type of the *diagonal* plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d46fd8d8-bf72-45fe-9faf-a85c1239d21e",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df_sub, diagonal='kde')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d13318b-b96c-4e62-b237-5202e5108a44",
   "metadata": {},
   "source": [
    "Setting `tools` to include a selection tool like `box_select` and an inspection tool like `hover` permits further analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "135bdf59-c2c5-4253-b9a0-a786623db591",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df_sub, tools=['box_select', 'hover'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4315596-0a9e-4c00-a0d3-c8f3a01b9518",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8b4cef5-25ea-4129-ab0e-7cc6c36d3451",
   "metadata": {},
   "source": [
    "The `c` parameter allows to colorize the data by a given column, here by `'CAT'`. Note also that the `diagonal_kwds` parameter (equivalent to `hist_kwds` in this case or `density_kwds` for *kde* plots) allow to customize the diagonal plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba178ac7-936e-4585-964b-34a99b711108",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a97b3bc-0053-41b6-a71a-70ceec9e3341",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eaa67050-09ad-4c79-9b01-03dd96a6bada",
   "metadata": {},
   "source": [
    "Scatter matrix plots may end up with a large number of points having to be rendered which can be challenging for the browser or even just crash it. In that case you should consider setting to `True` the `rasterize` (or `datashade`) parameter that uses [Datashader](https://datashader.org/) to render the off-diagonal plots on the backend and then send more efficient image-based representations to the browser.\n",
    "\n",
    "The following scatter matrix plot has 1,200,00 (12x100,000) points that are rendered efficiently by `datashader`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed43bbf1-4c62-4a80-b044-970c235727a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df, rasterize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c50ec422-baf4-43d4-9f84-a8d74315b7e6",
   "metadata": {},
   "source": [
    "When `rasterize` (or `datashade`) is toggled it's possible to make individual points more visible by setting `dynspread=True` or `spread=True`. Head over to the [Working with large data using datashader](https://holoviews.org/user_guide/Large_Data.html) guide of [HoloViews](https://holoviews.org/index.html) to learn more about these operations and what parameters they accept (which can be passed as `kwds` to `scatter_matrix`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4974565-3c21-462c-b762-c5719d29c221",
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter_matrix(df, rasterize=True, dynspread=True)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}