Building a Modern Dashboard with Gradio
=========================================

**Overview**

This tutorial walks through building a multi-tab Gradio application with interactive visualizations of the Iris dataset. You'll build a dashboard with multiple views, responsive filters, and statistical analysis, all with a clean, modern design.

.. note::
   This tutorial uses the built-in **Iris** dataset, so you can run it immediately without downloading external files. The architecture mirrors dashboards commonly used for machine learning and exploratory data analysis.

**What You'll Learn**

By following this tutorial, you'll understand:

- How to structure a multi-tab Gradio application using ``gr.Tabs()``
- How to create responsive layouts with Gradio's Row and Column components
- How to build interactive filters (dropdowns for categorical data)
- How to use Plotly for interactive visualizations (scatter, bar, box, and heatmap charts)
- How to manage callbacks that handle user interactions
- How to apply statistical analysis (correlation, PCA) to data
- How to style visualizations with custom colors and layouts

----

1. Setup & Prerequisites
------------------------

**Key Libraries Overview**

- **gradio**: The web framework for building interactive dashboards
- **plotly**: Advanced interactive visualizations
- **pandas**: Data manipulation and analysis
- **numpy**: Numerical computations
- **scikit-learn**: Machine learning utilities (StandardScaler, PCA)
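
Before running the code, it can help to confirm that these libraries are importable in your environment. The following is a minimal sketch using only the standard library. The helper name ``missing_packages`` is hypothetical, and note that scikit-learn is imported under the name ``sklearn``:

.. code-block:: python

    from importlib.util import find_spec

    # Import names for the libraries listed above (scikit-learn imports as "sklearn")
    REQUIRED = ["gradio", "plotly", "pandas", "numpy", "sklearn"]


    def missing_packages(names):
        """Return the top-level import names that cannot be found."""
        return [name for name in names if find_spec(name) is None]


    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")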

----

2. Step-by-Step Implementation
------------------------------

2.1 Imports & Data Setup
^^^^^^^^^^^^^^^^^^^^^^^^

Start by importing all necessary libraries and loading your data:

.. code-block:: python

    import os
    import gradio as gr
    import plotly.express as px
    import plotly.graph_objects as go
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    # Load the Iris dataset (built-in to Plotly)
    df = px.data.iris()

**Explanation**

- ``gr``: Gradio framework for building the UI
- ``plotly.express``: High-level API for creating interactive charts
- ``plotly.graph_objects``: Low-level API for fine-grained control
- The Iris dataset contains measurements of sepal and petal dimensions for three flower species

2.2 Define Your Design System (Theme)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a centralized theme dictionary to maintain consistency:

.. code-block:: python

    theme = {
        'bg_main': '#F0F2F5',        # Light grey-blue main background
        'bg_card': '#FFFFFF',        # White for card backgrounds
        'text_primary': '#2C3E50',   # Dark blue-gray for primary text
        'text_secondary': '#7F8C8D', # Medium gray for secondary text
        'accent': '#6C5CE7',         # Purple for accents
        'success': '#00B894',        # Green for success/positive
        'font_family': '"Segoe UI", Roboto, Helvetica, Arial, sans-serif'
    }
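
One way to use the theme consistently is to derive the Plotly layout settings from it in a single place. The sketch below is an illustration rather than part of the tutorial's code: ``plot_layout`` is a hypothetical helper that returns keyword arguments for ``fig.update_layout()``, and using ``text_primary`` as the chart font color is an assumption.

.. code-block:: python

    # Subset of the theme dictionary defined above
    theme = {
        'text_primary': '#2C3E50',
        'font_family': '"Segoe UI", Roboto, Helvetica, Arial, sans-serif',
    }


    def plot_layout(theme, **overrides):
        """Build keyword arguments for ``fig.update_layout()`` from the theme."""
        layout = {
            'plot_bgcolor': 'rgba(0,0,0,0)',      # transparent plotting area
            'paper_bgcolor': 'rgba(0,0,0,0)',     # transparent figure background
            'font_family': theme['font_family'],
            'font_color': theme['text_primary'],  # assumption: primary text color for chart text
        }
        layout.update(overrides)  # per-chart settings such as showlegend=False
        return layout


    bar_layout = plot_layout(theme, showlegend=False)

A figure could then be styled with ``fig.update_layout(**plot_layout(theme))`` instead of repeating the same arguments in every chart function.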

2.3 Helper Functions
^^^^^^^^^^^^^^^^^^^^

Create functions that recompute the dashboard's outputs whenever the species filter changes.

**Function Overview:**

These three helper functions handle all data filtering, aggregation, and visualization logic. Each is designed to be triggered by a Gradio callback when the user selects a different species.

**update_overview() - Main Statistics Dashboard**

.. code-block:: python

    def update_overview(selected_species):
        """Update overview page with KPIs and visualizations"""
        dff = df[df.species == selected_species] if selected_species != 'All' else df
        
        # KPIs
        avg_sepal = f"{dff['sepal_length'].mean():.2f} cm"
        avg_petal = f"{dff['petal_length'].mean():.2f} cm"
        total_samples = f"{len(dff)} samples"
        
        kpi_text = f"""
        ### Key Performance Indicators
        
        **Average Sepal Length:** {avg_sepal}
        
        **Average Petal Length:** {avg_petal}
        
        **Total Samples:** {total_samples}
        """
        
        # Scatter Plot
        fig_scatter = px.scatter(
            dff, x="sepal_length", y="petal_length", color="species", size="sepal_width",
            hover_name="species", title="Sepal vs Petal Length"
        )
        fig_scatter.update_layout(
            plot_bgcolor='rgba(0,0,0,0)', paper_bgcolor='rgba(0,0,0,0)',
            font_family=theme['font_family']
        )
        
        # Bar Chart
        species_stats = dff.groupby("species")[["sepal_length", "petal_length"]].mean().reset_index()
        fig_bar = px.bar(
            species_stats, x="species", y="sepal_length", color="species",
            title="Average Sepal Length by Species"
        )
        fig_bar.update_layout(
            plot_bgcolor='rgba(0,0,0,0)', paper_bgcolor='rgba(0,0,0,0)',
            font_family=theme['font_family'], showlegend=False
        )
        
        
        # Table
        table_html = dff.head(10)[['species', 'sepal_length', 'petal_length', 'sepal_width']].to_html(index=False)
        
        return kpi_text, fig_scatter, fig_bar, table_html

**What this function does:**

1. **Data Filtering**: Checks if user selected a specific species or "All". If "All", uses entire dataset (150 samples); otherwise filters to 50 samples for that species
2. **KPI Calculation**: Computes mean sepal length, mean petal length, and total sample count
3. **Scatter Plot**: Visualizes relationship between sepal and petal lengths, with point size representing sepal width
4. **Bar Chart**: Shows average sepal length grouped by species
5. **HTML Table**: Displays first 10 records with key measurements
6. **Returns**: Tuple of (markdown text, scatter figure, bar figure, HTML table) to update all 4 dashboard elements
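
The filtering and KPI steps can be tried in isolation, without Gradio or Plotly. The following is a minimal sketch assuming only pandas. The helper names ``filter_species`` and ``compute_kpis`` are hypothetical, and the small data frame contains made-up values that only reuse the Iris column names:

.. code-block:: python

    import pandas as pd


    def filter_species(df, selected_species):
        """Return every row for 'All' (or no selection), else the rows of one species."""
        if not selected_species or selected_species == "All":
            return df
        return df[df["species"] == selected_species]


    def compute_kpis(dff):
        """Compute the three KPI values shown on the overview tab."""
        return {
            "avg_sepal": dff["sepal_length"].mean(),
            "avg_petal": dff["petal_length"].mean(),
            "total_samples": len(dff),
        }


    # Made-up values using the Iris column names (not real measurements)
    sample = pd.DataFrame({
        "species": ["setosa", "setosa", "virginica", "virginica"],
        "sepal_length": [5.0, 5.2, 6.4, 6.6],
        "petal_length": [1.4, 1.6, 5.4, 5.6],
    })

    kpis_all = compute_kpis(filter_species(sample, "All"))
    kpis_setosa = compute_kpis(filter_species(sample, "setosa"))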

**update_species_details() - Distribution Analysis**

.. code-block:: python

    def update_species_details(selected_species):
        """Update species details page"""
        if not selected_species or selected_species == 'All':
            dff = df
        else:
            dff = df[df.species == selected_species]
        
        if len(dff) == 0:
            return px.box(title=f"No data for {selected_species}")
        
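        # Treat an empty selection like 'All' so the title never reads "None"
        species_label = 'All Species' if not selected_species or selected_species == 'All' else selected_species
        fig = px.box(dff, y='sepal_length', x='species',
                     title=f"Sepal Length Distribution: {species_label}")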
        fig.update_layout(
            plot_bgcolor='rgba(0,0,0,0)', paper_bgcolor='rgba(0,0,0,0)',
            font_family=theme['font_family']
        )
        return fig

**What this function does:**

1. **Data Filtering**: Filters iris data by selected species (or shows all if "All" selected)
2. **Validation**: Checks for empty data and returns empty plot if no data found
3. **Box Plot**: Creates box plot showing sepal length distribution (median, quartiles, outliers) grouped by species
4. **Styling**: Applies transparent background and custom font for consistency

**update_analytics() - Advanced Analysis with PCA**

.. code-block:: python

    def update_analytics(selected_species):
        """Perform advanced analytics with PCA"""
        
        if selected_species == 'All':
            dff = df.copy()
        else:
            dff = df[df.species == selected_species].copy()
        
        # Prepare data for PCA
        features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
        X = dff[features]
        
        # Standardize the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # Apply PCA
        pca = PCA(n_components=2)
        X_pca = pca.fit_transform(X_scaled)
        
        # Create PCA dataframe
        pca_df = pd.DataFrame(
            data=X_pca, columns=['PC1', 'PC2']
        )
        pca_df['species'] = dff['species'].values
        
        # PCA Plot
        fig_pca = px.scatter(
            pca_df, x='PC1', y='PC2', color='species',
            title=f"PCA Analysis (Explained Variance: {sum(pca.explained_variance_ratio_):.2%})"
        )
        fig_pca.update_layout(
            plot_bgcolor='rgba(0,0,0,0)', paper_bgcolor='rgba(0,0,0,0)',
            font_family=theme['font_family']
        )
        
        # Correlation heatmap
        corr = dff[features].corr()
        fig_heatmap = px.imshow(
            corr, labels=dict(color="Correlation"), title="Feature Correlation",
            color_continuous_scale='RdBu_r', zmin=-1, zmax=1
        )
        fig_heatmap.update_layout(font_family=theme['font_family'])
        
        return fig_pca, fig_heatmap

**What this function does:**

1. **Feature Selection**: Uses all 4 iris measurements (sepal/petal length & width)
2. **Standardization**: Scales features to zero mean and unit variance using StandardScaler, so that no single measurement dominates the components because of its scale
3. **PCA Transformation**: Reduces the 4-dimensional space to 2D while retaining as much variance as possible
4. **PCA Scatter Plot**: Visualizes iris samples in the reduced 2D space with species color coding

   - Shows how well species cluster together
   - PC1 and PC2 are the two directions of greatest variance
   - Title shows cumulative explained variance (about 96% for the full dataset; the value changes when a single species is selected)

5. **Correlation Heatmap**: Shows the correlation matrix between all 4 features

   - Red = positive correlation
   - Blue = negative correlation
   - Used to identify which measurements are related

6. **Returns**: Tuple of (PCA scatter figure, correlation heatmap)
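
To show what the standardization and PCA steps compute, here is a minimal NumPy-only sketch. It is not scikit-learn's implementation: it standardizes with the population standard deviation and obtains the components from a singular value decomposition. The function names and the random data are illustrative assumptions, and the sketch assumes that no column is constant.

.. code-block:: python

    import numpy as np


    def standardize(X):
        """Scale each column to zero mean and unit (population) variance."""
        X = np.asarray(X, dtype=float)
        return (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant column


    def pca_2d(X_scaled):
        """Project rows onto the two directions of largest variance."""
        centered = X_scaled - X_scaled.mean(axis=0)
        # Rows of Vt are the principal directions, ordered by singular value
        _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
        components = Vt[:2]
        scores = centered @ components.T
        explained_ratio = singular_values**2 / np.sum(singular_values**2)
        return scores, components, explained_ratio[:2]


    # Random stand-in for a (samples x 4 features) measurement table
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))
    X[:, 2] += 2.0 * X[:, 0]  # make two features correlated

    scores, components, ratio = pca_2d(standardize(X))
    print(f"Explained variance (2 components): {ratio.sum():.2%}")

Each row of ``components`` holds one weight per feature, so large absolute values indicate the measurements that contribute most to that component. The signs of the components may differ from scikit-learn's output, while the explained variance ratios should agree.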

**Key Design Patterns:**

- **Modular Design**: Each function is independent and reusable
- **Consistent Styling**: All plots use the theme dictionary for uniform look
- **Error Handling**: ``update_species_details`` checks for empty data before plotting
- **Data Efficiency**: Each function computes only on the filtered frame (``dff``)
- **Return Types**: Each function returns one value per component in its callback's ``outputs`` list, in the same order

2.4 Build the Gradio Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create the multi-tab dashboard using Gradio:

.. code-block:: python

    # Get available species from data
    available_species = ['All'] + sorted(df['species'].unique().tolist())

    with gr.Blocks(title="Iris Dashboard", theme=gr.themes.Soft()) as app:
        gr.Markdown("# 🌸 Iris Flower Analysis Dashboard")
        gr.Markdown("Statistical analysis of iris flower measurements across species.")
        
        with gr.Tabs():
            # Tab 1: Iris Overview
            with gr.Tab("Iris Overview"):
                with gr.Row():
                    species_dropdown = gr.Dropdown(
                        choices=available_species,
                        value='All',
                        label="Select Species",
                        interactive=True
                    )
                
                with gr.Row():
                    kpis = gr.Markdown()
                
                with gr.Row():
                    scatter_graph = gr.Plot(label="Sepal vs Petal Length")
                    bar_graph = gr.Plot(label="Average Sepal Length by Species")
                
                with gr.Row():
                    table = gr.HTML()
                
                # Callbacks
                species_dropdown.change(
                    fn=update_overview,
                    inputs=[species_dropdown],
                    outputs=[kpis, scatter_graph, bar_graph, table]
                )
                
                app.load(fn=update_overview,
                        inputs=[species_dropdown],
                        outputs=[kpis, scatter_graph, bar_graph, table])
            
            # Tab 2: Species Details
            with gr.Tab("Species Details"):
                with gr.Row():
                    species_select = gr.Dropdown(
                        choices=available_species,
                        value='All',
                        label="Select Species"
                    )
                
                with gr.Row():
                    species_graph = gr.Plot(label="Measurement Distribution")
                
                # Callback
                species_select.change(
                    fn=update_species_details,
                    inputs=[species_select],
                    outputs=[species_graph]
                )
                
                app.load(fn=update_species_details,
                        inputs=[species_select],
                        outputs=[species_graph])
            
            # Tab 3: Advanced Analytics
            with gr.Tab("Advanced Analytics"):
                with gr.Row():
                    analytics_species = gr.Dropdown(
                        choices=available_species,
                        value='All',
                        label="Select Species"
                    )
                
                with gr.Row():
                    pca_graph = gr.Plot(label="PCA Analysis")
                    corr_graph = gr.Plot(label="Feature Correlation")
                
                # Callback
                analytics_species.change(
                    fn=update_analytics,
                    inputs=[analytics_species],
                    outputs=[pca_graph, corr_graph]
                )
                
                app.load(fn=update_analytics,
                        inputs=[analytics_species],
                        outputs=[pca_graph, corr_graph])

    if __name__ == "__main__":
        # Use environment variables set by %set_gradio_env or start_gradio
        host = os.getenv("GRADIO_HOST", "0.0.0.0")
        port = int(os.getenv("GRADIO_PORT", "8001"))
        root_path = os.getenv("GRADIO_ROOT_PATH", "")
        
        app.launch(
            server_name=host,
            server_port=port,
            root_path=root_path if root_path else None,
            share=False,
            quiet=True,
            inline=False
        )
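
Each tab above registers the same function twice, once on the dropdown's ``change`` event and once on ``app.load``. A small helper could remove that duplication. The sketch below is an assumption about how you might organize the wiring, not part of the tutorial's code: ``bind_filter`` is a hypothetical name, and it relies only on the ``change`` and ``load`` methods already used above.

.. code-block:: python

    def bind_filter(app, dropdown, fn, outputs):
        """Run ``fn`` when ``dropdown`` changes and once when the page loads."""
        event = dict(fn=fn, inputs=[dropdown], outputs=outputs)
        dropdown.change(**event)  # refresh when the user picks another species
        app.load(**event)         # populate the outputs on first page load

Inside the ``gr.Blocks`` context, the overview tab's two registrations could then become ``bind_filter(app, species_dropdown, update_overview, [kpis, scatter_graph, bar_graph, table])``.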

----

3. Running the Dashboard
------------------------

**Jupyter Notebook Approach**

In a Jupyter notebook on nanoHUB:

.. code-block:: python

    %load_ext nanohubgradio
    %set_gradio_env

Then paste the iris dashboard code and run it. The dashboard will launch in an iframe showing interactive plots of iris flower measurements.

**Standalone Script Approach**

Save the complete dashboard code as ``dashboard_tutorial.py`` and run:

.. code-block:: bash

    start_gradio dashboard_tutorial.py

The ``start_gradio`` command will:

1. Start the Gradio application on port 8001
2. Configure the weber proxy on port 8000
3. Automatically inject a nanoHUB header with support and terminate buttons
4. Display the access URL

Once the dashboard is open, you can filter by iris species and view the statistical analyses.

----

4. Dashboard Features Explained
-------------------------------

**Three Analysis Tabs**

The dashboard is organized into three main tabs:

- **Iris Overview**: Shows key statistics (average sepal and petal length, sample count), a scatter plot of sepal versus petal length, a bar chart of average sepal length by species, and a table of the first 10 records
- **Species Details**: Displays a box plot of the sepal length distribution for the selected species
- **Advanced Analytics**: Uses PCA to project the 4-dimensional measurement space onto 2D and shows a feature correlation heatmap

**Species Selection**

.. code-block:: python

    species_dropdown = gr.Dropdown(
        choices=['All'] + sorted(df['species'].unique().tolist()),
        value='All',
        label="Select Species",
        interactive=True
    )

Users can select "All" to view all 150 iris samples, or choose a specific species (``setosa``, ``versicolor``, or ``virginica``) to view the 50 samples for that species.

**Interactive Callbacks**

.. code-block:: python

    species_dropdown.change(
        fn=update_overview,
        inputs=[species_dropdown],
        outputs=[kpis, scatter_graph, bar_graph, table]
    )

When the user changes the species selection, the callback automatically updates all visualizations and KPIs to reflect the filtered data.

**PCA Analysis**

The Advanced Analytics tab uses scikit-learn's PCA to reduce the 4-dimensional measurement space to 2D for visualization:

.. code-block:: python

    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)

This reveals how well the iris species separate in two dimensions. To see which measurements contribute most to each component, inspect the loadings in ``pca.components_``.

----

5. Customization Ideas
----------------------

You can extend this dashboard with additional features:

- **Add more visualizations** such as histogram distributions, violin plots, or parallel coordinates plots
- **Include statistical tests** for comparing measurements between species (e.g., ANOVA, t-tests)
- **Add data export** functionality to download filtered datasets as CSV
- **Create custom themes** using ``gr.themes`` to match your organization's branding
- **Add interactive sliders** to filter samples by measurement ranges
- **Include a classification model** (e.g., logistic regression) to predict iris species from measurements
- **Use more advanced ML techniques** like clustering (K-means, DBSCAN) to discover patterns
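
As a sketch of two of these ideas (range filtering and CSV export), the functions below use only pandas and the standard library. The names ``filter_by_range`` and ``export_csv`` and the sample values are hypothetical, and the wiring to sliders and a download component is left out.

.. code-block:: python

    import os
    import tempfile

    import pandas as pd


    def filter_by_range(df, column, low, high):
        """Keep rows whose ``column`` value lies in the closed interval [low, high]."""
        return df[df[column].between(low, high)]


    def export_csv(df, filename="iris_filtered.csv"):
        """Write ``df`` to a CSV file in a new temporary directory and return its path."""
        path = os.path.join(tempfile.mkdtemp(), filename)
        df.to_csv(path, index=False)
        return path


    # Made-up values using the Iris column names (not real measurements)
    sample = pd.DataFrame({
        "species": ["setosa", "versicolor", "virginica"],
        "sepal_length": [5.0, 6.0, 7.0],
    })

    subset = filter_by_range(sample, "sepal_length", 5.5, 7.0)
    csv_path = export_csv(subset)

In a Gradio app, a callback returning such a path could feed a ``gr.File`` output, with the range bounds supplied by the callback's inputs.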

----

6. Complete Example File
------------------------

The complete ``dashboard_tutorial.py`` example is available in the ``examples/`` directory of the nanohub-gradio repository. You can run it directly with:

.. code-block:: bash

    start_gradio examples/dashboard_tutorial.py

This will launch a multi-tab dashboard showing:

1. **Overview metrics** and measurement statistics for the selected species
2. **Sepal length distributions** shown as box plots per species
3. **Advanced analytics** using PCA to visualize 4-dimensional data in 2D space, plus a feature correlation heatmap