Tutorial - Data Analysis with Plotly and Pandas

Level: Beginners
Time: 20 min

Prerequisites:

  • You have an account and completed the installation process. No account? Get one here
  • You have some experience with reading Python code

  • Not a reader? Feel free to follow this tutorial as a video.

    Introduction

    In this tutorial we’ll make a VIKTOR app to process, visualize and summarize data from a .csv.

    We’ll do this by using two well-known Python libraries: Pandas and Plotly.

    Visualizing data is one of the best ways to find trends and patterns. By plotting or grouping the data, we can quickly find these trends and make better business decisions! It is also an excellent way to present results to stakeholders like your manager or another business unit.

    In this tutorial you will learn how to:

    1. Create an empty app template
    2. Extract the data from a .csv
    3. Add input fields
    4. Plot the data using Plotly
    5. Add a data summary

    By the end of this tutorial you will be able to create a colorful chart that can be used for analysis and saved for reports to help explain data. You will also understand how to automate the process in VIKTOR. The finished app will look similar to the gif below:

    Need help?

    Are you encountering an error? Take a look at the complete app code.

    Also know that you can always ask for help at our Community Forum, where our developers are ready to help with any question related to the installation, coding and more.

    1. Create, install and start an empty app

    Let's create, install and start an empty app. This will be the starting point for the rest of the tutorial. But before we start, make sure to shut down any app that is running (like the demo app) by closing the command-line shell (for example PowerShell) or by ending the process using Ctrl + C.

    Follow these steps to create, install and start an empty app:

    1. Go to the App store in your VIKTOR environment to create a new app. After clicking 'Create app' choose the option 'Create blank app' and enter a name and description of your choice. Submit the form by clicking 'Create and setup'.

    2. Select 'Editor' as app type and click 'Next'.
    3. Now follow the instructions to run the quickstart command to download the empty app template. After entering the command click 'I have run the command' to continue. The CLI will ask you to select your code editor of choice. Use the arrows and press enter to select a code editor. The app will now open in your code editor of choice.

    If all went well, your empty app is installed and connected to your development workspace. The terminal in your code editor should show something like this:

    INFO     : Connecting to platform...
    INFO : Connection is established: https://cloud.viktor.ai <---- here you can see your app
    INFO : The connection can be closed using Ctrl+C
    INFO : App is ready
    4. We want to use Pandas and Plotly in this app. Open requirements.txt and add plotly and pandas below the line with your viktor version (don't change that line):

      viktor==X.X.X   <-- Don't modify this line
      plotly
      pandas

    Now disconnect your app by closing the connection using Ctrl+C, so that you can install the new dependencies. Install them by entering the following command:

    viktor-cli install

    Then reconnect the app by entering the command:

    viktor-cli start
    Re-starting your app
    • You only need to create an app template and install it once for each new app you want to make.
    • The app will update automatically once you start adding code in app.py, as long as you don't close the terminal or your code editor.
    • Did you close your code editor? Use viktor-cli start to start the app again. No need to install, clear, etc.
    Did you encounter any errors?

    • Always check the spelling of everything you typed in the command line. With even a small mistake, the command you are trying to run may not be recognised!

    • If you are encountering:

      ERROR:
      Exiting because of an error: no requirements.txt file
      PS C:\Users\<username>\viktor-apps>

      Then you are not in the correct folder! Check the command line and navigate to the data-analysis-tutorial folder.

    • If you are encountering:

      Error: App definition is not compatible with the data currently stored in the database. Use the command 'viktor-cli clear' to clear the database.
      PS C:\Users\<username>\viktor-apps\data-analysis-tutorial>

      That means you have not cleared the database yet! Use viktor-cli clear to clear it, then use viktor-cli start to start the app. No need to install it again!

    Not seeing any of these errors? Head over to our community! There is a good chance another developer encountered it and solved it too!

    2. Extract data from .csv

    For this tutorial, we will dive into a dataset that contains the car sales data of a showroom.

    The CEO has asked you to create a tool to compare which types of cars were sold. The CEO did not specify what to compare, so we will build the tool so that it can compare any two properties of the sold cars and also show some additional information that might be interesting.

    1. Download the dataset here.

    2. Make sure the file name is car-sales.csv

    3. Add the file to your app folder (data-analysis-tutorial)

    4. We'll create the function extract_data to access the data using Pandas, and add all the necessary imports so that we don't need to do this later. Open app.py and make sure the first part of the code looks like this:

      import viktor as vkt
      import pandas as pd
      import plotly.graph_objects as go
      from pathlib import Path


      def extract_data():
          parent_folder = Path(__file__).parent
          file_path = parent_folder / "car-sales.csv"
          data = pd.read_csv(file_path, sep=';')
          return data

      class Parametrization(vkt.Parametrization): # <--- this line is for your reference
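
To see what the sep=';' argument does, here is a minimal sketch that reads a few semicolon-separated rows from memory instead of from car-sales.csv. The column names and rows are hypothetical and only serve as an illustration; the real dataset may look different.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real car-sales.csv may have other columns.
SAMPLE_CSV = """make;fuel-type;body-style
toyota;gas;hatchback
toyota;gas;sedan
audi;diesel;sedan
"""

# sep=';' tells pandas that columns are separated by semicolons, not commas.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
print(data.shape)             # (rows, columns)
print(data.columns.tolist())  # column names taken from the header row

# With the default comma separator, each line ends up in one single column.
wrong = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(wrong.shape)
```

If your own file uses commas, dropping the sep argument should be enough.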

    3. Add Input fields

    We want the CEO to interact with the app. For this, we will add two OptionFields: one to choose the property shown on the X-axis and one to choose the property whose values are counted on the Y-axis. We will also make sure that the options shown correspond to the columns of the dataset.

    Remember, we always add input fields in the Parametrization class.

    1. In the app.py file make sure the code in the Parametrization class looks like this:

      ...

      class Parametrization(vkt.Parametrization):
          introduction = vkt.Text(
              """
      # 📊 Data Analysis App!

      In this app you can summarise data and visualise it in a PlotlyView.
      The app will summarise some data in the DataView and allow the user to choose a column to analyse to create the Plotly chart.

              """
          )
          # Add some input field for the user
          main_column = vkt.OptionField('Choose main property', options=extract_data().columns.values.tolist())

          count_column = vkt.OptionField('Choose property to analyse', options=extract_data().columns.values.tolist())


      class Controller(vkt.Controller): # <--- this line is for your reference
    2. Go to your cloud environment (e.g. cloud.viktor.ai) and open the development card. Great! You will be able to see your input fields.

    How the app should look after the parametrization
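
The options of both fields come from the expression extract_data().columns.values.tolist(). As a small sketch of what that expression returns, the example below uses a hypothetical DataFrame as a stand-in for the result of extract_data(); the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by extract_data().
data = pd.DataFrame(
    {
        "make": ["toyota", "audi"],
        "fuel-type": ["gas", "diesel"],
        "body-style": ["hatchback", "sedan"],
    }
)

# data.columns is a pandas Index; .values.tolist() turns it into a plain list.
options = data.columns.values.tolist()
print(options)

# list(data.columns) is a shorter way to get the same list.
assert options == list(data.columns)
```

Each column name becomes one selectable option in the dropdown.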

    4. Plot the data

    Now, we'll make a bar chart to show the data. We will create the plot using Plotly, which is a well-known plotting library. In VIKTOR, we show Plotly graphs in a PlotlyView.

    We'll use the function extract_data to extract the data from the .csv, and Pandas to process it.

    Remember, we always add views in our Controller class.

    1. In app.py, find the Controller class and make it look like this:

      class Controller(vkt.Controller):
          parametrization = Parametrization

          @vkt.PlotlyView('Bar chart')
          def generate_plotly_view(self, params, **kwargs):

              # Extract and edit data to make it easy to plot
              data = extract_data()
              edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)

              # Make the bar chart
              fig = go.Figure()
              for column in edited_data.columns:
                  fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))

              # Edit the bar chart
              fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')

              return vkt.PlotlyResult(fig.to_json())
    2. Refresh your app. Nice! You should now see a beautiful stacked bar chart of the data after you fill in the inputs. You will also notice that when you hover the mouse over one of the bars, Plotly tells you the exact number of cars with those two properties!

    The app is almost finished! It should look similar to this
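
The groupby line does most of the work in this view. As a rough illustration of what it produces, the sketch below runs the same chain of Pandas calls on a tiny hypothetical dataset; the column names and rows are made up for this example.

```python
import pandas as pd

# Hypothetical sales records: one row per sold car.
data = pd.DataFrame(
    {
        "make": ["toyota", "toyota", "toyota", "audi", "audi"],
        "body-style": ["hatchback", "hatchback", "sedan", "sedan", "sedan"],
    }
)

main_column = "make"         # plays the role of params.main_column
count_column = "body-style"  # plays the role of params.count_column

# Step 1: count how often each body style occurs within each make.
counts = data.groupby(main_column)[count_column].value_counts()

# Step 2: unstack() moves the body styles into columns, and fillna(0)
# replaces combinations that never occur (audi hatchback) with 0.
edited_data = counts.unstack().fillna(0)
print(edited_data)
```

Each row of the resulting table becomes one bar on the X-axis, and each column becomes one stacked trace.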

    5. Add a data summary

    Fantastic, we have a fully functioning app 🎉. But that doesn't mean we cannot make it better.

    Let's now add a summary with the most important data next to our graph. We'll show the total number of cars sold and the selected column's best-performing and least-performing property.

    Thanks to Pandas, we can easily compute the totals per column and find the highest and lowest value with just a single line per value. For this, we need to modify the generate_plotly_view method.

    Here are the required steps. The code is below.

    1. Replace the @PlotlyView in your code with a @PlotlyAndDataView, so we can also display data next to the plot.

    2. Create a DataGroup with the DataItems we want to display.

    3. Make sure you return a PlotlyAndDataResult with the plot and the DataGroup.

    Here is the code for your reference:

    ...

    class Controller(vkt.Controller):
        parametrization = Parametrization

        @vkt.PlotlyAndDataView('Bar chart') # <--- Change the PlotlyView to PlotlyAndDataView
        def generate_plotly_view(self, params, **kwargs):

            # Extract and edit data to make it easy to plot
            data = extract_data()
            edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)

            # Make the bar chart
            fig = go.Figure()
            for column in edited_data.columns:
                fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))

            # Edit the bar chart
            fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')

            # Create a summary with the data # <--- add this line
            summary = vkt.DataGroup( # <--- add this line
                vkt.DataItem("Total Sold", len(data)), # <--- add this line
                vkt.DataItem("Most Occurring", edited_data.sum().idxmax()), # <--- add this line
                vkt.DataItem("Least Occurring", edited_data.sum().idxmin()), # <--- add this line
            )

            return vkt.PlotlyAndDataResult(fig.to_json(), summary) # <--- Modify this line (PlotlyResult to PlotlyAndDataResult)

    Refresh your app. You can now see the data on the right of the chart, and you can hide it using the arrow so you can focus on the bar chart!
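
The summary values rely on edited_data.sum(), idxmax() and idxmin(). The sketch below shows how these calls behave on a small hypothetical counts table shaped like edited_data; the labels and numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical counts table: rows are makes, columns are body styles.
edited_data = pd.DataFrame(
    {"hatchback": [0.0, 2.0], "sedan": [2.0, 1.0], "wagon": [1.0, 0.0]},
    index=["audi", "toyota"],
)

# sum() adds up each column, giving the total number of cars per body style.
totals = edited_data.sum()
print(totals)

# idxmax() / idxmin() return the column label with the highest / lowest total.
most_occurring = totals.idxmax()
least_occurring = totals.idxmin()
print(most_occurring, least_occurring)
```

Note that the totals are taken per column, so the summary describes the values of the property chosen in the second input field.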

    Now select 'Make' as the main property and 'body-style' as the property to analyse, and try to find the best-selling convertible and the best-selling car! You should find that the Alfa-Romero convertible is the best-selling convertible and that Toyota hatchbacks are the overall best-selling automobile!

    Can you also find the number of cars with a diesel-powered turbo engine? Which combination of engine type and fuel is your best seller?

    By now, your app should look like this:


    Complete code

    Were you able to do everything in this tutorial without error? If not, you can always look at the full code:


    import viktor as vkt
    import pandas as pd
    import plotly.graph_objects as go
    from pathlib import Path


    def extract_data():
        parent_folder = Path(__file__).parent
        file_path = parent_folder / "car-sales.csv"
        data = pd.read_csv(file_path, sep=';')
        return data


    class Parametrization(vkt.Parametrization):
        introduction = vkt.Text(
            """
    # 📊 Data Analysis App!

    In this app you can summarise data and visualise it in a PlotlyView.
    The app will summarise some data in the DataView and allow the user to choose a column to analyse to create the Plotly chart.

            """
        )
        # Add some input field for the user

        main_column = vkt.OptionField('Choose main property', options=extract_data().columns.values.tolist())
        count_column = vkt.OptionField('Choose property to analyse', options=extract_data().columns.values.tolist())


    class Controller(vkt.Controller):
        parametrization = Parametrization

        @vkt.PlotlyAndDataView('Bar chart')
        def generate_plotly_view(self, params, **kwargs):

            # Extract and edit data to make it easy to plot
            data = extract_data()
            edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)

            # Make the bar chart
            fig = go.Figure()
            for column in edited_data.columns:
                fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))

            # Edit the bar chart
            fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')

            # Create a summary with the data
            summary = vkt.DataGroup(
                vkt.DataItem("Total Sold", len(data)),
                vkt.DataItem("Most Occurring", edited_data.sum().idxmax()),
                vkt.DataItem("Least Occurring", edited_data.sum().idxmin()),
            )

            return vkt.PlotlyAndDataResult(fig.to_json(), summary)

    Want to learn how VIKTOR works?

    If you are interested in how VIKTOR works behind the scenes, for example how it processes your input, read the sections below!

    How does it work?

    How does the Parametrization work?

    In the Parametrization class you can add input fields that allow the user to provide input to your app. There are more than 20 different input fields you can use, including numbers, text, colors, images and files.

    Inside the Parametrization class, you can also format the layout of your app by adding sections, tabs, steps and pages.

    To show your Parametrization in the app, we need to add the line parametrization = Parametrization inside the Controller class, because it is the controller that determines what is shown and what is not.

    How does the Parametrization get saved?

    So you may be wondering, how do you get the information from the parametrization to your controller? Well, we do this automatically for you. The values of all parameters are stored in a single variable called params, which is accessible inside the Controller class.

    These values are stored in a Munch; this is similar to a dictionary, but works with dot notation.

    Example:

    • Let's say we have a variable called height as a NumberField in our Parametrization.
    • To use it in a method in the Controller, define the method as: def my_method(self, params, **kwargs)
    • You can now make calculations inside that method using our height parameter as params.height!
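
To make the dot notation concrete, here is a minimal sketch. DotDict is a hypothetical, simplified stand-in written for this example; it is not the Munch class that VIKTOR uses, and the width and depth fields are assumptions added next to height.

```python
class DotDict(dict):
    """Hypothetical stand-in for a Munch: a dict that also allows attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def box_volume(params):
    # Fields are read with dot notation, like params inside a Controller method.
    return params.height * params.width * params.depth


# Hypothetical parameter values, as if filled in by the user.
params = DotDict(height=2.0, width=3.0, depth=0.5)

print(params.height)     # attribute-style access
print(params["height"])  # regular dictionary access still works
print(box_volume(params))
```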

    How does the Controller work?

    The Controller class is the place where you add everything you want to calculate and show.

    As explained in this tutorial, we show results in a View and we always add views in our controller. You can even add several views in a single app by adding them to the Controller class. And yes, we have many Views, for showing graphs, maps, 3D models, reports, images and more.

    In the Controller, you also do or call your calculation. Remember that the user input given in the parametrization is accessible inside the Controller class in the variable params.

    What's next?

    Very impressive! You have mastered the basics of using VIKTOR for data analysis. In this tutorial, we have only scratched the surface of what you can do with data, so don't stop here!

    If you like an extra challenge, here are some ideas you can add:

    • Try analysing another dataset's columns, perhaps by uploading a set using the FileField
    • Extend the DataGroup to have an average and standard deviation of the property
    • Try to add another chart such as a hierarchy or pie chart using a second PlotlyView
    • Display several charts on a single view using subplots

    Or just follow some of our other tutorials.

    More about plotting

    You can find more information about how to make plots in the plotting guide. Also read more about the possibilities of the DataView.