Tutorial - Data Analysis with Plotly and Pandas
Level: Beginners
Time: 20 min
Prerequisites:
- You have an account and completed the installation process. No account? Get one here
- You have some experience with reading Python code
Introduction
In this tutorial we’ll make a VIKTOR app to process, visualize and summarize data from a .csv file.
We’ll do this by using two well-known Python libraries: Pandas and Plotly.
Visualizing data is one of the best ways to find trends and patterns. By plotting or grouping the data, we can quickly find these trends and make better business decisions! It is also an excellent way to present results to stakeholders like your manager or another business unit.
In this tutorial you will learn how to:
- Create an empty app template
- Extract the data from a .csv
- Add input fields
- Plot the data using Plotly
- Add a data summary
By the end of this tutorial you will be able to create a colorful chart that can be used for analysis and saved in reports to help explain data. You will understand how to automate the process in VIKTOR, and your app will look similar to the gif below:
Are you encountering an error? Take a look at the complete app code.
Also know that you can always ask for help at our Community Forum, where our developers are ready to help with any question related to the installation, coding and more.
1. Create, install and start an empty app
Let's create, install and start an empty app. This will be the starting point for the rest of the tutorial.
But before we start, make sure to shut down any app that is running (like the demo app) by closing the command-line shell (for example Powershell) or ending the process using Ctrl + C.
Follow these steps to create, install and start an empty app:
- Go to the App store in your VIKTOR environment to create a new app. After clicking 'Create app' choose the option 'Create blank app' and enter a name and description of your choice. Submit the form by clicking 'Create and setup'.
- Select 'Editor' as app type and click 'Next'.
- Now follow the instructions to run the quickstart command to download the empty app template. After entering the command click 'I have run the command' to continue. The CLI will ask you to select your code editor of choice. Use the arrows and press enter to select a code editor. The app will now open in your code editor of choice.
If all went well, your empty app is installed and connected to your development workspace. The terminal in your code editor should show something like this:
INFO : Connecting to platform...
INFO : Connection is established: https://cloud.viktor.ai <---- here you can see your app
INFO : The connection can be closed using Ctrl+C
INFO : App is ready
- We want to use Pandas and Plotly in this app. Open requirements.txt and add plotly and pandas under your viktor version (don't change that line):
viktor==X.X.X <-- Don't modify this line
plotly
pandas
Now disconnect your app by closing the connection using Ctrl+C, so that you can install these new dependencies. Install them by entering the following command:
viktor-cli install
Then reconnect the app by entering the command:
viktor-cli start
- You only need to create an app template and install it once for each new app you want to make.
- The app will update automatically once you start adding code in app.py, as long as you don't close the terminal or your code editor.
- Did you close your code editor? Use viktor-cli start to start the app again. No need to install, clear, etc.
Did you encounter any errors?
- Always make sure to check the spelling of everything you typed in the command-line. One small mistake and the command you are trying to run may not be recognised!
- If you are encountering:
ERROR:
Exiting because of an error: no requirements.txt file
PS C:\Users\<username>\viktor-apps>
Then you are not in the correct folder! Check the command-line and navigate to the data-analysis-tutorial folder.
- If you are encountering:
Error: App definition is not compatible with the data currently stored in the database. Use the command 'viktor-cli clear' to clear the database.
PS C:\Users\<username>\viktor-apps\data-analysis-tutorial>
That means you have not cleared the database yet! Use viktor-cli clear to clear it, and then use viktor-cli start to start the app. No need to install it again!
Not seeing any of these errors? Head over to our community! There is a good chance another developer encountered it and solved it too!
2. Extract data from .csv
For this tutorial, we will dive into a dataset that contains the car sales data of a showroom.
The CEO has asked you to create a tool so that they can compare which types of cars were sold. They did not specify what they wanted to compare, so we will build the tool so that it can compare any two properties of the sold cars and also show some additional information that might be interesting.
- Download the dataset here.
- Make sure the file name is car-sales.csv
- Add the file to your app folder (data-analysis-tutorial)
- We'll create the function extract_data to access the data using Pandas, and add all the necessary imports so that we don't need to do this later. Open app.py and make sure the first part of the code looks like this:
import viktor as vkt
import pandas as pd
import plotly.graph_objects as go
from pathlib import Path


def extract_data():
    parent_folder = Path(__file__).parent
    file_path = parent_folder / "car-sales.csv"
    data = pd.read_csv(file_path, sep=';')
    return data


class Parametrization(vkt.Parametrization):  # <--- this line is for your reference
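The sep=';' argument matters because read_csv splits on commas by default. The sketch below illustrates the difference on a small in-memory sample. The sample rows and column names are made up for illustration and are not taken from the real car-sales.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample; NOT the real car-sales.csv content.
sample = io.StringIO(
    "make;body-style;fuel-type\n"
    "toyota;hatchback;gas\n"
    "audi;sedan;diesel\n"
)

# With the right separator every property becomes its own column.
data = pd.read_csv(sample, sep=";")
print(data.columns.tolist())

# With the default separator (a comma) the whole header ends up in one column.
sample.seek(0)
wrong = pd.read_csv(sample)
print(wrong.columns.tolist())
```

If the options in your app show up as one long name containing semicolons, the separator is the first thing to check.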
3. Add Input fields
We want the CEO to interact with the app. For this, we will add two OptionFields: one to determine the variable on the X-axis and the other for the Y-axis. We will also make sure that the options shown correspond to the columns of the data set.
Remember, we always add input fields in the Parametrization class.
- In the app.py file make sure the code in the Parametrization class looks like this:
...
class Parametrization(vkt.Parametrization):
    introduction = vkt.Text(
        """
# 📊 Data Analysis App!
In this app you can summarise data and visualise it in a PlotlyView.
The app will summarise some data in the DataView and allow the user to choose a column to analyse to create the Plotly chart.
        """
    )
    # Add some input field for the user
    main_column = vkt.OptionField('Choose main property', options=extract_data().columns.values.tolist())
    count_column = vkt.OptionField('Choose property to analyse', options=extract_data().columns.values.tolist())


class Controller(vkt.Controller):  # <--- this line is for your reference
- Go to your cloud environment (e.g. cloud.viktor.ai) and open the development card. Great! You will be able to see your input fields.
How the app should look after the parametrization
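Both fields get their options from the column names of the DataFrame. The sketch below shows what that expression returns, using a small hypothetical DataFrame in place of the real dataset. The name column_options is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by extract_data().
data = pd.DataFrame({
    "make": ["toyota", "audi"],
    "body-style": ["hatchback", "sedan"],
    "fuel-type": ["gas", "diesel"],
})

# .columns is a pandas Index; .values.tolist() turns it into a plain list of strings.
column_options = data.columns.values.tolist()
print(column_options)

# list(data.columns) produces the same list and is a common shorthand.
same_options = list(data.columns)
```

As an optional design choice, you could compute this list once and pass it to both fields, so the file is not parsed twice when the class is defined.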
4. Plot the data
Now, we'll make a bar chart to show the data. We will create the plot using Plotly, which is a well-known plotting library.
In VIKTOR, we show Plotly graphs in a PlotlyView.
We'll use the function extract_data to extract the data from the .csv, and Pandas to process it.
Remember, we always add views in our Controller class.
- In app.py, find the Controller class and make it look like this:
class Controller(vkt.Controller):
    parametrization = Parametrization

    @vkt.PlotlyView('Bar chart')
    def generate_plotly_view(self, params, **kwargs):
        # Extract and edit data to make it easy to plot
        data = extract_data()
        edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)
        # Make the bar chart
        fig = go.Figure()
        for column in edited_data.columns:
            fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))
        # Edit the bar chart
        fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')
        return vkt.PlotlyResult(fig.to_json())
- Refresh your app. Nice! You should now see a beautiful stacked bar chart of the data after you fill in the inputs. What you will also notice is that when you hover the mouse over one of the bars, Plotly will tell you the exact amount of cars with those two properties!
The app is almost finished! It should look similar to this
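The single line that builds edited_data does three things in a row. The sketch below separates those steps on a tiny hypothetical dataset, so you can inspect each intermediate result outside VIKTOR. The rows are made up, and plain dictionaries stand in for the go.Bar traces so the example only needs Pandas.

```python
import pandas as pd

# Hypothetical mini dataset; rows and column names are assumptions for illustration.
data = pd.DataFrame({
    "make": ["toyota", "toyota", "toyota", "audi", "audi"],
    "body-style": ["hatchback", "hatchback", "sedan", "sedan", "sedan"],
})
main_column, count_column = "make", "body-style"

# Step 1: count every (make, body-style) pair; the result has a two-level index.
counts = data.groupby(main_column)[count_column].value_counts()

# Step 2: move the inner index level to the columns, one column per body style.
table = counts.unstack()

# Step 3: pairs that never occur are NaN after unstack; replace them with 0.
edited_data = table.fillna(0)
print(edited_data)

# One trace per column, all sharing the same x values. Stacking them gives the chart.
traces = [
    {"x": edited_data.index.tolist(), "y": edited_data[column].tolist(), "name": column}
    for column in edited_data.columns
]
```

Each row of edited_data becomes one bar position on the X-axis, and each column becomes one colored segment in the stack.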
5. Add a data summary
Fantastic, we have a fully functioning app 🎉. But that doesn't mean we cannot make it better.
Let's now add a summary with the most important data next to our graph. We'll show the total number of cars sold and the selected column's best-performing and least-performing property.
Thanks to Pandas, we can easily compute the totals per column and find the highest and lowest value with just a single line per value. For this, we need to modify the generate_plotly_view method.
Here are the required steps. The code is below.
- Replace the @PlotlyView in your code with a @PlotlyAndDataView, so we can also display data next to the plot.
- Make sure you return a PlotlyAndDataResult with the plot and the DataGroup.
Here is the code for your reference:
...
class Controller(vkt.Controller):
    parametrization = Parametrization

    @vkt.PlotlyAndDataView('Bar chart')  # <--- Change the PlotlyView to PlotlyAndDataView
    def generate_plotly_view(self, params, **kwargs):
        # Extract and edit data to make it easy to plot
        data = extract_data()
        edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)
        # Make the bar chart
        fig = go.Figure()
        for column in edited_data.columns:
            fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))
        # Edit the bar chart
        fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')
        # Create a summary with the data  # <--- add this line
        summary = vkt.DataGroup(  # <--- add this line
            vkt.DataItem("Total Sold", len(data)),  # <--- add this line
            vkt.DataItem("Most Occurring", edited_data.sum().idxmax()),  # <--- add this line
            vkt.DataItem("Least Occurring", edited_data.sum().idxmin()),  # <--- add this line
        )  # <--- add this line
        return vkt.PlotlyAndDataResult(fig.to_json(), summary)  # <--- Modify this line (PlotlyResult to PlotlyAndDataResult)
Refresh your app. You can now see the data on the right of the chart, and you can hide it using the arrow so you can focus on the bar chart!
Now select 'Make' as the main property and 'body-style' as the property to analyse, and try to find the best selling convertible and the best selling car! You should find that the Alfa-Romero Convertible is the best selling convertible and that the Toyota Hatchbacks are the overall best selling automobile!
Can you also find the number of cars with a diesel powered turbo engine? Which combination of engine type and fuel is your best seller?
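To see where the three summary values come from, here is a minimal sketch that computes them on a tiny hypothetical dataset outside VIKTOR. The rows are made up for illustration, and the names totals, total_sold, most_occurring and least_occurring are not part of the app code.

```python
import pandas as pd

# Hypothetical mini dataset; NOT the real car-sales.csv content.
data = pd.DataFrame({
    "make": ["toyota", "toyota", "toyota", "audi", "audi"],
    "body-style": ["hatchback", "hatchback", "sedan", "sedan", "sedan"],
})
edited_data = data.groupby("make")["body-style"].value_counts().unstack().fillna(0)

# sum() adds up each column, giving one total per value of the analysed property.
totals = edited_data.sum()

total_sold = len(data)            # every row is one sold car
most_occurring = totals.idxmax()  # column label with the highest total
least_occurring = totals.idxmin() # column label with the lowest total
print(total_sold, most_occurring, least_occurring)
```

Note that the most and least occurring values describe the property chosen in the second field, summed over all values of the main property.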
By now, your app should look like this:
Complete code
Were you able to do everything in this tutorial without error? If not, you can always look at the full code:
import viktor as vkt
import pandas as pd
import plotly.graph_objects as go
from pathlib import Path


def extract_data():
    parent_folder = Path(__file__).parent
    file_path = parent_folder / "car-sales.csv"
    data = pd.read_csv(file_path, sep=';')
    return data


class Parametrization(vkt.Parametrization):
    introduction = vkt.Text(
        """
# 📊 Data Analysis App!
In this app you can summarise data and visualise it in a PlotlyView.
The app will summarise some data in the DataView and allow the user to choose a column to analyse to create the Plotly chart.
        """
    )
    # Add some input field for the user
    main_column = vkt.OptionField('Choose main property', options=extract_data().columns.values.tolist())
    count_column = vkt.OptionField('Choose property to analyse', options=extract_data().columns.values.tolist())


class Controller(vkt.Controller):
    parametrization = Parametrization

    @vkt.PlotlyAndDataView('Bar chart')
    def generate_plotly_view(self, params, **kwargs):
        # Extract and edit data to make it easy to plot
        data = extract_data()
        edited_data = data.groupby(params.main_column)[params.count_column].value_counts().unstack().fillna(0)
        # Make the bar chart
        fig = go.Figure()
        for column in edited_data.columns:
            fig.add_trace(go.Bar(x=edited_data.index, y=edited_data[column], name=column))
        # Edit the bar chart
        fig.update_layout(barmode='stack', xaxis_title=params.main_column, yaxis_title='Amount Sold')
        # Create a summary with the data
        summary = vkt.DataGroup(
            vkt.DataItem("Total Sold", len(data)),
            vkt.DataItem("Most Occurring", edited_data.sum().idxmax()),
            vkt.DataItem("Least Occurring", edited_data.sum().idxmin()),
        )
        return vkt.PlotlyAndDataResult(fig.to_json(), summary)
Want to learn how VIKTOR works?
If you are interested in how VIKTOR works behind the scenes, for example how it processes your input, expand the tabs below!
How does it work?
How does the Parametrization work?
In the Parametrization class you can add input fields that allow the user to provide input to your app. There are more than 20 different input fields you can use, including numbers, text, colors, images and files.
Inside the Parametrization class, you can also format the layout of your app by adding sections, tabs, steps and pages.
To show your Parametrization in the app, we need to add the line parametrization = Parametrization inside the Controller class, because it is the controller that determines what is shown and what is not.
How does the Parametrization get saved?
So you may be wondering, how do you get the information from the parametrization to your controller?
Well, we do this automatically for you. The values of all parameters are stored in a single variable called params, which is accessible inside the Controller class.
These values are stored in a Munch; this is similar to a dictionary, but works with dot notation.
Example:
- Let's say we have a variable called height as a NumberField in our Parametrization.
- To use it in a method in the Controller, define it as: def my_method(self, params, **kwargs)
- You can now make calculations inside that method using our height parameter as params.height!
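The difference between dictionary access and dot notation can be sketched in plain Python. The DotDict class below is a hypothetical stand-in written for this example. It is not the real Munch implementation and not part of the VIKTOR SDK, and my_method is simplified to a plain function.

```python
class DotDict(dict):
    """Hypothetical stand-in for a Munch: a dictionary that also allows attribute access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods keep working.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


# Assumed example values, as if the user had filled in the input fields.
params = DotDict(height=2.5, main_column="make")


def my_method(params, **kwargs):
    # Dot notation reads the same value as params["height"].
    return params.height * 2


print(my_method(params))
```

In a real app you never build params yourself. VIKTOR passes it to your view methods, and you only read from it.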
How does the Controller work?
The Controller class is the place where you add everything you want to calculate and show.
As explained in this tutorial, we show results in a View and we always add views in our controller.
You can even add several views in a single app by adding them to the controller class... and yes, we have many Views, for showing graphs, maps, 3D models, reports, images and more.
In the Controller, you also do or call your calculation. Remember that the user input given in the parametrization is accessible inside the Controller class in the variable params.
What's next?
Very impressive! You have mastered the basics of using VIKTOR for data analysis. In this tutorial, we have only scratched the surface of what you can do with data, so don't stop here!
If you like an extra challenge, here are some ideas you can add:
- Try analysing another dataset's columns, perhaps by uploading a set using the FileField
- Extend the DataGroup to have an average and standard deviation of the property
- Try to add another chart such as a hierarchy or pie chart using a second PlotlyView
- Display several charts on a single view using subplots
Or just follow some of our other tutorials
You can find more information about how to make plots in the plotting guide. Also check out the possibilities of the DataView.
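As a starting point for the average and standard deviation idea above, here is a minimal Pandas sketch. The price column and its values are hypothetical, including the "?" placeholder for a missing value, so check what your own dataset looks like before reusing it.

```python
import pandas as pd

# Hypothetical numeric column stored as text, with "?" marking a missing value.
data = pd.DataFrame({"price": ["10000", "12000", "?", "14000"]})

# Convert to numbers; anything that cannot be parsed becomes NaN.
prices = pd.to_numeric(data["price"], errors="coerce")

# mean() and std() skip NaN by default; std() is the sample standard deviation.
average_price = prices.mean()
price_std = prices.std()
print(average_price, price_std)
```

The two results are ordinary numbers, so they could be shown with extra DataItem entries in the same way as the totals in the tutorial.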