Storage for data persistence
VIKTOR offers a Storage which can be used to store and retrieve files within an app workspace. The storage is persistent, meaning that the data remains available with no time limit. This is helpful when (intermediate) results need to be shared between jobs. For example, a long-running task is performed in job 'A', and its results can be accessed in job 'B' without rerunning the task.
The storage can be accessed from an app with the Storage methods as shown below:
from viktor.core import File, Storage
# Setting data on a key
Storage().set('data_key_1', data=File.from_data('abc'), scope='entity')
Storage().set('data_key_2', data=File.from_data('def'), scope='entity')
# Retrieve the data by key
Storage().get('data_key_1', scope='entity')
# List available data keys (by prefix)
Storage().list(scope='entity') # lists all files in current entity scope
Storage().list(prefix='data_key_', scope='entity') # lists 'data_key_1', 'data_key_2', ... etc.
# Delete data by key
Storage().delete('data_key_1', scope='entity')
With the scope argument, the 'accessibility' of the stored data can be set. The following scopes are available:
- entity: when data needs to be accessed within a specific entity
- workspace: when data needs to be accessed workspace-wide
The entity scope means that the data is assigned to a specific entity. A common use-case for this scope is to store results of a long-running task (analysis, file parsing, etc.) and retrieve it without the need to rerun the task.
For example, data is stored in entity A1 on key entity_data. VIKTOR stores the data in a zone in the storage designated for entity A1:
Storage().set('entity_data', data=File.from_data('content set by A1'), scope='entity')
This data can then be retrieved in entity A1 using the get method:
Storage().get('entity_data', scope='entity') # File content: 'content set by A1'
If we try to perform this get operation in entity A2, a FileNotFound error is raised because the file does not (yet) exist in the storage zone designated for entity A2. This also means that we can reuse the same key to set data in entity A2 without overwriting the data stored in entity A1:
Storage().set('entity_data', data=File.from_data('content set by A2'), scope='entity')
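The store-once, reuse-later use case mentioned for the entity scope can be sketched as a small "get or compute" helper. This is a minimal illustration that does not use the VIKTOR SDK: InMemoryStore, get_or_compute and long_running_task are hypothetical names, and the store is assumed to raise Python's built-in FileNotFoundError for a missing key.

```python
class InMemoryStore:
    """Hypothetical stand-in for a persistent key-value store (not the VIKTOR SDK)."""

    def __init__(self):
        self._data = {}

    def set(self, key, data):
        self._data[key] = data

    def get(self, key):
        if key not in self._data:
            raise FileNotFoundError(key)  # assumed behaviour for a missing key
        return self._data[key]


def get_or_compute(store, key, compute):
    """Return the stored result for `key`, computing and storing it on first use."""
    try:
        return store.get(key)
    except FileNotFoundError:
        result = compute()
        store.set(key, result)
        return result


evaluations = []

def long_running_task():
    evaluations.append(1)  # count how often the task really runs
    return "analysis result"

store = InMemoryStore()
first = get_or_compute(store, "analysis", long_running_task)   # runs the task
second = get_or_compute(store, "analysis", long_running_task)  # served from the store
```

In a real app the same try/except structure could wrap the Storage get and set calls, so the expensive task only runs when the key is not yet present in the current scope.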
It is also possible to store or retrieve data of one entity from within another entity (even if the entities are of different types). This cross-entity referencing can be achieved by passing the relevant Entity object as the entity argument:
entity = ... # retrieve entity A1
Storage().get('entity_data', scope='entity', entity=entity) # File content: 'content set by A1'
See this guide on how to navigate to the correct entity using the API.
The workspace scope means that the data is accessible workspace-wide. All entities of all types will
point towards the same section in the storage with this scope. This scope can be seen as an extension of the
memoize functionality, with the difference that the stored results are permanent.
In entity A1, the following data is stored on key workspace_data using the workspace scope:
Storage().set('workspace_data', data=File.from_data('content set by A1'), scope='workspace')
Storing data in entity A2 on the same key using the workspace scope overwrites the previously stored data:
Storage().set('workspace_data', data=File.from_data('content set by A2'), scope='workspace')
When we retrieve this key in entity A1, entity A2, or even an entity of TypeB, the returned file content is the data that was stored last (in this case 'content set by A2').
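The difference between the two scopes can be illustrated with a toy in-memory model. This is only a sketch of the behaviour described above, not VIKTOR's implementation: ScopedStorageModel and the entity_id parameter are hypothetical, and plain strings stand in for files.

```python
class ScopedStorageModel:
    """Toy in-memory model of scoped storage zones (illustration only)."""

    def __init__(self):
        self._zones = {}  # zone identifier -> {key: data}

    @staticmethod
    def _zone(scope, entity_id):
        if scope == "entity":
            return ("entity", entity_id)  # one zone per entity
        if scope == "workspace":
            return ("workspace",)         # one zone shared by all entities
        raise ValueError(f"unknown scope: {scope!r}")

    def set(self, key, data, *, scope, entity_id):
        self._zones.setdefault(self._zone(scope, entity_id), {})[key] = data

    def get(self, key, *, scope, entity_id):
        try:
            return self._zones[self._zone(scope, entity_id)][key]
        except KeyError:
            raise FileNotFoundError(key) from None


store = ScopedStorageModel()

# Entity scope: the same key lives in separate zones for A1 and A2.
store.set("entity_data", "content set by A1", scope="entity", entity_id="A1")
store.set("entity_data", "content set by A2", scope="entity", entity_id="A2")
a1_entity = store.get("entity_data", scope="entity", entity_id="A1")

# Workspace scope: the second write overwrites the first.
store.set("workspace_data", "content set by A1", scope="workspace", entity_id="A1")
store.set("workspace_data", "content set by A2", scope="workspace", entity_id="A2")
seen_from_a1 = store.get("workspace_data", scope="workspace", entity_id="A1")
```

In this model, reading another entity's data amounts to passing that entity's identifier, which mirrors the cross-entity referencing through the entity argument.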
To store (long-running) results temporarily, you can also make use of the memoize decorator. A practical use case of memoize is a function call whose input and output are relatively small compared to the time required for its evaluation.
In the example below, a DataView performs a long-running calculation when calling func. When the user changes input in the editor and updates the view again, func is only evaluated again if any of param_a, param_b, or param_c has changed between jobs:
from viktor.utils import memoize

@memoize
def func(*, param_a, param_b, param_c):
    ...  # perform lengthy calculation

def get_data_view(self, params, **kwargs):  # DataView method of the controller
    result = func(param_a=..., param_b=..., param_c=...)
When using memoize in your development environment, the cache is stored locally. The local storage is limited to 10 function calls for practical reasons. If the limit is exceeded, cached results are cleared on a first-in-first-out basis. In production, the storage is not limited to 10 function calls.
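The first-in-first-out clearing of a size-limited cache can be sketched as follows. This is a generic illustration of the principle, not the code behind memoize: fifo_memoize and slow_square are hypothetical names, and the sketch assumes keyword-only arguments with hashable values.

```python
from collections import OrderedDict
from functools import wraps


def fifo_memoize(max_entries=10):
    """Cache results per keyword-argument set; evict the oldest entry when full."""
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(**kwargs):
            key = tuple(sorted(kwargs.items()))  # assumes hashable argument values
            if key in cache:
                return cache[key]  # cache hit: insertion order is left unchanged (FIFO)
            result = func(**kwargs)
            cache[key] = result
            if len(cache) > max_entries:
                cache.popitem(last=False)  # drop the entry that was inserted first
            return result

        return wrapper
    return decorator


calls = []

@fifo_memoize(max_entries=10)
def slow_square(*, x):
    calls.append(x)  # record every real evaluation
    return x * x

for x in range(11):   # 11 distinct inputs: the entry for x=0 is evicted
    slow_square(x=x)
slow_square(x=10)     # still cached, not evaluated again
slow_square(x=0)      # evicted earlier, so evaluated again
```

Because a cache hit does not refresh an entry's position, this is first-in-first-out rather than least-recently-used eviction.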