Storing results

Persistent storage

VIKTOR offers a Storage class that can be used to store and retrieve files within an app workspace. The storage is persistent: the data remains available with no time limit. This is helpful when (intermediate) results need to be shared between jobs. For example, a long-running task is performed in job 'A', and its results can be accessed in job 'B' without rerunning the task.

Supported operations

The storage can be accessed from an app with the Storage methods as shown below:

from viktor.core import File, Storage

# Setting data on a key
Storage().set('data_key_1', data=File.from_data('abc'), scope='entity')
Storage().set('data_key_2', data=File.from_data('def'), scope='entity')

# Retrieve the data by key
Storage().get('data_key_1', scope='entity')

# List available data keys (by prefix)
Storage().list(scope='entity') # lists all files in current entity scope
Storage().list(prefix='data_key_', scope='entity') # lists 'data_key_1', 'data_key_2', ... etc.

# Delete data by key
Storage().delete('data_key_1', scope='entity')
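To make the semantics of these four operations concrete, here is a minimal in-memory sketch. InMemoryStorage is a hypothetical stand-in written for illustration only: it is not part of the VIKTOR SDK, it ignores scopes, it stores plain bytes instead of File objects, and its list method returns key names only. Raising FileNotFoundError for a missing key is an assumption.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a key-value file store (not the VIKTOR SDK)."""

    def __init__(self):
        self._files = {}  # key -> bytes

    def set(self, key, data):
        # Setting an existing key overwrites its content.
        self._files[key] = data

    def get(self, key):
        if key not in self._files:
            raise FileNotFoundError(key)
        return self._files[key]

    def list(self, prefix=""):
        # Return every key that starts with the prefix (all keys if empty).
        return sorted(k for k in self._files if k.startswith(prefix))

    def delete(self, key):
        if key not in self._files:
            raise FileNotFoundError(key)
        del self._files[key]


storage = InMemoryStorage()
storage.set("data_key_1", b"abc")
storage.set("data_key_2", b"def")
storage.set("other", b"ghi")

keys_with_prefix = storage.list(prefix="data_key_")  # only the 'data_key_' keys
storage.delete("data_key_1")
remaining = storage.list()  # all keys that are left
```

The prefix filter is a plain "starts with" match, so choosing a consistent key naming scheme makes it easy to list related files together.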

Storage scopes

With the scope argument the 'accessibility' of the stored data can be set. The following scopes are available:

  • entity: when data needs to be accessed within a specific entity
  • workspace: when data needs to be accessed workspace-wide

scope 'entity'

The entity scope means that the data is assigned to a specific entity. A common use-case for this scope is to store the results of a long-running task (analysis, file parsing, etc.) and retrieve them without the need to rerun the task.

For example, data is stored in entity A1 on key entity_data. VIKTOR stores the data in a zone in the storage designated for entity A1:

Storage().set('entity_data', data=File.from_data('content set by A1'), scope='entity')

This data can then be retrieved in entity A1 using the get operation:

Storage().get('entity_data', scope='entity')  # File content: 'content set by A1'

If we try to perform this get operation in entity A2, a FileNotFound error is raised because the file does not (yet) exist in the storage zone designated for entity A2. This also means that we can reuse the same key to set data in entity A2 without overwriting the data stored in entity A1:

Storage().set('entity_data', data=File.from_data('content set by A2'), scope='entity')

It is also possible to store or retrieve data of one entity from within another entity (even between entities of different types). This cross-entity referencing is achieved by passing the relevant Entity object as the entity argument:

entity = ...  # retrieve entity A1
Storage().get('entity_data', scope='entity', entity=entity) # File content: 'content set by A1'
Tip: See this guide on how to navigate to the correct entity using the API.
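The use-case mentioned above, storing the result of a long-running task so that later jobs can reuse it, often takes the shape of a "load or compute" helper. The sketch below illustrates that pattern with a plain dictionary standing in for the storage. The names load_or_compute, expensive_analysis, and store are hypothetical; in a VIKTOR app the dictionary lookups would presumably be replaced by Storage().get and Storage().set calls on File objects with scope='entity'.

```python
import json

call_count = 0  # tracks how often the expensive task actually runs


def expensive_analysis(load):
    """Hypothetical long-running task; here it only doubles a number."""
    global call_count
    call_count += 1
    return {"load": load, "deflection": load * 2}


def load_or_compute(store, key, compute):
    """Return the stored result for `key`, computing and storing it if absent."""
    if key in store:
        return json.loads(store[key])
    result = compute()
    # Serialize to a string, similar to what would be written to a file.
    store[key] = json.dumps(result)
    return result


store = {}  # stand-in for persistent storage (assumption, not the VIKTOR API)

first = load_or_compute(store, "analysis_result", lambda: expensive_analysis(10))
second = load_or_compute(store, "analysis_result", lambda: expensive_analysis(10))
```

The second call returns the stored result without running expensive_analysis again. Note that the key does not encode the input, so this sketch only fits cases where a changed input also leads to a new key or an explicit overwrite.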

scope 'workspace'

The workspace scope means that the data is accessible workspace-wide. All entities of all types will point towards the same section in the storage with this scope. This scope can be seen as an extension of the memoize functionality, with the difference that the stored results are permanent.

In entity A1 the following data is stored on key workspace_data using the workspace scope:

Storage().set('workspace_data', data=File.from_data('content set by A1'), scope='workspace')

Storing data in entity A2 on the same key using the workspace scope overwrites the previously stored data:

Storage().set('workspace_data', data=File.from_data('content set by A2'), scope='workspace')

When we retrieve this key in entity A1, entity A2, or even an entity of TypeB, the returned file content is the data that was stored last (in this case 'content set by A2').
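The difference between the two scopes can be summarized with a small model of "storage zones". ScopedStore below is a hypothetical illustration, not the VIKTOR implementation: it assumes that an entity-scoped key lives in a zone per entity, that a workspace-scoped key lives in a single shared zone, and that entities are identified by a simple entity_id string.

```python
class ScopedStore:
    """Hypothetical model of scoped storage zones (not the VIKTOR implementation)."""

    def __init__(self):
        self._zones = {}  # zone id -> {key: content}

    def _zone(self, scope, entity_id):
        if scope == "entity":
            # Each entity gets its own zone.
            zone_id = ("entity", entity_id)
        elif scope == "workspace":
            # All entities share one zone.
            zone_id = ("workspace",)
        else:
            raise ValueError(f"unknown scope: {scope}")
        return self._zones.setdefault(zone_id, {})

    def set(self, key, content, *, scope, entity_id):
        self._zone(scope, entity_id)[key] = content

    def get(self, key, *, scope, entity_id):
        zone = self._zone(scope, entity_id)
        if key not in zone:
            raise FileNotFoundError(key)
        return zone[key]


store = ScopedStore()

# Entity scope: the same key holds different content per entity.
store.set("entity_data", "content set by A1", scope="entity", entity_id="A1")
store.set("entity_data", "content set by A2", scope="entity", entity_id="A2")
entity_a1 = store.get("entity_data", scope="entity", entity_id="A1")
entity_a2 = store.get("entity_data", scope="entity", entity_id="A2")

# Workspace scope: the last write wins, regardless of the writing entity.
store.set("workspace_data", "content set by A1", scope="workspace", entity_id="A1")
store.set("workspace_data", "content set by A2", scope="workspace", entity_id="A2")
workspace_from_a1 = store.get("workspace_data", scope="workspace", entity_id="A1")
```

In this model, reading 'entity_data' from a third entity raises FileNotFoundError, while 'workspace_data' is readable from any entity.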

Memoize

To store (long-running) results temporarily, you can also make use of the memoize function. A practical use-case of memoize is a function call whose input and output are relatively small compared to the time required for its evaluation.

In the example below, a DataView performs a long-running calculation when calling func. When the user changes input in the editor and updates the view again, func is only evaluated again if any of param_a, param_b, or param_c has changed between jobs:

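from viktor.core import ViktorController
from viktor.utils import memoize
from viktor.views import DataResult, DataView


@memoize
def func(*, param_a, param_b, param_c):
    # perform lengthy calculation
    return result


class Controller(ViktorController):
    ...

    @DataView("Results", duration_guess=30)
    def get_data_view(self, params, **kwargs):
        ...
        # func is only re-evaluated when one of its arguments has changed
        result = func(param_a=..., param_b=..., param_c=...)
        ...

        return DataResult(...)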
Caution: When using memoize in your development environment, the cache is stored locally. For practical reasons, the local storage is limited to 10 function calls. If the limit is exceeded, cached results are cleared on a first-in-first-out basis. In production, the storage is not limited to 10 function calls.
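The first-in-first-out behavior described for the local cache can be illustrated with a small standard-library sketch. fifo_memoize is a hypothetical decorator and not how VIKTOR implements memoize; it assumes keyword-only arguments with hashable values and uses a limit of 2 instead of 10 to keep the example short.

```python
from collections import OrderedDict
from functools import wraps


def fifo_memoize(max_size=10):
    """Hypothetical decorator: caches results and evicts the oldest entry first."""

    def decorator(func):
        cache = OrderedDict()  # insertion order == age of the entry

        @wraps(func)
        def wrapper(**kwargs):
            key = tuple(sorted(kwargs.items()))
            if key in cache:
                return cache[key]  # hit: order is left unchanged (FIFO, not LRU)
            result = func(**kwargs)
            cache[key] = result
            if len(cache) > max_size:
                cache.popitem(last=False)  # drop the entry that was stored first
            return result

        wrapper.cache = cache  # exposed for inspection only
        return wrapper

    return decorator


calls = []  # records every real evaluation


@fifo_memoize(max_size=2)
def square(*, x):
    calls.append(x)
    return x * x


square(x=1)  # computed
square(x=1)  # served from the cache
square(x=2)  # computed
square(x=3)  # computed; evicts the entry for x=1
square(x=1)  # computed again because it was evicted; evicts the entry for x=2
```

With first-in-first-out eviction, a frequently used result is still dropped once enough newer results have been stored, so a function called with many different inputs during development may be re-evaluated more often than it would be in production.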