gooddata_pandas.dataframe.DataFrameFactory

class gooddata_pandas.dataframe.DataFrameFactory(sdk: GoodDataSdk, workspace_id: str)

Bases: object

Factory to create pandas.DataFrame instances.

There are several methods in place that should provide for convenient construction of data frames:

  • indexed() - calculate measure values sliced by one or more labels, indexed by those labels

  • not_indexed() - calculate measure values sliced by one or more labels, but not indexed by those labels; label values will be part of the DataFrame, in the same row as the measure values calculated for them

  • for_items() - calculate measure values for one or more items, which may be labels or measures. Depending on what items you specify, this method will create a DataFrame with or without an index

  • for_insight() - calculate a DataFrame for an insight created in GoodData.CN Analytical Designer. Depending on what items are in the insight, this method will create a DataFrame with or without an index.

Note that all of these methods have additional levels of convenience and flexibility so their purpose is not limited to just what is listed above.

__init__(sdk: GoodDataSdk, workspace_id: str) → None

Methods

__init__(sdk, workspace_id)

for_exec_def(exec_def[, label_overrides, ...])

Creates a data frame using an execution definition.

for_exec_result_id(result_id[, ...])

Creates a data frame using an execution result's metadata identified by result_id.

for_insight(insight_id[, auto_index])

Creates a data frame with columns based on the content of the insight with the provided identifier.

for_items(items[, filter_by, auto_index])

Creates a data frame for named items.

indexed(index_by, columns[, filter_by])

Creates a data frame indexed by values of the label.

not_indexed(columns[, filter_by])

Creates a data frame with columns created from metrics and/or labels.

result_cache_metadata_for_exec_result_id(...)

Retrieves result cache metadata for the given result_id.

for_exec_def(exec_def: ExecutionDefinition, label_overrides: Optional[Dict[str, Dict[str, Dict[str, str]]]] = None, result_size_dimensions_limits: Tuple[Optional[int], ...] = (), result_size_bytes_limit: Optional[int] = None) → Tuple[DataFrame, DataFrameMetadata]

Creates a data frame using an execution definition. The data frame will respect the dimensionality specified in execution definition’s result spec.

Each dimension may be sliced by multiple labels. The factory will create a MultiIndex for both the data frame's row index and its columns.

Example of label_overrides structure:

{
    "labels": {
        "local_attribute_id": {
            "title": "My new attribute label"
        },...
    },
    "metrics": {
        "local_metric_id": {
            "title": "My new metric label"
        },...
    }
}

Parameters:
  • exec_def – execution definition

  • label_overrides – label overrides for metrics and attributes

  • result_size_dimensions_limits – A tuple containing maximum size of result dimensions. Optional.

  • result_size_bytes_limit – Maximum size of result in bytes. Optional.

Returns:

tuple holding DataFrame and DataFrame metadata
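The label_overrides structure above can be assembled programmatically. The following is a minimal sketch only: build_label_overrides is a hypothetical helper, not part of gooddata_pandas, and the local identifiers are the placeholders used in the example above.

```python
from typing import Dict

# Shape documented for label_overrides: section -> local id -> {"title": ...}
LabelOverrides = Dict[str, Dict[str, Dict[str, str]]]


def build_label_overrides(
    attribute_titles: Dict[str, str],
    metric_titles: Dict[str, str],
) -> LabelOverrides:
    """Hypothetical helper: turn {local_id: new_title} maps into label_overrides."""
    return {
        "labels": {
            local_id: {"title": title}
            for local_id, title in attribute_titles.items()
        },
        "metrics": {
            local_id: {"title": title}
            for local_id, title in metric_titles.items()
        },
    }


overrides = build_label_overrides(
    attribute_titles={"local_attribute_id": "My new attribute label"},
    metric_titles={"local_metric_id": "My new metric label"},
)

# With a DataFrameFactory instance `frames` and an ExecutionDefinition
# `exec_def` (both assumed to exist), the call would then look like:
# df, df_metadata = frames.for_exec_def(exec_def, label_overrides=overrides)
```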

for_exec_result_id(result_id: str, label_overrides: Optional[Dict[str, Dict[str, Dict[str, str]]]] = None, result_cache_metadata: Optional[ResultCacheMetadata] = None, result_size_dimensions_limits: Tuple[Optional[int], ...] = (), result_size_bytes_limit: Optional[int] = None, use_local_ids_in_headers: bool = False) → Tuple[DataFrame, DataFrameMetadata]

Creates a data frame using an execution result’s metadata identified by result_id. The data frame will respect the dimensionality specified in execution definition’s result spec.

Each dimension may be sliced by multiple labels. The factory will create a MultiIndex for both the data frame's row index and its columns.

Example of label_overrides structure:

{
    "labels": {
        "local_attribute_id": {
            "title": "My new attribute label"
        },...
    },
    "metrics": {
        "local_metric_id": {
            "title": "My new metric label"
        },...
    }
}

Parameters:
  • result_id – executionResult ID from ExecutionResponse

  • label_overrides – label overrides for metrics and attributes

  • result_cache_metadata – Metadata for the corresponding exec result. Optional.

  • result_size_dimensions_limits – A tuple containing maximum size of result dimensions. Optional.

  • result_size_bytes_limit – Maximum size of result in bytes. Optional.

  • use_local_ids_in_headers – Use local identifiers of header attributes and metrics. Optional.

Returns:

tuple holding DataFrame and DataFrame metadata
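One possible way to combine this method with result_cache_metadata_for_exec_result_id() is sketched below. The wrapper frame_for_result and its max_bytes argument are hypothetical; the frames argument is assumed to be a DataFrameFactory and is typed as Any so the sketch does not depend on gooddata_pandas being installed.

```python
from typing import Any, Optional, Tuple


def frame_for_result(
    frames: Any,
    result_id: str,
    max_bytes: Optional[int] = None,
) -> Tuple[Any, Any]:
    """Hypothetical wrapper: load a data frame for an existing execution result.

    `frames` is assumed to be a DataFrameFactory(sdk, workspace_id) instance.
    """
    # Retrieve the cache metadata explicitly and pass it to the factory.
    cache_metadata = frames.result_cache_metadata_for_exec_result_id(result_id)
    return frames.for_exec_result_id(
        result_id=result_id,
        result_cache_metadata=cache_metadata,
        result_size_bytes_limit=max_bytes,
        # Use local identifiers of attributes and metrics in the headers.
        use_local_ids_in_headers=True,
    )
```

The return value is the same tuple of DataFrame and DataFrame metadata that for_exec_result_id() returns.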

for_insight(insight_id: str, auto_index: bool = True) → DataFrame

Creates a data frame with columns based on the content of the insight with the provided identifier. The filters that are set on the insight will be applied and used for the server-side computation of the data for the data frame.

This method will create DataFrame with or without index - depending on the contents of the insight. The rules are as follows:

  • if the insight contains both attributes and measures, it will be mapped to a DataFrame with index

    • if there are multiple attributes, a hierarchical index (pandas.MultiIndex) will be used

    • otherwise a normal index will be used (pandas.Index)

    • you can use the optional ‘auto_index’ argument to disable this logic and force no indexing

  • if the insight contains either only attributes or only measures, then the DataFrame will not be indexed and all attribute or measure values will be used as data.

Note that if the insight consists of a single measure only, the resulting data frame is guaranteed to have a single row of data with one column per measure.

Parameters:
  • insight_id – insight identifier

  • auto_index – set to False to force creation of a DataFrame without an index even if the data in the insight is eligible for indexing

Returns:

pandas dataframe instance
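The indexing rules above can be restated as a small decision function. This is an illustrative sketch of the documented rules, not the library's implementation; expected_index_kind is a hypothetical name.

```python
def expected_index_kind(
    attribute_count: int,
    measure_count: int,
    auto_index: bool = True,
) -> str:
    """Return which kind of index the documented rules lead to.

    Possible results: "none", "Index" or "MultiIndex".
    """
    # auto_index=False disables the logic and forces no indexing.
    if not auto_index:
        return "none"
    # Only attributes or only measures: everything is plain data.
    if attribute_count == 0 or measure_count == 0:
        return "none"
    # Both attributes and measures: index by the attributes.
    return "MultiIndex" if attribute_count > 1 else "Index"
```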

for_items(items: ColumnsDef, filter_by: Optional[Union[Filter, list[Filter]]] = None, auto_index: bool = True) → pandas.DataFrame

Creates a data frame for named items. This is a convenience method that will create a DataFrame with or without an index depending on the items that you pass.

  • If items contain labels and measures, then a DataFrame with an index will be created. If there is more than one label among the items, then a hierarchical index will be created.

    You can turn this behavior off using the ‘auto_index’ parameter.

  • Otherwise a DataFrame without an index will be created and will contain one column per item.

You may also optionally specify filters to apply during the computation on the server.

Parameters:
  • items

    dict mapping item name to its definition; item may be specified as:

    • object identifier: ObjId(id='some_id', type='<type>') - where type is either label, fact or metric

    • string representation of object identifier: <type>/some_id - where type is either label, fact or metric

    • Attribute object used in the compute model: Attribute(local_id=..., label='some_label_id')

    • subclass of Measure object used in the compute model: SimpleMeasure, PopDateMeasure, PopDatasetMeasure, ArithmeticMeasure

  • filter_by

    optional filters to apply during computation on the server; the reference to the filtering column can be one of:

    • string reference to item key

    • object identifier in string form

    • object identifier: ObjId(id='some_label_id', type='<type>')

    • Attribute or Metric depending on type of filter

Returns:

pandas dataframe instance
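As a sketch of the string form of object identifiers described above, the snippet below builds an items mapping. The helper obj_ref and the identifiers region and revenue are hypothetical and used for illustration only.

```python
VALID_TYPES = ("label", "fact", "metric")


def obj_ref(obj_type: str, obj_id: str) -> str:
    """Hypothetical helper: build the '<type>/some_id' string form."""
    if obj_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {obj_type!r}")
    return f"{obj_type}/{obj_id}"


# Item name -> item definition; 'region' and 'revenue' are made-up ids.
items = {
    "region": obj_ref("label", "region"),
    "revenue": obj_ref("metric", "revenue"),
}

# With a DataFrameFactory instance `frames` (assumed to exist):
# df = frames.for_items(items)                    # one label + one metric
# flat = frames.for_items(items, auto_index=False)  # force no index
```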

indexed(index_by: IndexDef, columns: ColumnsDef, filter_by: Optional[Union[Filter, list[Filter]]] = None) → pandas.DataFrame

Creates a data frame indexed by values of the label. The data frame columns will be created from either metrics or other label values.

The computation to obtain data from GoodData.CN workspace will use all labels that you specify for both indexing and in columns to aggregate values of metric columns.

Note that depending on composition of the labels, the DataFrame’s index may or may not be unique.

Parameters:
  • index_by

    one or more labels to index by; specify either:

    • string referencing a key in columns - only an attribute can be referenced

    • string with id: some_label_id

    • string representation of object identifier: label/some_label_id

    • object identifier: ObjId(id='some_label_id', type='label')

    • an Attribute object used in the compute model: Attribute(local_id=..., label='some_label_id')

    • dict mapping index name to the label to use for indexing - specified in one of the ways listed above

  • columns

    dict mapping column name to its definition; column may be specified as:

    • object identifier: ObjId(id='some_id', type='<type>') - where type is either label, fact or metric

    • string representation of object identifier: <type>/some_id - where type is either label, fact or metric

    • Attribute object used in the compute model: Attribute(local_id=..., label='some_label_id')

    • subclass of Measure object used in the compute model: SimpleMeasure, PopDateMeasure, PopDatasetMeasure, ArithmeticMeasure

  • filter_by

    optional filters to apply during computation on the server; the reference to the filtering column can be one of:

    • string reference to column key or index key

    • object identifier in string form

    • object identifier: ObjId(id='some_label_id', type='<type>')

    • Attribute or Metric depending on type of filter

Returns:

pandas dataframe instance
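A minimal usage sketch for indexed(), using the dict form of index_by and string identifiers in columns. The function revenue_by_region and the identifiers region, products.category and revenue are hypothetical; frames is assumed to be a DataFrameFactory.

```python
from typing import Any


def revenue_by_region(frames: Any) -> Any:
    """Hypothetical example: index by one label, with two columns.

    `frames` is assumed to be a DataFrameFactory(sdk, workspace_id) instance.
    """
    return frames.indexed(
        # Index name -> label used for indexing.
        index_by={"reg": "label/region"},
        # Column name -> definition; labels and metrics may be mixed.
        columns={
            "category": "label/products.category",
            "revenue": "metric/revenue",
        },
    )
```

Because both the index label and the category label take part in the aggregation, the resulting index may contain repeated values, as noted above.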

not_indexed(columns: ColumnsDef, filter_by: Optional[Union[Filter, list[Filter]]] = None) → pandas.DataFrame

Creates a data frame with columns created from metrics and/or labels.

The computation to obtain data from the GoodData.CN workspace will use all labels that you specify in columns to aggregate values of metric columns.

Parameters:
  • columns

    dict mapping column name to its definition; column may be specified as:

    • object identifier: ObjId(id='some_id', type='<type>') - where type is either label, fact or metric

    • string representation of object identifier: <type>/some_id - where type is either label, fact or metric

    • Attribute object used in the compute model: Attribute(local_id=..., label='some_label_id')

    • subclass of Measure object used in the compute model: SimpleMeasure, PopDateMeasure, PopDatasetMeasure, ArithmeticMeasure

  • filter_by

    optional filters to apply during computation on the server; the reference to the filtering column can be one of:

    • string reference to column key

    • object identifier in string form

    • object identifier: ObjId(id='some_label_id', type='<type>')

    • Attribute or Metric depending on type of filter

Returns:

pandas dataframe instance
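A short sketch of not_indexed() with an optional filter passed through. The function flat_revenue and the identifiers are hypothetical; frames is assumed to be a DataFrameFactory, and filters is assumed to be a filter object or a list of filter objects built by the caller.

```python
from typing import Any, Optional


def flat_revenue(frames: Any, filters: Optional[Any] = None) -> Any:
    """Hypothetical example: label values and metric values as plain columns."""
    columns = {
        "region": "label/region",
        "revenue": "metric/revenue",
    }
    # filter_by accepts a single filter or a list of filters; None means no filter.
    return frames.not_indexed(columns=columns, filter_by=filters)
```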

result_cache_metadata_for_exec_result_id(result_id: str) → ResultCacheMetadata

Retrieves result cache metadata for the given result_id.

Parameters:
  • result_id – ID of the execution result to retrieve the metadata for

Returns:

corresponding result cache metadata