Index

template<typename type_defs>
class Index : public sps::AbstractIndex

The main sparse prefix sum index class.

Template Parameters:

type_defs – An instance of TypeDefs, that defines all compiletime parameters

Public Functions

inline Index(std::string sPrefix = "", bool bWrite = false)

Construct a new Index object.

Parameters:
  • sPrefix – Prefix path of the index on the filesystem (multiple files with different endings will be created), defaults to “”.

  • bWrite – Open the index in write mode (if this is set to False no changes can be made to the index), defaults to false.

inline void clear()

Clear the complete index.

Clears all datasets and all points.

inline void clearCorners()

Clear the complete index.

Clears all datasets and all points.

inline void clearKeepPoints()

Clear everything but the points.

template<bool trigger = !IS_ORTHOTOPE>
inline std::enable_if_t<trigger> addPoint(ret_pos_t vPos, val_t uiVal = 1)

Append a point to the data structure.

The point will not be queryable until generate is called.

Template Parameters:

trigger – This function is only active if IS_ORTHOTOPE = false

Parameters:
  • vPos – The position of the point.

  • uiVal – The value of the point, defaults to 1.

template<bool trigger = IS_ORTHOTOPE>
inline std::enable_if_t<trigger> addPoint(ret_pos_t vStart, ret_pos_t vEnd, val_t uiVal = 1)

Append an orthotope to the data structure.

The orthotope will not be queryable until generate is called.

Template Parameters:

trigger – This function is only active if IS_ORTHOTOPE = true

Parameters:
  • vStart – The bottom left position of the orthotope.

  • vEnd – The top right position of the orthotope.

  • uiVal – The value of the orthotope, defaults to 1.

inline class_key_t generate(double fFac = -1, size_t uiVerbosity = 1)

Generate a new dataset.

This may take a long time to compute.

This function is multithreaded.

Parameters:
  • fFac – Overlay size factor, defaults to -1. -1: factor is picked so that the estimated size of the datastructure is minimal

  • uiVerbosity – Degree of verbosity while creating the dataset, defaults to 1.

Returns:

class_key_t The id of the generated dataset.

inline val_t countSizeLimited(class_key_t xDatasetId, pos_t vFrom, pos_t vTo, isect_arr_t vInterTypes, size_t = 0) const

Count the number of points between from and to in the given dataset.

As opposed to count, this function allows specifying the start and end positions for all dimensions in the datastructure. This is only relevant for indices with orthotope dimensions.

Parameters:
  • xDatasetId – The id of the dataset to query

  • vFrom – The bottom left position of the query region.

  • vTo – The top right position of the query region.

  • vInterTypes – The used intersection types, defaults to enclosed.

Returns:

val_t The number of points in dataset_id between from_pos and to_pos.

inline val_t count(class_key_t xDatasetId, ret_pos_t vFromR, ret_pos_t vToR, const isect_arr_t &vInterTypes, bool_arr_t vNoPoints = false, size_t uiVerbosity = 0) const

Count the number of points between from and to and in the given dataset.

to_pos must be larger equal than from_pos in each dimension.

Parameters:
  • xDatasetId – The id of the dataset to query

  • vFromR – The bottom left position of the query region.

  • vToR – The top right position of the query region.

  • vInterTypes – The used intersection types, defaults to enclosed. Ignored if there are no orthotope dimensions. Accepts one argument per dimension.

  • vNoPoints – If true, no points only hyper-rectangles are counted. Accepts one argument per dimension.

  • uiVerbosity – Degree of verbosity while counting, defaults to 0.

Returns:

val_t The number of points in dataset_id between from_pos and to_pos.

inline std::vector<val_t> gridCount(class_key_t xDatasetId, std::array<std::vector<coordinate_t>, D - ORTHOTOPE_DIMS> vGrid, const isect_arr_t &vInterTypes, const bool_arr_t &vNoPoints, size_t uiVerbosity = 0) const

Count the number of points between from and to and in the given dataset.

to_pos must be larger equal than from_pos in each dimension.

Parameters:
  • xDatasetId – The id of the dataset to query

  • vGrid – The coordinates of the grid-lines for each dimensions.

  • vInterTypes – The used intersection types, defaults to enclosed. Ignored if there are no orthotope dimensions.

  • vNoPoints – If true, no points only hyper-rectangles are counted. Accepts one argument per dimension.

  • uiVerbosity – Degree of verbosity while counting, defaults to 0.

Returns:

val_t the values of the grid cells.

inline std::vector<val_t> countMultiple(std::vector<std::tuple<class_key_t, ret_pos_t, ret_pos_t>> vRegions, IntersectionType xInterType = IntersectionType::enclosed, size_t uiVerbosity = 0) const

Count the number of points between from and to and in the given dataset.

Counts for multiple regions. to_pos must be larger equal than from_pos in each dimension.

Parameters:
  • vRegions – Id of the dataset to query, The bottom left and top right positions of the queried regions.

  • xInterType – The used intersection type, defaults to enclosed. Ignored if there are no orthotope dimensions.

  • uiVerbosity – Degree of verbosity while counting, defaults to 0.

Returns:

std::vector<val_t> The number of points in dataset_id between from_pos and to_pos for each given region.

inline coordinate_t getNumInternalPrefixSums(class_key_t xDatasetId) const

Count the number of internal prefix sums stored in the dataset with id dataset_id.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of internal prefix sums

inline coordinate_t getNumOverlayPrefixSums(class_key_t xDatasetId) const

Count the number of overlay prefix sums stored in the dataset with id dataset_id.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of overlay prefix sums

inline coordinate_t getNumPrefixSums(class_key_t xDatasetId) const

Count the number of prefix sums stored in the dataset with id dataset_id.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of prefix sums

inline coordinate_t totalNumPrefixSums() const

Count the total number of prefix sums in the entire index.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t total number of prefix sums

inline coordinate_t getNumChangingPrefixSums() const

Count the number of prefix sums that are different than their immediate predecessor entries in all dimensions.

Returns:

coordinate_t number of changing prefix sums

inline coordinate_t getNumInternalSparseCoords(class_key_t xDatasetId) const

Count the number of internal sparse coordinates stored in the dataset with id dataset_id.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of internal sparse coordinates

inline coordinate_t getNumOverlaySparseCoords(class_key_t xDatasetId) const

Count the number of overlay sparse coordinates stored in the dataset with id dataset_id.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of overlay sparse coordinates

inline coordinate_t getNumOverlays(class_key_t xDatasetId) const

Count the number of overlay blocks.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

coordinate_t number of overlay blocks

inline coordinate_t getSize(class_key_t xDatasetId) const

Returns the size of the dataset in bytes.

Parameters:

xDatasetId – id of the queried dataset

Returns:

coordinate_t size in bytes

inline std::vector<std::tuple<uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t>> estimateDataStructureElements(std::vector<double> vFac)

Predict the number of data structure elements stored in a dataset.

Here f is proportional to the number of boxes used in the data structure. For any dataset, there is an optimal value for f that leads to the minimal data structure size. Since the data structure size is proportional to the time required to build the datastructure this also minimizes construction time.

The purpose of this function is to find the optimal value for f.

Uses a statistical approach to predict the number of elements. For details see the corresponding github page or our manuscript.

There are five values predicted:

  • The number of internal prefix sums

  • The number of overlay prefix sums

  • The number of internal sparse coordinates

  • The number of overlay sparse coordinates

  • The total size in bytes

Parameters:

vFac – list of factors that are proportional to the number of boxes within the data structure

Returns:

std:vector<std::tuple<uint64_t, uint64_t, uint64_t, uint64_t, uint64_t, uint64_t>> The predicted number of dataset structure elements for each factor

inline pos_t gridSize()

Get the axis sizes of the inserted points.

Returns:

pos_t an array containing the axis size for each dimension.

inline uint64_t pickNumOverlays(size_t uiVerbosity = 0)

Predict the best f.

Here f is a factor proportional to the number of boxes used in the data structure. See estimate_num_elements for a detailed description.

Parameters:

uiVerbosity – Degree of verbosity while creating the dataset, defaults to 1.

Returns:

uint64_t The predicted best value for f

inline double toFactor(uint64_t uiNumOverlays)

Convert a given number of overlay blocks to the factor f.

Here f is a factor proportional to the number of boxes used in the data structure. See estimate_num_elements for a detailed description.

Parameters:

uiNumOverlays – number of overlay blocks

Returns:

double f

inline std::string str() const

Return a string describing the index.

Very slow for large datasets.

inline std::vector<typename dataset_t::OverlayInfo> getOverlayInfo(class_key_t xDatasetId) const

Return information about the overlay boxes.

Parameters:

xDatasetId – The id of the dataset to query

Returns:

std::vector<typename dataset_t::OverlayInfo> information about all overlays

inline std::vector<std::array<pos_t, 3>> getOverlayGrid(class_key_t xDatasetId) const

Returns the bottom-left-front-… and top-right-back-… position of all overlays.

using sps::TypeDefs::coordinate_t = _coordinate_t

individual coordinate position

using sps::TypeDefs::val_t = _val_t

prefix sum value

using sps::TypeDefs::pos_t = std::array<coordinate_t, D>

position of a point (with dependent dimensions; used internally)

using sps::TypeDefs::class_key_t = _class_key_t

key of a dataset

using sps::TypeDefs::ret_pos_t = std::array<coordinate_t, D - ORTHOTOPE_DIMS>

position of a point