Logging Apache Geode PartitionedRegion Entry Details Per Bucket

Barry Oglesby
Sep 15, 2020

Introduction

An Apache Geode PartitionedRegion is a Region that partitions its entries among all the servers that define it. The entries are stored in BucketRegions. Properties that affect the number and location of the BucketRegions include the total number of buckets and the number of redundant copies. The primary BucketRegion is hosted on one server, and if the number of redundant copies is greater than zero, the redundant BucketRegions are hosted on other servers. In addition, if eviction with overflow is configured, entry values are evicted to disk once the JVM’s used heap memory reaches a configured percentage of maximum.

Bucket and entry details available for the entire PartitionedRegion per member or across the DistributedSystem include:

  • the number of buckets
  • the number of primary buckets
  • the number of bucket entries
  • the number of bucket bytes
  • the number of bucket entries in memory
  • the number of bucket entries on disk
  • the number of bucket bytes in memory
  • the number of bucket bytes on disk

This article shows how to get similar details for each BucketRegion in the PartitionedRegion.

Accessing Entry Details for the PartitionedRegion

Currently, the details listed above are available for the PartitionedRegion via either Statistics or gfsh.

Via Statistics

Bucket and entry details are provided by PartitionedRegionStats and DiskRegionStatistics.

A PartitionedRegionStats object is defined for each PartitionedRegion and includes:

  • bucketCount — the number of buckets defined in the member
  • primaryBucketCount — the number of primary buckets defined in the member
  • dataStoreEntryCount — the number of entries in the member
  • dataStoreBytesInUse — the number of bytes in memory in the member

A DiskRegionStatistics object is defined for each persistent or overflowed PartitionedRegion and includes:

  • entriesInVM — the number of entries in memory
  • entriesOnlyOnDisk — the number of entries on disk
  • bytesOnlyOnDisk — the size of the entries on disk

In vsd (Visual Statistics Display), these statistics can be charted together:

  • PartitionedRegionStats bucketCount and primaryBucketCount
  • PartitionedRegionStats dataStoreBytesInUse and dataStoreEntryCount
  • DiskRegionStatistics bytesOnlyOnDisk, entriesInVM and entriesOnlyOnDisk

Via gfsh

The gfsh show metrics command lists PartitionedRegion bucket count and entry size per member or across the DistributedSystem.

Per member:

gfsh>show metrics --region=/Trade --member=server-1

Category  | Metric                 | Value
--------- | ---------------------- | ------
partition | bucketCount            | 75
          | primaryBucketCount     | 37
          | totalBucketSize        | 331869
diskstore | totalEntriesOnlyOnDisk | 134643

gfsh>show metrics --region=/Trade --member=server-2

Category  | Metric                 | Value
--------- | ---------------------- | ------
partition | bucketCount            | 75
          | primaryBucketCount     | 38
          | totalBucketSize        | 331862
diskstore | totalEntriesOnlyOnDisk | 138806

gfsh>show metrics --region=/Trade --member=server-3

Category  | Metric                 | Value
--------- | ---------------------- | ------
partition | bucketCount            | 76
          | primaryBucketCount     | 38
          | totalBucketSize        | 336269
diskstore | totalEntriesOnlyOnDisk | 141941

Across the Distributed System:

gfsh>show metrics --region=/Trade

Category  | Metric                 | Value
--------- | ---------------------- | -------
partition | bucketCount            | 226
          | primaryBucketCount     | 113
          | totalBucketSize        | 1000000
          | averageBucketSize      | 1474
diskstore | totalEntriesOnlyOnDisk | 415390
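
As a quick sanity check on the output above, the cluster-wide bucketCount, primaryBucketCount, totalBucketSize and totalEntriesOnlyOnDisk values match the sums of the three per-member values. The short Java sketch below only repeats that arithmetic; the class and method names are illustrative and not part of Geode. averageBucketSize is left out because no per-member value is shown for it.

```java
import java.util.Arrays;

// Worked check: sums the per-member `show metrics` values for server-1,
// server-2 and server-3 so they can be compared with the cluster-wide output.
class ShowMetricsTotalsCheck {

  // Sums one metric across the members.
  static long sum(long... perMember) {
    return Arrays.stream(perMember).sum();
  }

  public static void main(String[] args) {
    System.out.println("bucketCount            = " + sum(75, 75, 76));             // 226
    System.out.println("primaryBucketCount     = " + sum(37, 38, 38));             // 113
    System.out.println("totalBucketSize        = " + sum(331869, 331862, 336269)); // 1000000
    System.out.println("totalEntriesOnlyOnDisk = " + sum(134643, 138806, 141941)); // 415390
  }
}
```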

Implementation

All source code described in this article as well as an example usage is available here.

To get the bucket and entry details per BucketRegion, it is necessary to iterate over the BucketRegions in each member. The LogPartitionedRegionBucketDetailsFunction and its supporting objects do that.

The LogPartitionedRegionBucketDetailsFunction:

  • gets the PartitionedRegion
  • creates and logs a PartitionedRegionDetails

The PartitionedRegionDetails keeps track of the total number of:

  • primary and redundant buckets
  • bytes in memory and on disk
  • entries in memory and on disk

It iterates the BucketRegions and for each:

  • creates a BucketRegionDetails
  • adds the BucketRegionDetails to the appropriate list (primary or redundant)
  • increments the appropriate entry totals

Each BucketRegionDetails encapsulates:

  • the number of bucket entries
  • the number of bucket entries in memory
  • the number of bucket entries on disk
  • the number of bucket bytes
  • the number of bucket bytes in memory
  • the number of bucket bytes on disk

Execute LogPartitionedRegionBucketDetailsFunction

The LogPartitionedRegionBucketDetailsFunction execute method gets the PartitionedRegion, creates a PartitionedRegionDetails for it, and logs the result.

Initialize PartitionedRegionDetails

The PartitionedRegionDetails is initialized by sorting the BucketRegions by bucket id and, for each one, updating the appropriate totals.

The initializeTotalDetails, initializeTotalPrimaryDetails and initializeTotalRedundantDetails methods increment the appropriate entry and bytes totals.
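
The initialization described above could look roughly like the following. This is a minimal sketch and not the article's actual source: BucketSnapshot, Totals and RegionTotalsSketch are hypothetical stand-ins so the example runs without Geode on the classpath, and a single Totals.add method stands in for the three initializeTotal... methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the values read from one BucketRegion (not a Geode class).
class BucketSnapshot {
  final int bucketId;
  final boolean primary;
  final long entriesInMemory;
  final long entriesOnDisk;
  final long bytesInMemory;
  final long bytesOnDisk;

  BucketSnapshot(int bucketId, boolean primary, long entriesInMemory,
      long entriesOnDisk, long bytesInMemory, long bytesOnDisk) {
    this.bucketId = bucketId;
    this.primary = primary;
    this.entriesInMemory = entriesInMemory;
    this.entriesOnDisk = entriesOnDisk;
    this.bytesInMemory = bytesInMemory;
    this.bytesOnDisk = bytesOnDisk;
  }
}

// Running totals for one group of buckets: all, primary only, or redundant only.
class Totals {
  int buckets;
  long entriesInMemory;
  long entriesOnDisk;
  long bytesInMemory;
  long bytesOnDisk;

  void add(BucketSnapshot b) {
    buckets++;
    entriesInMemory += b.entriesInMemory;
    entriesOnDisk += b.entriesOnDisk;
    bytesInMemory += b.bytesInMemory;
    bytesOnDisk += b.bytesOnDisk;
  }

  long entries() {
    return entriesInMemory + entriesOnDisk;
  }

  long bytes() {
    return bytesInMemory + bytesOnDisk;
  }
}

class RegionTotalsSketch {
  final List<BucketSnapshot> primaries = new ArrayList<>();
  final List<BucketSnapshot> redundants = new ArrayList<>();
  final Totals total = new Totals();
  final Totals totalPrimary = new Totals();
  final Totals totalRedundant = new Totals();

  RegionTotalsSketch(List<BucketSnapshot> buckets) {
    // Sort a copy by bucket id so the per-bucket lists come out in a stable order.
    List<BucketSnapshot> sorted = new ArrayList<>(buckets);
    sorted.sort(Comparator.comparingInt(b -> b.bucketId));
    for (BucketSnapshot b : sorted) {
      total.add(b);
      if (b.primary) {
        primaries.add(b);
        totalPrimary.add(b);
      } else {
        redundants.add(b);
        totalRedundant.add(b);
      }
    }
  }

  public static void main(String[] args) {
    RegionTotalsSketch details = new RegionTotalsSketch(Arrays.asList(
        new BucketSnapshot(2, false, 5, 1, 500, 100),
        new BucketSnapshot(0, true, 10, 2, 1000, 200),
        new BucketSnapshot(1, true, 3, 0, 300, 0)));
    System.out.println("buckets=" + details.total.buckets
        + " primaryEntries=" + details.totalPrimary.entries()
        + " redundantBytes=" + details.totalRedundant.bytes());
  }
}
```

Keeping separate totals for all, primary and redundant buckets means each group can be logged without walking the bucket lists again.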

Initialize BucketRegionDetails

The BucketRegionDetails is initialized from fields of the BucketRegion: its entry counts and byte counts, both in memory and on disk.
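
A per-bucket details object along these lines might be sketched as below. The BucketView interface is a hypothetical stand-in for whatever accessors the real BucketRegion exposes, FixedBucket exists only to drive the demo, and the toString format is illustrative rather than the article's log format. The sketch also assumes that a bucket's total entries and bytes are the in-memory and on-disk values added together.

```java
// Hypothetical read-only view of one bucket (an assumption, not the Geode API).
interface BucketView {
  int id();
  boolean isPrimary();
  long entriesInMemory();
  long entriesOnDisk();
  long bytesInMemory();
  long bytesOnDisk();
}

// Immutable snapshot of one bucket's entry and byte counts.
class BucketDetailsSketch {
  final int bucketId;
  final boolean primary;
  final long entriesInMemory;
  final long entriesOnDisk;
  final long bytesInMemory;
  final long bytesOnDisk;
  final long entries;
  final long bytes;

  BucketDetailsSketch(BucketView bucket) {
    // Read each value once so the snapshot does not change after construction.
    this.bucketId = bucket.id();
    this.primary = bucket.isPrimary();
    this.entriesInMemory = bucket.entriesInMemory();
    this.entriesOnDisk = bucket.entriesOnDisk();
    this.bytesInMemory = bucket.bytesInMemory();
    this.bytesOnDisk = bucket.bytesOnDisk();
    // Assumption: totals are the in-memory and on-disk values combined.
    this.entries = this.entriesInMemory + this.entriesOnDisk;
    this.bytes = this.bytesInMemory + this.bytesOnDisk;
  }

  @Override
  public String toString() {
    return "bucketId=" + bucketId + ", primary=" + primary
        + ", entries=" + entries + ", entriesInMemory=" + entriesInMemory
        + ", entriesOnDisk=" + entriesOnDisk + ", bytes=" + bytes
        + ", bytesInMemory=" + bytesInMemory + ", bytesOnDisk=" + bytesOnDisk;
  }

  public static void main(String[] args) {
    System.out.println(new BucketDetailsSketch(new FixedBucket(7, true, 10, 4, 1000, 400)));
  }
}

// Fixed-value bucket used only to exercise the sketch.
class FixedBucket implements BucketView {
  private final int id;
  private final boolean primary;
  private final long entriesInMemory;
  private final long entriesOnDisk;
  private final long bytesInMemory;
  private final long bytesOnDisk;

  FixedBucket(int id, boolean primary, long entriesInMemory, long entriesOnDisk,
      long bytesInMemory, long bytesOnDisk) {
    this.id = id;
    this.primary = primary;
    this.entriesInMemory = entriesInMemory;
    this.entriesOnDisk = entriesOnDisk;
    this.bytesInMemory = bytesInMemory;
    this.bytesOnDisk = bytesOnDisk;
  }

  public int id() { return id; }
  public boolean isPrimary() { return primary; }
  public long entriesInMemory() { return entriesInMemory; }
  public long entriesOnDisk() { return entriesOnDisk; }
  public long bytesInMemory() { return bytesInMemory; }
  public long bytesOnDisk() { return bytesOnDisk; }
}
```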

Server Logging Output

Executing the LogPartitionedRegionBucketDetailsFunction causes each server to log a message with the details for its total, primary and redundant buckets.

Future

A gfsh command and an API that provide these PartitionedRegion bucket details would be a useful addition to Apache Geode.
