Logging Apache Geode PartitionedRegion Primary and Secondary Bucket Locations

Sep 12, 2021

Introduction

An Apache Geode PartitionedRegion partitions its entries into buckets and distributes those buckets among all the servers where the region is defined. The properties that affect the number and location of the buckets include total-num-buckets and redundant-copies. The total-num-buckets property sets the number of buckets across all the members of the DistributedSystem. The redundant-copies property sets the number of redundant (secondary) copies of each bucket in addition to the primary copy. The primary bucket is hosted on one server, and if redundant-copies is greater than zero, the secondary buckets are hosted on other servers.
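Because every bucket has one primary copy plus redundant-copies secondary copies, the cluster as a whole hosts total-num-buckets × (redundant-copies + 1) bucket copies. The small sketch below just does that arithmetic. The class and method names are hypothetical (not part of Geode), 113 is Geode's default total-num-buckets, and the four-server cluster and perfectly even spread are assumptions for the example.

```java
/**
 * Back-of-the-envelope bucket arithmetic for a PartitionedRegion.
 * Class and method names are hypothetical; they are not part of the Geode API.
 */
final class BucketMath {

    /** One primary copy plus redundantCopies secondary copies per bucket. */
    static int totalBucketCopies(int totalNumBuckets, int redundantCopies) {
        return totalNumBuckets * (redundantCopies + 1);
    }

    /** Average copies per server, assuming the copies are spread perfectly evenly. */
    static double averageCopiesPerServer(int totalNumBuckets, int redundantCopies, int servers) {
        return (double) totalBucketCopies(totalNumBuckets, redundantCopies) / servers;
    }

    public static void main(String[] args) {
        int totalNumBuckets = 113; // Geode's default total-num-buckets
        int redundantCopies = 1;   // one secondary copy of each bucket
        int servers = 4;           // assumed cluster size for the example

        System.out.println(totalBucketCopies(totalNumBuckets, redundantCopies));                // prints 226
        System.out.println(averageCopiesPerServer(totalNumBuckets, redundantCopies, servers)); // prints 56.5
    }
}
```

In practice the per-server count only approximates this average, since the buckets are not guaranteed to be spread exactly evenly.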

In addition, the redundancy-zone property helps determine where buckets are located. If two redundancy zones are defined and redundant-copies is one (meaning two copies of each bucket), then the primary copy of each bucket will be hosted by a member in one zone, and the secondary copy will be hosted by a member in the other zone.
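The zone rule can be read as an invariant: no two copies of the same bucket should share a redundancy zone. The hypothetical check below tests that invariant over plain Java maps (bucket id to hosting members, member to zone). The member names, zone names and the check itself are illustrative assumptions, not Geode APIs; in a real cluster the maps would have to be gathered from the servers, which is what the rest of this article does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical check: are all copies of each bucket in different redundancy zones?
 */
final class ZoneCheck {

    /** Returns the ids of buckets that have two or more copies in the same zone, ascending. */
    static List<Integer> bucketsSharingZone(Map<Integer, List<String>> membersPerBucket,
                                            Map<String, String> zonePerMember) {
        List<Integer> violations = new ArrayList<>();
        // Iterate over a TreeMap copy so violations come out sorted by bucket id
        for (Map.Entry<Integer, List<String>> e : new TreeMap<>(membersPerBucket).entrySet()) {
            Set<String> zonesSeen = new HashSet<>();
            for (String member : e.getValue()) {
                // Set.add returns false if this zone already holds a copy of the bucket
                if (!zonesSeen.add(zonePerMember.get(member))) {
                    violations.add(e.getKey());
                    break;
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<String, String> zones = Map.of(
                "server-1", "zoneA", "server-2", "zoneA",
                "server-3", "zoneB", "server-4", "zoneB");
        Map<Integer, List<String>> copies = Map.of(
                0, List.of("server-1", "server-3"),  // primary in zoneA, secondary in zoneB
                1, List.of("server-4", "server-2"),  // primary in zoneB, secondary in zoneA
                2, List.of("server-1", "server-2")); // both copies in zoneA
        System.out.println(bucketsSharingZone(copies, zones)); // prints [2]
    }
}
```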

This article is a companion to my Logging Apache Geode PartitionedRegion Entry Details Per Bucket article. It provides an example of a compact view of the primary and secondary bucket locations per server and redundancy zone.

Implementation

All the source code described in this article, along with an example of its usage, is available here.

The GetBucketIdsFunction is executed on each server. It:

gets the PartitionedRegion for the input region name
gets the member’s redundancy zone
gets the configured number of buckets for the PartitionedRegion
gets the list of local bucket ids for the PartitionedRegion
gets the list of local primary bucket ids for the PartitionedRegion
creates and returns a ServerBucketIds object containing these values

The GetBucketIdsResultCollector created on the client combines the ServerBucketIds objects returned by all the servers into a single AllBucketIds object.

The AllBucketIds object contains:

all bucket ids per server
primary bucket ids per server
all bucket ids per redundancy zone
primary bucket ids per redundancy zone
total number of bucket ids
total number of primary bucket ids
missing bucket ids per redundancy zone
extra bucket ids per redundancy zone
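The article does not spell out how missing and extra bucket ids are defined. One plausible reading, assumed here, is that each redundancy zone should hold exactly one copy of every bucket: a bucket id is missing from a zone if no server in that zone hosts it, and extra if more than one server in that zone hosts it. The hypothetical sketch below implements that interpretation for the list of bucket ids reported by all the servers in one zone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical interpretation of "missing" and "extra" bucket ids for one redundancy zone,
 * assuming each zone is expected to hold exactly one copy of every bucket.
 */
final class ZoneAudit {

    /** Bucket ids in [0, configuredNumBuckets) that no server in the zone hosts. */
    static List<Integer> missing(List<Integer> zoneBucketIds, int configuredNumBuckets) {
        Map<Integer, Integer> counts = count(zoneBucketIds);
        List<Integer> result = new ArrayList<>();
        for (int id = 0; id < configuredNumBuckets; id++) {
            if (!counts.containsKey(id)) {
                result.add(id);
            }
        }
        return result;
    }

    /** Bucket ids hosted more than once within the zone, in ascending order. */
    static List<Integer> extra(List<Integer> zoneBucketIds) {
        List<Integer> result = new ArrayList<>();
        count(zoneBucketIds).forEach((id, n) -> {
            if (n > 1) {
                result.add(id);
            }
        });
        return result;
    }

    /** Occurrences of each bucket id, keyed in ascending bucket id order. */
    private static Map<Integer, Integer> count(List<Integer> ids) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Integer id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bucket ids reported by all servers in one zone, for a region with 6 buckets
        List<Integer> zoneA = List.of(0, 1, 3, 3, 5);
        System.out.println("missing=" + missing(zoneA, 6) + " extra=" + extra(zoneA));
        // prints missing=[2, 4] extra=[3]
    }
}
```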

Execute the GetBucketIdsFunction

The GetBucketIdsFunction execute method first gets the PartitionedRegion. The PartitionedRegion provides the configured number of buckets. Its PartitionedRegionDataStore provides the local bucket ids and the local primary bucket ids. The redundancy zone is retrieved from the DistributionConfig. Finally, the Function creates and returns the ServerBucketIds object.
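The per-server step can be sketched without a running cluster by hiding the Geode objects behind an interface. In the sketch below, LocalBucketView is a hypothetical stand-in for the PartitionedRegion (configured number of buckets), its PartitionedRegionDataStore (local and local primary bucket ids) and the DistributionConfig (redundancy zone). The ServerBucketIds fields shown here are likewise assumptions, not the article's actual class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch of the per-server step. LocalBucketView is a hypothetical stand-in for the
 * PartitionedRegion, its PartitionedRegionDataStore and the DistributionConfig.
 */
final class ServerSide {

    interface LocalBucketView {
        String redundancyZone();               // from the DistributionConfig
        int configuredNumBuckets();            // from the PartitionedRegion
        Set<Integer> localBucketIds();         // from the PartitionedRegionDataStore
        List<Integer> localPrimaryBucketIds(); // from the PartitionedRegionDataStore
    }

    /** Hypothetical result holder; a real one must be serializable to reach the client. */
    static final class ServerBucketIds {
        final String memberName;
        final String regionName;
        final String redundancyZone;
        final int configuredNumBuckets;
        final List<Integer> bucketIds;
        final List<Integer> primaryBucketIds;

        ServerBucketIds(String memberName, String regionName, String redundancyZone,
                        int configuredNumBuckets, List<Integer> bucketIds, List<Integer> primaryBucketIds) {
            this.memberName = memberName;
            this.regionName = regionName;
            this.redundancyZone = redundancyZone;
            this.configuredNumBuckets = configuredNumBuckets;
            this.bucketIds = bucketIds;
            this.primaryBucketIds = primaryBucketIds;
        }

        /** Secondary buckets are the local buckets for which this server is not primary. */
        List<Integer> secondaryBucketIds() {
            Set<Integer> secondaries = new TreeSet<>(bucketIds); // TreeSet keeps ids sorted
            secondaries.removeAll(primaryBucketIds);
            return new ArrayList<>(secondaries);
        }
    }

    /** Gathers the values listed above into one result object (ids left unsorted). */
    static ServerBucketIds execute(String memberName, String regionName, LocalBucketView view) {
        // Copy the collections so the result does not share state with the region
        return new ServerBucketIds(memberName, regionName, view.redundancyZone(),
                view.configuredNumBuckets(),
                new ArrayList<>(view.localBucketIds()),
                new ArrayList<>(view.localPrimaryBucketIds()));
    }

    public static void main(String[] args) {
        LocalBucketView view = new LocalBucketView() {
            public String redundancyZone() { return "zoneA"; }
            public int configuredNumBuckets() { return 8; }
            public Set<Integer> localBucketIds() { return Set.of(0, 2, 3, 5); }
            public List<Integer> localPrimaryBucketIds() { return List.of(0, 3); }
        };
        ServerBucketIds ids = execute("server-1", "Trade", view); // "Trade" is an example region
        System.out.println(ids.secondaryBucketIds()); // prints [2, 5]
    }
}
```

In an actual Geode Function, the object would be sent back to the client through the FunctionContext's result sender (for example, context.getResultSender().lastResult(...)).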

Process the ServerBucketIds Result

The GetBucketIdsResultCollector addResult method is called on the client each time a ServerBucketIds result is received from a server. It passes that ServerBucketIds object to the AllBucketIds process method.

The AllBucketIds process method sets the region name, the configured number of buckets and each server's redundancy zone. It also updates the bucket ids per server and per redundancy zone.

The AllBucketIds updateBucketsPerServer method:

sorts each server’s bucket ids and primary bucket ids
updates all bucket ids per server and primary bucket ids per server
increments the total number of bucket ids and primary bucket ids
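A minimal sketch of the updateBucketsPerServer steps follows. It assumes results from different servers may be added on different threads (hence the synchronized methods) and that each server reports exactly once. The class, its fields and the string-keyed maps are hypothetical simplifications of AllBucketIds, not its real structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical per-server bookkeeping modeled on the updateBucketsPerServer steps.
 */
final class PerServerBuckets {

    private final Map<String, List<Integer>> bucketIdsPerServer = new TreeMap<>();
    private final Map<String, List<Integer>> primaryBucketIdsPerServer = new TreeMap<>();
    private int totalBucketIds;
    private int totalPrimaryBucketIds;

    /** Assumes each server reports once; a repeat report would be double counted. */
    synchronized void updateBucketsPerServer(String server, List<Integer> bucketIds,
                                             List<Integer> primaryBucketIds) {
        // Sort copies so the caller's lists are left untouched
        List<Integer> sortedBuckets = new ArrayList<>(bucketIds);
        Collections.sort(sortedBuckets);
        List<Integer> sortedPrimaries = new ArrayList<>(primaryBucketIds);
        Collections.sort(sortedPrimaries);

        bucketIdsPerServer.put(server, Collections.unmodifiableList(sortedBuckets));
        primaryBucketIdsPerServer.put(server, Collections.unmodifiableList(sortedPrimaries));
        totalBucketIds += sortedBuckets.size();
        totalPrimaryBucketIds += sortedPrimaries.size();
    }

    synchronized int totalBucketIds() { return totalBucketIds; }

    synchronized int totalPrimaryBucketIds() { return totalPrimaryBucketIds; }

    synchronized List<Integer> bucketIds(String server) {
        return bucketIdsPerServer.getOrDefault(server, List.of());
    }

    synchronized List<Integer> primaryBucketIds(String server) {
        return primaryBucketIdsPerServer.getOrDefault(server, List.of());
    }

    public static void main(String[] args) {
        // 4 buckets, redundant-copies=1, 2 servers: each server hosts every bucket
        PerServerBuckets buckets = new PerServerBuckets();
        buckets.updateBucketsPerServer("server-1", List.of(3, 0, 2, 1), List.of(2, 0));
        buckets.updateBucketsPerServer("server-2", List.of(1, 3, 0, 2), List.of(3, 1));
        System.out.println(buckets.bucketIds("server-1")); // prints [0, 1, 2, 3]
        System.out.println(buckets.totalBucketIds() + " / " + buckets.totalPrimaryBucketIds()); // prints 8 / 4
    }
}
```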

The AllBucketIds updateBucketsPerRedundancyZone method:

gets the server’s redundancy zone
adds the server’s bucket ids to the redundancy zone bucket ids
adds the server’s primary bucket ids to the redundancy zone primary bucket ids
sorts the redundancy zone bucket ids and primary bucket ids
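The updateBucketsPerRedundancyZone steps can be sketched the same way. In this hypothetical version, the server-to-zone mapping recorded by the process step is modeled as a simple map, and a server with no recorded zone is treated as an error; the names and that error policy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * Hypothetical per-redundancy-zone bookkeeping modeled on the
 * updateBucketsPerRedundancyZone steps.
 */
final class PerZoneBuckets {

    private final Map<String, String> zonePerServer = new TreeMap<>();
    private final Map<String, List<Integer>> bucketIdsPerZone = new TreeMap<>();
    private final Map<String, List<Integer>> primaryBucketIdsPerZone = new TreeMap<>();

    /** Records a server's zone (done by the process step in the article). */
    synchronized void setRedundancyZone(String server, String zone) {
        zonePerServer.put(server, zone);
    }

    synchronized void updateBucketsPerRedundancyZone(String server, List<Integer> bucketIds,
                                                     List<Integer> primaryBucketIds) {
        // 1. Get the server's redundancy zone (fail fast if it was never recorded)
        String zone = Objects.requireNonNull(zonePerServer.get(server),
                "no redundancy zone recorded for " + server);
        List<Integer> zoneBuckets = bucketIdsPerZone.computeIfAbsent(zone, z -> new ArrayList<>());
        List<Integer> zonePrimaries = primaryBucketIdsPerZone.computeIfAbsent(zone, z -> new ArrayList<>());
        // 2-3. Add the server's bucket ids and primary bucket ids to the zone's lists
        zoneBuckets.addAll(bucketIds);
        zonePrimaries.addAll(primaryBucketIds);
        // 4. Keep the zone's lists sorted
        Collections.sort(zoneBuckets);
        Collections.sort(zonePrimaries);
    }

    synchronized List<Integer> bucketIds(String zone) {
        return List.copyOf(bucketIdsPerZone.getOrDefault(zone, List.of()));
    }

    synchronized List<Integer> primaryBucketIds(String zone) {
        return List.copyOf(primaryBucketIdsPerZone.getOrDefault(zone, List.of()));
    }

    public static void main(String[] args) {
        PerZoneBuckets zones = new PerZoneBuckets();
        zones.setRedundancyZone("server-1", "zoneA");
        zones.setRedundancyZone("server-2", "zoneB");
        zones.setRedundancyZone("server-3", "zoneA");
        zones.updateBucketsPerRedundancyZone("server-1", List.of(2, 0), List.of(0));
        zones.updateBucketsPerRedundancyZone("server-2", List.of(3, 1, 0, 2), List.of(3, 2));
        zones.updateBucketsPerRedundancyZone("server-3", List.of(3, 1), List.of(1));
        System.out.println(zones.bucketIds("zoneA"));        // prints [0, 1, 2, 3]
        System.out.println(zones.primaryBucketIds("zoneA")); // prints [0, 1]
    }
}
```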

Display the Results

The AllBucketIds getDisplayString method builds the message containing the primary and secondary bucket locations per server and redundancy zone.
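The exact layout produced by getDisplayString is not reproduced here. As an illustration only, the hypothetical method below builds a compact message with one line per server followed by one line per redundancy zone. Secondary bucket ids are derived as a server's bucket ids minus its primary bucket ids, and servers with no recorded zone are shown under the placeholder "(none)"; the line format and the placeholder are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical compact display: one line per server, then one line per redundancy zone.
 * The layout is illustrative, not the original AllBucketIds output format.
 */
final class BucketDisplay {

    static String displayString(String regionName,
                                Map<String, String> zonePerServer,
                                Map<String, List<Integer>> bucketIdsPerServer,
                                Map<String, List<Integer>> primaryIdsPerServer) {
        StringBuilder sb = new StringBuilder();
        sb.append("Region ").append(regionName).append('\n');
        Map<String, List<Integer>> primariesPerZone = new TreeMap<>();
        Map<String, List<Integer>> secondariesPerZone = new TreeMap<>();

        // One line per server, in server-name order
        for (String server : new TreeSet<>(bucketIdsPerServer.keySet())) {
            String zone = zonePerServer.getOrDefault(server, "(none)");
            List<Integer> primaries = sorted(primaryIdsPerServer.getOrDefault(server, List.of()));
            List<Integer> secondaries = sorted(bucketIdsPerServer.get(server));
            secondaries.removeAll(primaries); // local buckets that are not primary here
            sb.append("  ").append(server).append(" [").append(zone).append("] primary=")
              .append(primaries).append(" secondary=").append(secondaries).append('\n');
            primariesPerZone.computeIfAbsent(zone, z -> new ArrayList<>()).addAll(primaries);
            secondariesPerZone.computeIfAbsent(zone, z -> new ArrayList<>()).addAll(secondaries);
        }

        // One line per redundancy zone, in zone-name order
        for (String zone : primariesPerZone.keySet()) {
            sb.append("  zone ").append(zone)
              .append(" primary=").append(sorted(primariesPerZone.get(zone)))
              .append(" secondary=").append(sorted(secondariesPerZone.get(zone))).append('\n');
        }
        return sb.toString();
    }

    /** Returns a sorted, mutable copy of the given ids. */
    private static List<Integer> sorted(List<Integer> ids) {
        List<Integer> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> zones = Map.of("server-1", "zoneA", "server-2", "zoneB");
        Map<String, List<Integer>> all = Map.of(
                "server-1", List.of(2, 0, 1, 3), "server-2", List.of(0, 1, 2, 3));
        Map<String, List<Integer>> primaries = Map.of(
                "server-1", List.of(2, 0), "server-2", List.of(3, 1));
        System.out.print(displayString("Trade", zones, all, primaries)); // "Trade" is an example region
    }
}
```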

Client Logging Output

Executing the GetBucketIdsFunction causes the client to log a message showing the primary and secondary bucket locations per server and redundancy zone.

Future

A gfsh command and Function that provide PartitionedRegion primary and secondary bucket locations per server and redundancy zone, like the ones in this example, would be a useful addition to Apache Geode.