Using The Connector

The connector is included with Pivotal GemFire, and its JAR file is automatically included on the classpath.

To use the connector, specify configuration details either in gfsh commands or in a cache.xml file. Do not mix gfsh configuration with cache.xml configuration.

To do an explicit mapping of fields, or to map only a subset of the fields, specify all configuration in a cache.xml file.

Specification with gfsh

You can use gfsh to configure all aspects of the data transfer and the mapping, as follows:

  • If domain objects are not on the classpath, configure PDX serialization with the GemFire configure pdx command after starting locators, but before starting servers. For example:

    gfsh>configure pdx --read-serialized=true \

  • After starting servers, use the GemFire create jndi-binding command to specify all aspects of the data source. For example:

    gfsh>create jndi-binding --name=datasource --type=SIMPLE \
      --jdbc-driver-class="org.postgresql.Driver" \
      --username="g2c_user" --password="changeme" \

  • After creating regions, set up the gpfdist protocol with the configure gpfdist-protocol command. For example:

    gfsh>configure gpfdist-protocol --port=8000

  • Specify the mapping of the GPDB table to the GemFire region with the create gpdb-mapping command. For example:

    gfsh>create gpdb-mapping --region=/Child --data-source=datasource \
      --pdx-name="io.pivotal.gemfire.demo.entity.Child" --table=child --id=id,parent_id
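When the mapping step is scripted, it can help to assemble the command from its parts. The following Java sketch rebuilds the create gpdb-mapping example above; the class and method names are hypothetical, and it emits only the options that appear in that example. It shows how a composite key is written as a comma-separated list, as in --id=id,parent_id. The resulting string could then be handed to gfsh, for example through its -e option.

```java
import java.util.List;

/**
 * Hypothetical helper that assembles a create gpdb-mapping command string.
 * Only the options used in the documented example are emitted:
 * --region, --data-source, --pdx-name, --table and --id.
 */
public class GpdbMappingCommand {

    static String createGpdbMapping(String region, String dataSource, String pdxName,
                                    String table, List<String> idFields) {
        if (idFields.isEmpty()) {
            throw new IllegalArgumentException("at least one id field is required");
        }
        // A composite key is passed as a comma-separated list, e.g. --id=id,parent_id
        String ids = String.join(",", idFields);
        return "create gpdb-mapping"
                + " --region=" + region
                + " --data-source=" + dataSource
                + " --pdx-name=\"" + pdxName + "\""
                + " --table=" + table
                + " --id=" + ids;
    }

    public static void main(String[] args) {
        // Reproduces the example mapping of the /Child region to the child table
        String cmd = createGpdbMapping("/Child", "datasource",
                "io.pivotal.gemfire.demo.entity.Child", "child", List.of("id", "parent_id"));
        System.out.println(cmd);
    }
}
```

Quoting only the PDX class name mirrors the example; other values that contain spaces or shell metacharacters would need their own quoting.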

Specification with a cache.xml File

To provide configuration details within a cache.xml file, specify the correct xsi:schemaLocation attribute within the cache.xml file.

For the 3.3.0 connector, use

Connector Requirements and Caveats

  • Export is supported from partitioned GemFire regions only. Data cannot be exported from replicated regions. Data can be imported to replicated regions.

  • The number of Pivotal Greenplum® Database (GPDB) segments must be greater than or equal to the number of Pivotal GemFire servers. If the ratio of GPDB segments to GemFire servers is high, use the GPDB configuration parameter gp_external_max_segs to limit GPDB concurrency. See gp_external_max_segs for details on this parameter. To find the best setting, start by identifying a representative import operation, then:

    • Measure the performance of the representative import operation with the default setting.
    • Measure again with gp_external_max_segs set to half the total number of GPDB segments. If there is no gain in performance, then the parameter does not need to be adjusted.
    • Continue halving the value of gp_external_max_segs at each iteration and measuring again, until there is no further performance improvement or the value of gp_external_max_segs equals the number of GemFire servers.
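The tuning procedure above produces a predictable sequence of values to try. The following Java sketch lists those candidate values for a given cluster; the class name is hypothetical, and the segment and server counts are example inputs. It assumes that a halving step which would fall below the number of GemFire servers is clamped to that number, and it leaves the "stop when performance no longer improves" decision to whoever runs the measurements.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the gp_external_max_segs tuning schedule: start at half the number
 * of GPDB segments and keep halving, never going below the GemFire server count.
 */
public class MaxSegsSchedule {

    static List<Integer> candidateSettings(int gpdbSegments, int gemfireServers) {
        // The connector requires at least as many GPDB segments as GemFire servers
        if (gemfireServers <= 0 || gpdbSegments < gemfireServers) {
            throw new IllegalArgumentException(
                    "GPDB segments must be >= GemFire servers, and servers must be > 0");
        }
        List<Integer> values = new ArrayList<>();
        if (gpdbSegments == gemfireServers) {
            return values; // nothing to tune: the ratio is already 1:1
        }
        int value = gpdbSegments / 2;
        while (value > gemfireServers) {
            values.add(value);
            value /= 2;
        }
        // Last trial uses exactly the number of GemFire servers (assumed clamp)
        values.add(gemfireServers);
        return values;
    }

    public static void main(String[] args) {
        // Example cluster: 64 GPDB segments, 4 GemFire servers
        System.out.println(candidateSettings(64, 4));
    }
}
```

For 64 segments and 4 servers this yields 32, 16, 8 and 4; each value is measured against the default-setting baseline, and the search stops early as soon as a value brings no improvement.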

Upgrading Java Applications from Version 2.4 to Version 3.x

The API changes introduced in version 3.0.0, which carry over to this connector version, require code revisions in all applications that use import or export functionality.

For this sample version 2.4 export operation, an upsert type of operation was implied:

// Version 2.4 API
long numberExported = GpdbService.createOperation(region).exportRegion();

Here is the equivalent version 3.x code to implement the upsert type of operation:

// Version 3.x API
ExportConfiguration exportConfig = ExportConfiguration.builder(region)
    .build(); // complete the builder chain to obtain the ExportConfiguration
ExportResult result = GpdbService.exportRegion(exportConfig);
int numberExported = result.getExportedCount();

For this sample version 2.4 import operation,

// Version 2.4 API
long numberImported = GpdbService.createOperation(region).importRegion();

here is the version 3.x code to implement the import operation:

// Version 3.x API
ImportConfiguration importConfig = ImportConfiguration.builder(region)
    .build(); // complete the builder chain to obtain the ImportConfiguration
ImportResult result = GpdbService.importRegion(importConfig);
int numberImported = result.getImportedCount();

Note that the counts in the new result objects are of type int rather than long. This is for consistency with JDBC’s executeQuery(), which the connector uses internally and which supports int.
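Because Java widens int to long implicitly, 2.4-era code that stores the count in a long variable still compiles against the 3.x result objects. The following sketch illustrates this without calling the connector: exportedCount() is a hypothetical stand-in for ExportResult.getExportedCount(), and the value it returns is arbitrary.

```java
/**
 * Illustrates the int/long change in the 3.x result objects.
 * exportedCount() is a hypothetical stand-in for result.getExportedCount().
 */
public class CountWidening {

    // Stand-in for the 3.x getter, which returns int
    static int exportedCount() {
        return 1_500;
    }

    public static void main(String[] args) {
        // A 2.4-style long declaration still compiles: int widens to long without a cast
        long numberExported = exportedCount();
        // Going back from long to int should be checked; toIntExact throws on overflow
        int narrowed = Math.toIntExact(numberExported);
        System.out.println(numberExported + " " + narrowed);
    }
}
```

Changing the declared type to int, as in the 3.x examples, is still the clearer choice, but a long declaration is not a compile error.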