Pivotal GemFire®-Greenplum® Connector v3.3

Exporting Data from GemFire to Greenplum

An export copies entries from a Pivotal GemFire® region to a Pivotal Greenplum® (GPDB) table.

An export operation implements one of the following types of functionality:

  • The UPSERT functionality updates a GPDB row if the GemFire entry to be exported is already present as a GPDB row. If the GPDB row does not already exist, a new row is inserted.

  • The INSERT_ALL functionality does a GPDB insert operation for each entry in the GemFire region. Because it does not check for existing GPDB rows before each insert, the export operation fails with an error if an inserted row's primary key already exists in the GPDB table.

  • The INSERT_MISSING functionality inserts rows into the GPDB table for GemFire region entries that are not already present in the table. It does not update any existing rows in the GPDB table.

  • The UPDATE functionality updates a GPDB row if the GemFire entry to be exported is already present as a GPDB row. It does not insert new rows for GemFire entries that are not already present in the GPDB table.

The mapping of GPDB table rows to GemFire region entries is defined within the region element of the GemFire cache.xml file.

An export operation may be invoked using API calls or with the gfsh export command.

Export Using the API

Export region functionality is exposed through the GpdbService class. If not explicitly set in the ExportConfiguration, an export will implement UPSERT functionality.

This example does the default UPSERT type of export, setting the export type explicitly for clarity:

ExportConfiguration exportConfig = ExportConfiguration.builder(region)
   .setType(ExportType.UPSERT)
   .build();

ExportResult exportResult = GpdbService.exportRegion(exportConfig);

// Get the total number of GemFire entries exported to GPDB.
int exportedCount = exportResult.getExportedCount();

// Get the total number of GPDB rows updated.
int updatedCount = exportResult.getUpdatedCount();

// Get the total number of GemFire entries inserted into the GPDB table.
int insertedCount = exportResult.getInsertedCount();

For any of the export types, all GemFire region entries present when the export operation is initiated may be removed from the region after a successful commit of the GPDB table. Specify this option in the export configuration. Note that all exported entries are removed, independent of which rows in the GPDB table were updated or inserted. GemFire region entries added during the export operation are not exported and are not removed from the GemFire region at the end of the export operation.

This example does an INSERT_ALL type of export, removing all entries from the GemFire region after a successful commit of the GPDB table:

ExportConfiguration exportConfig = ExportConfiguration.builder(region)
   .setType(ExportType.INSERT_ALL)
   .removeGemFireEntries(true)
   .build();

ExportResult exportResult = GpdbService.exportRegion(exportConfig);

// Get the total number of GemFire entries removed.
int removedCount = exportResult.getRemovedCount();
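
The remaining export types are selected the same way, by passing a different ExportType to the builder. The following is a minimal sketch; it assumes the ExportType enum also defines INSERT_MISSING and UPDATE constants, which are not shown in the examples above:

ExportConfiguration exportConfig = ExportConfiguration.builder(region)
   .setType(ExportType.INSERT_MISSING)  // assumed constant; or ExportType.UPDATE
   .build();

ExportResult exportResult = GpdbService.exportRegion(exportConfig);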

The OperationEventListener interface provides further flexibility by allowing a set of callbacks to be defined. The callbacks are invoked at various points during an export to a GPDB table.
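
For example, a listener can log progress or gather metrics as the export proceeds. The sketch below is illustrative only: the callback method names (beforeExport, afterExport) and their signatures are hypothetical placeholders, not the connector's documented API; consult the API reference for the actual OperationEventListener callbacks.

// Illustrative sketch only. The callback names below are hypothetical
// placeholders; see the OperationEventListener API documentation for the
// actual callback signatures.
public class LoggingExportListener implements OperationEventListener {

  // Hypothetical callback: invoked before entries are written to the GPDB table.
  public void beforeExport(ExportConfiguration config) {
    System.out.println("Export starting");
  }

  // Hypothetical callback: invoked after the GPDB table commit succeeds.
  public void afterExport(ExportResult result) {
    System.out.println("Exported " + result.getExportedCount() + " entries");
  }
}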

Export Using gfsh

See the command reference page for details on this use of the gfsh export command.

Requirements and Caveats

  • Export is supported from partitioned GemFire regions only. Data cannot be exported from replicated regions.

  • At least one GemFire region field must be mapped to a GPDB table column for the export to work. Fields are mapped with the <gpdb:field> element in the cache.xml file. If no fields are mapped, the export operation throws an exception.

  • The export operation is not idempotent when no GPDB key is defined. This case appears in the cache.xml file as an empty key definition: <gpdb:id />. With no GPDB key for the mapping to use, a new GPDB row must be created for each GemFire entry exported, and each subsequent export repeats the creation of those rows, causing duplication within the GPDB table.

  • An export operation fails if the mapping specifies the <gpdb:id /> field such that multiple GPDB table rows would be updated for a single GemFire region entry. The error implies an incorrect composite key specification.

  • If multiple GemFire regions are mapped to the same GPDB table, concurrent export operations are possible. Further, if the GPDB table columns mapped from the GemFire region fields intersect, then multiple updates to the same GPDB table row can be in progress at the same time. The connector does not provide any transactional protocol for this situation; the last update wins.

  • If the GPDB table is described within a GPDB schema, but that schema is not specified in the GPDB configuration parameter search_path, the schema name must be specified by a schema attribute within the gpdb:pdx element of the GemFire cache.xml file.

  • An export operation creates temporary tables in GPDB. Repetitive export operations create and remove temporary tables, likely causing system catalog bloat. The GPDB administrator should be made aware of this so that appropriate vacuuming can be scheduled.

SQL Implementation Steps

The following sequence of SQL statements implements the export operation, with UPSERT functionality, into the GPDB table. Within the SQL statements, the < > sequences indicate items that will be replaced by expansions of table-specific definitions. The ellipses (...) indicate the location of further clauses within the SQL statement.

  1. A new temporary GPDB table is created:

    DROP TABLE IF EXISTS "tmpTable";
    CREATE TEMPORARY TABLE "tmpTable" (<column definitions>);
    
  2. A new external GPDB table is created that refers to the external GemFire data source using the gpfdist protocol:

    DROP EXTERNAL TABLE IF EXISTS "extTable";
    CREATE EXTERNAL TABLE "extTable" (LIKE "tmpTable")
      LOCATION (<URIs>)
      FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
    
  3. The data from the external table is copied to the temporary table:

    INSERT INTO "tmpTable" (<fields>) SELECT <fields> FROM "extTable";
    
  4. The data from the temporary table is updated or inserted at the destination table:

    UPDATE "dstTable" AS d 
      SET (<fields>) = (<fields>) 
      FROM "tmpTable" AS s 
      WHERE d."<id-field>" = s."<id-field>" AND ...  ;
    INSERT INTO "dstTable" (<fields>)
      SELECT <fields>
      FROM "tmpTable" 
      WHERE ROW(<id-fields>) NOT IN (SELECT <id-fields> FROM "dstTable");
    

The following sequence of SQL statements implements the export operation, with INSERT_ALL functionality, into the GPDB table. As before, the < > sequences indicate items that will be replaced by expansions of table-specific definitions.

  1. A new external GPDB table is created that refers to the external GemFire data source using the gpfdist protocol:

    DROP EXTERNAL TABLE IF EXISTS "extTable";
    CREATE EXTERNAL TABLE "extTable" (LIKE "dstTable")
      LOCATION (<URIs>)
      FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
    
  2. The data from the external table is inserted at the destination table:

    INSERT INTO "dstTable" (<fields>)
      SELECT <fields> FROM "extTable";
    

The following sequence of SQL statements implements the export operation, with INSERT_MISSING functionality, into the GPDB table. As before, the < > sequences indicate items that will be replaced by expansions of table-specific definitions.

  1. A new temporary GPDB table is created:

    DROP TABLE IF EXISTS "tmpTable";
    CREATE TEMPORARY TABLE "tmpTable" (<column definitions>);
    
  2. A new external GPDB table is created that refers to the external GemFire data source using the gpfdist protocol:

    DROP EXTERNAL TABLE IF EXISTS "extTable";
    CREATE EXTERNAL TABLE "extTable" (LIKE "tmpTable")
      LOCATION (<URIs>)
      FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
    
  3. The data from the external table is copied to the temporary table:

    INSERT INTO "tmpTable" (<fields>) SELECT <fields> FROM "extTable";
    
  4. The data from the temporary table is inserted at the destination table:

    INSERT INTO "dstTable" (<fields>) 
      SELECT <fields> 
      FROM "tmpTable" 
      WHERE ROW(<id-fields>) NOT IN (SELECT <id-fields> FROM "dstTable");
    

The following sequence of SQL statements implements the export operation, with UPDATE functionality, into the GPDB table. As before, the < > sequences indicate items that will be replaced by expansions of table-specific definitions, and the ellipses (...) indicate the location of further clauses within the SQL statement.

  1. A new temporary GPDB table is created:

    DROP TABLE IF EXISTS "tmpTable";
    CREATE TEMPORARY TABLE "tmpTable" (<column definitions>);
    
  2. A new external GPDB table is created that refers to the external GemFire data source using the gpfdist protocol:

    DROP EXTERNAL TABLE IF EXISTS "extTable";
    CREATE EXTERNAL TABLE "extTable" (LIKE "tmpTable")
      LOCATION (<URIs>)
      FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
    
  3. The data from the external table is copied to the temporary table:

    INSERT INTO "tmpTable" (<fields>) SELECT <fields> FROM "extTable";
    
  4. The data from the temporary table updates the destination table:

    UPDATE "dstTable" AS d 
      SET (<fields>) = (<fields>) 
      FROM "extTable" AS s 
      WHERE d."<id-field>" = s."<id-field>" AND ...  ;