LATEST VERSION: 2.2.0 - RELEASE NOTES
Pivotal GemFire®-Greenplum® Connector v2.2

Exporting Data from GemFire to Greenplum

An export copies an entire region’s data from Pivotal GemFire® to a Pivotal Greenplum® (GPDB) table.

The export implements an upsert functionality: if a GemFire entry exported is already present as a GPDB row, the row will be updated. If the GPDB row does not already exist, a new row is inserted.

The mapping of GPDB table rows to GemFire region entries is within the region element definition of the GemFire cache.xml file.

An export operation may be invoked using API calls or with the gfsh export command.

Export Using the API

Export region functionality is exposed through the GpdbService class.

Example:

GpdbService.createOperation(region).exportRegion();

The OperationEventListener interface provides further flexibility by allowing a set of callbacks to be defined. The callbacks are invoked at various points during an export to a GPDB table.

Export Using gfsh

See the command reference page for details on this use of the gfsh export command.

Requirements and Caveats

  • Export is supported from partitioned GemFire regions only. Data cannot be exported from replicated regions.

  • At least one GemFire region field must be mapped to a GPDB table column for the export to work. Fields are mapped with the <gpdb:field> element in the cache.xml file. With no mapped fields, an exception is thrown for an export operation.

  • The idempotent behavior of the export operation breaks for the case in which a GPDB key is not defined. This broken case would appear in the cache.xml file by defining an empty key: <gpdb:id />. The mapping has no GPDB key to use, so a new GPDB row must be created for each GemFire element exported. And, each subsequent export must repeat the creation of the new GPDB rows, causing duplication within the GPDB table.

  • An export operation fails if the mapping specifies the <gpdb:id /> field such that multiple GPDB table rows would be updated for a single GemFire region entry. The error implies an incorrect composite key specification.

  • If multiple GemFire regions are mapped to the same GPDB table, the possibility of concurrent export operations exists. Further, if the mapped columns of GPDB table for the GemFire region fields intersect, then multiple updates to the same GPDB table row can be in progress at the same time. The connector does not provide any sort of transactional protocol for this situation. The last update wins.

  • An export operation creates temporary tables in GPDB. Repetitive export operations create and remove temporary tables, likely causing system catalog bloat. The GPDB administrator should be made aware of this, such that appropriate vacuuming will be scheduled.

SQL Implementation Steps

The following sequence of SQL statements implements the export operation into the GPDB table. Within the SQL statements, the < > sequences indicate items that will be replaced by expansions of table-specific definitions. The ellipses (...) indicate the location of further clauses within the SQL statement.

  1. A new temporary GPDB table is created:

    DROP TABLE IF EXISTS "tmpTable";
    CREATE TEMPORARY TABLE "tmpTable" (<column definitions>);
    
  2. A new external GPDB table is created that refers to the external GemFire data source using the gpfdist protocol:

    DROP EXTERNAL TABLE IF EXISTS "extTable";
    CREATE EXTERNAL TABLE "extTable" (LIKE "tmpTable")
      LOCATION (<URIs>)
      FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
    
  3. The data from the external table is copied to the temporary table:

    INSERT INTO "tmpTable" (<fields>) SELECT <fields> FROM "extTable";
    
  4. The data from the temporary table is updated or inserted at the destination table:

    UPDATE "dstTable" AS d 
      SET (<fields>) = (<fields>) 
      FROM "extTable" AS s 
      WHERE d."<id-field>" = s."<id-field>" AND ...  ;
    INSERT INTO "dstTable" (<fields>) SELECT <fields>;