Exporting Data from GemFire to Greenplum
An export copies an entire region’s data from Pivotal GemFire® to a Pivotal Greenplum® (GPDB) table.
The export implements an upsert functionality: if a GemFire entry exported is already present as a GPDB row, the row will be updated. If the GPDB row does not already exist, a new row is inserted.
The mapping of GPDB table rows to GemFire region entries is
region element definition of the GemFire
An export operation may be invoked using API calls or with the
gfsh export command.
Export Using the API
Export region functionality is exposed through the
OperationEventListener interface provides
further flexibility by allowing a set of callbacks to be defined.
The callbacks are invoked at various
points during an export to a GPDB table.
Export Using gfsh
See the command reference page for details
on this use of the
gfsh export command.
Requirements and Caveats
Export is supported from partitioned GemFire regions only. Data cannot be exported from replicated regions.
At least one GemFire region field must be mapped to a GPDB table column for the export to work. Fields are mapped with the
<gpdb:field>element in the
cache.xmlfile. With no mapped fields, an exception is thrown for an export operation.
The idempotent behavior of the export operation breaks for the case in which a GPDB key is not defined. This broken case would appear in the
cache.xmlfile by defining an empty key:
<gpdb:id />. The mapping has no GPDB key to use, so a new GPDB row must be created for each GemFire element exported. And, each subsequent export must repeat the creation of the new GPDB rows, causing duplication within the GPDB table.
An export operation fails if the mapping specifies the
<gpdb:id />field such that multiple GPDB table rows would be updated for a single GemFire region entry. The error implies an incorrect composite key specification.
If multiple GemFire regions are mapped to the same GPDB table, the possibility of concurrent export operations exists. Further, if the mapped columns of GPDB table for the GemFire region fields intersect, then multiple updates to the same GPDB table row can be in progress at the same time. The connector does not provide any sort of transactional protocol for this situation. The last update wins.
If the GPDB table is described within a GPDB schema, but that schema is not specified in the GPDB configuration parameter
search_path, the schema name must be specified by a
schemaattribute within the
gpdb:pdxelement of the GemFire
An export operation creates temporary tables in GPDB. Repetitive export operations create and remove temporary tables, likely causing system catalog bloat. The GPDB administrator should be made aware of this, such that appropriate vacuuming will be scheduled.
SQL Implementation Steps
The following sequence of SQL statements implements the export operation
into the GPDB table.
Within the SQL statements, the
< > sequences indicate items that will
be replaced by expansions of table-specific definitions.
The ellipses (
...) indicate the location of further clauses within
the SQL statement.
A new temporary GPDB table is created:
DROP TABLE IF EXISTS "tmpTable"; CREATE TEMPORARY TABLE "tmpTable" (<column definitions>);
A new external GPDB table is created that refers to the external GemFire data source using the
DROP EXTERNAL TABLE IF EXISTS "extTable"; CREATE EXTERNAL TABLE "extTable" (LIKE "tmpTable") LOCATION (<URIs>) FORMAT 'TEXT' (DELIMITER E'\u001F' NEWLINE 'LF') ENCODING 'UTF8';
The data from the external table is copied to the temporary table:
INSERT INTO "tmpTable" (<fields>) SELECT <fields> FROM "extTable";
The data from the temporary table is updated or inserted at the destination table:
UPDATE "dstTable" AS d SET (<fields>) = (<fields>) FROM "extTable" AS s WHERE d."<id-field>" = s."<id-field>" AND ... ; INSERT INTO "dstTable" (<fields>) SELECT <fields>;