LATEST VERSION: 3.2.0 - CHANGELOG
Pivotal GemFire®-Greenplum® Connector v3.2

Datatype Mapping

How the types correspond between a Pivotal GemFire® region and a Pivotal Greenplum® (GPDB) table are described in a cache.xml file.

GemFire XML Description of a GPDB Table

In order to use the connector, a mapping between a GemFire region and a GPDB table has to be described. This mapping goes in the cache.xml file, which has an extended syntax for the mapping information.

The connector makes assumptions about the intended mapping if some parts of the mapping are incomplete. We advise you to explicitly define all keys and fields, such that the connector has no assumptions to make.

This cache.xml file example provides the XML description of the mappings for the region. The XML with the gpdb tag is the mapping between GemFire region fields and the GPDB table columns. It describes how to connect to GPDB, as well as gpfdist protocol to be used to transfer the data between the two systems.

<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns="http://geode.apache.org/schema/cache"
  xmlns:gpdb="http://schema.pivotal.io/gemfire/gpdb"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://geode.apache.org/schema/cache
  http://geode.apache.org/schema/cache/cache-1.0.xsd
  http://schema.pivotal.io/gemfire/gpdb
  http://schema.pivotal.io/gemfire/gpdb/gpdb-2.4.xsd"
  version="1.0">

  <pdx read-serialized="true" persistent="false">
      <pdx-serializer>
          <class-name>org.apache.geode.pdx.ReflectionBasedAutoSerializer</class-name>
          <parameter name="classes">
              <string>io.pivotal.gemfire.demo.entity.*</string>
          </parameter>
      </pdx-serializer>
  </pdx>
  <jndi-bindings>
      <jndi-binding jndi-name="DemoDatasource" type="SimpleDataSource"
          jdbc-driver-class="org.postgresql.Driver" user-name="gpadmin"
          password="changeme" connection-url="jdbc:postgresql://localhost:5432/gemfire_db">
      </jndi-binding>
  </jndi-bindings>
  <region name="Parent">
      <region-attributes refid="PARTITION">
          <partition-attributes redundant-copies="1" />
      </region-attributes>
      <gpdb:store datasource="DemoDatasource">
          <gpdb:types>
              <gpdb:pdx name="io.pivotal.gemfire.demo.entity.Parent"
                  schema="public"
                  table="parent">
                  <gpdb:id field="id" />
                  <gpdb:fields>
                      <gpdb:field name="id" column="id" />
                      <gpdb:field name="name" />
                      <gpdb:field name="income" class="java.math.BigDecimal" />
                  </gpdb:fields>
              </gpdb:pdx>
          </gpdb:types>
      </gpdb:store>
  </region>
  <gpdb:gpfdist port="8000" />
</cache>

jndi-binding

In the example, there is a single JNDI binding to specify the details of the connection to the example’s GPDB database. This element defines a datasource that is referenced later in gpdb:store element using datasource attribute.

The connection to the GPDB instance will be to localhost:5432, with a default database of gemfire_db. The GPDB user name is gpadmin, and its password is changeme.

For more details on JNDI bindings, see JNDI.

gpdb:store

The gpdb:store element specifies the mapping between GPDB table contents and the GemFire region.

There should be one store element for each GPDB data source.

In this example, the gpdb:store element identifies the JNDI binding to specify the GPDB instance at the other end of the connector. It is the data source.

gpdb:types

The gpdb:types element contains a list of gpdb:pdx instances. There is only one gpdb:pdx instance in this example.

gpdb:pdx

Each gpdb:pdx instance identifies the mapping of a GemFire region’s entries to its corresponding GPDB table. This is needed for the data serialization and transfer implemented by the connector. The optional schema attribute specifies the GPDB schema for the table. See Creating and Managing Schemas within the GPDB manual for a description of GPDB schemas. The optional schema attribute is set to the default value of public when not specified. Make the schema attribute value the same across all GemFire servers that host the region.

gpdb:id

The gpdb:id element specifies the fields to be used as the key in the identification of a GemFire object.

The example uses a singleton key named id, also mapped to table column id:

  <gpdb:id field="id" />

Here is an example of a composite key with the two fields parentId and id mapped to table columns parentId and id.

  <gpdb:id>
    <gpdb:field ref="parentId" />
    <gpdb:field ref="id" />
  </gpdb:id>

gpdb:fields

The gpdb:fields element lists the fields mapped to values of the GemFire PdxInstance. A subset of the available GemFire fields or GPDB columns may be mapped. Omission of this element causes the connector to automap the GemFire region fields to GPDB table columns, using the assumption that the exactly matching field and column names are to be mapped.

gpdb:field

Each gpdb:field instance specifies the mapping from a GemFire PdxIntance field to a GPDB column. If no GPDB column attribute is specified, the GPDB column is assumed to have exactly the same name (case sensitive) as the GemFire field.

Here is an example in which a GemFire parentId field is mapped to a GPDB parent_id column, and the GemFire age field is of the Java class (data type) java.lang.Integer.

  <gpdb:field name="parentId" column="parent_id" />
  <gpdb:field name="age" class="java.lang.Integer" />

Special cases exist when the gpdb:field attributes are missing from the cache.xml file. With no gpdb:field attributes defined, the connector assumes that the GemFire region fields have the same names as the GPDB table columns. This list describes the current behavior for this type of special case:

  • If there exists a GPDB table column for which there is no GemFire region field, then an export operation does not update that GPDB table column. and an insert of a new row places either null or the defined default into the columns with no associated GemFire region field. An import operates on only those GemFire region fields for which the column with the matching name exists.

  • If there is a GemFire region field for which there is no matching GPDB table column, then an export operation only exports the GemFire region fields for which there is a GPDB table column of the same name. An import has side effects on the GemFire region. The import correctly acquires those GemFire region fields for which the column with the matching name exists. But, the import sets to null those GemFire region fields for which there is no GPDB table column of the same name; this occurs for both an update and an insert of region entries.

If one or more gpdb:field attributes are specified, there can exist a mismatch of fields to columns. This list describes the current behavior for this type of special case:

  • If there is exactly one mapping of a field to a column, and the naming of that GemFire region field or a GPDB table column within a gpdb:field attribute is incorrect such that it does not match a configured and defined name, an exception is thrown upon export, as this case appears as if there are no mappings.

  • With more than one gpdb:field attributes specified, if the naming of one GemFire region field or a GPDB table column is incorrect such that it does not match a configured and defined name, upon export an update to a GPDB table row will not change the value of the incorrectly named column. An insert of a new GPDB table row under the same assumptions acquires the correctly mapped GemFire region fields. The incorrectly mapped GPDB column will be set to null or its defined default value. Upon import, either an update of an existing entry or an insert of a new entry acquires the GPDB table column values for any correctly mapped fields, and uses null for any incorrectly named GemFire field.

gpdb:gpfdist

The gpdb:gpfdist element is not part of the region specification, but is part of the cache specification. It provides details of the gpfdist protocol used in the connector. More details may be found within the section on Working with File-Based External Tables in the GPDB manual.

To increase the security of the data during transfer between GemFire and GPDB, TLS/SSL transfers can be enabled, as in this example:

  <gpdb:gpfdist hostname="gpfdist.gemfire.test">
    <gpdb:ssl
        key-store="target/test-classes/io/pivotal/gemfire/gpdb/gpfdist/keystore.jks"
        key-store-password="123456"
        trust-store="target/test-classes/io/pivotal/gemfire/gpdb/gpfdist/truststore.jks"
        trust-store-password="123456" />
  </gpdb:gpfdist>

All four gpdb:ssl attributes must be present to configure the transfers to use SSL. The two password fields remain in plain text within the cache.xml file, so file system protections should be considered.

The specification of optional attributes overrides default values. These are the optional attributes:

  • A port specification overrides the listening port of gpfdist to be other than its default of 8080. A value of 0 causes automatic allocation of a port number from an ephemeral port range. If the specified port is already in use, the system increments the value until an available port is found to use. For example:

      <gpdb:gpfdist port="8000" />
    
  • A hostname specifies the host name used in URI generation. If SSL is enabled, this name must match the common name (CN) of installed key store certificate. The default value is an IP address of the local host.

      <gpdb:gpfdist hostname="host12.example.com" />
    
  • An address specification identifies the IP address for listening to be other than the default value of 0.0.0.0. For example:

      <gpdb:gpfdist address="192.0.2.0" />
    
  • idle-timeout specifies a timeout in milliseconds. If no data is transmitted within this timeout period, the connection may be closed. The default value is 60000 milliseconds. There will be no timeout if set to 0. For example

      <gpdb:gpfdist idle-timeout="120000" />
    

Requirements and Caveats in the Mapping

  • There must be only one GPDB table specified per GemFire region. If more than one GPDB table is specified for a single region by defining distinct GPDB tables across multiple cache.xml files, then an exception is thrown. If more than one GPDB table is specified within a single cache.xml file, no error will be thrown or issued, and the import or export operation will proceed, resulting in incorrect results.

Supported Datatypes

The following datatypes are supported. The left side shows the GPDB/PostgreSQL datatype, and the right side shows the associated GemFire/Java datatype which it is mapped to.

GPDB GemFire
Bigint Long (Object wrapper)
Bigint not null long (primitive)
Bigserial long (primitive)
Boolean not null boolean (primitive)
Boolean not null Boolean (wrapper)
Bit BitSet
Varbit BitSet
Char String
Varchar String
Decimal BigDecimal
Float Float (Object wrapper)
Float not null float (primitive)
Int Integer (Object wrapper)
Int not null int (primitive)
Real Float (Object wrapper)
Real not null float (primitive)
Serial int (primitive)
Smallint Short (Object wrapper)
Smallint not null short (primitive)
Text String
Date java.sql.Date
Time Time
Timetz Time
Timestamp Timestamp
Timestamptz Timestamp

Unsupported GPDB Datatypes

  • inet
  • cidr
  • bytea
  • box
  • circle
  • interval
  • seg
  • macaddr
  • path
  • point
  • polygon
  • xml
  • money
  • array

Unsupported GemFire Datatypes

  • Classes with data members that are objects. Not supported, as this would require a representation of nested objects in GPDB. Here is an example of an unsupported type:

    class HasAnObjectMember {
        private UserObject theObject;
    }
    
  • An data type other than BitSet, Boolean, Date, Number, and String. Array types are an example of an unsupported data type.