Custom Cassandra Data Types


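In the blog post "Connecting to Cassandra from Java", I mentioned that one advantage for Java developers of Cassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail.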

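Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Custom Cassandra data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. A class that extends it must ultimately implement three methods with the following signatures: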

public ByteBuffer fromString(final String) throws MarshalException
public TypeSerializer getSerializer()
public int compare(Object, Object)

The next code listing shows this post's example implementation of AbstractType.

UnitedStatesState.java (extends AbstractType)

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
      return ByteBuffer.wrap(abbreviation.getBytes());
   }
}
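One detail of the compare method above deserves care. Calling toString() on a java.nio.ByteBuffer returns a description of the buffer (its position, limit, and capacity), not its contents. If the values handed to compare are ByteBuffer instances (an assumption here, based on AbstractType acting as a comparator of ByteBuffers in Cassandra versions of that era; verify against your version), comparing their toString() results would treat "AL" and "NY" as equal. The following is a minimal standalone sketch, not part of the original example, that compares buffer contents instead. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferCompareSketch
{
   /**
    * Null-safe comparison of the remaining bytes of two buffers.
    * Nulls sort last, mirroring the null handling in the listing above.
    */
   static int compareContents(final ByteBuffer b1, final ByteBuffer b2)
   {
      if (b1 == null && b2 == null)
      {
         return 0;
      }
      else if (b1 == null)
      {
         return 1;
      }
      else if (b2 == null)
      {
         return -1;
      }
      // ByteBuffer.compareTo compares the remaining bytes lexicographically
      // and does not change either buffer's position.
      return b1.compareTo(b2);
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer alabama = ByteBuffer.wrap("AL".getBytes(StandardCharsets.UTF_8));
      final ByteBuffer newYork = ByteBuffer.wrap("NY".getBytes(StandardCharsets.UTF_8));

      // Both descriptions read "java.nio.HeapByteBuffer[pos=0 lim=2 cap=2]".
      System.out.println(alabama.toString().compareTo(newYork.toString()));

      // Comparing contents orders "AL" before "NY".
      System.out.println(compareContents(alabama, newYork) < 0);
   }
}
```

Running main prints 0 for the toString()-based comparison and true for the content-based one.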

The class listing above references the State enum, which is shown next.

State.java

package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   KENTUCKY("Kentucky", "KY"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   NEW_YORK("New York", "NY"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}

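We also need to provide an implementation of the TypeSerializer interface that is returned by the getSerializer() method shown above. A class implementing TypeSerializer is typically easiest to write by extending one of the numerous existing TypeSerializer implementations that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom serializer extends AbstractTextSerializer, and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer by extending AbstractTextSerializer: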

UnitedStatesStateSerializer.java (implements TypeSerializer)

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows for a full state name to be specified or for its
    * two-letter abbreviation to be specified and either is considered
    * valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
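         // Decode only the buffer's remaining bytes; duplicate() leaves the caller's position untouched.
         final String stringFormat = StandardCharsets.UTF_8.decode(bytes.duplicate()).toString().toUpperCase();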
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}
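When decoding a ByteBuffer in a validate method such as the one above, it helps to remember that ByteBuffer.array() returns the entire backing array, ignoring the buffer's position and limit, and is not available at all for direct or read-only buffers. The following standalone sketch, which is an illustration and not part of the original example, shows the difference on a buffer that is a view into a larger array. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferDecodeSketch
{
   /**
    * Decodes only the bytes between position and limit as UTF-8.
    * duplicate() shares the content but has its own position and limit,
    * so the caller's buffer is left unchanged.
    */
   static String decodeRemaining(final ByteBuffer bytes)
   {
      return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer whole = ByteBuffer.wrap("xxNYxx".getBytes(StandardCharsets.UTF_8));
      whole.position(2);
      whole.limit(4);
      // A two-byte view covering just "NY" that shares the six-byte backing array.
      final ByteBuffer view = whole.slice();

      // array() exposes the whole backing array, not just the view.
      System.out.println(new String(view.array(), StandardCharsets.UTF_8));

      // Decoding the remaining bytes yields only the view's content.
      System.out.println(decodeRemaining(view));
   }
}
```

Running main prints xxNYxx followed by NY.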

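With the classes for the custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process is shown in the following screen snapshot: the classes are compiled with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java , and the generated .class files are archived into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar * .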

[Screenshot: compiling the custom type classes and archiving them into CustomCqlTypes.jar]

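The JAR containing the custom CQL type classes needs to be placed in the lib directory of the Cassandra installation, as demonstrated in the next screen snapshot.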

[Screenshot: copying CustomCqlTypes.jar into the Cassandra lib directory]

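With the JAR containing the custom CQL data type classes in the Cassandra installation's lib directory, Cassandra should be restarted so that it can "see" these custom data type definitions.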

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table that uses the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql

CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the result of running the createAddress.cql code above by describing the created table in cqlsh.

[Screenshot: describing the us_address table in cqlsh]

The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type of the state column of the us_address table.

A new row can be added to the US_ADDRESS table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');

[Screenshot: inserting an address with the full state name]

Note that although the INSERT statement supplied "New York" for the state, it is stored as "NY".

[Screenshot: selecting the state column, which shows the stored abbreviation]

If I run an INSERT statement in cqlsh that uses the abbreviation to begin with ( INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118'); ), it still works, as shown in the next screen snapshot.

[Screenshot: inserting an address with the state abbreviation]

In my example, an invalid state does not prevent an INSERT from occurring; instead, the state is persisted as "UK" (for unknown) [see the implementation of UnitedStatesState.getStateAbbreviationAsByteBuffer(String)].
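The three behaviors observed so far (a full state name, an abbreviation, and an unrecognized value) can be reproduced outside Cassandra. The following is a simplified standalone sketch, not the code from the listings above: MiniState is a hypothetical three-constant stand-in for the State enum, and the matching logic is condensed into a single loop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class StateNormalizationSketch
{
   /** Hypothetical trimmed-down stand-in for the State enum. */
   enum MiniState
   {
      NEW_YORK("NY"), NORTH_DAKOTA("ND"), UNKNOWN("UK");

      final String abbreviation;

      MiniState(final String newAbbreviation)
      {
         this.abbreviation = newAbbreviation;
      }
   }

   /** Maps a state name or abbreviation to the bytes that would be stored. */
   static ByteBuffer toStoredForm(final String input)
   {
      MiniState match = MiniState.UNKNOWN;
      if (input != null)
      {
         // "New York" becomes "NEW_YORK" so it can match the enum constant name.
         final String key = input.trim().toUpperCase(Locale.ROOT).replace(' ', '_');
         for (final MiniState state : MiniState.values())
         {
            if (state.name().equals(key) || state.abbreviation.equals(key))
            {
               match = state;
               break;
            }
         }
      }
      return ByteBuffer.wrap(match.abbreviation.getBytes(StandardCharsets.UTF_8));
   }

   /** Decodes the stored bytes back to text without moving the buffer's position. */
   static String asText(final ByteBuffer stored)
   {
      return StandardCharsets.UTF_8.decode(stored.duplicate()).toString();
   }

   public static void main(final String[] arguments)
   {
      for (final String input : new String[] {"New York", "ny", "Narnia"})
      {
         System.out.println(input + " -> " + asText(toStoredForm(input)));
      }
   }
}
```

With these three inputs, the sketch stores NY, NY, and UK respectively.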

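One of the first advantages that comes to mind for implementing a custom CQL data type in Java is the ability to get behavior similar to that provided by check constraints in relational databases. For example, in this post, my example ensures that any state entered for a new row is one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other value can be inserted into that column.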
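The example in this post falls back to "UK" rather than rejecting bad input. A stricter, more check-constraint-like variant would refuse unrecognized values outright. The following standalone sketch illustrates that idea with a hypothetical three-entry whitelist and a plain IllegalArgumentException; inside a real AbstractType, the natural place to do this would presumably be fromString, which is declared to throw MarshalException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class StrictStateCheckSketch
{
   // Hypothetical whitelist; a full version would hold every valid abbreviation.
   private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("NY", "ND", "UT"));

   /**
    * Returns the normalized abbreviation, or throws if the candidate is not
    * in the whitelist (instead of silently substituting an "unknown" value).
    */
   static String requireKnownAbbreviation(final String candidate)
   {
      final String normalized = candidate == null ? "" : candidate.trim().toUpperCase(Locale.ROOT);
      if (!ALLOWED.contains(normalized))
      {
         throw new IllegalArgumentException("Not a recognized state abbreviation: " + candidate);
      }
      return normalized;
   }

   public static void main(final String[] arguments)
   {
      System.out.println(requireKnownAbbreviation(" ny "));
      try
      {
         requireKnownAbbreviation("Narnia");
      }
      catch (IllegalArgumentException rejected)
      {
         System.out.println("Rejected: " + rejected.getMessage());
      }
   }
}
```

Whether to reject or to substitute a sentinel such as "UK" is a design choice: rejecting surfaces bad data at write time, while the sentinel keeps writes from failing.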

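Another advantage of a custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-letter abbreviation. In other cases, I might want to always store in uppercase, always store in lowercase, or map a finite set of strings to numeric values. A custom CQL data type allows customized validation and representation of values in the Cassandra database.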
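As an illustration of mapping a finite set of strings to numeric values, the following standalone sketch encodes a hypothetical Priority value as a four-byte integer in a ByteBuffer and decodes it again. Nothing here comes from the original example; the enum, its codes, and the class name are assumptions chosen for the demonstration.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

final class PriorityCodecSketch
{
   /** Hypothetical finite set of strings, each with an explicit numeric code. */
   enum Priority
   {
      LOW(1), MEDIUM(2), HIGH(3);

      final int code;

      Priority(final int newCode)
      {
         this.code = newCode;
      }
   }

   /** Converts text such as "high" into a four-byte buffer holding its code. */
   static ByteBuffer encode(final String text)
   {
      // valueOf throws IllegalArgumentException for text outside the set.
      final Priority priority = Priority.valueOf(text.trim().toUpperCase(Locale.ROOT));
      final ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
      buffer.putInt(priority.code);
      buffer.flip();   // make the written bytes readable from position 0
      return buffer;
   }

   /** Reads the code back without moving the caller's buffer position. */
   static Priority decode(final ByteBuffer stored)
   {
      final int code = stored.duplicate().getInt();
      for (final Priority priority : Priority.values())
      {
         if (priority.code == code)
         {
            return priority;
         }
      }
      throw new IllegalArgumentException("Unknown priority code: " + code);
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer stored = encode("high");
      System.out.println(stored.remaining() + " bytes, decoded as " + decode(stored));
   }
}
```

Explicit codes are used rather than ordinal() so that reordering the enum constants later would not change the meaning of values already stored.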

Conclusion

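This post has been an introductory look at implementing custom CQL data types in Cassandra. As I work with this concept more and try different approaches, I hope to write another blog post on some of the more subtle observations I make. As this post has shown, it is fairly easy to write and use a custom CQL data type, especially for Java developers.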

Original article: https://www.javacodegeeks.com/2014/07/custom-cassandra-data-types.html
