[Software Engineering Practice] Pig Project 3 - Data Directory Source Code Analysis - Tuple2

2021-10-16 20:01:58 Source: Internet

Tags: ExecException, return, Tuple, tuple, Tuple2, source code, Pig, public

2021SC@SDUSC

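In the previous post we looked at TupleFactory (an abstract class) and TupleMaker (an interface). This post continues the source code analysis, drawing on a reference blog post.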

Blog excerpt:

The newTuple methods of BinSedesTupleFactory return BinSedesTuple objects. BinSedesTuple extends DefaultTuple, and DefaultTuple has a List<Object> mFields field: this is where the tuple's data is stored, and mFields holds an ArrayList<Object>. Class diagram:

Here is the BinSedesTupleFactory code from the project:

package org.apache.pig.data;

import java.util.List;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.classification.InterfaceAudience;

/**
 * Default implementation of TupleFactory.
 */
@InterfaceAudience.Private
public class BinSedesTupleFactory extends TupleFactory {

    @Override
    public Tuple newTuple() {
        return new BinSedesTuple();
    }

    @Override
    public Tuple newTuple(int size) {
        // Pre-sized tuple: fields 0..size-1 can then be filled with set()
        return new BinSedesTuple(size);
    }

    @Override
    @SuppressWarnings("unchecked")
    public Tuple newTuple(List c) {
        // The elements of c are copied into the new tuple
        return new BinSedesTuple(c);
    }

    @Override
    @SuppressWarnings("unchecked")
    public Tuple newTupleNoCopy(List list) {
        // The tuple takes over list itself as its storage; the int argument
        // only selects the no-copy constructor overload
        return new BinSedesTuple(list, 1);
    }

    @Override
    public Tuple newTuple(Object datum) {
        Tuple t = new BinSedesTuple(1);
        try {
            t.set(0, datum);
        } catch (ExecException e) {
            // The world has come to an end, we just allocated a tuple with one slot
            // but we can't write to that slot.
            throw new RuntimeException("Unable to write to field 0 in newly " +
                    "allocated tuple of size 1!", e);
        }
        return t;
    }

    @Override
    public Class<? extends Tuple> tupleClass() {
        return BinSedesTuple.class;
    }

    @Override
    public Class<? extends TupleRawComparator> tupleRawComparatorClass() {
        return BinSedesTuple.getComparatorClass();
    }

    @Override
    public boolean isFixedSize() {
        // Tuples produced by this factory are not fixed-size
        return false;
    }
}

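As you can see, newTuple is overloaded several times. We saw in the previous post that most of TupleFactory's methods are abstract; every method here carries @Override, i.e. it overrides (here, implements) a method declared in TupleFactory. newTuple is declared to return Tuple, but what it actually returns is a new BinSedesTuple. Tuple is a supertype of BinSedesTuple; strictly speaking, the relationship is the one in the class diagram above: BinSedesTuple extends DefaultTuple, DefaultTuple extends AbstractTuple, and AbstractTuple implements the Tuple interface.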
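The overload/override distinction and the Tuple-typed return value can be seen in isolation in the minimal sketch below. MiniTuple, MiniFactory, ListTuple and ListTupleFactory are hypothetical stand-ins invented for illustration (they are not Pig classes); they only mimic the shape of Tuple, TupleFactory, BinSedesTuple and BinSedesTupleFactory.

```java
import java.util.ArrayList;
import java.util.List;

class FactoryDemo {
    public static void main(String[] args) {
        MiniFactory factory = new ListTupleFactory();
        // The static type is the interface; the runtime class is ListTuple.
        MiniTuple empty = factory.newTuple();
        MiniTuple one = factory.newTuple("a");
        System.out.println(empty.size() + " " + one.size()); // 0 1
        System.out.println(one.getClass().getSimpleName());  // ListTuple
    }
}

// Hypothetical stand-in for Pig's Tuple interface.
interface MiniTuple {
    int size();
    Object get(int i);
}

// Hypothetical stand-in for TupleFactory: two overloads of newTuple, both abstract.
abstract class MiniFactory {
    abstract MiniTuple newTuple();
    abstract MiniTuple newTuple(Object datum);
}

// Hypothetical stand-in for BinSedesTuple.
class ListTuple implements MiniTuple {
    private final List<Object> fields = new ArrayList<>();

    ListTuple() {}

    ListTuple(Object datum) {
        fields.add(datum);
    }

    public int size() { return fields.size(); }

    public Object get(int i) { return fields.get(i); }
}

// Hypothetical stand-in for BinSedesTupleFactory: each method overrides one overload.
class ListTupleFactory extends MiniFactory {
    @Override
    MiniTuple newTuple() { return new ListTuple(); }

    @Override
    MiniTuple newTuple(Object datum) { return new ListTuple(datum); }
}
```

Each newTuple signature is a separate overload, and each method in ListTupleFactory overrides the matching abstract one. Callers only see the MiniTuple interface, which is what lets a factory change the concrete class without touching caller code.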

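This also shows both sides of inheritance: on one hand, a subclass can reuse the parent's code; on the other, it must provide implementations for whatever the parent leaves abstract.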
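A minimal sketch of both sides, using a hypothetical Base/ArrayBacked pair (not Pig code) that mirrors the way AbstractTuple, shown later in this post, builds isNull on top of get:

```java
class InheritanceDemo {
    public static void main(String[] args) {
        Base t = new ArrayBacked(new Object[] {"x", null});
        // isNull is inherited unchanged; get and size had to be written by the subclass.
        System.out.println(t.isNull(0) + " " + t.isNull(1)); // false true
    }
}

abstract class Base {
    // The "obligation" side: every concrete subclass must implement these.
    abstract Object get(int i);
    abstract int size();

    // The "reuse" side: subclasses get this for free, built only on get().
    boolean isNull(int i) {
        return get(i) == null;
    }
}

class ArrayBacked extends Base {
    private final Object[] data;

    ArrayBacked(Object[] data) { this.data = data; }

    @Override
    Object get(int i) { return data[i]; }

    @Override
    int size() { return data.length; }
}
```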

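This raises a question: is the inheritance structure of object-oriented design really necessary? After the analysis so far, doesn't inheritance look cumbersome, with work that one class could do spread across five or six? Let's take Tuple as an example. Below are excerpts from the Tuple interface, each preceded by a short summary.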

NULL: the marker byte meaning the value this object holds is null.
/**
 * Marker for indicating whether the value this object holds
 * is a null
 */
public static byte NULL = 0x00;

NOTNULL: the marker byte meaning the value this object holds is not null.
/**
 * Marker for indicating whether the value this object holds
 * is not a null
 */
public static byte NOTNULL = 0x01;
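The two constants are byte markers. One common way such markers are used, sketched here with java.io data streams, is to prefix each nullable value with NULL or NOTNULL so a reader knows whether a payload follows. The writeNullable/readNullable helpers and the byte layout are hypothetical; this is not Pig's actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class NullMarkerDemo {
    // Same values as Tuple.NULL and Tuple.NOTNULL in the excerpt above.
    static final byte NULL = 0x00;
    static final byte NOTNULL = 0x01;

    // Hypothetical layout: one marker byte, then the string only if it is non-null.
    static void writeNullable(DataOutput out, String s) throws IOException {
        if (s == null) {
            out.writeByte(NULL);
        } else {
            out.writeByte(NOTNULL);
            out.writeUTF(s); // 2-byte length prefix + modified UTF-8 bytes
        }
    }

    static String readNullable(DataInput in) throws IOException {
        byte marker = in.readByte();
        return marker == NULL ? null : in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeNullable(out, "pig");
        writeNullable(out, null);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readNullable(in)); // pig
        System.out.println(readNullable(in)); // null
        System.out.println(bytes.size());     // 7 = (1 + 2 + 3) + 1
    }
}
```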

reference(Tuple) (deprecated): makes this tuple reference another tuple's contents without copying the underlying data.
/**
 * Make this tuple reference the contents of another.  This method does not copy
 * the underlying data.   It maintains references to the data from the original
 * tuple (and possibly even to the data structure holding the data).
 * @param t Tuple to reference.
 */
@Deprecated
void reference(Tuple t);
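The difference between referencing and copying can be shown with plain java.util lists (no Pig classes involved): a reference shares the backing data, so later changes to the original show through, while a copy is independent. The same trade-off appears in the factory's newTuple(List), which copies, versus newTupleNoCopy(List), which does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReferenceVsCopyDemo {
    public static void main(String[] args) {
        List<Object> original = new ArrayList<>(Arrays.asList("a", "b"));

        // Reference semantics: both names point at the same backing list.
        List<Object> referenced = original;
        // Copy semantics: a new list holding the same elements.
        List<Object> copied = new ArrayList<>(original);

        original.set(0, "changed");
        System.out.println(referenced.get(0)); // changed
        System.out.println(copied.get(0));     // a
    }
}
```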

size(): returns the number of fields in the tuple; this method used to be called arity().
/**
 * Find the size of the tuple.  Used to be called arity().
 * @return number of fields in the tuple.
 */
int size();

isNull(int): reports whether a given field is null; throws ExecException if the field number is greater than or equal to the number of fields in the tuple.
/**
 * Find out if a given field is null.
 * @param fieldNum Number of field to check for null.
 * @return true if the field is null, false otherwise.
 * @throws ExecException if the field number given is greater
 * than or equal to the number of fields in the tuple.
 */
boolean isNull(int fieldNum) throws ExecException;

getType(int): returns the type of a given field, encoded as a byte whose values are defined in DataType; for a null field it returns DataType.UNKNOWN. Throws ExecException if the field number is out of range.
/**
 * Find the type of a given field.
 * @param fieldNum Number of field to get the type for.
 * @return type, encoded as a byte value.  The values are defined in
 * {@link DataType}.  If the field is null, then DataType.UNKNOWN
 * will be returned.
 * @throws ExecException if the field number is greater than or equal to
 * the number of fields in the tuple.
 */
byte getType(int fieldNum) throws ExecException;
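getType only needs the value returned by get plus a mapping from runtime class to type byte; AbstractTuple, later in this post, does exactly that with DataType.findType(get(fieldNum)). The sketch below shows the idea with hypothetical type codes (0, 1, 2) that are not Pig's real DataType values, and it follows the Javadoc's rule that a null field maps to UNKNOWN.

```java
class TypeCodeDemo {
    // HYPOTHETICAL codes for illustration only; Pig's real constants live in DataType.
    static final byte UNKNOWN = 0;
    static final byte INTEGER = 1;
    static final byte CHARARRAY = 2;

    // Classify a value by its runtime class; null matches no instanceof check
    // and therefore falls through to UNKNOWN.
    static byte findType(Object value) {
        if (value instanceof Integer) return INTEGER;
        if (value instanceof String) return CHARARRAY;
        return UNKNOWN;
    }

    // getType(i) reduces to "fetch field i, then classify it".
    static byte getType(Object[] fields, int i) {
        return findType(fields[i]);
    }

    public static void main(String[] args) {
        Object[] fields = {42, "pig", null};
        System.out.println(getType(fields, 0)); // 1
        System.out.println(getType(fields, 1)); // 2
        System.out.println(getType(fields, 2)); // 0
    }
}
```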

get(int): returns the value of a given field as an Object; throws ExecException if the field number is out of range.
/**
 * Get the value in a given field.
 * @param fieldNum Number of the field to get the value for.
 * @return value, as an Object.
 * @throws ExecException if the field number is greater than or equal to
 * the number of fields in the tuple.
 */
Object get(int fieldNum) throws ExecException;

getAll(): returns all fields of the tuple, in order, as a list.
/**
 * Get all of the fields in the tuple as a list.
 * @return a list of objects containing the fields of the tuple
 * in order.
 */
List<Object> getAll();

set(int, Object): sets the value of a given field. It does not grow the tuple: if the tuple was created with TupleFactory#newTuple(2), set may be called with field number 1 but not with 2 or greater. Throws ExecException if the field number is out of range.
/**
 * Set the value in a given field.  This should not be called unless
 * the tuple was constructed by {@link TupleFactory#newTuple(int)} with an
 * argument greater than the fieldNum being passed here.  This call will
 * not automatically expand the tuple size.  That is if you called 
 * {@link TupleFactory#newTuple(int)} with a 2, it is okay to call
 * this function with a 1, but not with a 2 or greater.
 * @param fieldNum Number of the field to set the value for.
 * @param val Object to put in the indicated field.
 * @throws ExecException if the field number is greater than or equal to
 * the number of fields in the tuple.
 */
void set(int fieldNum, Object val) throws ExecException;
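The contract of set can be sketched with a hypothetical fixed-size holder (Slots is invented for illustration): the size is fixed at construction, as with newTuple(int), and an out-of-range index fails instead of growing the structure. The Tuple Javadoc declares ExecException for that case; the sketch throws IndexOutOfBoundsException so it runs without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FixedSlotsDemo {
    // Hypothetical tuple-like holder whose size is fixed when it is created.
    static final class Slots {
        private final List<Object> fields;

        Slots(int size) {
            // All slots start out null, like a freshly allocated tuple.
            fields = new ArrayList<>(Collections.nCopies(size, null));
        }

        void set(int i, Object v) {
            if (i < 0 || i >= fields.size()) {
                // Stand-in for Pig's ExecException: set never grows the tuple.
                throw new IndexOutOfBoundsException("field " + i + " >= size " + fields.size());
            }
            fields.set(i, v);
        }

        int size() { return fields.size(); }
    }

    public static void main(String[] args) {
        Slots t = new Slots(2);
        t.set(1, "ok"); // allowed: 1 < 2
        try {
            t.set(2, "too far"); // rejected: 2 is not a valid slot
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // field 2 >= size 2
        }
        System.out.println(t.size()); // 2
    }
}
```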

append(Object): appends a field to the tuple. This can be inefficient because existing data may have to be copied to grow the underlying structure, so when the size is known, prefer TupleFactory#newTuple(int) followed by set(int, Object) over newTuple() plus append.
/**
 * Append a field to a tuple.  This method is not efficient as it may
 * force copying of existing data in order to grow the data structure.
 * Whenever possible you should construct your Tuple with 
 * {@link TupleFactory#newTuple(int)} and then fill in the values with 
 * {@link #set(int, Object)}, rather
 * than construct it with {@link TupleFactory#newTuple()} and append values.
 * @param val Object to append to the tuple.
 */
void append(Object val);
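Both construction styles can be compared using standard collections only, as a rough analogue rather than Pig code: a pre-sized list filled with set, and an empty ArrayList grown with add.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AppendVsSetDemo {
    public static void main(String[] args) {
        Object[] values = {1, "two", 3.0};

        // Recommended style: size known up front, then fill the slots with set().
        // Arrays.asList over a new array gives a fixed-size list: set() works, add() does not.
        List<Object> presized = Arrays.asList(new Object[values.length]);
        for (int i = 0; i < values.length; i++) {
            presized.set(i, values[i]);
        }

        // Discouraged style: start empty and append; the backing array may have
        // to be reallocated and copied as the list grows.
        List<Object> appended = new ArrayList<>();
        for (Object v : values) {
            appended.add(v);
        }

        System.out.println(presized.equals(appended)); // true
        System.out.println(presized);                  // [1, two, 3.0]
    }
}
```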

getMemorySize(): returns an estimate, in bytes, of the tuple's size in memory. Data bags use it to work out their own memory size; it need not be exact, only a decent estimate.
/**
 * Determine the size of tuple in memory.  This is used by data bags
 * to determine their memory size.  This need not be exact, but it
 * should be a decent estimation.
 * @return estimated memory size, in bytes.
 */
long getMemorySize();
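A rough sketch of how a bag could combine per-tuple estimates into its own size. Sized, StringTuple and bagSize are hypothetical, and the byte costs (16 bytes per object header, 8 per reference, 2 per char) are assumed round numbers for illustration, not measured JVM sizes or Pig's actual formula.

```java
import java.util.ArrayList;
import java.util.List;

class MemoryEstimateDemo {
    // Hypothetical stand-in for Tuple#getMemorySize(): an estimate, not an exact count.
    interface Sized {
        long getMemorySize();
    }

    // ASSUMED costs for illustration only: 16 bytes per object header,
    // 8 bytes per reference slot, 2 bytes per char. Real JVM layouts differ.
    static final class StringTuple implements Sized {
        private final List<String> fields;

        StringTuple(List<String> fields) { this.fields = fields; }

        @Override
        public long getMemorySize() {
            long total = 16 + 8L * fields.size(); // the tuple object and its slots
            for (String s : fields) {
                if (s != null) {
                    total += 16 + 2L * s.length(); // each non-null string
                }
            }
            return total;
        }
    }

    // A bag only needs the sum of its tuples' estimates.
    static long bagSize(List<? extends Sized> tuples) {
        long sum = 0;
        for (Sized t : tuples) {
            sum += t.getMemorySize();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<StringTuple> bag = new ArrayList<>();
        bag.add(new StringTuple(List.of("ab", "cde"))); // 32 + 20 + 22 = 74
        bag.add(new StringTuple(List.of("x")));         // 24 + 18 = 42
        System.out.println(bagSize(bag)); // 116
    }
}
```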

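toDelimitedString(String): writes the tuple's values into a string by calling toString on each value and separating them with the given delimiter. The declared ExecException is never thrown; it exists only for backward compatibility.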
/** 
 * Write a tuple of values into a string. The output will be the result
 * of calling toString on each of the values in the tuple.
 * @param delim Delimiter to use in the string.
 * @return A string containing the tuple.
 * @throws ExecException this is never thrown. This only exists for backwards compatibility reasons.
 */
String toDelimitedString(String delim) throws ExecException;

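Note: as an interface, Tuple only declares its methods and provides no implementations. AbstractTuple supplies implementations for some of them; let's look at its source: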

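package org.apache.pig.data;

import java.util.Iterator;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.impl.util.TupleFormat;

import com.google.common.base.Joiner;

public abstract class AbstractTuple implements Tuple {

    @Override
    public Iterator<Object> iterator() {
        // Iterate over the field list returned by getAll()
        return getAll().iterator();
    }

    @Override
    public String toString() {
        return TupleFormat.format(this);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public String toDelimitedString(String delim) throws ExecException {
        // Null fields are written as empty strings
        return Joiner.on(delim).useForNull("").join(this);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public byte getType(int fieldNum) throws ExecException {
        // Infer the type code from the runtime class of the field's value
        return DataType.findType(get(fieldNum));
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public boolean isNull(int fieldNum) throws ExecException {
        return (get(fieldNum) == null);
    }

    @Override
    public boolean equals(Object other) {
        // Equality is defined through compareTo
        return (compareTo(other) == 0);
    }

    @Override
    public void reference(Tuple t) {
        throw new RuntimeException("Tuple#reference(Tuple) is deprecated and should not be used");
    }
}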

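A question comes up here: Tuple does not declare toString, so how can AbstractTuple override it? The reason is as follows.

In Java, only interfaces support multiple inheritance: a class can extend just one class but can implement several interfaces, and an interface can extend several interfaces. Tuple does use this (in Pig's source it extends WritableComparable, Serializable and Iterable<Object>), which is why AbstractTuple can override iterator() and call compareTo(). toString is a different case: it is declared in java.lang.Object, which every class, AbstractTuple included, implicitly extends, so AbstractTuple's toString overrides the version inherited from Object.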
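A small sketch of both points, using hypothetical Row and Pair types (not Pig classes): Row extends two interfaces at once, as Tuple does, which is where iterator() and compareTo() come from, while toString() can be overridden only because java.lang.Object declares it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class InterfaceInheritanceDemo {
    public static void main(String[] args) {
        Pair p = new Pair("k", 1);
        System.out.println(p); // (k,1)
        // Row extends Iterable<Object>, so for-each works; this prints k;1;
        for (Object f : p) {
            System.out.print(f + ";");
        }
        System.out.println();
        System.out.println(p.compareTo(new Pair("k", 1))); // 0
    }
}

// Hypothetical interface that extends two interfaces at once.
interface Row extends Comparable<Row>, Iterable<Object> {
    List<Object> getAll();
}

class Pair implements Row {
    private final List<Object> fields;

    Pair(Object a, Object b) { fields = Arrays.asList(a, b); }

    @Override
    public List<Object> getAll() { return fields; }

    // Required by Iterable<Object>, inherited through Row.
    @Override
    public Iterator<Object> iterator() { return fields.iterator(); }

    // Required by Comparable<Row>; hypothetical ordering by string form.
    @Override
    public int compareTo(Row other) { return toString().compareTo(other.toString()); }

    // Not declared by Row at all: this overrides java.lang.Object#toString.
    @Override
    public String toString() { return "(" + fields.get(0) + "," + fields.get(1) + ")"; }
}
```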

AbstractTuple implements part of the interface; if a subclass does not override one of these methods, AbstractTuple's version is the one that runs.
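The pattern AbstractTuple follows, an abstract class that implements convenience methods in terms of a few primitive ones, can be sketched with hypothetical SimpleTuple, AbstractSimpleTuple and ListSimpleTuple types. This toDelimitedString uses String.join instead of Guava's Joiner, while keeping the same "null becomes an empty string" behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SkeletonDemo {
    public static void main(String[] args) {
        SimpleTuple t = new ListSimpleTuple("a", null, 3);
        System.out.println(t.toDelimitedString(",")); // a,,3
        System.out.println(t.isNull(1));              // true
    }
}

// Hypothetical interface: a small subset of Tuple's methods.
interface SimpleTuple extends Iterable<Object> {
    Object get(int i);
    List<Object> getAll();
    boolean isNull(int i);
    String toDelimitedString(String delim);
}

// Hypothetical skeleton in the spirit of AbstractTuple: the convenience
// methods are written once, in terms of get() and getAll().
abstract class AbstractSimpleTuple implements SimpleTuple {
    @Override
    public Iterator<Object> iterator() { return getAll().iterator(); }

    @Override
    public boolean isNull(int i) { return get(i) == null; }

    // Nulls become empty strings, like Joiner.on(delim).useForNull("").
    @Override
    public String toDelimitedString(String delim) {
        List<String> parts = new ArrayList<>();
        for (Object o : this) {
            parts.add(o == null ? "" : o.toString());
        }
        return String.join(delim, parts);
    }
}

// A concrete subclass only supplies storage plus get() and getAll().
class ListSimpleTuple extends AbstractSimpleTuple {
    private final List<Object> fields;

    ListSimpleTuple(Object... values) { fields = new ArrayList<>(Arrays.asList(values)); }

    @Override
    public Object get(int i) { return fields.get(i); }

    @Override
    public List<Object> getAll() { return fields; }
}
```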

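AbstractTuple depends on several other classes, such as TupleFormat. TupleFormat lives in the impl directory, so I won't analyze it here; its class comment says it provides the default formatting of tuples, which Dump and PigDump use: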

/**
 * Default implementation of format of Tuple. Dump and PigDump use this default
 * implementation
 */
public class TupleFormat

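After going round and round, we suddenly realize that, for all this analysis, we still don't know what a Tuple actually stores! That is because the storage is left to DefaultTuple (the mFields list from the blog excerpt). My guess is that Pig supports several kinds of tuple, so DefaultTuple is only one implementation, and the other implementations, like DefaultTuple, extend AbstractTuple. Personally I find this design particularly cumbersome: it would be simpler to define and implement all the methods at the top level. This is a problem that comes with inheritance; it is not always more convenient than writing the code directly. As for supporting multiple types, that could actually be done with generics (templates, in C++ terms).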
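For the "several implementations behind one interface" point, here is a sketch with a hypothetical Fields interface and two implementations: a general list-backed one, in the role of DefaultTuple, and a specialized two-field one with no backing list. The helper countNulls is written once against the interface and works with both; whether that flexibility justifies the extra classes is exactly the question the post ends on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultipleImplsDemo {
    // Client code depends only on the interface, so any implementation works.
    static int countNulls(Fields t) {
        int n = 0;
        for (int i = 0; i < t.size(); i++) {
            if (t.get(i) == null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Fields general = new ListFields("a", null, "c");
        Fields pair = new PairFields(null, null);
        System.out.println(countNulls(general)); // 1
        System.out.println(countNulls(pair));    // 2
    }
}

// Hypothetical interface playing the role of Tuple.
interface Fields {
    int size();
    Object get(int i);
}

// General-purpose, list-backed implementation (in the role of DefaultTuple).
class ListFields implements Fields {
    private final List<Object> data;

    ListFields(Object... values) { data = new ArrayList<>(Arrays.asList(values)); }

    public int size() { return data.size(); }

    public Object get(int i) { return data.get(i); }
}

// Specialized implementation: exactly two fields stored directly.
class PairFields implements Fields {
    private final Object first;
    private final Object second;

    PairFields(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    public int size() { return 2; }

    public Object get(int i) {
        if (i == 0) return first;
        if (i == 1) return second;
        throw new IndexOutOfBoundsException("index " + i);
    }
}
```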

Finally, a question to leave you with: in this program, is such an inheritance design really necessary?

That's all for this post.

Source: https://blog.csdn.net/Aulic/article/details/120700664
