java - Sequence Files in Hadoop
How are these sequence files generated? I saw a link about sequence files here:
http://wiki.apache.org/hadoop/sequencefile
Are these written using the default Java serializer? And how do I read a sequence file?
Sequence files are generated by MapReduce tasks and can be used as a common format to transfer data between MapReduce jobs.
You can read them in the following manner:
    Configuration config = new Configuration();
    Path path = new Path(PATH_TO_YOUR_FILE);
    SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
    // Create empty key/value holders of the types recorded in the file header.
    WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
    Writable value = (Writable) reader.getValueClass().newInstance();
    // next() refills key and value and returns false at end of file.
    while (reader.next(key, value)) {
        // process key and value here
    }
    reader.close();
You can also generate sequence files using SequenceFile.Writer.
The classes used in the reading example are the following:
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;
These are all contained within hadoop-core.
Maven dependency:
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
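On the serializer part of the question: by default, sequence file keys and values are not written with standard Java serialization (java.io.Serializable). They go through Hadoop's Writable interface, where each type writes its own fields to a DataOutput and reads them back from a DataInput. The sketch below mimics that contract using only the JDK so it runs without Hadoop on the classpath. PageCount and its toBytes/fromBytes helpers are hypothetical names for illustration; a real type would declare "implements org.apache.hadoop.io.Writable".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type that follows the shape of Hadoop's Writable contract.
class PageCount {
    String url = "";
    int hits;

    // Same shape as Writable.write(DataOutput): emit the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(hits);
    }

    // Same shape as Writable.readFields(DataInput): refill this instance,
    // reading the fields in the order they were written.
    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        hits = in.readInt();
    }

    // Illustration helper: serialize one record into a byte array.
    static byte[] toBytes(PageCount record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Illustration helper: rebuild a record from bytes produced by toBytes.
    static PageCount fromBytes(byte[] bytes) {
        PageCount record = new PageCount();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Because only the raw field values are written, the encoded record stays compact, and a reader can reuse a single key object and a single value object across calls to next(), as in the reading loop above.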