A recent requirement: compute user profiles (user portraits).

The system has roughly 8 million users, and some data has to be computed for each of them.

The data volume is large. Computing it in Hive is no problem, but writing the results to Oracle before handing the data to the front end is difficult.

So we settled on a different approach:

1. Compute in Hive and write the results to HDFS.

2. Read them out through an API and write them to HBase (the HDFS and HBase versions do not match, so Sqoop cannot load the data directly).

That raised the next problem.

We need to write an API that reads files on HDFS.

Main class: ReadHDFS

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadHDFS {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        HDFSReadLog.writeLog("start read file");
        // Two arguments are expected; both are passed straight to Constant.init.
        if (args.length > 1) {
            Constant.init(args[0], args[1]);
        }
        HDFSReadLog.writeLog(Constant.PATH);
        try {
            getFile(Constant.URI + Constant.PATH);
        } catch (IOException e) {
            e.printStackTrace();
        }
        long seconds = (System.currentTimeMillis() - start) / 1000;
        HDFSReadLog.writeLog("cost " + seconds + " seconds");
        HDFSReadLog.writeLog("cost " + seconds / 60 + " minutes");
    }

    private static void getFile(String filePath) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(filePath), HDFSConf.getConf());
        Path path = new Path(filePath);
        if (fs.exists(path) && fs.isDirectory(path)) {
            // A table may be stored as several files, so read each entry in turn.
            for (FileStatus file : fs.listStatus(path)) {
                HDFSReadLog.writeLog("start read : " + file.getPath());
                // Take the size from the file status; available() is only an estimate.
                long length = file.getLen();
                if (length == 0) {
                    HDFSReadLog.writeLog("have no data : " + file.getPath());
                    continue;
                }
                if (length > Integer.MAX_VALUE) {
                    HDFSReadLog.writeLog("too large for one buffer : " + file.getPath());
                    continue;
                }
                HDFSReadLog.writeLog("there have : " + length + " bytes");
                // try-with-resources closes the stream even if the read fails.
                try (FSDataInputStream is = fs.open(file.getPath())) {
                    // Note: the whole file is loaded into memory, so a large file can
                    // exhaust the heap. In a local test, reading a file of a little
                    // over 100 MB ran out of memory.
                    byte[] buffer = new byte[(int) length];
                    is.readFully(0, buffer);
                    String result = Bytes.toString(buffer);
                    // Write to HBase.
                    WriteHBase.writeHbase(result);
                    HDFSReadLog.writeLog("read : " + file.getPath() + " end");
                } catch (IOException e) {
                    e.printStackTrace();
                    HDFSReadLog.writeLog("read " + file.getPath() + " error");
                    HDFSReadLog.writeLog(e.getMessage());
                }
            }
            HDFSReadLog.writeLog("Read End");
            fs.close();
        } else {
            HDFSReadLog.writeLog(path + " does not exist or is not a directory");
        }
    }
}
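
The comment in the code about running out of memory on a file of around 100 MB suggests reading the data as a stream instead of into one buffer. Below is a minimal sketch of that idea, assuming the files are newline-delimited UTF-8 text (the post does not say what format Hive wrote). `BatchedLineReader`, `readInBatches` and the batch size are hypothetical names, not part of the original program. The method accepts any `InputStream`, so it can be tried without a cluster; in the real program the stream returned by `fs.open(file.getPath())` would be passed in, and the batch consumer would hand the lines to the HBase writer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedLineReader {

    // Reads UTF-8 text line by line and hands the lines to `sink` in batches of
    // at most `batchSize`, so memory use depends on the batch, not the file size.
    // Returns the total number of lines read. (Hypothetical helper.)
    public static long readInBatches(InputStream in, int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            // Flush whatever is left after the last full batch.
            if (!batch.isEmpty()) {
                sink.accept(batch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for fs.open(file.getPath()); any InputStream works here.
        InputStream in = new ByteArrayInputStream(
            "u1,a\nu2,b\nu3,c\n".getBytes(StandardCharsets.UTF_8));
        long count = readInBatches(in, 2, batch -> System.out.println("batch of " + batch.size()));
        System.out.println(count + " lines");
    }
}
```

The trade-off is that the writer now receives many small batches rather than one large string, so the HBase side would need to accept a list of lines.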

Configuration class: HDFSConf (this turned out not to be needed; once the URL and path are set, the files can be read without extra configuration)

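import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HDFSConf {
    public static Configuration conf = null;

    public static Configuration getConf() {
        if (conf == null) {
            conf = new Configuration();
            String path = Constant.getSysEnv("HADOOP_HOME") + "/etc/hadoop/";
            HDFSReadLog.writeLog("Get hadoop home : " + Constant.getSysEnv("HADOOP_HOME"));
            // Load the cluster configuration files. Each file is wrapped in a Path:
            // addResource(String) looks the name up on the classpath, while
            // addResource(Path) reads the file from the local filesystem.
            conf.addResource(new Path(path + "core-site.xml"));
            conf.addResource(new Path(path + "hdfs-site.xml"));
            conf.addResource(new Path(path + "mapred-site.xml"));
            conf.addResource(new Path(path + "yarn-site.xml"));
        }
        return conf;
    }
}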

Some constants:

url: hdfs://ip:port

path: the path on HDFS
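
The main class simply concatenates the two constants (`Constant.URI + Constant.PATH`), so the URL must carry the `hdfs://` scheme and the path must be absolute. Here is a small sketch of a safer join using only `java.net.URI`; `HdfsLocation`, `join`, the host name and the directory are made-up examples, not values from the original setup.

```java
import java.net.URI;

public class HdfsLocation {

    // Joins a base such as "hdfs://host:port" with an absolute HDFS path.
    // (Hypothetical helper; the original code concatenates the two strings.)
    public static URI join(String base, String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        // Avoid a double slash when the base already ends with "/".
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return URI.create(trimmed + path);
    }

    public static void main(String[] args) {
        // Example values only; substitute the real NameNode address and directory.
        URI uri = join("hdfs://namenode.example:8020", "/user/hive/warehouse/user_portrait");
        System.out.println(uri.getScheme() + " " + uri.getHost() + " "
            + uri.getPort() + " " + uri.getPath());
    }
}
```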

Note: a table may be stored as more than one file, so the code loops over every file in the directory.
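
When looping over a directory like this, it can also contain entries that are not data, such as a `_SUCCESS` marker. Hadoop's own input formats skip names beginning with `_` or `.`, and the sketch below applies the same convention to plain file names. `DataFileFilter` and its methods are hypothetical, and the sample names are illustrative; in the real loop the name would presumably come from `file.getPath().getName()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataFileFilter {

    // True for names that look like data files. Names starting with "_" or "."
    // (for example _SUCCESS) are treated as hidden, following Hadoop's convention.
    public static boolean isDataFile(String name) {
        return !name.isEmpty() && !name.startsWith("_") && !name.startsWith(".");
    }

    // Returns the subset of `names` that pass isDataFile, in the original order.
    public static List<String> keepDataFiles(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (isDataFile(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative directory listing, not taken from a real cluster.
        List<String> listing = Arrays.asList("000000_0", "_SUCCESS", ".staging", "000001_0");
        System.out.println(keepDataFiles(listing));
    }
}
```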

See the next article for writing the data to HBase.
