Fetch HTML part in java

I am having trouble understanding how to download only part of an HTML page. I tried the traditional approach, using the URL::openStream method with a BufferedReader, but I am not sure whether this forces me to download the whole page.
The problem is this: I have a fairly large HTML page and need to parse two numbers from it, which update at least once a second. The approach above detects changes only once every 2-3 seconds, and I wonder whether there is a way to make it faster. So I thought fetching only part of the page might help.

java html inputstreamreader

asked Nov 20 at 10:20
Vlad Doronin

  • Perhaps you can try Jsoup?
    – manfromnowhere
    Nov 20 at 10:31

  • It builds a DOM from the whole page. It is quite fast, but not fast enough.
    – Vlad Doronin
    Nov 20 at 10:41

2 Answers

accepted

I think you should look at how the page fetches its data (SSE or WebSocket) and try to subscribe to that service directly. If that is impossible, try a more efficient XML parser. I recommend VTD-XML (https://vtd-xml.sourceforge.io/); it can be ~10x faster than the DOM parser that comes with the JDK.
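
If the numbers do arrive over SSE (an assumption; whether this page exposes such a stream has to be checked in the browser's network tab), subscribing mostly means reading a `text/event-stream` body and collecting its `data:` lines. Below is a minimal sketch of that parsing step only. The class name `SseDataReader`, the method `readEvents`, and the sample payloads are hypothetical, and the `event:`, `id:` and `retry:` fields are simply ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

final class SseDataReader {

    /** Reads an event-stream body and hands each event's data payload to onEvent. */
    static void readEvents(Reader body, Consumer<String> onEvent) throws IOException {
        BufferedReader in = new BufferedReader(body);
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                // A blank line ends the current event; dispatch it if it carried data
                if (hasData) {
                    onEvent.accept(data.toString());
                }
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // one optional space after the colon
                }
                if (hasData) {
                    data.append('\n'); // several data lines are joined with newlines
                }
                data.append(value);
                hasData = true;
            }
            // Lines starting with ':' are comments; other fields are skipped in this sketch
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up stream content, standing in for a real HTTP response body
        String stream = "data: 101.5\n\n: keep-alive\n\ndata: 101.7\n\n";
        readEvents(new StringReader(stream), System.out::println);
    }
}
```

Against a real endpoint, the `Reader` would wrap the response stream, for example `new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)`, with the request sent using an `Accept: text/event-stream` header.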



Also be careful with BufferedReader.readLine(), as there is a hidden allocation cost for the strings that you don't really need (this is fairly advanced territory, where you have to think about CPU memory bandwidth, L1 cache misses, etc.).
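
One way to avoid the per-line strings is to scan the character stream with a reusable buffer and stop as soon as the value has been read. The sketch below is only an illustration of that idea, not the approach used in either answer: `MarkerScanner`, `digitsAfter`, and the `id="price">` marker are hypothetical names, and it assumes the number is a plain run of ASCII digits directly after a known marker. The marker search uses a Knuth-Morris-Pratt failure table so that a match spanning two buffer fills is still found.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class MarkerScanner {

    // KMP failure table: fail[i] is the length of the longest proper prefix
    // of marker[0..i] that is also a suffix of marker[0..i].
    private static int[] failureTable(String marker) {
        int[] fail = new int[marker.length()];
        int k = 0;
        for (int i = 1; i < marker.length(); i++) {
            while (k > 0 && marker.charAt(i) != marker.charAt(k)) {
                k = fail[k - 1];
            }
            if (marker.charAt(i) == marker.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Returns the digits following the first occurrence of marker, or null if it never occurs. */
    static String digitsAfter(Reader in, String marker) throws IOException {
        if (marker.isEmpty()) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int[] fail = failureTable(marker);
        char[] buf = new char[8192]; // reused for every read, no per-line strings
        int matched = 0;             // number of marker characters matched so far
        StringBuilder digits = null; // non-null once the marker has been seen
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (digits != null) {
                    if (c < '0' || c > '9') {
                        return digits.toString(); // stop early, the rest is never read
                    }
                    digits.append(c);
                    continue;
                }
                while (matched > 0 && c != marker.charAt(matched)) {
                    matched = fail[matched - 1];
                }
                if (c == marker.charAt(matched)) {
                    matched++;
                }
                if (matched == marker.length()) {
                    digits = new StringBuilder();
                }
            }
        }
        return digits == null ? null : digits.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><span id=\"price\">1234</span></body></html>";
        System.out.println(digitsAfter(new StringReader(page), "id=\"price\">")); // prints 1234
    }
}
```

Returning as soon as the digits end means the caller can close the stream without consuming the remainder of the page. How much of the response the server has already sent by then is outside the client's control.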



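Example using the library I mentioned:

    // Assumes imports of com.ximpleware.VTDGen, VTDNav and AutoPilot, plus
    // java.nio.charset.StandardCharsets. readAllBytesFromTheURL() is a placeholder.
    byte[] pageInBytes = readAllBytesFromTheURL();
    VTDGen vg = new VTDGen();
    vg.setDoc(pageInBytes);
    vg.parse(false); // false: parse without namespace awareness
    VTDNav vn = vg.getNav();

    AutoPilot ap = new AutoPilot(vn);

    // Jump to the section that we want to process
    ap.selectXPath("/html/body/div");
    if (ap.evalXPath() != -1) {
        // getElementFragment() packs the length (upper 32 bits) and the offset (lower 32 bits)
        long fragment = vn.getElementFragment();
        // Assumes a UTF-8 page, so the offset indexes pageInBytes directly
        String fileId = new String(pageInBytes, (int) fragment, (int) (fragment >> 32), StandardCharsets.UTF_8);
    }
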
edited Nov 20 at 11:22
answered Nov 20 at 11:14
piotr szybicki

  • Thanks a lot! By the way, the page uses Lightstreamer to fetch data from their servers. I tried to use it directly, which obviously was not successful.
    – Vlad Doronin
    Nov 20 at 11:35

  • Cool, can you accept my answer? I'm trolling for points on Stack Overflow :)
    – piotr szybicki
    Nov 20 at 12:50

  • Yeah, sure. But VTD didn't work for me. The page has some tokens that VTD cannot parse, so now I'm writing a custom reader. But I tried it on another XML file and it is really fast.
    – Vlad Doronin
    Nov 20 at 12:57

  • Can you share your solution when you are done? I'm curious to see what you come up with.
    – piotr szybicki
    Nov 20 at 14:18

  • Posted my code in the next answer.
    – Vlad Doronin
    Nov 21 at 12:06

I wrote a helper to read the URL content. The parser for the elements is in another class.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class HTMLReaderHelper {

        private static final int NOT_READ = -2; // no lookahead character has been read yet
        private static final int END = -1;      // end of stream, as returned by Reader.read()

        private final URL currentURL;

        HTMLReaderHelper(URL url) {
            currentURL = url;
        }

        // Returns null if the connection cannot be opened
        public CharIterator charIterator() {
            try {
                return new CharIterator();
            } catch (IOException ex) {
                return null;
            }
        }

        public StringIterator stringIterator() {
            return new StringIterator();
        }

        class CharIterator implements Iterator<Character> {

            private final Reader reader;
            private int lookahead = NOT_READ;

            private CharIterator() throws IOException {
                // Decode through a Reader instead of casting raw bytes to char (assumes UTF-8)
                reader = new BufferedReader(
                        new InputStreamReader(currentURL.openStream(), StandardCharsets.UTF_8));
            }

            @Override
            public boolean hasNext() {
                if (lookahead == NOT_READ) {
                    try {
                        lookahead = reader.read();
                        if (lookahead == END) {
                            reader.close();
                        }
                    } catch (IOException ex) {
                        lookahead = END; // treat a failed read as end of input
                    }
                }
                return lookahead != END;
            }

            @Override
            public Character next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                char c = (char) lookahead;
                lookahead = NOT_READ;
                return c;
            }
        }

        class StringIterator implements Iterator<String> {

            private final CharIterator chars = charIterator();
            private String nextLine; // line read ahead by hasNext(), or null

            private StringIterator() {
            }

            @Override
            public boolean hasNext() {
                if (nextLine == null && chars != null) {
                    nextLine = readLine();
                }
                return nextLine != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = nextLine;
                nextLine = null;
                return line;
            }

            // Skips line terminators, then collects characters up to the next
            // terminator or the end of the stream; returns null when nothing is left.
            private String readLine() {
                StringBuilder sb = new StringBuilder();
                while (chars.hasNext()) {
                    char c = chars.next();
                    boolean terminator = c == '\n' || c == '\r';
                    if (!terminator) {
                        sb.append(c);
                    } else if (sb.length() > 0) {
                        break;
                    }
                }
                return sb.length() > 0 ? sb.toString() : null;
            }
        }
    }

answered Nov 21 at 12:05
Vlad Doronin