Kafka: reading from a topic log file



























I have a topic log file and the corresponding .index file. I would like to read the messages in a streaming fashion and process them. How and where should I start?




  1. Should I load these files into a Kafka producer and read them back from the topic?

  2. Can I write a consumer that reads the data directly from the file and processes it?


I have gone through the Kafka website, and everywhere the examples use pre-built Kafka producers and consumers, so I couldn't get enough guidance.



I want to read in a streaming fashion in Java.



The text looks encrypted, so I am not posting the input files.



Any help is really appreciated.










Tags: apache-kafka apache-kafka-streams






asked Nov 24 '18 at 18:17









Thomson














  • I cannot follow. What exactly do you mean by "I have a topic log file"?

    – Matthias J. Sax
    Nov 24 '18 at 21:52











  • It's not encrypted. It's serialized in raw bytes, but standard consumers are deserializing that... Otherwise, what you want is the CLI tool to dump log segments, but it's not clear why you want these raw files.

    – cricket_007
    Nov 24 '18 at 22:19













  • @matthias-j-sax There are two files in a folder; they are named 0000000000.index and 000000000.log. I want to read the files and do aggregate operations.

    – Thomson
    Nov 24 '18 at 23:36











  • You would need to build a deserializer that understands the internal format used by Kafka. Overall, this seems to be a very special request: those files are not designed to be consumed by any other application. As @cricket_007 pointed out, there is a dump log segments tool; this should help you read the files if you have the correct deserializers for keys and values.

    – Matthias J. Sax
    Nov 25 '18 at 4:22



















1 Answer














You can dump the log segments with the kafka.tools.DumpLogSegments CLI tool and use its deep-iteration option to deserialize the data into something more readable.



If you want to "stream it", pipe the tool's output to some other program through a standard Unix pipe.
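The question also asks about opening the .log file directly from Java. That is technically possible with the FileRecords class that ships in the kafka-clients jar, but treat the following as a sketch under assumptions: FileRecords is an internal, unsupported API whose exact methods can differ between Kafka versions, the segment path is a placeholder, and the UTF-8 decoding only makes sense if the producer actually wrote text. As the comments point out, you really need whatever deserializer matches how the data was produced.

    import java.io.File;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import org.apache.kafka.common.record.FileRecords;
    import org.apache.kafka.common.record.Record;

    // Sketch only: FileRecords is an internal Kafka class, not a supported public API.
    public class SegmentFileReader {

        public static void main(String[] args) throws Exception {
            // Placeholder path: point this at your own segment file.
            FileRecords segment = FileRecords.open(new File("/path/to/00000000000000000000.log"));
            try {
                // Iterate over every record stored in the segment.
                for (Record record : segment.records()) {
                    String key = utf8(record.key());
                    String value = utf8(record.value());
                    System.out.printf("offset=%d key=%s value=%s%n", record.offset(), key, value);
                }
            } finally {
                segment.close();
            }
        }

        // Decode a (possibly null) ByteBuffer as UTF-8 text; swap in the real
        // deserializer for your data instead of assuming text.
        private static String utf8(ByteBuffer buffer) {
            if (buffer == null) {
                return null;
            }
            byte[] bytes = new byte[buffer.remaining()];
            buffer.duplicate().get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

Note that this only ever sees the one partition whose files you have on disk, which is exactly the limitation the rest of this answer is getting at.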




do aggregate operations




Then use Kafka Streams to actually read from the topic across all partitions, rather than only the single partition stored on that one broker.
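For the "aggregate operations" part, a minimal Kafka Streams sketch could look like the one below. The topic name ("my-topic"), the broker address, and the String serdes are assumptions; substitute whatever matches how the data was produced.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    // Minimal sketch, assuming String keys/values and a topic named "my-topic".
    public class AggregateFromTopic {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "segment-aggregation-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Example aggregation: count records per key across all partitions of the topic.
            KStream<String, String> input = builder.stream("my-topic");
            KTable<String, Long> counts = input.groupByKey().count();
            counts.toStream().foreach((key, count) ->
                    System.out.printf("key=%s count=%d%n", key, count));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Reading from the topic this way lets Kafka handle partition assignment, deserialization, and state for you, which is why it is preferable to working with one broker's raw segment files.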






        edited Nov 25 '18 at 5:01

























        answered Nov 24 '18 at 22:23









cricket_007
