PDF Parser in C | Extracting Text

  • Published: 19 Nov 2024

Comments • 29

  • @acestandard6315
    @acestandard6315 1 month ago +1

    I am always so amazed by how Alex and Tsoding make programming look so easy.
    They aren't trying to use every complex feature the language has but just what can get the job done.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago +1

      Thank you! Really flattered to be compared to Tsoding haha 😆

  • @BogdanTheGeek
    @BogdanTheGeek 1 month ago +15

    Tsoding 2.0

  • @luserdroog
    @luserdroog 1 month ago

    The objects don't necessarily have to start immediately after the header lines. Since objects are all located by file offsets in the xref table at the end, you could hide data between lines 3 and 4 (adjusting the xrefs of course) and most software should ignore it.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago

      that is interesting, thanks for sharing :)
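A minimal sketch of the xref point @luserdroog raises above, in C to match the video. It assumes a classic (uncompressed) xref table with a single subsection and the whole file already loaded into memory; `pdf_startxref_offset` and `pdf_object_offset` are hypothetical names for illustration, not functions from the video or from ds.h. Because objects are found only through the stored offsets, any bytes that no entry points at are simply never visited.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static void skip_ws(const char **pp, const char *end)
{
    while (*pp < end && (**pp == ' ' || **pp == '\t' || **pp == '\r' || **pp == '\n'))
        (*pp)++;
}

/* Parse an unsigned decimal number, or return -1 if none is present.
   Overflow is not checked in this sketch. */
static long parse_uint(const char **pp, const char *end)
{
    skip_ws(pp, end);
    if (*pp == end || **pp < '0' || **pp > '9') return -1;
    long v = 0;
    while (*pp < end && **pp >= '0' && **pp <= '9') {
        v = v * 10 + (**pp - '0');
        (*pp)++;
    }
    return v;
}

/* Return the offset written after the last "startxref" keyword, or -1. */
long pdf_startxref_offset(const char *buf, size_t len)
{
    assert(buf != NULL);
    const char *kw = "startxref";
    size_t n = strlen(kw);
    if (len < n) return -1;
    for (size_t i = len - n + 1; i-- > 0;) {      /* scan backwards from the end */
        if (memcmp(buf + i, kw, n) == 0) {
            const char *p = buf + i + n;
            return parse_uint(&p, buf + len);
        }
    }
    return -1;
}

/* Return the file offset of object `objnum` from the table at `xref`, or -1.
   Assumes one subsection; each entry is exactly 20 bytes:
   "oooooooooo ggggg n" plus a two-byte end-of-line. */
long pdf_object_offset(const char *buf, size_t len, long xref, long objnum)
{
    assert(buf != NULL);
    const char *end = buf + len;
    if (xref < 0 || (size_t)xref + 4 > len) return -1;
    if (memcmp(buf + xref, "xref", 4) != 0) return -1;
    const char *p = buf + xref + 4;
    long first = parse_uint(&p, end);
    long count = parse_uint(&p, end);
    if (first < 0 || count < 0) return -1;
    if (objnum < first || objnum >= first + count) return -1;
    skip_ws(&p, end);
    size_t index = (size_t)(objnum - first);
    if ((size_t)(end - p) < (index + 1) * 20) return -1;
    const char *entry = p + index * 20;
    if (entry[17] != 'n') return -1;              /* 'f' marks a free entry */
    const char *q = entry;
    return parse_uint(&q, entry + 10);
}
```

A caller would read the file into a buffer, call `pdf_startxref_offset` to find the table, and then `pdf_object_offset` for each object it needs. Files that use xref streams or several subsections need more work than this.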

  • @hi_arav
    @hi_arav 1 month ago +1

    This is great. +1 sub and looking forward to more

  • @korigamik
    @korigamik 1 month ago

    This is good. I wonder why PDF readers don't allow this kind of functionality? Maybe the big corporations don't want you to download their images, bruh.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago +1

      I feel like it is much more complicated than I made it look, so probably they are just lazy or want you to buy all their tools

  • @YabseraPython
    @YabseraPython 1 month ago +1

    I am trying it in pure Go, with no libraries.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago

      Good luck, that sounds like a fair challenge :D

    • @YabseraPython
      @YabseraPython 1 month ago +1

      @AlexTheRealDev I managed to get the stream data out and FlateDecoded it. The next part is to read the friendly manual from Adobe, which is 1280 pages long. I am stuck on that part right now.
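For readers following along in C, here is a rough sketch of the "get the stream data out" step mentioned above. It assumes the bytes of one object are already in memory and locates the data between the `stream` and `endstream` keywords; `pdf_stream_data` is a hypothetical helper name. This is a simplification: raw data could itself contain the bytes `endstream`, or legitimately end in a newline, which is why a full parser would trust the stream dictionary's `/Length` instead. For a `/FlateDecode` stream the returned bytes are zlib-compressed and still need inflating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded search that tolerates NUL bytes in binary stream data. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n) return NULL;
    for (size_t i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0) return hay + i;
    }
    return NULL;
}

/* Find the raw bytes of the first stream in obj[0..len).
   Returns a pointer to the first data byte and stores the length in *out_len,
   or returns NULL if no well-formed stream is found. */
const unsigned char *pdf_stream_data(const unsigned char *obj, size_t len,
                                     size_t *out_len)
{
    assert(obj != NULL && out_len != NULL);
    const unsigned char *end = obj + len;
    const unsigned char *p = find_bytes(obj, len, "stream");
    if (p == NULL) return NULL;
    p += strlen("stream");

    /* The keyword must be followed by CRLF or a lone LF. */
    if (p < end && *p == '\r') p++;
    if (p == end || *p != '\n') return NULL;
    p++;

    const unsigned char *q = find_bytes(p, (size_t)(end - p), "endstream");
    if (q == NULL) return NULL;

    /* Drop the end-of-line marker that precedes "endstream", if present. */
    if (q > p && q[-1] == '\n') q--;
    if (q > p && q[-1] == '\r') q--;

    *out_len = (size_t)(q - p);
    return p;
}
```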

  • @AK-vx4dy
    @AK-vx4dy 1 month ago

    Without any library?
    Brave... I did text extraction using a library which broke the PDF into all its logical parts, and it was still hard because of character maps.
    Beware that a PDF can be constructed in many ways, so your parser will probably fail on many of them.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago +1

      Thanks for the heads up. I expect it to fail a lot, but will try to fix issues as they show up

    • @AK-vx4dy
      @AK-vx4dy 1 month ago

      @AlexTheRealDev I believe you can. I only wanted to say that it will be a long journey. Good job, I'm subscribing for the next journeys.

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago

      I appreciate it

  • @acestandard6315
    @acestandard6315 1 month ago

    Hey, did you write ds.h yourself?

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago +2

      I did. There are some older videos where I did that from scratch, but it kind of evolved over time

    • @acestandard6315
      @acestandard6315 1 month ago

      @AlexTheRealDev You have really inspired me.
      I have been going through for some time now and I am learning a lot.
      Could you please get me a link to the old videos, or at least a title, as I am having difficulty finding them?
      My goal is to build a game engine.
      Also, please don't stop making videos

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago +1

      @acestandard6315 should be the ones titled “data structures”, but check out the GitHub, you will find ds.h there: alexjercan/ds.h

    • @acestandard6315
      @acestandard6315 1 month ago

      @AlexTheRealDev Thanks, found the video

    • @AlexTheRealDev
      @AlexTheRealDev 1 month ago

      that is great :)

  • @joa-p2m
    @joa-p2m 1 month ago

    C

  • @theemacsen1518
    @theemacsen1518 1 month ago

    Tsoding with nvim xd