Types of PDF - Computerphile

  • Published: 11 Sep 2024
  • "Just send me a PDF!" - but what kind of PDF? As Professor Brailsford explains, PDF is simply a wrapper which can contain a variety of joys!
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottsco...
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 396

  • @isaac10231 3 years ago +799

    Life goal - finding something to be as passionate in life as this man is about crispy text.

    • @skuzzbunny 3 years ago +15

      crispy text is the best!!!!!D

    • @unlokia 3 years ago +21

      CRISP, *_not_* "crispy". This is a silly error that seems to be propagating net-wide +as usual we can blame the yanks!!+
      A brand of creme donuts' products are named "crispy", images and text are *CRISP!!*

    • @CJT3X 3 years ago +8

      @@unlokia no need to be so crispy ‘bout it

    • @DryPaperHammerBro 3 years ago +1

      @@skuzzbunny {o{obi,l. K.l k I 98xd

    • @kokoinmars 3 years ago

      Crispy text is nothing to scoff about.

  • @23Scadu 3 years ago +661

    What PDF says to me isn't quality, but uniformity, as in it'll look the same no matter what device or software you're using to view it, even if it's a sheet of paper instead of a screen. (I know this isn't actually the case, but as I understand it, it's how it _should_ work.) So when I get a PDF, I trust that each line and character is exactly where it's supposed to be, and not shifted due to text reflow or different fonts or whatever. From that perspective it doesn't matter if it's using razor sharp vectors or blocky bitmaps.

    • @max15half 3 years ago +54

      Well, you could be reasonably sure that a bitmap will not misplace your lines and characters.

    • @23Scadu 3 years ago +18

      @@max15half Sure, but there are other qualities of bitmaps that make them less than ideal for text. PDF has the same advantages as other document formats while feeling more trustworthy than, say, a .doc or a .html, even if they're not always used to the fullest.

    • @Platoqp 3 years ago +6

      I think that is how it started too. That said, if a professor asks for a PDF, it is a decent indication that layout matters.

    • @JMNTY 3 years ago +11

      @@max15half But how are those bitmaps viewed by the receiver?
      The images are numbered, but the receiver's image viewer opens them in alphabetical, size, or date order (whatever its default is).
      The images vary in size, and the viewer scales them in stupid ways.
      PDF is still a good system even if the content is just bitmaps: it keeps them all at the correct scale and in order.

    • @ccreutzig 3 years ago +8

      @@hammerhals These days, not everything in PDF is "statically linked." Many PDF viewers, including Acrobat, have a JavaScript engine, and for the modern type of PDF forms, where you may be able to add table rows etc., you kind of need that.
      That in turn means some people embed code in their PDF to, say, render animations etc.

  • @martinbean 3 years ago +457

    Imagine saying something as innocuous as “I’ll send you a PDF” to this guy and then getting a 2-hour lecture in response…

    • @FriedEgg101 3 years ago +20

      Maybe you could cut the lecture short by following up with "it'll be PDF Normal".

    • @erwinmulder1338 3 years ago +17

      Professor Brailsford can lecture me all day.

    • @michaeldamolsen 3 years ago +7

      That would be the best day of the month for sure!

    • @swiftfox3461 3 years ago +4

      I'd listen closely and turn off my phone to make sure I didn't miss anything.

    • @amicaaranearum 3 years ago +6

      Professor Brailsford definitely made this video in response to receiving a low-quality PDF scanned from a photocopy.

  • @sedawk 3 years ago +282

    “I asked someone to send me a PDF and all I got was this lousy bitmap” - would make a great t-shirt.

    • @SomethingUnreal 3 years ago +30

      Complete with blocky JPEG artifacts all around the text, of course!

    • @frankharr9466 3 years ago +5

      Don't tempt me.

    • @naughtiusmaximus789 2 years ago

      Grand Theft Auto : Vice City 100% completion reward

  • @greatquux 3 years ago +181

    Brailsford’s eyesight is better than mine, he can use xterm at the default font size!

  • @StevenSeiller 3 years ago +87

    🤓me before video: "Finally time to learn the differences between PDF/X, PDF/E, and PDF/A!"
    🤷‍♂️me after video: "Where is PDF(FTG), PDF(I), or PDF(I+HT) in my Adobe Save As...???"

  • @thuokagiri5550 3 years ago +88

    How much we've missed Prof Brailsford!

  • @ToSMaster12345 3 years ago +49

    I was smiling in total bliss throughout the video! Finally I feel understood!
    This is the reason why I write all my documents in LaTeX and use vector images for figures that have embedded text! So that even the scalebar and axis labels in my plots can be selected or searched via text!
    Reject Bitmap! Embrace PDF-FTG! :D

    • @carlosmspk 1 year ago +3

      I mean, anyone with an academic background would understand you

  • @mikefochtman7164 3 years ago +15

    Reminded of a similar issue we had with old mechanical, piping, and electrical drawings, the kind that were literally 'blueprints'. They had been photographed onto microfiche and the originals worn out or lost. We took the microfiche cards and had them scanned (causing even more loss of quality).
    Then a team of graphic artists would import the scanned image as 'background' into a modern drafting tool and literally 'trace' over each marking on the original. This basically re-drew the drawings using the scanned background image as the template. The final step was to 'hide' the background and voila! A modern, vector drawing that was searchable and could be manipulated with modern tools. If anyone suspected a mistake in the redrawing, we would 'unhide' the background to look at the scanned image, or even go back to the microfiche (we kept a 30-year-old viewer on hand).
    I forget how much that cost, but it was about 3 graphic artists working over a year to do several hundred drawings. :(

  • @1337Unlucky 3 years ago +65

    He clearly has strong views on PDFs. It's funny because it reminds me of myself explaining formats for photography and how to preserve quality. God, I hate it when people send photos via social media without using .zip or .rar and all the photos get ultra-compressed.
    It's not only about photos and not only about PDFs, I understand the man, it's about PRESERVATION. The world needs to understand better formats and ways to preserve content. I just love this man.

    • @ZaneDaMagicPufferDragon 3 years ago +3

      💯 Preservation!!! I’m a Preservationist At Heart ❤️😉

    • @LordMegatherium 3 years ago +6

      If it's about preservation, then RAR should be out of the picture because it's a closed format. It's unlikely that we won't be able to open RAR files in 50+ years, especially since we have a libre decompression implementation, but the point still stands.

    • @Entertainment- 2 years ago

      That's why I love Telegram: it does the compression too, but it also allows you to send pictures, or any file for that matter, at its original size

  • @mastertacosmith 3 years ago +85

    This man needs a 40” ultrawide so he can truly enjoy a good typeface at scale

  • @IIARROWS 3 years ago +245

    I got something worse: an Excel sheet with a picture pasted inside it.
    And not a picture of a table, a screenshot of the application I was working on.

    • @olik136 3 years ago +16

      my architectural software has a library folder with a drawing file that contains a screenshot of that library folder telling you that certain files are hidden and can only be found with windows explorer...

    • @recklessroges 3 years ago +2

      I'll send you a screen-shot of that in an HTML email ;-) /s

    • @david.mcmahan 3 years ago +12

      I once had a client take a screenshot of their full desktop (with an opened PDF among many windows), paste it into a Word doc., crop it down to just a signature graphic, and then scale it back up because the signature was too small. This was their method of "extracting" the signature image from a PDF.
      Fair enough, but it was because they wanted the version of the signature we had already cleaned up to look better in print.

    • @JNCressey 3 years ago +2

      @@david.mcmahan, can whoever they give the Word document to tell Word to show the full image to see everything they had open in the screenshot?

    • @david.mcmahan 3 years ago +5

      @@JNCressey Yes, I could see everything they had opened on the screen. There was nothing bad, but it could have been a security incident.

  • @nikolayrayanov2895 3 years ago +9

    This is gold. I've tried to explain to people at work about different types of PDFs for years.

  • @noferblatz 3 years ago +5

    This professor is positively the best guest you feature. His enthusiasm and his ability to explain complex technical concepts in a simple way are unmatched.

  • @drskelebone 3 years ago +8

    I'm in a completely different field, and when the Professor states "if you want a straight line, you just say Line()" he is 100% talking to my soul and speaking the truth I have wanted to shout into so many faces.
    ty!

  • @jlivewell 3 years ago +17

    Every time I watch a video by Dr. Brailsford, PhD, I add a new life regret: that I didn’t meet him when I was 17 and learn everything from him.

    • @jackkraken3888 2 years ago

      With someone like him you can never learn everything.

  • @balmar3 3 years ago +10

    Yesss! Professor is using Alpine, one of the best emailers out there. You should make some videos on the awesome power of terminal-based utilities.

  • @m47h4r 2 years ago +1

    This was a joy to watch! I respect people like him very much. Being genuinely interested in something and actually putting the time in to learn about its ins and outs. Never mind the fact that he uses Linux with a bunch of open terminals, that's just the cherry on top!

  • @Sam-th4jl 3 years ago +1

    I think I could listen to him talk about literally anything and find it interesting just because of his delivery

  • @harshjinger 3 years ago +7

    Thanks... I rely on open-source information to learn about computer-based things that happened even before I was born.
    Recently, I was looking into this exact question for a project of my own, and this is a perfect resource.
    I have never used Adobe's official software; as a novice undergrad student who is also broke, I find this a great reference.
    Thanks a lot again...

  • @YingwuUsagiri 3 years ago +16

    As someone in an administrative job when someone says send me a PDF they mean "any quality yet not easily edited". Invoices for example are never allowed to be easily editable like Word or Excel (and yes that happens often enough). If they want infinitely scalable they'll ask for a Vector and if they want something that's super sharp made in InDesign etc. they'll ask for an INDD. In my almost decade of working in administrations PDF just means can't be edited (easily, because I am very well aware that you still can somehow).

    • @Starguy256 3 years ago +1

      I edit PDFs every day in my work. Sometimes our software prints the wrong thing and instead of going in and trying to fix it, just edit it on the PDF before you send it. As long as it's FTG (as anything not produced by a photocopier should be) you just hit "Edit PDF" in Acrobat.

    • @lawrencedoliveiro9104 2 years ago

      The irony is that using vector graphics and actual text objects makes it easier to edit the PDF file. The hardest type to edit is the one where every page is a bitmap.

  • @RhinoBlindado 3 years ago +3

    Prof B looking quite dapper today. Loved the video!

  • @kasamikona 3 years ago +3

    Prof Brailsford you're a very brave man pronouncing PNG as "ping" around these parts...

  • @TheAstronomyDude 3 years ago +31

    How does post office OCR work? Sorting centers read the address off an envelope in a fraction of a second and they've been doing it for decades; long before Adobe.

    • @666Tomato666 3 years ago +32

      fundamentally the same technology, but they have the benefit that the address is highly redundant; can't read the full postcode? check the city and street name

    • @bluedeath996 3 years ago +15

      Combined with a very standardised way to format addresses. There is also a "lost letter" centre where a person decodes things the OCR can't read, but newer tech is better at the job.

    • @the_lenny1 3 years ago +2

      @@666Tomato666 yeah, and on top of that the most important information is the postcode, which is only numbers.

  • @JNCressey 3 years ago +19

    Some interesting weird things I've encountered with PDFs:
    1. Some time last year I copied a JPEG out of a PDF container and found it had a slightly different format from regular JPEGs. I think normal JPEGs have the word "JFIF" at the beginning of the file, but this one had something else, maybe "ADOBE", though I don't exactly remember; it could have been a different word.
    2. Just today I found out there are two options for saving a PDF from Microsoft Edge: "Save as PDF" vs "Microsoft Print to PDF". "Microsoft Print to PDF" produced a file that was significantly larger and slower to load when viewing.
    3. Some PDFs I've seen allow you to search and select text, but don't let you copy or print. I think it's called a "secured PDF". I'm not sure why PDF viewers from companies other than Adobe would respect those restrictions. Is there something in the file that fundamentally makes these actions impossible, or does it just ask the program to disallow them?
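On point 1: a JPEG file starts with an SOI marker (bytes FF D8) followed by marker segments. A JFIF file normally carries an APP0 segment whose payload begins "JFIF", while JPEGs written by Adobe software often carry an APP14 segment beginning "Adobe", which would fit what the comment describes. The Python sketch below lists the APPn segments that appear before the image data. It assumes a well-formed file and skips edge cases such as fill bytes between markers; the function name `app_segments` and the two hand-made sample headers are illustrative only.

```python
def app_segments(data: bytes) -> list[tuple[str, bytes]]:
    """List (segment name, identifier) for every APPn segment before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: entropy-coded image data follows, so stop here
            break
        # Big-endian segment length; it counts its own two bytes but not the marker
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if 0xE0 <= marker <= 0xEF:
            payload = data[pos + 4:pos + 2 + length]
            # Identifier = bytes up to the first NUL ("JFIF", "Exif", "Adobe", ...)
            found.append((f"APP{marker - 0xE0}", payload.split(b"\x00", 1)[0]))
        pos += 2 + length
    return found


# Hand-made headers: an APP0/JFIF segment and an APP14/Adobe segment, each followed by SOS
jfif_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + bytes(9) + b"\xff\xda\x00\x08"
adobe_header = b"\xff\xd8\xff\xee\x00\x0eAdobe\x00\x64" + bytes(5) + b"\xff\xda\x00\x08"
print(app_segments(jfif_header))   # [('APP0', b'JFIF')]
print(app_segments(adobe_header))  # [('APP14', b'Adobe')]
```

Running it on the first few kilobytes of a real file (read with `open(path, "rb").read()`) should show which identifiers that file carries; a JPEG may contain several APPn segments at once.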

    • @neumdeneuer1890 3 years ago +12

      Response to point 3:
      Yes, the PDF just asks nicely not to allow copying. There are no technical restrictions, and more than enough programs ignore such requests.
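To illustrate the reply: with the standard security handler, the restrictions are stored in the /P entry of the encryption dictionary, a signed 32-bit integer whose bits (numbered from 1 at the low-order end) grant or withhold individual permissions. When the user password is empty, any reader can decrypt the content, so whether /P is honored is up to the viewer. The sketch below decodes the permission bits that ISO 32000 defines for security handler revision 3 and later; the function name and the example values are illustrative, not taken from the video.

```python
# Permission bits of the /P entry (bit 1 = lowest-order bit), as listed in ISO 32000
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy/extract text and graphics",
    6: "add annotations, fill forms",
    9: "fill existing form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Report which permissions a /P value grants; viewers are merely asked to obey it."""
    p &= 0xFFFFFFFF  # /P is written as a signed integer; reinterpret it as 32 unsigned bits
    return {name: bool((p >> (bit - 1)) & 1) for bit, name in PERMISSION_BITS.items()}


# -3904 clears every permission bit; OR-ing in bit 3 (value 4) re-enables printing only
print(decode_permissions(-3904 | 4))
```

Nothing in these bits changes how the page content is stored: a viewer that skips the check can still render, copy, or print it.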

    • @hanelyp1 3 years ago +1

      And a fair selection of the software you could use to read the open format PDF is open source. If such software did pay attention to a "no copy" flag it would be possible to alter the software to ignore it.

  • @PhilReynoldsLondonGeek 3 years ago +55

    The only real *problem* with PDF is that many organisations provide you with their forms as images. If they could be done as proper forms it would be far easier to actually use them.

    • @turpialito 3 years ago +14

      But isn't it that it's not actually a PDF problem, but rather people not using the proper PDF generator; in this case Adobe Forms (which AFAIR is bundled with Acrobat)?

    • @ophello 3 years ago +2

      This isn’t a problem with PDF. It’s a problem with organizations.

  • @mickjames73 3 years ago +3

    PDF variability is very frustrating for blind or low-vision people. You would often receive a document or instruction manual that was rendered as an image only, and we used to have to print, rescan and OCR them (often quite tricky with complex page layouts). Luckily there is now a fairly accurate built-in OCR engine in things like Acrobat Reader. The other issue with PDF variation is that many PDFs don't conform to accessibility standards and thus become unusable, or difficult to use, when viewed with accessibility features turned on.

    • @Jebusankel 3 years ago

      I was frustrated recently that my auto insurance documents are all in bad bitmap PDF format. But if I complain to them and claim to be blind, I think they'll have some follow up questions. 😜

  • @Yupppi 3 years ago +6

    I see a new Computerphile video with Prof. Brailsford's face and my week is immediately better. I even got to walk inside his home a little bit this time!
    After seeing bad photocopies of '80s device manuals, I too can get behind his obsession with PDF quality. Even the manufacturer's archives have that poor photocopy, and the original print could've been subpar.

  • @squishmastah4682 3 years ago +12

    "[PDF] covers a multitude of sins."
    Yes. Especially at Hustler Magazine.

  • @TheFakeVIP 3 years ago +3

    I feel it also bears pointing out that correctly typeset text in PDF files, reproduced from a font rather than a bitmap, significantly increases the accessibility of such documents for people who use assistive technologies such as screen readers. PDF files are often ripped to shreds by the blind community for this exact reason. Even correctly produced PDFs, for instance ones produced from a word processor, often cause problems for screen readers depending on how the text is drawn and how competent the software is at adding accessibility hints where appropriate. A common example of this is text in columns: quite often assistive technologies don't expect this, and so read it linearly (i.e. they read straight across both columns, line by line). Properly tagging important landmarks such as headings can also be a great help, as screen reader users frequently navigate (or even summarise) a document simply by jumping between headings.

    • @williamchamberlain2263 3 years ago

      Yes

    • @lawrencedoliveiro9104 2 years ago

      DjVu format deals with this by storing searchable text objects, which are not rendered, separately from the actual page rendering.
      I think PDF allows this also.
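PDF does allow this: text drawn in text rendering mode 3 (set with the `3 Tr` operator) is neither filled nor stroked, so it is invisible yet can still be searched and selected. Placing such text over a scanned page image gives the "image plus hidden text" arrangement discussed in the video. The Python sketch below writes a complete one-page file by hand, assuming plain ASCII text and the standard Helvetica font. The 2x2 grey "scan", the page size, the text position and the function names are placeholders; a real OCR tool would position each recognized word over its glyphs in the image.

```python
def _pdf_literal(text: str) -> bytes:
    """Escape a string for a PDF literal string: backslash and parentheses need a backslash."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"


def image_with_hidden_text_pdf(text: str) -> bytes:
    """Build a one-page PDF: a tiny bitmap stretched over the page, plus invisible text."""
    pixels = bytes([0x00, 0x80, 0x80, 0xFF])  # placeholder 2x2, 8-bit greyscale "scan"
    content = (
        b"q 200 0 0 100 0 0 cm /Im0 Do Q\n"  # scale the unit-square image to the page
        b"BT 3 Tr /F1 24 Tf 10 40 Td " + _pdf_literal(text) + b" Tj ET"  # 3 Tr: invisible
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 100] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im0 5 0 R >> >> /Contents 6 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /XObject /Subtype /Image /Width 2 /Height 2 /ColorSpace /DeviceGray "
        b"/BitsPerComponent 8 /Length %d >>\nstream\n" % len(pixels) + pixels + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # the xref table records the byte offset of each object
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)


pdf_bytes = image_with_hidden_text_pdf("Hello (scanned) world")
```

Saving `pdf_bytes` to a file and opening it in a viewer should show only the grey bitmap, while a text search or select-all should still find "Hello (scanned) world".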

  • @deansundquist9601 3 years ago

    The strive for excellence in typesetting is very noble. As always, thanks for the wonderful content Prof. Brailsford.

  • @Richardincancale 3 years ago +12

    Do you remember desk-top search engines? I used to test them by hiding the word ‘marmalade’ in a PowerPoint in a zip file to test their ability to find and index text :-)

    • @ShankarSivarajan 3 years ago

      Did that work?

    • @CJT3X 3 years ago +1

      You mean like an early version of Spotlight/Alfred?

    • @Richardincancale 3 years ago

      @@CJT3X I recall that both AltaVista and Google had desktop indexing tools. Yes, it worked and found my hidden marmalade!

    • @Richardincancale 3 years ago

      @@ShankarSivarajan Yup

  • @soccerox817 3 years ago +32

    Exactly why I can't stand it when people just ask for a PDF or send a poorly rendered PDF. Gotta write documents in LaTeX and export a quality PDF

    • @peterwhitey4992 3 years ago +2

      LaTeX is overrated.

    • @miran248 3 years ago +14

      @@peterwhitey4992 Wouldn't say overrated, but maybe overkill in most cases. Something like Markdown should be more than enough for simple stuff (without math equations, etc.)

    • @peterwhitey4992 3 years ago

      @@miran248 - I know it's practical to write in, but it's the result that I find overrated. You can always tell when a paper/book is written in LaTeX. They all look the same. Especially textbooks written in LaTeX are generally not very good.

    • @Platoqp 3 years ago +1

      @@peterwhitey4992 It is excellent for writings that include mathematics and other scientific formulas

    • @michaelb2047 3 years ago +4

      @@peterwhitey4992 I would say most natural science textbooks are written in LaTeX. You can change everything so you won’t notice that it was actually written with LaTeX. You notice it only if they use the default template/font. Also, they are often much cleaner/more consistent than „Word“ books, for example.

  • @tjarko72 3 years ago +14

    I always thought that PDF(FTG) was closely related to PostScript, so I would have expected a mention of PostScript. And, more modern, PDF/A.

    • @ZedaZ80 3 years ago +1

      PostScript is lovely

    • @nezZario 3 years ago

      It is.

  • @jorisschellekens4630 3 years ago

    This is such a wonderful video. I'm the author of a PDF library (pText) and you have no idea how often people will complain about something like "it doesn't seem to extract the text".
    Thus forcing me to explain "Yeah, but this is an image, not a PDF."

  • @ajayrangishetti5515 3 years ago +7

    Please do a video explaining Pentium processor architecture, and how multi-core processors perform out-of-order execution.

  • @TimothyWhiteheadzm 3 years ago +16

    Expecting a certain quality of content from the pdf format is as ridiculous as expecting quality content on a web page. A container is just that. It can contain flowers, or manure. As for the OCR feature, that is great, but one wonders if that is part of 'pdf' or part of the tool that creates the pdf?

    • @harshjinger 3 years ago

      Idk... About this... I would love to know more... Commenting for any followups

    • @majorgnu 3 years ago +1

      It's a feature of the software that produced the PDF, obviously.
      Even if the format was extended at some point with features that facilitate this kind of use, the file itself still only contains the *result* of the OCR process, which was performed by whatever applications were used to produce it.

    • @drawapretzel6003 3 years ago +1

      Well, it's not in the free version of Adobe Reader, that's for sure.
      There's lots of free OCR software that can OCR a PDF for you, but yes, it's included in the tools of actual PDF creation software too.

    • @HetareKing 3 years ago

      The actual OCRing happens in the creation tool, but this whole notion of having a bitmap overlay invisible text has to be encoded into the file and so the format has to support it. And since this functionality only really makes sense in the context of the OCR feature, I think it's fair to say it's part of "PDF".

    • @JNCressey 3 years ago

      I suppose if the creator of the pdf has a bitmap with text that is obviously unOCRable (maybe stylised text) they would manually add the hidden text, getting the same effect but without OCR.
      Styles that come to mind that OCR wouldn't work well on could be extra objects between the letters (google doodles), people posing in letter shapes (it's fun to stay at the YMCA), drawing just the negative space, bubble text or drawing just the shadows of the text, leaving out lines (E as 3 horizontal lines, A without the horizontal part), or using characters of other alphabets that look similar (like in r/grssk).

  • @DaimlerSleeveValve 3 years ago +4

    It surprised me that for the last couple of years, Google has been running OCR on the contents of PDFs which contain only images. I've located names mentioned only on signs visible in the backgrounds of pictures of something else.

  • @okusa7750 3 years ago +2

    Feel like David Attenborough just lectured me about the types of PDF. Amazing passionate storyteller

  • @delhatton 3 years ago +1

    OCR for pure text. Maybe OK. It will still require editing. OCR for numerical data, like some Excel sheets, by the time you've verified all the numbers, you might as well have retyped it.

  • @MrBoubource 3 years ago +13

    My internship topic is to find the paragraphs containing some keywords in PDFs that come in 4 different formats depending on the provider.
    I am beginning to hate it.

    • @DT-dc4br 3 years ago +4

      Might be a job for a Linux shell script with awk / grep & sed

    • @MrBoubource 3 years ago +3

      @@DT-dc4br I went with Python (and regexes) because I'm most familiar with it... But holy, what a mess it is to convert PDF to HTML and plain text..
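For the keyword task described above, here is a minimal Python sketch of the step after text extraction, using only the standard `re` module. It assumes the PDF text has already been pulled out by some extraction tool and that paragraphs are separated by blank lines, which is exactly the assumption that breaks when each provider formats its PDFs differently. The function name and sample text are hypothetical.

```python
import re


def paragraphs_with_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the blank-line-separated paragraphs that contain any keyword as a whole word."""
    if not keywords:
        return []
    # (?<!\w) and (?!\w) act like \b but also work for keywords ending in punctuation
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keywords) + r")(?!\w)",
        re.IGNORECASE,
    )
    matches = []
    for para in re.split(r"\n\s*\n", text):
        # Extracted text is often hard-wrapped: re-join words hyphenated at a line end
        # (this also drops the hyphen from genuine compounds split across lines)
        para = re.sub(r"-\n(?=\w)", "", para)
        para = re.sub(r"\s*\n\s*", " ", para).strip()
        if para and pattern.search(para):
            matches.append(para)
    return matches


sample = "Invoice 42\nissued by ACME Ltd.\n\nPayment is due with-\nin thirty days.\n\nThank you."
print(paragraphs_with_keywords(sample, ["within"]))  # ['Payment is due within thirty days.']
```

Keeping the paragraph splitting separate from the matching makes it easier to swap in a per-provider splitting rule later without touching the keyword logic.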

    • @etziowingeler3173 3 years ago

      Hahaha I can imagine

  • @anarchist 3 years ago +3

    8:40 4:3 monitor, because nothing can throttle Brailsford's brain power.
    Not PDF, but something that tickled me when working with TIFFs was the joke that it stands for "Thousands of Incompatible File Formats"

  • @geirtwo 1 year ago

    I wish this channel had more satisfying visuals.

  • @superfluidity 3 years ago +3

    If you can, don't just aim for the highest quality that your audience demands - aim for quality far beyond that. That will give you more freedom to rework the document later if you want to.

  • @zombiegeorge749 3 years ago +5

    2:42 What's up with the edges of the screen?

    • @Computerphile 3 years ago +4

      if you read the small text on the "newspaper" it helps explain it a little :) -Sean (basically I rotated it a little to fix my wonky camerawork and missed zooming it in)

  • @jorisschellekens4630 3 years ago

    The way most PDF libraries or programs handle OCR is by something the spec calls "optional content groups".
    Optional content groups allow you to mark any content in the PDF content stream with a particular tag (typically the layer name).
    Programs like Adobe Acrobat will then show you a listing of all the layers, so you could imagine being able to toggle OCR on and off.
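To make the description above concrete, here is a hedged Python sketch of the pieces involved when content is placed in an optional content group: the OCG dictionary, the /OCProperties entry in the document catalog, a /Properties resource that gives the group a short name, and a marked-content sequence (`/OC /name BDC ... EMC`) in the page's content stream. The function name, the resource name `/oc1` and the object reference `7 0 R` are hypothetical, and the layer name is assumed to need no escaping. Whether a given OCR tool actually uses an OCG, invisible text (rendering mode 3), or both varies from tool to tool.

```python
def ocg_pieces(layer_name: str, ocg_ref: str, inner_content: str) -> dict[str, str]:
    """Return the PDF fragments that put inner_content into one optional content group."""
    return {
        # The group itself, stored as its own indirect object (referred to as ocg_ref)
        "ocg_object": f"<< /Type /OCG /Name ({layer_name}) >>",
        # Catalog entry listing all groups; /D is the default configuration viewers start with
        "catalog_entry": f"/OCProperties << /OCGs [{ocg_ref}] /D << /Order [{ocg_ref}] >> >>",
        # Page resource mapping the short name /oc1 used in the content stream to the group
        "page_resources_entry": f"/Properties << /oc1 {ocg_ref} >>",
        # Everything between BDC and EMC is shown or hidden together with the layer
        "content": f"/OC /oc1 BDC\n{inner_content}\nEMC",
    }


pieces = ocg_pieces("OCR text", "7 0 R", "BT 3 Tr /F1 12 Tf 72 700 Td (scanned words) Tj ET")
print(pieces["content"])
```

The /Order array is what lets a viewer list the group in its layers panel, which is where a toggle like the one the comment imagines would appear.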

  • @PhilipStorry 3 years ago +2

    How do I subscribe to Vague Magazine? If it has high quality reminiscing from Professor Brailsford, then I need a subscription! 😉

  • @magacacciari3565 3 years ago

    Huge fan of Professor B and his computer lore.

  • @johnno4127 3 years ago

    The searchable nature of image plus hidden text (or image with text replaced by an actual font) is fantastic!
    The vast quantity of extra spaces and line returns can get frustrating when trying to use that OCR text, though. It's also a pain when Adobe puts a random space in the middle of a word or between EACH LETTER and now you can't find what you're looking for.

  • @jashaswimalyaacharjee9585 3 years ago +1

    I am totally convinced that Prof. Brailsford uses this machine (9:58) as his occasional-use computer. As Peeping Toms like me can observe, there's Alpine 2.21 (fairly recent software compared to the system)

  • @Gnsdtc 3 years ago +1

    This is beautiful. The OCR version is PDF I+HT!

  • @bhargavk1515 1 year ago +1

    Can you make a tutorial (or is there a tutorial) on how Prof. Brailsford restored the bitmap PDF into PDF encoding...

  • @John_Fx 3 years ago +4

    He barely scratched the surface of the complexity of PDF formats. Didn't even cover PDF/A or why you should never redact a PDF and send out that original file.

    • @Jebusankel 3 years ago

      There is a true Redact function in Adobe Acrobat. You just have to use that instead of drawing a box on top.
      Ditto on PDF/A though.

  • @Graham_Rule 3 years ago

    The photocopier/scanner at work can scan to PDF/A which generates searchable text by doing OCR. Being internet enabled it can then send a copy by email (possibly bcc'd to Xerox or other third parties without our knowledge).

  • @HugoOneYT 3 years ago +2

    To me PDF is about compatibility, there's a reason why all invoices are PDF, everything can open it

  • @camadams9149 3 years ago

    Sounds like people don't know what each file format does
    1) PDFs - I use exclusively for pages I wanted bundled together in a single document that always looks the same regardless of device viewed on OR for a fillable document
    2) PNG - I use exclusively for a single image that I want to be static in quality & size
    3) JPG - I don't use it
    4) SVG - A PNG that may need to be resized while retaining quality
    Then again, I don't pay for file editors. So my approach is very much: I want you to be able to use the files natively

  • @jeromethiel4323 3 years ago +1

    I worked for a company, and we had electrical prints that were paper only. We paid a company to generate CAD files of the prints. What they did is insert scans of the paper copy into the CAD software, which isn't what we wanted. They basically screwed us over big time.
    The whole point of having them in CAD format was so that we could edit the bloody things!

  • @AleksyGrabovski 3 years ago +2

    Can you also do a video on the DjVu format?

  • @saranchance5650 3 years ago +1

    PDF has additional accessibility features that the variants you described make possible.

  • @MartinOmander 3 years ago

    Excellent video! I have a request for future videos: please consider keeping the camera still if the subject is stationary. The shakycam effect unfortunately made me seasick and distracted from the professor's excellent performance.

  • @henke37 2 years ago +2

    Fun fact: the PDF format is so complex that it literally includes functionality for executing arbitrary shell commands. As a feature.

  • @UncleKennysPlace 3 years ago +2

    My day job is assembling documents in PDF format for aviation certification. It's shocking how many engineers send everything as PDF, even bitmaps, when I know they had to convert them, despite instructions saying we can work with any format that their native applications produce.

    • @bhargavk1515 1 year ago

      Sir, how do I learn PDF format encoding? Is there any guide?

  • @adrianalexandrov7730 1 year ago

    That's kinda how DjVu worked: saving text as a highly detailed foreground and compressing the background. It was a miracle how a scanned book of hundreds of pages could fit into just a few MB

  • @Baxtexx 3 years ago +1

    Urg, this reminds me of some software I was working on that consumed PDFs and rebranded them. There were so many edge cases all the time!

  • @StevenSeiller 3 years ago +3

    🔆 The request for a PDF should be followed by the question, “What for?” Its intended use will dictate how it should be generated.
    ⁉️On a related note, isn’t it so fun to be asked for a specific file format by someone who doesn’t know why, nor the necessary specifications, while they assert that you are the one who is making the process complicated by asking so many questions?!? 🤔

  • @oposkainaxei
    @oposkainaxei 3 years ago +3

    4:30 OCR Systems

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 2 years ago

    12:03 It looks like a scan that has been quantized into a bilevel (black and white only, no greys) bitmap. Those little hairy extensions on the edges are characteristic of that.

  • @bartas9693
    @bartas9693 3 years ago +6

    It's ok I'll send you a PDF.

    • @SimGunther
      @SimGunther 3 years ago

      Yeah, but what? Image, full, text?

  • @willis936
    @willis936 3 years ago +1

    I was hoping to hear about the glories of vector graphics. I just want some validation for spending months in LaTeX on my thesis.
    I got all of my MATLAB plots looking just right, used third-party functions (permissively licensed) to output EPS, wrote scripts to optimize the size of the EPS files, and used pdfsizeopt on the resulting file. All of this was after I set up a *hefty* style with hyperlinks for the ToC, bibliography references, and cross-references.
    There are no classes for this kind of thing. There is barely even interest in the highest end communities. You just have to be a little crazy.

  • @sweting
    @sweting 3 years ago

    Please enable auto-generated captions if you are unable to provide custom captions. Removing the auto-generated captions when they are automatically provided means that people who need assistance with hearing will have nothing to fall back on.

  • @Ice_Karma
    @Ice_Karma 3 years ago +1

    Prof. Brailsford, do you still use PINE, or Alpine? =D
    (PINE user since 3.87...)

  • @SeanBZA
    @SeanBZA 3 years ago

    Also, different PDF creators give different file-size outputs. Firefox's PDF output is massive, often bigger than the original, as it is a PDF of the page as it would be sent to the printer, but the PDF output from Debian is a lot smaller, just a file with the fonts and text, as the original document had.

  • @LoesserOf2Evils
    @LoesserOf2Evils 3 years ago

    If you can decompose the PDF into the text and the graphics and then recreate them into a word processing document, that can help. Then drop the document into Adobe Indesign for better and tighter layout. I admit that's a lot of effort, but sometimes it's worth it; and if the PDF standard changes in the future and it's important to produce a new standard, it'll be far easier.

  • @pierreabbat6157
    @pierreabbat6157 3 years ago +1

    Many of my programs output PostScript, which can be converted to PDF. I've seen many PS files get bigger when converted to PDF; I just checked one which is 4.5 times as big in PDF as in PS. I also once wrote a PS file using the random number generator and converted it to PDF. The converted file lost the randomness.
    I'm a surveyor and download maps in PDF from register of deeds sites. The old ones are scanned, of course. But the ones drawn with CAD are, I think, also scanned. They should be taken from the PDF output of the CAD program, except that the signature is written on paper (or clear plastic sheet), which poses a problem. Digitizing the numbers from a printed copy of the plat can result in illegible numbers (is that a 6, an 8, or a 9?).

  • @iabervon
    @iabervon 3 years ago

    Midway through the video, I was distracted by recognizing that Professor Brailsford uses the same program for email that I do.
    I often solve crossword puzzles that I get as PDFs, and it's interesting to see whether the program that made the PDF put the text of the clues in the logical order that you'd read them, or if it went top to bottom, left to right, ignoring columns.

  • @Chobungus
    @Chobungus 3 years ago +1

    Can someone clarify for me, when he is going over the "hideously complex mathematical equations" @ 9:19, he says that you do not want to have to type that out character-by-character. Yet he then demonstrates that he is able to zoom in greatly while preserving quality. So how did he translate the bitmap image to that high quality type set?

    • @Computerphile
      @Computerphile  3 years ago +3

      In this case that's exactly what the Prof is working on, recreating this important document page by page using similar software to what Dennis would have had available - Professor Brailsford talks about it in a recent video but it has been an almost full time job for him for a while now! -Sean p.s. if you see the two pictures early in this video you'll see that a version of the Thesis Dennis held was damaged but one his friend had reviewed is OK - The damaged one has amendments so this is a difficult task!

    • @Chobungus
      @Chobungus 3 years ago

      @@Computerphile Thanks for the reply! Great video!

  • @Ziphoroc
    @Ziphoroc 2 years ago

    You missed the most common reason people choose to put things into a PDF: you can put multiple things into one PDF and have it be a single file, allowing you to send all the documents in one neatly organized PDF rather than sending multiple separate files that won’t have any order. It’s much more convenient to be able to scroll back and forth than to have to open multiple windows to get the same information. I wouldn’t have finished college if the online textbooks I paid for had come as 350 separate JPEG files in a folder, rather than as a PDF of the entire book that I can scroll through. I’d take the PDF even if it were, for whatever reason, made of even lower-quality images than the individual JPEG pages.

  • @trollhunter200
    @trollhunter200 3 years ago +2

    Debian with KDE Plasma is the best.

    • @gug1970
      @gug1970 3 years ago +1

      It was oddly satisfying to watch him using KDE on Debian on my Debian box running KDE.

  • @power-max
    @power-max 3 years ago +3

    2:30 Why is the video askew? Why is it that when I google that word, the search results are askew?

    • @PhilBoswell
      @PhilBoswell 3 years ago

      As to the first, see Sean's answer elsewhere on this page; as to the second, that's a Google Easter Egg.

    • @peterwhitey4992
      @peterwhitey4992 3 years ago

      Isn't it obvious why the results are askew?

    • @power-max
      @power-max 3 years ago

      @@PhilBoswell yeah I know that was just the joke

  • @Rubrickety
    @Rubrickety 3 years ago

    Fascinating video with perhaps the least clickbaity title in history.

  • @samuelworsnop9983
    @samuelworsnop9983 3 years ago +3

    I really want to know what Professor Brailsford's favourite font is!

  • @xelaxander
    @xelaxander 2 years ago

    What’s the software Prof. Brailsford is using? I’d really love to use it to search some older mathematical books.

  • @ZaneDaMagicPufferDragon
    @ZaneDaMagicPufferDragon 3 years ago

    PDF FTG FTW 🙌🏻 I LOVE ❤️ PDF AND ITS PROGRESS IS AMAZING 🤩 GREAT VIDEO PROFESSOR 👨🏻‍🏫 BRAILSFORD!!!

  • @danielmnet
    @danielmnet 3 years ago

    If Prof. Brailsford is explaining it, I am interested; the subject doesn't matter.

  • @ieperlingetje
    @ieperlingetje 3 years ago

    4:24 Sean often gets camera settings wrong and things come out blurry, so here's an animation to hide that.

  • @turpialito
    @turpialito 3 years ago

    Brailsfordphile, Brady. I think it's high time ;)

  • @volodyadykun6490
    @volodyadykun6490 3 years ago +4

    4:18 great newspaper

    • @miran248
      @miran248 3 years ago

      .5btc - that's one expensive newspaper :)

    • @klaxoncow
      @klaxoncow 3 years ago

      @@miran248 Or maybe not. Depends how well Bitcoin's doing at the time.
      Virtual currency, yes. Anchored currency, no.

  • @marsgal42
    @marsgal42 3 years ago

    In a past life I did a lot of work with PostScript and one product we developed was a PostScript sanitizer that would take any deranged PostScript you threw at it and output well-behaved well-structured PostScript suitable for further processing. We got the idea from generating PDF then printing it to a file with Adobe's PostScript printer driver.

  • @unlokia
    @unlokia 3 years ago

    Prof Brailsford: The font of all PDF knowledge.

  • @kakka4462
    @kakka4462 3 years ago

    2:31 Why is the whole clip tilted, showing the background clip of the table rug?

  • @lablnet
    @lablnet 3 years ago +1

    Nice! I'd love to see more videos like these.

  • @No0utlet
    @No0utlet 3 years ago

    At 2:30, it appears that the video of Prof. Brailsford is overlaying a video of the paper on his table and is rotated a very slight amount. Are there any video editors out there that could explain how that might happen by accident?

  • @OleJacobsen
    @OleJacobsen 3 years ago

    I think you mentioned PostScript *once*, yet it's the basic underlying language for all PDF files and certainly why fonts can be scaled without loss of resolution, lines and circles can be drawn etc. Your video is about converting printed documents to PDF and the various technologies available for such a task, and not really about "Types of PDF".

    • @Jebusankel
      @Jebusankel 3 years ago

      There's at least one other Computerphile video with Prof. Brailsford that does a deep dive into postscript.

  • @TheSeverian
    @TheSeverian 3 years ago +3

    How did he talk about PDF for 14 minutes without mentioning Postscript? :)

    • @turpialito
      @turpialito 3 years ago +2

      Prof. Brailsford has a mind as sharp as a razorblade. I've seen videos of him frequently saying things like "... (so many) episodes ago, we talked about this and that, so I'm not going to bore you again..." or "Prof. such and such already did a video on this one, go watch it if you need to". His vids feel like a world-class lecture given at a cozy coffee hangout.

    • @Computerphile
      @Computerphile  3 years ago +4

      As Luis mentioned, we have multiple videos on postscript and pdf by prof Brailsford as this is his specialism: ruclips.net/video/S_NXz7I5dQc/видео.html ruclips.net/video/48tFB_sjHgY/видео.html
      HTH -Sean

    • @TheSeverian
      @TheSeverian 3 years ago

      @@Computerphile Sorry! I wasn't complaining. I was trying to be funny. Oh well, I usually fail...

  • @TheStevenWhiting
    @TheStevenWhiting 3 years ago

    I'd like Professor Brailsford to do a video about converting a PDF into a text/Word document. I ask because back in 2008, during my time in NHS IT, I specifically remember one department asking to convert some PDFs to Word so they could edit them. I told them it wasn't that easy. We tried with some, and they'd come out partly messed up. They moaned, so I had to research why, and discovered it wasn't as simple as they thought. It was all down to what the file originally used for fonts, styles, etc. If they used something unusual, like an obscure font, it would mess the conversion up.

  • @andrewjc13
    @andrewjc13 3 years ago +1

    I've found PDF-I to be very useful when professors have ridiculous requirements for their assignment format but just say "give me a pdf." Why yes, I'll happily do this assignment in word and then convert it to the biggest bitmap PDF possible. Here's your 200MB non-searchable pdf, enjoy grading!

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 2 years ago

    By the way, the “F” in “PDF” stands for “Format”. There’s no point in my sending you a “Portable Document Format”, but there is in my sending you a *file* in Portable Document Format. In other words, I don’t send you a PDF, I send you a PDF *file.*

  • @UnOrigionalOne
    @UnOrigionalOne 3 years ago +1

    One could argue similar points for video.

  • @Fre1maurer
    @Fre1maurer 3 years ago

    My first PDF was the manual of the flight simulator game TFX back in 1994; it was the re-release budget version without a printed manual. There was Adobe Acrobat Reader for MS-DOS on the game CD, and holy crap was the quality of the document bad (and the clumsy Reader itself was not much better). They obviously simply scanned a real printed manual and saved it as images with something like 4-bit grayscale, and the text sections looked like plain 1-bit black-or-white without any anti-aliasing. I never thought this "text for the poor" called PDF could become a thing in the future.

  • @Amonimus
    @Amonimus 3 years ago +1

    To me a PDF is like an archive of multiple images or documents that you can leaf through.