Scanf Basics: the good, the bad, and why so many pointers?

  • Published: Jan 30, 2023
  • Patreon ➤ / jacobsorber
    Courses ➤ jacobsorber.thinkific.com
    Website ➤ www.jacobsorber.com
    ---
    Scanf Basics: the good, the bad, and why so many ampersands? We're talking about scanf today, the function you might need, even if it isn't the function you want. We talk about its strengths and weaknesses, how it works, and why you have to put those pesky ampersands in front of the arguments.
    ***
    Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics ranging from network programming, threads, processes, operating systems, embedded systems and others. My goal is to help you get under-the-hood and better understand how computers work and how you can use them to become stronger students and more capable professional developers.
    About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
    More about me and what I do:
    www.jacobsorber.com
    people.cs.clemson.edu/~jsorber/
    persist.cs.clemson.edu/
    To Support the Channel:
    + like, subscribe, spread the word
    + contribute via Patreon --- [ / jacobsorber ]
    Source code is also available to Patreon supporters. --- [jsorber-youtube-source.heroku...]

Comments • 80

  • @benjaminrich9396
    @benjaminrich9396 1 year ago +45

    Jacob, these 'deeper look at the basics of C' kind of videos are really useful. Great stuff.

  • @AceAufWand
    @AceAufWand 1 year ago +17

    Scanf is full of amazing stuff when you think about it, especially its behavior with text that isn't a format specifier. For example, something like:
    result = scanf(" %*dinput%d%c", &x, &trailing_char);
    which is probably the closest thing standard C has to regex.
    When I was starting with C, I thought the best part of printf and scanf was the print and the scan, but once I understood what they did, I realized the best part is actually the f.
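A small sketch of the pattern matching described above, using sscanf on fixed strings so the behavior is easy to see. The helper name parse_tagged is hypothetical. %*d reads an integer but stores nothing (and is not counted in the return value), and the literal text input must match exactly.

```c
#include <stdio.h>

/* Hypothetical helper: parses text shaped like "<int>input<int><char>".
 * " "      skips leading whitespace
 * %*d      reads an int but stores nothing (assignment suppression)
 * "input"  must match literally
 * %d, %c   are stored in *x and *trailing
 * Returns the number of stored items (0-2), or EOF if the text runs out
 * before the first conversion. */
int parse_tagged(const char *text, int *x, char *trailing)
{
    return sscanf(text, " %*dinput%d%c", x, trailing);
}
```

For example, "  12input34;" stores 34 and ';' and returns 2, while "12output34;" returns 0 because the literal text does not match.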

  • @redcrafterlppa303
    @redcrafterlppa303 1 year ago +10

    11:43: calling fflush on stdin is undefined behavior according to the C language standard. It just happens to work on some platforms. A better solution would be to read() from stdin until it's empty.

  • @beeeeeee42333
    @beeeeeee42333 1 year ago +3

    Really love someone who treats scanf and printf as functions and goes on to explain pass by value and pass by reference, since many tutorials just say "this is the way to get input and display output"!
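A minimal sketch of why scanf needs those ampersands, using only standard C: a function can change a caller's variable only when it receives that variable's address. The function names below are hypothetical.

```c
#include <stdio.h>

/* Receives a copy of the caller's int: the caller's variable is untouched. */
void set_by_value(int n)
{
    n = 42;
    (void)n;            /* silence "set but not used"; the write is lost */
}

/* Receives the caller's address: writing through it changes the original. */
void set_by_pointer(int *n)
{
    *n = 42;
}

/* scanf works the same way: `out` is already an address, so no & here. */
int read_int_from(const char *text, int *out)
{
    return sscanf(text, "%d", out);
}
```

Calling set_by_value(v) leaves v unchanged, while set_by_pointer(&v) sets it to 42; this is exactly why scanf("%d", &v) needs the &.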

  • @unperrier5998
    @unperrier5998 1 year ago +9

    A way to work around buffer overflows is to use "%ms" and pass the address of a char pointer; scanf will allocate the string for you (similar to asprintf).
    It is part of the POSIX standard, so it's not a problem on Linux, but it's not available on Windows (which only complies with the old POSIX.1 standard) and likely not on embedded platforms.
    Note that it used to be "%as" (very old compilers/libc).
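A minimal sketch of the %ms approach, assuming a POSIX.1-2008 C library such as glibc (it will not work with Microsoft's CRT). The helper read_word is hypothetical; the key points are that you pass a char ** and that the caller frees the result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one whitespace-delimited word from text; the m modifier makes
 * sscanf allocate a buffer of exactly the right size.
 * Returns a heap string the caller must free(), or NULL if no word. */
char *read_word(const char *text)
{
    char *word = NULL;
    if (sscanf(text, "%ms", &word) != 1)
        return NULL;        /* nothing read, so nothing was allocated */
    return word;
}
```

No width limit is needed here: however long the word is, the buffer grows to fit it.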

    • @anon_y_mousse
      @anon_y_mousse 1 year ago +1

      A better, standard-compliant way is to write your own getline() equivalent, make sure it's rock solid, and use it everywhere. Or find a good library that does it for you and use that. You can also just accept that more of the line may be in the buffer than you've read, and skip what's left, since that's very likely what most people will do anyway.
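A sketch of the "write your own getline()" suggestion, using only standard C (fgets plus realloc). The name read_line and the 64-byte starting size are arbitrary choices; POSIX systems also provide getline(), which does the same job.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads a whole line of any length from `in`,
 * doubling a heap buffer as needed. The trailing '\n' is removed.
 * Returns NULL on EOF before any character, or on allocation failure. */
char *read_line(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), in)) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {   /* got the whole line */
            buf[--len] = '\0';
            return buf;
        }
        if (len + 1 == cap) {                    /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (len == 0) {                              /* EOF, nothing read */
        free(buf);
        return NULL;
    }
    return buf;                                  /* last line had no '\n' */
}
```

Because the whole line is always consumed, nothing is left behind in the stream to trip up the next read.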

  • @mirrors.of.reality
    @mirrors.of.reality 9 months ago

    I am really glad I found your channel, very useful information and well presented. Thank you!

  • @NinaNanni
    @NinaNanni 8 months ago

    I swear you wonderful YouTube creators are going to make me the programmer I strive to be. Amazing and thorough work! I wondered why one function took 15 minutes and came here with no expectations, but you clearly had plenty to teach about it. Thank you!

  • @BryanChance
    @BryanChance 1 year ago +2

    This explains so well why I love C. I initially had all the problems you described here with scanf(). But when I read up on the function's details, I found the solution you describe here (with the next char, buffer flush, and while loop). Now, some would say that's a lot of work for such a simple thing. Well, I've never written a C program that only uses scanf(). So there's a little bit of setup: put this in a function and call it from the rest of your code. What it gives you are the building blocks to read input. Say my input is a huge file that I've pre-checked and know is clean; then it's faster to use scanf() without all the error checking. And it makes sense why scanf() works the way it does; it's not some kind of magic. As for the buffer overflow, again, handle it in your custom "input function" or something. LOL modern programmers... (err, developers rather). At least in my experience, it takes some time to figure out C... it's not Python, where you can just slurp an entire text file into an array, even if the file is 10 GB. LOL
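One way to package the approach described above (check scanf's return value, throw away the rest of the line, loop) into a reusable input function. This is a sketch: prompt_int is a hypothetical name, and it takes a FILE * so it can be exercised without a terminal (pass stdin in normal use).

```c
#include <stdio.h>

/* Keeps prompting until a line starting with an int is entered.
 * Returns 1 with the value in *out, or 0 if input ends first. */
int prompt_int(FILE *in, const char *prompt, int *out)
{
    for (;;) {
        fputs(prompt, stdout);
        fflush(stdout);                  /* flushing an OUTPUT stream is fine */
        int rc = fscanf(in, "%d", out);
        int c;
        while ((c = getc(in)) != '\n' && c != EOF)
            ;                            /* discard the rest of the line */
        if (rc == 1)
            return 1;
        if (rc == EOF || c == EOF)
            return 0;
        fputs("Not a number, try again.\n", stdout);
    }
}
```

Typical use would be `if (prompt_int(stdin, "Age: ", &age)) ...`; the bad input is consumed on each retry, so the loop cannot spin forever on the same character.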

  • @ZeroCool2211
    @ZeroCool2211 11 months ago +4

    Just one note: fflush has undefined behaviour when used on stdin, because it was originally made for output streams like stdout.

  • @lean.drocalil
    @lean.drocalil 10 months ago

    Yet another great video ❤

  • @kellingc
    @kellingc 1 year ago +1

    This is cool. I can think of several cases where I want mixed input, say a value followed by units, like 13' 4" or 37F.
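A minimal sketch of parsing a value followed by units with sscanf, using the two examples from the comment. The helper names are hypothetical. A space in the format matches any amount of whitespace (including none), while ' and " must appear literally.

```c
#include <stdio.h>

/* "37F" (or "37 F") -> 37 and 'F'. Returns 1 on success. */
int parse_temperature(const char *s, int *value, char *unit)
{
    return sscanf(s, "%d %c", value, unit) == 2;
}

/* 13' 4" -> 13 feet, 4 inches. Returns 1 on success.
 * Note: a missing trailing " is not detected, because the return value
 * only counts stored conversions and both ints are already stored. */
int parse_feet_inches(const char *s, int *feet, int *inches)
{
    return sscanf(s, "%d' %d\"", feet, inches) == 2;
}
```

Input such as "13 4" is rejected, because the literal ' in the format does not match the space.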

  • @FEFFeX
    @FEFFeX 1 year ago +1

    Your videos are awesome
    Huge thanks

  • @megachar0x01
    @megachar0x01 1 year ago +1

    Just adding more info:
    the example code at 7:26 does create a buffer overflow, but it will usually just make the program terminate safely. With default compilation settings a stack canary is placed on the stack, and when it gets clobbered the program is terminated. Not only that, ASLR makes it hard to get a shell. So, in short, to exploit even a simple buffer overflow we'd also need a memory leak.

  • @ukaase
    @ukaase 1 year ago

    Hi Jacob, can you make a video about register-level programming in embedded systems? It would be a nice topic. I love your content so much.

  • @aj.arunkumar
    @aj.arunkumar 1 year ago

    Thanks, Jacob. I now understand why my scanfs didn't work back in my college days...

  • @abdelrahmanemad1122
    @abdelrahmanemad1122 1 year ago

    Awesome ❤️❤️❤️
    Could you tell us the name of the font you're using in VS Code?

  • @HansBezemer
    @HansBezemer 1 year ago +6

    I've been programming in C since 1987 and I can honestly say I have *NEVER* used scanf() in any of my programs. Its behavior is simply too murky for my taste. Like you said, I'd rather read in the whole shebang using fgets() and tokenize the whole bunch myself (not necessarily using strtok() for that).
    Just for fun, I've been writing a sscanf()-like routine for my own Forth compiler, and still, some of scanf()'s behavior baffled me. A few changes I made:
    (1) My "SSCANF" really doesn't like whitespace, neither in the buffer nor in the format string. When it encounters it, it will vehemently look for the first non-whitespace character and resume parsing from there. This means that these format strings are equivalent: "%c %c %c" and "%c%c%c".
    (2) When parsing, it takes a really good look at the delimiters you defined in the format string. If you define "%s" and your buffer contains "Hans Bezemer", it will parse the entire string. However, if you define "%s ", it will only parse "Hans" and leave the rest of the buffer unparsed.
    The upside of all this is that strings in the buffer are *not* automatically delimited by whitespace. Take "Invoice issued by [%s] on %u-%u-%u". If we feed "SSCANF" this buffer: "Invoice issued by [Hans Bezemer] on 2022-04-03", it will happily read the entire "Hans Bezemer", and not just "Hans".
    Still - although it was lots of fun to develop, I've never used it (yet) in my own Forth programs. I still don't trust it with real world data ;-)
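For comparison, standard sscanf can read the same bracketed name with a scanset, whereas plain %s would stop at the first space. This is a minimal sketch; the helper name parse_invoice and the 50-byte name buffer are assumptions, not part of the comment.

```c
#include <stdio.h>

/* %49[^]] reads up to 49 characters that are not ']' (a ']' placed right
 * after '^' belongs to the set); the following ']' is matched literally.
 * Spaces in the format match any run of whitespace in the input.
 * Returns 1 if all four fields were read. */
int parse_invoice(const char *line, char name[50],
                  unsigned *year, unsigned *month, unsigned *day)
{
    return sscanf(line, "Invoice issued by [%49[^]]] on %u-%u-%u",
                  name, year, month, day) == 4;
}
```

Fed "Invoice issued by [Hans Bezemer] on 2022-04-03", this stores the full "Hans Bezemer" and the three date fields.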

    • @grimvian
      @grimvian 1 year ago +1

      Agreed: "I'd rather read in the whole shebang using fgets() and tokenize the whole bunch myself"
      As part of my training as a C beginner, I wrote all the string handling myself and getchar() would be my first try.

    • @anon_y_mousse
      @anon_y_mousse 1 year ago

      I know how you feel. When I first implemented my own libc I couldn't stand doing things in just the way the standard defined and wound up just writing a completely new definition of a library from scratch. I wound up adding data structures and algorithms and still use it to this day for all of my in-house projects.

  • @noahvanmiert
    @noahvanmiert 1 year ago

    Hey, can you maybe make a video about filesystems?

  • @coderstubechannel
    @coderstubechannel 1 year ago +2

    This video on Scanf Basics is an absolute game-changer! It's exactly the type of content I've been searching for. I'm so glad I stumbled upon this video. It has inspired me to create more programming content on my channel. Thank you for sharing your knowledge, I can't wait to see more from you! 🦾

  • @xCwieCHRISx
    @xCwieCHRISx 1 year ago +1

    I use fgets combined with sscanf to read user input.
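A minimal sketch of the fgets-then-sscanf pattern, assuming lines of the form "name age" that fit in 128 bytes. read_age_line and the field sizes are hypothetical. The trailing %c is a common trick to detect extra text after the expected fields.

```c
#include <stdio.h>

/* Reads one line with fgets, then parses it with sscanf. A bad line is
 * consumed as a whole, so nothing is left in the stream to confuse the
 * next read. Returns 1 on success, 0 on a malformed line, EOF at end. */
int read_age_line(FILE *in, char name[32], int *age)
{
    char line[128];
    if (!fgets(line, sizeof line, in))
        return EOF;
    char extra;
    /* success only if exactly a name and an age are present */
    return sscanf(line, "%31s %d %c", name, age, &extra) == 2;
}
```

With input "Ann 30", "Bob x" and "Cy 5 z", only the first line is accepted; the other two are rejected without affecting later reads.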

  • @zxuiji
    @zxuiji 1 year ago +4

    8:26: last I checked, you could use the ".*" modifier to constrain the length with a variable, such as:
    #define LENG 31
    char name[LENG+1] = "";
    scanf("%.*s", LENG,name);
    At the very least I'm going to support either that or just * in my custom version of it later, though I'm using parsef instead of scanf for my naming scheme; it lines up nicely with printf :)

    • @31Uluberlu
      @31Uluberlu 1 year ago +1

      I'm afraid this format only works with printf:
      printf("%.*s", 5, "Hello World!"); // Hello
      not scanf.

    • @infastin3795
      @infastin3795 1 year ago

      It is not supported anywhere. The precision specifier is a printf-only thing.

    • @zxuiji
      @zxuiji 1 year ago +1

      @@infastin3795 Perhaps I misremembered then, oh well. I'm making a library that will be a possible stand-in replacement for libc etc., not for the symbols like musl, but for cross-platform stuff. It's called paw, and everything is prefixed with paw as well so they can be used together. I'll post a link eventually, once I'm satisfied the first version is reasonably feature complete, including threads, mutices, semaphores, graphics, UX, all that "fun" stuff that standard C chose to ignore at the start.

    • @anon_y_mousse
      @anon_y_mousse 1 year ago

      @@zxuiji Good, everyone should do that at least once in their career, especially if they intend on seriously using C.

    • @zxuiji
      @zxuiji 1 year ago

      @@anon_y_mousse What, make a library? Make a custom printf/scanf? You weren't clear about which of my comments you were replying to. Side note: just yesterday I managed to finish my pseudo mutices. The main issue I always had with pthread_mutex_t etc. is that it was never defined exactly what happens when a thread tries to delete the mutex at the same time another tries to lock it; after a number of rethinks I finally arrived at an octal-permissions-based design.
      The mutex requires data to be attached to it during creation along with a type string & callback for what it should do when deleting said data (which can only be triggered when no thread is capable of attaching to the mutex), as a bonus I added a couple of prev/next pointers to create a linked list GC with, the only time the GC is ever searched is when a thread declares it is abandoning all mutices it has permission to attach to, the owning thread ends up in a blocked state but the rest just happily kill their own permissions and attachment count.
      It was a real task to create such a mutex but now I'm comfortable using it in a multi-threaded environment since I know exactly what will happen if one thread tries to delete the mutex while another is trying to lock it, the delete will just not happen because it will detect other threads still have permission to attach and just refuse to remove the owner's permissions when it tries, the owner's permissions have to be removed before deletion is triggered so the only way for something unexpected to happen is if the dev is being stupid by not clearing their pointer after revoking the permissions of their thread.

  • @kevinyonan2147
    @kevinyonan2147 1 year ago +2

    I do know a trick with scanf for variable-size reading limits on strings. I build the format string myself: I convert the size of the buffer to a string, sandwich it between the '%' and the 's', and use that as the format. If your buffers are always a fixed size, you can optionally use a macro that turns the int into a string literal: `#define INT_TO_STRING(x) #x`

    • @Hauketal
      @Hauketal 1 year ago

      There is a width option "*" for scanf. You add an extra item in the parameter list for the width.
      Example:
      char name [20];
      scanf ("%*s", sizeof name - 1, name);
      Works too if the size is not a literal, but a parameter to your function.

    • @kevinyonan2147
      @kevinyonan2147 1 year ago +2

      @@Hauketal It doesn't work. In scanf, the `*` skips stuff (assignment suppression); you're thinking of the `*` for `printf`.
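Since scanf cannot take the width as an argument, the number has to be part of the format string. A compile-time sketch of the stringizing trick from the first comment, with one refinement: stringizing a macro name needs two levels of macros, because # stops its argument from being expanded, so a single-level `#x` applied to a macro name yields the name itself rather than its value. NAME_MAX_LEN, STR_, STR and read_name are hypothetical names.

```c
#include <stdio.h>

#define NAME_MAX_LEN 31          /* hypothetical maximum name length */
#define STR_(x) #x               /* stringizes its argument as written */
#define STR(x)  STR_(x)          /* expands x first, then stringizes it */

/* "%" STR(NAME_MAX_LEN) "s" becomes "%" "31" "s", i.e. "%31s",
 * so at most 31 characters plus '\0' are written into name. */
int read_name(const char *text, char name[NAME_MAX_LEN + 1])
{
    return sscanf(text, "%" STR(NAME_MAX_LEN) "s", name);
}
```

Changing NAME_MAX_LEN updates both the buffer size and the scanf width in one place.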

  • @tshaka_
    @tshaka_ 1 year ago

    There's also the bounds-checked scanf_s from C11 (Annex K).

  • @Thwy
    @Thwy 1 year ago +8

    As far as I know, fflush(stdin) has undefined behavior, and it doesn't work with GCC on Linux.
    You're living dangerously there.

    • @HansBezemer
      @HansBezemer 1 year ago +2

      True. I needed to do that myself, and to make it portable across several very different platforms and compilers, I had to read until EOF.

    • @5cover
      @5cover 7 months ago

      @HansBezemer Which shouldn't work, since stdin never reaches EOF.
      The terminal just pauses your program and prompts when the buffer is empty.

    • @Thwy
      @Thwy 7 months ago

      @@5cover stdin can reach EOF.
      Try running
      ./myprogram < test.txt
      stdin will then be the file "test.txt", and it will reach EOF.

  • @skeleton_craftGaming
    @skeleton_craftGaming 7 months ago

    Regarding the last question: it's at least partly because C doesn't have C++-style references... In C++ you should always use std::cin (or std::ifstream where you would use fscanf), partly because of the aforementioned pointer issues...

  • @31redorange08
    @31redorange08 1 year ago +1

    Why doesn't it skip the second scanf when there's apparently still the '\n' in the buffer?

    • @31Uluberlu
      @31Uluberlu 1 year ago

      Most formats including "%s" consume and discard leading whitespace characters. Only "%c", "%[...]" and "%n" don't.
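A small sketch of the rule above, using sscanf on a fixed string: %d skips leading whitespace but %c does not, so a leftover '\n' ends up in the char unless the format has a space before %c. The helper char_after_int is hypothetical.

```c
#include <stdio.h>

/* Parses an int followed by a char using the given format and returns
 * the char that was stored ('?' if none was). */
char char_after_int(const char *text, const char *format)
{
    int n;
    char c = '?';
    sscanf(text, format, &n, &c);
    return c;
}
```

On the text "5\nx", the format "%d%c" stores the '\n', while "%d %c" (note the space) skips it and stores 'x'.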

  • @jittertn
    @jittertn 6 months ago

    There's also scanf_s, which protects against overflows, since C11.

  • @charankoppineni4498
    @charankoppineni4498 1 year ago +1

    Where can I buy this T-shirt?

    • @JacobSorber
      @JacobSorber 1 year ago

      It should now be available in my store.

  • @germankoga8640
    @germankoga8640 10 months ago

    So should I stop using scanf in favor of fgets? At least for strings; for numbers I guess I'm stuck with scanf.

    • @RobBCactive
      @RobBCactive 6 months ago +1

      You can read input into a buffer with fgets, then process that data with sscanf; that was one of his suggestions.
      Using scanf is OK in personal programs, but generally not professionally.
      In general, when reading a file format, you analyse the input, returning a token that says what type of input text you found, with its value available (for example, a string of digits converted into a number). That's called lexical analysis, followed by parsing according to the grammar rules. That way you can detect errors and avoid buffer overruns and overflow.

  • @randomscribblings
    @randomscribblings 1 year ago

    scanf() has the ability to take length as a parameter.

  • @nunyobiznez875
    @nunyobiznez875 1 year ago +1

    fflush(stdin) on an input stream is undefined behavior. It works on Windows, where standards compliance is poor. But MSYS2, Cygwin, BSD, and Apple should all use fpurge(stdin), and Linux uses __fpurge(stdin) (from Solaris), declared in stdio_ext.h. They really need to standardize fpurge() in ISO C, or even POSIX, but neither board likes moving too quickly, and it hasn't been 40 years yet 🤣. It's a bit of a mess. I just took the time to write my own header, so I can use an fpurge() macro that selects the correct function for the system and can be dropped in and used everywhere when I need to write portable code. Or there's also the minimally flawed alternative:
    while((getc(stdin) != '\n'));
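The bare getc loop above never terminates if input ends before a newline, because getc then returns EOF on every call. A portable sketch that also stops at EOF; discard_line is a hypothetical name. On an interactive stdin it discards only through the next newline and waits for input if none is buffered yet, so it is not an exact equivalent of fpurge().

```c
#include <stdio.h>

/* Consumes characters up to and including the next '\n'.
 * Returns '\n' if a newline was consumed, or EOF if input ran out. */
int discard_line(FILE *in)
{
    int c;                    /* int, not char, so EOF can be detected */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;
    return c;
}
```

Typical use after a failed scanf is `if (discard_line(stdin) == EOF) { /* no more input */ }`.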

  • @zxuiji
    @zxuiji 1 year ago

    You can avoid the whole pointer issue and verify input by just making a custom function:
    #include <stdint.h>
    #include <stdio.h>
    uintmax_t parseju( FILE *file, char *stopped )
    {
        uintmax_t value = 0;
        unsigned int c = 0;
        int was = 0;
        while (1)
        {
            was = fgetc(file);
            c = was - '0'; // any non-digit (or EOF) wraps to a value > 9
            if ( c > 9 )
                break;
            value *= 10;
            value += c;
        }
        if ( stopped ) *stopped = was;
        return value;
    }
    I don't remember how to "put back" the read character, but you get the gist.

    • @Hauketal
      @Hauketal 1 year ago +1

      This will result in bad values for anything entered before '0', like '+'.

    • @zxuiji
      @zxuiji 1 year ago

      @@Hauketal That's fine; you're supposed to check for that yourself if you want to support it. The ONLY purpose of this function is to read digits, which can then be used by wrapper functions that need it, like floating-point number readers, for instance.

    • @your-mom-irl
      @your-mom-irl 1 year ago +2

      ungetc
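A sketch of the "put back" step with ungetc: read decimal digits, then return the first non-digit to the stream so the next read sees it. read_digits is a hypothetical name; like the original, it does not check for overflow.

```c
#include <stdio.h>

/* Reads consecutive decimal digits and returns their value. The first
 * non-digit is pushed back with ungetc so the caller can read it next. */
unsigned long read_digits(FILE *in)
{
    unsigned long value = 0;
    int c;
    while ((c = getc(in)) >= '0' && c <= '9')
        value = value * 10 + (unsigned long)(c - '0');
    if (c != EOF)
        ungetc(c, in);        /* one character of push-back is guaranteed */
    return value;
}
```

On the input "123+4", the first call returns 123 and leaves '+' as the next character in the stream.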

  • @1873Winchester
    @1873Winchester 1 year ago +1

    Newbie to C here, but I think I would try to use the isdigit function. It only checks one char at a time, but you can write a new function using isdigit, or just copy the one on Stack Overflow. So if (0 == (isdigits(result))) or something like that.
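A sketch of the isdigit idea as a whole-string check; all_digits is a hypothetical name standing in for the isdigits the comment mentions. It accepts only unsigned decimal digits, so a sign such as "-5" is rejected.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if s is non-empty and every character is a decimal digit.
 * The cast to unsigned char matters: passing a negative char value
 * (other than EOF) to isdigit is undefined behavior. */
bool all_digits(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}
```

This only validates the text; converting it still needs something like strtol.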

  • @user-sl6gn1ss8p
    @user-sl6gn1ss8p 1 year ago

    11:38: oh no, don't let Stack Overflow see that D:

  • @aaaowski7048
    @aaaowski7048 1 year ago

    >write my own scanf
    that's the first thing that came to my mind
    why not just use "read(0, buff, buffsize)"?

  • @23trekkie
    @23trekkie 11 months ago

    Typing in letter when program expects a number:
    Python - throws an error and ends 😎
    Pascal - throws an error and ends 😎
    Commodore 64 basic - just asks again 😎
    Go - assigns 0 to the variable 😎
    C - infinite loop until your CPU is on fire🤬
    (yes, I know I can work around this using a char array and the fgets and sscanf functions, but still...)

    • @5cover
      @5cover 7 months ago

      Because scanf was designed to read trusted data (such as formatted data in files), it doesn't need to check for errors.
      The C standard library doesn't offer a way to read user input from the console because there was no need for it. I mean, have you ever performed console input in a "real" project? Apart from the usual (y/n) confirmation (which can trivially be implemented with getchar, as you only need to read a single character), you seldom see console user input, since all the necessary information is specified in the command-line parameters.
      But it's easy to implement it using fgets and functions such as strtol.

  • @rosen8757
    @rosen8757 4 months ago

    scanf("%d") is not a valid way to read signed integers unless you know that every input fits into an int: scanf doesn't check for overflow, and signed overflow is undefined behaviour. You have to use the strtol family of functions instead.
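A sketch of the strtol-based approach, assuming the text has already been read (for example with fgets). parse_int is a hypothetical name. strtol sets errno to ERANGE when the value doesn't fit in a long, and the extra INT_MIN/INT_MAX check covers platforms where long is wider than int.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Converts s to an int, rejecting empty input, trailing junk and
 * out-of-range values. Returns true and stores *out on success. */
bool parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)                          /* no digits at all */
        return false;
    while (*end == ' ' || *end == '\n')    /* allow trailing newline from fgets */
        end++;
    if (*end != '\0')                      /* trailing junk such as "12abc" */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;
    *out = (int)v;
    return true;
}
```

Unlike scanf("%d"), an input such as "2147483648" is reported as an error instead of silently producing an unpredictable value.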

  • @PavitraGolchha
    @PavitraGolchha 1 year ago

    Do you Rust?

    • @sumofat4994
      @sumofat4994 1 year ago +3

      Rust is no

    • @PavitraGolchha
      @PavitraGolchha 1 year ago

      @@sumofat4994 Rust is trust

    • @robertstrickland9722
      @robertstrickland9722 1 year ago

      @@sumofat4994 Rust is at least "meh". Meh enough for Linux to consider adding support for it.

    • @patryk_49
      @patryk_49 1 year ago +1

      @@0xDEAD-C0DE Rust is slow and bloated.

    • @sumofat4994
      @sumofat4994 1 year ago +1

      @@0xDEAD-C0DE You are delusional.

  • @Stopinvadingmyhardware
    @Stopinvadingmyhardware 1 year ago

    Tokenizing free knowledge.
    No

  • @CCoder--
    @CCoder-- 1 year ago +1

    😂 this is just for fun..
    #include <stdio.h>
    int main (void)
    {
        char a[10]; // suppose we have an array of 10 chars
        int n = 9;  // hence we can fill a max of 9 characters + '\0'

        // generating the scanf format using sprintf()
        char tmp[10];
        sprintf (tmp, "%%%ds", n); // here we are generating the string "%9s", which will be the format for scanf
        printf ("enter a string:");
        scanf (tmp, &a);           // plain `a` is the correct argument; &a only works by luck (see replies)
        printf ("string=\"%s\"", a);
        return 0;
    }

    • @CCoder--
      @CCoder-- 1 year ago

      @@rustycherkas8229 Sorry, my mistake in the line scanf(tmp, &a);
      but the program will still work. I got lucky here 🤓
      The address of an entire array has the same value as the address of the first element of the array.

    • @rustycherkas8229
      @rustycherkas8229 1 year ago

      @@CCoder-- I've deleted my comment. You're right about the array, and about "getting lucky"... Suggest you "tweak" the code to avoid pedantic comments from pedantic people like me... 😁

    • @CCoder--
      @CCoder-- 1 year ago

      @@rustycherkas8229 No, it's all right 😀, pointers are a bit confusing.
      You can keep your previous comment to help others understand.

    • @rustycherkas8229
      @rustycherkas8229 1 year ago

      @@CCoder-- Kinda sorry I brought it up... @06:35 Jacob makes a point about dropping the "address of" from "&name"... IMHO, it's better that it is NOT present, for those times when a block of code is factored out into a separate function... It's easy to overlook an array becoming only a pointer (received as a parameter). AND it confuses the sh*t out of newbies, too! 🤣 That's worth a LOT.... 🤣🤣🤣

  • @papasmurf9146
    @papasmurf9146 1 year ago

    This isn't a pure macro version of what you're trying to accomplish -- and it will only have a %4s if you send a char* instead of a char str[20]; But then again, I only spent a few minutes on it.
    #include <stdio.h>
    /*------------------------------------------------------------------------------
    // When appending the line number, if we don't have an indirect CONCAT then
    // __LINE__ gets appended to the parameter X and not the actual line
    // number. In order to give the pre-processor a chance to convert it,
    // we use CONCAT_INDIRECT.
    //----------------------------------------------------------------------------*/
    #define CONCAT(a,b) a ## b
    #define CONCAT_INDIRECT(a,b) CONCAT(a,b)
    #define APPEND_LINE(X) CONCAT_INDIRECT(X,__LINE__)
    char* string_input_size(char* format, size_t size)
    {
        sprintf(format, "%%%zus", size); // %zu because size is a size_t
        return format;
    }
    // Declares a per-line buffer holding "%<N>s", where N = sizeof(str)-1
    #define INTERNAL_FORMAT(str) \
        char APPEND_LINE(format)[24]; \
        string_input_size(APPEND_LINE(format), sizeof(str)-1)
    // sscanf and fscanf take their source before the format string
    #define SCANF(str) INTERNAL_FORMAT(str); scanf(APPEND_LINE(format), str)
    #define SSCANF(src, str) INTERNAL_FORMAT(str); sscanf(src, APPEND_LINE(format), str)
    #define FSCANF(fp, str) INTERNAL_FORMAT(str); fscanf(fp, APPEND_LINE(format), str)
    int main(int argc, char* argv[])
    {
        char name[20];
        printf("Name: ");
        SCANF(name);
        printf("You entered '%s'\n", name);
    }

  • @kiyotaka31337
    @kiyotaka31337 1 year ago

    Using _FORTIFY_SOURCE with gcc prevents some simple buffer overflows. This would be a cool trick to show.

    • @HansBezemer
      @HansBezemer 1 year ago

      Careful and defensive programming prevents almost all buffer overflows. Also very effective against memory leaks.