21:00 You could potentially use const char* variadic arguments. In this case you'd call jse_get("file", "json") or jse_get("file.json"), whichever you need.
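A minimal sketch of that idea, with hypothetical names (the real jse_get signature may differ). One caveat: a C variadic function can't see how many arguments were passed, so the key path has to end with a sentinel such as NULL.

    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical variadic getter: takes a key path as a
     * NULL-terminated list of const char* segments. */
    static void jse_get_path(const char *first, ...)
    {
        va_list args;
        va_start(args, first);
        for (const char *key = first; key != NULL; key = va_arg(args, const char *))
            printf("descend into: %s\n", key);  /* look up each path segment */
        va_end(args);
    }

    int main(void)
    {
        jse_get_path("file", "json", NULL);  /* two segments */
        jse_get_path("file.json", NULL);     /* one dotted segment */
        return 0;
    }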
25:27 Surprised no one mentioned this yet, but that file is "JSON5" and not actually JSON. The biggest changes are that it supports trailing commas and comments.
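For illustration, a small snippet that JSON5 accepts but a strict JSON parser rejects:

    {
        // comments are allowed in JSON5
        "name": "example",
        "items": [1, 2, 3,],  // trailing comma
    }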
7:51 perfect explanation, now I understand C arrays much better
Thanks. Concerning arrays and pointers, I go into more detail here: ruclips.net/video/1yrqGG2KnNY/видео.html You might be interested.
I feel the video is great: you explain cool things and keep it interesting.
But there are some issues:
I also implemented a JSON library myself - multiple times, in C, C++... - and I use JSON for describing data in some of my projects (in OSDEV, glTF parsing...). The power of JSON is its simplicity and, generally, the small size of its parsers.
- 3:41 When you say that you would love to have beginning and end markers: we already know when the file is expected to end, namely when the depth is 0 after parsing the first element (sketched below). I mean, once you see a '{' token, you know the file has not ended until you have seen the corresponding closing bracket. Also, error handling should not live inside the JSON data; it should be placed outside the standard. If you open a nonexistent file in an OS, the error is returned elsewhere, rather than by handing you a fake file that contains 'null'. JSON is used to store errors in an API because that's a whole other abstraction: it's not the JSON parser that returns an error, but the API. I explain later why the degenerate case makes sense.
The beginning and end markers are also placed outside the JSON standard: in a file system, we already know the length of the file; in a request, we know its length...
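A minimal sketch of that depth-0 idea (a hypothetical helper, not from any particular library), which finds where the first top-level value ends:

    #include <stddef.h>

    /* Return the index one past the end of the first top-level
     * value, tracking bracket depth and skipping string contents. */
    static size_t first_value_end(const char *s, size_t len)
    {
        int depth = 0, in_string = 0;
        for (size_t i = 0; i < len; i++) {
            char c = s[i];
            if (in_string) {
                if (c == '\\') i++;              /* skip escaped character */
                else if (c == '"') in_string = 0;
                continue;
            }
            if (c == '"') in_string = 1;
            else if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']') {
                if (--depth == 0) return i + 1;  /* first element closed */
            }
        }
        return len;  /* scalar document or unterminated input */
    }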
- 5:40 The key isn't the only way to access an element; generally it's like a map, and you either look things up by key or iterate over each element. Nothing restrains you from using a JSON map for storing things in the form {name}: {data}, so using UTF-8 seems logical. Also, for a text-based data format, it's not a specific part of the document that is UTF-8 but the whole document.
- 7:00 JSON arrays are defined in a way that makes it easy to implement a recursive JSON parser: each entry is parsed like a whole new JSON document. It's just that enforcing a single type for all array entries would make parsing slower, because you would have to validate each entry's type against the rest. (For example, a parser I helped implement a long time ago: (GitHub) brutal-org/brutal/blob/main/sources/libs/json/parser.c ). It seems like you want to map a JSON array structure to a C struct perfectly, and you can do it with the help of unions:
struct JsonValue {
    JsonType type;
    union {
        JsonMap map;
        JsonArray array;
        float floating;
        /* ... other variants: string, boolean, null ... */
    };
};
Then you would treat everything as a JsonValue, including your document. That's why a document could be just a value (3:23, 3:16), and why an array could hold multiple types.
It's meant to be implemented that way because a 'Value' could be a literal, an object, an array...
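To show why that makes recursion natural, here is a sketch with minimal, hypothetical layouts for JsonMap and JsonArray (the real field names would differ). It counts every value in a document with one function, because the document itself is just a JsonValue:

    #include <stddef.h>

    typedef enum { JSON_NULL, JSON_BOOL, JSON_NUMBER,
                   JSON_STRING, JSON_ARRAY, JSON_MAP } JsonType;

    struct JsonValue;
    typedef struct { struct JsonValue *items; size_t len; } JsonArray;
    typedef struct { const char **keys; struct JsonValue *values; size_t len; } JsonMap;

    struct JsonValue {
        JsonType type;
        union {
            JsonMap map;
            JsonArray array;
            float floating;
        };
    };

    /* Count this value plus everything nested inside it. */
    static size_t count_values(const struct JsonValue *v)
    {
        size_t n = 1;  /* the value itself */
        if (v->type == JSON_ARRAY)
            for (size_t i = 0; i < v->array.len; i++)
                n += count_values(&v->array.items[i]);
        else if (v->type == JSON_MAP)
            for (size_t i = 0; i < v->map.len; i++)
                n += count_values(&v->map.values[i]);
        return n;      /* literals are leaves */
    }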
You say that it's slower because it doesn't have constraints, but that's false. First, if speed is the most important factor, you should not use JSON but rather a binary format (though binary formats have their own disadvantages). JSON is aimed at being simple to implement as a recursive parser while still being easy to edit. And JSON is already fast enough; in the end, the bottleneck you encounter isn't your library but the file system/disk itself.
---------------
- 11:46 The standard should not contain a minimum value, because that's specific to each implementation. JSON is meant to be used inside other specifications, or your own. For example, it's used in JavaScript, it's used in the glTF specification, and so on. This means those specifications can set their own limits on top of JSON. Your JSON file is meant to target things it knows in advance (you would not write a JSON file without knowing where it would go, right?). The C standard puts limits on things (like variable name length) because it's meant to be used by multiple compilers. But JSON is generally only used to abstract data for another standard.
For example, a glTF file may be massive (10M+), so a high limit is important, but for a config file, a 1M+ limit seems overkill. So it's implementation-dependent.
(The same goes for points 2, 3 and 4.)
--------------
- 14:30 I feel like the comparison is a little unfair, because your code doesn't have error handling. Technically, the code on the right doesn't handle it either, but the library it uses does.
Have a nice day! Great work.
it is almost like the standard was written by programmers who anticipated writing JSON parsers, and wanted the "freedom" to set the criteria of their implementation
you actually saved me so much time. I was just about to write a JSON parser for the same reasons (all libraries I have found are overly verbose). I didn't think the standard was so loose, although it is derived from JavaScript, so....
about ASCII keys: it's cool and pretty understandable that you don't want emojis in there, but it breaks down if you want keys written in a language with non-ASCII letters. Contrary to popular belief, not all APIs are in English
probably the only good notation is s-expressions. XML, JSON and the other stuff all have problems.
On 6:56, when you say that JSON does not understand arrays, I think it is worse than that. You see, JSON does not understand anything, right? It is a specification. Arrays in C and low-level languages are, in your words, contiguous memory holding elements of the same size. But JSON is not a low-level language; it does not have to use the same definition of array. In the end, JSON is just a text file with certain syntactical rules, that is, structure. Literals (true, false, null) in JSON are just text, like everything else. That is why it is so funny to me when people use it as an example of unstructured data, which is nothing besides max entropy.
But I agree with you that the curse of JSON is too much flexibility and not being able to know what to expect. In my field of work, this expresses itself largely as a confusion where people seem to believe that JSON can replace a data model, which is a completely different concept. And here is also where constraints come into place! So, yeah, I agree with you that the lack of defined constraints greatly limits JSON as a data exchange format and causes a large percentage of the headaches for most data engineers, besides all the business loss derived from the semantic loss given the lack of constraints.
amazing work thanks
You are welcome.
How about introducing a better one?
... and then we'd have N+1 competing serialization standards...
why are you including pthreads? I don't see it used in the code
I have a library called "technical" which I use for virtually every project. It contains a range of things including string manipulation functions, file-related functions and code to start another process. "Technical" requires pthreads so it has to be included, even if the specific project doesn't use it.
@@GreggInkCodes ahh, makes sense. When I saw it I was like (no way he's splitting the JSON up)
So JSON would be so much better if it only supported American English? The American Standard Code for Information Interchange? Maybe get a globe - there's a whole world out here that isn't in the USA.
I was not suggesting that JSON should be in American English only. I am European. I have never been to the USA. I have been to France, the Netherlands, Belgium, the UK and Ireland. I speak several languages well (English, French and Dutch) and have at least some level in Esperanto and Italian. I am well aware of what's on the globe.
I do recognize that computers have an American bias because of their history. The thing is: that will be true for almost everything in computing. Some Indian programmer with Urdu as a mother tongue still types words like "if" and "while" in his code. And the thing is, that's most convenient for him. Keyboards all over the world might have a variety of layouts (Qwerty, Azerty, Dvorak) but they all have a Western bias. To type Chinese, you will either need a keyboard that covers the entire floor of your office, or you come up with some more sophisticated system based on a more European or American keyboard. The Chinese use a system called "Pinyin"; it is the "foremost romanization system for Standard Mandarin Chinese." (according to Wikipedia) This system was invented by a Chinese person, published by the Chinese government and pre-dates computers.
The world isn't perfect and for better or worse, a romanized system is the way for typing code and developing software. My suggestion in the video is that the *keys* should be in ASCII, which doesn't necessarily mean English. In my opinion, that is better for everyone, is the most convenient and would cause the fewest compatibility problems.
I made clear in the video that the *values* are the place for emojis and other scripts (kanji, Cyrillic, etc.).
It's a convenient, technical solution in an imperfect world, not a statement of superiority.
@@GreggInkCodes W response.
🤡
JSON is fine. Not being strongly typed is by design. There are alternative strongly-typed wire formats if that's your use case (e.g. Protocol Buffers).
We are in the real world -- people use different languages, run incompatible software versions, enter garbage data, have to pay contractors to update their software -- JSON allows sufficient leeway for that (backwards compatibility, fail-safe).