I am a bit of both. I start with a giant lump of code, and as I start to see which parts of the code are similar and repeating, I make some functions that cover those use cases, gradually refactoring. I don't like to get obsessive about creating a lot of functions early, because I don't yet know the details of what pieces will be involved; I'm still working out the best approach to code it. The times I've tried splitting too early, I've just ended up wasting a lot of time pulling all the code back out and rearranging it.
In a recent case I was trying to create a recursive function and was having a lot of trouble wrapping my head around it. The function was not split up at all, so it was a few hundred lines of code by the end. I was about to start refactoring it into named sub-functions, but then I realised that the approach I had taken wasn't ideal, and I reorganised a lot of the code and made changes. If I had already created all the sub-pieces, redoing it would have been a lot more work. As it was, I could just quickly copy and paste the chunks of code into the new order and change the bits around the edges.
Get the code fully working, then reduce it into pieces.
A frustration in programming is that a feature (which is a mental concept) doesn't always map neatly onto a single module or set of modules. Sometimes you get these cross-cutting concerns.
@@perfectionbox That's true, but we try our level best to minimise it. I've found that the first solution is rarely the optimal one; you have to live with it for a while. Nearly every sprint starts with some refactoring.
You always get cross-cutting concerns. Why? Because reality is full of many-to-many relationships, while typical programming approaches outside of SQL and databases are hierarchical. E.g., any JSON or XML doc is a hierarchy, not a set of relationships. What Beck is talking about here might be regarded as "factors". In psychology the "Five Factor Model" of personality arises from "clusters" of words that people use to describe themselves. So it's reasonable to regard those clusters as actual "factors" of human personality. So what Beck is getting at is that the "poop" comes in clusters, too, so it makes sense in some cases to formalize that cluster and make it a "thing" that is easier to manage. It doesn't get rid of the many-to-many relationships, but it puts them in one spot so that the rest of your code is mostly hierarchical.
Another frustration is when I create a class but then want another implementation that I can swap in, so I try to create an interface, but I struggle because the implementations work in quite different ways, and it becomes hard to design an interface.
Another frustration is when I want to separate concerns, so I have two classes, one class using the other, but I find it difficult to decide what belongs in which class, i.e. where the boundaries of the concerns are.
@fred.flintstone4099 You need to learn about the Liskov substitution principle; that'll help with your first part.
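One way to read that advice: when two implementations work in very different ways, define the interface around the narrow behaviour callers can rely on, not around how either implementation happens to work internally. A minimal Python sketch of that idea (all names here are hypothetical, invented for illustration):

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """A narrow contract that very different implementations can all honour."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class MemoryStore(KeyValueStore):
    """Simplest possible implementation: a dict."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class CachingStore(KeyValueStore):
    """Works very differently internally (wraps another store and caches
    reads), but is still substitutable wherever a KeyValueStore is expected."""

    def __init__(self, backing: KeyValueStore) -> None:
        self._backing = backing
        self._cache: dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._cache:
            self._cache[key] = self._backing.get(key)
        return self._cache[key]

    def put(self, key: str, value: str) -> None:
        self._cache[key] = value
        self._backing.put(key, value)
```

Because callers only depend on `get`/`put` semantics, either class can be swapped in without the caller noticing, which is the substitutability the principle asks for.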
I'm a firm believer in a combination of these two approaches. Splitting is best for code that is likely to be reused or needs to be more modular, whereas lumping is better for improving the readability of some more complex functions. I like the way he put it, where you're lumping the crap together so at least you know where it is.
My general rule is that functions should serve a purpose which can be described by their name. It doesn't necessarily matter whether a function is small or large, so long as it achieves its task. Further, anyone reviewing the code should be able to understand it without having to read each dependency. It may make sense to break some functional components out into their own functions, provided they, again, have a discrete purpose which can be easily conveyed by the function name; conveying their purpose through naming is essential. Splitting things unnecessarily makes the code convoluted and difficult to read and understand. That said, lumping things together can make code inefficient to review, which is improved with selective splitting.
Yes, if you split the code into too many functions you end up with ravioli code.
I think it's often harder to read things all in one place if there is too much complexity looking back at you. If I hide some of the complexity under readable object and method names, the high-level flow becomes more comprehensible. If someone is interested in the complexity that becomes hidden that way, it's easy enough to click into it.
You can do things such as splitting the expressions in your if conditions out into a private function with a name that describes what it does, so instead of "if x < 1 && y > 2 || z == 3" you write "if is_valid(x, y, z)".
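Translated into Python (using `and`/`or` for the C-style `&&`/`||`), that refactor might look like this; `process` and the rule inside `is_valid` are made-up placeholders, not from any real codebase:

```python
# Before: the reader must decode the boolean logic inline at the call site.
def process(x: int, y: int, z: int) -> str:
    if x < 1 and y > 2 or z == 3:
        return "accept"
    return "reject"


# After: the condition is named, so the call site reads as intent.
def is_valid(x: int, y: int, z: int) -> bool:
    # Hypothetical domain rule; the name documents the cluster of checks.
    # Note `and` binds tighter than `or`, matching the original expression.
    return (x < 1 and y > 2) or z == 3


def process_split(x: int, y: int, z: int) -> str:
    if is_valid(x, y, z):
        return "accept"
    return "reject"
```

The two versions are behaviourally identical; the split version just moves the "what does this condition mean?" question behind a name.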
I'm a splitter. The problem with lumping code together is that it's all too easy to create poor coupling and hence side effects. Splitting code enforces discipline.
Conversely, splitting necessarily obfuscates and makes it easier to create bad state and over-complicated flows.
Either has downsides if done naively, but it's worth mentioning that coupling is only an issue if it is the wrong coupling. Obfuscation, by contrast, is a cost that you and everyone on your team pay constantly.
@@NukeCloudstalker Your point that either approach done naively is problematic is exactly why it shouldn't be one or the other, but rather an intentional application of both.
@@michaelbaker2718 Nice reply.
@@NukeCloudstalker Maybe my comment has to do with recent refactoring experience of files 8000 lines long!
But I do take your point, it's all about splitting wisely.
"Splitting code enforces discipline"
That doesn't track with my experience. In my experience, splitting code gives you a *facsimile* of discipline, by ticking the boxes that The Masters say you should tick, which can deceive your team into thinking they don't need to spend the energy being disciplined. Everyone can go off and play in their sandbox at the level of quality they find fun, and the end result is hundreds of tiny trivial functions, lots of duplication, and no way to see the big picture. Lumping code makes it obvious up front that, among other things, you need to be disciplined not to turn the code into something that makes your successor scream in frustration, so people are more motivated to actually spend that energy on discipline. Lumping code forces collaboration because you're all working in the same space. Some people don't actually like collaboration, of course, so splitting makes them happy...
I am a lumping old man programmer. Early in my career (1980s), I learned how to read large programs really well, and was able to track down difficult bugs. Developing a program is a lot like the stages in writing (drafting, revising, and editing). Editing doesn't come first; it comes after the final draft. For me, it's easier to have it all in one place, as much as possible, to do the drafting and revising. Splitting decisions are a lot like editing decisions.
I am fine with splitting it up. YET, I must be able to see ALL functions/procedures in a file in one single listing. I do not like coding environments that only let you see the function you are working on. I need to be able to see it all at once and scroll as needed. I started in the mid '80s: COBOL and BASIC, followed shortly by C and Oracle Pro-C. I was also familiar with Fortran and Pascal at the time, though they were likely less than 5% of my time, with the bulk of my work in COBOL or BASIC depending on the project.
As an avid splitter for years, I recently rediscovered being a lumper, specifically for a class that represents an object (and many derived objects) in a game. This happened strictly so that the machine would not be jumping around vtables and pointer lists in performance-critical code.
Used to be a splitter, but I'm more and more becoming a lumper. It's just so much easier to debug or change something if all the logic related to one thing is in the same place, and some parts are not reused in other places. Of course, ultimately it depends on the situation. Repeating something in a few places often makes the system itself a lot simpler and more robust, even if it looks ugly or wrong to us developers.
There is always a balance between the two. Since both approaches have their own benefits, it only comes with practice to know when a lump is too big and when a split is too sparse.
I'm a lumper, and proud of it!
I prefer splitting. It might be an ADHD thing. I get overwhelmed trying to hold too much state in my head.
This comment doesn't make sense to me. If you split you need to keep more state in your head?
@@pudicio Well, you can move a long complicated if condition out into a private boolean method with a good name.
I think I am a lumping splitter. I like lumps of related code split up into relevant functions/procedures. I guess I like classes, yet I do not like multiple inheritance. I learned the hard way back in the early '90s that multiple inheritance is a complete recipe for weird behavior and bugs that can be difficult to reliably replicate and troubleshoot. It's sort of like having two equal bosses at work assigning you conflicting tasks that sometimes have incompatible requirements.
I do both depending on the tool/language.
For my last prototype I was a slumper. For my last micro framework project, I was a splitter.
One needed to get developed fast, to have something working to show. The second one needed to be correctly architected, readable, and understandable while still working.
One thing is that you could *NEVER* correctly go from one to the other: you have to rewrite from scratch!
In code, I think it is easier to split lumps than to lump splits.
@@aytviewer2421 You have to rethink the responsibilities and contracts to create clean interfaces (not "interface" in sense of some OOP languages). Elsewhere it will still be slump code in a clean code attire...
Splitting relies on trust that, down the line, the things are what they say they are and that they are grouped in a way that is predictably accessible. And if the naming is good, then it is also easier to navigate.
Well, I would think you start with lumping - draw the big picture - then breaking it down into details...
That is how I started, though with experience it became more obvious where to split and when it's best not to. When in doubt, I tend to lump, because as you say, you can always break it down later.
When I need a fast-as-possible prototype for a demo, I lump. Otherwise, for reliable production code built for years -- I split.
For me, mental load is important. There's a maximum of stuff I can keep track of in my mind.
If the code is very repetitive I don't mind files with thousands of lines in the global scope.
If the code is clever, then things need to be split up, scope needs to be limited, files smaller.
Mental load is eased by having 3-4 large monitors where you can see many windows, specs, functions, and data at a glance. At least it is for me!
I'm a splitter, but am extremely keen on keeping things grouped as locally as possible, like modules, to prevent improper (re)use of methods private to the module. In C#, I keep several classes in the same file as a module if it's small enough, or even use nested private classes.
And always use the private visibility modifier; if that doesn't work, use the internal visibility modifier; and lastly, only if that doesn't work, use the public visibility modifier.
@fred.flintstone4099 definitely!
Having it all in one place makes it easier for AI to get the context of your code.
Not to get on a rant, but AI has absolutely wasted more of my time than it has helped. Its predictive naming/renaming of variables based on what I have been typing has caused me pain by altering working code. I no longer use it. (Looking at you, Co-Pilot!)
I was a splitter to the most atomic level for over a decade. SRP! Demeter! Abstraction! Then I realized it was stupid. Breaking a solution into dozens of 'collaborators' that, in practice, only collaborate in 1-2 ways, is mental masturbation. Yagni.
The longer you can wait to make *informed* abstractions, the better. *Maybe* your sweet abstraction works. Great, you spent time writing code when it wasn't needed but turned out useful. You could also just... wait until you know what you need and write it then.
I sometimes wish I had a tool that would automatically "de-split" code for me so I could re-evaluate the value of its current decomposition. Like how some compilers optimize by inlining code, but on the source text itself.
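For the very narrow case of a function whose body is a single `return` expression, Python's `ast` module can sketch what such a "de-splitting" tool might look like. This is a toy of my own construction, not a real inliner: `inline_calls` is a hypothetical name, and it ignores keyword arguments, name capture, default values, and multi-statement functions.

```python
import ast
import copy


def inline_calls(source: str, func_name: str) -> str:
    """Rewrite `source` so calls to `func_name` are replaced by its body.

    Only handles the simplest case: a function whose body is a single
    `return <expression>`, called with plain positional arguments.
    """
    tree = ast.parse(source)
    target = next(
        node for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name == func_name
    )
    assert len(target.body) == 1 and isinstance(target.body[0], ast.Return)
    params = [a.arg for a in target.args.args]
    body_expr = target.body[0].value

    class Inliner(ast.NodeTransformer):
        def visit_Call(self, node: ast.Call):
            self.generic_visit(node)  # inline nested calls first
            if isinstance(node.func, ast.Name) and node.func.id == func_name:
                mapping = dict(zip(params, node.args))

                class Substitute(ast.NodeTransformer):
                    def visit_Name(self, name: ast.Name):
                        # Swap parameter names for the actual argument nodes.
                        return mapping.get(name.id, name)

                # Deep-copy so the original definition stays untouched.
                return Substitute().visit(copy.deepcopy(body_expr))
            return node

    return ast.unparse(ast.fix_missing_locations(Inliner().visit(tree)))
```

For example, inlining `is_small` in `flag = is_small(n + 1)` (given `def is_small(x): return x < 10`) yields source where the assignment reads `flag = n + 1 < 10`, letting you see the split decision undone in place.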
@@babgab Exactly. Things like loose coupling have downsides. It isn't inherently better; you really only want loose coupling where it's actually providing value.
Ahhhh, with age and experience comes wisdom over time. It can only be self-learned through hard lessons and reflection upon maintaining your own code base.
A lot of the time people create huge amounts of indirection by splitting in the name of clean code.
Indirection is removed by the compiler so who cares.
@@wiczus6102 The humans that have to maintain that illogical mess
@@adambickford8720 Putting a piece of code under a category does not change the logic in any way and is easy to navigate if you don't code in notepad.
@@wiczus6102 This has NOTHING to do with technology or code editors.
Every level of indirection makes it harder to understand. Period. Sometimes that's worth it, sometimes not.
@@adambickford8720 You don't understand what you've just said, that's why I won't argue with you.
Both extremes are bad. The best place to be is in the middle.
Exactly!
I am a splitter.