*NEW STUDY* Does Co-Development With AI Assistants Improve Code?

  • Published: 1 Feb 2025

Comments • 265

  • @ContinuousDelivery · 2 months ago +7

    SIGN UP FOR AI STUDY: Does Co-Development With AI Assistants Improve Code ➡ form.typeform.com/to/PnVpuZGr?

    • @youroldmangaming8150 · 2 months ago

      Here is another question: when you have a sev1 involving AI-generated/automated code, who is going to take ownership of the immediate support and remediation? Good luck getting any code through CAB without an owner.

    • @glebbondarenko67 · 2 months ago

      Hope we'll see a video about the results :)

  • @adambickford8720 · 2 months ago +53

    I've found it slows me down and traditional static analysis tools are still better. They are faster and don't hallucinate.

    • @giorgos-4515 · 2 months ago

      I really wonder whether LLMs could do anything actually useful if they were given the output of static analysis tools.

    • @puntoycoma47 · 2 months ago +1

      New guy here, what is static analysis?

    • @traveller23e · 2 months ago +6

      @puntoycoma47 Basically a clever algorithm in the IDE (either integrated or provided by a separate engine via some plugin system) that looks at the code and detects a set of errors and warnings without running it. The limitation is that the algorithm is only as clever as the effort put into creating it, and it can be limited by language design (for example, a lot of them have poor or nonexistent null-dereference checks), but they tend to be good at finding obvious issues, as well as (usually) figuring out whether the program will even compile.
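      For illustration, here is a minimal sketch (my own, assuming Python with mypy as the analyzer) of the kind of bug such a tool catches without ever running the program:

      from typing import Optional

      def find_user(user_id: int) -> Optional[str]:
          # Returns a username, or None when the id is unknown.
          return {1: "alice", 2: "bob"}.get(user_id)

      def greet(user_id: int) -> str:
          name = find_user(user_id)
          # mypy flags the next line with something like:
          #   Item "None" of "Optional[str]" has no attribute "upper"
          # i.e. a possible None dereference, found purely from the types.
          return "Hello, " + name.upper()

      def greet_safe(user_id: int) -> str:
          name = find_user(user_id)
          if name is None:  # the guard the analyzer wants to see
              return "Hello, stranger"
          return "Hello, " + name.upper()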

    • @ErazerPT · 2 months ago +1

      @giorgos-4515 Not much that a non-LLM model couldn't do better. Give it the "bad code", give it the "good code" a human corrected after static analysis, and train over LOTS of it. Now the model will correct your code as it learned to do. The bad part? It needs training data, and good training data at that, or it's GIGO. It might hallucinate here and there, and there's nothing you can do about that; it shares that problem with us.

    • @purdysanchez · 2 months ago

      I haven't messed with it much, but if you know what you are trying to do and clearly define the problem, it does OK at giving code suggestions in a narrow scope. The problem comes down to the developer: some are inclined to just use the code if it runs, without analyzing and revising it.

  • @AndrewBlucher · 2 months ago +13

    Great project!
    In the 1980s I did a minor thesis on Programmer Productivity. Many, many tools and systems were marketed with productivity claims, but none of the vendors had test results to back those claims. At the time there were several research projects on code completion and code suggestion, and I had the pleasure of watching as these came to market with the real results we see today.
    The big issue that I see with this so-called AI code generation is, apart from the reproducibility issue Dave mentions, hallucinations. After all, LLMs are chatbots. They are not reasoning about how best to solve the problem.
    All the best with the project!

    • @traveller23e · 2 months ago +6

      Also, even if they did always produce valid code, it would be essentially equivalent to constantly taking the first result on Stack Overflow. It might _work_, but you're likely to be misusing libraries and leaking memory all over the place.

    • @davidmartensson273 · 2 months ago +1

      @traveller23e I have very little experience so far with AI-assisted code, but what little I have, I have treated the same way I do any code I find on the internet: as a suggestion and a source of ideas.
      The code might be bad or plain wrong, but it might at least contain relevant methods, classes or libraries that I was not aware of, or a pattern for solving the problem that I did not know.
      With that knowledge I can search for more specific info on the topics and either validate the code, fix it into something usable, or hopefully at least find some piece of help.
      Even a solution that proves to be bad might be helpful, as it might spur new ideas of my own.
      Will AI be better than random searches on Stack Overflow or Google? I do not know. I expect the results to improve over time, but I also expect that, at least for the foreseeable future, I will need to tweak most code, unless it's trivial stuff where the main benefit is not having to write the boilerplate parts.
      As for the inline help in Visual Studio, my experience is mostly bad. It very often suggests very wrong things, to the point that it actually reduces productivity: I reflexively spend time trying to read and understand something completely out of place, or have to erase something it autocompleted. But that's mainly due to the complete stupidity of making space the autocomplete key, which seems not to be changeable :/

    • @traveller23e · 2 months ago

      @davidmartensson273 VS's "TAB to insert xyz" feature is horrendous; you never know what TAB is going to do the next time you hit it.

  • @seanwoods647 · 2 months ago +10

    If you need an experimental control, I've written an HTTP server from first principles in both Java and Tcl. I'm the author of the httpd module in Tcllib.

  • @k98killer · 2 months ago +3

    Copilot is good at filling out mindless boilerplate, which is a significant amount of "enterprise" code in Java specifically. For Python, I have found Cursor to be helpful for refactoring or generating sample code to play around with new libraries.

  • @charliemopps4926 · 2 months ago +59

    The problem is, management doesn't want "good code", they want solutions fast... they don't care if it's good or not. AI is exactly what they've been looking for.

    • @traveller23e · 2 months ago +4

      Fast code is what led to the PR I'm reviewing being stuck open for months, as the dev responsible went from hastily patching one critical bug to the next after a poorly managed piece of development led into a premature go-live.

    • @CallousCoder · 2 months ago +4

      This is so true. I run into this at my banking customer all the time.
      Just this morning: 20 lines of OR statements in a SQL script, all checking that a table value is not null. So I set out to make this more clever.
      The manager was like: “that’ll take time, we just copy the line and change it.” And I wonder where the pride of these developers has gone, and how management can support even less than mediocrity.

    • @CallousCoder · 2 months ago +1

      @traveller23e Just commit it and run it in shadow; that’s the only way you’ll see if it is all working fine. When you have end-to-end tests you’ll be happy, as you can already have a preliminary outcome, but even then, massive bug fixes you want to run in parallel for a healthy amount of time.

    • @alanmacmillan6957 · 2 months ago +4

      And this is why we end up with problems like the Post Office scandal.

    • @CallousCoder · 2 months ago +2

      @ That was more a legal fuck-up.
      Sure, the software was bad, but nobody in their right mind would suspect franchisees who had operated to full satisfaction for 20 years of fraud without first suspecting the brand-new software! And Two Tier Kier was the main prosecutor, hmmmmm, who’s really incapable and to blame here?

  • @danielt63 · 2 months ago +25

    Personally, I think it does a better job at an initial review of code than at writing the code in the first place.

    • @jacquesduplessis6175 · 2 months ago +1

      Yes, agreed. It seems to be much quicker at finding the obvious (much-repeated) flaws/bugs when looking at a piece of code.

    • @baruchrevivo2966 · 2 months ago

      Thank You 😊 I was usually putting myself in the solution verifier position. All I got was cognitive overload and a wish for a nap.

  • @seanwoods647 · 2 months ago +26

    Ok, so most of the "people" I see posting about how AI is an "improvement" seem to be throwing out a lot of emotional arguments, but show no sign that they've actually written a single line of production code in their life. Speaking as a guy who has been writing code since the age of 10 (and I'm 50 now), every attempt I've made at AI code generation with an LLM has been a shitshow. Yes, it will cook up some plausible-looking snippet. But it will just invent a fluffy little cloud of a function that does all of the heavy lifting off-screen. And when you try to track down what this mystery function is, it doesn't exist in any API.
    When I ask an LLM to generate a specific solution, it will regurgitate a tangentially related example from Stack Overflow. And I know, because the problem I was trying to solve was in the corner cases: 10 years ago I had written a library function built around that very example, which I had cribbed from Stack Overflow.
    It's a mimic. And the worst part of a mimic is that what it produces sounds perfectly plausible if you have no idea what you are doing.
    And before you start calling me an AI hater, I pay the mortgage writing expert systems. I know what machine generated responses should look like. I also know that the proof is in the regression testing. And I also know that letting a machine off the leash is a guarantee you will be bitten on the ass. I'll be happy to show you my scars.

    • @plaidchuck · 2 months ago +3

      Any idea when companies may realize this and start hiring entry level people again?

    • @darylphuah · 2 months ago +8

      Anyone saying AI is improving their work has outed themselves as a mediocre dev.

    • @pawelhyzopski6456 · 2 months ago +3

      @darylphuah I'm not top tier, self-taught only. And whenever I get an answer from AI it's always generic. I'd rather write the generic stuff myself and learn something from it.

    • @ikusoru · 2 months ago +2

      @darylphuah This! 100% true

    • @mandisaw · 2 months ago +2

      @plaidchuck As soon as the stock returns stop chasing the hype. Outside of Big Tech and startups though, plenty of companies are still hiring entry-level. Depends where you are and what your Edu & Exp looks like.

  • @AmiGanguli · 2 months ago +25

    AI is super useful. No, it doesn't deliver useful code of any real scale or complexity. But it can save a ton of time looking up API calls. That's what really takes the most time in programming nowadays. I can whip up a little algorithm to manipulate a tree structure or something like that in no time. It usually even works first time. But finding the right library function for my current requirement and figuring out what it needs in order to work properly sucks up my time. If ChatGPT knows the library, it can whip something up that probably doesn't do what I want, but likely uses the right API calls in mostly the right way. And figuring that out is 80% of my day.
    Well, that and meetings. If I could send ChatGPT to meetings, that would be a real time-saver.

    • @purdysanchez · 2 months ago +5

      I guess if you're working with a license that allows you to feed it the entire documentation as context. If not, it regularly makes up fake function calls that don't exist in an API.

    • @harmless2u2 · 2 months ago

      @AmiGanguli 100% agree

    • @AmiGanguli · 2 months ago

      @purdysanchez Hmm. I haven't had that happen. I have had cases where it uses old versions of the API that are no longer supported, or even mixes different versions. That's a pain, but it still gives a good starting point most of the time. You at least know where to look in the docs for what you need.

    • @mandisaw · 2 months ago

      @purdysanchez Even if the docs are public. Google AI Search wasted 10min of my precious side-project time chasing an API hallucination. Looking up the real API manually, and writing the code I needed took about 10-15min. Reading docs isn't hard - reasoning about what you need from them relative to your problem-domain is what takes the time, and LLMs can't meaningfully help with that.

  • @marcbotnope1728 · 2 months ago +15

    What I have noticed is that the AI tools "kill" IntelliSense-based code completion, replacing it with randomly guessed completions that very often are not part of the interface of the object you are working on.

    • @matsim0 · 2 months ago +4

      Yes, right? Sometimes it's useful and saves you a trip to Stack Overflow or the docs, but often it gets in the way by suggesting code completions that make no sense, while IntelliSense would have given you the right answer immediately.

    • @ianosgnatiuc · 2 months ago

      Because it lacks the context. It does a lot better when additional context, such as the available types and methods, is included.

    • @nonickch · 2 months ago

      Oh god yes. It generates code and then suppresses all the IDE warnings about it. Good luck trying to spot that error when you've learned to rely on your IDE for the silliest of bugs.

    • @Roboprogs · 2 months ago

      Other non-MS IDEs also have type/inference based autocompletion, by the way. “Intellisense” makes it sound like only Microsoft can do this. But anyway, AI makes noise…

    • @gaiustacitus4242 · 2 months ago

      AI even includes deprecated object libraries which are no longer supported or distributed. It doesn't know that the code it was provided is no longer viable.

  • @chrisnuk · 2 months ago +10

    It's radically reduced the barriers to entry. There will be so much code written over the next few years. At the moment, the people who need help with their VBA, Python, or SQL query have disappeared. In a year or two, there will be a mess to clear up. I think our jobs will evolve, but they always have.

    • @almazingsk8er · 2 months ago

      I have worked with engineers where I ask them the same question in multiple instances, "What node version are you on?" and they ask CoPilot how to find that out each time. I know people building apps right now who don't have "node -v" memorized because they can just ask copilot whenever they need it. It's having a weird effect on how people learn to code. It's weird asking someone a question during pair programming, then watching them type it into a chatbot and then begin to answer the question after they skim the response it gave them.

  • @KulaGGin · 2 months ago +8

    Signed up. Interesting. In all the years I've been watching YT since 2006, I haven't come across an interactive video like this (an actual challenge).

    • @ContinuousDelivery · 2 months ago +1

      That’s awesome. I hope you find it something worthwhile and productive.

  • @leopoldodonnell1979 · 2 months ago

    I really appreciate this. This study will help to provide useful data-driven insights into how we can improve the productivity of our industry. On a smaller scale, we're doing an internal study to determine the impact of AI-assisted code, comparing DORA metrics, commit churn, and PR feedback against team norms, in addition to some less empirical data such as perceived cognitive load.
    All said, and from comments in this thread, it's clear that people are going to have a lot of feelings and opinions about AI-assisted code - this is human nature.
    My personal experience is that it has helped reduce the number of times I've had to go back to the manual or visit Stack Overflow. It's also, IMO, a better alternative to Low Code / No Code solutions, which nearly always end in tears.

  • @RiaanRoos · 2 months ago +1

    Over the last 24 months, I have been experimenting with different ways of pair programming with AI and think that although it is still limited in its real-world application, it is improving.
    The latest release of OpenAI's canvas is a step in the right direction to help with the problem of "new code" being produced each time you prompt AI for help.
    Plugins for 'co-pilots' also are getting better at alleviating this problem.
    I have had the best success when starting with well-written unit tests and then letting gen AI write just enough code to satisfy the tests!
    This mirrors my normal workflow the closest and constrains the AI well enough that I do not experience the hallucinations commonly seen when just 'letting AI do the work'.
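    For instance, a minimal sketch (my own, with hypothetical names, assuming Python and pytest) of the kind of hand-written tests one would give the model before asking for any implementation:

    # test_slugify.py - written by hand first; the model is then asked
    # to write just enough slugify() code to make these pass.
    import pytest
    from slugify import slugify  # hypothetical module the AI must produce

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("C'est la vie!") == "cest-la-vie"

    def test_collapses_repeated_separators():
        assert slugify("a  --  b") == "a-b"

    def test_rejects_blank_input():
        with pytest.raises(ValueError):
            slugify("   ")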

  • @matthewjamesbetts · 2 months ago +8

    Michael Feathers and Thoughtworks are doing and writing a lot of interesting things on this topic, including, for example, TDD as a way to get code from AI that is tested and fits your design, and AI at many stages of software development, not just coding.

  • @donharrold1375 · 2 months ago +2

    I’m not a professional programmer, but I do use coding to solve complex problems. I usually have a very clear idea of what I want to achieve before I start writing code, whether that’s a complex statistical analysis, machine learning or chemical engineering analysis. Thinking about the problem and developing an idea of how to solve it is much more important and time consuming than developing the syntax. In that respect, AI has been a game changer for me. Once I’ve set out the problem then code can be generated in a matter of minutes to solve it. That used to take quite a long time. So, AI yes it definitely helps people like me immensely.

    • @ikusoru · 2 months ago +1

      Let’s say that you use AI to crunch some numbers and analyze a dataset. How do you verify it is doing exactly what you want if you don’t fully understand the code syntax, and therefore the program generated by the AI? (There must be at least 1% faith in the generated code; you definitely can't be 100% certain.)

    • @donharrold1375 · 2 months ago

      @ikusoru I do work through the syntax and check that it makes sense. I also sense-check the results; fundamentally I understand what the output should look like. My point is that I don't have to dogmatically type every line of code. Entering code takes a lot of time, particularly if you don't type quickly.

  • @stevetodd7383 · 2 months ago +1

    This reminds me of when 4GLs were new and shiny. IBM gave one to all their staff and told them to use it to solve their problems. In the short term productivity went up as the programs created automated the work. After a while however they discovered that productivity had fallen below their starting point. On investigation the extra time was being spent maintaining the code. The cost of an application isn’t just in how long it takes to create.

  • @MikeOchtman · 2 months ago +7

    It generates bad code very quickly. You have to be able to code yourself, and use the AI to provide some of the boilerplate and tedious stuff. It does know some good algorithms, but it is not good at creating solutions to new problems.

    • @gaiustacitus4242 · 2 months ago +1

      Those good algorithms are stolen from someone's copyrighted work.

  • @esra_erimez · 2 months ago +91

    The short answer is: No. The longer answer is: No, it's not. I have used different AI tools to assist with development and it didn't work for me, although it was helpful for learning new topics and augmenting search engines.
    Edit: Here is a case in point. There are some deprecated library functions I needed to replace. The AI tools made up library functions that simply didn't exist. I tried changing the prompt, I tried giving feedback but in all cases never got a solution and they tended to go back to recommending the deprecated library functions after a while.

    • @homomorphic · 2 months ago +3

      Yeah, but I bet those APIs *should* exist.

    • @oysteinsoreide4323 · 2 months ago +3

      You can't use the code directly. But at least I use it as a tool to get ideas.

    • @sashbot9707 · 2 months ago +9

      I am much more productive with AI. And in my company I can show that I am around 2 times faster than my peers with the same code quality.

    • @Paul-uy1ru · 2 months ago

      Have you tried GPT o1-preview?

    • @purdysanchez · 2 months ago +7

      The biggest danger is that people use it to write code in technologies they are unfamiliar with instead of reading the documentation.

  • @technolus5742 · 2 months ago +2

    I think you are conflating 3 productivity use cases:
    - code completion (minimal use - single/few line completion)
    - code generation (maximal use - complex function code, full page/ multipage generation, project wide guidance)
    - mixed approaches (styles that mix the two extremes)

  • @AnnCatsanndra · 2 months ago +4

    In my subjective experience, it depends but is generally more effective than the old "copypaste from stackoverflow" approach.
    More useful than the in-IDE code assistants though are the LLM chatbots where I can ask specific _questions_ and then figure out what code I need from there and a little bit of verification for good measure. That and if I _really_ need to pump out some easy to do but tedious code, asking an LLM to intelligently fill that is faster than writing my own code generator for it.
    But if we're specifically talking about the autocomplete copilot style, yeah, I stopped using most of them because they break up my train of thought more than they help. Usually when I'm *actually about to type in the editor* I've already decided on what I plan to implement, and having huge overbloated contextless code snippets pop up while I'm doing that is more annoying than helpful.

  • @eaglethebot1354 · 2 months ago +2

    I use it for all the infrastructure stuff that I don’t feel like digging through docs to debug, such as CI/CD pipelines, infrastructure as code, networking, etc.
    Google says AI can’t reason, but anyone who’s tried to have it write code could have told you that. When you give it a task that only has a few correct answers, though, it does quite well.

    • @purdysanchez · 2 months ago

      I think it's a good starting point. Say you don't use a technology often. Have AI offer some initial idea, and then it's easy to check the docs to make sure the AI is actually doing what you asked for.

  • @mrpocock · 2 months ago +7

    I have had good experiences with AI code assistance. I probably count as a senior developer - been coding in one language or another since the 80s. You get code for free; it takes work and knowledge to get quality. Sometimes it is enough to let it make working code, and for that it is already there.

  • @alrightsquinky7798 · 2 months ago +3

    I write Clojure code, so my code bases are too small to even merit using AI. I have my whole code bases memorized. It’s nice to have a complex web server implementation with only a few hundred lines of code.

  • @timelschner8451 · 2 months ago +1

    The assumption that code generation is a one-shot, from-scratch trial is not the best way to use it. As a user you provide a high-level idea, break it down into smaller tasks, and iterate, feeding the code back to the LLM. In my experience so far, the LLM is great at creating correct and mostly effective code IF you prompt the system correctly and YOU know how programming in general works; then LLMs are very valuable. LLMs can take off the burden of learning syntax and writing tedious code blocks, help you learn and understand new concepts, and also, if you put in the effort, make you a better thinker and programmer. On the other hand, if one is just lazy and uses the LLM mostly as a copy-paste machine, hoping to put the brain in idle mode, one will lose brain power in the long run. Are LLMs good? It just depends on the individual mindset.

    • @itskittyme · 2 months ago +1

      This guy gets it. And it seems we are in a minority, bro xD

  • @immunoglobul · 2 months ago

    Thanks

  • @BaldurNorddahl · 2 months ago +1

    There is a tool called Aider that most developers don't know about. It is a different way to use AI for code generation. I have found that for some tasks I can use Aider to produce software much faster than would otherwise be possible. The trick is to learn how to use it, including when not to use it.
    While Copilot is something that tries to help in the editor, Aider is like a team member that you ask to do a task. Give it an example of an existing feature and tell it to implement a new one. I might have a feature to create an object in a database and need features to update and delete - a few moments later I will have just that, without coding a single thing myself.

  • @semsomify · 2 months ago

    I personally use it to generate code snippets for certain specific tasks. For example, if you're going to use a new API, you can ask it to generate a code snippet that's close to your specific uses of the API and also ask it to explain the API. It also helps for some data processing tasks. Also debugging. So it saves a lot of time you'd otherwise have spent debugging, reading documentation. When there's code I am very familiar with, using it would actually slow me down. But to generate full projects, I think we're not there yet.

  • @juaneshberger9567 · 2 months ago +11

    This is from working with medium-sized repos (I don't know how effective it will be with super large code bases). If you write good tests and documentation, and make sure your functions are small (do only one thing) and have clear names, Claude Sonnet gives really effective results when writing single functions. Also, if you have good tests, you can feed their output into the LLM and get good feedback.
    In summary: good code practices, TDD, documentation, clean code, small functions/classes and proper levels of abstraction are even more important for AIs than they are for people.
    Giving a project context to Claude (with docs and tests) and only asking for one function at a time has given me relatively good results.

    • @giorgos-4515 · 2 months ago +1

      Can LLMs handle a whole codebase's worth of information??

    • @EiziEizz · 2 months ago

      TDD is for gullible idiots that think that creep uncle bob is not a scamming psycho

    • @almazingsk8er · 2 months ago +4

      It does do well with OOP specifically, I have found. I used copilot pretty heavily to write C# code and it has a much easier time following strict OOP standards than when prompting it via chat.
      That said, I did run into instances where it suggested outdated libraries, nonexistent functions, or introduced very subtle bugs. The moment I began to "trust" the code it wrote was the moment I found myself debugging something nasty that was the result of a small hallucination. It requires quite a bit of diligence to produce quality code while working with an LLM is ultimately what I concluded, and whether or not the diligence is worth the time/effort spent was relatively situational.

    • @retagainez · 2 months ago

      @giorgos-4515 No. I think that's the purpose of RAG.

    • @ikusoru · 2 months ago +1

      @juaneshberger9567 I had a different experience using Claude Sonnet on small and medium-sized repos. Same as with Copilot and Cursor (well, and ChatGPT): the hallucinations are still quite a problem, and I find it takes me more time to review the code and fix issues than it would to write it manually.

  • @SupBro31 · 2 months ago

    I wrote a simple lexer in Go. Copilot tried to "improve" it by duplicating matches or getting the order wrong. I tried to cajole it into doing it right but ended up doing it myself. It probably cannot read regexes, and it fails at ordering them from most specific to most general.

  • @jahelation8658 · 2 months ago +1

    Copilot invents function calls that simply do not exist in third-party APIs. This is frustrating as hell. I've seen this especially with crypto code.

  • @kkiimm009 · 2 months ago

    I find AI extremely useful when I work with languages and frameworks I can read and understand, but don't know well enough to write effortlessly. It is also very good at converting stuff, say from a SQL table to an entity class and so on. I also recently fed it a description of an import text file that has fields at given column positions and lengths, and it wrote that parser without me having to do the boring work myself. It is also very good when I don't remember how I did something: instead of googling for an example, it usually gives me the correct answer on the first try. And so much more.
    And it is pretty good at not changing everything, only the place that needs fixing. If you already have some code and ask it to fix a problem, it doesn't give you whole new code; it just fixes the spots that need fixing. Usually.
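    As an illustration, a parser for the kind of fixed-width file described might come back looking something like this (a sketch with made-up column positions, in Python):

    # Sketch of a fixed-width record parser; the layout below is
    # invented for illustration: field name -> (start column, length).
    LAYOUT = {
        "customer_id": (0, 8),
        "name":        (8, 20),
        "amount":      (28, 10),
    }

    def parse_line(line: str) -> dict:
        # Slice each field out of the line and trim the padding.
        return {
            field: line[start:start + length].strip()
            for field, (start, length) in LAYOUT.items()
        }

    with open("import.txt", encoding="utf-8") as f:
        rows = [parse_line(line) for line in f]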

  • @mattbristo6933 · 2 months ago +2

    AI is great when you need snippets of HTML or JavaScript, or want it to check some code and offer improvements. When I have asked it to write code, it doesn't know the problem domain and cannot reason about it, and therefore the code is not always optimal.

  • @saucetguitars · 2 months ago

    I found Cursor AI incredibly useful personally. I don’t expect it to write fully functioning code, to me it is more of an autocomplete on steroids. It speeds up repetitive tasks and is usually pretty good at guessing what I’m going to type, which saves me a lot of time. It has on a few occasions helped me refactor some code by just asking it to do so and explaining what I wanted it to do, but it never does it right on the first try. You have to keep prompting it with corrections, and more often than not make manual changes too.

  • @ScottLahteine · 2 months ago

    I maintain a large and active open source project with a lot of C++ with meta-programming, and so far AI coding tools have been only marginally useful in the main codebase, mainly by providing better and more context-sensitive auto-completion while typing code. Where they have been more useful is in writing the support scripts, primarily in Python. These are not usually very complex, but the LLM still needs everything spelled out in advance in plain language to get the best result. It might be possible to get LLMs to produce better code by iterating more on smaller units using agents, allowing them to work from high level overview of the task down to the fine details in a tree-like iteration pattern, breadth-first I presume. This approach would also help deal with the relatively small context windows of current models. Add in a code review agent and even better things could happen….

  • @andrewdunbar828 · 2 months ago

    I use them to modify open source software. Sometimes they're too bone-headed to get what I want. Sometimes I get good help after asking several different ways. Sometimes they seem to both read my mind and have deep knowledge of the repo. Sometimes the code magically writes itself, just works, and passes review. Sometimes they are able to explain the missing links between the parts of the code I want to touch so I can write the interior connecting parts manually. Sometimes they break code that already works. Sometimes they're a step ahead of me and give me what I want just before I realize I want it.

  • @tradingisthinking · 2 months ago

    It is my second mind. I don't need to learn every detail of some tool; it helps a lot :)

  • @alanmacmillan6957 · 2 months ago

    I've been using AI to generate code, but it's heavily dependent on the definition you provide in the first instance, and on the examples from the internet it uses to construct the "solution". It's useful, but... you have to have the personal knowledge to understand whether the solution it presents really does what you actually want. Similarly, I've seen it generate code that on initial inspection seems to do what it says on the tin, but in subsequent testing either isn't suitable or needs significant rework. What you also find is that it brings a new type of technical debt: you get a lot done more quickly (initially)... but when it needs to be modified, enhanced, reworked to meet a new requirement, or a bug found and fixed, it can be hard unpicking something that you didn't write yourself, and the learning curve you avoided the first time round you have to overcome sooner or later. It's also diabolical when you need a solution to a more arcane and abstract problem (e.g. legacy machine code or proprietary hardware).

  • @erditanergokalp4311 · 2 months ago

    Is there a link to the research study paper?

  • @thepaintedsock · 2 months ago

    I've found it useful for building examples to help me move forward with technologies I am not familiar with. This is especially useful for DevOps, full-stack development and training, where it fills skills gaps.
    The downside is that the code provided is verbose and often has mistakes and made-up solutions. This can set a person back, so it can be a bit touch and go. It's either use that, or go on Stack Overflow, wait, and get your question brutally downvoted and closed, especially if it's DevOps or AWS.
    ChatGPT is a joke, but Claude and Cody are pretty good.
    The code, however, is boilerplate stuff and does not compare to the quality, speed and finesse of a coder fluent in the language.
    I only use it for my skills gaps.

  • @HarleyPebley · 2 months ago +5

    Reminds me of the old joke: Ask 10 programmers for a solution to a problem and get 20 different answers.
    For the new world: Ask 1 AI for a program as a solution to a problem and get 65535 answers.
    :-D

    • @Roboprogs · 2 months ago

      Did you mean to say get -1 answers? Signing off, now 😉

    • @andrewdunbar828 · 2 months ago

      Off-by-one error? Or 0 or -1 is an error flag?

    • @HarleyPebley · 2 months ago

      @andrewdunbar828 unsigned 16-bit max value

    • @HarleyPebley · 2 months ago

      Unsigned vs signed 😉

    • @Roboprogs · 2 months ago

      @andrewdunbar828 Two's complement integers wrap as you add 1 (from the biggest positive signed value to the most negative value) at just under half the max unsigned value: 0 … 32767, -32768 … -1

  • @Rick104547 · 2 months ago

    I sometimes use it for specific tasks, like generating a good name for a unit test. It also works pretty well to 'search' through documentation; of course you always have to check, but it's faster this way.
    For the rest it's really hit and miss for me.

  • @Danielm103 · 2 months ago +1

    I see a lot of people trying to use AI to generate AutoLISP code for AutoCAD. The forums are littered with people asking for help fixing AI-generated code. I tried to make an AutoLISP generator where the user could type something like "create a cube 100x100x100"; I would pass the info to the AI and try to execute the results. Kind of neat, but it didn't work too well.

  • @TheBackyardChemist · 2 months ago

    I think it is obviously not there yet, but that may change, given enough time and effort by the developers of AI systems. But I think if there is one paradigm that is best suited for code getting written by AI, it is TDD. Humans are required to write a bunch of good, new tests, and the task of the AI is to make all new tests pass without breaking any of the old ones. Putting in execution time and peak memory usage tests (fail if the program uses too much time/space), as well as a lot of random fuzzing tests is probably also a good idea if you try to do this.
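    As a sketch of that idea (my own, assuming Python with pytest, the hypothesis library for the random fuzzing, and a hand-rolled time budget), the human-written constraints might look like:

    # Human-written constraints for a hypothetical AI-generated sort_numbers().
    import time
    from hypothesis import given, strategies as st
    from ai_generated import sort_numbers  # hypothetical AI-written module

    @given(st.lists(st.integers()))
    def test_matches_reference_sort(xs):
        # Fuzzing: hypothesis feeds in lots of random integer lists.
        assert sort_numbers(xs) == sorted(xs)

    @given(st.lists(st.integers()))
    def test_does_not_mutate_input(xs):
        snapshot = list(xs)
        sort_numbers(xs)
        assert xs == snapshot

    def test_runs_within_time_budget():
        xs = list(range(100_000, 0, -1))  # a large reverse-sorted input
        start = time.perf_counter()
        sort_numbers(xs)
        assert time.perf_counter() - start < 1.0  # fail if it uses too much time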

  • @Yulrag · 2 months ago +2

    I found AI to be good at explaining obscure topics, like some Maven configuration aspects (the XML it gave me was faulty, but it pushed me in the right direction). But to write code by prompt? It would take me longer to write a good prompt than to write the code itself.
    I would say this incarnation of AI assistant is good at preventing procrastination at difficult points (since instead of googling you just ask a question and get an exhaustive answer), but it won't improve the code itself. Then again, I have not worked with something well integrated with an IDE yet, so I may be wrong on this.

    • @tedand · 2 months ago

      @Yulrag Try Cursor IDE; it's a VSCode fork that is very well integrated with LLMs like Claude Sonnet. It makes a fantastic assistant for me.

  • @xananacs · 2 months ago +2

    AI is fairly good as an expensive and energy-intensive snippet provider.
    If the exercise is something that could be defined as a starter template, like a Node HTTP server, I would guess the study can't yield any interesting result. Of course AI can do that very fast, but so could a snippet collection.
    The exercise has to be about making something new.

    • @mandisaw · 2 months ago

      A good real-world example would include incomplete or domain-aware specs. So much of the online training data is basically beginner web tutorials, or documentation samples, so it's probably useful for people who only handle those sorts of basic tasks. But production software usually involves a lot of decisions, even when the final code isn't all that novel.

    • @xananacs · 2 months ago +1

      @mandisaw that's a good insight, yeah

  • 2 months ago

    Would be interesting to read that study. I worked in Swedish IT a lot; I know that Lund University is among the top there.
    Now I am in different territory, trying to evaluate whether Copilot is good or not.
    It looks like juniors love it while seniors hate it.
    So I think this tool generates the statistical median.

  • @matthewhussey1980 · 2 months ago

    Hi, I've signed up but can't find a contact link for questions, so I'm asking here. If using AI for the study, should we be using it as if it is leading, or as an assistant?
    For example, I have recently been coding with AI as if I'm an idiot, to see how much it can do. This isn't how I would do it if trying to be productive, which is more a few questions, but mostly using it as a speed-up for IntelliSense.

  • @wonseoklee80 · 2 months ago

    It’s improving in ways we’ve never seen before. While it’s not a direct replacement for human coders, it serves as a significant performance enhancer. Until AGI emerges (whether in the near or distant future), AI will play a crucial role as a productivity booster for all developers.

  • @PatrickMetzdorf · 2 months ago

    This understanding of AI-assisted code may be a bit outdated, though.
    Using a web app like ChatGPT, yes, you get from-scratch code every time, indeterminate and vulnerable to quality issues every single time.
    But the tooling ecosystem (e.g. Cursor) allows you to use AI assistance in a more controlled and granular fashion, with in-line editing and context enhancements (framework documentation, code conventions, etc.).
    So the more productive way is not to ask ChatGPT to "Write an app that counts sheep", but to set a framework for what your code should be like, e.g. via a set of types, a code-convention markdown file, etc., and then give it more precise instructions, e.g. "implement this interface with the params object I have given you" or "update the Feed component to add error handling for the case when responses are slow", and so on...
    We need to use AI for what it's good at (doing the low-level grunt work and explaining things), and give it clear instructions for what it is not good at.

  • @brianpendell6085 · 2 months ago

    Here's a concern: are AI coding assistants going to inhibit the development of better coding practices?
    I ask because I'm watching your excerpt on functional programming. What if, back in that time, the coding had all been done by AIs trained on 2000s-era OO code? How would we ever adopt a new programming methodology, or push for "clean code", or add dependency injection, if all the copilots are trained on last decade's code? There aren't anywhere near enough samples, at the start, to train AI on new approaches. Does this mean that AI will act as a boat anchor on the development and innovation of new programming approaches, since it will always try to pull its users back to what it was trained on?

  • @devsuvara · 2 months ago

    AI tools are just search tools. I use them to look up things I could find in the broader pool of experience, like which API I should use for this, or what the call is to run a quick sort on a list of TypeScript objects. That kind of thing.
    I’ve used it once to write a program, but then replaced most of the code anyway.

  • @joseph.cotter · 2 months ago +1

    A couple of quick comments. First, this type of study, without some structure as to how the developers would use the AI model, is IMO close to useless. Second, repeatability is only an issue if version control is offloaded to the AI, which is a pretty obvious failure of an assumption. If, on the other hand, the AI is tasked with looking at code and generating, refactoring, etc. on a component-by-component basis, with the developer acting as the one maintaining overall structure and version control, we are talking about an entirely different situation. Could AI handle the whole task start to finish? At some point, maybe, but I don't think that is feasible yet. Can we offload many tasks to AI while using the developer as a critical 'manager' of the process? Well, that is a much more interesting and useful question.

  • @寺内宏之 · 2 months ago

    From my experience, if I know the logic already, ChatGPT can boost my productivity in the code generation area. It is like coding in natural language.

  • @godzilla47111 · 2 months ago +2

    I have good experience co-coding with a rubber duck that will very patiently listen to my reasoning about why some code would or wouldn't work. And I have even better experience discussing code with GPT.
    So, if anyone is disappointed in generative AI, maybe try a different approach with it.

    • @mrpocock · 2 months ago

      It is really good as that sounding board.

  • @russellormes776 · 2 months ago

    You can use it to generate the code that gets the test to pass and then refactor to your heart's content.

  • @fandorm · 18 days ago

    Personally I think the argument most people are having around AI coding is framed wrong. Most of it seems to come from the perspective that AI should be doing everything, and I just don't see that. I am using AI as a coding assistant. It helps me code, and often it helps me with lots of details like geometry and trigonometry and stuff that I don't remember from school (I've been working in development for 25 years...). But I am always in control of the structure of the code. I often manually refactor the generated code (sometimes I ask the AI to refactor) and do cleanup, because I do want to work with maintainable and well-structured code. And when I have reached a state that is working, I commit; then I do more generation, or cleanup with or without AI, and make new commits when in a working state.
    I go back and forth very iteratively, and I don't feel the code produced is worse now than before, but I feel I can move faster, and I don't fear tackling problems now that may have made me hesitant before. It makes me more creative and helps me solve difficult problems I didn't even try before. But I am still in control.
    I get that doing a prompt and expecting AI to output a complete program won't work. But don't do it like that. Keep working iteratively. Use it as a pair programmer. Be critical and keep using your own brain. And it will be just fine.

  • @youroldmangaming8150 · 2 months ago +2

    I've set up an automation with well-defined inputs and outputs. I put together an AI bot that creates code for the desired outcomes with set inputs. I just black-box it: it iterates away by itself until it gets what I have asked for. If it diverges, it stops going down that part of the path and returns to where it was closer to the desired outcome. This way I keep control of the architecture. If the black boxes don't perform, I look at optimizing those individually. That said, this is just to generate a proof of concept; once there is something close to what I need, I start to look at the code generated and apply best practice. I'm not sure this is workable in a professional environment just yet, as I stopped doing this for money many years ago. But the one thing I can say about change is that if you don't embrace it, you will be left behind.

    • @ikusoru · 2 months ago +1

      "if you dont embrace it, you will be left behind"
      That is quite a statement from someone who doesn't write software profesionally for living (aka lacks experience on the topic). It is more of an opinion. 😅

    • @youroldmangaming8150 · 2 months ago

      @ikusoru Yes, it is. That is quite a statement from someone who no longer writes software professionally after 30 years of doing it for a job.

  • @brianpendell6085 · 2 months ago

    Here's a question: if AI is writing the code, how important is it that the code be maintainable and readable by humans? Think of the effect of compilers on assembly language: compiler-written assembly is very hard to read and is not optimal compared to hand-tuned assembly, but I haven't had to work in assembly language since the 90s.
    I wonder if AI-driven copilots might be to the modern programmer what compilers were to assembly. On one level, it wiped out an entire profession -- how many people work professionally in assembly today? But on another level, it opened up entire other professions that didn't even exist in the 1960s. Software engineer, full-stack programmer, game developer, website creator -- none of these were possible in the 1960s because the computing tools were too primitive to allow them to be created easily. It's possible that AI will have the same effect on our own job space, creating entire new professions, such as promptcraft engineer or MLOps specialist, which didn't exist even ten years ago.

  • @Marko-di3mb · 2 months ago

    The biggest limitation of LLMs is that you should only apply them to problems you could have solved yourself, or at minimum whose solution you fully understand. So don't use them to solve problems you have no knowledge about. In other words, you must be able to fully stand behind those lines as if you wrote them yourself.

  • @ErazerPT · 2 months ago +1

    Is it any good... well, I'd say it depends. Give it a very small, specific thing to do in a domain it knows a lot about, and yes, it will do OK, better than a human with zero domain knowledge at least. Give it something complex in a domain where it only has cursory knowledge and... well...
    All in all, we're still at the "hard AI" problem that killed it back in the day and won't let it go further anytime soon: the intractable "how do we reduce it to some form of formal logic so it can self-check for correctness?". And what is correctness, after all? It builds? A does what A is supposed to do (even if it messes up B)? Both? More?

  • @ariverosmg · 2 months ago

    I think the experiment is quite preliminary... because it will give very different answers depending on whether the devs using AI are just using autocomplete-style tools or real AI-powered development tools like Cursor IDE. And assuming they're using those good tools (and not simply GitHub Copilot), it will still be very different depending on whether or not they're prompting the system to generate high-quality, maintainable code all the time. I write code with AI constantly; at this point, I cannot imagine how somebody would argue it isn't faster and better, with better quality. My only answer to this is that most devs have still to figure out that you should request the right things from the AI; you are the guide who ensures quality... I build using TDD: I ask it to create the test from the acceptance criteria I have to fulfill, then the code to satisfy that test, then refactor if needed... and this works too well to ignore; it is fast and efficient... But I guess that if you just hand over your task and use whatever it returns, then yes, that's not useful at all, and you end up with the idea that it doesn't help.

  • @PaulRoneClarke · 2 months ago

    It’s a little better than going to Stack Overflow and copy-pasting from there. Also quicker.
    But that’s about it. Very handy; I have it open all the time. But for large multi-modal applications it’s absolutely hopeless.
    And if you, as a coder, can’t spot and correct the large number of bugs, the code will never be right.
    And if you can spot and fix all the bugs… you’re doing 80% of the job anyway.

  • @robw3610 · 2 months ago

    So far I have had good experiences, at least with JetBrains AI Assistant in Rider, but I am not using it for large swaths of code. I am mostly using it for documentation and writing boilerplate. I tend to find that I get better "help" from ChatGPT 4o, with the main advantage being able to learn new APIs and frameworks a lot faster than looking for online guides and tutorials that are almost always not geared to what I am trying to do.
    While I think the in-editor assistants are good for boilerplate and refactoring, I don't trust the output enough for production code, at least not without heavy scrutiny.

  • @qj0n · 2 months ago

    In technologies I know well enough to write good code, it's quicker for me to just write it than to generate it via Copilot and fix it afterwards. However, when I had to do a small task in a popular language I code in very rarely (JS), Copilot produced something probably not-so-good, but not as bad as mine would have been (and much quicker than me).
    I believe that, at best, copilots will make it easier to build cross-functional teams and exchange work within a team, and that's a good thing about them. But I don't believe managers will understand it.

  • @nickbarton3191 · 2 months ago +3

    Security worries me. I've signed an NDA; it didn't.

    • @mandisaw · 2 months ago +1

      Liability and risk-management as well. When a developer (or team) f*cks up, they can be trained, reprimanded, fired, or even sued. Can't reprimand the AI.

  • @purdysanchez · 2 months ago +1

    I would say it's in the uncanny valley. If you glance at it quickly, it looks like code.

  • @fredgeorge6513 · 2 months ago

    In my object training class, students recently (with AI enabled) get the correct answer generated. Of course, my reference solutions to my exercises have been scraped and added to (at least one) AI engine. So it is good code, in my opinion. After all, I wrote it! But these same programmers didn't even understand what that generated code actually did. So maybe another question is: "Do programmers even understand the AI-generated code?"

  • @njiahtata2267 · 16 days ago

    For junior programmers AI helps move fast. For senior programmers with lots of experience, AI slows down the rate of work.

  • @ingo-eichhorst · 2 months ago +2

    [3:45] If you set temperature to zero and make sure to use the same seed (if you can configure it), the LLM indeed produces the same output. To double-check, I've just generated the exact same unit tests for a bubble-sort algo twice with Qwen2.5 at temp 0, and no diffs were found.
    Non-determinism is a lie, and a result of a lack of understanding of LLMs (and of missing config options, sometimes).
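    What that looks like in practice, as a sketch against the OpenAI Python client (model name illustrative; seed support varies by provider, so exact-match output is the ideal case rather than a guarantee):

    # Two identical requests pinned with temperature=0 and a fixed seed;
    # with a provider that honors both, the outputs should match.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def generate() -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,  # always take the most likely token
            seed=42,        # pin whatever sampling randomness remains
            messages=[{"role": "user",
                       "content": "Write unit tests for bubble sort."}],
        )
        return resp.choices[0].message.content

    print(generate() == generate())  # True when determinism holds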

    • @ikusoru · 2 months ago +1

      @ingoeichhorst1255 Maybe the reason why OpenAI and the like make the models non-deterministic is to avoid legal liability for using copyrighted content to train them?

    • @ingo-eichhorst · 2 months ago

      @ikusoru I really like the idea, but the term “temperature” was first introduced in the 1980s with the advent of Boltzmann machines. Probably before Altman was born 😇.

  • @ianosgnatiuc · 2 months ago +1

    A good programmer will make the code maintainable with or without assistants. A bad programmer will make it unmaintainable no matter what.

  • @philadams9254 · 2 months ago +4

    It's as good as your prompt. Most times I ask something of AI, 99% of my prompt is code style and guidelines. You can be incredibly specific and it will deliver. I'm not just talking about "use spaces instead of tabs" and the minor details - you can tell it to do FP/OOP or avoid certain patterns etc

    • @edgeeffect · 2 months ago +8

      But by the time you've found the correct language to specify all of that and then adequately checked the results are correct, do you think that you could have just written the code?

    • @philadams9254 · 2 months ago

      @edgeeffect I don't understand the first part of your sentence? I have a standard template of instructions I just paste in, and I know whether it's right or wrong straight away, as I'm only asking for small quantities of code each time. Yes, I could write the code, but sometimes there are boilerplate things or complex array structures that I struggle with.

    • @ikusoru · 2 months ago +3

      @edgeeffect Exactly! I find that 9 times out of 10, coding with LLM assistance takes the same amount of time or more to write a function as doing it without.

    • @mandisaw · 2 months ago +1

      @philadams9254 I think @edge's point was that if you're using a template of specific instructions incl. code style, guidelines, domain specs, etc., then you're basically already 90% of the way to writing the actual code. Most of the time spent, in my experience, isn't writing code; it's working out the solution to a problem that suits your business needs, constraints, & strategy. By the time you know the solution, actually writing it is quick.

    • @philadams9254 · 2 months ago +1

      @mandisaw OK, I agree there. But what are you guys doing for boilerplate/UI stuff? I can draw out a UI on paper, upload it, paste in my standard instructions and have a UI created in minutes. Please don't tell me you'd rather code that stuff yourself? Front-end people may have the skills to do it, but I'm a backend guy, so I use LLMs to speed through my weak points such as UI design.
      Also, what about when you "know the solution" as you say, but you don't know how to implement it? For example, sometimes I have a complex array/object and need to transform it into a different structure to fit the solution I know I need to create, but don't know how to do it. Often the AI will be able to figure it out and in ways I'd never be able to figure out, so in the end I actually get smarter from it in some cases.

  • @julianbrown1331 · 2 months ago

    If you constrain AI code generation with effective tests (thinking TDD), then does it matter whether the code is reproducible? If it meets the same constraints as TDD (notably code coverage) and is passing all tests, then it is arguable whether reproducibility matters. In effect your tests are encoding the problem, and genAI is inferring the solution. Granted, that isn’t how it works now, but if it did, would that constitute a viable model for AI coding?

    • @barbidou
      @barbidou 2 months ago +2

      No, it would not. First, there is no such thing as 100% test coverage except in very small artificial domains where you can test all possible input combinations. Second, when something goes wrong, it becomes much more difficult to find out which changes need to be reverted. Maintenance-wise, it is a recipe for a nightmare.

    • @julianbrown1331
      @julianbrown1331 2 months ago

      @@barbidou I'm just playing devil's advocate here...
      The domain in question is the code being generated, so I would argue that 100% code coverage is both achievable and desirable, to stop your genAI adding code that a) you don't need, and b) could introduce undesired behaviours. You still need to resolve the same boilerplate problems (depending on the language in question), but those are an understood problem.
      The code being generated doesn't have to be the entire solution, so this isn't radical. You wouldn't try to create a single monolithic unit test suite for your entire solution...
      The premise isn't new either: 5th-generation languages were supposed to work in a similar fashion, and they have been around for 40 years. The problem was that the AI available then couldn't cope, and it all went out of fashion within a few years because it didn't scale to more complex problems.
      As for "fixing" problems: the generated code is disposable; instead you revise the tests to refine behaviour. You don't even need to put your generated code under version control, because it is so disposable and, as you've pointed out, a complete nightmare to maintain by hand.

    • @grokitall
      @grokitall 2 months ago +2

      @@julianbrown1331 If, as you accept, it is a nightmare to maintain by hand and not suitable for version control, it is almost certainly bad code.
      Most reasons for code being hard to maintain come down to breaking good practice, and code generated from largely untested training data is almost certain to be a testability nightmare as well, not to mention the security issues, copyright problems, etc.

    • @julianbrown1331
      @julianbrown1331 2 months ago

      @ The lack of need for version control of generated code comes down to the lack of repeatability: changes to the tests result in (potentially) radically different code. You would still capture it, but comparing deltas is pointless.
      The point is to treat the code as a black box. The code in the box only has to meet a few criteria: that it passes all the tests and meets the constraints on coverage (although that is itself a test). You can add static analysis as a test too.
      The real crux is to let go of the generated code and accept that it meets the requirements. You aren't coding the product, only the tests.
      Personally I'd rather worry about the code, but that isn't the premise of the question I posed.

    • @grokitall
      @grokitall 2 months ago

      @@julianbrown1331 Yes, you can treat it as a black box, only doing version control on the tests, but as soon as you do, you are relying on the AI to get it right 100% of the time, which even the best symbolic AI systems cannot do. Also, the further your requirements get from the ones defining the training data, the worse the results get.
      The copyright issues are also non-trivial. When your black box creates infringing code, then by design you are not aware of it and have no defence against it.
      Even worse, if someone infringes your code, you again do not know, by design, and cannot prove it, as you are not saving every version. And if you shout about how you work, there is nothing stopping a bad actor from copying the code, saving the current version, letting you generate something else, then suing you for infringement, which you cannot defend against because you are not saving your history.
      It is the wrong answer to the wrong problem, with potentially huge legal liabilities.

  • @trignals
    @trignals 2 months ago

    Interesting project.
    I would like to see phase 2 onwards mix in AI. If it is possible to use AI to deliver a solution comparable to maintaining the code, do you even care about maintenance? It extends the viable timescale, even if all the code is regenerated.
    The value of maintenance is that it is the fastest way to deliver change. If you can establish confidence in new solutions fast enough, why maintain?
    I'd guess it's a harder experiment to run, but also a lower hurdle for AI code to pass. Looking forward to hearing the results on this one.

    • @barbidou
      @barbidou 2 months ago +1

      The cost of verifying that a regenerated solution really does what it's supposed to do can be prohibitive. Small, localized changes during maintenance are less likely to affect the solution as a whole. Code regeneration, on the other hand, calls for full-blown regression testing.

    • @grokitall
      @grokitall 2 months ago +1

      The value of maintenance is not in speed of change but in the fact that, when done well, it produces ever-improving code which is easier to change. This requires minor updates that make specific changes to achieve particular kinds of improvement, which in turn requires understanding why the code is less than optimal and which change is the better one to make.
      This is fundamentally at odds with how statistical AI works, and when you regenerate sections of code in big blocks, you have no reason to believe that what it guessed this time is any better than what it guessed last time, or that it is not throwing away better code and replacing it with something worse.
      It also fundamentally breaks the whole idea of version control, as it is impossible to create good commit messages, and you are repeatedly bulk-replacing large chunks of code rather than evolving it.

    • @trignals
      @trignals 2 months ago

      @@barbidou Agreed, the cost could be prohibitive. Alternatively, it might not be. If a study prohibits the attempt, it can't make any comment either way.

    • @trignals
      @trignals 2 months ago

      @grokitall I've tried to see how "ever-improving code which is easier to change" could have a distinct meaning, and I've failed. To me it's exactly the same thing in different words.
      Readability and good design make it easier to make changes; they embed tacit knowledge of the problem domain, like clustering it into areas where stability and confidence are high, distinct from areas where either or both are low.
      As a technique, automated testing lets us quickly check that we have preserved the behaviours we care about after a change. This is coupled to the production code, not in the sense of the "tightly coupled tests" code smell, just in that we exercise parts of the production code by calling it by name.
      However, none of this leaves any trace for the user. They interact with the code as a large black box. The user does not know whether any code from one release to the next bears any similarity to its predecessor, or whether the underlying structures have been preserved.
      Version control, automated testing, etc. are all techniques geared towards a particular paradigm of code generation. They are techniques for making it safe to work at a particular level of abstraction, a particular intermediary step between the user and the machine code.
      They are of value because of the assumed workflow. When a study looks at a new technique precisely because it could fundamentally alter the cost-benefit analysis of the whole workflow, it is noteworthy that it deviates as little as possible from traditional workflows.
      So I fully agree with your larger point that it is at odds with how a generative AI works.
      I expect the research team is aware of that and made the choice because they are asking and answering the simplest question first. That will leave them better placed to ask follow-up questions. But I didn't take part in those conversations, so I'm having fun discussing it here.
      Go well buddy

    • @trignals
      @trignals 2 months ago

      @@grokitall Re-reading this, it strikes me that maybe you assumed I meant a one-time low cost of change, as opposed to a persistently lower cost of change. Is that the distinction you wanted to make?

  • @raybod1775
    @raybod1775 2 months ago

    Internal corporate AI systems program and design much better than what's offered to the public. Retrieval-Augmented Generation likely draws on internal documentation and internal code to keep generated code correct and maintainable.
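
    For illustration, a minimal sketch of what that retrieval step could look like; the embed and generate functions are hypothetical placeholders, not any particular vendor's API:

    ```python
    # Minimal sketch of retrieval-augmented generation over an internal codebase.
    # embed() and generate() are placeholders; swap in whatever models your
    # organisation actually uses.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        raise NotImplementedError("call your embedding model here")

    def generate(prompt: str) -> str:
        raise NotImplementedError("call your code-generation model here")

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def answer_with_rag(question: str, documents: list[str], top_k: int = 3) -> str:
        q = embed(question)
        # Rank internal docs/code snippets by similarity to the question...
        ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
        # ...and prepend the best matches so the model follows house conventions.
        context = "\n---\n".join(ranked[:top_k])
        return generate(f"Context from our codebase:\n{context}\n\nTask: {question}")
    ```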

  • @nschul4
    @nschul4 2 months ago

    I don't see how there's so much room for debate on this. When we first got LLMs, they weren't very helpful to coders; in only two years we've seen them become very helpful as assistants. In two more years...

    • @ErazerPT
      @ErazerPT 2 months ago

      You're falling into a damning "scaling fallacy", and every serious ML practitioner who isn't trying to sell some ML-based "AI tool" will tell you that. If you're not sure what I'm talking about, the issue is called diminishing returns. If in doubt, go check image classifiers and detectors. Sure, they're improving, but... the times of "big returns" like LeNet > AlexNet > SSD > YOLO are long gone. Now you fight over a 0.05% improvement with an exponential increase in resources spent. OpenAI is already hitting the wall hard, and so are Llama and every other LLM, because the approach is fundamentally flawed, if fun.

  • @scycer
    @scycer 2 months ago

    All these comments are quite interesting; more against it than I'd expect. I've found AI tools to be much closer to humans than people realise when coding. If you told a dev to make something in code, with only the context of the current file and the words you wrote (which themselves are open to interpretation), then it's likely the quality would be low. Asking an AI to do it twice is obviously going to produce different results, and so would I, if there were enough time between sessions to completely forget my past implementation.
    I think, fundamentally, once we start getting more and more "context" into the systems we use in software development, abstracted at the right level and provided as context to LLMs with multi-stage iterations, we will see the big shift in productivity. Aider already has massive potential, starting down this path with /ask and /architect to plan and research before implementation. If we start adding in BDD, better requirements, tailored small language models, and other feedback loops, it's likely to keep improving dramatically.

  • @igors634
    @igors634 2 months ago

    We cannot stop innovation, but we should not push something that will leave us jobless. I'm talking about the programmers who develop AI programmers.

  • @ThomasTomiczek
    @ThomasTomiczek 2 months ago

    They are better than you think, except not the ones you tested. Current tools have serious limitations that are being worked on, but there is some stuff in development that COULD work better (should, actually), though it is not going to be as cheap as people think. We are talking about agentic AI frameworks with a 100-million-token context window and near-perfect recall: enough to load a lot of documentation and codebase in to analyse it. Until an AI can put together its own context by researching the relevant code, you are very limited in what it can see at a time.
    Hallucinations are less of an issue if the AI does not just write code but also compiles and tests it. Errors happen, also with humans, but there is no reason a properly INTEGRATED system cannot fix them. In fact, when I have an AI do coding, it is often a lot of back and forth with me acting as the hands of the AI: put code in, report errors (also from unit tests). There is NO reason the AI could not do that itself. Heck, an AI should, if under e.g. git, branch off, work with source control, then submit the merge request once it has everything done. But it seriously sucks how they are currently integrated.
    Btw., reproducibility is "solved"; that is where the chat part comes in. They take a random sample of the next token so that answers do not repeat themselves. For code, you want this off, always taking the "best" token, but in chat interfaces you cannot change the temperature, and even with the API you cannot change it for parts of the answer. Again, a problem of the integration.
    I look forward to the moment AI can do refactoring autonomously. This is not about code quality, but it basically requires the complete stack under AI control, including running and fixing issues in unit tests. THAT will make them useful.
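
    For what it's worth, "always take the best token" (greedy decoding) is exposed when you run a model yourself; a minimal sketch with Hugging Face transformers (the model choice is illustrative):

    ```python
    # Sketch: greedy decoding, i.e. argmax at every step, so the output is
    # repeatable for a given model and prompt. The model name is illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "bigcode/starcoder2-3b"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("def fibonacci(n):", return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # disable sampling entirely: deterministic completion
    )
    print(tok.decode(out[0], skip_special_tokens=True))
    ```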

    • @grokitall
      @grokitall 2 months ago

      The problem with the idea of using statistical AI for refactoring is that the entire method is about producing plausible hallucinations that conform to very superficial correlations.
      To automate refactoring, you need to understand why the current code is wrong in this context. That is fundamentally outside the scope of how these systems are designed to work, and no minor tweaking can remove the lack of understanding from the underlying technology.
      The only way around this is to use symbolic AI, like expert systems or the Cyc project, but that is not where the current money is going.
      Given the currently known problems with LLM-generated code, lots of projects are banning it completely.
      These issues include:
      Exact copies of the training data, right down to the comments, leaving you open to copyright infringement.
      Code with massive security bugs, because the training data was not written to be security-aware.
      Hard-to-test code, because the training data was not written with testing in mind.
      Suggested code that is identical to code under a different license, leaving you open to infringement claims.
      Code identified as generated is not copyrightable, but if you don't flag it up, the liability for infringement moves to the programmer.
      And the only way to fix a model that generates bad code is to completely retrain it from scratch, which does not guarantee fixing the problem and risks introducing more errors.
      These are just some of the issues with statistical methods; there are many more.

    • @ThomasTomiczek
      @ThomasTomiczek 2 months ago

      @@grokitall Btw, you miss the reason for the refactoring. It requires a good overview of a larger codebase, as well as use of source control AND IDE/developer tools (to run unit tests). We are not there yet; in particular, a large codebase is a PAIN for now. It gets worse if the app in question is visual and requires, e.g., analysing screenshots, but even leaving that aside, a large application is a real problem for now just from the context.

    • @grokitall
      @grokitall 2 months ago

      @@ThomasTomiczek I do not underestimate the potential of AI to help with all sorts of programming tasks, only the implausibility of getting there with the currently popular statistical AI systems, the whole point of which is that you throw a lot of data at them and they do not create a model of the system, just a bunch of plausible correlations, each of which could be noise.
      You could have a swarm of symbolic AIs, each looking at the codebase and figuring things out that other parts could then make use of, which in turn could help take legacy code from an untestable big ball of mud to a much better design. But to do that, it needs to have some clue as to what the existing code is doing, why, and how to move it in the direction of something better.

    • @ThomasTomiczek
      @ThomasTomiczek 2 months ago

      @@grokitall I think the statistical approach is good; we are simply training the models wrong at the moment. There is plenty of research about that. That, plus dropping the expectation that the AI gives a perfect answer, i.e. allowing agentic self-correction, is likely a much better way. It also helps that we will hopefully soon get much bigger contexts ;)

    • @grokitall
      @grokitall 2 months ago

      @@ThomasTomiczek The whole approach is based on not modeling the problem, looking instead for ever larger numbers of low-level correlations, which may be purely coincidental, and hoping it comes up with something good enough.
      When you point out this problem, the suggested solution is just to throw more of the same at the issues, which will somehow magically solve problems caused by how the tech works.
      Turtles all the way down does not work in philosophy, and it does not work in software.

  • @Adargi
    @Adargi 2 months ago

    I have found it to be more of a hassle than any sort of help. It wastes far more time than it saves.

  • @drancerd
    @drancerd 1 month ago

    I need that t-shirt!!!! ❤️

  • @lewke1059
    @lewke1059 2 months ago

    Another potential issue with the GitHub study:
    Who actually writes an HTTP server? There's a myriad of better pre-built packages and solutions for that, and the specs are quite large and nuanced. It wouldn't surprise me if that task was cherry-picked to make AI look more useful in development.
    I would rather see AI help write a convoluted e-commerce site, something far more applicable to most developers now.

  • @amirnathoo4600
    @amirnathoo4600 2 months ago

    The problem is also that no AI in the world can compensate for poor human programming practices. If a programmer doesn't believe, for example, that unit tests are valuable, or that separation of concerns and modularity are not just nice to have but required in many cases, they will never prompt the AI to follow these practices.

  • @kamertonaudiophileplayer847
    @kamertonaudiophileplayer847 2 months ago

    AI should be trained on the best code samples; then it will generate code at the level of an average to skilled developer.

  • @jacquesduplessis6175
    @jacquesduplessis6175 2 months ago

    The studies I've seen and done showed AI greatly speeding up junior developers on simple tasks, but giving little to no improvement to seasoned developers. AI tools actually slowed down pro-level developers doing very complicated tasks. It's also language-specific: for me, writing a simple CRUD function in JavaScript is fast with AI tools because it's been done a million times, but complex Rust-based systems have been greatly flawed and more often than not unrunnable.

  • @diamantberg
    @diamantberg 2 months ago +1

    "Java is to JavaScript as ham is to hamster." - Jeremy Keith
    Nevertheless, I'm going to take the survey.

    • @ikusoru
      @ikusoru 2 months ago +3

      I feel the same about AI and the hype around it 😅: "AI is to Artificial Intelligence as ham is to hamster."

    • @diamantberg
      @diamantberg 16 days ago

      I like this analogy

  • @corsaro0071
    @corsaro0071 2 months ago

    The answer is: yes, if you already know what good code looks like

  • @josemartins-game
    @josemartins-game 2 months ago +2

    Occasionally. But most of the time it's crap.

  • @bitshopstever
    @bitshopstever 2 months ago

    Many of the coding assistants provide a PR-like or git-patch view of proposed changes, which negates a few of your points, or at least tries to.
    These things are arrogant junior engineers who claim they know everything, yet they can produce value with the right oversight.

  • @gaiustacitus4242
    @gaiustacitus4242 2 months ago

    Unless the code you need to write is a minor variation of code that the AI has ingested, it can't write a solution. If you're really working on something innovative, then all AI can do is autocomplete API calls and simple code blocks from applications found in publicly accessible repositories. And if you're making a knock-off of an existing application, why bother?

  • @Kenbomp
    @Kenbomp 2 months ago

    It's not as good as a professional, but it's much, much cheaper, so there will be a market for it.

  • @mccleod6235
    @mccleod6235 2 months ago

    AI is very useful for quick scripting, I find.

  • @derekcarday
    @derekcarday 2 months ago +1

    Which AI? Because yesterday's AI is very different from today's. Don't blink, Dave. Things are getting weird.

  • @DaveKirby-58
    @DaveKirby-58 2 months ago

    "If you give the same LLM the same prompt twice it will write two different versions of the code". Guess what - if you give the same programmer the same specification twice, s/he will also output two different versions of the code.
    The usefulness of AI in coding depends very much on how you use it - if you expect it to output complete working code all on its own you are going to be disappointed. If you treat it as a pairing partner and co-develop the code incrementally, with tests, then you will get much better results. This playlist has a good set of tutorials on using LLMs well for coding (not mine, I just found it useful) - ruclips.net/p/PLk7JCUQLwRrMBCxQRKNTVslHyro_CbT5b

  • @BubbuDubbu
    @BubbuDubbu 2 months ago

    Come on, Dave, this study isn't going to tell us much about the effectiveness of AI-assisted programming. Even if the AI you choose doesn't produce maintainable code, who is to say the next version won't be better?

  • @bestopinion9257
    @bestopinion9257 2 months ago

    It is, after you explain several times that you want something else.

  • @RyanElmore_EE
    @RyanElmore_EE 2 months ago +1

    On the point that asking for code generation twice with the same words may give different results: if you give the problem to two humans, will they give the exact same code to solve it? There's more than one way, no? Why is AI held to different accuracy/repeatability KPIs for 'success'? (I don't know why either.)

  • @itskittyme
    @itskittyme 2 months ago +1

    It's quite shocking to me how supposed programming "experts" fail to see what a game changer AI is here.
    They nit-pick about AI supposedly writing poor code, but fail to see that this can be mitigated by applying clever strategies.
    And if you do, it changes everything.
    How do coders, of all people, fail to see that? That's the real horror here.

  • @marcelotemer
    @marcelotemer 2 months ago

    95 devs? What was the p-value?
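
    For a back-of-the-envelope sense of what n = 95 buys, a sketch assuming a simple two-arm comparison (the split, alpha, and power targets are assumptions, not the study's actual design):

    ```python
    # Rough illustration: with ~95 developers split into two groups, what
    # effect size could a study even detect? All numbers are assumptions.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    # Smallest standardized effect detectable with 80% power at alpha = 0.05,
    # assuming roughly 47 developers per arm:
    effect = analysis.solve_power(nobs1=47, alpha=0.05, power=0.8)
    print(f"Minimum detectable effect size (Cohen's d): {effect:.2f}")
    ```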

  • @natbirchall1580
    @natbirchall1580 2 months ago

    You need to remove the human part to get good results. In the meantime, we have to waste our time with some very flawed individuals.