OUTLINE:
0:00 - Intro
0:20 - GitHub Copilot
6:55 - My opinion on Copilot & Copyright
17:25 - Facebook AI image similarity challenge
18:00 - Brickit app scans your LEGOs and suggests builds
18:40 - Distill journal goes on break
19:50 - Amazon uses algorithms to hire & fire Flex drivers
23:20 - Helpful Libraries: TF Decision Forests, Habitat, Falken, Brax
24:20 - AI-generated papers give science a hard time
The image matching "not giving a fuck sarcasm" was so well acted that I almost believed it
I was sold on the intro already. Yannic is THE Netflix of AI YouTube :D
How I would feel if GitHub trains an AI on my model?
Sorry. I would feel sorry. For the people who are going to use an AI that is using my shitty code layout.
It lets you choose from multiple answers so no worries!
@@CristianGarcia But maybe, after all, JZipper is not the only one with "shitty" code... So, I guess, good luck with multiple shitty options :)
Finally, Thursday ML News is here.
That intro and the OpenAI punchline 10/10!
ML news is getting better and better with time, Good job yo, this is the quality content I've been looking for
Love the Amazon shade lol
How would Yannic feel if openAI trained a transformer using his youtube videos?
Proud.
This video is pure gold. Incredibly refreshing common sense and humanity/humor
Did not realize I'd be going to law school this morning; this is awesome 😎
This is just sooo good! Thanks for the update and the entertainment! :)
that intro was GOLD!
My randomizer sent me a notification suggesting that there might be a novel episode from my favorite YouTuber today. Bingo!
16:57 : my understanding is that it wouldn’t technically be *under the GPL* but rather *required to be under the GPL*. That is, if you release it but not under the GPL, you could be in violation, and required to either stop distributing it, or to release it under the GPL,
but until you explicitly release under the GPL, what you have released isn’t technically released under the GPL.
Correct!
I loved this video from its first seconds already!!!
thank you so much for doing this. Perfect to keep up during holidays!
There's news about AI everyday.
It's turning into a regular biweekly news update. Love it 👍
That laptop reflection with the Mona Lisa in it in your sunglasses (~11:40) looks like eyes. Can't unsee
I'm beginning to love Mondays.
LOLed at the personal account problem segment. best 5 dollars i've ever spent in my life so far
Wow. Your Amazon story. Exactly like mine :DD I have been locked out for months now
Great video! I think "GitHub Copilot - Copyright" would have been great as a standalone video.
5:50 IMO Yes, because you can make the argument that a model is like a fancy generator of the original samples but compressed, especially when it's been shown to be able to re-create entire functions from its training set
@Yannic, I like the randomness of the schedule. Very anti-pavlovian :) . Cheers!
I look forward to every one of these videos. Bravo homie
You are funny and informative and cool at the same time.
Just Perfect
The currently preferred form for modification of a neural network is the parameters plus the topology. The parameters are modified by fine tuning, and you can't fine tune if you don't know the topology. Therefore, the model behind Copilot, including the parameters, is GPL'd.
That futuristic robot is what Bert is right... Androids are all you need
I think the real value is on the other side: When you code and it checks if it would have coded something substantially different (or included some check that you didn't) and asks you if it should augment your code. That would help novice programmers a lot without making them deliver code they didn't think through.
Interesting how Copilot can suggest/advertise companies:
def return_best_pizza_restaurant():
[Autocompletion] return "Pizza Hut"
Funny that you mention nightmare consumer experiences with Amazon and Paypal. Same experience here with buying an Amazon gift card. I was locked out of the account and ergo the multi-verse. Paypal was worse. They refused to follow their own terms of service.
Brilliant cold open - such spice 😂
AI automates writing code, going to fridge to get food and gaming. Wait, that's all I do. Smiles in existential dread.
I thought this was a Monday thing ?
it is
@@YannicKilcher ….
@@YannicKilcher totally regular every week monday thing 🤣👌
Yannic is using a random seed. I look forward to these videos :)
You can choose to watch it on a Monday.
Look, I just need to spend 3 minutes to write a docstring and then I get a fully completed function that I could have written myself in 5 minutes. Of course, I need an additional 3 minutes to check if the implementation is any good, but that's fine! I feel so much more productive now!
Even permissive licenses often require credit in derivatives...
Open source is programming version of soyjack.
26:40 Well, you already have the sunglasses, how are we going to notice a Terminator replacing you?
You covered GitHub Copilot in a lot of detail but kinda failed to explain WHY code licensing matters at all. The Linux kernel is GPL licensed, and it essentially means that for any version of the Linux kernel, we can know exactly who wrote what code and can provide proper attribution. If you're using GPL-licensed code, you don't technically own it and HAVE to provide proper attribution to the original authors. What GitHub Copilot claims is that "all the code that copilot generates for you is YOURS". This means that if I trained Copilot on the Linux kernel (as a software engineer in FAANG earning $200k/year) and it happened to reproduce a major driver verbatim, I OWN THE ENTIRE DRIVER CODE and don't need to attribute the thousands of contributors who worked very hard (and probably earned much less than me) on making this driver.
This is the part that drives people crazy, because now people can copy any code verbatim and claim that GitHub Copilot generated it for them. Not only is this a violation of code licenses, it is also unfair to the people who spent hundreds of (potentially unpaid) hours on an open source project.
Other than this, I think you covered the topic in great detail compared to other creators and nonsense Twitter rants. Great job :)
This is likely not accurate. An extremely small snippet like a one sentence quote said in a movie isn't copyright infringement if you also publish the quote when reviewing the movie. (or at least I'm unaware of anyone losing a case for something like this) However, if you cut out 10 minute clips of the movie to publish on your website, that clearly is not within fair use. I think this is more in line with how a court would handle it if someone considered a substantial amount of their code copied and wanted to sue someone.
Got my copilot invite today! :)
This new Yannic AI Agent does a decent job
Alright Yannic!!! [You the man]
[23:00]
Will you have a separate episode to talk about DeepMind’s alphaFold 2?
Federal law says if a business relies on what a worker does as part of its core function, and if they get to manage how the work is done, they're employers and the workers are employees. Amazon is a retailer whose core appeal is its ability to deliver goods to your home, so.... yeah.
kewl shades
Also ML News theme song is becoming my favorite boogie no joke
Recovering a blocked Amazon account is a joke compared to recovering a WhatsApp number that was banned for no reason, along with its conversation history. 99 percent of customer communication is done through this channel. After that, a number without WhatsApp is useless for sales purposes, so you have to change all the marketing actions associated with that phone number from one day to the next 🤦🤦
Your channel and videos are excellent!👌
I can feel the pain that lawyers and judges will go through if they have to deal with this issue. AI will bring hell to all of our laws OMG
Amazing videos
Copilot integrating GPL code into its training data means Copilot would need to be GPL as well. Also any software made with it would need to be released under GPL.
Now i need some lego
I think the 'decision forest' in tensorflow is just random forests, gbt, decision trees in tensorflow. Not 100% sure though.
The Economist has an article this week, titled "What if an AI won the Nobel prize for medicine?".
So I asked my computer, is this really gonna happen sooner than I think?
Computer says no.
Where to get those sexy glasses ? 🤣🤣
Bdw regular here...
The joke about "open" is openly cool
Amazing
ROFL - thank you Yannic
Great editing!
Very likely Copilot could be trained to the same degree of quality without licensed repos. Just a few tweaks to the scraper.
Cruise control law is enforced on the driver. I would imagine OpenAI would implement their own user agreements, like geohot and Musk would with autopilot on vehicles. We shall see, I guess.
comment for the MACHINE
Now what will frontend devs do 💁
Copilot is sufficiently transformative; I doubt any of the licensing issues matter.
Open bottle killed me
Does Amazon have a bug bounty program? Getting fired clearly looks like a bug in their AI system. There could also be security issues, like someone tricking the Amazon AI into firing all Amazon drivers....
It has been a long time since I boycotted Amazon. Don't miss it at all.
then how do you get your stuff? :|
@@monteepython84 There are other delivery services, depending on where you live.
@@monteepython84 I also abandoned Amazon, I don't get my stuff cause I am broke.
@@herp_derpingson , as a matter of public interest and down with tyrant and all that, I ask, can you give examples of which services you use?
@@monteepython84 hahahaha that is sad...
I find the many angle changes and zoom-ins/outs very distracting. I get it -- but it just seems to happen every few seconds. Tone it down a notch? Thanks.
I would say that, when it comes to code, unless you make a major effort to keep it a secret,
you have no right to IP.
A derivative work in music is normally distinguishable by a layman, yet when the cases go to court there are normally years of litigation.
The syntax in code makes it hard to distinguish.
The solution is to make Copilot open source, because some of the data is under the GPL license.
In short, Yannic thinks copilot is not GPL but any software that is written using it must (might) become GPL.
He said that's one thing the judges may decide, so he warned us to be cautious with code written by Copilot.
But which GPL license?
Ladies and Gentlemen, welcome to the roast of OpenAI
Screw you, Amazon.
Why should currencies not be floats?
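One way to see the problem behind this question: binary floating-point numbers cannot represent most decimal fractions exactly, so money arithmetic done in floats can drift by tiny amounts. The snippet below is only a minimal sketch using Python's standard `decimal` module; the variable names are illustrative and not taken from the video.

```python
from decimal import Decimal

# Binary floats cannot store 0.1 or 0.2 exactly, so the sum is slightly off.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# Decimal values built from strings keep exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True

# A common alternative is to store integer minor units (e.g. cents).
cents_total = 10 + 20
print(cents_total == 30)  # True
```

Whether to use `Decimal` or integer cents is a design choice; both avoid the rounding surprises that floats introduce when amounts are compared or summed.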
All you need is error checking
People have already been annotating training data for AI for years with recaptcha.
@yannick post on twitter about your experience with Amazon and tag them. This helped me return a defective item I bought from Amazon. :P
OK GitHub, but what about human DNA? Is it copyrighted enough?
Hi, perfect... but Yannic, tell your jazzy video guy that when he zoomed in on your face while you were explaining something, I stopped noticing what you were saying.
Zooming in is cool, but my brain must be taking it as an act of aggression, leaving my poor consciousness in a fuzzy state.
If it is an intended effect that's fine, but I've got a feeling maybe it was not?
GPL-J-6B
I hope GitHub has not trained the algorithm with my code, otherwise it will only add bugs to people's code
Copying and pasting from Stack Overflow already does what Copilot does
Great video but less zooming in & out please
Get Yannic's Amazon account back!
Cool goggles but annoying reflection
So training a GAN theft auto model is also a violation of copyright, no? As is a child's drawing if they have viewed any copyrighted material, since that must have influenced the weights of their biological neurons, right?
MoMA: We're confiscating all of your refrigerator art. You went on a field trip to the Museum of Modern Art last week, and none of your art was developed in a clean room.
Momma: Now go clean your room.
They can train multiple models, each trained on code under one license (or a single backbone with multiple heads), e.g. PilotMIT, PilotApache, PilotGPL, etc.
Then a user of Copilot can use any of these generators, but should follow the rules of the license associated with the code (training data) used to train that generator model.
Nice idea! But you also have to attribute individual copyright owners. Which ones would you attribute? Probably all of them?!
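The idea in this thread can be sketched as a small lookup from a project's license to a license-specific generator, plus the obligations that come with it. Everything below is a hypothetical illustration: the generator names come from the comment above, while the registry, the SPDX-style identifiers, the choice of GPL version, and the simplified obligation flags are assumptions, not anything GitHub has described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensedGenerator:
    name: str                   # generator name, e.g. "PilotMIT" (from the comment)
    license_id: str             # SPDX-style license identifier (assumed)
    requires_attribution: bool  # simplified flag: notices must be preserved
    copyleft: bool              # simplified flag: derivatives keep the license


# Hypothetical registry: one generator per training-data license.
GENERATORS = {
    "MIT": LicensedGenerator("PilotMIT", "MIT", True, False),
    "Apache-2.0": LicensedGenerator("PilotApache", "Apache-2.0", True, False),
    "GPL-3.0-only": LicensedGenerator("PilotGPL", "GPL-3.0-only", True, True),
}


def pick_generator(project_license: str) -> LicensedGenerator:
    """Return the generator trained only on code under the given license."""
    try:
        return GENERATORS[project_license]
    except KeyError:
        raise ValueError(f"no generator for license {project_license!r}") from None


def obligations(generator: LicensedGenerator) -> list[str]:
    """List the simplified duties a user of this generator would take on."""
    notes = []
    if generator.requires_attribution:
        notes.append("keep copyright and license notices")
    if generator.copyleft:
        notes.append(f"release derived work under {generator.license_id}")
    return notes


print(obligations(pick_generator("GPL-3.0-only")))
```

This sketch does not answer the attribution question raised in the reply: it records that notices must be kept, but it cannot say which individual copyright holders a given suggestion derives from.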
Dude, you should've become a lawyer
Too many camera movements for me. Please reduce this a little bit again.
I think you could treat NN similarly to human brain (same principle anyway). Humans are similarly producing something new by extending what they learned from others.
This is the debate that has been going on since the beginning of AI. Is AI the same as human cognition? Some authors will say yes, some will say no.
@@JuaniPisula I'm not saying it's the same, but it could be treated equally in copyright issues. As long as it does not actually violate copyrights, it shouldn't be treated as if it were using them unfairly.
@@XOPOIIIO Yes, for sure, I agree. It's going to be an interesting day when a judge needs to set the precedent on whether a generative model encapsulates creativity or not lol
Can’t they claim it is satire?
Would be more useful if it could highlight coding errors based on what methods are documented to do versus what the model understands the code to actually do.
We need a new kind of licence that would allow or disallow training ML on our code
GPT-3 is micro-plagiarism.. at scale.
"Source code" would allow someone to recompile, and in this case, that would likely be training. It sounds wrong to say training data is not source, but auto-tuning software that alters a singers voice wouldn't include the original recording of said voice.
Don't most large "social" sites have some rule against scraping content?
it's a mess
If you want (hurtful) feedback: just stick to the news and knowledge because your jokes are often very cringy.