What makes Modjourney v4 so much Better?
- Published: 9 Nov 2022
- Midjourney v4 dropped, and on a lot of metrics it blows everything else, including Stable Diffusion, out of the water. In this video we explore the techniques Midjourney uses to make their models better than everything else.
Discord: / discord
------- Links -------
Midjourney V2 vs V3 comparison (very lit): github.com/willwulfken/MidJou...
David Holz interview: www.theregister.com/2022/08/0...
Other David Holz interview: www.theverge.com/2022/8/2/232...
------- Music -------
Music from freetousemusic.com
‘Onion’ by LuKremBo: • (no copyright music) l...
‘Snow’ by LuKremBo: • lukrembo - snow (royal...
‘Sunset’ by ‘LuKremBo’: • (no copyright music) j...
‘Affogato’ by LuKremBo: • lukrembo - affogato (r...
Many thanks to LuKremBo
#stablediffusion #aiart #news #art #midjourney #ai #technology #breakingnews
Best content around. Very sincere and nice to watch.
damn dude, that's so nice!
I have also been wondering how Midjourney is so damn good. When you said they integrated user feedback, it all made perfect sense. Look at the difference between GPT3 and ChatGPT (GPT3.5). They added user feedback to create 3.5 and it made it SO much better. That's definitely the key.
I think I found my fav youtube channel about AI
clearly you haven't seen: ruclips.net/user/SirajRavalvideos
@@lewingtonn i know that channel ty. You should believe in yourself more
@@PeppePascale_ haha thanks, I was kind of joking since that guy was like shown to be a scammer
You may have missed the office hours that DavidH held yesterday, through the mj discord. Interesting to hear his viewpoint and insights into future mj
what were the highlights?
Was it recorded?
@@lewingtonn looks like not recorded, by mj team at least. If you search for 'office hours' or 'davidh' in discord you might find some reactions and tidbits. It was maybe 2 hours long. I think it's a weekly thing, but this was my first one
@@jessebwilson yup I might give it a browse... there's probably some sigma male out there recording every episode and uploading
While I like the V4 quality I feel we will soon end up with DallE 3 fiasco where the model is closed and you are free to use the slow API ... for money. That's why I thank God there is SD to equalize the market.
Interesting to revisit this now 3 months later. How are they doing it? We can surmise a couple of things. First, their model has no "consumer" hardware memory restrictions, so it can be much larger and more accurate. Second, with the (public) release of the "offset noise" technique, we now know how they're making their images darker. Stability will address both of these very quickly and is already teasing its own "large" model called SD-XL, in addition to SD 3.0, DeepFloyd IF, and an as-yet-unnamed model that appears amazing. In all, it will be a very short-lived time in the spotlight for MidJourney. I'd guess they have 6-18 months left.
holy cow, did I really misspell the title?
SD 512x512. Dall-E 1024x1024. MJ 2048x2048. Honestly having that much resolution in your training data really will be seen, and even felt.
Source??? I thought they did normal training but are just upscaling like gods
@@lewingtonn I read this on some university-level website last month. I was shocked, but I can understand, since they have the mojo and we peons have Stable Diffusion. They aren't upscaling those babies to look so darn good; the source it draws upon to generate is just that big. If SD was 2048x2048, no single card could handle that for training. I bet they have some sort of render farm or one of those big professional setups, because the amount of memory required to train at 2048x2048 is insane. Now even if they were able to get SD training down to 10 gigs, I am unsure how much theirs takes, but no single card that I know of could do it.
Soooouuuuuurceeeee :'(
@@lewingtonn I don't remember, I apologize, but I did go looking, and on their github page it says they upscale to 1024x1024, BUT they aren't telling you the resolution of the original training images. That, I'm sure, is kept close to their chest. Where I read what I did is beyond me; there is so much scholarly information out there (along with mis- and dis-information) that I can't keep track these days. 512x512 upscaled to 2048x2048 you can see, no matter how much of a god does it. It just is too much of a stretch. Now I bet the source is 2048x2048, as I read, but what they hand to you by default in the initial 2x2 grid is 512x512 pixels. What I'm trying to say is it's akin to how a crappy picture on a TV, when shrunk down to the size of a cell phone, suddenly looks sharp and good. Same difference.
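To put the resolution numbers in this thread in perspective, here's a quick back-of-the-envelope sketch. Note the 2048x2048 training claim is unverified, and the linear-memory scaling is only a rough rule of thumb:

```python
# Rough pixel-count comparison for the training resolutions
# mentioned in this thread (figures are illustrative, not confirmed).
resolutions = {"SD": 512, "DALL-E": 1024, "MJ (claimed)": 2048}

base = resolutions["SD"] ** 2  # 262,144 pixels

for name, side in resolutions.items():
    pixels = side ** 2
    # Activation memory in a conv net scales roughly linearly with
    # pixel count, so 2048^2 is ~16x the footprint of 512^2.
    print(f"{name}: {side}x{side} = {pixels:,} px ({pixels / base:.0f}x SD)")
```

So even before model size enters the picture, training natively at 2048x2048 means roughly 16x the per-image memory of SD's 512x512, which is why a single consumer card plausibly couldn't do it.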
So we have Lexica as a source of pre-generated images; if Lexica added a feature to upvote images, it could help a lot to create a dataset based on user bias...
Also, I just had the idea to scrape Lexica for regularization images. It would probably help Dreambooth a lot if those were coming out of images generated from decent prompts, rather than just using standard prompts?
Good luck man! Your content is very insightful, energy is great, and I'd like to recommend it but...audio!
If your audio sucks, no one bothers watching. If your video sucks but your audio is good, people will still listen.
Your audio would be a lot better if you knew what distortion, gain, and a limiter are and how they work. They're basically knobs you can turn to sweeten audio (read: listenable audio people will stay for). Basically, keep your microphone at a constant 4-8 inches from your mouth, around your upper chest or neckline, where there's the best resonance. That alone goes a long way!
Keep up the insights!
I think it's important to give users creative freedom to make what they want, even if it's NSFW. Mainly because, without NSFW content, a lot of things just wouldn't exist. In the future there will be two big AI companies: one that has control over image generation and can limit images to ethical SFW for professional API access, like what OpenAI and Imagen profess. Then there will be another that is completely open, for the degenerates to use.
Yeah, I like this take, I think you're spot on
They are using models trained by the work of their users. This is how they do it, it is even in the statement hidden under those petabytes and trillions of operations.
Thats super interesting, how does the training work? The users are just feeding MJ prompts right?
The prompts are the captions of the images, and they're fed into the AI to finetune the previous model.
Haaaaaang on, the prompt is used to generate the image, so you can't then go and use the same prompt to gather MORE info about the relationship between the two, right?
It could be similar to pic2pic in Stable, only they use a huge amount of good-quality images to help decide the prompt outcome.
@@lewingtonn From what I understand, all the generated images above a certain rating is basically used to generate an authentic gradient. But it's not a single gradient. Different types of prompts create their own gradient.
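The feedback-training idea this thread is speculating about can be sketched in data-collection terms. This is a hypothetical illustration, not Midjourney's actual pipeline; all names (`Generation`, `select_finetune_pairs`, the rating scale) are made up:

```python
# Hypothetical sketch: build a finetuning set from user-rated
# generations, where the prompt doubles as the image caption.
from dataclasses import dataclass

@dataclass
class Generation:
    prompt: str      # the prompt that produced the image (used as caption)
    image_path: str
    rating: float    # e.g. upscales/likes normalized to 0..1

def select_finetune_pairs(gens, min_rating=0.8):
    """Keep only highly rated (caption, image) pairs for the next
    finetuning round -- a crude form of learning from user feedback."""
    return [(g.prompt, g.image_path) for g in gens if g.rating >= min_rating]

gens = [
    Generation("a castle at dusk", "a.png", 0.9),
    Generation("blurry mess", "b.png", 0.2),
]
print(select_finetune_pairs(gens))  # only the highly rated pair survives
```

The point is that the rating, not the prompt alone, carries the new information: the model already saw the prompt, but users telling it which outputs were good gives it a fresh training signal.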
Even if I could rank my own images I'd be happy. Yeah it'd be a slower process, but maybe it'd be quick enough to learn what I'm looking for
I was trying to do the same thing for Stable Diffusion, even trying to incorporate prompt corrections that could just be handed over to Emad. There doesn't seem to be enough interest though, and automatic1111 wouldn't reply to it.
Can't post the link here but it's issue 2764 on automatic1111 if you wanna see the code and screenshots of the UI so far
Added a link in my channel about section
I read the feature, I personally think that collecting data about prompts would be a huge benefit to the community, but as you say multiple times in the issue, it WOULD be a lot of work
Sure, Midjourney is currently "better" than Stable Diffusion.
But do you know the one thing it can't do? Well, I guess you knew it already, and that's the exact reason why I haven't tried MJ v4 yet.
Aaah I see you're a man of culture as well
love the content, but isn't a GAN also an iterative process? Adversarial Network… one network generates images, another "judges" them, and the process iterates. Or have I got that entirely wrong?
good point: so the training of a GAN IS iterative, with both networks slowly improving, but once training has finished, boom, you generate the image in one go
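The distinction in this exchange is about inference, not training, and can be shown with a toy sketch. These are not real models, just placeholder arithmetic to illustrate the step counts:

```python
# Toy contrast of inference cost: a trained GAN generates in ONE
# forward pass, while a diffusion model runs a denoising LOOP at
# inference time. Both kinds of model are trained iteratively.
import random

def gan_generate(z):
    # one forward pass through the (fake) generator
    return [v * 2.0 for v in z]

def diffusion_generate(steps=50):
    x = [random.gauss(0, 1) for _ in range(4)]  # start from pure noise
    for _ in range(steps):
        x = [0.9 * v for v in x]  # one (fake) denoising step
    return x

img_gan = gan_generate([0.1, 0.2, 0.3, 0.4])  # 1 step at inference
img_diff = diffusion_generate(steps=50)        # 50 steps at inference
```

So the commenter is right that GAN training is adversarial and iterative, but once trained, sampling is a single pass, whereas diffusion pays the iterative cost every time it generates.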
Thanks! Your audio... check that before recording.
I diiiiiid, but not well enough. I'm trying some new software to try to improve it
@@lewingtonn just need to boost/amplify it! ;)
@@2PeteShakur it's haaaaaaaaaard
@@lewingtonn those clip-on mics might do the trick! ;)
V4 doesn't seem to do likenesses of people very well now compared to the --test and --testp versions or SD. I think this was a deliberate move on the part of the MJ team
Yes, they've stated that in an office hours meeting; they didn't like the uncanny look of it and said they wanted to shy away from it in their own model. David also said that they didn't want their generator to create something where you couldn't tell whether it was AI or not. In my opinion that's kind of shooting themselves in the foot: in turn, Midjourney gens all kind of look the same and have that "Midjourney look", so they stay distinguishable as generations. I stopped using Midjourney because, as an artist, I felt like I had no room to breathe or carve out my own look or style. Also, in an earlier office hours they had talked about the idea of people being able to train their own style, which I was super excited about, but months went by and it never became anything more than an idea.
Can you please do a tutorial for dreambooth training ?? It's been added to auto1111
Also activate your windows
After what they did to Vista???? NEVER!!!!
👋
i agree completely
@@lewingtonn
Is that response a thing on YT? Because on one of the chess channels I follow, after watching the latest upload I add my 👋 comment and someone there always comments what you did in response.
It's either a thing or a fascinating coincidence. Either way, z'all good.
To be honest, what baffles me about V4 is that despite all the amazing new features and aesthetics, HANDS still suck. The same thing happens with Stable Diffusion 1.5, and even Dall-E sometimes generates weird-ass hands. This should be as important as fixing the faces and eyes if they really are aiming to automate art, and the saddest part is that there are guys "unintentionally" fixing hands just by training a model with Dreambooth. I mean, there is an adult-oriented SD model that is 10 times better at making hands than SD 1.5 and MJ V4. This is sad.
I wonder how many of those 4 million users are actually paying customers? I've heard that many people continue to create new accounts so they don't have to pay.
Honestly, I wouldn't even know where to begin making alt accounts, and the free trial lasts five seconds anyway. I burnt through the credit on my basic subscription in no time (days), so I'll probably push the button on a year's subscription.
It's more realistic than V3. V5 will be more so, and it will handle higher-resolution renders
Seriously, if you find midjourney "so much better", then you are just not creative enough, or good enough to get similar results in SD.
ooooooooooooh, shots fired! MJ fans out there gonna let him just SAY this????
Hustler Magazine, Inc. v. Falwell has already been decided. Here's my unsolved dilemma; comment if you have a solution.
The "rights" to a photograph go to the photographer, not the model. So if I take a picture of female "A", I have the rights to that. So now I create a new picture that female A, now XXX, would find offensive. Does female "A" have a course of action, or are we now in a world where it's cool that anyone can be generated doing anything?
I've been making the mistake of focusing my attention on what Stable Diffusion can do. We (viewers of your channel) understand the limits, but spend nowhere near enough time thinking about how it will impact society.
Furthermore, exactly how many days away is the first occurrence of a "Stable Diffusion generated" picture surfacing as political leverage? The public already isn't fact-checking "Jack"; a picture of anything could sway the course of history, if people believe it's real.
The general public doesn't have a clue about what is possible now.
A fake picture being created to sway the public is a guaranteed ticking time bomb; the only thing up for discussion is where and how it will go off.
The comment I'm looking for would be: We stop this obvious bomb from going off by........ (I haven't a clue)
I'm really behind on all the social and legal stuff, let me check out that case
@@lewingtonn the case basics are "others don't get to decide what's too lewd for consenting adults". My testing shows anyone CAN generate any image. There are A LOT of issues here; the more I think about it, the more I realize I'm underestimating the scope in the very near future
yeah, very true, things abt to get WILD
Also, the side that argues it better will win lol
1ST!
feedback lol?
@@lewingtonn loved the video! Just had to put 1st on there for the 1st comment. 😂