What makes Modjourney v4 so much Better?
- Published: 9 Nov 2022
- Midjourney v4 dropped, and on a lot of metrics it blows everything else, including Stable Diffusion, out of the water. In this video we explore the techniques Midjourney uses to make their models better than everything else.
Discord: / discord
------- Links -------
Midjourney V2 vs V3 comparison (very lit): github.com/willwulfken/MidJou...
David Holz interview: www.theregister.com/2022/08/0...
Other David Holz interview: www.theverge.com/2022/8/2/232...
------- Music -------
Music from freetousemusic.com
‘Onion’ by LuKremBo: • (no copyright music) l...
‘Snow’ by LuKremBo: • lukrembo - snow (royal...
‘Sunset’ by ‘LuKremBo’: • (no copyright music) j...
‘Affogato’ by LuKremBo: • lukrembo - affogato (r...
Many thanks to LuKremBo
#stablediffusion #aiart #news #art #midjourney #ai #technology #breakingnews
Best content around. Very sincere and nice to watch.
damn dude, that's so nice!
I have also been wondering how Midjourney is so damn good. When you said they integrated user feedback, it all made perfect sense. Look at the difference between GPT3 and ChatGPT (GPT3.5). They added user feedback to create 3.5 and it made it SO much better. That's definitely the key.
I think I found my fav youtube channel about AI
clearly you haven't seen: ruclips.net/user/SirajRavalvideos
@@lewingtonn i know that channel ty. You should believe in yourself more
@@PeppePascale_ haha thanks, I was kind of joking since that guy was like shown to be a scammer
You may have missed the office hours that DavidH held yesterday, through the mj discord. Interesting to hear his viewpoint and insights into future mj
what were the highlights?
Was it recorded?
@@lewingtonn looks like not recorded, by mj team at least. If you search for 'office hours' or 'davidh' in discord you might find some reactions and tidbits. It was maybe 2 hours long. I think it's a weekly thing, but this was my first one
@@jessebwilson yup I might give it a browse... there's probably some sigma male out there recording every episode and uploading
While I like the V4 quality I feel we will soon end up with DallE 3 fiasco where the model is closed and you are free to use the slow API ... for money. That's why I thank God there is SD to equalize the market.
Interesting to revisit this now 3 months later. How are they doing it? We can surmise a couple of things. First, their model has no "consumer" hardware memory restrictions, so it can be much larger and more accurate. Second, with the (public) release of the "offset noise" technique, we now know how they're making their images darker. Stability will address both of these very quickly and is already teasing its own "large" model called SD-XL, in addition to SD 3.0, DeepFloyd IF, and an as-yet-unnamed model that appears amazing. In all, it will be a very short-lived time in the spotlight for MidJourney. I'd guess they have 6-18 months left.
holy cow, did I really misspell the title?
SD 512x512. Dall-E 1024x1024. MJ 2048x2048. Honestly having that much resolution in your training data really will be seen, and even felt.
Source??? I thought they did normal training but are just upscaling like gods
@@lewingtonn I read this on some university-level website last month. I was shocked, but I can understand, since they have the mojo and we peons have Stable Diffusion. They aren't upscaling those babies to look so darn good; the source it draws upon to generate is just that big. If SD was 2048x2048, no single card could handle that for training. I bet they have some sort of render farm or one of those big professional setups, because the amount of memory required to train at 2048x2048 is insane. Now even if they were able to get SD training down to 10 gigs, I am unsure how much theirs takes, but no single card that I know of could do it.
Soooouuuuuurceeeee :'(
@@lewingtonn I don't remember, I apologize, but I did go looking, and on their github page it says they upscale to 1024x1024, BUT they aren't telling you the resolution of the original training images. That, I'm sure, is kept close to their chest. Where I read what I did is beyond me; there is so much scholarly information out there (along with mis- and dis-information) that I can't keep track these days. 512x512 upscaled to 2048x2048 you can see, no matter how much of a god does it. It just is too much of a stretch. Now I bet the source is 2048x2048, as I read, but what they hand to you by default in the initial 2x2 grid is 512x512 pixels. What I'm trying to say is it's akin to how a crappy picture on a TV, when shrunk down to the size of a cell phone, suddenly looks sharp and good. Same difference.
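To put the resolution numbers in this thread in perspective, here's a quick back-of-the-envelope sketch. Note the 2048x2048 training claim is unverified, and the linear-memory scaling is only a rough rule of thumb:

```python
# Rough pixel-count comparison for the training resolutions
# mentioned in this thread (figures are illustrative, not confirmed).
resolutions = {"SD": 512, "DALL-E": 1024, "MJ (claimed)": 2048}

base = resolutions["SD"] ** 2  # 262,144 pixels

for name, side in resolutions.items():
    pixels = side ** 2
    # Activation memory in a conv net scales roughly linearly with
    # pixel count, so 2048^2 is ~16x the footprint of 512^2.
    print(f"{name}: {side}x{side} = {pixels:,} px ({pixels / base:.0f}x SD)")
```

So even before model size enters the picture, training natively at 2048x2048 means roughly 16x the per-image memory of SD's 512x512, which is why a single consumer card plausibly couldn't do it.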
So we have Lexica as a source of pre-generated images; if Lexica added a feature to upvote images, it could help a lot to create a dataset based on user bias...
Also, I just had the idea to scrape Lexica for regularization images. It would probably help Dreambooth a lot if those were coming out of images generated from decent prompts, rather than just using standard prompts?
Good luck man! Your content is very insightful, energy is great, and I'd like to recommend it but...audio!
If your audio sucks, no one bothers watching. If your video sucks but your audio is good, people will still listen.
Your audio would be a lot better if you knew what distortion, gain, and a limiter are and how they work. They're basically knobs you can turn to sweeten audio (read: listenable audio people will stay for). Basically, keep your microphone at a constant 4-8 inches from your mouth, around your upper chest or neckline, where there's the best resonance. That alone goes a long way!
Keep up the insights!
I think it's important to give users creative freedom to make what they want, even if it's NSFW. Mainly because, without NSFW content, a lot of things just wouldn't exist. In the future there will be two big AI companies: one that has control over image generation and can limit images to ethical SFW for professional API access, like what OpenAI and Imagen profess. Then there will be another that is completely open, for the degenerates to use.
Yeah, I like this take, I think you're spot on
They are using models trained by the work of their users. This is how they do it, it is even in the statement hidden under those petabytes and trillions of operations.
Thats super interesting, how does the training work? The users are just feeding MJ prompts right?
The prompts are the captions of the images, and they're fed into the AI to finetune the previous model.
Haaaaaang on, the prompt is used to generate the image, so you can't then go and use the same prompt to gather MORE info about the relationship between the two, right?
It could be similar to pic2pic in Stable, only they use a huge amount of good-quality images to help decide the prompt outcome.
@@lewingtonn From what I understand, all the generated images above a certain rating is basically used to generate an authentic gradient. But it's not a single gradient. Different types of prompts create their own gradient.
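The feedback-training idea this thread is speculating about can be sketched in data-collection terms. This is a hypothetical illustration, not Midjourney's actual pipeline; all names (`Generation`, `select_finetune_pairs`, the rating scale) are made up:

```python
# Hypothetical sketch: build a finetuning set from user-rated
# generations, where the prompt doubles as the image caption.
from dataclasses import dataclass

@dataclass
class Generation:
    prompt: str      # the prompt that produced the image (used as caption)
    image_path: str
    rating: float    # e.g. upscales/likes normalized to 0..1

def select_finetune_pairs(gens, min_rating=0.8):
    """Keep only highly rated (caption, image) pairs for the next
    finetuning round -- a crude form of learning from user feedback."""
    return [(g.prompt, g.image_path) for g in gens if g.rating >= min_rating]

gens = [
    Generation("a castle at dusk", "a.png", 0.9),
    Generation("blurry mess", "b.png", 0.2),
]
print(select_finetune_pairs(gens))  # only the highly rated pair survives
```

The point is that the rating, not the prompt alone, carries the new information: the model already saw the prompt, but users telling it which outputs were good gives it a fresh training signal.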
Even if I could rank my own images I'd be happy. Yeah it'd be a slower process, but maybe it'd be quick enough to learn what I'm looking for
I was trying to do the same thing for Stable Diffusion, even trying to incorporate prompt corrections that could just be handed over to Emad. There doesn't seem to be enough interest though, and automatic1111 wouldn't reply to it.
Can't post the link here but it's issue 2764 on automatic1111 if you wanna see the code and screenshots of the UI so far
Added a link in my channel about section
I read the feature, I personally think that collecting data about prompts would be a huge benefit to the community, but as you say multiple times in the issue, it WOULD be a lot of work
Sure, Midjourney is currently "better" than Stable Diffusion.
But do you know the one thing it can't do? Well, I guess you knew it already, and that's the exact reason why I haven't tried MJ v4 yet.
Aaah I see you're a man of culture as well
love the content, but isn't a GAN also an iterative process? Adversarial Network… one network generates images, another "judges" them, and the process iterates. Or have I got that entirely wrong?
good point: so the training of a GAN IS iterative, with both networks slowly improving, but once training has finished, boom, you generate the image in one go
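The distinction in this exchange is about inference, not training, and can be shown with a toy sketch. These are not real models, just placeholder arithmetic to illustrate the step counts:

```python
# Toy contrast of inference cost: a trained GAN generates in ONE
# forward pass, while a diffusion model runs a denoising LOOP at
# inference time. Both kinds of model are trained iteratively.
import random

def gan_generate(z):
    # one forward pass through the (fake) generator
    return [v * 2.0 for v in z]

def diffusion_generate(steps=50):
    x = [random.gauss(0, 1) for _ in range(4)]  # start from pure noise
    for _ in range(steps):
        x = [0.9 * v for v in x]  # one (fake) denoising step
    return x

img_gan = gan_generate([0.1, 0.2, 0.3, 0.4])  # 1 step at inference
img_diff = diffusion_generate(steps=50)        # 50 steps at inference
```

So the commenter is right that GAN training is adversarial and iterative, but once trained, sampling is a single pass, whereas diffusion pays the iterative cost every time it generates.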
Thanks! Your audio... check that before recording.
I diiiiiid, but not well enough. I'm trying some new software to try to improve it
@@lewingtonn just need to boost/amplify it! ;)
@@2PeteShakur it's haaaaaaaaaard
@@lewingtonn those clip-on mics might do the trick! ;)
V4 doesn't seem to do likenesses of people very well now compared to the --test and --testp versions or SD. I think this was a deliberate move on the part of the MJ team
Yes, they've stated that in an office hours meeting; they didn't like the uncanny look of it and said they wanted to shy away from it in their own model. David also said that they didn't want their generator to create something where you couldn't tell whether it was AI or not. In my opinion that's kind of shooting themselves in the foot: in turn, Midjourney gens all kind of look the same and have that "Midjourney look", so they stay distinguishable as generations. I stopped using Midjourney because, as an artist, I felt like I had no room to breathe or carve out my own look or style. Also, in an earlier office hours they had talked about the idea of people being able to train their own style, which I was super excited about, but months went by and it never became anything more than an idea.
Can you please do a tutorial for dreambooth training ?? It's been added to auto1111
Also activate your windows
After what they did to Vista???? NEVER!!!!
👋
i agree completely
@@lewingtonn
Is that response a thing on YT? Because on one of the chess channels I follow, after watching the latest upload I add my 👋 comment and someone there always comments what you did in response.
It's either a thing or a fascinating coincidence. Either way, z'all good.
To be honest, what baffles me about V4 is that despite all the amazing new features and aesthetics, HANDS still suck. The same thing happens with Stable Diffusion 1.5, and even Dall-E sometimes generates weird-ass hands. This should be as important as fixing the faces and eyes if they really are aiming to automate art, and the saddest part is that there are guys "unintentionally" fixing hands just by training a model with Dreambooth. I mean, there is an adult-oriented SD model that is 10 times better at making hands than SD 1.5 and MJ V4. This is sad.
I wonder how many of those 4 million users are actually paying customers? I've heard that many people continue to create new accounts so they don't have to pay.
Honestly, I wouldn't even know where to begin making alt accounts, and the free trial lasts five seconds anyway. I burnt through the credit on my basic subscription in no time (days), so I'll probably push the button on a year's subscription.
It's more realistic than V3. V5 will be more so, and it will handle higher-resolution renders
Seriously, if you find midjourney "so much better", then you are just not creative enough, or good enough to get similar results in SD.
ooooooooooooh, shots fired! MJ fans out there gonna let him just SAY this????
Hustler Magazine, Inc. v. Falwell has already been decided. Here's my unsolved dilemma; comment if you have a solution.
The "rights" to a photograph go to the photographer, not the model. So if I take a picture of female "A", I have the rights to that. So now I create a new picture that female A, now XXX, would find offensive. Does female "A" have a course of action, or are we now in a world where it's cool that anyone can be generated doing anything?
I've been making the mistake of focusing my attention on what Stable Diffusion can do. We (viewers of your channel) understand the limits, but spend nowhere near enough time thinking about how it will impact society.
Furthermore, exactly how many days away is the first occurrence of a "Stable Diffusion generated" picture surfacing as political leverage? The public already isn't fact-checking "Jack"; a picture of anything could sway the course of history, if people believe it's real.
The general public doesn't have a clue about what is possible now.
A fake picture being created to sway the public is a guaranteed ticking time bomb; the only thing up for discussion is where and how it will go off.
The comment I'm looking for would be: We stop this obvious bomb from going off by........ (I haven't a clue)
I'm really behind on all the social and legal stuff, let me check out that case
@@lewingtonn the case basics are "others don't get to decide what's too lewd for consenting adults". My testing shows anyone CAN generate any image. There are A LOT of issues here; the more I think about it, the more I realize I'm underestimating the scope in the very near future
yeah, very true, things abt to get WILD
Also, the side that argues it better will win lol
1ST!
feedback lol?
@@lewingtonn loved the video! Just had to put 1st on there for the 1st comment. 😂