Firefox Has A PERFECT Use For AI Text Generation
- Published: Jun 4, 2024
- A lot of the use cases for AI have been replacing the thing we already do with AI to save some money, but Mozilla Firefox actually has a use that I think is really good and a massive boon for accessibility: alt-text generation.
==========Support The Channel==========
► Patreon: brodierobertson.xyz/patreon
► Paypal: brodierobertson.xyz/paypal
► Liberapay: brodierobertson.xyz/liberapay
► Amazon USA: brodierobertson.xyz/amazonusa
==========Resources==========
Mozilla Mastodon Post: mozilla.social/@mozilla/11255...
Mozilla Blog Post: hacks.mozilla.org/2024/05/exp...
=========Video Platforms==========
🎥 Odysee: brodierobertson.xyz/odysee
🎥 Podcast: techovertea.xyz/youtube
🎮 Gaming: brodierobertson.xyz/gaming
==========Social Media==========
🎤 Discord: brodierobertson.xyz/discord
🐦 Twitter: brodierobertson.xyz/twitter
🌐 Mastodon: brodierobertson.xyz/mastodon
🖥️ GitHub: brodierobertson.xyz/github
==========Credits==========
🎨 Channel Art:
Profile Picture:
/ supercozman_draws
🎵 Ending music
Track: Debris & Jonth - Game Time [NCS Release]
Music provided by NoCopyrightSounds.
Watch: • Debris & Jonth - Game ...
Free Download / Stream: ncs.io/GameTime
#Linux #Firefox #OpenSource #AI #LLM #GPT
DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase I may receive a small commission or other compensation.
this is the kind of thing AI should automate - a useful, needed work that NO human wants to do. This is literally the most (use-case-wise) ethical AI application I've seen yet
I largely agree, but I always think use cases like this should be *opt-in* for two reasons:
1. Environmental impact: if this is a feature the user doesn't need, then they're burning inference cycles and driving up their power bill (or depleting their battery) unnecessarily.
2. Intellectual property: *All* of these models are _extremely cagey_ about what datasets they were trained on. The reason is almost certainly that the training datasets contain copyrighted work. No matter how noble the use case, if you believe that artists should have a say in how their works are used, you shouldn't be tricked into being complicit in what is essentially theft.
@@GSBarlev they explain in the video it is opt-in.
I think it's worth looking at why people insist on being used as resources when robots could and should be used to improve our lives in every way they can.
If we weren't deprived of basic necessities by default, maybe the fear of being replaced would subside, and instead we could spend our time looking for things to enjoy together.
Why the requirement of "No human wanting to do it"? We didn't halt new job creation when machinery was invented, despite some people wanting to preserve the manual jobs that the newly created machines were taking from them.
Why should AI be different? Better to ask governments for a U.B.I. than to stop progress.
Or learn to adapt and use AI in your workflow.
Here's another case: a YouTuber I know has a crawling text next to his narration, which took forever to manually synchronize. I used an AI model to listen to his narration, extract the timing of each paragraph, and automate creating the text crawl as a video that he could simply import into Premiere.
It was work he himself was already doing, and it just made his workflow simpler.
This one actually looks like a good idea, I love how Firefox wants to handle it
it's telling that google comes out of the gate with buggy and misleading ai search, while firefox waits longer to test this reasonable and actually useful use of ai. this is part of why, even though mozilla isnt perfect, i will continue to use firefox over chrome / chromium based browsers. i still want to hear from people who are blind or already make use of screenreaders to see how helpful this feature will be before making a judgement though.
I'm not sure the buggy Google AI search summaries are real. The screenshots of it being bad are all 1) funny and 2) are cropped to cut out the sources section. I'm pretty certain a lot of them are just people using the F12 key.
Google was testing and waiting; OpenAI and Microsoft kind of pushed them to keep shareholders happy. Let me point out a few things. Used Google Translate sometime since November 2016? Well, between 2016 and 2020 you were using the Google Neural Machine Translation system (a system they started putting together in 2011), a precursor to Generative Pretrained Transformer (GPT). By 2020, well, Google behind the scenes switched out to a GPT based system.
OpenAI managed to (a) do a good job of training a scaled-up version and (b) get lucky with said training. But they were working off of what others, mostly Google, had already done.
Still I prefer firefox too, but let us be clear, there is more here than it looks like.
Sentiment is great but Mozilla gets millions from Google so Google having buggy AI used by millions of users and potentially being funded by stealing all that data is what funds Mozilla as well. People always seem to forget this and they keep boasting the argument of using FF against Chromium browsers as if it's noble or something. I'm not saying DON'T use it. I'm saying let's be real about it.
The tab grouping feature request was made about 2 years ago with many comments in the (Mozilla) forums and they are "still looking into it", meaning they don't give a shit and they won't do it. That's a feature that had 2 thousand upvotes, which is extremely high considering how many people use FF and then how many of those use community forums to ask for a feature.
Then, they suddenly come out with "articles" which a lot of people don't read, so people always come away with the wrong impression. This is partially people's responsibility, yes. Thanks to Brodie we can have a video about it. However, if it happens so often then they should revise their communication strategy. Not to mention them shipping with telemetry, which many people dislike, and also having Google as the default search engine, unlike before, when it used to have DDG and have Ecosia and others as options right there.
This, if it helps people with disability, it's GREAT. I'm ALL for it. But let's not keep pretending FF is a golden standard anymore. It's just another one that luckily, is not Chromium based, but might as well be and it'd be the same.
@hoovysimulator2518 I use it - depending on what and how, the AI summarizer is FINE. Is not perfect and SOME AI Answers seem to conflict on certain things because of the sources/info that pulls (at least for the queries I've done myself, maybe other topics are much better or much worse), but otherwise in general is pretty neutral which I like. It usually gives a general description, then pros and cons of something according to that info and then a conclusion as a summary. It's good enough imo.
@hoovysimulator2518 or the person who used F12 to fake it copied it from a Reddit post.
As a VIP, visually impaired person, I greatly appreciate this sort of feature. As a machine learning engineer I very much appreciate these sorts of features when implemented to run locally.
Agreed. Relying on the internet for it is so dumb, might as well be fully online at that point
As a web developer (among other things, the webdev is more of a side thing now) I do have a problem in that if I can't come up with good alt text to use, I doubt the AI will be able to either, but other than that, it is a great idea generally. Not all web developers are good at alt text consistency, and client corporations tend to be atrocious if one allows them to directly add content.
Now we're getting somewhere.
Edit: honestly, when the cause is clear and noble, I'd WANT to donate my data to be used in training. Please give me that option in the telemetry settings.
i would absolutely love opt-in telemetry for this purpose
The problem is that you can't know for sure they aren't breaking the law. Sometimes businesses make the calculation and decide that the fines wouldn't impact the profits enough for it to matter that they've broken the law
and have a setting to not share from private sessions 💀
Many social media sites don't even allow you to put alt text. In misskey for example, all the alt text is simply the name of the uploaded file. Which usually isn't helpful. So this seems very nice.
that's so stupid, why do they do that
@@felixfourcolor Laziness, i guess. Most people don't care about alt-text.
If you don't allow a real useful alt text you should actually not include the attribute at all. Having the screen reader exclaim "Image!" and nothing more is after all more useful than having it exclaim "Image twenty million two hundred forty thousand three hundred thirty underscore zero zero ay ex fifty three jay peg", since both exclamations only tell you that the people who made the site were lazy and didn't bother with alt-texts, but the first one does so with much less time and annoyance...
It won't validate of course, that's because you are supposed to supply an alt-text, but supplying junk isn't doing anything except fooling a silly static code analysis tool.
In fact fairly often the correct solution (which of course only works in an editorial setting, not in social media) is to include a deliberately empty alt-text. If the image is _purely_ decorative like some abstract patterns or geometric shapes or a completely irrelevant stock photo with _no_ real function to the context beyond breaking up a wall of text with some pretty colours; these purely decorative images should really be ignored by the screen reader, and by explicitly supplying an empty alt-text you're telling the screen reader to ignore it completely.
Though, arguably, if the image is completely meaningless (i.e. seeing visitors will also just completely ignore it while reading), you probably should not even include it. Why waste bandwidth and screen space on something that everyone will skip? I'd much prefer some SVG decoration or some dingbat or flourish character from a webfont with text colour as a colourful chapter divider, which doesn't cost many kilobytes at all, over a meaningless 1 megabyte stock photo that I have to waste resources downloading and then waste brain cycles figuring out if it's meaningful or not just to mentally discard it immediately.
Btw, decorative SVGs or dingbats should also have an aria-hidden attribute to tell the screen reader to ignore them.
If it also functions as a visual indication of a new chapter or section in the text; and this would be meaningful to indicate with a brief pause if you were reading it aloud; you should probably end a section tag before the symbol and start a new section after it, so that the screen reader can announce it (if configured to do so) and so that the screen reader user can use the list of sections on the page as landmarks to quickly skip back and forth in the text just like seeing visitors can use the symbol as a landmark.
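To make the distinctions in this comment concrete, here is a minimal Python sketch that sorts the `<img>` tags on a page into "missing" (no alt attribute), "decorative" (deliberately empty alt), "filename" (junk that just repeats the upload name) and "ok". The `audit` function, the verdict names and the file-name heuristic are all assumptions made up for illustration; they are not part of any validator or screen reader.

```python
from html.parser import HTMLParser
import re

# Heuristic (an assumption): alt text that looks like an uploaded file name.
FILENAME_LIKE = re.compile(r"^[\w.\- ]+\.(jpe?g|png|gif|webp|svg)$", re.IGNORECASE)


class AltAudit(HTMLParser):
    """Collect a (src, verdict) pair for every <img> in a document."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        if "alt" not in a:
            verdict = "missing"      # a screen reader can only say "image"
        elif not (a["alt"] or "").strip():
            verdict = "decorative"   # explicit empty alt: the image is skipped
        elif FILENAME_LIKE.match(a["alt"].strip()):
            verdict = "filename"     # junk that only satisfies a validator
        else:
            verdict = "ok"
        self.results.append((src, verdict))


def audit(html):
    """Return the list of (src, verdict) pairs for all images in html."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling `audit(page_source)` on a saved page gives a quick list of which images a human (or a local model) would still need to describe.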
It would be useful for the model to add an additional line, something like "ai generated: 75% confidence" at the end of its output.
The problem is that those confidence values are often not very accurate, so are more misleading than anything. High confidence hallucinations would give people a false sense of security. "AI assessment, errors may exist" is a better one since it is vague.
I think that attaching a confidence percentage would get muddy, but you have a point. Maybe when this is expanded to the rest of the browser there is an option to distinguish AI generated alt text that came from the website. For sighted people this could be in the form of maybe a different color for the text that pops up, and for the visually impaired I'm assuming the screen reader could use a different tone of voice (I don't need to and haven't really used screen reader software so I may be totally wrong here).
A few years ago I was working with software that output a giant list of results with text like "Good", "Warning", "Error", etc..., but as an image so "it looked nice". Unfortunately, one of my co-workers was vision-impaired and the software did not provide alt-text. The at-the-time solution was to use a python script to manually modify the output and add alt-text tags for every image.
But I can see how such a tool would be useful for vision-impaired people in an on-demand context. Something where they can configure when to generate descriptions, whether or not to wait for confirmation, etc...
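For anyone curious what a script like the one mentioned above could look like, here is a rough Python sketch. It assumes the status images can be told apart by file name; the names in `STATUS_ALT` and the function `add_status_alt` are hypothetical, not the actual software or script from this comment.

```python
import html
import re

# Hypothetical mapping from icon file name to the status it depicts.
STATUS_ALT = {"good.png": "Good", "warning.png": "Warning", "error.png": "Error"}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r'\bsrc\s*=\s*"([^"]*)"', re.IGNORECASE)
ALT_ATTR = re.compile(r"\balt\s*=", re.IGNORECASE)


def add_status_alt(page):
    """Add alt text to known status icons that have no alt attribute yet."""

    def fix(match):
        tag = match.group(0)
        if ALT_ATTR.search(tag):
            return tag  # never overwrite alt text that is already there
        src = SRC_ATTR.search(tag)
        if not src:
            return tag
        name = src.group(1).rsplit("/", 1)[-1]  # file name without the path
        alt = STATUS_ALT.get(name)
        if alt is None:
            return tag  # unknown image: leave it for a human to describe
        # Insert the attribute right after the 4-character "<img" prefix.
        return '<img alt="' + html.escape(alt, quote=True) + '"' + tag[4:]

    return IMG_TAG.sub(fix, page)
```

A regex pass like this is only reasonable because the input is machine-generated and regular; for arbitrary web pages a real HTML parser would be the safer choice.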
I thought the firefox model's description was great but the human one was definitely very good. Both are probably better than anything I may have come up with.
Finally, some good fucking AI
It was there all along. Just without that much hype and usually called "machine learning" instead.
The only big change in the past 2 years is we now have models that can generate highly intelligible text. And lots of bogus funding.
Jeff Geerling has another example in his video. He also mentions the fact that AI is kind of a buzzword for the situation, since machine learning has been worked on for a while, but using a dedicated attachment to the Pi to improve its capabilities with machine learning, for a camera to be used as a security camera, is good.
Their page translation feature is also powered by locally-run AI. So this is a welcome expansion of that
The more specialized/small the AI the better the result.
@@olemortensen3354 It's like the old saying. Jack of all trades, master of one
A lot of things would be dead or stagnant without people narrowing down their skill into a niche
Alt text is a fundamental, basic requirement of making a website. It's insane that we have reached the point where this is needed.
At least this is a valid use case of AI and being open source project I can buy that it will stay local.
I agree with the mozilla model of AI. Give me local models on demand and don't touch my data. I wish they would bundle a really good text to speech engine, even if you need a GPU.
This is a cool feature
Year of the Linux console
year of the linux browser
linux console was always superior.
year of the linux kernel panic logs
I'd love to see this running in lynx.
@@ChrisWijtmans tbh if you compare the stock experience between the windows terminal and the stock experience of something like konsole or the gnome terminal then id say the windows terminal wins. its just that there are many, better terminal emulators on linux and you can actually customize them which is their real power
Actually, on social media there are image-to-text models that add description for an image automagicaly
I can imagine it being used to automatically generate alt texts for images, which are then automatically scraped by someone to train their own model, which is used to generate alt texts for images, and then scraped again, and so on. I wonder how much they will start to hallucinate 😅
why would scrapers scrape locally generated text?
Case in point: only leave its outputs locally, don't add similar stuff to WordPress so you can "comply" with accessibility guidelines, just like it is used by Firefox already.
As a web developer I think this is one of the few good uses for AI. I saw an article title that read "I want AI to do my dishes and laundry so I have time for art and writing NOT AI to do art and writing so I have time for dishes and laundry."
AI should not replicate human behaviour, AI should assist human behaviour to improve our efficiency in things we don't want to do. Alt-tags is a great example of an important but tedious task that often get forgotten. AI is a great use here, under the restrictions that it is not taken for granted. AI today is not good enough to replace humans - we still need to check it to ensure it is not hallucinating.
I actually use firefox pdf editor, its super useful for me lol. Also Firefox is doing something good, that is opt in and local, and is being done for improving the firefox user experience??
Hope the model is trained on stuff that's not in copyright (or was fully paid for and got permission to do so), because I think that's the only way it's ethical.
It seems less unethical to train AI used to generate text from an image using copyrighted images than the opposite. When you're converting an image to text its hard to say you didn't transform it. When you're generating images using copyrighted images you can often see artifacts of watermarks in the images (given certain parameters) and arguing that it was transformative enough seems like wishful thinking from AI enthusiasts.
Somebody came up with an actually good usecase for AI? I actually can't believe it!
I didn't know this isn't already in production since you can probably technically have a more power consuming version run on gpu-acceleration for a while now (like probably even when YOLO came out years before ChatGPT). But I guess it makes more sense now that it is much less process-taxing.
They could also build in local translation for images like foreign signboard or, for you weebs, manga panels. There is also Phi-3 vision and paligemma for GPT4o style descriptions.
The nuanced middle position is: if you want to process a mass of data and only need a reasonable amount of good results, not all good results, then AI is great. If you want to process one specific thing to get one specific result, you'll get frustrated and it'll take you longer than doing it manually. Generating alt text is pretty smart if it doesn't use too many resources.
This is why I love using Ice Cubes for Mastodon. I hate describing images in my posts. So I let AI describe the image for me. I’m all for AI doing tasks like this.
Sounds like they hit a reasonable compromise.
This however reminds me of a thing I sorely miss from the olden days of browsers: the ability to easily (hotkey) turn images on and off. It was so much easier to check for missed alt-tags with that (also, it made a lot of pages easier to read).
Good enough image descriptions when they are missing is finally an actually useful use for LLMs!
Nice to see it implemented fully locally with a small model that hopefully will be able to run reasonably well even on just a CPU!
This one is so interesting! Thank you for reporting it. It's so interesting that I may have to check it out and apt install firefox-nightly
FE dev here. The problem with that approach is that the alt text is going to describe the image, while in my experience in most cases it's more helpful to describe the purpose of the image with alt-text rather than the image itself.
For example, I have a button with an image inside. The alt text in this case is what that button would say if it were a text button, not a description of the image itself. Or sometimes I have a card of an item and the image is based on the type of that item (it's a thing that could be one of a few types). So in that case the alt does not describe the image itself but is something like "An image for a of that ".
It's a good thing I enforce all images having an `alt` attribute with an ESLint rule, so FF won't be generating anything for our projects (when they introduce it for all pages, not just PDFs).
This is true, and we obviously can't outsource alt text generation to AI from a dev perspective, but from the perspective of a user, in a situation where the page provides absolutely no information on what an image contains, or what its purpose is, I think it's inarguable that this feature represents an improvement to the user experience.
Besides, any image that doesn't have an alt text represents a failure on part of the developer, so even if a FF generated alt text is wrong and a user gets confused or reports it as an issue, you're still at fault for not adding proper accessibility metadata in the first place.
Finally AI being put into good use
Seems great. I've been personally really impressed with the open image captioning models and think they're quite a bit better than useless on almost all images.
The Firefox pdf editor is pretty nice, but standard for a web browser
Haven't used it much, but just like with page translation. I like this, and they could even reuse the same interface
already in place for languages of pages to translate. Don't need alt text, but I like that platforms like Mastodon and Bluesky
offer a toggle to not let you post an image until you add it. Helps me not forget, think making a default would make a lot more people aware
The fact that its mozilla, not bundled with difficult opt out (Microsoft) makes this acceptable
i wouldn't say that this is the only good use case, but this falls under repetitive tasks where it has to do one thing only, and i think in general many of those can also be use cases for ai which have no drawbacks to them, assuming it's implemented correctly.
As someone, who hates LLMs with a passion, I think this is an actually good use case (actually helping people) with an implementation that doesn't raise horrible concerns for me
That's great, it even avoids publishing generated alt texts that would be used for training future models and thus keeps the quality of scraped data
I knew about and used the Firefox PDF Editor beforehand. It's great because I have my browser always open anyways and use it for reading PDFs (e.g. assignments found on the uni website).
The Firefox AI features are awesome, e.g. I've been searching for private website translations for years and never found a proper alternative besides Google web translate addons. This alt text accessibility feature is a great continuation of this private AI work they've done.
I've always thought that small models like this were much cooler than huge clouded models like GPT4. Small programs that do one task pretty well hashtag unix philosophy.
I'm sure it's gone unnoticed, but I appreciate seeing Firefox in your videos now more instead of Brave. Good on ya mate, we can't keep supporting chromium for as long as google controls it
Mozilla is funded by Google. So...?
@@LautaroQ2812 only because Google is forced to do it, so that it can't have a monopoly. it certainly would be better if it didn't receive it, but Firefox forks like LibreWolf and Pale Moon also exist.
@@LautaroQ2812 It's the lesser evil.
@@LautaroQ2812 Yea but it's more of a:
Google: Here's money for your foundation to make my reputation better
Mozilla: Nice this will work great on that project that will make you look really stupid
Google: What?
Mozilla: What?
@@LautaroQ2812 The only thing Google funds is the default search contract deal.
Just about the only real criticism I can think of is that it should be part of the screen reader, not the browser.
There's the matter of resource utilization and battery life, but that's honestly just a question of good defaults and visibility.
10:10 OCR (Optical Character Recognition) for example
Finally a good use for ai. Big props to Firefox.
I have already seen some automatically generated alt-text on websites, those used "simple" machine vision technologies and were very limited, and often bad. A good on-device neural network could strike a good balance, although the end of the text should always be "This image description was automatically generated and may not be accurate."
I'm legally blind. I have some vision, and I will still be using this myself. I don't trust Mozilla not to collect data from me, I trust them to tell me if they're doing it and to give me the chance to say no and respect my decision. And it's important that this be done, because it makes the web more accessible to all.
(And yes I knew about Mozilla's PDF editing features. Firefox is my primary browser because of multiple account containers (which Chromes of any color don't have) and temporary containers (3rd party thing). The result is that every new tab either goes into a pre-defined container (which retains its cookies) or a temporary container which is effectively a private browsing window, gone the instant I close it.)
I want a loop where you provide text, it generates an image based on that text, it generates text based on the resulting image, etc. etc.
damn, that zoom out is a game changer, production value go brrrrrr
Sounds like a good use to me. I wonder how it'll work when deployed to work on web images as well, in particular for images where there is already alt text that appears when hovered over, but does not describe the image - like the extra jokes on xkcd's comics.
Huh. Finally an actual use for the NPUs which will be on every CPU from now on. Good job Mozilla!
I find this use and how it is implemented OK.
Looking at the very good description by ChatGPT-4 and the shortened version of the text, two tags should be part of W3C standards. Maybe Alt-text and Alt-text-long, if not already existing.
The long version being particularly useful to visually impaired people.
As hardware/software becomes more efficient, the sort of description provided in the ChatGPT-4 example may become available locally. Web developers could generate and vet text locally before publishing, same for social media content posters. As a last resort, empty tags could be generated locally at the readers end.
FF pdf editor is awesome. Been using it for a while.
8:52 Ok, that's pretty surprising.
Because the PDF viewers and editors of browsers (doesn't matter if Firefox or Chromium based) at this point became so good that if somebody doesn't need one for their day to day stuff (like work or school), I don't think they need to install one.
I expect the situation in social media to be better than the average website - for example Instagram have been using AI generated alt text since forever.
5:24 A random human content editor will quite often _also_ focus on the completely wrong thing in an image when writing a manual alt-text too. Especially if they are creating alt-text for a generic stock photo or generic press photo of some person without knowing the context the image will be used in.
It would be cool if websites could hook into an API and have an option to generate alt text for images on the fediverse automatically.
alt text can also be the popup text when mouse overing
This is a really nice concept and totally worth use of AI, not to mention how noble of a subject it deals with(accessibility)
That's a great use of AI, saving work no one wants to do and helping people who otherwise would not be able to interact with the page fully
So this guy got a haircut but left that beard totally untouched?? Good for him. This is Linuxland.
That cut is sharp
He's still a few decades away from the unix beard tho
Mozilla blessings for Rust and this are 2 of Mozilla recent Ws. I support the way they implement the AI specifically: totally optional plus future plan to provide to the user an interface to manage and install AI models. This is actually good stuff.
Nice job Firefox. Good reporting
Very good topic and video on it.
Such micro-AI can be very helpful in many situations.
AI is a tool (more a range of tools), as such it is not "good" nor "evil" by itself. How will one use it - to do good or bad - is something, related to humans, not to AI.
Killer feature maybe, big w for moz
When I heard several months ago that Mozilla jumped on the AI hype train I was sceptical, but so far they seem to deliver. Considering they've crowdsourced and released the Common Voice dataset, the next model is going to be TTS.
As someone that's against AI for literally anything and everything (I also have a bit of a hype aversion in general), I might actually spin up an open-source, self-host-able LLM for alt text generation while web dev-ing
This sounds rather useful, especially for those images where you just don't know how to describe them. Sounds like it will give you a good starting point, before translating it into your own words.
what are the best ai text and ai image generators to use at the moment?
This is the only real and working normal use of AI i have ever seen.
Amazing I love it. Even though I will probably never use it
AI is definitely overhyped but it shouldn't be demonized. Many are being closed-minded about it, but I don't see the reason why.
Is a good idea.
sounds like a good use, that said i do not use a screen reader so i would choose to not use it, if the image did not load i would find it to be useful, but i would assume if the image does not load alt text can not be generated
Yes, generating alt text sounds like a great use for AI, as long as it's specified in the text that it's AI generated. "Hallucinations" will probably always be an issue, but users should be able to understand that.
In limited fairness to the Baseline Model, she's touching the chair and could be seen as holding it. So it's not a crazy hallucination
HDR has been a "bug" in Firefox for years, but never mind that. It's time to work on AI and get the implementation done in just a few months!
It's a good feature for accessibility. I'm not complaining about the feature itself. Just questioning why they're dragging their feet on HDR so much when they get a feature like this out so quickly
I just hope that it won't overwrite alt-text that is intentionally set to be blank in the future. I have a weird use case in a userscript where I need to set [alt=""] on an img element.
that should be pretty easy to implement. It's probably good to have empty alt-text on some less esoteric stuff, too: I imagine you don't want screen-readers going "the logo" several times on every page.
@@Poldovico It is implemented, I was saying I hope it won't overwrite that in the future. My use case is kind of a hack job that makes the img element 0px X 0px if it's broken (instead of showing a broken image icon). It's not really the right way to do what I'm trying to, but it's what I (a self taught hobbyist) can do to modify an existing script to do what I need it to, and it's good enough so I don't care. But that is a good point about there being cases where you wouldn't want alt-text.
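The rule being asked for in this exchange fits in a few lines. This is a hypothetical Python sketch of such a policy, not Firefox's actual behavior; `should_generate_alt` and its attribute dictionary are made up for illustration.

```python
def should_generate_alt(attrs):
    """Decide whether an image is a candidate for a generated description.

    attrs is a plain dict of the <img> element's attributes (an assumption
    made for this sketch).
    """
    if "alt" in attrs:
        # Covers alt="" too: an explicitly empty alt means "decorative,
        # skip me", so it must never be overwritten.
        return False
    if attrs.get("aria-hidden") == "true":
        return False  # the author already hid this image from screen readers
    return True
```

Under this rule the userscript's `alt=""` images and repeated logos with empty alt text would both be left alone, and only images with no alt attribute at all would get a generated description.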
Ah alt text, the text you feel compelled to supply because HTML has a spot for it, but, on never seeing any results/benefits from it, soon lose interest in, and, on receiving no negative responses, choose to spend your time on something that does bring 'something' for your effort.
Does it detect and omit design elements? Those would probably get descriptions that are not very useful.
But it definitely should label tracking pixels as such!
Okay, but Firefox still doesn't support the XDG Base Directory specification.
One thing i think it needs is a small disclaimer, something like "AI alt text", to denote that the alt text was ai generated and may not be completely accurate.
This is the kind of thing AI should be used for. People just hate that big corps are trying to replace creative jobs like art and writing with AI, and that just leads to uninspired garbage or at worst misinformation in Google's case
Seeing Brodie not have Brave open 👀👀👀I wonder if this is a one time thing.
I think he has been using FF for quite some time now. Around when he made the first videos about Plasma 6, I think. But he still burns our retinas with those light themes.
@@renner0395 I must have not noticed it in the last few months. I'm assuming it's cus of MV3
Yeah, accessibility (and data analysis in science and perhaps medicine) is probably the one good use of this kind of AI.
Wow, firefox actually does something ahead of chrome/ium.
corrected: Wow, firefox actually does something
Damn, the one good use of this billion dollar technology
The Firefox PDF editor is fairly new (one or two months I think), so that explains why you didn't know about it.
About 'legal': if there are no regulations in place, it cannot be illegal. If it is not explicitly forbidden, it is allowed, and cannot be punished
I hope it is an opt-in feature.
I'm still a bit wary about the ultimate consequences of AI, especially as there seems to be a push towards truly generalized AI (and quite frankly, we may already be there and not know about it).
That said - I do like Firefox's approach. Look for legitimate cases where it can be genuinely useful, and build from there. This is far better than slapping an AI button on everything you can, which is what I see a lot in other products.
this is the type of ai automation i can get behind.
Hope they do the organise tabs with AI thing
You didn't know Firefox had a PDF editor? Doesn't every browser have that?
I knew it had a viewer
finally, a good use for ai!!
I HAVE BEEN SAYING THIS SINCE IMAGE GEN CAME OUT LETS GOOOOOOO
My concern around automated alt text generation is that it feels as though people start to think that because of all the automatic ways to generate it, they don't need to do it themselves any longer.
This is kind of something that seems to have happened on Mastodon that was once great with alt text, but now it feels like people have given up because they think that things like gpt4o are a perfectly valid replacement, which they are not.
It's a good alternative to having some alt text where there was previously none, not a replacement.
When you mentioned that a certain version of alt text was too long, I disagree. I always prefer alt text that goes into finest detail and helps me imagine the setting and stuff, although this may be more appropriate for social media rather than every single image on a website being described this way.
Hallucination is a huge issue with AI, and so is censorship of inappropriate things (or things that the model detects as inappropriate anyway), which means that your access is limited.
Consider Gemini. I wanted to know what a social media post screenshot said, and it refused to tell me because the content isn't something that it was designed to say. Turns out it was a post by someone who was ranting about a book that discusses children as soldiers or shields in wars, and making a point that if you need to consider or ask if it's OK to hurt children in war, you are a bad person. It was nothing graphic, but even if it was, I think it should try to phrase it in such a way that still protects people, but doesn't limit access.
I've always said that my problem with machine learning isn't machine learning itself, it's when you combine machine learning, capitalism, and disregard for consent and privacy.
At this point, the high-performance computing that LLMs do, which costs an arm and a leg to run and relies on basically slavery, on top of being trained on non-consenting people's data, is misrepresenting the whole of machine learning IMO.
An on-device, offline machine learning trained on legal/ethical data is great.
Bonus points if it's also open source.
Heck, even online machine learning that makes you train your own model and doesn't use potentially copyrighted data of non-consenting people I see no issue with.
The fact that people are throwing a fit over Mozilla's on-device machine learning feels really meh as a screen reader user.
Considering that Apple especially has been using on-device machine learning for a while now, including their noise reduction, Siri, TTS voices, etc., it just feels like the marketing term AI has made everyone (understandably) jumpy, and we probably need a better term for legal/ethical machine learning.
I can't help but think of the adult video industry when I talk about ethical and legal stuff. Very sad in both cases that those aren't respected by default instead of being something special.
My negative opinions about AI are about exaggerations, overhype and overpromises. I'm also against strapping AI onto everything to solve problems that nobody has.
For example, speech synthesis is a legit and useful case. Email templating is barely better than regular templates. AI in search engines is horrible.
As you said, Firefox has found a good and actually useful way to use it.
I remember when AI was simply a bunch of experiments, before it went mainstream, before it became the money-making fad. This feels like a call back to those days.
There are cases when it's expected to have empty alt text, and in those cases the alt text should not be filled in automatically by the browser.
For example, if you have a UI notification built from a "warning" icon image and a notification message, you want to have alt text set to something like "warning: " or "important: " for that image. However, if you have a UI notification designed as a "warning" icon image followed by the word "warning" and then by the notification itself, it is expected to have empty alt for the icon: the alt would not serve any purpose for a blind person, since the image is used only as a decorative element.
True, but if I were hard-of-sight, I would totally take redundant alt-text like that over no alt-text at all on images that should have it.
You can mark images as decorative which makes screen readers and other tools disregard the images
UI elements generally don't use img tags, though. Almost universally img tags represent content. UI pictograms are generally done with svg tags, css (be it background-image or pure CSS shapes), or fonts (e.g. fontawesome).
I would disagree. If you have an image saying warning, and text saying warning, it is emphasised for sighted people. The repetition applies a similar level of emphasis to a partially sighted person.
We are a while away from that anyway; this is aimed at the PDF editor (which I kinda want to try now). As much as descriptions of images are useful, for browsing it really does need to take the context into account, so it wants to be a transformer on the whole page when we get to that point. Then we could fine-tune it to ignore redundant images anyway. "It might get it wrong" is not a good reason for criticising an assistant tool, particularly given it almost certainly will in unexpected and creative ways!
Deliberate omission seems such a corner case (and unhelpful, because you can never know whether it was deliberate) that I don't see a good reason for pushback here. In those cases what you really want is an AI looking at the warning image, seeing it is followed by warning text, and replacing the missing alt text with an empty string. A blind person without assistance will have no idea what is in the image. They just see an image with missing alt text. It's probably a redundant warning image, but they actually have no idea.
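To make the distinction in this thread concrete, here is a minimal sketch of how a tool could separate images with no alt attribute at all (candidates for generated text) from images that were deliberately marked decorative. It is not how Firefox does it. The class name `AltTextAudit`, the helper `audit_images`, and the rule that whitespace-only alt counts as decorative are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Hypothetical helper: sorts <img> tags by how their alt text was authored."""

    def __init__(self):
        super().__init__()
        self.needs_alt = []   # no alt attribute: a candidate for generated text
        self.decorative = []  # alt="" or presentation role: leave alone
        self.described = []   # author already wrote alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        # Explicit "ignore this image" signals for assistive technology.
        hidden = (
            attributes.get("role") in ("presentation", "none")
            or attributes.get("aria-hidden") == "true"
        )
        if "alt" not in attributes:
            (self.decorative if hidden else self.needs_alt).append(src)
        elif not (attributes["alt"] or "").strip():
            # Assumption: empty or whitespace-only alt means "decorative on purpose".
            self.decorative.append(src)
        else:
            self.described.append(src)


def audit_images(html):
    """Parse an HTML string and return the filled-in audit."""
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit
```

With this split, only the `needs_alt` list would ever be handed to a captioning model, so an author's deliberate `alt=""` is never overwritten.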
Ok, this is genius
8:55 Yeah, I basically use it exclusively for PDF reading now; what's the point in installing something else if the built-in one does me just fine? As for creating them, I can just export from LibreOffice.
Instead of generating alt text it should recognise images with a missing or wrong alt text and not display them. Thus forcing everyone to comply with the html spec.
Everyone who was freaking out about "Firefox adding AI" owes Mozilla an apology and needs to learn to wait until details are available to make judgements
Makes me wonder if people from danbooru are hired to tag these to train LLMs, as they go over the top with tagging