I marked it as paid promotion since I work at the organization offering this fellowship program, which paid Alex while he developed his project. Alex is a viewer of the channel, applied after hearing about it through here, and put together a very cool piece of software. I look forward to seeing what can be produced in the future!
That sounds perfectly fair - although I wonder if the legal requirement to disclose sponsorship is related to potential conflicts of interest, or profitability, or some third thing.
I am HIGHLY interested in this application. Even if you are to charge a couple of bucks per download. I would download this app. I just hope that you will make a video once it's done showing what it can do. And if he can do live transcribing. That would be fucking amazing. I'm literally transcribing this right now because I hate typing on a phone.
@@enb3810 I'm pretty damn sure that any money from this is going to go right back into FUTO and/or the creator of that app. If this app does take off, perfect: the creator of the app has a source of income and can become independent.
7:02 Ackshually.. Google's mic typing on their keyboard does work offline. There's an option to download the language model for offline use. But I wouldn't be surprised if this also has the "voice memo" feature where it records and saves forever everything you say while it's transcribing, offline or online. One thing people forget is that even if a phone has no network connection, it has local (flash) storage so it can upload the data another time once it has a network connection.
@@houghwhite411 Google's speech-to-text has punctuation... if you dictate it. It also helpfully capitalizes random words. And makes the most pathetic grammatical mistakes, e.g. they're versus their.
Years ago I installed text to voice on Android and set the option for local data (or something). But it didn't work when the network was off. So I knew it wasn't safe and shut it off. So nice to know you have a working system here that understands what savvy users really want. I hope this software does great. I love its goals.
You made me see if my android's voice to text ran locally, and that appears to be the case. It may send data to google anyway, but internet is not necessary for it to function it seems.
It's crazy how long this took to happen. I was experimenting with the local Windows speech recognition features 10+ years ago in C# programming for the fun of it. It always annoyed me that Android didn't have a local speech to text feature when computers did a decade earlier. I guess, like everything, it came down to there not being enough money in it. No programmers ever seemed to use the Windows speech API (I think that is what it is called) in their programs.
The way this works, if there's a pre-existing option from a large tech company, that is the default. The problem with that being the default is that nobody else is going to develop something at cost to themselves when the default is free, even if the default is abusive when it comes to things like privacy. Let's change that
Well, the reality is that speech to text is a complicated problem. You're only seeing English support here, that works with a particular accent. Supporting other languages, and being able to understand different accents, tones, voices, dealing with noise and other sounds that can happen at the same time, all of that requires a heck of a lot more data samples, research and a huge amount of storage space that just won't be possible in general on a single device. Although, I guess with some caveats it's possible.
@@FlabbyTabby Speech recognition first came standard in Windows Vista (2007). The install requirements were 15 GB for the OS. I believe it also came with a training system. Unlike Google's system, this was trained to understand your voice and your way of speaking, not a catch-all. So the more you trained it, the more it learned how to understand you. It's not like it takes up a lot of space; it's not storing the recordings. It's processing them and using the results. It definitely wasn't a perfect system, but there's no reason it can't run on today's smartphone tech. Like any decent software, different languages can be separated into different packs that don't all need to be installed. Even Google Translate allows you to download different languages for the app (for offline use). I'm not a professional programmer in any way, shape, or form, but I played around enough to know what is possible. Watching 720p video can be a lot harder specs-wise than this.
@@Mysdia Makes sense, it vaguely reminds me of Windows' voice-activated macro system (I forget the name), but VoiceAttack seems more purpose-built and user-friendly. I used that to call out the location and number of incoming enemies in chat in a game I played capture the flag on. It was only marginally useful but served as a good experiment to understand it.
6:11 as cool as this project is, it should be noted that Google's official speech-to-text has already been running locally (albeit with absolutely no guarantee that they aren't sending it to a server later anyway for training/data mining/etc.), and is doing punctuation too. I believe the offline speech-to-text is/was exclusive to Pixel devices though, so I'm not sure if that has rolled out to other devices now. But it's super cool to see something completely independent and safe to use!
Once upon a time in electronics hobbies there was the SP0256 speech synthesiser IC sold via Radio Shack. Somewhere in the early days of the PC clone wars there was both text to speech and the reverse, through your PC sound card input, free, open source, and running under a form of DOS. It is out there somewhere and can be converted (like any old DOS program) to any current OS or MPU. One product that uses something based on it is ReSpeaker Core from Seeed Studio, so you can make an off-grid Alexa (with a ton of programming on your part). There is little that is new in this tech as far as I am concerned, and 'cloud storage' is just someone else's drive; refer to the mugs you can purchase from The Rossmann Group!
Reading through the comments I see people are really excited about this (and they should be). What confuses me, however, is that this is not new. I was comfortably using Dragon Dictation back in the 90s on a Pentium I. Cell phones, even from 10 years ago, had MORE than enough power to do this. This whole idea that "the cloud" needed to get involved in the first place was FUD pushed by companies like Google. I refuse to use anything that needs "the cloud" to process any of my data. I'm glad to see that work on removing "the cloud" from our lives where it is not needed is under way.
@@nunyabusiness7602 Google was using the stored data to help improve the model.... To begin with, anyway. Then they started just storing it for whatever uses they came up with, which wasn't good.
In fairness, these old versions were kinda trash. They were a fun proof-of-concept but were not reliable enough to actually use. It was only with neural network technology that a really usable system was created, but these neural networks required more processing than was available in your average smartphone 10 years ago. Newer smartphones are far more capable.
@@adamhixon The SP0256 speech chip was run on 8-bit systems; you may recognise it better as either Covox Speech Thing or Intellivoice. They ran off of 8-bit systems that did not even go to 1 GHz clock speeds, and the DOS text to speech and speech to text software (including Dragon NaturallySpeaking) ran on as low as 286 PCs. AI and neural networks were not around, and these systems did quite well without them. As to running on a phone, it is only recently that phones could compete with the number-crunching power of a 486, which is what makes it useful in real time on a phone.
Very cool!! I knew the april-asr library had potential, but this is WAY BEYOND anything I had considered. Kinda bonkers! Gonna build it when I get home and am hyped to check out the pretrained (THANK YOU!) model. This, THIS right here is why I LOVE the open source community, so A HUGE, MASSIVE THANK YOU TO THE DEV, AND TO THE FOLKS WHO WORKED ON THE MODEL!! AND A MASSIVE THANKS TO LOUIS FOR TELLING US ABOUT THIS PROJECT, AND THIS DEVELOPER PROGRAM!! 😀👍🐧🐧
*Don't Feed Trolls!* Thank You Louis & Alex for *sharing your ability to provide One Liberty.* An almost 'lost' *possessive power,* what One speaks, truly remains their own. Cheers!
Reading through the github page it uses a pre-trained model that is capable of running locally. This type of research, particularly AI research that uses these types of models, will revolutionize more than just chatbots. Pretrained AI models that can run on local devices may be the tide turning blow to constant surveillance as you no longer need or can justify a constant connection back to some corporate home base. Thanks for bringing this to my attention.
Essentially, instead of releasing a half-baked product to users and then collecting all the user data in the name of ongoing "real-time" improvements to the product functionality and accuracy, this developer did the hard work first, released a product ready to function on the device, already performing pretty damn well for a project in beta. THIS is how it was supposed to be.
I just wanted to point out that Google speech services do have offline support for a few languages including English (we still can't guarantee that Google isn't sending the inputs to its servers when the phone is connected to the internet). Go to Settings > System > Languages and input > Voice input (set it to Speech services by Google if it's not set already, then click the settings gear icon) > Add a language (offline).
Awesome stuff, Louis. One of my long-term goals is to detach myself from the Google ecosystem. Right now, I'm working on a project that combines quantized AI models, something like Home Assistant, microphone arrays, and speakers to replace the home automation functions of Google Home/Alexa. It's way more expensive than buying these products, but I think in the long run it will be worth it, and with a little TLC, it will be an even better experience.
I remember back in my 486 days Creative Labs had some primitive voice recognition software that shipped with the Sound Blaster 16 ASP. You had to train it word-for-word, but you could control Windows 3.1 with it by saying things like "Edit. Cut." Wasn't terribly useful, but it worked. Years later I tried some dictation software for Windows 98 or XP. It worked, too, and would do a half-decent job of turning what you said into text in a document, although it still needed a fair bit of editing. While more useful, I still found it easier to just type what I wanted. Neither piece of software required a network connection. And there's the problem with voice-recognition systems: they're a solution to a problem that's already been solved in a better way. For just controlling programs, just clicking your way through menus is better, and for text-entry, you can't beat typing (with a touch-screen swipe-keyboard as an acceptable substitute). Talking is tiring and inaccurate, yet there's this obsession with making talking to computers a thing. I don't want to talk to a dumb machine if there's some other way to do it. It's particularly annoying when some call-centre system that used to just ask you to press a button now asks you to speak to it. If it's not a human, I shouldn't have to _talk_ to it! Give me the damn menu back!
Voice recognition is good for times you can't use your hands. While they're dirty, while you're driving, when you have a disability, etc... So it's not a waste of time. But in general I agree.
@@thesenamesaretaken Depends on what you're doing. If you're going back-and-forth over multiple messages, then yes, in which case you might as well just call the other person. For a one-off carefully-composed one, I'd still take the on-screen keyboard over talking. You're probably going to be going back and editing it, anyway, so you might as well do it all with the keyboard.
This reminds me how the Swipe feature when it was introduced was much more accurate (even with mistakes) than today's version that even when hitting every letter accurately it brings unrelated results and sometimes predictive text somehow can't decipher it either 😢
I’ve been watching for a while, and gotta say … Louis, you look much better today than you’ve looked in the recent past … nice to see you looking so well :-)
I love it! If I may make a suggestion / request, I would love an option that can ingest a pre-recorded sound file and make a transcript, possibly even with time codes that can be turned on/off. This would help me and many others who are hard of hearing immensely.
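For anyone wondering what the "time codes that can be turned on/off" part of that request could look like, here is a minimal sketch in Python. It is not taken from the project: `format_timecode`, `render_transcript`, and the `(start_seconds, text)` segment layout are hypothetical names and shapes made up for illustration, and the recognizer that would produce the segments is assumed to exist elsewhere.

```python
def format_timecode(seconds):
    """Format an offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_transcript(segments, show_timecodes=True):
    """Render (start_seconds, text) pairs, optionally prefixed with time codes.

    `segments` is a hypothetical structure: whatever the speech engine
    returns would have to be converted into this shape first.
    """
    lines = []
    for start, text in segments:
        if show_timecodes:
            lines.append(f"[{format_timecode(start)}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [(0.0, "Hello."), (2.25, "This is a test.")]
    print(render_transcript(demo, show_timecodes=True))
    print(render_transcript(demo, show_timecodes=False))
```

Keeping the time code as data next to each segment, and only deciding at render time whether to show it, is what makes the on/off toggle cheap.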
That's amazing and very well done! I've also been thinking over the last few years that smartphones have gotten so advanced that a lot of tools which required processing by external services such as Google could very well be done locally. This is an excellent example and I'm thoroughly impressed by the result so far!
this is awesome to see. I still use a blackberry because I can't stand to type on a touchscreen, this kind of application would be incredibly useful for me when I inevitably have to ditch a physical keyboard
Blackberry Bold 9780 was probably the best phone I ever owned. I could type blindly with perfect punctuation (and form perfect grammatically correct sentences when face down in the gutter drunk ;)) (I dare anyone with a touchscreen phone to write an important email blindly, and hit send without looking) I miss the tactile feedback of the buttons, and I miss being able to know where I am on the keyboard by feel. Oh, and having SO MANY special characters at your disposal by just clicking a readily available key. Instead of switching to the numbers on my iPhone keyboard, THEN special characters, and still having less at your disposal. Oh, and having the number keys... Just always. I got so much done with that phone... Switching to iPhone / Android phones was a horrible experience, and my productivity sank like you wouldn't believe. (It was like someone shot me in both feet, and then trying to run a marathon).
Can confirm that LiveCaptions actually runs on the Pinephone Pro. So the CPU requirement isn't too bad, honestly. I would still like to see GPU acceleration. But the project is definitely awesome! Good choice for funding, I think.
I have 3 questions. 1: Where can I find the APK to install this on my phone? I only see the source material but no installation stuff. (I am not a GitHub person; downloading something from the "release" section is sadly the best I can do.) 2: Does this support other languages? Like German, for example. 3: How can I tell the keyboard app I'm currently using to not use the Google service but this third-party app? I use the old Minuum keyboard and I cannot imagine going back to a casual standard 3-row layout with separate keys. I am just too used to having all keys in one row and sloppy typing now. Maybe a swipe-type keyboard is arguable, but only if the keyboard supports dual swipe input (using both thumbs to swipe at the same time).
When Big Tech is asked "why not do stuff locally?", the response has always been "not enough resources", that is, "not enough to service big data".
I just built this on Linux. Recognition of Louis' voice is stellar. I then tried it with a video from Big Clive. Pure comedy gold! Now I need to pipe the output of this software into a voice synth that outputs along with the video 🤪
something that is overlooked, I'm not on a contract for my phone because I hardly use it, So therefore if I want to use data it's overpriced like hell, so I always prefer applications that just run on my phone so I don't have to pay for expensive data
We mucked around with Dragon Naturally Speaking way back with Windows 9x. I think it was when I had a crappy Cyrix 266 (which was closer to a Pentium 200). Many microcontrollers have more power than that, and basically all phones do. The limitation is usually the RAM more than the CPU, I think.
Google included offline voice-to-text transcription back in the day, at some point between Gingerbread and Jelly Bean. It was not as good as the online transcription, but faster and good enough. Then they decided to take it away. Nowadays you can "prefer" offline, but they keep control and they can decide to send your voice to their servers at any time. I'm looking forward to this project maturing. And I'd like to know what it is based on. Whisper? Julius? Please state which STT engine you are using. It is not only polite and respectful to the developers your project depends on, but it would also avoid the sense of false advertisement I've had since I saw your video. Edit: Followed the links. The project is called april-asr. It is an independent engine that used Whisper to create the models. Currently only English, Polish and French models are available.
I have been using AnySoftKeyboard. It's a really great keyboard that gives you control of everything, from font, vibrations, sound, haptics, theme, and layouts to whether you want a mic on your keyboard or not. I highly recommend it. Also, it doesn't store any data like other apps.
Hell yeah! this is amazing! thanks for all the good advice and just being such a genuine human. recently got a pixel 6 just based on your security recommendation
You can turn off the saving of voice recordings and still use transcription with Google, and even offline. Just worth being accurate with your information.
I'm using Simple Keyboard on Android, because of privacy concerns. It's great to see local and *private* transcription coming. I'll wait a little until it's a bit more polished. I'm looking forward to installing the final app.
Microsoft's Kinect for both the Xbox 360 and Xbox One had built in speech recognition, which meant it did not need to be connected to the internet to understand voice commands. However, they did later change the way it worked about 2017 or so so that it did require an internet connection, in order to allow the same functionality without a Kinect, but also affecting the Kinect...
I'm not sure 6:40 but this exact test of voice to text in airplane mode works on my native pixel keyboard. It doesn't require internet. Maybe they updated it?
It would be really cool if you could swap out the ML model doing the audio transcription with different ones from huggingface, or one you trained yourself.
been using it for 2 months. really freaking good. English is really really good, Italian not that much, but i still haven't spent time tweaking or installing updates
the application of this is pretty cool - if personal chatGPT's/LLMs will be the norm then an app that converts spoken minutes or meeting videos transcriptions into text for an LLM to process into actions and duties - data that doesn't go to some other corporation to mine and train their AI. Instead it trains the Ai of regular people.
Man this needs to happen for all platforms. Very nice, it would be good to see it translating as you go but it seemed to translate well. I would pay money for this.
This will NEVER happen with Apple, and Google would fight it. That's why (as Louis said) only 1% of people use their phone fully. Also, paying may be the correct and right thing to do, but once money is involved, things tend to go to shit.
Wow thats very impressive - recognizes punctuation and speech patterns to have minimal error - locally? That's insane, I've not used the voice recognition feature, but certainly this would make me consider
HTC phones of the early 2010s give me such nostalgia. Too bad I didnt get to use ones other than my wildfire since I was still a kid, so no new phone until this one breaks.
I would definitely ask to have it show progress on the screen once you tap the icon after talking into the mic. It looked like it was hanging until it finished.
I have been trying to find a solution to this for YEARS. The closest I've gotten is disabling Gboard's access to the internet, but for some reason my autocorrect suggestions are now getting worse every day. Can't wait for this solution!!
That second transcription is very accurate (even with the long sentence at the beginning of it: "and...and...and..."). Usability-wise, maybe it would be good to get a bit more feedback, with first results appearing as you talk: the transcribed sentences popping up as you go (trailing behind, of course). Right now it looks like it's only processing when you hit 'stop'. The notes app will auto-save, so as soon as it's printed, your work is safe. I'm thinking about memory management on the one hand, and about the user experience on the other: what if the dictation app crashes? I think printing to screen ASAP, to (perhaps) clear out some memory and save your work as you go, would logically be an easy route to take (but I'm not an Android developer). If it crashes, you can pick up where you left off (maybe losing the last 1 or 2 sentences, but that's manageable). It would also feel a lot faster (even though it might not be AS fast as now). Plus it would assure the user that 'it is indeed working'. (If it crashes before anything has printed to screen, you would need auto-save functionality for the audio file, which seems like a more difficult job.) I realize that 'printing to screen as you go and freeing up some memory' is more difficult to do than it looks in this sentence. But it seems to me that it would solve some future problems, and maybe would allow making a public beta version quicker. All in all: cool. It already works way better than I'd have expected!
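To make the "save your work as you go" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions, not how the app works: `IncrementalTranscript` and `simulate_dictation` are hypothetical names, the recognizer is faked with a plain list of finished sentences, and a real Android app would write into the text field rather than a file.

```python
import os
import tempfile


class IncrementalTranscript:
    """Append each finished sentence to disk as soon as it is recognized."""

    def __init__(self, path):
        self.path = path

    def commit(self, sentence):
        # Append and force the write to disk, so a crash loses at most
        # the sentence that is still being spoken.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(sentence.strip() + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(self):
        # After a restart, read back everything that was already committed.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]


def simulate_dictation(sentences, transcript, crash_after=None):
    """Hypothetical stand-in for a recognizer that emits finished sentences."""
    for i, sentence in enumerate(sentences):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("simulated crash")
        transcript.commit(sentence)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "notes.txt")
    try:
        simulate_dictation(["One.", "Two.", "Three."],
                           IncrementalTranscript(path), crash_after=2)
    except RuntimeError:
        pass
    # Only the sentence in progress at crash time is missing.
    print(IncrementalTranscript(path).recover())
```

The trade-off is the one the comment already names: committing per sentence costs a little speed, but nothing has to be held in memory until 'stop' and a crash no longer wipes the whole dictation.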
Sayboard for Android not only acts as a separate keyboard, but also prints speech to the screen in real-time. It would be great if the Live Captions Android app, which already includes capitalization and punctuation features unlike Sayboard, could somehow adopt said features.
I used to use voice for texting while driving. I think it's for the best that I've stopped using as much Google as possible, and no longer text while driving.
TBH I've never used any sort of speech-to-text, not because I was concerned about what the cloud would know about me but because of people AROUND ME knowing it. Not that I have anything to hide (most of the time), but the thought of anyone around me knowing what I'm doing on my phone (directly, rather than me posting it on Facebook) troubles me, not to mention that it is sort of intrusive to keep hearing people do that. And the thought that any background noise or lack of speech clarity would get in the way kind of finished the job for me.
Actually Google already does this for voice typing, just not sure about Assistant (they may do that anyway). Not open source, but they already do it on the keyboard and live transcription apps.
the only time i have used the voice to text feature is when i push the button by mistake or if i drop the phone and try to grab it and have the keyboard up..
Great work, Mr. Rossmann. God bless you. Quick question: why are lobbyists not exposed as being planet-unfriendly? I know you did good work with repair your own device.
This is exciting. I refuse to get a gugl cell precisely bc it eavesdrops and I still refuse to download apps that require I give them wall to wall permission to do anything it wants in order to work. No. Another reason why I sub to your channel. Thanks Rossmann.
That is really impressive - even more so than the *nix version shown earlier. What I am waiting for is the version that'll run in Win 7! Pretty please, Alex? :)
Take My Money. I have been waiting for this for too long.
Wait until we produce something amazing that is worthy of your money, that is a finished product. Then, and only then, will we accept your money.
@@rossmanngroup Count me in too please.
@@David-ty6my I think they will go with the donation model, so what is a fair price for you and your income? :)
@@rossmanngroup That is a leagues better stance than what a lot of big companies do nowadays.
I'm in. I've been going without for 2 years. I WANT IT!
Did you use voice software to write this? 😮
@@TheMrRatzz Im curious about that myself
I used it, it's sooo so bad. No punctuation
Does it have something that stops you from disabling that app's internet access in the settings?
It's not an LLM, it's more like indexing, and Google made it bad by design.
That option was for text to speech, not voice to text.
@@tom12323456 It may hold it in cache and then transfer it at a later time. I wouldn’t trust it not to be sent over to their datacenters.
I noticed VoiceAttack used the Microsoft Windows speech engine for parsing customized voice commands and dictation -- it's not entirely unused.
FUTO is so inspiring. I'm still in education, but it's exactly the kind of open system of tech funding I've dreamed of.
As someone who uses voice to text often for accessibility I find this very exciting :D
Clouds are for Rain!
Reading through the comments I see people are really excited about this (and they should be). What confuses me however is this is not new. I was comfortably using Dragon Dictation back in the 90s on a Pentium I. Cell phones, even from 10 years ago, had MORE than enough power to do this. This whole idea that "the cloud" needed to get involved in the first place was FUD pushed by companies like Google. I refuse to anything that needs "the cloud" to process any of my data. I'm glad to see that work removing "the cloud" from our lives where it is not needed in under way.
@@nunyabusiness7602
Google was using the stored data to help improve the model.... To begin with, anyway. Then they started just storing it for whatever uses they came up with, which wasn't good.
In fairness, these old versions were kinda trash. They were a fun proof-of-concept but we're not reliable enough to actually use. It was only with neural network technology that a really usable system was created, but these neural networks required more processing then was available in your average smartphone 10 years ago. Newer smartphones are far more capable.
@@adamhixon The SP0256 speech chip was run on 8-bit systems; you may recognise it better as either the Covox Speech Thing or Intellivoice. They ran on 8-bit systems that did not even come close to 1 GHz clock speeds, and the DOS text-to-speech and speech-to-text software (including Dragon Naturally Speaking) ran on PCs as low as a 286. AI and neural networks were not around, and these did quite well. As for phones, it is only recently that phones could compete with the number-crunching power of a 486, which is what makes it useful in real time on a phone.
Very cool!! I knew the april-asr library had potential, but this is WAY BEYOND anything I had considered. Kinda bonkers! Gonna build it when I get home and am hyped to check out the pretrained (THANK YOU!) model. This, THIS right here is why I LOVE the open source community, so A HUGE, MASSIVE THANK YOU TO THE DEV, AND TO THE FOLKS WHO WORKED ON THE MODEL!! AND A MASSIVE THANKS TO LOUIS FOR TELLING US ABOUT THIS PROJECT, AND THIS DEVELOPER PROGRAM!!
😀👍🐧🐧
*Don't Feed Trolls!* Thank You Louis & Alex for *sharing your ability to provide One Liberty.*
An almost 'lost' *possessive power,* what One speaks, truly remains their own. Cheers!
Reading through the github page it uses a pre-trained model that is capable of running locally. This type of research, particularly AI research that uses these types of models, will revolutionize more than just chatbots. Pretrained AI models that can run on local devices may be the tide turning blow to constant surveillance as you no longer need or can justify a constant connection back to some corporate home base.
Thanks for bringing this to my attention.
Essentially, instead of releasing a half-baked product to users and then collecting all the user data in the name of ongoing "real-time" improvements to the product functionality and accuracy, this developer did the hard work first, released a product ready to function on the device, already performing pretty damn well for a project in beta. THIS is how it was supposed to be.
you should promote your FUTO channel more often, sir. I really appreciate your work for the community and for the independence of the end-users
I just wanted to point out that Google speech services do have offline support for a few languages, including English (we still can't guarantee that Google isn't sending the inputs to its servers when the phone is connected to the internet). Go to Settings > System > Languages and input > Voice input (set it to Speech services by Google if it's not set already, then click the settings gear icon) > Add a language (offline).
Android voice to text works in airplane mode and has for quite a while. That's not to say it doesn't still send recordings when it can, of course.
Awesome stuff, Louis. One of my long-term goals is to detach myself from the Google ecosystem. Right now, I'm working on a project that combines quantized AI models, something like Home Assistant, microphone arrays, and speakers to replace the home automation functions of Google Home/Alexa. It's way more expensive than buying these products, but I think in the long run it will be worth it, and with a little TLC, it will be an even better experience.
I remember back in my 486 days Creative Labs had some primitive voice recognition software that shipped with the Sound Blaster 16 ASP. You had to train it word-for-word, but you could control Windows 3.1 with it by saying things like "Edit. Cut." Wasn't terribly useful, but it worked. Years later I tried some dictation software for Windows 98 or XP. It worked, too, and would do a half-decent job of turning what you said into text in a document, although it still needed a fair bit of editing. While more useful, I still found it easier to just type what I wanted. Neither piece of software required a network connection.
And there's the problem with voice-recognition systems: they're a solution to a problem that's already been solved in a better way. For just controlling programs, just clicking your way through menus is better, and for text-entry, you can't beat typing (with a touch-screen swipe-keyboard as an acceptable substitute). Talking is tiring and inaccurate, yet there's this obsession with making talking to computers a thing. I don't want to talk to a dumb machine if there's some other way to do it. It's particularly annoying when some call-centre system that used to just ask you to press a button now asks you to speak to it. If it's not a human, I shouldn't have to _talk_ to it! Give me the damn menu back!
Voice recognition is good for times you can't use your hands. While they're dirty, while you're driving, when you have a disability, etc... So it's not a waste of time.
But in general I agree.
Windows Vista had a voice recognition kit. Was fun to toy with.
Talking is a better experience than using a smartphone onscreen keyboard imho
@@thesenamesaretaken Depends on what you're doing.
If you're going back-and-forth over multiple messages, then yes, in which case you might as well just call the other person.
For a one-off carefully-composed one, I'd still take the on-screen keyboard over talking. You're probably going to be going back and editing it, anyway, so you might as well do it all with the keyboard.
The work you're doing is sooo important! Thank you very much!
the fact this doesn't already exist is wild. great work!! ty!
That is brilliant. Hope we can get a lot of other stuff that takes us OUT of the CLOUD!
This reminds me how the Swipe feature when it was introduced was much more accurate (even with mistakes) than today's version that even when hitting every letter accurately it brings unrelated results and sometimes predictive text somehow can't decipher it either 😢
This is one of the features that have kept google on my phone. Thank you for showing this.
I’ve been watching for a while, and gotta say …
Louis, you look much better today than you’ve looked in the recent past … nice to see you looking so well :-)
This feels like 2006. In a great sense. Innovation and privacy. Linux moved at light speed. New stuff came along. Before centralisation.
Loving it.
I love it! If I may make a suggestion / request, I would love an option that can ingest a pre-recorded sound file and make a transcript, possibly even with time codes that can be turned on/off. This would help me and many others who are hard of hearing immensely.
That's amazing and very well done! I've also been thinking over the last few years that smartphones have gotten so advanced that a lot of tools which required processing by external services such as Google could very well be done locally. This is an excellent example and I'm thoroughly impressed by the result so far!
this is awesome to see. I still use a blackberry because I can't stand to type on a touchscreen, this kind of application would be incredibly useful for me when I inevitably have to ditch a physical keyboard
I use an Anker bluetooth keyboard with my phone.
Blackberry Bold 9780 was probably the best phone I ever owned.
I could type blindly with perfect punctuation (and form perfect grammatically correct sentences when face down in the gutter drunk ;))
(I dare anyone with a touchscreen phone to write an important email blindly, and hit send without looking)
I miss the tactile feedback of the buttons, and I miss being able to know where I am on the keyboard by feel.
Oh, and having SO MANY special characters at your disposal by just clicking a readily available key. Instead of switching to the numbers on my iPhone keyboard, THEN special characters, and still having less at your disposal.
Oh, and having the number keys... Just always.
I got so much done with that phone... Switching to iPhone / Android phones was a horrible experience, and my productivity sank like you wouldn't believe.
(It was like someone shot me in both feet, and then I tried to run a marathon).
Can confirm that LiveCaptions actually runs on the Pinephone Pro. So the CPU requirement isn't too bad, honestly. I would still like to see GPU acceleration. But the project is definitely awesome! Good choice for funding, I think.
I have 3 questions.
1: Where can I find the APK to install this on my phone? I only see the source material but no installation stuff. (I am not a GitHub person; downloading something from the "release" section is sadly the best I can do.)
2: Does this support other languages? Like German, for example.
3: How can I tell the keyboard app I'm currently using to not use the Google service but this third-party app? I use the old Minuum keyboard and I cannot imagine going back to a casual standard 3-row layout with separate keys. I am just too used to having all keys in one row and sloppy typing now. Maybe a swift type keyboard is arguable, but only if the keyboard supports dual swift input (using both thumbs for swift at the same time).
When Big Tech is asked "why not do stuff locally?", the response has always consisted of "not enough resources", that is, "not enough to service big data".
I'm a big fan of tech demos in this style. No flashy stage presentation, just a short video from Louis's Chair
I just built this on Linux. Recognition of Louis' voice is stellar.
I then tried it with a video from Big Clive. Pure comedy gold!
Now I need to pipe the output of this software into a voice synth that outputs along with the video 🤪
Something that is overlooked: I'm not on a contract for my phone because I hardly use it, so if I want to use data it's overpriced like hell. I always prefer applications that just run on my phone so I don't have to pay for expensive data.
He should connect with the people behind Home Assistant, they are also working on this right now, they have dedicated all of 2023 to Voice.
I remember Windows XP came with voice recognition. It was still kinda spotty, but somehow it was possible without the cloud back then.
We mucked around with Dragon Naturally Speaking way back with Windows 9x. I think it was when I had a crappy Cyrix 266 (which was closer to a Pentium 200). Many microcontrollers have more power than that, and basically all phones do.
The limitation is usually the RAM more than the CPU, I think.
Thank you! This, with some polish, is something that should come standard on every phone. Happy to pay if it's ever released.
Google included offline voice-to-text transcription back in the day, at some point between Gingerbread and Jelly Bean. It was not as good as the online transcription, but faster and good enough. Then they decided to take it away. Nowadays you can "prefer" offline, but they keep control and they can decide to send your voice to their servers at any time.
I'm looking forward to this project maturing. And I'd like to know what it is based on. Whisper? Julius?
Please state which STT engine you are using. It is not only polite and respectful to the developers your project depends on, but it would also avoid this sense of false advertisement I've had since I saw your video. Edit: Followed the links. The project is called april-asr. It is an independent engine that used Whisper to create the models. For now only English, Polish and French models are available.
It could be extended to allow the language processing to run on your own server or PC to take the load off the phone.
I have been using AnySoftKeyboard. It's a really great keyboard that gives you control of everything: font, vibrations, sound, haptics, theme, layouts, even whether you want a mic on your keyboard or not. I highly recommend it. Also, it doesn't store any data like other apps do.
Looking forward to the Android version, I'm hoping it will be available on F-Droid.
Hell yeah! this is amazing! thanks for all the good advice and just being such a genuine human. recently got a pixel 6 just based on your security recommendation
THIS IS FREAKING RAD!!!!
THANK YOU ALEX AND LOUIS🙏🫀🙏🫀🙏
You can turn off the saving of voice recordings and still use transcription with Google, and even offline. Just worth being accurate with your information.
I'm using Simple Keyboard on Android because of privacy concerns. It's great to see local and *private* transcription coming. I'll wait a little until it's a bit more polished. I'm looking forward to installing the final app.
I LOVE the new smug thumbnails
Brilliant Louis & FUTO co. 👍
Microsoft's Kinect for both the Xbox 360 and Xbox One had built in speech recognition, which meant it did not need to be connected to the internet to understand voice commands. However, they did later change the way it worked about 2017 or so so that it did require an internet connection, in order to allow the same functionality without a Kinect, but also affecting the Kinect...
I'm not sure about 6:40, but this exact test of voice to text in airplane mode works on my native Pixel keyboard. It doesn't require internet. Maybe they updated it?
depends on the phone and OS version
You're doing incredible things, Louis!
Amazing accuracy! This is well presented and very interesting.
That is SO awesome!! Thank you FUTO!!! That is the kind of thing that could change the future for the better.
This is reinvigorating my desire to develop. FUTO!
Love it!!! Cannot wait to get this on my phone and give another piece of Google software the boot!
This is awesome man. Well done to the developer.
It would be really cool if you could swap out the ML model doing the audio transcription with different ones from Hugging Face, or one you trained yourself.
Wow. Really cool project.
been using it for 2 months.
really freaking good.
English is really, really good. Italian not that much, but I still haven't spent time tweaking or installing updates.
Fantastic work Alex! This would be super useful for me as I don't trust Google's app one millimeter. All we need now is a release date!
To be fair, on my Pixel 4a the Recorder app also does transcripts offline, and you can deactivate the sync. Works pretty well actually.
Are there any products available that seek to overload the data collectors with scrambled and inaccurate data?
The application of this is pretty cool. If personal ChatGPTs/LLMs become the norm, then an app that converts spoken minutes or meeting videos into text for an LLM to process into actions and duties means data that doesn't go to some other corporation to mine and train their AI. Instead it trains the AI of regular people.
I want this so much! I spend 2.5 hours a day on the road and would love to be able to write and drive.
1:35 I thought about this possibility before, but to hear it actually happened. These companies are insane.
I was yelled at by a Karen just yesterday who told me to wear brighter clothes when walking across a crosswalk at dusk... Good advice!
Man this needs to happen for all platforms. Very nice, it would be good to see it translating as you go but it seemed to translate well. I would pay money for this.
This will NEVER happen with Apple, and Google would fight it. That's why (as Louis said) only 1% of people use their phone fully.
Also, paying may be the correct and right thing to do, but once money is involved, things tend to go to shit.
@@TheMrRatzz You can donate. Big difference between 'paying for software' and saying ''thank$ dude.''
brilliant. This might be the first time I will take advantage of that technology.
Wow, that's very impressive - recognizes punctuation and speech patterns with minimal error - locally? That's insane. I've not used the voice recognition feature, but this would certainly make me consider it.
HTC phones of the early 2010s give me such nostalgia. Too bad I didn't get to use ones other than my Wildfire since I was still a kid, so no new phone until this one breaks.
I would definitely ask to have it show progress on the screen once you tap the icon after talking into the mic.
It looked like it was hanging until it finished.
I have been trying to find a solution to this for YEARS. The closest I've gotten is disabling Gboard's access to the internet, but for some reason now my autocorrect suggestions are getting worse every day. Can't wait for this solution!!
@Louis Rossmann The lip sync is off on your video. You may want to check your settings, setup and system resources and see what is going on.
very very very cool program you have there with futo.
Thanks for the video! I look forward to getting this to use on mobile.
This is a very cool project! Hopefully I’ll become a part of it one day once I’ve done more programming work. FOSS ftw!
That IS really cool! If I ever have need of voice to text, I know what I'm downloading.
Is there a built Android version that can be installed via APK or F-Droid?
Or does it need to be built from source?
That second translation is very accurate. (Even with the long sentence at the beginning of it: "and...and...and...")
Usability-wise: maybe it would be good to get a bit more feedback / first results as you talk.
The translated sentences popping up as you go (trailing behind, of course)
Right now it looks like it's only processing when you hit 'stop'.
The notes app will auto save, so as soon as it's printed, your work is safe.
I'm thinking about the memory management on the one hand,
and, from a user-experience perspective, on the other: what if the dictation app crashes?
I think printing to screen asap, to (perhaps) clear out some memory, and save your work as you go...
Logically (but I'm not an Android developer) that would be an easy route to take.
If it crashes, you can pick up where you left off (maybe losing the last 1 or 2 sentences, but that's manageable).
It would also feel a lot faster (even though it might not be AS fast as now).
Plus it would assure the user that 'it is indeed working'.
(If it crashes before anything has printed to screen... You would need to have auto save functionality for the audio file, which seems like a more difficult job)
-
I realize that 'printing to screen as you go, and free up some memory' is a more difficult thing to do than how it looks in this sentence.
But it seems to me that would solve some future problems, and maybe would allow making a public beta version quicker.
-
All in all - cool.
It already works way better than I'd have expected!
Sayboard for Android not only acts as a separate keyboard, but also prints speech to the screen in real-time. It would be great if the Live Captions Android app, which already includes capitalization and punctuation features unlike Sayboard, could somehow adopt said features.
Can't wait for the pixel fold GrapheneOS release!
I can't believe this is real. Thank you so much for everything you do.
Great! And, thank you. Hopefully it'll speed up as it develops.
So, Dragon Point And Speak, but for Android! Awesome, I loved that application!
Live stream transcriptions for peertube servers?
I used to use voice for texting while driving. I think it's for the best that I've stopped using as much Google as possible, and no longer text while driving.
TBH I've never used any sort of STT, not because I was concerned about what the cloud would know of me but because of people AROUND ME knowing it. Not that I have anything to hide (most of the time), but the thought of anyone around me knowing what I'm doing on my phone (directly, rather than by me posting it on Facebook) troubles me, not to mention that it is sort of intrusive to keep hearing people do that. The thought that any background noise or lack of speech clarity would impede it kind of finished the job for me.
niiiiiice ! … irl applicable ideas …
best of luck to everyone involved
Cannot wait for the app to come out
Perfect. When can we expect this app or Android plugin on F-Droid?
later
Love this, thank you so much. Is there anything like this going on for Apple products?
Actually Google already does this for voice typing, just not sure about Assistant (they may do that anyway). Not open source, but they already do it on the keyboard and live transcription apps.
You're like a real-world Tony Stark. Respect.
Awesome. Another step in the right direction.
The only time I have used the voice to text feature is when I push the button by mistake, or if I drop the phone and try to grab it and have the keyboard up.
So.....where do you download it for android?
Great work Mr. Rossmann. God bless you. Quick question: why are lobbyists not exposed as being planet unfriendly? I know you did good work with repairing your own device.
This is exciting. I refuse to get a gugl cell precisely bc it eavesdrops, and I still refuse to download apps that require I give them wall-to-wall permission to do anything they want in order to work. No. Another reason why I sub to your channel. Thanks Rossmann.
Where can we download it?
That is really impressive - even more so than the *nix version shown earlier. What I am waiting for is the version that'll run in Win 7! Pretty please, Alex? :)
Even if it was still on a server, I'd be happy with anything that isn't Google and its constant listening.
I have a question about the Futo fellowship program, is it open to those outside the USA?