🎯 Key points for quick navigation:
00:16 *🧑‍💻 Midscene.js is an open-source JavaScript library that can control web browsers, performing tasks in a human-like manner.*
00:45 *🔄 It automates tasks using natural language and can extract data in JSON format.*
01:40 *🔗 Comes with a Chrome extension for easy integration and supports various large language models (LLMs).*
03:00 *🖼️ The video is sponsored by Photogenius AI, an art generation tool with multiple features.*
03:42 *🛠️ Using Midscene.js requires configuring a model like Gemini 2.0 with an API key in a straightforward interface.*
04:32 *🔍 The action feature lets Midscene.js perform tasks like clicking, while the query feature extracts data.*
05:14 *✅ Assertion capabilities assist in UI testing, verifying elements like button colors and functionality.*
06:46 *🗂️ Midscene.js can output data in structured JSON format, making it useful for web scraping.*
08:19 *📂 For more complex applications, YAML configuration files and npx can be used for automation tasks.*
09:14 *🎯 Midscene.js is an effective tool for UI testing and repetitive tasks, comparable to Claude's computer use option.*
Made with HARPA AI
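The action, query and assertion steps in the key points above can be sketched in TypeScript roughly as follows. This is a minimal sketch under assumptions, not the library's confirmed API: the `BrowserAgent` interface and its method names `aiAction`, `aiQuery` and `aiAssert` are illustrative stand-ins for those three features, and `SearchResult`, `searchAndExtract` and `StubAgent` are hypothetical names. The stub returns canned data so the flow runs without a browser, a model or an API key.

```typescript
// Hypothetical interface: the three methods stand in for the action, query and
// assertion features listed above. The names are assumptions, not a verified API.
interface BrowserAgent {
  aiAction(instruction: string): Promise<void>;
  aiQuery<T>(dataShape: string): Promise<T>;
  aiAssert(condition: string): Promise<void>;
}

// Shape of the data we ask the agent to extract (illustrative only).
interface SearchResult {
  title: string;
  url: string;
}

// Runs one action, one query and one assertion, then returns the extracted data.
async function searchAndExtract(
  agent: BrowserAgent,
  term: string,
): Promise<SearchResult[]> {
  await agent.aiAction(`type "${term}" in the search box and press Enter`);
  const results = await agent.aiQuery<SearchResult[]>(
    "{title: string, url: string}[], the title and link of each search result",
  );
  await agent.aiAssert("the page shows a list of search results");
  return results;
}

// Stub agent that records each call and returns canned data, so the flow
// can be exercised offline.
class StubAgent implements BrowserAgent {
  calls: string[] = [];

  async aiAction(instruction: string): Promise<void> {
    this.calls.push(`action: ${instruction}`);
  }

  async aiQuery<T>(dataShape: string): Promise<T> {
    this.calls.push(`query: ${dataShape}`);
    // Canned sample data, not real extraction output.
    return [{ title: "Example", url: "https://example.com" }] as unknown as T;
  }

  async aiAssert(condition: string): Promise<void> {
    this.calls.push(`assert: ${condition}`);
  }
}
```

Swapping `StubAgent` for a real agent object would leave `searchAndExtract` unchanged, which is the point of keeping the natural-language steps behind a small interface.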
Resist getting hypnotized by watching in 1.5 speed :D
Great episode, I like it! We should see more of these tools coming up this year, it's the foundation of autonomous agents. Thanks for your wonderful work, keep it up!
Cool stuff though I've not really found a practical use for these browser agents yet.
I’ve tried to have another one play an online game & it seems it’s not really able to
Yes we can type the query in a browser plugin or directly in a search engine. Not too much benefit there.
I'm an automation test engineer maintaining and using a complex Cucumber-BDD-Selenium framework for UI automation and testing. Here we can achieve the same with just a YAML file, so this is where I see the biggest practical use.
Bro you can run any digital business on autopilot
Because there is none for the moment, like most of the things this guy promotes in his videos, such as DeepSeek artifacts that write one-page websites, or other useless tools :)) It takes more effort to tell the agent what to do than to search yourself 😅
Perfect, you should have tried the use case on their website, automated testing of web apps.
I have this issue when coding with AIs: testing takes up the majority of my time once the project reaches a certain complexity.
Great vid, Love you brother, peace
This is really great
This comment is really great.
@Koprofile Your reply is really great.
@JoePAcalaughs It's really great that you acknowledge really great replies to really great comments.
Wow thank you!
Very cool! Do you happen to know if you can control it in a live browser programmatically without the plugin? I tried the sample YAML, Puppeteer and Playwright versions, but they run behind the scenes. I wanted to see if it could be used with the latest OpenAI realtime WebRTC to control the browser via voice. Other methods don't have the capabilities of this tool, so it would be awesome if they could be used together.
Thank you for the video. Can we use a local LLM in its workflow?
It’s a good tool, but currently it doesn’t support targeting the elements inside the , which is needed for my current project 😢
Thanks for the great tool, but after installing the extension I can't open its menu when I click on it. I tried different ways but it didn't work for me :(
Very nice tool, but is Ollama support planned in the near future?
"Cannot access a chrome-extension:// URL of different extension
Error: Cannot access a chrome-extension:// URL of different extension"
I get this error message, how do I solve it?
same problem here
You have to be on the Google home page to fix this. It doesn’t work on Chrome's own internal pages like chrome://extensions, chrome://about or chrome://settings.
@TopCuby this works! Thanks a lot!
It's mainly due to conflicts with other extensions injecting or into the page. Try disabling the suspicious plugins and refresh.
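The first fix in this thread (leave Chrome's internal pages and open a normal website) can be expressed as a small pre-check in TypeScript. This is a minimal sketch: `isLikelyControllable` and `RESTRICTED_PROTOCOLS` are hypothetical names, and the check only covers the page-scheme cause described above, not conflicts with other extensions.

```typescript
// URL schemes for browser-internal pages and other extensions' pages,
// which a Chrome extension is generally not allowed to script.
const RESTRICTED_PROTOCOLS = new Set<string>(["chrome:", "chrome-extension:"]);

// Returns true when the tab URL is an ordinary page an extension could act on.
// Unparseable input is treated as not controllable.
function isLikelyControllable(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false;
  }
  return !RESTRICTED_PROTOCOLS.has(parsed.protocol);
}

// Usage: check the active tab's URL before asking the extension to act.
console.log(isLikelyControllable("chrome://settings"));
console.log(isLikelyControllable("https://www.google.com/"));
```

If the check returns false, switching to a regular https page before opening the extension is the workaround the reply describes.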
How can I get the Gemini 2.0 Flash API for free to use in Cline inside VS Code?
I went to Google Studio, but I couldn't generate the API key.
Create a new account and try to get the API key right away.
P.S. I had the same problem, I solved it by creating a new account
@Rom-lu7qx I will try. Thanks
Does it have a real-time vision of the page?
Another Gem!!
gemini*
ooh, i like