Midscene: This FULLY FREE AI Agent can CONTROL BROWSERS & DO ANYTHING!

  • Published: Jan 6, 2025

Comments • 38

  • @ytpah9823 4 days ago +15

    🎯 Key points for quick navigation:
    00:16 *🧑‍💻 Midscene.js is an open-source JavaScript library that can control web browsers, performing tasks in a human-like manner.*
    00:45 *🔄 It automates tasks using natural language and can extract data in JSON format.*
    01:40 *🔗 Comes with a Chrome extension for easy integration and supports various large language models (LLMs).*
    03:00 *🖼️ The video is sponsored by Photogenius AI, an art generation tool with multiple features.*
    03:42 *🛠️ Using Midscene requires configuring a model like Gemini 2.0 with an API key in a straightforward interface.*
    04:32 *🔍 The action feature allows Midscene to perform tasks like clicking and querying to extract data.*
    05:14 *✅ Assertion capabilities assist in UI testing, verifying elements like button colors and functionality.*
    06:46 *🗂️ Midscene can output data in structured JSON format, making it useful for web scraping.*
    08:19 *📂 For more complex applications, YAML configuration files and npx can be used for automation tasks.*
    09:14 *🎯 Midscene.js is an effective tool for UI testing and repetitive tasks, comparable to Claude's computer use option.*
    Made with HARPA AI

  • @zaxadim 4 days ago +8

    Resist getting hypnotized by watching in 1.5 speed :D

  • @ewm5487 4 days ago +4

    Great episode, I like it! We should see more of these tools coming up this year, it's the foundation of autonomous agents. Thanks for your wonderful work, keep it up!

  • @matthewblott 4 days ago +20

    Cool stuff, though I've not really found a practical use for these browser agents yet.

    • @___Truth___ 4 days ago +1

      I’ve tried to have another one play an online game, and it seems it’s not really able to.

    • @PeterJung-cx1ib 4 days ago

      Yes we can type the query in a browser plugin or directly in a search engine. Not too much benefit there.

    • @amarjeet2162 4 days ago +5

      I'm an automation test engineer maintaining and using a complex Cucumber-BDD-Selenium framework for UI automation and testing, and here we can achieve the same with just a YAML file. That is the biggest practical use I see here.

    • @juancortes8617 4 days ago +2

      Bro, you can run any digital business on autopilot.

    • @ionut2671 1 day ago

      Because there is none for the moment, like most of the things this guy promotes in his videos, such as DeepSeek artifacts that write one-page websites, or other useless tools :)) It takes longer to write out what the agent should do than to search yourself 😅

  • @TheReferrer72 4 days ago +3

    Perfect, you should have tried the use case on their website: automated testing of web apps.
    I have this issue when coding with AIs: testing takes up the majority of my time once the project reaches a certain complexity.

  • @SiliconSouthShow 2 days ago

    Great vid, Love you brother, peace

  • @shugan9245 4 days ago +3

    This is really great

    • @Koprofile 4 days ago +2

      This comment is really great.

    • @JoePAcalaughs 4 days ago +2

      @Koprofile Your reply is really great.

    • @sprinteroptions9490 3 days ago +1

      @JoePAcalaughs It's really great that you acknowledge really great replies to really great comments.

  • @carryuindonesia1638 3 days ago

    Wow thank you!

  • @mikew2883 4 days ago

    Very cool! Do you happen to know if you can control it in a live browser programmatically without the plugin? I tried the sample YAML, Puppeteer and Playwright versions, but they run behind the scenes. I wanted to see if it could be used with the latest OpenAI realtime WebRTC to control the browser via voice. Other methods don't have the capabilities of this tool, so it would be awesome if they could be used together.

  • @deltarestherogue5123 3 days ago

    Thank you for the video. Can we use a local LLM in its workflow?

  • @benjaminng8882 4 days ago +1

    It’s a good tool, but currently it doesn’t support targeting the elements inside the , which is needed for my current project 😢

  • @Rom-lu7qx 4 days ago +1

    Thanks for the great tool, but after installing the extension I can't open its menu when I click on it. I tried different ways, but nothing worked for me :(

  • @alexk8541 4 days ago

    Very nice tool, but is Ollama support planned in the near future?

  • @sercanba3432 4 days ago +1

    "Cannot access a chrome-extension:// URL of different extension
    Error: Cannot access a chrome-extension:// URL of different extension"
    I get this error message, how do I solve it?

    • @martinvarga7211 4 days ago

      Same problem here.

    • @TopCuby 4 days ago +1

      You have to be on the Google home page to fix this. It doesn’t work on Chrome's own built-in pages such as chrome://extensions, chrome://about or chrome://settings.

    • @martinvarga7211 4 days ago

      @TopCuby This works! Thanks a lot!

    • @bablooze9439 3 days ago

      It's mainly due to conflicts with other extensions injecting or into the page. Try disabling the suspicious plugins and refresh.
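
A minimal sketch of the check described in this thread, assuming the error comes from opening the extension on a browser-internal page. The helper name `isRestrictedPage` and the scheme list are hypothetical illustrations taken from the URLs mentioned above; they are not part of Midscene or of any Chrome API.

```typescript
// Hypothetical helper (not a Midscene or Chrome API): guesses whether a tab URL
// is a browser-internal page where an extension is likely to be blocked.
// The scheme list is an assumption based only on the URLs named in this thread.
const RESTRICTED_SCHEMES: string[] = ["chrome:", "chrome-extension:"];

function isRestrictedPage(url: string): boolean {
  const normalized = url.trim().toLowerCase();
  // A simple prefix match also catches the shorthand form "chrome:settings".
  return RESTRICTED_SCHEMES.some((scheme) => normalized.startsWith(scheme));
}

console.log(isRestrictedPage("chrome://extensions")); // true
console.log(isRestrictedPage("https://www.google.com")); // false
```

If the check returns true, switching to an ordinary web page before opening the extension matches the fix that worked for the commenters above.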

  • @Armagedom666 4 days ago

    How can I get the Gemini 2.0 Flash API for free to use in Cline inside VS Code?
    I went to Google Studio, but I couldn't generate the API key.

    • @Rom-lu7qx 4 days ago

      Create a new account and try to get the API key right away.
      P.S. I had the same problem and solved it by creating a new account.

    • @Armagedom666 4 days ago

      @Rom-lu7qx I will try. Thanks.

  • @TheRealUsername 4 days ago

    Does it have a real-time vision of the page?

  • @chadpogs7973 4 days ago +2

    Another Gem!!

  • @ctwolf 4 days ago

    ooh, i like