Voxel51
  • Videos 257
  • Views 97,213
Computer Vision Meetup: Multiview Scene Graph
Motivated by how humans perceive scenes, we propose the Multiview Scene Graph (MSG) as a general topological scene representation. MSG constructs a place+object graph from unposed RGB images and we provide novel metrics to evaluate the graph quality. We combine visual place recognition and object association to build MSG in one Transformer decoder model. We believe MSG can connect dots across classic vision tasks to promote spatial intelligence and open new doors for topological 3D scene understanding.
Read the paper: arxiv.org/abs/2410.11187
About the Speaker:
Juexiao Zhang is a second-year PhD student in computer science at NYU Courant, advised by Professor Chen Feng. He is interested in l...
Views: 60

Videos

Computer Vision Meetup: Simple & Scalable Approach to Improve Vision Model Robustness to Corruptions
Views: 23 · 4 hours ago
Deep neural networks perform exceptionally on clean images but face significant challenges with corrupted ones. While data augmentation with specific corruptions during training can improve model robustness to those particular distortions, this approach typically degrades performance on both clean images and corruptions not encountered during training. In this talk, we present a novel approach ...
Computer Vision Meetup: CLIP: Insights into Zero-Shot Image Classification with Mutual Knowledge
Views: 94 · 4 hours ago
We interpret CLIP’s zero-shot image classification by examining shared textual concepts learned by its vision and language encoders. We analyze 13 CLIP models across various architectures, sizes, and datasets. The approach highlights a human-friendly way to understand CLIP’s classification decisions. Read the paper: arxiv.org/abs/2410.13016 Fawaz Sammani is a 2nd year PhD student at the Vrije ...
Computer Vision Meetup: Intrinsic Self-Supervision for Data Quality Audits
Views: 27 · 4 hours ago
Benchmark datasets in computer vision often contain issues such as off-topic samples, near-duplicates, and label errors, compromising model evaluation accuracy. This talk will discuss SelfClean, a data-cleaning framework that leverages self-supervised representation learning and distance-based indicators to detect these issues effectively. By framing the task as a ranking or scoring problem, Se...
Computer Vision Meetup: Map It Anywhere: Empowering BEV Map Prediction using Public Datasets
Views: 43 · 9 hours ago
Top-down Bird’s Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more ...
Computer Vision Meetup: Understanding Bias in Large-Scale Visual Datasets
Views: 32 · 9 hours ago
Truly general-purpose vision systems require pre-training on diverse and representative visual datasets. The “dataset classification” experiment reveals that modern large-scale visual datasets are still very biased: neural networks can achieve excellent accuracy in classifying which dataset an image is from. However, the concrete forms of bias among these datasets remain unclear. In this talk, ...
Computer Vision Meetup: No "Zero-Shot" Without Exponential Data
Views: 59 · 9 hours ago
Web-crawled pretraining datasets underlie the impressive “zero-shot” evaluation performance of multimodal models. However, it is unclear how meaningful the notion of “zero-shot” generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during “zero-shot” evaluation. In this work, we ask: How is the p...
Computer Vision Meetup: An AI-Powered Teaching Assistant for Scalable and Adaptive Learning
Views: 51 · 1 day ago
The future of education lies in personalized and scalable solutions, especially in fields like computer engineering where complex concepts often challenge students. This talk introduces Lumina (AI Teaching Assistant), a cutting-edge agentic system designed to revolutionize programming education through its innovative architecture and teaching strategies. Built using OpenAI API, LangChain, RAG, ...
Computer Vision Meetup: Scaling Semantic Segmentation with Blender
Views: 50 · 1 day ago
Generating datasets for semantic segmentation can be time-intensive. Learn how to use Blender’s Python API to create diverse and realistic synthetic data with automated labels, saving time and improving model performance. Preview the topics to be discussed in this Medium post. About the Speaker Vincent Vandenbussche has a PhD in Physics, is an author, and Machine Learning Engineer with 10 years...
Computer Vision Meetup: WACV 2025 - Elderly Action Recognition Challenge
Views: 43 · 1 day ago
Join us for a quick update on the Elderly Action Recognition (EAR) Challenge, part of the Computer Vision for Smalls (CV4Smalls) Workshop at the WACV 2025 conference! This challenge focuses on advancing research in Activity of Daily Living (ADL) recognition, particularly within the elderly population, a domain with profound societal implications. Participants will employ transfer learning techn...
Computer Vision Meetup: Using Machine Vision to Create Sustainable Practices in Fisheries
Views: 66 · 1 day ago
Fishing vessels are on track to generate 10 million hours of video footage annually, creating a massive machine learning operations challenge. At AI.Fish, we are building an end-to-end system enabling non-technical users to harness AI for catch monitoring and classification both on-board and in the cloud. This talk explores our journey in building these approachable systems and working toward a...
Visual AI for Geospatial: Evaluating Earth Observation Foundation Models
Views: 121 · 1 day ago
Geospatial and Earth Observation have benefited from the new advances in computer vision. In this talk we are going to evaluate the accuracy and ease of use of two of these great new models - the Satlas and Clay foundational models. The evaluation will look at distinct areas on the globe. Come see how this gift of foundational models improves your work in geospatial or Earth observat...
Visual AI for Geospatial: Earth Monitoring for Everyone with Earth Index
Views: 184 · 1 day ago
Earth Index is an end-user-focused application that preprocesses global imagery through AI foundation models to enable rapid in-browser search and monitoring. Earth Genome builds Earth Index for critical applications in the environment, and is being used today to report on illegal airstrips built in the Peruvian Amazon, track cattle factory farms across the planet for emissions modeling, and exp...
Visual AI for Geospatial: Is AI Creating a Whole New Earth-Aware Geospatial Stack?
Views: 140 · 1 day ago
The latest wave of AI innovation is profoundly changing many domains. In remote sensing, despite efforts like ours at Clay and others, it has been less so. In this talk we will share our experience as we realize, and explore, whether geoAI represents a whole new stack for working with Earth data. About the Speaker Dr. Bruno Sanchez-Andrade Nuno is the executive director of the non-profit project Clay, an...
The Frontier of VisualAI in Medical Imaging
Views: 194 · 21 days ago
Explore the transformative potential of Visual AI in medical imaging with Daniel Gural, Machine Learning and Developer Relations expert at Voxel51. 🤖🩺 In this forward-looking talk, Daniel dives into the intersection of medicine and AI in 2025, addressing critical industry and research challenges: - Key Industry Challenges: Rising costs, doctor shortages, and time-intensive processes. - Key Rese...
How to Unlock More in Self Driving Datasets
Views: 195 · 21 days ago
Streamlined Retail Product Detection with YOLOv8 and FiftyOne
Views: 163 · 1 month ago
Hands-On with Meta AI's CoTracker3: Parsing and Visualizing Point Tracking Output
Views: 101 · 1 month ago
How We Built CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos
Views: 246 · 1 month ago
The NeurIPS 2024 Preshow: Creating SPIQA: Addressing the Limitations of Existing Datasets for VQA
Views: 479 · 2 months ago
The NeurIPS 2024 Preshow: Zero-Shot Learning: A Misnomer?
Views: 324 · 2 months ago
The NeurIPS 2024: A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis
Views: 144 · 2 months ago
The NeurIPS 2024 Preshow NaturalBench Evaluating Vision Language Model on Natural Adversarial Sample
Views: 227 · 2 months ago
Computer Vision Meetup: Do It Yourself LLMs
Views: 186 · 2 months ago
The NeurIPS 2024 Preshow: A Label is Worth a Thousand Images in Dataset Distillation
Views: 415 · 2 months ago
The NeurIPS 2024 Preshow: What matters when building vision-language models?
Views: 719 · 2 months ago
ECCV 2024: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Views: 162 · 2 months ago
ECCV Redux: Zero-shot Video Anomaly Detection: Leveraging LLMs for Rule-Based Reasoning
Views: 153 · 2 months ago
ECCV 2024 Redux: Day 3 - Closing the Gap Between Satellite & Street View Imagery Generative Models
Views: 116 · 2 months ago
ECCV 2024 Redux: Day 3 - High-Efficiency 3D Scene Compression Using Self-Organizing Gaussians
Views: 438 · 2 months ago

Comments

  • @cyberhard
    @cyberhard 15 hours ago

    Excellent presentation! Thank you.

  • @RimaBegum-v3t
    @RimaBegum-v3t 2 days ago

    Hello sir, How are you? I am a regular viewer of yours. I follow your RUclips channel. I really like your videos. But your videos get very few views. Have you thought about this?

  • @gobajoseph5064
    @gobajoseph5064 3 days ago

    Great! Are these models available for the whole planet? Africa, for example?

  • @redforestx7371
    @redforestx7371 2 months ago

    Amazing work!

  • @redforestx7371
    @redforestx7371 2 months ago

    WOW. This looks so amazing. I can't wait to use this!

  • @adamaustad
    @adamaustad 2 months ago

    Really interesting

  • @sylviaschmitt
    @sylviaschmitt 2 months ago

    Thank you for sharing the video. Does this plugin assume a vector engine like qdrant is used as backend?

  • @sylviaschmitt
    @sylviaschmitt 2 months ago

    Thank you for sharing this video on the Active Learning plugin. Is it possible to use the plugin for multi-class multi-label tasks as well?

  • @sai.sankarwork
    @sai.sankarwork 3 months ago

    When I try the dev install process in a Git Bash terminal, it fails with a package error:

        Collecting shapely>=1.7.1 (from -r requirements\extras.txt (line 7))
          Using cached shapely-2.0.6-cp312-cp312-win_amd64.whl.metadata (7.2 kB)
        ERROR: Could not find a version that satisfies the requirement open3d>=0.16.0 (from versions: none)
        ERROR: No matching distribution found for open3d>=0.16.0

    How can this be solved?

  • @NishantRoy-h4d
    @NishantRoy-h4d 3 months ago

    Very interesting demo; would you mind sharing the Colab link?

  • @deemon101
    @deemon101 4 months ago

    ok, and how does one start it?

  • @AmeeliaK
    @AmeeliaK 4 months ago

    This was very helpful! Llama Index grows so fast, it feels overwhelming for a beginner.

  • @BD_Gaming2013
    @BD_Gaming2013 5 months ago

    !second comment

  • @SergeyPavlov-b4c
    @SergeyPavlov-b4c 5 months ago

    I want to work with my custom dataset. I'd like you to show me how to do it and which benefits I can get using your product. Examples, how can I refine my own data with fiftyone

  • @menghuitan1628
    @menghuitan1628 6 months ago

    Isn't the "Grid Trick" similar to using ControlNet, a type of model for controlling image diffusion models by conditioning the model with an additional input image?

  • @ByTobys
    @ByTobys 6 months ago

    Love your product!

  • @MohitAkhakharia
    @MohitAkhakharia 7 months ago

    How do we execute the plugin logic in code? This doesn't seem to work:

        logging.info("removing approximate duplicates")
        operator_uri = "@jacobmarks/image_deduplication/remove_all_approximate_duplicates"
        params = {
            "sim_choices": "sim",  # You may need to adjust this based on your similarity run key
            "threshold_value": 0.4
        }
        # Create an invocation request
        request = foe.InvocationRequest(operator_uri, params=params)
        # Create an executor and execute the request
        executor = foe.Executor(requests=[request])
        result = executor.trigger(operator_uri, params=params)
        print(result.to_json())
        # logging.info(f"Found approximate duplicates: {result.result}")
        return result

  • @alivirat6926
    @alivirat6926 7 months ago

    The video was great, thanks mate for the explanation.

  • @ashwinkumar5223
    @ashwinkumar5223 8 months ago

    Wonderful 👍

  • @rishiraj2548
    @rishiraj2548 8 months ago

    Great!

  • @HarisonRoberto
    @HarisonRoberto 9 months ago

    Are the search results only online images, or can they be local images?

    • @voxel51
      @voxel51 8 months ago

      you can drag and drop a local image in :)

  • @kai_harm942
    @kai_harm942 9 months ago

    Made that look *way* too easy. I spent a whole hour last night trying to get the first line of code to work! It was because my Python paths were thrown about the place

  • @technologyencroyable
    @technologyencroyable 9 months ago

    How do I build the JS part of the code to generate the umd.js file in the dist folder? I am building with yarn build, but the generated umd file is not working and does not open a new panel. Please help.

    • @voxel51
      @voxel51 9 months ago

      Great question. Try `yarn install` as well. Make sure that the plugin is in your plugins directory. And when you want to change the plugin, make sure you use `yarn dev`. If you have more questions about FiftyOne Plugins, check out the #plugins channel in the FiftyOne community Slack! slack.voxel51.com/

  • @MyJunkEmail
    @MyJunkEmail 10 months ago

    great tutorial, can you use a local instance of SD?

  • @AlainPilon
    @AlainPilon 1 year ago

    It is great that you give us a list of next steps, but a link to each of these points would have been nice!

  • @ZixuWang-ul8hr
    @ZixuWang-ul8hr 1 year ago

    nice job!

  • @aimadnessbot
    @aimadnessbot 1 year ago

    This is good! But i believe the data should also grab eye movement. Eye movement is crucial to map intention and will aid in robot navigation. Apple's headset has the hardware to monitor both eye direction and head direction.

  • @huynhphanngockhang5733
    @huynhphanngockhang5733 1 year ago

    I have an idea for building an autonomous drone that uses computer vision to detect objects previously labeled with a GPS location.

  • @rezamahmoudi163
    @rezamahmoudi163 1 year ago

    please slide share?

  • @SeedmancChitOKun
    @SeedmancChitOKun 1 year ago

    How does it select which images would be kept as "representatives" and which removed?

  • @aldem34
    @aldem34 1 year ago

    I want words like these intitle:"keyword" For better search efficiency for topics

  • @wata1991
    @wata1991 1 year ago

    Is it possible to use this and find the most similar image given user submitted photos? For example I'm trying to do something to detect trading cards, where the input would be photos of cards submitted by users.

  • @tyronetyrone2652
    @tyronetyrone2652 1 year ago

    Hello, I downloaded and installed FiftyOne, but I don’t know how to use it. All your videos didn’t explain how to use it.

    • @ByTobys
      @ByTobys 6 months ago

      There is lots of documentation online on their website, check it out! It's really not difficult to get it running, but it's "only" an API, so some Python experience is definitely helpful to get it running. :)

  • @davidgrayson181
    @davidgrayson181 1 year ago

    I love this

  • @beiddouwang6643
    @beiddouwang6643 1 year ago

    good

  • @omarelsherif010
    @omarelsherif010 1 year ago

    Thanks for clear explanation❤

  • @jasonwell5299
    @jasonwell5299 1 year ago

    Thank you so much bro. Nice tutorial.

  • @divyanshnautiyal8110
    @divyanshnautiyal8110 1 year ago

    getting Not Found

  • @robosergTV
    @robosergTV 1 year ago

    Thanks, great overview

  • @ChrisWiggins1
    @ChrisWiggins1 1 year ago

    Looks promising. I was going through your tutorial, and I was hoping to see how you can import your own database.

  • @ritagislason
    @ritagislason 1 year ago

    🤩 Promo'SM

  • @vanessacrosbyfitzgerald
    @vanessacrosbyfitzgerald 1 year ago

    Can you perform the initial labeling on images that have not been annotated yet? On part 5 and I have not seen that information yet. Did I miss it?

  • @DigiDriftZone
    @DigiDriftZone 1 year ago

    Can you edit/correct or add/remove annotations directly in FiftyOne?

  • @sapsan1234
    @sapsan1234 1 year ago

    I am really excited about this product! Thank you for this hands-on video!

  • @akshayiitk4440
    @akshayiitk4440 2 years ago

    "Wow, this video is incredibly informative and well-produced! The speaker does a fantastic job of explaining the complex topic of speech recognition and the new Whisper model from OpenAI in a way that's easy to understand. Great job, highly recommended to anyone interested in this field!"

  • @magdalenakate6781
    @magdalenakate6781 2 years ago

    splendid 🙂✌️️️!! Find out how your competition ranks better = 'Promosm'!!

  • @AliHamza-ys8dt
    @AliHamza-ys8dt 2 years ago

    how to add our own dataset into FiftyOne. I want to label my own data.

    • @ByTobys
      @ByTobys 6 months ago

      As mentioned in the video, fiftyone isn't a classical annotation tool, but it provides hooks to do that with cvat, labelbox etc and then load the labeled data back into fiftyone. For me the cvat solution worked perfectly fine. Everything is perfectly documented on their website, check it out! :) If you want to load your annotation data which is in your own format, and not in a typical dataformat (COCO,...) you'll have to write a few lines of python codes yourself. For that purpose I have implemented a DatasetHandler-class. You'll have to convert into fiftyone-format by iterating through your data and turn them into fiftyone Detection-Objects: detections.append( fo.Detection(label=my_label, bounding_box=my_bbox) ) Fiftyone doesn't work "out of the box", but it's a great tool for working with CV-Data!
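
The conversion described in the reply above can be sketched roughly as follows. This is a minimal illustration, not the commenter's DatasetHandler class: the record layout, the helper names `to_relative_bbox` and `build_dataset`, the dataset name, and the `ground_truth` field name are all assumptions made for the example. It relies on FiftyOne expecting `bounding_box` as `[top-left-x, top-left-y, width, height]` in coordinates relative to the image size (values in [0, 1]), so pixel boxes need to be normalized first.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical record layout for a custom annotation format:
# (image path, image width, image height, [(label, (x_min, y_min, x_max, y_max)), ...])
Box = Tuple[str, Sequence[float]]
Record = Tuple[str, int, int, List[Box]]


def to_relative_bbox(x_min, y_min, x_max, y_max, img_w, img_h) -> List[float]:
    """Convert absolute pixel corners to relative [x, y, width, height]."""
    return [
        x_min / img_w,
        y_min / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def build_dataset(records: Iterable[Record], name: str = "my-dataset"):
    """Turn custom-format records into a FiftyOne dataset."""
    import fiftyone as fo  # imported lazily so the helper above has no dependencies

    samples = []
    for filepath, img_w, img_h, boxes in records:
        detections = [
            fo.Detection(
                label=label,
                bounding_box=to_relative_bbox(*corners, img_w, img_h),
            )
            for label, corners in boxes
        ]
        sample = fo.Sample(filepath=filepath)
        # "ground_truth" is an arbitrary field name chosen for this sketch
        sample["ground_truth"] = fo.Detections(detections=detections)
        samples.append(sample)

    dataset = fo.Dataset(name)
    dataset.add_samples(samples)
    return dataset
```

With FiftyOne installed, calling `build_dataset(records)` and then `fo.launch_app(dataset)` should open the result in the App for inspection.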