DeepSeek Janus-Pro: DeepSeek's Revolution in Multimodal AI?

  • Published: Jan 30, 2025

Comments • 3

  • @midlivad • 1 day ago • +3

    The DeepSeek Janus-Pro model, like many other AI models, can struggle to differentiate between illustrations and photographs of real animals. This limitation stems from how these models are trained: they learn patterns and associations from massive datasets, often without explicit instruction on the fundamental distinction between representations and reality.
    Here's a breakdown of the potential issues:
    Data Bias: If the training data heavily favors photographs, the model might prioritize features common in photos (e.g., realistic lighting, textures, perspectives) over those specific to illustrations (e.g., stylized features, exaggerated proportions).
    Lack of Explicit Training: Unless the model is specifically trained on datasets that clearly label illustrations versus photographs, it may not develop a robust understanding of the inherent differences between these image types.
    Hallucinations: The inability to distinguish between real and imagined can lead to the generation of "hallucinations" - images or descriptions that combine elements from different sources in ways that don't reflect reality. For example, a model might generate an image of a "photosynthetic tiger" based on its understanding of tigers and photosynthesis, even though such a creature is impossible.
    Addressing this limitation requires:
    Curated Datasets: Training data should include a balanced representation of both photographs and illustrations, with clear labels for each category.
    Fine-tuning: Models can be fine-tuned on specific tasks that emphasize the distinction between real and imagined images (a minimal sketch follows below).
    Meta-learning: Techniques like meta-learning can help models learn to generalize and adapt to new types of images, including those they haven't encountered during training.
    In essence, while Janus-Pro is a powerful multimodal model, it's crucial to acknowledge and address its limitations in understanding the nuances between real and imagined imagery. By incorporating these considerations into the training and development process, we can strive for more robust and reliable AI models that can better navigate the complexities of the visual world.
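
    A minimal sketch of the fine-tuning idea from the list above, assuming a hypothetical folder layout (data/train/photo/ and data/train/illustration/) and a frozen torchvision backbone with a new two-class head. This illustrates the general approach, not Janus-Pro's actual training setup:

    ```python
    # Hedged sketch: linear-probe fine-tuning for photo vs. illustration.
    # The data/train directory layout is an assumption for this example.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats,
                             std=[0.229, 0.224, 0.225]),   # matching the backbone
    ])
    # ImageFolder infers the two labels from subdirectory names
    # (e.g. photo/ and illustration/).
    train_set = datasets.ImageFolder("data/train", transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                        # keep pretrained features fixed
    model.fc = nn.Linear(model.fc.in_features, 2)      # new head: photo vs. illustration

    opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(3):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    ```

    Whether a frozen backbone plus a linear head is enough depends on how separable the two styles already are in the backbone's feature space; unfreezing later layers is the usual next step.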

  • @ppocka-XD • 20 hours ago

    IMO entropy scoring would be the ultimate fakeness estimate. However, detecting subtle entropy gradations could be problematic on systems that smooth them out by design 🤔
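
    For concreteness, a rough sketch of the most naive version of such a score: Shannon entropy of a grayscale pixel histogram (the file name is a placeholder). Real detectors would need far subtler statistics, and, as noted, denoising and compression pipelines smooth away exactly the gradations this measures:

    ```python
    # Hedged sketch: histogram entropy as a crude image-complexity score.
    import numpy as np
    from PIL import Image

    def shannon_entropy(path: str) -> float:
        pixels = np.asarray(Image.open(path).convert("L"))  # grayscale, 0..255
        hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
        p = hist / hist.sum()                               # bin probabilities
        p = p[p > 0]                                        # avoid log2(0)
        return float(-np.sum(p * np.log2(p)))               # bits per pixel

    # Natural photographs often score higher than flat, stylized illustrations.
    print(shannon_entropy("animal.jpg"))                    # placeholder file
    ```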

  • @DustinRodriguez1_0 • 2 days ago

    Does the model fail at distinguishing between illustrations and images of actual animals, similar to how the narrator fails at this basic test of image comprehension? Normally I wouldn't mention it, as most people fail at distinguishing between images and reality, conflating them regularly, but in this situation it's pretty important. Given the way models are trained on imagery, and how they relate that imagery to text, it would be easy to overlook teaching the model that images, illustrations, drawings, paintings, etc., are fundamentally distinct from reality. It might take the model a very long time to determine that such a distinction exists, which would both encourage hallucinations of things that would not be possible and heavily restrict its creative possibilities in the space of images.
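
    One way to make that image-text link concrete: a CLIP-style model can be asked directly whether "photograph" or "illustration" better matches an image, since its training captions often name the medium. The checkpoint and file name below are assumptions for this sketch, not anything tied to Janus-Pro:

    ```python
    # Hedged sketch: zero-shot photo-vs-illustration scoring with CLIP.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("tiger.png")                     # placeholder file
    prompts = ["a photograph of a real animal",
               "an illustration of an animal"]

    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 2)
    for prompt, p in zip(prompts, probs[0]):
        print(f"{prompt}: {float(p):.2f}")
    ```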