Essential and Highly Recommended! 👍👍
This is strange: in Cubase Pro 12 we were able to separate drums, instruments, and vocals, but in Pro 13, SpectraLayers 10 is limited to separating vocals only. Should we revert to the previous version with the full feature set? There has been no answer from Steinberg regarding this, which is kind of a disappointment with Pro 13.
Bravo!
Incredible development, bravo!
Will this or any other de-mixing technology work for MONO recordings?
Well, I have the full version installed, but it only separates the vocals, no instruments. Any idea why it won't separate the instruments and only does vocals?
WHY IS THERE NO SPECTRALAYERS ONE 10? 😔
It will be bundled with Cubase 13, whenever that arrives.
Hey, in Cubase 13 it's not the full version?
Can I create a Nuendo project from the unmixed tracks/layers? Thank you
Can anyone recommend software that separates a recorded drum kit into its individual elements for further processing?
Now we need an AI generator to correct the artefacts, like TOPAZ does for video, and we're all set :)
ChatGPT: Producing video content using AI is often considered easier than generating high-quality audio. There are a few reasons for this...
1. Data Availability: There is a wealth of publicly available visual data on the internet, such as images and videos, which can be used to train AI models for video generation. Large-scale datasets like ImageNet and platforms like YouTube provide a vast amount of labeled visual content that can be leveraged for training video-related AI models. In contrast, obtaining large-scale, high-quality, and diverse audio datasets is more challenging.
2. Complexity of Audio: Audio data is inherently more complex and nuanced than visual data. Generating realistic and high-quality audio requires capturing various aspects such as pitch, rhythm, timbre, and subtle details. These elements are more challenging to model accurately using current AI techniques. In contrast, although video synthesis also requires capturing complexity, visual information is more readily represented using pixels, making it somewhat easier to generate visually plausible content.
3. Human Perception: Humans have a higher tolerance for visual imperfections compared to audio. Our visual system is relatively forgiving when it comes to small visual artifacts or minor deviations from reality, allowing AI-generated videos to be visually appealing even if they are not entirely realistic. On the other hand, human hearing is more sensitive to audio inconsistencies, and even slight deviations or artifacts can be quickly noticed, making it more challenging for AI to generate high-quality and convincing audio.
4. Research Focus: The field of AI has seen significant advancements and research focus in computer vision tasks, such as image recognition, object detection, and video understanding. This increased attention and progress in visual tasks have contributed to the development of more advanced techniques and models for generating video content using AI. In contrast, while there has been progress in audio-related AI research, it has not received the same level of attention and resources as visual tasks.
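On the separation topic the thread keeps coming back to (splitting drums from sustained instruments): long before the AI-based unmixing in SpectraLayers, a classic signal-processing approach was harmonic/percussive separation via median filtering (Fitzgerald-style HPSS): harmonic energy forms horizontal ridges in a spectrogram (smooth along time), percussive hits form vertical ridges (smooth along frequency). Below is a minimal sketch of that idea on a toy spectrogram, assuming NumPy is available; the function names and the synthetic data are illustrative only, not any product's API.

```python
import numpy as np

def median_filter_1d(x, size):
    # Simple 1-D median filter with edge padding.
    pad = size // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + size]) for i in range(len(x))])

def hpss_masks(S, size=5):
    """Split a magnitude spectrogram S (freq x time) into harmonic and
    percussive estimates via median filtering (Fitzgerald-style HPSS)."""
    # Harmonic content is smooth along time: filter each frequency row.
    H = np.apply_along_axis(median_filter_1d, 1, S, size)
    # Percussive content is smooth along frequency: filter each time column.
    P = np.apply_along_axis(median_filter_1d, 0, S, size)
    # Turn the two estimates into soft masks and apply them to S.
    total = H + P + 1e-12
    return S * (H / total), S * (P / total)

# Toy spectrogram: a steady tone (horizontal ridge) plus a drum hit (vertical ridge).
S = np.zeros((32, 32))
S[10, :] = 1.0   # sustained harmonic component at one frequency bin
S[:, 20] = 1.0   # broadband percussive transient at one time frame
Sh, Sp = hpss_masks(S)
# The tone ends up almost entirely in Sh, the hit almost entirely in Sp.
```

A real separator would run this on STFT magnitudes and resynthesize with the original phase; modern tools replace the median filters with learned neural masks, which is why they can also pull vocals out, not just harmonic vs. percussive.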