dacort - Data Analytics
  • Videos: 41
  • Views: 72,973
Damons Data Lake
A short little video that shows how I gather data from GitHub and YouTube using my custom container framework ( github.com/dacort/cargo-crates/ ).
I run containers in ECS on a scheduled basis, deployed via CDK ( github.com/dacort/damons-data-lake/tree/main/data_containers ) and pipe the output from the container into paths on S3 extracted from the JSON data using a custom tool called Forklift ( github.com/dacort/forklift ).
Hope you enjoy!
00:00 - Intro
01:18 - Cargo Crates
03:51 - Forklift
06:03 - Damons Data Lake
07:32 - Querying in Athena
Views: 255
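
The routing step described above (deriving S3 paths from fields inside the container's JSON output) can be sketched in a few lines of Python. This only illustrates the idea; it is not Forklift's actual implementation or syntax. The function name `route_records`, the `{field}` template format, and the sample records are all assumptions made up for this example.

```python
import json
from collections import defaultdict
from string import Formatter


def route_records(lines, key_template):
    """Group JSON lines by the S3 key rendered from each record's own fields.

    key_template is a hypothetical format such as "raw/{source}/{date}.json".
    """
    # Collect the field names referenced by the template, e.g. ["source", "date"].
    fields = [name for _, name, _, _ in Formatter().parse(key_template) if name]
    routed = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the container output
        record = json.loads(line)
        key = key_template.format(**{field: record[field] for field in fields})
        routed[key].append(line)
    return dict(routed)


# Made-up sample output from a data container (one JSON object per line).
container_output = [
    '{"source": "github", "date": "2023-01-01", "views": 10}',
    '{"source": "youtube", "date": "2023-01-01", "views": 255}',
    '{"source": "github", "date": "2023-01-02", "views": 12}',
]

routed = route_records(container_output, "raw/{source}/{date}.json")
for key, records in sorted(routed.items()):
    print(key, len(records))
```

Each key in `routed` would then become one object to upload to S3; the upload itself is left out to keep the sketch self-contained.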

Videos

Remote Debugging with PyCharm and EMR
Views: 703 · 10 months ago
Shows how to use PyCharm with EMR on EKS and EMR Serverless to interactively debug your Spark applications. 00:00 - Intro 01:16 - Getting Started 02:12 - CDK Deploy Output 03:43 - PyCharm Debugger 05:09 - Building Spark dependencies 06:52 - Running an EMR on EKS job 07:34 - Using the EMR CLI 08:45 - Enabling debugging on your Spark job 12:41 - Debugging EMR Serverless The CDK stack for this vid...
Amazon EMR and S3 Access Grants
Views: 1.2K · 1 year ago
A demo that follows along with the blog post at aws.amazon.com/blogs/big-data/use-amazon-emr-with-s3-access-grants-to-scale-spark-access-to-amazon-s3/ 00:00 - Intro 01:34 - CloudFormation Stack Overview 03:07 - Create S3 Access Grants 05:04 - Create READ and READWRITE grants 06:32 - EMR on EC2 example 07:27 - Diving into data writer role permissions 11:16 - EMR Studio and EMR Serverless 16:50 -...
Generate real-time code suggestions in EMR Studio notebooks
Views: 427 · 1 year ago
See how EMR Studio now integrates with Amazon CodeWhisperer to provide real-time code suggestions for Apache Spark. Docs: docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-codewhisperer.html 00:00 - Intro 00:33 - Create EMR Serverless application 01:19 - Connect EMR Studio to EMR Serverless 02:34 - First CodeWhisperer auto-complete
Amazon EMR - When to use EMR on EC2, EKS, and Serverless
Views: 3.3K · 1 year ago
A high-level overview of the different deployment options for EMR including EMR on EC2, EMR on EKS, and EMR Serverless. 00:00 - Intro 00:30 - EMR on EC2 02:52 - EMR Serverless 05:24 - EMR on EKS
A Tour of the Amazon EMR Console
Views: 408 · 1 year ago
In this video, we take a look at a few common tasks in the new Amazon EMR Console. Redesigned with ease of use and fewer clicks in mind, we hope you enjoy it! 00:00 - Intro 00:18 - Create cluster 02:30 - Cluster summary 03:13 - Add step 04:21 - Instances tab 04:51 - Collapsible notifications 05:18 - Cluster list with expandable rows 06:02 - Search bar usage 07:14 - Events page 07:29 - Block publ...
EMR Serverless Pre-initialized capacity overview
Views: 396 · 1 year ago
It can be a little tough to understand how jobs use pre-initialized capacity in EMR Serverless. This video demonstrates how resources are provisioned, acquired, and refilled during the course of two jobs. 00:00 - Intro 00:35 - Job 1 01:05 - Job 2 01:33 - Refilling the pool 01:52 - Finishing Job 2
Running Spark jobs on Amazon EMR Serverless
Views: 10K · 2 years ago
Get an overview of how to run Apache Spark jobs in EMR Serverless from the AWS Console, CLI, and using Amazon Managed Workflows for Apache Airflow (MWAA). Also see how to use the new CloudWatch Metrics to monitor EMR Serverless usage, Live Dashboard UI, and package your PySpark jobs with virtual environments. Table of Contents: 00:00 - Intro 02:01 - Create application in the console 02:47 - Pre...
Intro to Amazon EMR Toolkit
Views: 2.1K · 2 years ago
See how to install and use the Amazon EMR Toolkit for VS Code. - Browse your EMR on EC2, EMR on EKS, and EMR Serverless resources. - Explore your Glue Data Catalog and view table details. - Create a local PySpark development container based on EMR. - Deploy PySpark jobs to EMR Serverless Table of Contents: 00:00 - Intro 00:40 - Installing the EMR Toolkit 01:22 - EMR Explorer 02:25 - Glue Data C...
Modern Data Lake Storage Layers
Views: 12K · 2 years ago
An overview of Apache Hudi, Apache Iceberg, and Delta Lake. In this video, we talk about the basics of how Hudi, Iceberg, and Delta Lake work. You'll see how to insert, update, and delete data in your data lake and how each of these frameworks works behind the scenes. Blog post: dacort.dev/posts/modern-data-lake-storage-layers/ GitHub Repo with CloudFormation and Notebooks: github.com/dacort/mod...
Amazon EMR Studio - SQL Explorer
Views: 1.3K · 2 years ago
With the new SQL Explorer in Amazon EMR Studio, you can now easily look at your database tables and run SQL queries without having to embed them in code. In this demo, we show how to browse your database and tables and execute queries in the SQL Editor. What's new post: aws.amazon.com/about-aws/whats-new/2022/01/introducing-sql-explorer-in-emr-studio/ Documentation: docs.aws.amazon.com/emr/late...
Amazon EMR Studio - Real-time Collaboration
Views: 1.7K · 2 years ago
Real-time collaboration is a new feature in EMR Studio that allows multiple users to share a single notebook workspace. In this video, we’ll show you how you can both use the same workspace to collaborate on the same notebook. What's New post: aws.amazon.com/about-aws/whats-new/2022/01/real-time-collaborative-notebooks-emr-studio/ Documentation: docs.aws.amazon.com/emr/latest/ManagementGuide/em...
Running Hive and Spark jobs on Amazon EMR Serverless
Views: 6K · 2 years ago
Now in preview, Amazon EMR Serverless allows you to run big data analytics without worrying about infrastructure. In this demo, we show how to instantly run both Spark and Hive jobs on EMR Serverless as well as how to debug the jobs in real-time using logs on Amazon S3 and the Spark History Server and Tez UI. 00:00 - Intro 01:10 - EMR Serverless Overview 02:10 - Running a Spark job 06:43 - Usin...
Getting Started with EMR Studio
Views: 2.1K · 3 years ago
See how to create a new Amazon EMR Studio using IAM Authentication Mode. How to set up an Amazon EMR Studio: docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-set-up.html EMR Studio CloudFormation Templates: github.com/aws-samples/emr-studio-samples EMR Studio IAM Permissions: docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-iam-permissions-table.html Table of Contents: 00:00 -...
Query Athena from EMR Studio
Views: 1.3K · 3 years ago
See how to install pyathena and query Athena from an EMR Studio notebook. Setting up EMR Studio: docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-set-up.html Example notebook: github.com/dacort/demo-code/blob/main/emr/studio/notebooks/emr-studio-athena.ipynb Table of Contents: 00:00 - Intro 00:51 - Install pyathena 01:25 - Query your data! 02:37 - Querying with SparkSQL
What's new with EMR Studio
Views: 376 · 3 years ago
Using IAM Authentication with EMR Studio
Views: 1.1K · 3 years ago
Interactive data processing on Amazon EMR from Amazon SageMaker
Views: 2.4K · 3 years ago
Intro to AWS Analytics
Views: 240 · 3 years ago
Running EMR jobs with Airflow
Views: 8K · 3 years ago
Amazon EMR on EKS Custom Images
Views: 352 · 3 years ago
Intro to data processing on AWS
Views: 941 · 3 years ago
Amazon EMR on EKS - pod templates
Views: 451 · 3 years ago
Amazon EMR Studio pipelines
Views: 729 · 3 years ago
dacort's Data Lab - Docker in EMR Studio
Views: 73 · 3 years ago
EMR on EKS: Pod Templates in 60 seconds
Views: 372 · 3 years ago
EMR on EKS - Optimizing Spark Jobs
Views: 307 · 3 years ago
Amazon EMR Studio - Creating a new Studio Workspace
Views: 5K · 3 years ago
Connecting to Git in Amazon EMR Studio
Views: 1.5K · 3 years ago
Amazon EMR on Amazon EKS Demo - Job Creation
Views: 917 · 3 years ago

Comments

  • @amzar-2024
    @amzar-2024 16 days ago

    where are you storing `pi.py`? thank you

  • @viewermm1588
    @viewermm1588 3 months ago

    Does anyone here know if it is possible to use Spark to select/collect multiple Parquet files from an S3 bucket (all in the "ABC" folder) and combine them into one Parquet file in the "DEF" folder in the same location? If so, what is the code? Thanks.
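
One common way to approach the question above is to read the whole folder into a single DataFrame and write it back out with one partition. A minimal sketch, assuming an existing SparkSession with access to the bucket; the helper name `combine_parquet` and the bucket name in the usage comment are hypothetical.

```python
def combine_parquet(spark, source_path, target_path):
    """Read every Parquet file under source_path and write the rows back as one file.

    `spark` is an existing SparkSession, for example the one provided in an
    EMR notebook or created with SparkSession.builder.getOrCreate().
    """
    df = spark.read.parquet(source_path)
    # coalesce(1) funnels all rows through a single task, so Spark emits a
    # single part file. The default save mode fails if target_path exists.
    df.coalesce(1).write.parquet(target_path)
    return df


# Hypothetical usage (bucket name is a placeholder):
# combine_parquet(spark, "s3://my-bucket/ABC/", "s3://my-bucket/DEF/")
```

Note that Spark writes the target as a directory containing one `part-*.parquet` file rather than a file with a name you choose, and that pushing everything through one task is only practical for modest data volumes.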

  • @LayneSadler
    @LayneSadler 3 months ago

    Isn't it redundant to call out to an Athena cluster from within an EMR cluster? Is the reason for using Athena its integration with Glue's Catalog?

  • @Daveooooooooooo0
    @Daveooooooooooo0 3 months ago

    ❤ thanks ❤

  • @kidslearningscience
    @kidslearningscience 4 months ago

    I am on video two, and it looks like it would be beneficial to provide details on how to create a vcluster.

  • @woliveiras
    @woliveiras 5 months ago

    Amazing job. Thank you!! What is the best way to read these Delta tables now? Data Catalog and then Athena? I would like to see this data in our QuickSight.

  • @tysong9883
    @tysong9883 6 months ago

    Hi Dacort, I can’t finish the “Reopen in Container” step, as after it downloads the docker image, an error pops up saying “yum doesn’t have enough cache data to continue” at step “RUN yum install -y sudo && “Hadoop ALL=(ALL) NOPASSWD:ALL”.. Wondering if you might know any reason behind? Thank you so much!

  • @sval4020
    @sval4020 7 months ago

    When reading "cargo crates", my mind immediately jumped into the rust ecosystem and the cargo package manager :) Great video man! Learning a ton from you especially when it comes to AWS and EMR! Please keep up doing the great work!

  • @mertsevenz
    @mertsevenz 8 months ago

    Hey @dacort, Thanks for the great video. - What about Glue? Can we say that Glue and EMR serverless do more or less the same thing? - Let's say we only have Spark jobs to run based on some triggers. Since it is a transient job, I should run it with EMR serverless. On the other hand, if I need a long-running cluster, I should go with EMR on EC2/EKS. Can I extract the formula like this :)

  • @basrivlogsFinland
    @basrivlogsFinland 8 months ago

    Thanks it was helpful, what about aws athena and aws s3 access grants? I am struggling with a PoC where data is in one aws account and athena in another account and I would like to use aws s3 access grants to manage permissions

  • @elidexterdiaz
    @elidexterdiaz 9 months ago

    I tried to work with notebooks. I successfully attached a Serverless application, but it can't run; it always shows an error when I try to run. The AWS docs say nothing. For me it could be the role, maybe the "interactive role"... what is it? Any advice?

  • @kpicsoffice4246
    @kpicsoffice4246 10 months ago

    You made EMR fun. Good luck with everything.

  • @kpicsoffice4246
    @kpicsoffice4246 10 months ago

    Really nice content. You should do more of such Developer tooling and productivity boosts while working with EMR.

  • @itsyashagrawal
    @itsyashagrawal 11 months ago

    @dacort Thanks for the video. In my case as well, I'm using EmrContainerOperator to submit jobs to an EMR on EKS cluster, which is working fine. Now, to track the cost of each job, I want to assign tags to each job executed in the EMR on EKS virtual cluster. When I assign tags to the EmrContainerOperator, the jobs are executed with the aws_default connection ID, ignoring the aws_conn_id I've provided to my operator; once I remove the tags from the operator, it uses the custom aws_conn_id as expected. Any help here would be greatly appreciated. For your reference:

        emr_spark_submit = EmrContainerOperator(
            task_id="task_id",
            virtual_cluster_id="VIRTUAL_CLUSTER_ID",
            execution_role_arn="emr-on-eks-job-execution-role",
            release_label="emr-6.7.0-latest",
            job_driver=job_driver_arg,
            configuration_overrides=configuration_overrides_arg,
            name="submit_pyspark_job",
            aws_conn_id="emr-on-eks",
            tags={'job':'test'},
            dag=dag,
        )

    FYI: I'm using `from airflow.providers.amazon.aws.operators.emr import EmrContainerOperator` and not `from emr_containers.operators.emr_containers import EMRContainerOperator`.

  • @DE-YASH
    @DE-YASH 1 year ago

    Thanks for the video @dacort. I tried submitting the job to my EKS cluster, but it failed. I'm unable to even check the logs to see why it failed. Any help here would be really appreciated.

    • @dacort
      @dacort 1 year ago

      Hm, was it the job itself that failed? There's a lot of different reasons that could happen, so more details would be useful. :)

    • @DE-YASH
      @DE-YASH 1 year ago

      @dacort The issue was with the permissions granted to the role, which meant it was unable to spin up the driver containers. The issue has now been resolved, thanks :)

    • @dacort
      @dacort 1 year ago

      @DE-YASH Sweet! Glad you got it figured out and sorry it took so long to reply. :)

  • @itzikpaz6796
    @itzikpaz6796 1 year ago

    Hi, thanks for the great video 😊 I am searching for an article or any direction to build a development pipeline on top of emr-eks from local/dev to production.

    • @dacort
      @dacort 1 year ago

      Sorry for the delay - this article here is for CI/CD with EMR Serverless, but could be adapted to EMR on EKS fairly easily.

  • @HenryLiang-z4o
    @HenryLiang-z4o 1 year ago

    The video talks about the advantages of using EMR on EC2 and EMR Serverless, so what is the benefit of using EMR on EKS?

    • @dacort
      @dacort 1 year ago

      EKS (Kubernetes) is great when you want to share your compute/memory resources across different variable workloads. Many orgs are adopting k8s, so EMR on EKS helps make it easier to run EMR workloads (like Spark and Flink) on top of EKS.

    • @nathanbenton2051
      @nathanbenton2051 8 months ago

      Indeed, @dacort. But one of the catches is that without quota or limit thresholds set at the k8s level, it's very easy for various teams/apps to cripple resources in the "emr" namespace for EMR containers. Anyway, great vid and thanks for the content!

  • @BreezyParenting
    @BreezyParenting 1 year ago

    If you use PySpark engine on EMR to read the data from S3, will the resultant dataframe be stored on EMR or S3?

    • @dacort
      @dacort 1 year ago

      Not entirely sure what you mean - when PySpark reads the dataframe it stores that in memory. You decide where to write it back out to.

  • @jenjayhsu1671
    @jenjayhsu1671 1 year ago

    why EMR serverless does not support Flink? and also why EMR on EKS does not support Hive?

    • @dacort
      @dacort 1 year ago

      Each deployment model of EMR has different use-cases and customer bases. In other words, "folks that tend to run a modern k8s environment, also run modern workloads like Spark or Flink, but not Hive."

  • @vicnowo
    @vicnowo 1 year ago

    Q: Would this work with an EMR on EKS cluster?

    • @dacort
      @dacort 1 year ago

      Unfortunately not at this time.

  • @AlDamara-x8j
    @AlDamara-x8j 1 year ago

    Thanks for this great tutorial. Question: After I detach my notebook from a running cluster and stop my workspace (idle), am I going to be charged for anything else? (Of course, assuming I terminated my cluster but still want to keep my workspace in EMR Studio.) Thanks

    • @dacort
      @dacort 1 year ago

      Nope, no additional charge for notebooks/Studio. Just the underlying compute when you have a cluster or EMR serverless application running.

  • @giku1035
    @giku1035 1 year ago

    I like it

  • @panchao
    @panchao 1 year ago

    Thanks for creating this nice tool. In the Glue catalog view, can we also see the partition keys of a table?

    • @dacort
      @dacort 1 year ago

      Unfortunately not yet!

  • @JayanthNaidu-w5e
    @JayanthNaidu-w5e 1 year ago

    Is there a way to install custom Java versions without creating custom images?

    • @dacort
      @dacort 1 year ago

      We now support Java 17 ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-java-runtime.html ). Unfortunately, there isn't another way to use custom Java versions without custom images.

  • @subhomoysikdar
    @subhomoysikdar 1 year ago

    Is there a way to run EMR serverless with GPU? I want to run pyspark jobs with NVIDIA RAPIDS

    • @dacort
      @dacort 1 year ago

      Not as of today. For that you'll still need EMR on EC2 or EMR on EKS.

    • @subhomoysikdar
      @subhomoysikdar 1 year ago

      @dacort Ok. Thank you

  • @llllloolllll
    @llllloolllll 1 year ago

    What happens if the IP addresses have changed? Do we need to recreate the Studio?

  • @disrupcao4674
    @disrupcao4674 1 year ago

    great video

  • @conorflanagan8655
    @conorflanagan8655 1 year ago

    This was extremely helpful, thank you! One suggestion for future videos on EMR: It's sort of implied in this video, but I think it would be helpful to new people to really emphasize that this will NOT WORK if you are signed in as the root user. I imagine there's a long list of reasons why using root user is bad practice, but I also imagine it's very common for people just starting out, and cause many wasted hours. 😅

    • @dacort
      @dacort 1 year ago

      Thanks, Conor! I hadn't run into that before but will definitely keep it in mind!

  • @julsgranados6861
    @julsgranados6861 1 year ago

    Great video!! , Is there any way to run a dbt project using emr serverless?, I have seen that they have the Thrift option to connect to EMR on EC2, but I am not sure if it is possible to connect it to EMR serverless :(

    • @dacort
      @dacort 1 year ago

      Unfortunately not as of today. :(

  • @julsgranados6861
    @julsgranados6861 1 year ago

    Is there any way to run a dbt project using emr serverless?, I have seen that they have the Thrift option to connect to EMR on EC2, but I am not sure if it is possible to connect it to EMR serverless :(

    • @dacort
      @dacort 1 year ago

      Not as of today. :(

  • @dishantshah4296
    @dishantshah4296 1 year ago

    Hello, I am getting the following error while retrieving databases

  • @adishjain7351
    @adishjain7351 1 year ago

    Hi, I am getting USER_ERROR every time I use a custom image. Any solution for that?

    • @dacort
      @dacort 1 year ago

      Hi Adish - tough to say without more details. We do have a tool to help validate that image, perhaps that'll help? github.com/awslabs/amazon-emr-on-eks-custom-image-cli

  • @Major.Tom.1973
    @Major.Tom.1973 1 year ago

    Do any of these have a "vacuum" equivalent, or how do you do housekeeping / maintenance on these incremental data lakes?

    • @dacort
      @dacort 1 year ago

      Both Hudi and Iceberg have "maintenance" operations you can run, including compaction. See the docs for Iceberg ( iceberg.apache.org/docs/1.2.0/maintenance/#compact-data-files ) and Hudi ( hudi.apache.org/docs/compaction/ ).

  • @VikasGK
    @VikasGK 1 year ago

    That's an excellent demonstration

  • @ManishBhandari-df2xf
    @ManishBhandari-df2xf 1 year ago

    Hi Great video - can you please also show steps on how to install external libraries on EMR - bootstrap script replacement?

    • @dacort
      @dacort 1 year ago

      Assuming you're talking about EMR Serverless, there's a couple different options. You can use custom images ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-custom-image.html ) to install OS-level dependencies. If you're just talking about PySpark dependencies you can also bundle a virtual environment ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-python-libraries.html ).

    • @srirajvasireddy2615
      @srirajvasireddy2615 10 months ago

      For pyspark dependencies like pandas or kafka. How to bundle a virtual environment? New to python, any help or suggestions are greatly appreciated.
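
For the virtual environment question above, the approach described in the linked EMR Serverless guide is to pack a virtual environment into an archive, upload it to S3, and point Spark at it through `sparkSubmitParameters`. The sketch below builds that parameter string. The helper name `venv_spark_submit_parameters` and the bucket path are assumptions for illustration, and the packing commands in the comments should be run in an environment compatible with the EMR release you target.

```python
# Packing steps (shell), shown as comments for context:
#   python3 -m venv pyspark_venv
#   source pyspark_venv/bin/activate
#   pip3 install pandas venv-pack
#   venv-pack -f -o pyspark_venv.tar.gz
#   aws s3 cp pyspark_venv.tar.gz s3://my-bucket/artifacts/   # placeholder bucket


def venv_spark_submit_parameters(archive_s3_uri, alias="environment"):
    """Build sparkSubmitParameters that make Spark use a packed virtual environment.

    The archive is unpacked under ./<alias> on the driver and executors, and
    the PYSPARK_* variables point Spark at the Python interpreter inside it.
    """
    python_path = f"./{alias}/bin/python"
    confs = {
        "spark.archives": f"{archive_s3_uri}#{alias}",
        "spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON": python_path,
        "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())


params = venv_spark_submit_parameters("s3://my-bucket/artifacts/pyspark_venv.tar.gz")
print(params)
```

The resulting string goes into the `sparkSubmitParameters` field of the job driver when you start the job run.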

  • @MrWakewater
    @MrWakewater 1 year ago

    Can you cover the setup in more detail? Specifically the IAM roles, EMR service role, etc.?

  • @sahilbhatia2014
    @sahilbhatia2014 2 years ago

    Hi Dacort. Thank you for the video, but somehow I am unable to connect this with my AWS account, as we use SSO (Single Sign-On) authentication. However, I am able to connect via the AWS Toolkit using the 'aws configure sso' command. Could you please help with what I can do in this case?

  • @kingsleywen3889
    @kingsleywen3889 2 years ago

    Amazing. Could you do a tutorial about using step function with EMR Serverless? Thanks.

    • @dacort
      @dacort 2 years ago

      EMR Serverless is not natively supported with Step Functions today, but there is a way to do it using Lambda functions. We have a blog post about it here, if it's helpful! aws.amazon.com/blogs/big-data/run-a-data-processing-job-on-amazon-emr-serverless-with-aws-step-functions/
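
The Lambda-based approach mentioned in the reply above could look roughly like the following. This is a sketch, not the code from the blog post: the event field names (`application_id`, `execution_role_arn`, `entry_point`, `arguments`) are assumptions chosen for this example, while `start_job_run` is the boto3 call for the `emr-serverless` client.

```python
def build_job_request(event):
    """Translate a Step Functions input event into start_job_run arguments.

    The event keys used here are hypothetical; adapt them to your state machine.
    """
    return {
        "applicationId": event["application_id"],
        "executionRoleArn": event["execution_role_arn"],
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": event["entry_point"],
                "entryPointArguments": event.get("arguments", []),
            }
        },
    }


def lambda_handler(event, context):
    """Start an EMR Serverless job and return its identifiers to the state machine."""
    import boto3  # imported here so the request builder stays testable without AWS

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**build_job_request(event))
    return {
        "application_id": response["applicationId"],
        "job_run_id": response["jobRunId"],
    }


example_request = build_job_request(
    {
        "application_id": "00example",
        "execution_role_arn": "arn:aws:iam::123456789012:role/example-role",
        "entry_point": "s3://my-bucket/code/job.py",
    }
)
print(example_request["jobDriver"]["sparkSubmit"]["entryPoint"])
```

A state machine would typically follow this with a wait state and a second function that polls the job state until it reaches a terminal status.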

  • @ritesh5282
    @ritesh5282 2 years ago

    nice work

  • @NM-jq3sv
    @NM-jq3sv 2 years ago

    great tutorial

  • @atoz12325
    @atoz12325 2 years ago

    Thanks for the video, Dacort. Please share how to connect to my own S3 bucket from the Docker container. I tried to add AWS keys to the Dockerfile as ARG, but it did not work after rebuilding the container. Please advise.

  • @bariowd
    @bariowd 2 years ago

    Amazing video do you know if there is any chance to send parameters from airflow DAG to the called notebook? For example the DAG receives a random date&&number then when you trigger the DAG it send those parameters to the notebook. Thank you! :)

    • @dacort
      @dacort 2 years ago

      I didn't use notebooks in this video, but the EMR StartNotebookExecution API allows you to pass parameters to notebook runs. We have a blog post about that here: aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-on-amazon-emr-notebooks-using-amazon-mwaa/
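
Passing a date and a number to a notebook run, as asked above, might be sketched like this with boto3. The helper names and all IDs below are placeholders invented for this example; `start_notebook_execution` and its `NotebookParams` argument (a JSON string) come from the boto3 EMR client.

```python
import json


def build_notebook_execution_request(editor_id, notebook_path, cluster_id,
                                     service_role, params):
    """Build arguments for the EMR StartNotebookExecution API.

    `params` is a plain dict; the API expects it serialized as a JSON string.
    """
    return {
        "EditorId": editor_id,
        "RelativePath": notebook_path,
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        "NotebookParams": json.dumps(params),
    }


def start_notebook(request):
    """Submit the request and return the notebook execution ID."""
    import boto3  # imported lazily so the builder can be tested offline

    emr = boto3.client("emr")
    return emr.start_notebook_execution(**request)["NotebookExecutionId"]


# Placeholder identifiers; in an Airflow DAG the params could come from the
# values supplied when the DAG run is triggered.
request = build_notebook_execution_request(
    editor_id="e-EXAMPLE",
    notebook_path="reports/daily.ipynb",
    cluster_id="j-EXAMPLE",
    service_role="EMR_Notebooks_DefaultRole",
    params={"run_date": "2022-06-01", "sample_size": 42},
)
print(request["NotebookParams"])
```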

  • @arjunshah6594
    @arjunshah6594 2 years ago

    This is great, Damon. Just a quick question: does this need a connection to AWS similar to the AWS Toolkit in Visual Studio? Somehow, even with a local AWS profile and the necessary permissions, the explorers do not load.

    • @dacort
      @dacort 2 years ago

      It's not quite as robust as the AWS Toolkit authentication. It just uses the default profile from your environment. Do you get an error message or is there just nothing showing up? As long as you can run "aws emr list-clusters" or "aws emr-serverless list-applications" from your terminal, it should work in VS Code. There is also an "EMR: Select AWS Region" command to change regions.

    • @arjunshah6594
      @arjunshah6594 2 years ago

      @dacort That's great. The terminal commands worked, but the explorer would not load. Running "EMR: Select AWS Region" solved the issue, and now I can see all the explorers populated. Thank you so much. This is great :)

    • @dacort
      @dacort 2 years ago

      @arjunshah6594 WOO HOO! Awesome. :) Thanks for giving it a try!

  • @AnGELsPearhead
    @AnGELsPearhead 2 years ago

    Amazing Demo!!!

  • @zeal0502
    @zeal0502 2 years ago

    Thanks for the concise demo!

  • @srikanthjaggari4610
    @srikanthjaggari4610 2 years ago

    This is a wonderful demo video and very helpful. We want to create and submit jobs from either Terraform or open-source Airflow, but Terraform supports only application creation, whereas Airflow supports it from V5. Could you please share a list of ways to create and run jobs?

  • @anirudhvyas6069
    @anirudhvyas6069 2 years ago

    Can we customize pods other than driver and executor pods? I wish to mount files that need to be available on job-runner container

    • @anirudhvyas6069
      @anirudhvyas6069 2 years ago

      Some context - job runner container is the one that spins up driver pod

  • @Madurai2USA
    @Madurai2USA 2 years ago

    Have you configured EMR Studio for EMR on EKS? Please share the CloudFormation stack info and a YouTube video. Thanks!

    • @dacort
      @dacort 2 years ago

      There's a repository here that should help you get everything deployed you need: github.com/aws-samples/amazon-emr-on-eks-emr-studio

  • @anjim7877
    @anjim7877 2 years ago

    Amazing job, Thank you Dacort

  • @Madurai2USA
    @Madurai2USA 2 years ago

    I am trying to create Endpoint for EKS cluster to attach with EMR Studio. Endpoint requires Certificate. I have no clue how should I create certificate. I tried with certificate manager to create public certificate but not sure what is the domain I should provide there. Could you please explain about it ?

    • @dacort
      @dacort 2 years ago

      You can use a wildcard domain like *.ec2.internal and a self-signed certificate. The instructions here specify how to do so: docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-encryption-certificates Once that certificate is created, you can import it into ACM using a command like this:

          aws acm import-certificate --certificate fileb://trustedCertificates.pem --private-key fileb://privateKey.pem

      There's also a CDK example here that might be useful: github.com/aws-samples/aws-cdk-for-emr-on-eks