
- Support for Spark streaming ETL jobs
- Support for managing the Data Catalog after running an ETL job
- Support for the Apache Avro data format as input and output in AWS Glue
- Support for the EMRFS S3-optimized committer for writing Parquet data to Amazon S3
- Support for managing machine learning transforms with AWS resource tags
- Support for non-overridable job arguments
- Support for new transforms to work with datasets in Amazon S3
- Support for reading from MongoDB and Amazon DocumentDB
AWS Glue jobs, based on Apache Spark, can now also run continuously and consume data from streaming platforms such as Amazon Kinesis Data Streams and Apache Kafka (including the fully managed Amazon MSK).
Source: https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
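As a rough illustration of the streaming capability above, here is a minimal sketch of a Glue streaming ETL script that consumes micro-batches from a Kinesis-backed Data Catalog table and writes Parquet to S3. The database, table, and S3 paths (`streaming_db`, `kinesis_events`, `s3://my-bucket/...`) are hypothetical placeholders, and the script only runs inside the AWS Glue job environment, where the `awsglue` library is available.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Standard Glue job boilerplate: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the stream as a Spark streaming DataFrame via a Data Catalog table
# that points at a Kinesis data stream (names are hypothetical).
source = glue_context.create_data_frame.from_catalog(
    database="streaming_db",
    table_name="kinesis_events",
    additional_options={"startingPosition": "TRIM_HORIZON"},
)

def process_batch(data_frame, batch_id):
    # Each micro-batch arrives as a regular Spark DataFrame;
    # append it to S3 as Parquet.
    if data_frame.count() > 0:
        data_frame.write.mode("append").parquet(
            "s3://my-bucket/streaming-output/"
        )

# Process micro-batches on a fixed window, checkpointing progress to S3
# so the job can resume after a restart.
glue_context.forEachBatch(
    frame=source,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://my-bucket/checkpoints/",
    },
)
job.commit()
```

The checkpoint location is what lets a continuously running job pick up where it left off, which is the key difference from a conventional batch ETL run.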
#ETL #DataStreaming #AWS #AWSGlue #DataAnalysis #Cloud