How to Use AWS IoT Analytics

AWS IoT Analytics Components and Concepts

Channel
A channel collects data from an MQTT topic and archives the raw, unprocessed messages before publishing the data to a pipeline. You can also send messages to a channel directly with the BatchPutMessage API.
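For instance, a minimal CLI sketch of creating a channel and sending it a message directly (the channel name, message ID, and payload are hypothetical):

    # Create a channel, then send a message to it directly
    aws iotanalytics create-channel --channel-name mychannel

    # The payload below is the base64 encoding of {"temp_01": 29}; AWS CLI v2 expects
    # base64 for binary parameters by default, while CLI v1 expects the raw JSON text.
    aws iotanalytics batch-put-message \
        --channel-name mychannel \
        --messages '[{"messageId": "msg-001", "payload": "eyJ0ZW1wXzAxIjogMjl9"}]'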
Pipeline
A pipeline consumes messages from one or more channels and allows you to process the messages before storing them in a data store. The processing steps, called activities (Pipeline Activities), perform transformations on your messages, such as removing, renaming, or adding message attributes; filtering messages based on attribute values; invoking your Lambda functions on messages for advanced processing; or performing mathematical transformations to normalize device data.
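As an illustration of how activities chain together, a pipeline that reads from a channel, drops an attribute, filters on a value, and writes to a data store might be created from the CLI roughly as follows (all names, the attribute, and the filter expression are hypothetical):

    aws iotanalytics create-pipeline --cli-input-json '{
        "pipelineName": "mypipeline",
        "pipelineActivities": [
            { "channel": { "name": "readFromChannel", "channelName": "mychannel",
                           "next": "dropDebugAttr" } },
            { "removeAttributes": { "name": "dropDebugAttr", "attributes": ["debug_info"],
                                    "next": "keepWarmReadings" } },
            { "filter": { "name": "keepWarmReadings", "filter": "temp_01 > 20",
                          "next": "writeToDatastore" } },
            { "datastore": { "name": "writeToDatastore", "datastoreName": "mydatastore" } }
        ]
    }'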
Data store
Pipelines store their processed messages in a data store. A data store is not a database; it is a scalable and queryable repository of your messages. You can have multiple data stores for messages coming from different devices or locations, or for messages filtered by attribute values, depending on your pipeline configuration and requirements.
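A minimal sketch of creating a data store from the CLI, assuming a hypothetical name and an optional 90-day retention period:

    aws iotanalytics create-datastore \
        --datastore-name mydatastore \
        --retention-period '{"numberOfDays": 90}'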
Data set
You retrieve data from a data store by creating a data set. IoT Analytics allows you to create a SQL data set or a container data set.

After you have a data set, you can explore and gain insights into your data through integration with Amazon QuickSight. Or you can perform more advanced analytical functions through integration with Jupyter Notebooks. Jupyter Notebooks provide powerful data science tools that can perform machine learning and a range of statistical analyses. For more information, see Notebook Templates.

SQL data set
A SQL data set is similar to a materialized view from a SQL database. In fact, you create a SQL data set by applying a SQL action. SQL data sets can be generated automatically on a recurring schedule by specifying a trigger.
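For example, a CLI sketch of a SQL data set that re-runs its query at the top of every hour (the data set name, query, attributes, and data store name are hypothetical):

    aws iotanalytics create-dataset --cli-input-json '{
        "datasetName": "hourly_temps",
        "actions": [
            {
                "actionName": "select_temps",
                "queryAction": {
                    "sqlQuery": "SELECT temp_01, device_id FROM mydatastore"
                }
            }
        ],
        "triggers": [
            { "schedule": { "expression": "cron(0 * * * ? *)" } }
        ]
    }'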
Container data set
A container data set allows you to automatically run your analysis tools and generate results (Automating Your Workflow). It brings together a SQL data set as input, a Docker container with your analysis tools and needed library files, input and output variables, and an optional schedule trigger. The input and output variables tell the executable image where to get the data and store the results. The trigger can run your analysis when a SQL data set finishes creating its content or according to a time schedule expression. A container data set automatically runs your analysis tools, generates the results, and then saves them.
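A rough CLI sketch of a container data set that runs whenever the hypothetical "hourly_temps" SQL data set finishes creating its content; the image URI, execution role ARN, account ID, and variable are placeholder assumptions, not real resources:

    aws iotanalytics create-dataset --cli-input-json '{
        "datasetName": "container_analysis",
        "actions": [
            {
                "actionName": "run_analysis",
                "containerAction": {
                    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-analysis:latest",
                    "executionRoleArn": "arn:aws:iam::123456789012:role/MyIoTAnalyticsRole",
                    "resourceConfiguration": { "computeType": "ACU_1", "volumeSizeInGB": 2 },
                    "variables": [
                        { "name": "outputBucket", "stringValue": "my-results-bucket" }
                    ]
                }
            }
        ],
        "triggers": [
            { "dataset": { "name": "hourly_temps" } }
        ]
    }'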
Trigger
You can automatically create data set contents by specifying a trigger. The trigger can be a time interval (for example, create this data set every two hours) or when another data set’s contents have been created (for example, create this data set when “myOtherDataset” finishes creating its content). Or, you can generate data set content manually by calling CreateDatasetContent.
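For example, to generate content on demand from the CLI and then check on it (the data set name is hypothetical):

    aws iotanalytics create-dataset-content --dataset-name hourly_temps
    aws iotanalytics get-dataset-content --dataset-name hourly_temps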
Docker container
According to www.docker.com, “a container is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings.” You can create your own Docker container to package your analysis tools, or use options provided by Amazon SageMaker. You can store a container in an Amazon ECR registry that you specify so it is available to install on your desired platform. Containerizing A Notebook describes how to containerize a notebook. Docker containers can run your custom analytical code prepared with MATLAB, Octave, Wise.io, SPSS, R, Fortran, Python, Scala, Java, C++, and so on.
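A sketch of building an image and pushing it to Amazon ECR so it is available to the service (the repository name, region, and account ID are hypothetical; older AWS CLI versions use aws ecr get-login instead of get-login-password):

    # Build the image that packages your analysis code
    docker build -t my-analysis .

    # Create an ECR repository and push the image
    aws ecr create-repository --repository-name my-analysis
    aws ecr get-login-password --region us-east-1 | \
        docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    docker tag my-analysis:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-analysis:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-analysis:latest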
Delta windows
Delta windows are a series of user-defined, non-overlapping, and contiguous time intervals. Delta windows allow you to create data set contents with, and perform analysis on, only the new data that has arrived in the data store since the last analysis. You create a delta window by setting the deltaTime in the filters portion of a queryAction of a data set (CreateDataset). Usually, you’ll want to create these data set contents automatically by also setting up a time interval trigger (triggers:schedule:expression). In effect, this lets you filter the messages that arrived during a specific time window, so data from messages in previous time windows is not counted twice. See Example 6 — Creating a SQL dataset with a Delta Window (CLI).
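A CLI sketch of a data set whose query sees only the messages that arrived during the latest scheduled window; the names, the "ts" timestamp attribute, and the -60 second offset (to allow for late-arriving data) are hypothetical:

    aws iotanalytics create-dataset --cli-input-json '{
        "datasetName": "delta_window_temps",
        "actions": [
            {
                "actionName": "select_new_messages",
                "queryAction": {
                    "sqlQuery": "SELECT temp_01, device_id FROM mydatastore",
                    "filters": [
                        {
                            "deltaTime": {
                                "offsetSeconds": -60,
                                "timeExpression": "from_unixtime(ts)"
                            }
                        }
                    ]
                }
            }
        ],
        "triggers": [
            { "schedule": { "expression": "cron(0 * * * ? *)" } }
        ]
    }'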

Accessing AWS IoT Analytics

As part of AWS IoT, AWS IoT Analytics provides the following interfaces to interact with your devices and the data they generate:

AWS Command Line Interface (AWS CLI)
Run commands for AWS IoT Analytics on Windows, macOS, and Linux. These commands allow you to create and manage channels, pipelines, data stores, and data sets. To get started, see the AWS Command Line Interface User Guide. For more information about the commands for AWS IoT Analytics, see iotanalytics in the AWS Command Line Interface Reference.

Important

Use the aws iotanalytics command to interact with AWS IoT Analytics using the CLI. Use the aws iot command to interact with other parts of the IoT system using the CLI.
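For example (ordinary list commands, shown only to illustrate the split between the two command sets):

    # AWS IoT Analytics resources use the iotanalytics command set ...
    aws iotanalytics list-channels
    aws iotanalytics list-datasets

    # ... while core AWS IoT resources (things, rules, certificates) use the iot command set
    aws iot list-things
    aws iot list-topic-rules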

AWS IoT Analytics API
Build your IoT Analytics applications using HTTP or HTTPS requests. These API actions allow you to create and manage channels, pipelines, data stores, and data sets. For more information about the API actions for AWS IoT Analytics, see Actions in the AWS IoT Analytics API Reference.
AWS SDKs
Build your AWS IoT Analytics applications using language-specific APIs. These SDKs wrap the HTTP/HTTPS API and allow you to program in any of the supported languages. For more information, see AWS SDKs and Tools.
AWS IoT Device SDKs
Build applications that run on your devices and send messages to AWS IoT Analytics. For more information, see AWS IoT SDKs.

AWS IoT Analytics Message Payload Restrictions

The field names of message payloads (data) that you send to AWS IoT Analytics:

  • Must contain only alphanumeric characters and underscores (_); no other special characters are allowed.
  • Must begin with an alphabetic character or single underscore (_).
  • Cannot contain hyphens (-).
  • In regular expression terms: "^[A-Za-z_]([A-Za-z0-9]*|[A-Za-z0-9][A-Za-z0-9_]*)$".
  • Cannot be greater than 255 characters.
  • Are case-insensitive. (Fields named “foo” and “FOO” in the same payload are considered duplicates.)

For example, {"temp_01": 29} or {"_temp_01": 29} are valid, but {"temp-01": 29}{"01_temp": 29} or {"__temp_01": 29} are invalid in message payloads.

AWS IoT Analytics Service Limits

API limits

API                   | Limit Description                                                                                                 | Adjustable?
SampleChannelData     | 1 transaction per second per channel                                                                              | yes
CreateDatasetContent  | 1 transaction per second per data set                                                                             | yes
RunPipelineActivity   | 1 transaction per second                                                                                          | yes
other management APIs | 20 transactions per second                                                                                        | yes
BatchPutMessage       | 100,000 messages or 500 MB total message size per second per channel; 100 messages per batch; 128 KB per message | yes; yes; no

Resource limits

Resource                                                              | Limit Description          | Adjustable?
channel                                                               | 50 per account             | yes
data store                                                            | 25 per account             | yes
pipeline                                                              | 100 per account            | yes
activities                                                            | 25 per pipeline            | no
data set                                                              | 100 per account            | yes
minimum data set refresh interval                                     | 15 minutes                 | yes
concurrent data set content generation                                | 2 data sets simultaneously | no
container data sets that can be triggered from a single SQL data set  | 10                         | no
concurrent container data set runs                                    | 20                         | no