We are thrilled to announce that the public preview of streaming dataflows in Power BI is now available for everyone to try out.
Back at MBAS, we provided a sneak peek of streaming dataflows. Since then, we have worked with dozens of customers around the world who use streaming and real-time data for their analytics, tweaking and improving this new capability before its public availability.
Power BI was a pioneer in easy-to-use real-time data visualization when we launched streaming datasets back in 2015. Since then, customers have been integrating Power BI with other Microsoft products and services to visualize their data the moment it arrives.
And now, with streaming dataflows, we are taking real-time to the next level in Power BI. We are removing most of the previous restrictions on data limits and visualization types, while adding key analytics capabilities such as streaming data prep and no-code authoring. The end goal is to let more users tap into the potential of streaming and real-time data.
Arun Ulag couldn’t have said it better back in May for the announcement as part of BUILD: “Customers want to work with data as it comes in, and not days, or weeks later. Our vision is simple — the distinctions between batch, real-time, and streaming data today will disappear. Customers should be able to work with all data as soon as it is available.”
OK, so what are streaming dataflows again?
Similar to their dataflows sibling, streaming dataflows let authors connect to, ingest, mash up, model, and build reports from continuous, near real-time streaming data. All of this happens directly in the Power BI service through beautiful, drag-and-drop, no-code experiences.
Users can mix and match streaming data with batch data if they need to, through a friendly UI that includes a diagram view for easy data mashup. The final artifact is a dataflow, which can be consumed in real time to create highly interactive, near real-time reports. All of Power BI’s rich data visualization capabilities work with streaming data just as they do with batch data today.
To get started, you will need a Premium workspace (either Premium capacity or Premium Per User) without any dataflows in it. Then all you have to do is click the “+New” dropdown, select streaming dataflow, give it a cool name, and you’re good to go.
Once you land in the streaming dataflow UI, you will find four main areas where authoring happens:
- Ribbon: In the ribbon, you will see different sections that follow the order of a “classic” analytics process: inputs (also known as data sources), transformations (streaming ETL operations), outputs, and finally, a button to Save your progress.
- Diagram view: a graphical representation of your dataflow, from inputs to operations to outputs.
- Side pane: depending on which component you have selected in the diagram view, you will have settings to modify each input, transformation, or output.
- Data preview / Authoring errors / Runtime errors: for each card shown, the data preview displays the results of that step (live for inputs and on demand for transformations and outputs). This section also summarizes any authoring errors or warnings in your dataflow; clicking an error or warning selects the corresponding transform. Lastly, once the dataflow is running, you will have access to runtime errors, such as dropped messages. You can minimize this section at any time by clicking the arrow in the top-right corner of the preview pane.
Now it is time to start bringing in some streaming data. As of today, streaming dataflows support two inputs: Azure Event Hubs and Azure IoT Hub. For both, you will need a connection string (the hub’s “credentials”, so to speak). Add the input, paste the connection string, and that is it.
After adding an input, you can ask streaming dataflows to auto-detect the schema of the incoming data. You can also watch the data arrive live in the data preview pane.
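If your hub is not receiving data yet, you can generate test events yourself. The sketch below pushes JSON readings into an Event Hub; the `make_reading` schema, device names, and connection-string placeholder are all hypothetical illustrations, and the send helper assumes the `azure-eventhub` Python package is installed.

```python
import json
import random
import time

def make_reading(device_id: str) -> str:
    """Build one JSON telemetry event (hypothetical schema for illustration)."""
    return json.dumps({
        "deviceId": device_id,
        "temperature": round(random.uniform(18.0, 30.0), 2),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    })

def send_sample_readings(connection_str: str, eventhub_name: str, count: int = 5) -> None:
    """Send a small batch of readings to an Event Hub.

    Requires the azure-eventhub package (an assumption for this sketch) and a
    real connection string copied from the Azure portal.
    """
    from azure.eventhub import EventHubProducerClient, EventData
    producer = EventHubProducerClient.from_connection_string(
        connection_str, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_event_batch()
        for i in range(count):
            batch.add(EventData(make_reading(f"device-{i}")))
        producer.send_batch(batch)
```

Once events like these land in the hub, the data preview pane should show them arriving live.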
You will see that the transformations available in streaming dataflows are fairly standard (think filter, join, group by, etc.) with one caveat: most of them have a time dimension embedded in them, which is core to streaming data. We won’t go over the details of each transformation in this post, but you can head to our documentation and take a look at the options.
As a glimpse into the opportunities, you can join streams over a period of time, group by and summarize with different time windows such as tumbling and session, and much more!
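To make the windowing idea concrete, here is a small batch-mode sketch of a tumbling window: fixed-size, non-overlapping time buckets with a count per bucket. This is only a plain-Python illustration of the concept, not how streaming dataflows implement windowing.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping (tumbling) time window.

    `events` is a list of (timestamp_in_seconds, payload) pairs; each event
    falls into exactly one window starting at a multiple of `window_seconds`.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(0, "a"), (5, "b"), (12, "c"), (19, "d"), (21, "e")]
# 10-second windows: [0,10) holds 2 events, [10,20) holds 2, [20,30) holds 1
print(tumbling_window_counts(events, 10))  # {0: 2, 10: 2, 20: 1}
```

A session window, by contrast, would grow each bucket as long as events keep arriving within a gap threshold; the documentation covers the window types available in the transformations.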
The output will be a dataflow table (also known as an entity) that can be used to create reports in Power BI Desktop. You will also need to connect the nodes of the previous step to the output you are creating to make it work. After that, all you need to do is name the table and you are good to go.
There is one major difference between this output and a regular Power BI table. Streaming dataflows create two tables for you: one with historical data (cold storage) and one with real-time data (hot storage). The idea is that all the data is available to you, and depending on the type of analysis you want to perform, you can tap into one, the other, or both.
The default retention policy for hot storage is 24 hours (the minimum) but it can be changed in the streaming dataflow settings.
More about that when creating reports. Let us just say for now that you can create as many output tables as you want and each of them will have this hot and cold component.
Start your streaming dataflow
Once you save your streaming dataflow and confirm there are no errors, you are ready to start it and begin bringing data into Power BI.
To start your streaming dataflow, head to the workspace and hover over your streaming dataflow’s name. A “play” icon will appear; click it and your dataflow will start. Note that, depending on your streaming logic, starting the dataflow can take some time.
By now it is safe to assume you know this is not a regular refresh; it is more like an infinite or continuous refresh while the dataflow is running. As data comes in, the dataflow processes it, performs the data preparation logic you defined, and lands the data in Power BI for analysis.
The streaming dataflow will keep running until one of two things happens: it is stopped by the user or it fails for other reasons.
You can find all this information in the refresh history of the dataflow.
Creating real-time reports
Once your streaming dataflow is running, the fun begins. Just as with regular dataflows, you connect to the streaming dataflow from Power BI Desktop and start creating visuals.
As we mentioned before, there is one difference: when connecting to your dataflow, you will see two versions of each output table, hot storage (for real-time visuals) and cold storage (for historical analysis visuals).
To connect to your streaming data, go to “Get data” and search for the Power Platform dataflows (Beta) connector, available in the July 2021 release of Power BI Desktop. You can also use the Power BI dataflows connector, with two caveats:
- You will only be able to connect to the hot storage.
- The data preview will not work.
You will see both versions of each table, with their original names followed by Streaming/Hot or Archived/Cold. For this example, we will go ahead and select only the streaming version, to point out some of the unique steps for real-time visuals.
Once you do this and click Load, Power BI Desktop will ask which type of connection to use for these tables. If you want to create real-time visuals, you must choose DirectQuery.
And now you’re ready to create real-time reports using automatic page refresh, updating visuals as often as every second on top of the data being brought in by streaming dataflows.
Some quick tips and reminders to make sure you can create these real-time reports:
- You will have to use automatic page refresh or change detection to make sure the reports update automatically. Check your admin and tenant settings to see how frequently you can refresh visuals.
- Take advantage of the relative time slicer and relative time filters in Power BI to build visuals that show only the latest X minutes or hours of data, for real-time clarity.
What is next?
To learn more about streaming dataflows you can head to our documentation to start using this new capability right away.
The Azure team has a great toll sensor simulator sample, which we used for all the screenshots in this post; it can be deployed to Azure with one click to try out streaming dataflows. You can also use the Raspberry Pi simulator together with the IoT Hub free tier to try out this new feature.
We would love to hear your feedback as we improve streaming dataflows on the way to general availability. We have created a public forum on our community site for you to provide feedback, share ideas, and ask questions about this new capability.
Thank you again for all the support and feedback you give the Power BI team and let us know if you have any questions or comments about streaming dataflows in the comments and in the forum.