Make the Most of Your Data Analysts Part 5: The Data Analyst Workbench
Data analysts currently use a plethora of tools to find, prepare, visualize, analyze and share data. That is changing!
For decades, data analysts were left to their own devices to find, massage, analyze, visualize, and share data. Most used tools, such as spreadsheets and desktop databases, to handle these tasks. Data analysts spent 60% of their time preparing data, 20% building reports, and, if they were lucky, 20% analyzing the data.
Today, these ratios are changing thanks to the advent of specialized, self-service technologies geared to data analysts: data visualization, data preparation, data catalogs, and advanced analytics. Collectively, these tools provide most of the functionality data analysts need to do their jobs.
Platforms versus purpose-built tools. Unfortunately, there is a downside to these technologies: there are too many of them! Companies often source the tools from different vendors and leave it up to data analysts to make the products work together. This represents a missed opportunity since data analysts lose efficiency when they have to context-switch between products and figure out how to share data between them.
Recognizing this problem, vendors now combine the functionality of these tools into a single platform powered by a data fabric and enhanced with embedded workflows. Called a data analyst workbench, these new environments turbocharge self-service initiatives and boost data analyst productivity tenfold or more.
Data Visualization. About a decade ago, self-service visualization tools debuted, enabling data analysts to create interactive dashboards without IT assistance. The tools also enable them to slice and dice data and apply complex rules to answer business questions. But by itself, data visualization leaves data analysts stranded: they can create dashboards, but are still dependent on IT for data.
Data Catalog. To address this problem, vendors offered data catalogs. These products scan corporate databases to create a metadata inventory of data sets, SQL queries, data views, reports, schemas, and other artifacts. With a data catalog, data analysts can search for relevant data, profile that data, understand its lineage and quality, and examine who else in the organization is using it and for what purpose. Basically, a catalog creates a data inventory, or data marketplace, of all data available for analysis. Many companies now use catalogs to curate and govern data as well.
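To make the idea of a metadata inventory concrete, here is a minimal sketch in Python. It assumes a single in-memory SQLite database stands in for "corporate databases"; the function names build_catalog and search_catalog, and the sample tables, are hypothetical and do not come from any catalog product. A real catalog also tracks lineage, quality, and usage, which this sketch omits.

```python
import sqlite3


def build_catalog(conn):
    """Scan a SQLite database and return one metadata entry per table."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        catalog.append({"table": table, "columns": columns, "rows": row_count})
    return catalog


def search_catalog(catalog, keyword):
    """Return names of tables whose name or column names contain the keyword."""
    keyword = keyword.lower()
    return [
        entry["table"]
        for entry in catalog
        if keyword in entry["table"].lower()
        or any(keyword in col.lower() for col in entry["columns"])
    ]


# Hypothetical sample data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme', 'West')")

catalog = build_catalog(conn)
matches = search_catalog(catalog, "customer")
print(matches)
```

Searching for "customer" finds both tables: one by its table name and the other by its customer_id column, which is the kind of discovery a catalog search is meant to support.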
Data Preparation. Although a data catalog makes it easy to find relevant data, it doesn’t help data analysts clean, format, combine, derive, or aggregate data. That’s the job of a data preparation tool, another component of the data analyst workbench. Data preparation tools go well beyond what Excel offers; most are designed to handle big data with large volumes of multi-structured data. Most importantly, data preparation tools keep a visual audit trail of data manipulations, allowing data analysts to edit or reuse an existing analysis that they or a colleague created.
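The audit trail idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular preparation tool works: the PrepPipeline class, its methods, and the sample rows are all hypothetical. Each manipulation is recorded as a named step, so the same sequence can be listed, edited, or rerun on new data.

```python
class PrepPipeline:
    """Hypothetical pipeline that records every manipulation as a named step."""

    def __init__(self):
        self.steps = []  # ordered audit trail of (description, function) pairs

    def add_step(self, description, func):
        self.steps.append((description, func))
        return self  # allow chaining

    def run(self, rows):
        # Apply each recorded step in order
        for _, func in self.steps:
            rows = func(rows)
        return rows

    def audit_trail(self):
        return [f"{i}. {desc}" for i, (desc, _) in enumerate(self.steps, start=1)]


def drop_missing_amount(rows):
    return [r for r in rows if r.get("amount") is not None]


def normalize_region(rows):
    return [{**r, "region": r["region"].strip().title()} for r in rows]


pipeline = (
    PrepPipeline()
    .add_step("Drop rows with missing amount", drop_missing_amount)
    .add_step("Normalize region names", normalize_region)
)

# Hypothetical raw data, for illustration only
raw = [
    {"region": " west ", "amount": 120.0},
    {"region": "EAST", "amount": None},
    {"region": "east", "amount": 80.0},
]

clean = pipeline.run(raw)
trail = pipeline.audit_trail()
print(trail)
print(clean)
```

Because the steps are stored rather than applied ad hoc, a colleague can call pipeline.run on a different data set and get the same transformations in the same order.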
Advanced Analytics. Data analysts are increasingly asked to perform complex analyses that involve modeling data using statistical or machine learning tools. Although data analysts are not data scientists, many vendors now ship lightweight analytics tools for “citizen data scientists.” These include AutoML tools that automate the process of creating and deploying ML models, making them ideal for data analysts who want to take their analyses to the next level. These features are also part of the data analyst workbench.
Data Fabric. Powering the data analyst workbench is a data fabric that reaches out and connects with every data source in an organization and provides a unified view across them. The data fabric enables data analysts to find and profile data via the data catalog and query and massage the data via a data preparation or visualization tool.
The best data fabrics offer comprehensive support for all data sources, applications, and file systems and work with both structured and semi-structured data. They also provide a semantic view of data that simplifies data access for business users. The view masks the physical location of data, so users can obtain data that comes from multiple systems in real time. Finally, this layer can enforce security policies by controlling data access at the data set, row, and column level.
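Row-level and column-level access control can be illustrated with a short Python sketch. It assumes a policy is just a set of permitted columns plus a row predicate; the AccessPolicy class, apply_policy function, and sample records are hypothetical and do not reflect any vendor's implementation. Data-set-level control is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AccessPolicy:
    """Hypothetical policy: which columns a user may see, and which rows."""
    allowed_columns: set
    row_filter: Callable[[dict], bool]


def apply_policy(rows, policy):
    """Filter rows first, then mask columns the policy does not allow."""
    return [
        {col: val for col, val in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row)
    ]


# Hypothetical records, for illustration only
rows = [
    {"customer": "Acme", "region": "West", "revenue": 1200, "credit_limit": 5000},
    {"customer": "Globex", "region": "East", "revenue": 900, "credit_limit": 3000},
]

# A West-region analyst sees only West rows and no credit limits
west_analyst = AccessPolicy(
    allowed_columns={"customer", "region", "revenue"},
    row_filter=lambda row: row["region"] == "West",
)

visible = apply_policy(rows, west_analyst)
print(visible)
```

The row predicate runs against the full record before columns are masked, so a policy can filter on a column that the user is not allowed to see.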
Lifecycle and workflows. By incorporating the above technologies, the data analyst workbench supports the data analysis lifecycle. (See figure 5-1.) It also embeds workflows that enable people to collaborate across different parts of the lifecycle. For instance, a business user can submit a question to a data analyst who can profile and analyze existing data sets to derive an answer. If the data doesn’t exist to answer the question, the data analyst can ping a data engineer to find and deliver it. All the interactions and outputs are orchestrated by the data analyst workbench.
Openness. Workbench vendors recognize that most customers have existing tools to perform some or all of the steps in the data analysis lifecycle. For example, a company may want to use an existing visualization or data preparation tool rather than the one that comes with the self-service workbench. To support customer needs, workbench platforms often provide built-in support for popular visualization and preparation tools and come with APIs that enable customers to build custom integrations, if needed.
A data analyst workbench integrates data catalog, data preparation, advanced analytics, visualization, and data fabric functionality and is geared to data analysts. Data analysts don’t want to jump from tool to tool when working with data to answer a business question. They want a data platform that incorporates all the functionality required to do their jobs.
The data analyst workbench also supports built-in workflows that make it easy for business users, data analysts, and data engineers to collaborate around business questions. These workflows increase worker productivity and promise to help organizations stay ahead of self-service workloads.