How to Make a Visualization More Effective: Taking an Article by Elizamary Nascimento as an Example

Effective data visualization helps us extract and convey information from images. Out of an interest in software and a wish to study visualization further, we focus on a paper published in the Journal of Systems and Software, titled “Software engineering for artificial intelligence and machine learning software: A systematic literature review”, written by Elizamary Nascimento (corresponding author) and colleagues in November 2020. We interpret and analyze one figure from this paper, which visualizes the relationship between software engineering and AI software, and then improve it using visualization principles.

The visualization that we found in the paper is as follows:

[Figure 1: the original visualization from the paper]

Background

Recent technological advancements regarding cloud computing, big data management, algorithms, and tools have enabled a lot of opportunities for businesses, industries, and societies to make use of Artificial Intelligence.

Software engineering is the discipline that studies how to build and maintain effective, practical and high-quality software using engineering methods. It has been applied in many sectors, such as industry, agriculture, banking, aviation and government. These applications promote economic and social development and improve efficiency in work and daily life.

The Guide to the Software Engineering Body of Knowledge (SWEBOK Guide) describes generally accepted knowledge about software engineering. The figure covers 11 SWEBOK knowledge areas: Software Requirements, Software Design, Software Construction, Software Testing, Software Maintenance, Software Configuration Management, Software Engineering Management, Software Engineering Process, Software Deployment, Software Professional Practice and Software Quality.

Therefore, research on the relationship between the SWEBOK knowledge areas and AI software can provide effective guidance for software development, standardize its steps, and clarify where the focus should lie, which is the foundation of software development.

The story of the picture

The figure is a bar chart showing 11 SWEBOK knowledge areas associated with AI software and how relevant each of them is.

Analyzing the figure, we reach the following conclusions:

  • Visual patterns: the size (height) of each bar, which represents the association between the area and AI software, and the position of each bar. Color is also used as a visual pattern in this chart: different colors represent different SWEBOK knowledge areas, which is redundant because the x-axis labels already identify each area.
  • Data types: categorical for the x-axis and ratio for the y-axis.

From the perspective of cognitive theory, the chart labels the value of each bar directly, which to some extent avoids unnecessary cognitive tunneling. Testing accounts for 46% of the cases, and Design and Configuration Management for 43% each. Construction (27%), Professional Practice (21%), and Requirements and Process (18%) are also areas for which technologies and/or practices are being developed. The areas with the fewest software engineering practices for AI/machine learning are Deployment, Maintenance, Quality and Management.

From the chart, we can conclude that Design, Testing and Configuration Management are the three SWEBOK knowledge areas most related to AI software. This reminds us to pay more attention to these three areas when developing AI software.

Replication

We use Python’s Matplotlib to reproduce the figure, keeping the colors and data as consistent with the original as we can.

[Figure 2: our Matplotlib replication of the original chart]
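A replication along these lines could be sketched as follows. This is not the script used for the figure above, only a minimal illustration of the original chart's style: vertical bars, one color per area, grid lines, a y-axis scale and rotated labels. It includes only the seven areas whose percentages are quoted in the text; the four lowest areas are left out because their values are not stated here. The names `SHARES` and `draw_original_style`, the `tab10` palette and the axis label are assumptions.

```python
# Percentages quoted in the text above. The 18% figure is assumed to apply
# to Requirements and to Process individually. Deployment, Maintenance,
# Quality and Management are omitted because their values are not stated.
SHARES = {
    "Requirements": 18,
    "Design": 43,
    "Construction": 27,
    "Testing": 46,
    "Configuration Management": 43,
    "Process": 18,
    "Professional Practice": 21,
}


def draw_original_style(shares=SHARES):
    """Draw a chart in the style of the original figure (hypothetical helper)."""
    # Imported here so the data above can be inspected without Matplotlib.
    import matplotlib.pyplot as plt

    names = list(shares)
    values = list(shares.values())
    # One distinct color per bar; the palette is an assumption.
    colors = plt.get_cmap("tab10").colors[: len(names)]

    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(names, values, color=colors)
    ax.bar_label(bars, fmt="%d%%")       # value on each bar (Matplotlib >= 3.4)
    ax.set_ylabel("Share of cases (%)")  # assumed axis label
    ax.yaxis.grid(True)                  # background grid lines
    ax.set_axisbelow(True)               # keep the grid behind the bars
    ax.tick_params(axis="x", labelrotation=45)  # rotated category labels
    fig.tight_layout()
    return fig
```

Calling `draw_original_style().savefig("original_style.png")` writes the chart to a file.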

Analysis of the disadvantages

Although the chart conveys useful information, we can still improve it using visualization principles.

a. The grid lines in the background of the picture are unnecessary.

Reason: Grid lines are non-data-ink, which can distract the reader from the data.

Solution: Remove the grid lines to increase the data-ink ratio.

b. There is no need to use different colors to represent different classifications.

Reason: Our working memory is limited, so we should avoid information overload. In addition, because of cognitive tunneling, readers who focus on the color classification may miss the important information the picture is meant to convey, so information visualizations should avoid unwanted cognitive tunneling.

Solution: We use two colors: the three most relevant areas share one color, and the remaining areas share another. In this way, readers avoid unnecessary cognitive tunneling and can immediately pick out the three most relevant areas, so they clearly understand the important information the chart expresses.
In particular, to make sure that our colors are color-blind friendly, we tested our image with the Coblis tool, and it passed.

c. The box around the figure is unnecessary.

Reason: Our brain can process only about 1% of the visual data it receives, so we should reduce the number of elements it has to process in the figure.

Solution: Turn off the box around the figure.

d. The scale on the y-axis is not necessary.

Reason: Cognitive tunneling may happen naturally when a reader looks at a visual element in the graph. When it does, the reader will not lose relevant information as long as the labels are close to that element. If the reader has to go back and forth between the bars and the axis or a legend, they must process other elements in the graph, which may overload working memory. Moreover, since each bar is already labeled with its value, the y-axis scale is redundant data-ink.

Solution: Label elements directly, avoiding indirect look-up.

e. There is no clear, self-explanatory title.

Reason: Readers may not be clear about the theme of the chart and may waste time and energy working it out.

Solution: Add an intuitive title to the chart.

f. Text labels should not be rotated.

Reason: Rotated x-axis labels cost readers extra time to read. A good visualization should be intuitive and simple while still conveying the information effectively.

Solution: Rotate the bar chart when category names are too long. That is, swap the x and y axes.

g. The data are not sorted for easier comparison.

Reason: Because our working memory capacity is limited, we cannot remember all 11 SWEBOK areas and find the three most prominent ones at the same time.

Solution: Sort the bars by value.

To apply the changes above, we use Python’s Matplotlib again. For the bar colors, to highlight the three most relevant SWEBOK knowledge areas while staying color-blind friendly, we choose two colors: red and gray. The final modified figure is shown below:

[Figure 3: our improved version of the chart]
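The seven changes (a to g) can be combined in a short Matplotlib sketch like the one below. It is an illustration under stated assumptions rather than the code behind the figure above: it uses only the seven percentages quoted earlier in the text (the four lowest areas are omitted because their values are not given), and the names `SHARES`, `rank_and_color` and `draw_improved`, the exact red and gray shades, and the title wording are all hypothetical.

```python
# Percentages quoted earlier in the text. The 18% figure is assumed to apply
# to Requirements and to Process individually. Deployment, Maintenance,
# Quality and Management are omitted because their values are not stated.
SHARES = {
    "Requirements": 18,
    "Design": 43,
    "Construction": 27,
    "Testing": 46,
    "Configuration Management": 43,
    "Process": 18,
    "Professional Practice": 21,
}

HIGHLIGHT = "tab:red"   # assumed shade of red for the top three areas
MUTED = "lightgray"     # assumed shade of gray for the rest


def rank_and_color(shares, top_n=3):
    """Sort areas by value and assign one of two colors (items b and g).

    The order is ascending because barh draws the first row at the bottom,
    so the largest value ends up at the top of the chart.
    """
    ranked = sorted(shares.items(), key=lambda item: item[1])
    cutoff = len(ranked) - top_n
    return [
        (name, value, HIGHLIGHT if index >= cutoff else MUTED)
        for index, (name, value) in enumerate(ranked)
    ]


def draw_improved(shares=SHARES):
    """Draw the improved chart (hypothetical helper)."""
    # Imported here so rank_and_color stays usable without Matplotlib.
    import matplotlib.pyplot as plt

    rows = rank_and_color(shares)
    names = [name for name, _, _ in rows]
    values = [value for _, value, _ in rows]
    colors = [color for _, _, color in rows]

    fig, ax = plt.subplots(figsize=(8, 4))
    # Item f: horizontal bars keep the long category names unrotated.
    bars = ax.barh(names, values, color=colors)
    # Item d: label each bar directly and hide the value axis.
    ax.bar_label(bars, fmt="%d%%", padding=3)  # Matplotlib >= 3.4
    ax.xaxis.set_visible(False)
    # Item c: turn off the box around the plot.
    for side in ("top", "right", "bottom", "left"):
        ax.spines[side].set_visible(False)
    ax.tick_params(axis="y", length=0)
    # Item a: no grid lines are drawn (Matplotlib's default).
    # Item e: a self-explanatory title; the wording is an assumption.
    ax.set_title("SWEBOK knowledge areas most associated with AI software")
    fig.tight_layout()
    return fig
```

Calling `draw_improved().savefig("improved.png")` writes the chart to a file. Keeping the sorting and color choice in `rank_and_color` separates the data decisions from the drawing, so the ranking can be checked without rendering anything.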

Conclusion

In this report, we selected a recent and popular figure on software engineering and analyzed it: what the data in the image express, what the figure's shortcomings are, and how to improve it. From this process, we learned that the focus of data visualization is to express data clearly, that we should avoid non-data-ink and redundant data-ink, and that a good data visualization should fully consider the reader's cognition.

Author

This report was completed jointly by three students at LZU.

If you have any questions or would like the code, do not hesitate to contact us.

Figure Source

[1] E. Nascimento, A. Nguyen-Duc, I. Sundbø, and T. Conte, “Software engineering for artificial intelligence and machine learning software: A systematic literature review,” 2020.
