A Challenge of the Software Distribution

Posted on February 14, 2021 by Sven Ruppert

The four factors that are working against us

Software development depends more and more on dependencies, and the frequency of deployments is increasing. The two trends reinforce each other. Another element that turns the delivery of software into a network bottleneck is the usage of compound artefacts. The last trend working against us is the exploding number of edges, or better, edge nodes. Together, all four trends are a challenge for the infrastructure. But what can we do about it?

Edge-Computing

Before we look at the acceleration strategies, I will briefly explain the term “Edge”, or better “Edge-Computing”, because it is often used in this context.

What is Edge or better edge computing?

The principle of edge computing states that data processing takes place at the edge of the network. Which device is ultimately responsible for processing the data depends on the application and on how the concept is implemented.
An edge device is a device on the network periphery that generates, processes or forwards data itself. Examples of edge devices are smartphones, autonomous vehicles, sensors or IoT devices such as fire alarms.
An edge gateway is installed between the edge device and the network. From the edge devices it receives the data that do not have to be processed in real time; it then processes specific data locally or selectively sends them on to other services or central data centres. Edge gateways have wireless or wired interfaces to the edge devices and to the communication networks for private or public clouds.
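The routing decision of an edge gateway can be sketched in a few lines. This is only an illustration under stated assumptions: the class names `Reading` and `EdgeGateway`, the `local_kinds` setting and the data categories are hypothetical and do not belong to any real product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reading:
    device_id: str
    kind: str      # hypothetical categories, e.g. "temperature" or "firmware-log"
    value: float


@dataclass
class EdgeGateway:
    # Kinds of data this gateway is configured to process itself (assumption).
    local_kinds: frozenset
    processed: list = field(default_factory=list)   # handled on the gateway
    forwarded: list = field(default_factory=list)   # passed on to a central service

    def receive(self, reading: Reading) -> str:
        """Route one reading and report where it went."""
        if reading.kind in self.local_kinds:
            self.processed.append(reading)
            return "local"
        self.forwarded.append(reading)
        return "forwarded"


gateway = EdgeGateway(local_kinds=frozenset({"temperature"}))
routes = [
    gateway.receive(Reading("sensor-1", "temperature", 21.5)),
    gateway.receive(Reading("sensor-2", "firmware-log", 0.0)),
]
print(routes)  # ['local', 'forwarded']
```

Only the second reading leaves the local network, which is where the bandwidth saving comes from.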

Pros of Edge Computing

The data processing takes place in the vicinity of the data source, minimising transmission and response times. Communication is possible almost in real time. At the same time, data throughput and bandwidth usage in the network are reduced, since only the specific data that are not processed locally need to be transmitted to central data centres. Many functions can also be maintained even if the network or parts of it fail. The performance of edge computing scales by adding more intelligent devices at the network periphery.

Cons of Edge Computing

Edge computing offers more security because data storage is limited to the local environment, but only if appropriate security concepts are in place for the decentralised devices. Because the devices are numerous and heterogeneous, the effort involved in implementing these security concepts increases.

Fog Computing

Edge computing and fog computing are both decentralised data processing concepts. Fog computing inserts another layer, the so-called fog nodes, between the edge devices and the cloud. These are small, local data centres in the access areas of the cloud. The fog nodes collect the data from the edge devices. They select which data are to be processed locally or decentrally, and either forward them to central servers or process them directly themselves.
Taking the best of both worlds means combining the principles of edge and fog computing.

What are the acceleration options for SW Distribution?

There are different strategies to scale the distribution of binaries, and every solution suits a specific use case. We will not focus on cloud solutions only, because companies operate worldwide and have to deal with different governmental regulations and restrictions. In addition to these restrictions, I want to highlight the need for hybrid solutions as well. Hybrid solutions include on-prem resources as well as air-gapped infrastructure used for high-security environments.

a) Custom Solution based on replication or scaling servers

One possibility to scale inside your network or architecture is to scale the hardware and work with direct replication. Implementing this yourself will most likely consume a larger budget of workforce, knowledge, time and money, because it is not a trivial project. At the same time, this approach is bound to the borders of the infrastructure you have access to.

b) P2P Networks

Peer-to-peer networks are based on equal nodes that share the binaries among themselves. The peer-to-peer approach implies that you will have several copies of your files. If you download a file from the network, all nodes can serve parts of it independently. Splitting up files and delivering them from different nodes simultaneously to the requesting node leads to constant and efficient network usage and reduced download times.
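The idea of serving parts of one file from several nodes can be sketched as follows. This is a minimal, simplified model: the `Peer` class, the tiny chunk size and the round-robin assignment are assumptions for illustration, and the chunks are fetched one after another here, whereas a real P2P client would request them in parallel.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example stays readable


def split(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut a binary into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Peer:
    """Hypothetical peer that holds a full copy of the chunked file."""

    def __init__(self, name: str, chunks: list):
        self.name = name
        self.chunks = chunks
        self.served = 0  # how many chunks this peer has delivered

    def get_chunk(self, index: int) -> bytes:
        self.served += 1
        return self.chunks[index]


def download(peers: list, chunk_count: int) -> bytes:
    """Fetch every chunk from a different peer in round-robin order."""
    parts = []
    for index in range(chunk_count):
        peer = peers[index % len(peers)]
        parts.append(peer.get_chunk(index))
    return b"".join(parts)


original = b"release-1.0.0.jar contents"
chunks = split(original)
peers = [Peer(f"peer-{n}", chunks) for n in range(3)]
result = download(peers, len(chunks))

# The reassembled file must be byte-identical to the original.
assert hashlib.sha256(result).digest() == hashlib.sha256(original).digest()
print([peer.served for peer in peers])
```

No single peer delivers the whole file, so the load is spread across all nodes that hold a copy.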

c) CDN – Content Delivery Network

CDNs are optimised to deliver large files across regions. The network itself is built out of a huge number of nodes that cache files for regional delivery. With this strategy, the original server will not be overloaded.
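Why regional caching protects the original server can be shown with a small simulation. The sketch below assumes a simple cache-on-first-request policy without eviction or expiry; the classes `Origin` and `RegionalNode`, the region names and the file path are hypothetical.

```python
class Origin:
    """Hypothetical origin server that counts how often it is asked for a file."""

    def __init__(self, files: dict):
        self.files = files
        self.requests = 0

    def fetch(self, path: str) -> bytes:
        self.requests += 1
        return self.files[path]


class RegionalNode:
    """Hypothetical CDN node: asks the origin once, then serves from its cache."""

    def __init__(self, region: str, origin: Origin):
        self.region = region
        self.origin = origin
        self.cache = {}

    def get(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


origin = Origin({"sdk/2.1/sdk.zip": b"binary payload"})
nodes = {region: RegionalNode(region, origin) for region in ("eu", "us", "apac")}

# 1000 downloads per region, 3000 downloads in total.
for _ in range(1000):
    for node in nodes.values():
        node.get("sdk/2.1/sdk.zip")

print(origin.requests)  # 3
```

In this model the origin answers one request per region, regardless of how many consumers download the file.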

Check out the video titled "DevSecOps - the Low hanging fruits" on my YouTube channel. This video describes the balance between writing the code yourself and adding a dependency in each layer of a cloud-native app. The question is, what does this mean for DevSecOps?

JFrog Solution

With the three techniques mentioned, you can build a huge and powerful architecture that fits your needs. But integrating all these technologies and implementing products is not easy. We faced this challenge as well, and over the years we found solutions that we integrated into a single DevSecOps platform called “The JFrog Platform”. I don't want to give an overview of all components here; for this, check out my YouTube channel. Here I want to focus only on the components that are responsible for the distribution of the binaries.

JFrog Distribution

With JFrog Distribution, the knowledge about the content of the repositories and the corresponding metadata is used to provide a replication strategy. The replication solution is designed for internal and external repositories, to bring the binaries all the way down to the place where they are needed. The infrastructure can be built in a hybrid model, including on-prem and cloud nodes. Even air-gapped solutions are possible with import/export mechanisms. In this scenario, we are focussing on a scalable caching mechanism that is optimised for reads.

What is a Release Bundle?

A Release Bundle is a composition of binaries. These binaries can be of different types, like Maven, Debian or Docker. The Release Bundle can be seen as a Bill of Materials (BOM). The content as well as the Release Bundle itself is immutable. This immutability makes it possible to implement efficient caching and replication mechanisms across different networks and regions.
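Why immutability helps with caching can be sketched with a small model. This is not the actual Release Bundle format or API of JFrog Distribution; the classes `Artifact` and `ReleaseBundle`, the `digest` method and all names and paths are hypothetical. The point is only that an immutable bill of materials yields a stable fingerprint that a cache or replica can use as a key.

```python
import hashlib
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Artifact:
    path: str
    package_type: str  # e.g. "maven", "debian", "docker"
    sha256: str        # checksum of the binary content


@dataclass(frozen=True)
class ReleaseBundle:
    name: str
    version: str
    artifacts: tuple   # a tuple, so the list of contents cannot be changed either

    def digest(self) -> str:
        """Stable fingerprint over all entries, independent of their order."""
        h = hashlib.sha256()
        for a in sorted(self.artifacts, key=lambda a: a.path):
            h.update(f"{a.path}:{a.package_type}:{a.sha256}\n".encode())
        return h.hexdigest()


def artifact(path: str, package_type: str, content: bytes) -> Artifact:
    return Artifact(path, package_type, hashlib.sha256(content).hexdigest())


bundle = ReleaseBundle("shop-backend", "1.4.0", (
    artifact("libs/core-1.4.0.jar", "maven", b"jar bytes"),
    artifact("images/shop:1.4.0", "docker", b"image bytes"),
))

# Any attempt to change the bundle after creation is rejected.
try:
    bundle.version = "1.4.1"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)  # False
```

Because the fingerprint never changes for a given bundle, a replica that already holds that digest does not need to transfer or revalidate the content again.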

What is an Edge Node in this context?

An Edge Node in our context is a node that provides the functionality of a read-only Artifactory. With this Edge Node, the delivery process is optimised, and we will see that replication is done in a transactional way. The difference from the original meaning of an edge node is that this instance is not the consuming or producing element. It can be seen as a fog node, the first layer above the layer of real edge nodes.

P2P Download

The P2P solution focuses on environments that need to handle download bursts inside the same network or region. These download bursts could be scenarios like “updating a server farm” or “updating a microservice mesh”. The usage is unidirectional, which means that the consumers are not pushing changes from their side. They are just waiting for a new version, and all consumers update at the same time. This behaviour is a perfect case for the P2P solution. Artifactory, or an Edge Node in the same network or region, initiates an update of all P2P nodes with a new version of a binary. The consumer itself then requests the binary from a P2P node and no longer from the Artifactory instance. The responsible Artifactory instance manages the P2P nodes, which leads to zero maintenance on the user side. Keep in mind that RBAC is active on the P2P nodes as well.

CDN Distribution

The CDN solution is optimised to deliver binaries to different parts of the world. It comes in two flavours. One is for public distribution and is mostly used to distribute SDKs, drivers or other freely available binaries. The other flavour focuses on private distribution. Whichever solution you use, the RBAC defined inside the Access module is respected, including solutions with authentication and authorisation and unique links that include access tokens.
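To give an idea of how a unique link with a token can work in general, here is a generic sketch of an HMAC-signed, expiring download link. It is not the mechanism used by the JFrog Platform; the host `cdn.example.com`, the query parameter names, the secret and the helper functions are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code one in practice


def sign(path: str, expires_at: int) -> str:
    """Create a token that is bound to one path and one expiry time."""
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(path: str, expires_at: int) -> str:
    token = sign(path, expires_at)
    return f"https://cdn.example.com/{path}?expires={expires_at}&token={token}"


def is_valid(path: str, expires_at: int, token: str, now: int) -> bool:
    """Accept the request only if the link is unexpired and the token matches."""
    if now > expires_at:
        return False
    return hmac.compare_digest(token, sign(path, expires_at))


link = make_link("sdk/2.1/sdk.zip", expires_at=1_700_000_000)
token = link.rsplit("token=", 1)[1]

print(is_valid("sdk/2.1/sdk.zip", 1_700_000_000, token, now=1_699_999_000))    # True
print(is_valid("sdk/2.1/other.zip", 1_700_000_000, token, now=1_699_999_000))  # False
print(is_valid("sdk/2.1/sdk.zip", 1_700_000_000, token, now=1_700_000_001))    # False
```

The token is only valid for the exact file and time window it was issued for, so a delivery node can check it without asking a central service for every download.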

Conclusion

OK, it is time for the conclusion. What did we discuss today?
With the increasing number of dependencies, a higher frequency of deployments and the constantly growing number of applications and edge nodes, we are facing scalability challenges.
We had a look at three ways to increase your delivery speed. The solutions discussed are:

a) JFrog Distribution, which helps you build a strong replication strategy inside your hybrid infrastructure to speed up the development cycle.
b) JFrog P2P, which allows you to handle massive download bursts inside a network or region. This solution fits tasks that need to distribute binaries to a high number of consumers concurrently.
c) JFrog CDN, which delivers binaries worldwide into regional data centres to make the experience for the consumer as good as possible.

Sven Ruppert