At Camphor Networks, we set out to test our platform at scale in a large Kubernetes (K8s) cluster. No sooner had we started than we realized that bringing up a large K8s cluster is itself a big challenge, let alone running complex applications on top of it. Many of the challenges we encountered with various public clouds were actually non-technical, such as:
1. Quota limits on CPU, memory, and public IPs
2. Unpredictable, non-deterministic bring-up times
3. Broken or inaccessible VMs
4. High cost of resources
It was very frustrating: we could never even reach the point where we could start testing and hardening the Camphor Networks platform at scale. There were simply too many obstacles to bringing up the required elastic infrastructure in the first place.
Then we asked ourselves the hard question: if users can (supposedly) run complex VMs and VNFs seamlessly at scale on the Camphor Networks Platform, why can’t we use it to build the necessary infrastructure ourselves? That is, why not run a Camphor Networks Kubernetes cluster on top of another Camphor Networks Kubernetes cluster!
It was an arduous journey, to say the least. We faced several intricate technical challenges, including:
1. Even basic Internet connectivity from nested pods!
2. DNS issues
3. MTU issues
4. Disk I/O bottlenecks
5. Memory fragmentation and caching issues
6. Network I/O issues
7. Kernel performance issues at Layer0 and Layer1
8. File system mounting issues across the nested K8s cluster
9. Intricate software defects in our own code and, even worse, in third-party code
The list of issues seemed endless, and the goal seemed very ambitious. But we realized that with constant focus and unwavering dedication, every technical challenge we faced could be solved, or at least worked around!
First, we launched 100 Linux devices (VMs) on a Layer0 Camphor Networks Platform running on a single bare-metal server! On these 100 VMs, we installed another Camphor Kubernetes Platform (Layer1).
Next, we launched several projects inside the Layer1 platform to showcase and test various cool aspects of the distributed Camphor Networks Kubernetes Platform, such as:
1. Data center with BGP EVPN
2. L3 Clos leaf-and-spine fabric integrated with third-party web applications
3. Service provider L3VPN with BGP, MPLS, and RSVP/LDP
This entire test suite, which brings up both the Layer0 and Layer1 clusters as well as the projects and applications on top of the Layer1 cluster, is completely automated. It can run on a single bare-metal server or across many, as desired.
When we run it on a single bare-metal server from the AWS Spot market, the marathon test completes in just under an hour and, believe it or not, costs only one US dollar!
Please take a look at this short video to see that we don’t just talk the talk, we walk the walk. We would be more than happy to answer any questions or comments!