Troubleshooting Network Issues with AWS Network Access Analyzer and Reachability Analyzer

Abstract

AWS has recently introduced two powerful network tools: the Network Access Analyzer and the Reachability Analyzer. Both of these tools utilize a similar algorithm, producing similarly comprehensive reports. However, the Network Access Analyzer is geared towards identifying unauthorized access. It assumes that there should be no access between the source and target that you have configured, and alerts you to any deviations from that expectation.

On the other hand, the Reachability Analyzer operates in the opposite manner. It detects the path between the source and target, providing detailed information about the intermediate components involved (such as ENIs, subnets, routing tables, NACLs, NATs, IGWs, etc.). If there is no connection, the tool will alert you to the issue.

According to the AWS documentation, these tools do not utilize actual network traffic or send any packets. Instead, they use a mathematical model to analyze the VPC configuration. This allows for efficient and accurate analysis, without any of the latency or security concerns associated with sending network packets.

Network setup

In this blog post, we explore a standard VPC setup with a private subnet that has limited outbound internet access through a NAT gateway. The instances in the private subnet can communicate within the VPC using the local route, but are not directly accessible from the internet. To allow system updates and package installations, a NAT gateway is provisioned in the public subnet and set as the target of the default route in the private subnet.

vpc.png

We will use both the Network Access Analyzer and the Reachability Analyzer in this setup and intentionally introduce several faults into various components of the traffic chain to demonstrate how these tools detect and report them. This gives a hands-on demonstration of their capabilities.

Network Access Analyzer setup

When setting up the analyzer, there are several options to choose from. We will focus on two of them: access from the Internet Gateway and access to the Internet Gateway.

The source and destination of an analysis can be any of the following entities:

  • VPCs
  • Subnets
  • Network interfaces
  • Internet Gateways
  • VPN Gateways
  • VPC Endpoint Service
  • Peering Connections
  • Transit Gateway Attachments
  • Resource Groups
  • Security Groups

nwa_options.png

Step 1: choose the source:

nwa_types.png

Step 2: choose the target:

nwa_targets.png

Once the configuration is complete, it is also displayed in JSON format in the AWS console:

{
  "MatchPaths": [
    {
      "Source": {
        "ResourceStatement": {
          "Resources": [
            "i-0xxxxxx"
          ]
        }
      },
      "Destination": {
        "ResourceStatement": {
          "ResourceTypes": [
            "AWS::EC2::InternetGateway"
          ]
        }
      }
    }
  ]
}

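You can also work with the analyzer from the AWS CLI instead of the AWS console. The following command creates the access scope from this JSON configuration (the analysis itself is then started separately with aws ec2 start-network-insights-access-scope-analysis):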

aws ec2 create-network-insights-access-scope --cli-input-json file://path-to-access-scope-file.json
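The same step can be scripted. The following is a minimal sketch, assuming Python with boto3 and valid AWS credentials; the helper names build_access_scope and create_and_start are made up for this illustration, and the region is whatever your VPC lives in. Only the builder runs here, since the AWS calls need a real account.

```python
import json


def build_access_scope(source_resource_id: str, destination_type: str) -> dict:
    """Return the MatchPaths structure shown above as a Python dict."""
    return {
        "MatchPaths": [
            {
                "Source": {
                    "ResourceStatement": {"Resources": [source_resource_id]}
                },
                "Destination": {
                    "ResourceStatement": {"ResourceTypes": [destination_type]}
                },
            }
        ]
    }


def create_and_start(scope: dict, region: str) -> str:
    """Create the access scope, start one analysis, return the analysis ID.

    Requires boto3 and AWS credentials, so it is defined but not called here.
    """
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    created = ec2.create_network_insights_access_scope(
        MatchPaths=scope["MatchPaths"]
    )
    scope_id = created["NetworkInsightsAccessScope"][
        "NetworkInsightsAccessScopeId"
    ]
    started = ec2.start_network_insights_access_scope_analysis(
        NetworkInsightsAccessScopeId=scope_id
    )
    return started["NetworkInsightsAccessScopeAnalysis"][
        "NetworkInsightsAccessScopeAnalysisId"
    ]


# Same placeholder instance ID as in the JSON above.
scope = build_access_scope("i-0xxxxxx", "AWS::EC2::InternetGateway")
print(json.dumps(scope, indent=2))
```

Keeping the builder separate from the AWS calls makes it easy to generate the reverse scope (Internet Gateway as source) from the same code.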

Traffic from EC2 to Internet GW

The analysis covers both TCP and UDP, and we can switch between the findings for each protocol.

nwa_results_protocols.png

The findings give a comprehensive view of the communication pathways within the VPC. The results display the full network path and all the components between the source and destination, along with detailed information about each component. The analyzer handles this task very well.

nwa_result_p1.png

One of the key benefits of these tools is the ability to quickly identify any issues with the configuration of your VPC. If any component is configured in a way that does not allow traffic, it will be marked in red and a detailed message indicating the issue will be displayed.

This level of detail and visibility into the communication pathways in your VPC is invaluable for ensuring that your VPC is optimally configured for both security and accessibility. By quickly identifying and resolving any issues, you can ensure that your VPC is functioning as expected, and that your resources are protected and accessible.

nwa_results_p2.png

Traffic from Internet GW to EC2

Now we will test inbound traffic from the Internet gateway to the EC2 instance. Since the instance is located in the private subnet, there should be no connection. The following configuration is applied:

{
  "MatchPaths": [
    {
      "Source": {
        "ResourceStatement": {
          "ResourceTypes": [
            "AWS::EC2::InternetGateway"
          ]
        }
      },
      "Destination": {
        "ResourceStatement": {
          "Resources": [
            "i-0xxxxxxx"
          ]
        }
      }
    }
  ]
}

As expected, no findings are detected, which means there is no inbound path:

no_findings.png
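When the analysis is run outside the console, the same "no findings" outcome has to be read from the API response. The sketch below assumes the analysis record returned by the EC2 API carries a Status field and a FindingsFound field with the string values "true", "false" or "unknown"; the function name and the sample record are invented for illustration.

```python
def summarize_scope_analysis(analysis: dict) -> str:
    """Turn one access scope analysis record into a one-line verdict."""
    status = analysis.get("Status")
    if status != "succeeded":
        return f"analysis not finished (status={status})"
    found = analysis.get("FindingsFound", "unknown")
    if found == "false":
        return "no findings: no matching network path exists"
    if found == "true":
        return "findings present: at least one matching path exists"
    return "findings unknown"


# Hypothetical record, shaped like the inbound test described above.
sample = {"Status": "succeeded", "FindingsFound": "false"}
print(summarize_scope_analysis(sample))
```

For the Internet Gateway to EC2 scope, "no findings" is the desired result, so a finding here would be the signal worth alerting on.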

Reachability Analyzer

Now let's use the other tool, the Reachability Analyzer. The configuration is very similar.

reachability.png
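For readers who prefer scripting over the console, here is a rough sketch of the equivalent setup, assuming Python with boto3. The helper names are hypothetical, the gateway ID igw-0xxxxxx is a placeholder, and the exact settings used in the screenshot are not reproduced here. Only the request builder is executed; the AWS calls need credentials.

```python
from typing import Optional

ALLOWED_PROTOCOLS = {"tcp", "udp"}


def build_path_request(
    source: str,
    destination: str,
    protocol: str = "tcp",
    destination_port: Optional[int] = None,
) -> dict:
    """Build keyword arguments for create_network_insights_path."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    request = {"Source": source, "Destination": destination, "Protocol": protocol}
    if destination_port is not None:
        if not 0 <= destination_port <= 65535:
            raise ValueError(f"port out of range: {destination_port}")
        request["DestinationPort"] = destination_port
    return request


def run_reachability(request: dict, region: str) -> str:
    """Create the path, start an analysis, return the analysis ID.

    Requires boto3 and AWS credentials, so it is defined but not called here.
    """
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    path = ec2.create_network_insights_path(**request)
    path_id = path["NetworkInsightsPath"]["NetworkInsightsPathId"]
    started = ec2.start_network_insights_analysis(NetworkInsightsPathId=path_id)
    return started["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]


# Placeholder IDs: EC2 instance as source, Internet gateway as destination.
request = build_path_request("i-0xxxxxx", "igw-0xxxxxx", protocol="tcp")
print(request)
```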

Fault injection

Identifying the root cause of a network configuration issue can be time-consuming and challenging, given the number of components involved: instances, attached Elastic Network Interfaces (ENIs), security groups, network access control lists (NACLs), route tables, subnets, NAT gateways, and internet gateways. The Reachability Analyzer can simplify this process and make it more efficient.

Fault 1: Security group break (outbound rules deleted)

We have misconfigured the outbound rules of the security group on the ENI of our running EC2 instance, and the analysis reports that the Internet gateway is not reachable.

nwr_not_reachable.png

The report looks great: it identifies the exact root cause, the ENI security group, with the explanation code ENI_SG_RULES_MISMATCH.
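The same information can be pulled out of the analysis result programmatically. This sketch assumes the result is shaped like the EC2 API analysis record, with a boolean NetworkPathFound and a list of Explanations that each carry an ExplanationCode plus references to the components involved. The function name, the choice of component keys and the sample IDs are illustrative only.

```python
# Component references that may appear in an explanation (not exhaustive).
COMPONENT_KEYS = (
    "NetworkInterface",
    "SecurityGroup",
    "Acl",
    "RouteTable",
    "NatGateway",
    "InternetGateway",
    "Subnet",
)


def blocking_components(analysis: dict) -> list:
    """Return (explanation code, component IDs) pairs for a failed analysis."""
    if analysis.get("NetworkPathFound"):
        return []  # path exists, nothing is blocking
    results = []
    for explanation in analysis.get("Explanations", []):
        code = explanation.get("ExplanationCode", "UNKNOWN")
        ids = [
            explanation[key]["Id"]
            for key in COMPONENT_KEYS
            if key in explanation and "Id" in explanation[key]
        ]
        results.append((code, ids))
    return results


# Hypothetical record resembling the security group fault above.
sample = {
    "NetworkPathFound": False,
    "Explanations": [
        {
            "ExplanationCode": "ENI_SG_RULES_MISMATCH",
            "SecurityGroup": {"Id": "sg-0example"},
            "NetworkInterface": {"Id": "eni-0example"},
        }
    ],
}
print(blocking_components(sample))
```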

nwr_sg.png

Fault 2: Route table (default route to NAT deleted)

Now let's update the route table of the private subnet where the EC2 instance is deployed and delete the default route to the NAT gateway. As a result, our EC2 instance is isolated from external communication: it cannot receive software updates or make outbound connections.

nwr_routetable.png

The explanation code NO_ROUTE_TO_DESTINATION again points precisely at the misconfigured component.

Fault 3: Deleted NAT gateway

nwr_routetable.png

Since the NAT gateway was deleted, its route in the private subnet is now marked as Blackhole. The explanation code NO_ROUTE_TO_DESTINATION is less informative here, but it gives a good starting point for troubleshooting.
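Because the explanation code alone does not say that the route target is gone, it can help to scan the route tables for blackhole routes directly. The sketch below works on a dict shaped like the output of the EC2 describe_route_tables call, where each route has a State of "active" or "blackhole"; the function name and sample IDs are made up for illustration.

```python
def find_blackhole_routes(response: dict) -> list:
    """Return (route table ID, destination, target) for every blackhole route."""
    hits = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("State") != "blackhole":
                continue
            destination = (
                route.get("DestinationCidrBlock")
                or route.get("DestinationIpv6CidrBlock")
                or route.get("DestinationPrefixListId")
            )
            target = (
                route.get("NatGatewayId")
                or route.get("GatewayId")
                or route.get("NetworkInterfaceId")
            )
            hits.append((table.get("RouteTableId"), destination, target))
    return hits


# Hypothetical response: the default route still points at a deleted NAT gateway.
sample = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-0example",
            "Routes": [
                {
                    "DestinationCidrBlock": "10.0.0.0/16",
                    "GatewayId": "local",
                    "State": "active",
                },
                {
                    "DestinationCidrBlock": "0.0.0.0/0",
                    "NatGatewayId": "nat-0example",
                    "State": "blackhole",
                },
            ],
        }
    ]
}
print(find_blackhole_routes(sample))
```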

Fault 4: NACL in the public subnet denies all outbound traffic

We now restore all configurations to the initial working state and then modify the network ACL of the public subnet to block all outgoing traffic, so that no instance has an outbound connection to the internet.

nwr_routetable.png

Unfortunately, the results of the analysis were not as anticipated. The Reachability Analyzer identified an issue with the routing table in the private subnet, but there was no indication of any problems with the public subnet.

This discrepancy is not necessarily a bug, as the Reachability Analyzer evaluates connectivity by tracing the path from source to destination. If there is no configuration in the private subnet to reach destinations with internet access, the routing table is considered misconfigured.

Therefore, it appears that the Reachability Analyzer is correctly marking the private subnet as having no internet access.

However, this result should not be taken as the exact root cause, but rather as a starting point for manual troubleshooting.
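One way to start that manual troubleshooting is to evaluate the public subnet's NACL yourself. The sketch below is a simplified model, not the analyzer's algorithm: it walks the egress entries in rule-number order and returns the first match, as NACLs do. It assumes entries shaped like the output of describe_network_acls, handles IPv4 only, and ignores ICMP type and code. The function name and sample rules are hypothetical.

```python
import ipaddress


def egress_decision(entries: list, dest_ip: str, protocol: str = "6", port: int = 443):
    """Return (action, rule number) of the first egress rule that matches.

    protocol is the IP protocol number as a string ("6" is TCP, "17" is UDP);
    a rule with protocol "-1" matches all traffic.
    """
    address = ipaddress.ip_address(dest_ip)
    egress_rules = sorted(
        (entry for entry in entries if entry.get("Egress")),
        key=lambda entry: entry["RuleNumber"],
    )
    for rule in egress_rules:
        cidr = rule.get("CidrBlock")
        if cidr is None or address not in ipaddress.ip_network(cidr):
            continue
        if rule.get("Protocol") not in ("-1", protocol):
            continue
        port_range = rule.get("PortRange")
        if port_range and not port_range["From"] <= port <= port_range["To"]:
            continue
        return rule["RuleAction"], rule["RuleNumber"]
    return "deny", None  # nothing matched, traffic is dropped


# Hypothetical NACL resembling the fault: rule 100 denies all outbound traffic.
broken_acl = [
    {"RuleNumber": 100, "Egress": True, "Protocol": "-1",
     "RuleAction": "deny", "CidrBlock": "0.0.0.0/0"},
    {"RuleNumber": 100, "Egress": False, "Protocol": "-1",
     "RuleAction": "allow", "CidrBlock": "0.0.0.0/0"},
    {"RuleNumber": 32767, "Egress": True, "Protocol": "-1",
     "RuleAction": "deny", "CidrBlock": "0.0.0.0/0"},
]
print(egress_decision(broken_acl, "93.184.216.34"))
```

Running this against the NACLs of every subnet on the path is a quick way to check components the report did not highlight.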

After making necessary changes, we are ready to run the analysis again to verify the results and identify the next wave of potential issues.

Conclusions

Both tools are highly effective in troubleshooting network availability and detecting unauthorized access.

Although the report for the last experiment did not point at the actual misconfigured component, a skilled human technician starting from the same path would likely have arrived at the same first conclusion.

Another key benefit is the ability to integrate these tools into various pipelines and scans, enabling their regular use and contributing to the development of a more robust infrastructure.
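As a rough illustration of such a pipeline step, the sketch below compares expected reachability with analysis results and produces an exit code. It assumes the results have already been collected into plain dicts; the field names name, expect_reachable and path_found are invented for this example and are not part of any AWS API.

```python
def find_violations(checks: list) -> list:
    """Return a message for every check whose result differs from the expectation."""
    violations = []
    for check in checks:
        if check["path_found"] != check["expect_reachable"]:
            state = "reachable" if check["path_found"] else "unreachable"
            violations.append(
                f"{check['name']}: path is {state}, expected the opposite"
            )
    return violations


# Hypothetical results matching the two scenarios tested in this post.
checks = [
    {"name": "private EC2 -> internet gateway",
     "expect_reachable": True, "path_found": True},
    {"name": "internet gateway -> private EC2",
     "expect_reachable": False, "path_found": False},
]

violations = find_violations(checks)
for line in violations:
    print(line)

# A pipeline step would pass this value to sys.exit to fail the build.
exit_code = 1 if violations else 0
print("exit code:", exit_code)
```

Encoding both "must be reachable" and "must not be reachable" expectations in one list covers the availability and the unauthorized-access use cases with the same gate.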

Looking ahead, future improvements to the analysis algorithms may allow the tools to pinpoint cases like the public subnet NACL misconfiguration more accurately.

This post is licensed under CC BY 4.0 by the author.