At&t U-verse comes with a vendor-supplied router that is designed for home use, even on a "small business" contract.
This shows in the router's config interface, designed for non-tech-savvy people, but also in a number of
limitations. The main problem, especially in a business context, is the router's NAT table, which is
capped at 2048 sessions (with At&t support claiming flaky behaviour already above 1000). Not sure why At&t,
which defines small business as up to 50 people, is even selling this, as a lot of those customers will end up calling support
very soon, complaining about intermittent packet loss.
The box provides some passthrough modes for people who run their own routers, but although it sounds like
this would easily avoid that limitation, it doesn't. For no apparent reason, all of those passthrough modes
actually just route, still filling up the NAT table. Some older firmwares had an exploitable vulnerability that allowed
rooting some models and enabling a true bridge mode, but newer versions plugged this; and ideally we would like a solution
for a business environment that doesn't need tampering with At&t's equipment, anyway.
Looks like many others are having the same problem (across different router models) and debating workarounds, but no real
hands-off solution has been found that doesn't involve either rooting the router or having to re-rig things occasionally.
One solution that does work and is easy to set up is to tunnel all of the office's outgoing traffic, generating exactly one entry
in the At&t box' NAT table. However, although it is a valid solution, this post will focus on an internal-network-only
workaround. Also, such a tunnel wouldn't necessarily fix the problem for inbound connections.
Why can't we just use our own hardware? Well, the U-verse home-use uplinks require 802.1X
authentication, which relies on a certificate stored on the router. The router additionally sends a CWMP
periodic inform message to At&t every 24h, which we should probably keep sending, too. I think it would be possible to open the
router and dump the box' ROM contents to get hold of the cert, then reimplement the logic on a better box. Or, if the limitation
is software based, we could even attempt flashing the router with a modified firmware, as the firmwares used seem
to be all open source and available (excluding the cert, of course). However, both approaches would be tampering with their equipment.
All of this means that, since we don't want to tamper with the router, it has to stay part of the equation.
So, let's keep the At&t router connected to the uplink to do the authentication and heartbeat, but
not pass any real traffic through it that doesn't need to go there. Basically, we want the following:
         uplink
            │
            │
    ┌──── magic ────┐
    │               │
    │              At&t
    │             router
 intranet
Note that the traffic that goes from uplink to the intranet does not pass through the At&t box, at all. The latter
will simply stay attached to the uplink (whether you use fiber, ONT, etc., in my case it was a fiber cable), but that's
the only link connected to it. No other cable will be attached to it.
So, magic now has to split the traffic into:
- all 802.1X/EAP and management traffic should be allowed to and from the At&t box
- everything else should flow between uplink and the intranet
Let's hook up a box in between the At&t router and the uplink, first, so we can intercept the traffic. It doesn't
really matter what type of box it is, but it needs to be more programmable/flexible than your standard managed switch's
or router's vendor GUI/CLI. In my case, a Ubiquiti EdgeRouterPro was used, as it allows full Linux shell access.
So, with the At&t box connected to, let's say, eth6 and the uplink connected to eth7, let's just bridge those
interfaces (br0), so the At&t box can talk to the uplink. It turns out that the 802.1D
standard (which is about bridging) says that standards-compliant bridges must not forward frames addressed to the
reserved group MAC range 01:80:c2:00:00:00 to 01:80:c2:00:00:0f. 802.1X uses 01:80:c2:00:00:03, so by default its frames
don't make it through the bridge. So much for the general assumption that a bridge is just a "virtual switch".
Anyway, it looks like we can enable forwarding of these frames on our bridge br0. On a >=3.2 kernel:
echo 8 > /sys/class/net/br0/bridge/group_fwd_mask
If this is a pre-3.2 kernel, some folks maintained a patch to achieve the same.
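The value 8 is not arbitrary. As far as I understand the Linux bridge code, bit N of group_fwd_mask enables forwarding for 01:80:c2:00:00:0N, so 802.1X (ending in 03) is bit 3, i.e. 1 << 3 = 8. Below is a minimal sketch that derives the mask from the address and sets up the bridge with iproute2. The function names are made up for illustration, and on EdgeOS you may prefer to create the bridge through the vendor configuration instead.

```shell
#!/bin/sh
# Compute the group_fwd_mask value for one reserved group address
# of the form 01:80:c2:00:00:0X (only the last octet matters).
fwd_mask_for() {
    last_octet=${1##*:}               # e.g. "03"
    echo $(( 1 << 0x$last_octet ))
}

# Hypothetical helper: bridge At&t box (eth6) and uplink (eth7) as br0
# and let 802.1X frames through. Needs root; not called automatically.
setup_bridge() {
    ip link add name br0 type bridge
    ip link set dev eth6 master br0
    ip link set dev eth7 master br0
    ip link set dev br0 up
    fwd_mask_for 01:80:c2:00:00:03 > /sys/class/net/br0/bridge/group_fwd_mask
}
```

Note that the kernel refuses to set the three lowest bits (STP, pause frames, LACP), so only values without those bits are accepted.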
With this out of the way, the At&t router, with an uplink connection going through magic, should now sync with
the At&t uplink just fine (and all sync/broadband LEDs turn steady green).
On to the next step. What's missing now is directing all incoming traffic to the correct next hop, depending
on whether it's for the At&t box itself (802.1X, management traffic, etc.) or for something else. This is luckily fairly easy to
figure out, on paper. A U-verse "small business" contract that comes with 5 public IPs is set up as
follows:
- the At&t box gets provisioned with a static IP, which the support guy called "street IP", which "usually doesn't
change, except if the physical wiring changes or something like that"; note that this IP is different from our 5 public
ones
- this is the IP assigned to the box itself, so everything sent to the box itself is using that one
- the rest, with one of our 5 public IPs as destination, is also sent to the At&t box, for further routing
- in other words, everything from the outside is passed to the At&t box, as either destination or to be routed for our public
IP subnet; this means in layer 2 terms, that every incoming ethernet frame has the At&t box' MAC as destination
address
- for outgoing traffic, everything coming from the intranet needs to go directly to the uplink (which sounds obvious,
but given that the gateway address for our public IP block ARPs
to the At&t router, we also need a rule on magic to redirect those packets, so they actually go out and don't get
dropped on the WAN interface of the At&t router)
Note that the last bullet point mentioned the gateway address - as a sidenote here: we will use the same gateway address
with this setup as we would by using the At&t box the default way (running all traffic through it). This will be the
address used by the intranet as internet gateway. This address can be looked up in the At&t box' configuration, if you don't
know it - it definitely has to match the one in the configuration, though, or the below won't work. The benefit here is that
this allows for removing magic at any time from the setup, replugging everything in the old way, and it will work (except
for then having NAT table limitations again).
So, what we want to do is to filter on layer 2 by matching on destination IP address, and rewrite the MACs to either go
to the At&t box, or to the intranet. Well, for the latter, we need a destination now, so we need a box there, somewhere,
as router, with one of our public IPs set. This would be your main firewall (in my case it runs simply on the EdgeRouterPRO,
also, but virtually separated):
         uplink
            │
            │
    ┌──── magic ────┐
    │               │
router/fw          At&t
    │             router
 intranet
We can use ebtables to do the traffic-splitting, as this is layer 2 logic. Unfortunately, there is another stumbling block.
At&t seems to use (not sure if always) a VLAN here, probably one per customer.
The current version of ebtables can match either layer 3 addresses or VLAN tags, but not both at the same
time. So we need to strip the VLAN/802.1Q tag from the ethernet frames first, to be able to make use of ebtables. To do that,
we need to figure out which tag they assigned to us.
So, on magic's eth7, run a tcpdump with -e, and make sure some traffic goes through there (e.g. plug a machine into the LAN
ports of the At&t router, and visit a website, or so):
tcpdump -ei eth7
Output will be something like this, with the VLAN tag displayed:
16:36:56.631607 10:20:30:40:50:60 (oui Unknown) > 60:50:40:30:20:10 (oui Unknown), ethertype 802.1Q (0x8100), length 147: vlan 2, p 0, ethertype IPv4, 1.2.3.4.5555 > 4.3.2.1.7777: UDP, length 101
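If the capture is busy, picking the tag out by eye gets tedious. A small sketch of a filter that prints the distinct VLAN IDs found in tcpdump -e output; the function name is made up, and the pattern only assumes the "vlan N," field shown in the sample line above.

```shell
#!/bin/sh
# Read "tcpdump -e" lines on stdin and print each VLAN ID seen, once.
vlan_ids_from_tcpdump() {
    sed -n 's/.* vlan \([0-9][0-9]*\),.*/\1/p' | sort -un
}

# Example use on magic (needs root), capturing 50 frames:
#   tcpdump -c 50 -nei eth7 | vlan_ids_from_tcpdump
```

Untagged frames produce no output at all, so an empty result means no 802.1Q header was seen in the sample.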
So, in our case it's VLAN 2 that At&t assigned to us. Let's create VLAN interfaces eth6.2 and eth7.2, and also add them to br0. Now we
can run ebtables rules on eth7.2 with matching of IP address, as this interface receives the incoming eth7 traffic with the VLAN
tag removed. Before we get to the rules, we need to additionally add to our bridge the interface that links to the intranet. Let's
say this is eth5, so add eth5 to br0 for our example. Then for ebtables:
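For reference, the interface setup just described could look roughly like this with iproute2. This is a sketch under the example's assumptions (eth5/eth6/eth7, bridge br0 already existing); the function name is hypothetical. It only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Print the commands that create the VLAN sub-interfaces for VLAN ID $1,
# attach them to br0, and add the intranet-facing eth5 to the bridge.
vlan_bridge_cmds() {
    vid=$1
    for dev in eth6 eth7; do
        echo "ip link add link $dev name $dev.$vid type vlan id $vid"
        echo "ip link set dev $dev.$vid master br0"
        echo "ip link set dev $dev.$vid up"
    done
    echo "ip link set dev eth5 master br0"
}

# Review the output first, then apply it as root with:
#   vlan_bridge_cmds 2 | sh -x
```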
ebtables -t nat -A PREROUTING -p IPv4 -i eth5 --ip-src $OUR_PUB_IP_RANGE -j dnat --to-dst $MAC_ATT_UPLINK --dnat-target ACCEPT
ebtables -t nat -A PREROUTING -p IPv4 -i eth7.2 --ip-dst $OUR_PUB_IP_RANGE -j dnat --to-dst $MAC_INTERNALFW --dnat-target ACCEPT
ebtables -t nat -A POSTROUTING -d $MAC_ATT_UPLINK -j snat --to-src $MAC_ATT_RG_WAN --snat-target ACCEPT
As you can see, there are some blanks to fill in, namely the following variables:
| OUR_PUB_IP_RANGE | the public IP block given to you by At&t, e.g. 1.2.3.4/29 |
| MAC_INTERNALFW   | MAC address of interface connected to eth5, the interface of the internal firewall box |
| MAC_ATT_RG_WAN   | MAC address of the At&t box' WAN interface; usually printed on At&t box' back |
| MAC_ATT_UPLINK   | MAC address of uplink hop's equipment, see below |
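To keep the values in one place and make the rules safe to re-apply, they can be wrapped in a small script. This is only a sketch: the values are the placeholders from this post's examples, the function names are made up, and the flush assumes that nothing else on the box uses the ebtables nat table.

```shell
#!/bin/sh
# Placeholder values taken from the examples in this post; replace them.
OUR_PUB_IP_RANGE=1.2.3.4/29
MAC_INTERNALFW=10:20:30:40:50:60
MAC_ATT_RG_WAN=a4:7a:77:a7:7a:77
MAC_ATT_UPLINK=44:77:44:77:44:77

# Succeeds only for six colon-separated hex octets.
is_mac() {
    echo "$1" | grep -Eqi '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'
}

# Needs root; not called automatically.
apply_rules() {
    for mac in "$MAC_INTERNALFW" "$MAC_ATT_RG_WAN" "$MAC_ATT_UPLINK"; do
        is_mac "$mac" || { echo "bad MAC: $mac" >&2; return 1; }
    done
    # Assumption: the nat table holds only these rules, so flushing is safe.
    ebtables -t nat -F
    # intranet -> uplink: bypass the At&t box by addressing the uplink directly
    ebtables -t nat -A PREROUTING -p IPv4 -i eth5 --ip-src "$OUR_PUB_IP_RANGE" \
        -j dnat --to-dst "$MAC_ATT_UPLINK" --dnat-target ACCEPT
    # uplink -> intranet: frames for our public block go to the firewall
    ebtables -t nat -A PREROUTING -p IPv4 -i eth7.2 --ip-dst "$OUR_PUB_IP_RANGE" \
        -j dnat --to-dst "$MAC_INTERNALFW" --dnat-target ACCEPT
    # everything leaving towards the uplink carries the At&t box' WAN MAC as source
    ebtables -t nat -A POSTROUTING -d "$MAC_ATT_UPLINK" \
        -j snat --to-src "$MAC_ATT_RG_WAN" --snat-target ACCEPT
}
```

Afterwards, `ebtables -t nat -L --Lc` lists the rules together with their packet counters, which is a quick way to see whether each rule is being hit.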
In order to figure out MAC_ATT_UPLINK, which is the MAC address of the device at the other end of the cable coming out of your wall,
you can use the following on magic:
brctl showmacs br0
This will list all MAC addresses the bridge knows about, both those of its own ports and those learned from attached devices:
port no  mac addr           is local?  ageing timer
  5      44:77:44:77:44:77  no            0.00
  1      10:20:30:40:50:60  no            0.00
  1      00:11:77:44:22:05  yes           0.00
  2      00:11:77:44:22:06  yes           0.00
  4      00:11:77:44:22:07  yes           0.00
  3      a4:7a:77:a7:7a:77  no           44.82
Filtering out the three local addresses, which are the bridge interfaces eth5, eth6 and eth7 (note that eth6.2 and eth7.2 use the
same MACs as eth6 and eth7), three others are left. Two of those are addresses we know, namely the firewall's (here: 10:20:30:40:50:60)
and the At&t box' MAC (let's say this is a4:7a:77:a7:7a:77). The only one left is 44:77:44:77:44:77 in our example, which is the one
we are looking for. (If you have more than one remaining line, this might come from having had something else plugged in, which the bridge
learned. In that case, just check the ageing timer column for an ever increasing value, and let it time out. Eventually there
should be only one address left.)
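The elimination described above is easy to script. A minimal sketch, assuming the column layout of brctl showmacs shown in this post; the function name is made up. It drops local entries and the MACs you already know, and prints whatever remains.

```shell
#!/bin/sh
# Read "brctl showmacs" output on stdin; print every non-local MAC that
# is not among the known MACs passed as arguments (case-insensitive).
unknown_macs() {
    known=$(echo "$*" | tr 'A-F' 'a-f')
    awk -v known="$known" '
        BEGIN { n = split(known, k, " "); for (i = 1; i <= n; i++) seen[k[i]] = 1 }
        NR > 1 && $3 == "no" && !(tolower($2) in seen) { print tolower($2) }
    '
}

# Example use on magic, passing the firewall's and the At&t box' MACs:
#   brctl showmacs br0 | unknown_macs 10:20:30:40:50:60 a4:7a:77:a7:7a:77
```

If this prints more than one line, a stale learned entry is probably still around; wait for it to age out and run it again.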
Now, put together your ebtables rules and give it a test.
This is basically it. You might want to think of the following though:
- At&t might change the uplink hardware, so MAC_ATT_UPLINK might change; it's a good idea to run brctl showmacs br0 from a cronjob and update the uplink MAC in the rules if it changes
- theoretically, At&t might also change the VLAN ID; however, I don't think this is likely, as the VLAN ID is already an abstraction and, unlike a MAC address, easy for them to keep when replacing hardware
- be aware that every time you destroy and recreate br0, the group_fwd_mask needs to be reset to let 802.1X traffic pass
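For the cronjob idea, the piece that needs a little care is only reacting when the MAC really changed. A sketch, assuming you already extract the current uplink MAC from brctl showmacs br0 in some way; the state file location and the function name are made up.

```shell
#!/bin/sh
# Hypothetical location for remembering the last seen uplink MAC.
STATE_FILE=${STATE_FILE:-/var/run/uplink_mac}

# Returns 0 and records the new value if $1 is non-empty and differs
# from the stored MAC; returns 1 otherwise.
uplink_mac_changed() {
    new=$1
    old=$(cat "$STATE_FILE" 2>/dev/null)
    [ -n "$new" ] && [ "$new" != "$old" ] || return 1
    echo "$new" > "$STATE_FILE"
}

# Example cron body (needs root):
#   new=...   # current uplink MAC, derived from "brctl showmacs br0"
#   if uplink_mac_changed "$new"; then
#       : # re-create the ebtables rules with MAC_ATT_UPLINK=$new
#   fi
#   # cheap to re-assert on every run, in case br0 was recreated:
#   echo 8 > /sys/class/net/br0/bridge/group_fwd_mask
```

An empty detection result is deliberately treated as "no change", so a momentarily empty bridge table does not wipe working rules.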
UPDATE (2016-07-22): new methods to root the At&t box have been discovered in the meantime (see comments below), so the statement in the introduction doesn't fully hold anymore.
UPDATE (2016-10-02): as reported by others in the comments below, there are uplink setups that don't use VLAN tagging as such, but still send 802.1Q frames with the special VLAN ID 0, which acts as a priority tag only. See the comments below for more info.