Freifunk Frankfurt:Arp-stability



user perspective

  • The connection between router and servers seems to break down.
  • IP addresses are not assigned to client computers.
  • The effect lasts for a couple of minutes and then resolves itself.

server perspective


  • One graph shows the total number of clients in the network, based on the alfred data. The other graphs show the number of ARP entries on the batbridge interface that fall within the IP range of each fastd server.
  • For fastd1 and fastd4 there are visible cutoffs at around 80 clients. The values for fastd2 and fastd3 (kernel 3.2) seem stable.
  • fastd5 does not hand out IP addresses at the moment, so its value is bogus.
  • The drop-offs do not happen at fixed intervals but seem to correlate with the client count.
  • Affected Debian kernels: 3.16 and 4.1; confirmed in vanilla kernels as well (see below).


  • The issue seems to be load-induced, as shown by the much more frequent drop-offs on fastd5 when more clients are connected.
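
The per-server ARP counts described above could be collected along the following lines. This is a minimal sketch, not the script actually used for the graphs: it assumes the table is read in the format of /proc/net/arp, and the function name and the IP ranges passed to it are hypothetical.

```python
import ipaddress


def count_arp_entries(arp_table, device, ranges):
    """Count ARP entries on `device`, grouped by named IP range.

    arp_table: text in the format of /proc/net/arp
    device:    interface name, e.g. "batbridge"
    ranges:    mapping of server name -> CIDR string (hypothetical values)
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    counts = {name: 0 for name in networks}
    for line in arp_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Data rows have six columns; the last one is the device name.
        if len(fields) != 6 or fields[5] != device:
            continue
        address = ipaddress.ip_address(fields[0])
        for name, network in networks.items():
            if address in network:
                counts[name] += 1
    return counts
```

Calling this once a minute with the contents of /proc/net/arp and one range per fastd server would yield one data point per server for each snapshot.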


server logs

No log entries are written around the time of the drop-offs. We do, however, see the following message on fastd1 but not on fastd2:

[422870.439430] batbridge: Multicast hash table chain limit reached: bat0
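
To compare servers, the kernel log can be scanned for this message. The following is a small sketch that assumes the log text has the same layout as the line quoted above (uptime in brackets, bridge name, message, port name); the function name is hypothetical.

```python
import re

# Matches lines such as:
# [422870.439430] batbridge: Multicast hash table chain limit reached: bat0
PATTERN = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<bridge>\S+): "
    r"Multicast hash table chain limit reached: (?P<port>\S+)$"
)


def find_chain_limit_events(log_text):
    """Return (uptime_seconds, bridge, port) for every matching log line."""
    events = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            events.append(
                (float(match["uptime"]), match["bridge"], match["port"])
            )
    return events
```

Comparing the uptime values against the timestamps of the drop-offs in the graphs would show whether the message coincides with the symptom.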


things we looked at

  • openvpn: the openvpn-tunnels are still active when this happens
  • Systems running kernel 3.2 seem fine, both x64 and i686
  • kernel 4.1 is affected as well

data on the servers


  • Linux fastd1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64 GNU/Linux
  • hosted by Hetzner


  • Linux 3.2.0-4-686-pae #1 SMP Debian 3.2.65-1+deb7u2 i686 GNU/Linux
  • hosted by Datafabrik


  • Linux fastd3 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux
  • hosted by Hetzner


  • Linux 4.1.0-0.bpo.2-amd64 #1 SMP Debian 4.1.6-1~bpo8+1 (2015-09-09) x86_64 GNU/Linux
  • hosted by Contabo

things we are currently looking at

  • Bisecting: 89441 revisions left to test after this (roughly 17 steps) 240c3c3424366c8109babd2a0fe80855de511b35 - this clearly shows the symptom
  • Bisecting: 45036 revisions left to test after this (roughly 16 steps) 4ff63e47f7b9dbd72031c364db44526b3c295591 - this shows an entirely different symptom. There are no drop-offs visible like on 3.16, 3.9 and 4.1, but the ARP table is much smaller than expected. Presumably no cutoffs are visible because of the temporal resolution of the data (one snapshot per minute). (Graph: Arp-brokenness-4ff63e47f7b9dbd72031c364db44526b3c295591.png) As can be seen in the graph, I manually shifted load away from fastd2 to increase the load on fastd6, the test machine. However, the load did not increase as expected, so this is a different symptom from the one investigated on this page. The ARP table entries rotate very quickly, meaning that ARP caching is largely ineffective at this commit.
  • Bisecting: 22499 revisions left to test after this (roughly 15 steps) [2b8318881ddbcb67c5e8d2178b42284749442222] Merge tag 'fbdev-for-3.8' of git://
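
The step counts printed by git bisect follow from halving the remaining range at each step. As a rough check, the sketch below approximates the count as the base-2 logarithm of the remaining revisions, rounded up. This is an assumption for illustration, not git's own formula, although it reproduces the three figures quoted above.

```python
import math


def approx_bisect_steps(revisions_left):
    """Approximate the number of bisect steps as ceil(log2(n)).

    This is a simplification; git computes its estimate internally
    and may differ for other inputs.
    """
    return math.ceil(math.log2(revisions_left))


for revisions in (89441, 45036, 22499):
    print(revisions, approx_bisect_steps(revisions))
```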

git bisect of vanilla kernel

  • affected:
    • 19583ca584d6f574384e17fe7613dfaeadcdc4a6 (3.16)
    • 240c3c3424366c8109babd2a0fe80855de511b35 (3.9)

  • unaffected:
    • 805a6af8dba5dfdd35ec35dc52ec0122400b2610 (3.2)
    • 4ff63e47f7b9dbd72031c364db44526b3c295591 (3.6) - apparently broken but in other ways...


  • commit 54951194656e4853e441266fd095f880bc0398f3 changes the arp-behavior.


similar symptom: [1]