Domain names were originally limited to ASCII (American Standard Code for Information Interchange), a character encoding standard for electronic communication. ASCII defines 128 characters, of which 33 are non-printable control codes (the first 32 plus the final one, delete), and capital and lower-case letters each count as separate characters. Unsurprisingly, this left no room for accented characters like é, or for Cyrillic (“Russian”) characters.
We now have Unicode – a unifying character set that offers more than 137,000 characters. Its most common encoding, UTF-8, is used by over 90% of websites, but it is still very new to DNS.
What does this mean for web filtering?
Even though DNS queries still do not support UTF-8 or Unicode, browsers – which we update much more often – have taken on the role. Internationalized domains are now translated by the browser. Bücher.de is an example of an Internationalized Domain Name (IDN). It’s a German bookstore; its top-level domain, .de, is the country code for Germany. The Unicode form of the name is not visible in a DNS lookup – and some web filters fail here too. Instead, the browser translates it to the ASCII representation xn--bcher-kva.de – which redirects to www.buecher.de.
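The translation the browser performs can be sketched in a few lines of Python. This is only an illustration, not how any particular browser or filter does it: the helper names `to_ascii_hostname` and `to_unicode_hostname` are made up for this example, and Python's built-in `idna` codec follows the older IDNA 2003 rules, which can differ from current browser behavior for a small number of characters.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII ("xn--") form of a hostname, label by label."""
    # The built-in "idna" codec lower-cases and Punycode-encodes
    # every label that contains non-ASCII characters.
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Return the human-readable Unicode form of an ASCII hostname."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii_hostname("Bücher.de"))           # xn--bcher-kva.de
    print(to_unicode_hostname("xn--bcher-kva.de"))  # bücher.de
```

The ASCII form is what appears in the DNS lookup, so that is the form a filter sitting at the DNS level gets to see.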
It’s important to check two things when considering a quality web filter.
- How well does your web filter categorize search terms in the languages used by your students?
- Can it filter based on internationalized domain names?
It is common for searches for illicit material to start in a student’s native language, often because filters pay less attention to those languages.
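To make the second check concrete, here is a minimal sketch of a domain blocklist that treats the Unicode and ASCII spellings of a name as the same host. Everything in it is an assumption for illustration: the `BLOCKED_HOSTS` entries, the helper names, and the choice to fall back to the lower-cased input when a name cannot be encoded. A real filter would need its own policy for malformed names.

```python
# Hypothetical blocklist; entries may be written in either form.
BLOCKED_HOSTS = {"bücher.de", "blocked.example"}


def normalize_host(host: str) -> str:
    """Reduce a hostname to a lower-case ASCII form for comparison."""
    host = host.strip().rstrip(".")
    try:
        # Python's built-in codec implements IDNA 2003.
        return host.encode("idna").decode("ascii").lower()
    except UnicodeError:
        # Assumption: keep malformed names as-is rather than failing.
        return host.lower()


def is_blocked(host: str, blocked=BLOCKED_HOSTS) -> bool:
    """True if the host matches a blocklist entry in either spelling."""
    normalized_blocklist = {normalize_host(entry) for entry in blocked}
    return normalize_host(host) in normalized_blocklist


if __name__ == "__main__":
    for name in ("Bücher.de", "xn--bcher-kva.de", "example.com"):
        print(name, is_blocked(name))
```

Normalizing both the blocklist and the incoming name means a rule entered as bücher.de still matches when the browser asks DNS for the xn-- form.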