Further, the vast majority of online content isn’t available in a form that’s easily indexed by electronic archiving systems like Google’s. Rather, it requires a user to log in, or it is provided dynamically by a program running when a user visits the page. If we’re going to catalog online human knowledge, we need to be sure we can get to and recognize all of it, and that we can do so automatically.
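To see why dynamically generated pages defeat ordinary indexing, consider a minimal sketch. The page below is hypothetical: its real content is fetched by JavaScript after the page loads, so a simple crawler that just downloads and parses the raw HTML sees only a placeholder.

```python
from html.parser import HTMLParser

# Hypothetical page: the actual content is loaded by JavaScript after the
# page opens, so the raw HTML a basic crawler downloads holds only a stub.
RAW_HTML = """
<html><body>
  <div id="app">Loading...</div>
  <script>fetch('/api/articles').then(r => r.json()) /* renders content */</script>
</body></html>
"""

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style bodies."""
    def __init__(self):
        super().__init__()
        self.in_skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_skip = False

    def handle_data(self, data):
        if not self.in_skip and data.strip():
            self.chunks.append(data.strip())

parser = TextExtractor()
parser.feed(RAW_HTML)
print(parser.chunks)  # → ['Loading...'] — the crawler never sees the articles
```

The extractor recovers only the "Loading..." placeholder; everything a human visitor would read arrives later, from a program the crawler never runs. Indexing such pages automatically requires tools that can execute that program, not just download the HTML.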
How can we teach computers to recognize, index and search all the different types of material available online? Thanks to federal efforts in the global fight against human trafficking and weapons dealing, my research forms the basis for a new tool that can help with this effort.
Understanding what’s deep
The “deep web” and the “dark web” are often discussed in the context of scary news or films like “Deep Web,” in which young, intelligent criminals get away with illicit activities such as drug dealing and human trafficking — or even worse. But what do these terms mean?
The “deep web” has existed ever since businesses and organizations, including universities, put large databases online in ways people could not directly view. Rather than allowing anyone to