America is at war. But this is not a conventional war waged with tanks, battleships and planes in conventional battlefields – at least, not yet. It is a secret, insidious type of war whose battleground is its people, democracy and truth.
'Remember, any state, any state, has a primary enemy: its own population.' ~ Noam Chomsky
FAIR USE NOTICE
A BEAR MARKET ECONOMICS BLOG
This site may contain copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in an effort to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a ‘fair use’ of any such copyrighted material as provided for in section 107 of the US Copyright Law.
In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes. For more information go to: http://www.law.cornell.edu/uscode/17/107.shtml
If you wish to use copyrighted material from this site for purposes of your own that go beyond ‘fair use’, you must obtain permission from the copyright owner.
This page may contain copyrighted material the use of which has not been specifically authorized by the copyright owner. This website distributes this material without profit to those who have expressed a prior interest in receiving the included information for scientific, research and educational purposes. We believe this constitutes a fair use of any such copyrighted material as provided for in 17 U.S.C. § 107.
For example, a digital image
may include metadata that describe how large the picture is, the color
depth, the image resolution, when the image was created, and other data.[1]
A text document's metadata may contain information about how long the
document is, who the author is, when the document was written, and a
short summary of the document.
Metadata are data. As such, metadata can be stored and managed in a database, often called a Metadata registry or Metadata repository.[2] However, without context and a point of reference, it might be impossible to identify metadata just by looking at them.[3]
For example: by itself, a database containing several numbers, all 13
digits long could be the results of calculations or a list of numbers to
plug into an equation - without any other context, the numbers
themselves can be perceived as the data. But if given the context that
this database is a log of a book collection, those 13-digit numbers may
now be identified as ISBNs - information that refers to the book, but is not itself the information within the book.
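As a rough illustration of how context turns bare numbers into recognisable identifiers, the Python sketch below tests whether a 13-digit string satisfies the ISBN-13 check-digit rule (digits weighted alternately by 1 and 3 must sum to a multiple of 10). The function name and the two sample values are assumptions made for this example; passing the check only suggests that a number could be an ISBN, it does not prove that it is one.

```python
def looks_like_isbn13(value: str) -> bool:
    """Return True if a 13-digit string passes the ISBN-13 check-digit rule."""
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(value))
    return total % 10 == 0


# Two illustrative 13-digit values: the first satisfies the rule, the second does not.
numbers = ["9780306406157", "1234567890123"]
flags = [looks_like_isbn13(n) for n in numbers]
```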
The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of programming language concepts" [4]
where it is clear that he uses the term in the ISO 11179 "traditional"
sense, which is "structural metadata" i.e. "data about the containers of
data"; rather than the alternate sense "content about individual
instances of data content" or metacontent, the type of data usually
found in library catalogues.[5][6]
Since then the fields of information management, information science,
information technology, librarianship and GIS have widely adopted the
term. In these fields the word metadata is defined as "data about data".[7]
While this is the generally accepted definition, various disciplines
have adopted their own more specific explanation and uses of the term.
Durham, NC - Duke Sociologist Kieran Healy of the Kenan Institute for Ethics has written a brief primer
on how one might use the sort of metadata available to National
Security Agency snoops to figure out a burgeoning "terrorist" cell -- in
1772 Boston.
Written in a sort of pseudo-18th century
grammar (with mostly modern spellings, thankfully), Healy walks through
the steps one would take to turn the membership lists of various
organizations into a matrix of connections showing who the key agitators
might be.
Using this "Social Networke Analysis," but knowing
nothing else about these names, he quickly fingers Sam Adams and Paul
Revere as "persons of interest."
The clever and instructive post
has taken off on social media in the last day -- reaching 100,000 views
on Tuesday, and spreading even more strongly on Wednesday.
"I
wanted to give non-specialists a sense of how the structural analysis of
what's being called 'metadata' works, and to show in a fun but
hopefully telling way how much you can get out of that approach," Healy wrote in his blog the next day.
"So I tried to emphasize that I was using one of the earliest, and (in
retrospect) most basic methods we have, but one that still has the
capacity to surprise people unfamiliar with (social network analysis)."
I have been asked by my superiors to give a brief demonstration of
the surprising effectiveness of even the simplest techniques of the
new-fangled Social Networke Analysis in the pursuit of those
who would seek to undermine the liberty enjoyed by His Majesty’s
subjects. This is in connection with the discussion of the role of
“metadata” in certain recent events and the assurances of various respectable parties
that the government was merely “sifting through this so-called
metadata” and that the “information acquired does not include the
content of any communications”. I will show how we can use this
“metadata” to find key persons involved in terrorist groups operating
within the Colonies at the present time. I shall also endeavour to show
how these methods work in what might be called a relational manner.
The analysis in this report is based on information gathered by our field agent Mr David Hackett Fischer and published in an Appendix to his lengthy report to the government.
As you may be aware, Mr Fischer is an expert and respected field Agent
with a broad and deep knowledge of the colonies. I, on the other hand,
have made my way from Ireland with just a little quantitative training—I
placed several hundred rungs below the Senior Wrangler during my time
at Cambridge—and I am presently employed as a junior analytical scribe
at ye olde National Security Administration. Sorry, I mean the Royal
Security Administration. And I should emphasize again that I know
nothing of current affairs in the colonies. However, our current
Eighteenth Century beta of PRISM has been used to collect and analyze
information on more than two hundred and sixty persons (of varying
degrees of suspicion) belonging variously to seven different
organizations in the Boston area.
Rest assured that we only collected metadata on these
people, and no actual conversations were recorded or meetings
transcribed. All I know is whether someone was a member of an
organization or not. Surely this is but a small encroachment on the
freedom of the Crown’s subjects. I have been asked, on the basis of this
poor information, to present some names for our field agents in the
Colonies to work with. It seems an unlikely task.
The organizations are listed in the columns, and the names in the
rows. As you can see, membership is represented by a “1”. So this Samuel
Adams person (whoever he is), belongs to the North Caucus, the Long
Room Club, the Boston Committee, and the London Enemies List. I must
say, these organizational names sound rather belligerent.
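A minimal sketch, in Python, of how such a membership table might be built (this is not the author's own code). Samuel Adams's four memberships come from the text above; the other two people and their memberships are hypothetical placeholders, and only the four groups named in the text are included rather than all seven.

```python
# Only the four groups named in the text; the full table has seven columns.
groups = ["North Caucus", "Long Room Club", "Boston Committee", "London Enemies List"]

# Samuel Adams's memberships are from the text; the other rows are hypothetical.
memberships = {
    "Samuel Adams": {"North Caucus", "Long Room Club",
                     "Boston Committee", "London Enemies List"},
    "Person B (hypothetical)": {"North Caucus"},
    "Person C (hypothetical)": {"Long Room Club", "Boston Committee"},
}

people = list(memberships)  # row order follows insertion order

# Rows are people, columns are groups, and a 1 marks membership.
A = [[1 if g in memberships[p] else 0 for g in groups] for p in people]

for person, row in zip(people, A):
    print(f"{person:26s} {row}")
```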
Anyway, what can we get from these meagre metadata? This table is large
and cumbersome. I am a pretty low-level operative at ye olde RSA, so I
have to keep it simple. My superiors, I am quite sure, have far more
sophisticated analytical techniques at their disposal. I will simply
start at the very beginning and follow a technique laid out in a
beautiful paper by my brilliant former colleague, Mr Ron Breiger, called “The Duality of Persons and Groups.”
He wrote it as a graduate student at Harvard, some thirty five years
ago. (Harvard, you may recall, is what passes for a university in the
Colonies. No matter.) The paper describes what we now think of as a
basic way to represent information about links between people and some
other kind of thing, like attendance at various events, or membership
in various groups. The foundational papers in this new science of social
networke analysis, in fact, are almost all about what you can tell
about people and their social lives based on metadata only, without much
reference to the actual content of what they say.
Mr Breiger’s insight was that our table of 254 rows and seven columns is an adjacency matrix,
and that a bit of matrix multiplication can bring out information that
is in the table but perhaps hard to see. Take this adjacency matrix of
people and groups and transpose it—that is, flip it over on its side,
so that the rows are now the columns and vice versa. Now we
have two tables, or matrices, a 254x7 one showing “People by Groups” and
the other a 7x254 one showing “Groups by People”. Call the first one
the adjacency matrix A and the second one its transpose, Aᵀ. Now, as you will recall, there are rules for multiplying matrices together. If you multiply out A(Aᵀ),
you will get a big matrix with 254 rows and 254 columns. That is, it
will be a 254x254 “Person by Person” matrix, where both the rows and
columns are people (in the same order) and the cells show the number of
organizations any particular pair of people both belonged to. Is that
not marvelous? I have always thought this operation is somewhat akin to
magick, especially as it involves moving one hand down and the other one
across in a manner not wholly removed from an incantation.
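The multiplication of the adjacency matrix by its transpose can be sketched in a few lines of plain Python. The 4x3 matrix below is a made-up toy (four hypothetical people, three hypothetical groups), not Mr Fischer's data, and the helper names `transpose` and `matmul` are assumptions of this example.

```python
def transpose(m):
    """Flip a matrix so that rows become columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Toy "People by Groups" matrix: 4 hypothetical people, 3 hypothetical groups.
A = [
    [1, 1, 0],  # person 1
    [1, 1, 1],  # person 2
    [0, 0, 1],  # person 3
    [0, 0, 0],  # person 4 belongs to nothing
]

# "Person by Person": each cell counts the groups a pair of people share.
person_by_person = matmul(A, transpose(A))
for row in person_by_person:
    print(row)
```

In this toy case persons 1 and 2 share two groups, persons 2 and 3 share one, and person 4 is connected to nobody. The diagonal simply counts how many groups each person belongs to.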
I cannot show you the whole Person by Person matrix, because I would
have to kill you. I jest, I jest! It is just because it is rather large.
But here is a little snippet of it. At this point in the eighteenth
century, a 254x254 matrix is what we call “Bigge Data”. I have an upcoming EDWARDx talk about it. You should come. Anyway:
You can see here that Mr Appleton and Mr John Adams were connected
through both being a member of one group, while Mr John Adams and Mr
Samuel Adams shared memberships in two of our seven groups. Mr Ash,
meanwhile, was not connected through organization membership to any of
the first four men on our list. The rest of the table stretches out in
both directions.
Notice again, I beg you, what we did there. We did not start with a
“social networke” as you might ordinarily think of it, where individuals
are connected to other individuals. We started with a list of
memberships in various organizations. But now suddenly we do have a social networke of individuals, where a tie is defined by co-membership in an organization. This is a powerful trick.
We are just getting started, however. A thing about multiplying
matrices is that the order matters. It is not like multiplying two
numbers. If instead of multiplying A(Aᵀ) we put the transposed matrix first, and do Aᵀ(A),
then we get a different result. This time, the result is a 7x7
“Organization by Organization” matrix, where the numbers in the cells
represent how many people each organization has in common. Here’s what
that looks like. Because it is small we can see the whole table.
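Reversing the order of the product can be sketched the same way. The matrix is again a made-up toy of four hypothetical people and three hypothetical groups, and the helper names are assumptions of this example rather than anything from the original analysis.

```python
def transpose(m):
    """Flip a matrix so that rows become columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Toy "People by Groups" matrix: 4 hypothetical people, 3 hypothetical groups.
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# Transpose first: the result is "Group by Group", and each cell counts
# the people that a pair of groups have in common.
group_by_group = matmul(transpose(A), A)
for row in group_by_group:
    print(row)
```

Here the diagonal gives each group's size (two members apiece), while the off-diagonal cells give the overlaps.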
Again, interesting! (I beg to venture.) Instead of seeing how (and
which) people are linked by their shared membership in organizations, we
see which organizations are linked through the people that belong to
them both. People are linked through the groups they belong to. Groups
are linked through the people they share. This is the “duality of
persons and groups” in the title of Mr Breiger’s article.
Rather than relying on tables, we can make a picture of the
relationship between the groups, using the number of shared members as
an index of the strength of the link between the seditious groups.
Here’s what that looks like.
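A drawing of this kind is usually produced from a weighted edge list. As a hedged sketch, the Python below converts a small group-by-group matrix into such a list, using the number of shared members as the weight of each link. The three group names and all the counts are invented for illustration, and `weighted_edges` is a hypothetical helper.

```python
def weighted_edges(names, matrix):
    """List each pair once, with the shared-member count as the link weight."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only, skip diagonal
            if matrix[i][j] > 0:
                edges.append((names[i], names[j], matrix[i][j]))
    # Strongest links first.
    return sorted(edges, key=lambda e: -e[2])


# Made-up group-by-group counts (the diagonal holds group sizes and is ignored).
group_names = ["Group 1", "Group 2", "Group 3"]
shared = [
    [2, 2, 1],
    [2, 2, 1],
    [1, 1, 2],
]

edges = weighted_edges(group_names, shared)
for a, b, w in edges:
    print(f"{a} -- {b}: {w} shared member(s)")
```

A plotting tool could then draw thicker lines for heavier links.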
And, of course, we can also do that for the links between the people,
using our 254x254 “Person by Person” table. Here is what that looks
like.
What a nice picture! The analytical engine has arranged everyone
neatly, picking out clusters of individuals and also showing both
peripheral individuals and—more intriguingly—people who seem to bridge
various groups in ways that might perhaps be relevant to national
security. Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.
Once again, I remind you that I know nothing of Mr Revere, or his
conversations, or his habits or beliefs, his writings (if he has any) or
his personal life. All I know is this bit of metadata, based on
membership in some organizations. And yet my analytical engine, on the
basis of absolutely the most elementary of operations in Social Networke
Analysis, seems to have picked him out of our 254 names as being of
unusual interest. We do not have to stop here, with just a picture. Now
that we have used our simple “Person by Event” table to generate a
“Person by Person” matrix, we can do things like calculate centrality
scores, or figure out whether there are cliques, or investigate other
patterns. For example, we could calculate a betweenness centrality
measure for everyone in our matrix, which is roughly the number of
“shortest paths” between any two people in our network that pass through
the person of interest. It is a way of asking “If I have to get from
person a to person z, how likely is it that the quickest way is through
person x?” Here are the top betweenness scores for our list of suspected
terrorists:
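For readers curious how such scores might be computed, here is a minimal sketch of betweenness centrality for an unweighted, undirected network, following Brandes' well-known algorithm. It is not the code behind the scores in this report. The four-person network (a, b, x, z) is a toy assumption in which every shortest path between the other three runs through x.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality (Brandes' algorithm), unweighted and undirected."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share
        # of the shortest paths that pass through it.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Toy network: a, b and z are each tied only to x.
network = {"a": ["x"], "b": ["x"], "x": ["a", "b", "z"], "z": ["x"]}
scores = betweenness(network)
print(scores)
```

Person x lies on all three shortest paths between the others and scores 3.0, while everyone else scores 0.0.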
Perhaps I should not say “terrorists” so rashly. But you can see how
tempting it is. Anyway, look—there he is again, this Mr Revere! Very
interesting. There are fancier ways to measure importance in a network
besides this one. There is something called eigenvector centrality,
which my friends in Natural Philosophy tell me is a bit of mathematics
unlikely ever to have any practical application in the wider world. You
can think of it as a measure of centrality weighted by one’s connection
to other central people. Here are our top scorers on that measure:
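One common way to approximate eigenvector centrality is power iteration: repeatedly replace each person's score with the sum of their neighbours' scores, then rescale. The sketch below is an assumption-laden illustration and not the method used for the scores in this report. It adds each node's own score at every step (equivalent to iterating on the matrix plus the identity), a standard trick that leaves the ranking unchanged but avoids oscillation on some networks. The four-node network is a toy.

```python
def eigenvector_centrality(matrix, iterations=100):
    """Approximate eigenvector centrality by power iteration on (matrix + I)."""
    n = len(matrix)
    scores = [1.0] * n
    for _ in range(iterations):
        # Each score becomes itself plus the weighted sum of its neighbours.
        new = [scores[i] + sum(matrix[i][j] * scores[j] for j in range(n))
               for i in range(n)]
        top = max(new)
        scores = [x / top for x in new]  # rescale so the largest score is 1
    return scores


# Toy symmetric network: nodes 0, 1, 2 form a triangle; node 3 hangs off node 0.
# The diagonal is zero, so nobody counts as their own neighbour.
ties = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

centrality = eigenvector_centrality(ties)
print([round(c, 3) for c in centrality])
```

Node 0 comes out on top, and node 3 scores lowest even though it is tied to the most central node, because it has no other connections.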
Here our Mr Revere appears to score highly alongside a few other
persons of interest. And for one last demonstration, a calculation of Bonacich Power Centrality, another more sophisticated measure. Here the lower score indicates a more central location.
And here again, Mr Revere, along with Messrs Urann, Proctor, and Barber, appears towards the top of our list.
So, there you have it. From a table of membership in different groups
we have gotten a picture of a kind of social network between
individuals, a sense of the degree of connection between organizations,
and some strong hints of who the key players are in this world. And all
this—all of it!—from the merest sliver of metadata about a single
modality of relationship between people. I do not wish to overstep the
remit of my memorandum but I must ask you to imagine what might be
possible if we were but able to collect information on very many more
people, and also synthesize information from different kinds
of ties between people! For the simple methods I have described are
quite generalizable in these ways, and their capability only becomes
more apparent as the size and scope of the information they are given
increases. We would not need to know what was being whispered between
individuals, only that they were connected in various ways. The
analytical engine would do the rest! I daresay the shape of the real
structure of social relations would emerge from our calculations
gradually, first in outline only, but eventually with ever-increasing
clarity and, at last, in beautiful detail—like a great, silent ship
coming out of the gray New England fog.
I admit that, in addition to the possibilities for finding something
interesting, there may also be the prospect of discovering suggestive
but ultimately incorrect or misleading patterns. But I feel this problem
would surely be greatly ameliorated by more and better metadata. At the
present time, alas, the technology required to automatically collect
the required information is beyond our capacity. But I say again, if a
mere scribe such as I—one who knows nearly nothing—can use the very
simplest of these methods to pick the name of a traitor like Paul Revere
from those of two hundred and fifty four other men, using nothing but a
list of memberships and a portable calculating engine, then just think
what weapons we might wield in the defense of liberty one or two
centuries from now.
Note: After I posted this, Michael Chwe emailed to tell me that Shin-Kap Han has published an article analyzing Fischer’s Revere data in rather more detail. I first came across Fischer’s data when I read Paul Revere’s Ride
some years ago. I transcribed it and worked on it a little (making the
graphs shown here) when I was asked to give a presentation on the
usefulness of Sociological methods to graduate students in Duke’s
History department. It’s very nice to see Han’s much fuller published
analysis, as he’s an SNA specialist, unlike me.