Political trends visualization tools with Independent Component Analysis

This was originally posted at my personal blog on 2017 June 7.

This is a quick dump to get this out there; further discussion and documentation will follow over time.

Google Trends doesn’t have a real public API. It’s heavily rate-limited, so it’s hard to get even moderate amounts of data, and large amounts are out of the question. I used pytrends and scripted delays and automation to download larger amounts of trend data for lists of search terms.
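
A minimal sketch of that kind of throttled download loop follows. It is not the code from my repository. `download_all`, `chunked`, and the caller-supplied `fetch` function are hypothetical names, and the delay and retry values are guesses rather than documented Google Trends limits. The one hard constraint it encodes is that a single Google Trends comparison accepts at most five terms.

```python
import time
from typing import Callable, Dict, List, Sequence

# A single Google Trends comparison accepts at most five search terms.
MAX_TERMS_PER_REQUEST = 5


def chunked(terms: Sequence[str], size: int) -> List[List[str]]:
    """Split the term list into request-sized groups."""
    return [list(terms[i:i + size]) for i in range(0, len(terms), size)]


def download_all(
    terms: Sequence[str],
    fetch: Callable[[List[str]], Dict[str, list]],
    batch_size: int = 1,
    delay_s: float = 60.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, list]:
    """Fetch every term, pausing between requests and backing off on failure.

    `fetch` is a caller-supplied (hypothetical) function that takes a batch of
    terms and returns {term: daily series}. The delays are guesses, not
    documented limits.
    """
    if not 1 <= batch_size <= MAX_TERMS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 5")
    results: Dict[str, list] = {}
    for batch in chunked(terms, batch_size):
        for attempt in range(max_retries):
            try:
                results.update(fetch(batch))
                break  # success: stop retrying this batch
            except Exception:
                # Exponential backoff: wait 2x, 4x, 8x ... the base delay.
                sleep(delay_s * 2 ** (attempt + 1))
        else:
            # Every attempt for this batch failed.
            raise RuntimeError(f"giving up on batch {batch}")
        sleep(delay_s)  # polite pause before the next request
    return results
```

In practice `fetch` could wrap pytrends, for example by calling `TrendReq().build_payload(batch, timeframe=...)` followed by `interest_over_time()`. The default of one term per request is a deliberate choice: Google Trends scales every term in a request relative to the most popular one, so batching up to five terms saves requests at the cost of tying their 0 to 100 scales together.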

I used this to download just over a year’s worth of day-by-day trend data for (currently) 815 search terms, mostly related to current political events. Then I ran Independent Component Analysis (ICA) on this days-by-terms matrix using scikit-learn, which turns out to do a great job of separating out components with clear meaning on the events timeline. The components are displayed on a stacked plot with key political events labeled on the timeline. I entered some of the political events a priori because they were obvious (like the election itself); others I added after researching archived news corresponding to peaks I saw in the plots. Note that many of the events in the latter category were things I wasn’t thinking about at all when I put together the list of search terms, but ICA on the large set of terms brings out those peaks automatically.
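
To illustrate the ICA step, here is a small sketch using scikit-learn’s `FastICA` on a synthetic stand-in for the days-by-terms matrix. The synthetic data, the `run_ica` helper, and the parameter choices (`random_state`, `max_iter`) are assumptions for illustration; the real input would be roughly 365+ rows (days) by 815 columns (search terms).

```python
import numpy as np
from sklearn.decomposition import FastICA


def run_ica(trends: np.ndarray, n_components: int = 13, seed: int = 0):
    """Unmix a (n_days, n_terms) trends matrix into independent components.

    Returns (sources, mixing): sources is (n_days, n_components), one time
    series per component; mixing is (n_terms, n_components), how strongly
    each search term loads on each component.
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    sources = ica.fit_transform(trends)  # FastICA centers the data itself
    return sources, ica.mixing_


# Synthetic stand-in for real data: 400 days x 60 terms built from three
# hidden signals (a one-off spike, a weekly cycle, and a step change).
rng = np.random.default_rng(0)
days = np.arange(400)
hidden = np.column_stack([
    np.exp(-0.5 * ((days - 150) / 3.0) ** 2),  # news spike around day 150
    np.sin(2 * np.pi * days / 7),              # work-week periodicity
    (days > 300).astype(float),                # sustained shift after day 300
])
trends = hidden @ rng.normal(size=(3, 60)) + 0.05 * rng.normal(size=(400, 60))

sources, mixing = run_ica(trends, n_components=3)
print(sources.shape, mixing.shape)  # (400, 3) (60, 3)
```

`fit_transform` returns one time series per component (what the stacked plot shows), and `mixing_` holds each term’s loading on each component, which is where per-component term rankings come from.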

Here’s an example plot with 13 components (choosing the number of components for ICA is more art than science; 13 seemed to work well, but other numbers did too):


There are a few other interesting trends graphs I’ve seen online. I’ve overlaid two notable ones on that plot to compare timelines. The first is the plot of Alfa Bank DNS server logs, which may or may not have something to do with anything nefarious. The second is from Echelon Insights’ annual promotional “year in news” article. Of course I don’t know their exact methodology (the point is that they’re selling their service, after all), but it’s likely that the trends displayed there start from the keywords labeling the peaks, rather than from a component analysis of some sort. I would love to hear from someone who knows more.


The columns at the left list the top 15 and bottom 15 terms associated with each component. The positively associated search terms trend much more frequently when you see a positive spike in the associated component. The negatively associated search terms trend much more frequently when you see a negative spike in the component, or when the overall value of the component is low. ICA recovers components only up to an arbitrary sign, scale, and order, so the code automatically detects the spikes, rectifies each signal, and orders the components. As a result the positively associated terms are usually more meaningful, but there are some interesting trends in the negative words too.
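
The rectify-and-rank step can be sketched as follows. The sign rule (flip a component if its largest excursion is negative) and the ordering rule (sort by the day of each component’s peak) are assumptions about what “some ordering and rectifying” might look like, not necessarily what my code does; `rectify`, `order_by_peak`, and `top_and_bottom_terms` are hypothetical names. The inputs are `sources` (days × components) and `mixing` (terms × components) arrays of the kind scikit-learn’s `FastICA` produces.

```python
import numpy as np


def rectify(sources, mixing):
    """Flip each component's sign so its largest excursion is positive.

    ICA leaves each component's sign arbitrary; flipping a source and its
    column of the mixing matrix together leaves the reconstruction unchanged.
    """
    sources = sources.copy()
    mixing = mixing.copy()
    for k in range(sources.shape[1]):
        s = sources[:, k]
        if abs(s.min()) > abs(s.max()):  # biggest spike points downward
            sources[:, k] *= -1
            mixing[:, k] *= -1
    return sources, mixing


def order_by_peak(sources, mixing):
    """Sort components by the day of their largest value (hypothetical rule)."""
    order = np.argsort(sources.argmax(axis=0), kind="stable")
    return sources[:, order], mixing[:, order]


def top_and_bottom_terms(mixing, terms, k, n=15):
    """Return the n most positively and n most negatively loaded terms for component k."""
    idx = np.argsort(mixing[:, k])
    top = [terms[i] for i in idx[::-1][:n]]
    bottom = [terms[i] for i in idx[:n]]
    return top, bottom
```

Flipping a source and its mixing column together keeps the reconstruction `sources @ mixing.T` unchanged, which is why the sign can be chosen freely for display.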

To make the search terms easier to interpret, the code also generates word clouds of the top positively and negatively associated terms for each component. Here they are, in the same order as on the left side of the plot.
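
A sketch of turning one component’s loadings into word-cloud weights, assuming (hypothetically) that each term is sized by the absolute value of its loading in a terms × components `mixing` array like the one `FastICA` produces. `cloud_frequencies` is an illustrative name, not a function from my repository.

```python
import numpy as np


def cloud_frequencies(mixing, terms, k, n=15):
    """Build {term: weight} dicts for the positive and negative word clouds of component k.

    Weights are absolute loadings, so the most strongly associated terms are drawn largest.
    Only terms whose loading actually has the right sign are kept in each cloud.
    """
    col = mixing[:, k]
    idx = np.argsort(col)  # ascending: most negative first
    pos = {terms[i]: float(col[i]) for i in idx[::-1][:n] if col[i] > 0}
    neg = {terms[i]: float(-col[i]) for i in idx[:n] if col[i] < 0}
    return pos, neg
```

Each dict can then be passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package to render the image.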

Although this is preliminary, I can make a few observations here.

  • The burst in Alfa Bank server activity is nicely bracketed by two events: Trump claiming Russia won’t go into Ukraine (also, Trump publicly asking Russia to hack Hillary’s email); and then Guccifer 2.0 tweeting to Roger Stone “paying u back”.
  • This article by Roger Sollenberger on how SEO is used for political influence describes gaming Google search results with large numbers of fake posts. It uses the specific example of how the Seth Rich story started to get pushed after Trump brought alleged Russian spies into the Oval Office. In component 9 (sorry, they’re not labeled; you’ll have to count from the top) you can see a predominance of terms related to both “trump_russians_oval_office” and “seth_rich”, which is consistent with the article’s account.
  • Trump’s “grab ’em by the pussy” tape leaked on October 7. Shortly afterwards, a massive wave of anti-Hillary search trends started; it didn’t let up until the election and dropped off almost immediately afterward.
  • The events lumped together as “wikileaks/hacking” in the Echelon graph are not cleanly separated out in the components. Below, you can see what it looks like when I specified 15 components instead of 13. The Wikileaks email trend is clearer there, in component 10. Maybe I should have used that one as my main example instead of the 13-component version.
  • In case it isn’t obvious, component 12 (13 in the 15-component plot below) represents the strong weekly periodicity in Google search trends due to the work week.
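
A quick way to check which component carries the work-week cycle is to measure each component’s autocorrelation at a lag of 7 days. This is a sketch, not part of the code in my repository; `autocorrelation`, `weekly_score`, and `most_weekly_component` are hypothetical helpers operating on a days × components `sources` array, and the demo data is synthetic.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        return 0.0  # a constant series has no periodic structure
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def weekly_score(series):
    """Autocorrelation at a 7-day lag; values near 1 mean a strong weekly cycle."""
    return autocorrelation(series, 7)


def most_weekly_component(sources):
    """Index of the column in a (days, components) array with the strongest weekly cycle."""
    scores = [weekly_score(sources[:, k]) for k in range(sources.shape[1])]
    return int(np.argmax(scores))


# Synthetic demo: column 0 is noise, column 1 is a clean 7-day cycle.
days = np.arange(364)
rng = np.random.default_rng(1)
demo = np.column_stack([rng.normal(size=364), np.sin(2 * np.pi * days / 7)])
print(most_weekly_component(demo))  # 1
```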


Please consider this a preliminary post; I wanted to get these tools and data out there. You can access all the code and data at my GitHub profile here. (As of this writing, 2017 June 6, I haven’t added documentation yet, and I probably won’t get to it for a few weeks. Sorry!)

Here are PDF versions of the big images above:


To do:

  • Stitch the word clouds and component timelines together into a single easy-to-view output
  • Label components
  • Single-word search terms are more frequent than multi-word ones. Consider ways to normalize this.
  • Look into ways to integrate this with Reddit data from @conspirator0
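
For the normalization item, one possible approach (hypothetical, not implemented in the repository) is to z-score each term’s column before running ICA, so that every term contributes on the same scale regardless of search volume. `standardize_terms` is an illustrative name.

```python
import numpy as np


def standardize_terms(trends, eps=1e-12):
    """Z-score each column (search term) of a (days, terms) matrix.

    Every term ends up with mean 0 and unit variance, so high-volume terms
    no longer dominate by sheer magnitude. Constant columns become all zeros
    instead of dividing by zero.
    """
    trends = np.asarray(trends, dtype=float)
    mean = trends.mean(axis=0)
    std = trends.std(axis=0)
    return (trends - mean) / np.maximum(std, eps)
```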

Background: The facts that inspired CoPsyCon

Not long after the unexpected results of the 2016 US election, the US intelligence community collectively published a declassified summary entitled Assessing Russian Activities and Intentions in Recent US Elections, a joint statement from the FBI, CIA, and NSA unequivocally stating that Russian state-sponsored actors had interfered in the election. Since then, it would be an understatement to say that the plot has thickened. Russia’s role has received copious media attention and is the primary target of Robert Mueller’s special counsel probe. However, less attention has been given to other facets of the acute threat. In particular, the methods that take advantage of weaknesses in our political and media systems would not be remediated by identifying and punishing Russian perpetrators and their collaborators. Democracy, freedom, and the integrity of American systems would still be vulnerable to the same attacks from other state or, more disturbingly, non-state actors. There is evidence that non-state actors, specifically mega-wealthy oligarchs, are currently using the same tools towards the same goal of disrupting America.

The first article I read that alerted me to the threats of big-data-based marketing tools applied to political manipulation by non-state actors was The Data That Turned the World Upside Down, translated from German and published on 2017 Jan 28 by Vice Motherboard, which described Cambridge Analytica, botnets, and political manipulation. I already knew about work on big data and politics from colleagues at other companies doing similar work, but this article suggested that these tools were already powerful and were being used by hostile actors against the US.

The great British Brexit robbery: how our democracy was hijacked, published in the UK Observer on 2017 May 7, describes the use of similar tools by the same actors to manipulate the political process in the UK in favor of Brexit.

Mother Jones published an article on 2017 Apr 4, Twitter Has a Serious Problem—And It’s Actually a Bigger Deal Than People Realize: Bots can undermine democracy, which presented a detailed analysis of how Twitter botnets are used to spread disinformation and manipulate the media and political narrative, and pointed out the difficult challenges in solving this problem.

Paste Magazine published How the Trump-Russia Data Machine Games Google to Fool Americans on 2017 June 1, which describes another tool in the disinformation arsenal: search engine optimization. This article also describes how a specific disinformation narrative was broadcast and amplified on a planned timeline to counter the real breaking news of Trump’s meeting with Russians in the Oval Office.

As early as 2005, the military was warned of the potential threat of “meme warfare”. Mike Prosser wrote a thesis titled MEMETICS—A GROWTH INDUSTRY IN US MILITARY OPERATIONS for the United States Marine Corps School of Advanced Warfighting. Later, in 2011 October at the Social Media for Defense Summit in Alexandria, Virginia, Dr. Robert Finkelstein presented a Tutorial on Military Memetics.

And yet, the Department of Defense Cyber Strategy, presented by USCYBERCOM in 2015 April, has no mention whatsoever of “meme”, “memetic”, “disinformation”, or anything about media manipulation, social media, or any other related topics.

CoPsyCon exists because this blind spot in the US defense system critically needs to be filled.




CoPsyCon hello world

Counter-Psyops Consortium is now online! Rough draft website.

To Do:

  • Write concise mission statement (DONE)
  • Write long-form mission statement
  • Add contact/participate info page
  • Assemble board
  • Register nonprofit

P.S. The first video is the Big Round Cubatron on display at Google headquarters. The second video will remain a mystery.