Tuesday, September 25, 2018

Wrangler as a data manipulation tool

Stanford Visualization Group's DataWrangler should not be included in the software repertoire of anyone serious about their data. The tool has a myriad of shortcomings. To put the following statements in context, DataWrangler was created as part of a research project rather than as a commercial product.

Perhaps the most glaring issue I had with the tool had nothing to do with the data manipulation itself, but with its blatant attack on the user's data privacy. There's no such thing as a free lunch. This tool was created less out of the researchers' benevolence toward people struggling to manipulate their data than as a net for gathering data on how users behave while manipulating their data sets. While you use the tool, DataWrangler logs your transformation steps, clicks, and keystrokes, and data elements in selected ranges are reported back to the researchers. I assume this content is used to further improve the tool, to show the researchers how they might want to alter the UI, and to refine the wrangling methods suggested in the left-hand menu.
In the same vein, DataWrangler's primary objective is not to be an end-use data manipulation tool. The tool's designers clearly opted to trade a great deal of functionality for ease of use by people new to data manipulation. In my short experience with it, I did not find any method that could not be executed more easily by anyone with even a few hours of experience in MS Excel. Even if Excel cannot perform the exact function you need, you can write a VBA macro to perform almost any function imaginable.

I'll conclude this highly critical review of DataWrangler by noting its performance limitations. It is impossible to work with any serious data set in DataWrangler. The size of the data you can import is limited, and when operating on your data, the tool is forced to work through the web page's cache rather than your computer's memory.

In summary, I would advise anybody new to data parsing, tidying, or manipulation to skip this tool, and perhaps others like it, and just learn the gold standard, Microsoft Excel. More advanced users looking to handle serious data will use languages like VBA, R, Python, and even SQL, but it is still incredibly useful to troubleshoot data manipulations in an Excel spreadsheet.

