bird poops on plum branch

buster


Buster Benson

No advice column.


Taking navel-gazing to the next level
I've written 513 entries mentioning love over the last 10 years:



Only 78 entries about hate:



Talking about foxes spiked at about 10 mentions a month in late 2005:



343 posts about McLeod Residence had a good run, but are now trailing off:



Looks like I've been talking more and more about being drunk over the last couple years:



Could be because of my growing appreciation of champagne:



Here's when I met Kellianne:



I'm an early adopter of Flickr:



And Twitter:



And the iPhone:



But my paranoia about asteroids and super volcanoes seems to have passed for the most part:



Adding a Lucene search engine to all 12,388 entries from the last 10 years is pretty interesting. Mixing the data with Google's chart API makes it even more interesting (to me at least). Now I just have to create more content for it to eat up.
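For anyone curious how the per-month numbers behind these charts could be tallied, here is a minimal sketch in Python. It is not the Lucene setup described above: it assumes the entries are already loaded as (date, text) pairs, uses a naive substring match instead of a real search index, and the function name and sample entries are made up for illustration.

```python
from collections import Counter
from datetime import date


def mentions_per_month(entries, term):
    """Count entries per (year, month) whose text contains `term`.

    Matching is a case-insensitive substring test, so "love" would also
    match "glove". A real search index would tokenize properly.
    """
    counts = Counter()
    needle = term.lower()
    for posted_on, text in entries:
        if needle in text.lower():
            counts[(posted_on.year, posted_on.month)] += 1
    return sorted(counts.items())


# Made-up sample entries, purely for illustration.
entries = [
    (date(2005, 10, 3), "Saw two foxes today."),
    (date(2005, 10, 19), "Foxes again!"),
    (date(2005, 11, 2), "I love champagne."),
]

print(mentions_per_month(entries, "foxes"))
```

Each (year, month) pair and its count is one point on a chart like the ones above.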

Suggest other ways to mess with all of this.

Can you scatterplot love/month vs hate/month? :)

What does a scatterplot do? Is it different from overlaying the two lines and turning them into dots? Or does it find some kind of correlation?

Yeah, correlation (or lack of). For example, you assign the x axis to love and the y axis to hate. Each point would be a month (or other time period), so for example if you said love 60 times and hate 2 times in Apr 2009, you would have a point at x=60, y=2. Then another point for each month you have data.

http://code.google.com/apis/chart/types.html#scatter_plot

You can also do a linear least squares or something and calculate correlation coefficients to get a number for how correlated they are, but if there's anything really interesting you'll probably see it right away on the graph.
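A rough sketch of the correlation coefficient idea, in plain Python with no libraries. The monthly counts below are invented (only the 60/2 pair echoes the example above), and `pearson` is just a name picked for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return covariance / sqrt(spread_x * spread_y)


# Invented monthly mention counts, one value per month.
love = [60, 45, 30, 52]
hate = [2, 5, 9, 4]

points = list(zip(love, hate))  # the (x, y) points for the scatterplot
r = pearson(love, hate)
print(points, round(r, 2))
```

A value near 1 means the two words rise and fall together, near -1 means one goes up when the other goes down, and near 0 means no linear relationship.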

Oh, wait, I get it. So, for each month, plot the number of mentions of "love" as the x-coord and the number of mentions of "hate" as the y-coord to see if they go up together or are inversely related...

http://en.wikipedia.org/wiki/Scatter_plot

Use a search algorithm to figure out groups of words that cluster together. What adjectives do you associate with iPhone, for example?

I AM looking for a good Ruby library for creating Markov chains, so that I could construct new full sentences by seeing which words often follow other words. Not surprisingly, the only good code sample I could find was written by Eric Hodel.
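For reference, the core of a word-level Markov chain is small enough to sketch by hand. This is Python rather than Ruby, it is not Eric Hodel's code, and the function names and sample text are made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, rng=None):
    """Walk the chain from `start`, picking each next word at random.

    Words that follow more often appear more often in the list, so they
    are picked more often. Stops at a dead end or after `max_words`.
    """
    rng = rng or random.Random()
    word = start
    output = [word]
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Made-up sample text; in practice this would be all the journal entries.
sample = "i love foxes and i love champagne and foxes love champagne"
chain = build_chain(sample)
print(generate(chain, "i"))
```

Keying the chain on pairs of words instead of single words usually gives more sentence-like output, at the cost of needing more text.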

I've been trying to analyze content like this with "latent semantic analysis" for about a year, off and on. (Survey results, mostly.) Here's an example of one guy's attempt to write Python code to analyze this kind of data:

http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html

What this method can do is find collections of words that tend to be found together... it assumes that there are some innate relationships between words that are found together in the same documents / paragraphs / sentences / whatever.

But this might be TOO complex...

It's not too complex, but the question is.... what will this analysis tell me about ME? I can see why it's useful for a search engine, but how is it useful as a self-discovery tool?

It can tell you what words cluster together. Does "iPhone" tend to be found with "love"? "Foxes" with "slapping"? "Awesome" with "bjorn"? :) It helps to define the way you use words in the text by seeing what other words you use at the same time.
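As a much simpler stand-in for LSA, plain co-occurrence counting already answers the "does iPhone show up with love?" kind of question. To be clear, this sketch is not LSA (which would run a singular value decomposition on a term-document matrix); it only counts how often two words appear in the same entry. The function names and sample entries are made up.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count, for every pair of words, how many documents contain both."""
    pairs = Counter()
    for doc in documents:
        words = sorted(set(re.findall(r"[a-z']+", doc.lower())))
        pairs.update(combinations(words, 2))
    return pairs


def companions(pairs, word):
    """List the words seen alongside `word`, most frequent first."""
    found = Counter()
    for (a, b), count in pairs.items():
        if a == word:
            found[b] += count
        elif b == word:
            found[a] += count
    return found.most_common()


# Made-up entries, purely for illustration.
documents = [
    "I love my iPhone",
    "Love the iPhone camera",
    "Foxes hate champagne",
]
pairs = cooccurrence(documents)
print(companions(pairs, "iphone"))
```

On real text the top companions will mostly be words like "the" and "and", so filtering out common words first is probably necessary.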

Yeah, that sounds interesting. And, I suppose I could use it to find entries that are related to any given entry.

I should also see how many words I've used in 10 years... probably a good approximation of the size of my vocabulary.
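Counting distinct words is a few lines, assuming the entries are available as plain strings. The tokenizing rule here (lowercase runs of letters and apostrophes) is an arbitrary choice for this sketch, and it treats "fox" and "foxes" as different words.

```python
import re


def vocabulary(texts):
    """Return the set of distinct lowercase words across all texts."""
    words = set()
    for text in texts:
        words.update(re.findall(r"[a-z']+", text.lower()))
    return words


# Made-up entries, purely for illustration.
print(len(vocabulary(["I love foxes", "Foxes love me"])))
```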

If you do find a nice and easy way to implement this, let me know! (Especially if it's in Python...) I might be able to use this for some research I've been doing. :) Might even be able to put your name to a scientific paper...

I would build it in Ruby if I did it myself. What are you using this for and why are you trying to make me work for you! :)

I'm trying to figure out if the text of survey responses varies according to different variables, mainly with regard to location.

And think of it less as "working for me" and more as an incidental way of gaining a little academic glory as a useful byproduct. :) Nah... I'm thinking more that if there are some good tools out there, just let me know. It's a topic that I've been interested in for a while.

I hope you're not trying to prove that poor people have bad grammar. :)

Okay, I'll let you know what I find. The field of latent semantic indexing is pretty established though... are you just looking for something really simple that lets you pass text into it and get results out of it? Or does it have to work with a particular file format?

Nah... I'm trying to see what words people use to define their neighborhood when they live closer to urban forests. No grammar testing of any kind.

LSA/LSI seems pretty well-established in the information technology field, but I have yet to find it applied to the field I'm examining, which means that it would make for a great new contribution! Not a bad way to get published.

There are some commercially available packages, but I don't think that they would be able to do some of the things that I'd like to do with it... in particular, it would be great to get my hands on Python code (or Ruby code, depending on how quickly I could pick it up) that I could change around myself. On the other hand, it would be great to have code that I could use out of the box quickly without modification... no need to reinvent the wheel to answer very simple questions.

Not to mention that I'm quite cheap when it comes to things that I might not use too often for research. I'd rather use Python modules that I can play with for free than buy a non-customizable text mining program for several hundred dollars that might only get a few uses.

It'd be nice to have a consistent scale on the y-axis if you want to compare trends between different graphs. Like, it's hard to really compare your love and hate graphs.

True... I could put multiple results in one graph for easy comparison, sort of like Google Trends. The scatterplot mentioned above would also help comparisons.

Re: more pretty charts

That's awesome. I'm gonna have to use that.

!

how much skill does that require of someone, to make those charts?

Not that much. Assuming you have the data, you just have to connect it up to Google's Chart API:

http://code.google.com/apis/chart/
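Connecting the data up mostly means building a URL. Here is a minimal sketch, assuming the image chart endpoint and its `cht` (chart type), `chs` (size) and `chd` (data) parameters as documented at the link above; the function name, the default size and the `peak` argument are choices made for this sketch, not part of the API.

```python
from urllib.parse import urlencode

# Assumed endpoint for the image chart service documented at the link above.
CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def line_chart_url(values, width=400, height=150, peak=None):
    """Build a line chart URL for a series of counts.

    The text data encoding expects values between 0 and 100, so counts
    are scaled against `peak` (by default the largest value in the series).
    """
    peak = peak or max(values) or 1
    scaled = [round(100 * v / peak) for v in values]
    params = {
        "cht": "lc",                      # line chart
        "chs": f"{width}x{height}",       # image size in pixels
        "chd": "t:" + ",".join(str(v) for v in scaled),
    }
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,")


# Made-up monthly counts, purely for illustration.
print(line_chart_url([5, 10]))
```

Passing the same `peak` to several charts keeps their y-axes on one scale, which makes them easier to compare side by side.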
