Importing multiple text files to a mapreduce sentiment analysis job
I've been following a basic sentiment analysis tutorial here:
http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/
I have a mapper and reducer (modified, but for all intents and purposes
identical to those in the link) which read one input file, calculate
positive vs. negative sentiment and output the aggregate for each
minute. I start the program like so:
cat Tweets/FlumeData.txt | python sentimentMapper | sort | python avgNReduce 2
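For reference, here is a minimal sketch of the JSON-decoding step that a line-oriented mapper like this typically performs. It is an assumption, not the tutorial's exact code: it assumes one tweet per line, and the helper name `parse_lines` is hypothetical. Lines that fail to decode are reported on stderr and skipped instead of being passed on to `sort`:

```python
import json
import sys


def parse_lines(lines, err=sys.stderr):
    """Yield one decoded tweet per input line (hypothetical helper).

    Blank lines are skipped silently; lines that are not valid JSON are
    reported on `err` and skipped, so they never reach sort or the reducer.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line, nothing to decode
        try:
            yield json.loads(line)
        except ValueError as exc:
            # json raises ValueError (JSONDecodeError in Python 3) on bad input;
            # report it and move on to the next line
            err.write("%s\n" % exc)
```

In a mapper this would be driven by `for tweet in parse_lines(sys.stdin):`, with the sentiment scoring inside the loop. Python 2's `json` module words this error as "No JSON object could be decoded". Because the mapper's stderr is not piped through `sort`, such messages go straight to the terminal, and since `sort` cannot emit anything until it has read all of its input, they show up before the results.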
However, I am doing a mass download of tweets and want to be able to
import many files (there are currently three in the directory, but
later there will be hundreds). I changed the command to:
cat Tweets/*.txt | python sentimentMapper | sort | python avgNReduce 2
It seems to work, except that I now get two "No JSON object could be
decoded" messages at the start of the results, like this:
andrew@andrew-VirtualBox:~/Python$ cat Tweets/*.txt | python sentimentMapper | sort | python avgNReduce 2
No JSON object could be decoded
No JSON object could be decoded
2013-08-05-Mon 10:17:00 mufc 0.031164021164
2013-08-05-Mon 10:17:00 rooney -0.0203703703704
2013-08-05-Mon 10:18:00 mufc -0.033664073034
2013-08-05-Mon 10:18:00 rooney -0.0191292490034
Does anybody know why I am getting that error, and does it mean the
program isn't working as I intend?
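To narrow it down, a small standalone check can report which file and line each undecodable line comes from, and flag files that do not end with a newline (with `cat`, such a file's last line is joined onto the first line of the next file, producing one invalid line). This is a sketch assuming one JSON object per line; the script and function names are hypothetical, and it uses the standard `fileinput` module:

```python
import fileinput
import json
import os
import sys


def find_bad_lines(paths):
    """Return (filename, line_number, snippet) for every non-blank line
    in `paths` that json.loads rejects (hypothetical helper)."""
    bad = []
    if not paths:
        return bad  # FileInput falls back to stdin on an empty list
    # FileInput chains the files like `cat`, but tracks where each line came from
    with fileinput.FileInput(files=paths) as f:
        for line in f:
            text = line.strip()
            if not text:
                continue  # skip blank lines
            try:
                json.loads(text)
            except ValueError:
                bad.append((f.filename(), f.filelineno(), text[:60]))
    return bad


def missing_final_newline(path):
    """True if the file is non-empty and its last byte is not a newline."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return False  # empty file
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) != b"\n"


if __name__ == "__main__":
    paths = sys.argv[1:]
    for name, lineno, snippet in find_bad_lines(paths):
        print("%s:%d: %r" % (name, lineno, snippet))
    for path in paths:
        if missing_final_newline(path):
            print("%s: no trailing newline" % path)
```

It would be run as, e.g., `python find_bad_lines.py Tweets/*.txt`. Since `FileInput` reads the files one at a time, it does not see lines glued together by `cat`; the trailing-newline check covers that case separately.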