Friday, August 21, 2009

Stata 32bit vs. 64 bit - Try it for free!

Stata offers a free upgrade from 32-bit to 64-bit for Mac users. The 64-bit version is significantly faster, as Stata itself tries to show with this comparison:
command    32-bit    64-bit
sort       2.00      1.64    (seconds)
regress    0.88      0.80
(timings from a 1.4 GHz AMD Opteron 240 running 64-bit Windows)
Obviously you need to have a copy of Stata 10 up and running (IC, SE, or MP); then you can download and upgrade to the 64-bit version for free. Remember to update your Stata afterwards.
The 64 bit upgrade can be downloaded from:
Installation is trivial, like most Mac apps.
Have fun!

Wednesday, July 29, 2009

New Stata 11

The new version of Stata (11) ships July 27. Among the new features: multiple imputation, GMM (both linear and nonlinear models), competing-risks regression, multivariate time series, unit-root tests, and marginal analysis, topped off by a new interface and fonts.

Order at:

Monday, June 15, 2009

Closure or a new Start?

It is done. The thesis itself. Putting the pieces together and transferring everything from Scientific Workplace and a couple of files from MS Word proved to be easier than anticipated. The final output looks great, and I am now thinking of switching entirely to WinEdt or maybe TeXmacs, rather than SWP. Once you master the code, it comes easy; plus the net is quite full of examples and wiki pages. Now it is time to chill a bit. Until the defense and the subsequent paperwork that I will have to clear in a couple of days before my trip back home.

Sunday, May 31, 2009

LaTeX, Bread and Butter

That's how my Facebook status read a couple of days ago. And yes, it was actually like that, except for the bread-and-butter part, of course. Not that I would have minded them; it's just that I am more of a "coffee with milk & a banana" type of person.
So, yes, about two weeks ago I slowly started to migrate my work from Scientific Workplace and Word to LaTeX. Initially I tried to get support from people here at RPI, but they all seem to have magically disappeared; to this day I have only exchanged a couple of emails with one person who was nice enough to care. Of course, no help there. However, motivated by who-knows-what, I started to delve into it myself and look for online resources. There are some nice websites (just google your question) that will get you a long way, but in my case, just like yours, I suspect, the real problem is understanding how LaTeX actually works. For instance, characters like &, % or _ are not accepted in a sentence unless they are escaped as special characters or used in math mode. Various things like that I would have liked to learn from somebody rather than searching the web and applying the old trial-and-error technique. Oh well, now it doesn't matter: 150 pages of nice LaTeX output await to be defended, printed, and deposited in the library. I have almost forgotten all the trouble I went through just to insert some graphics (via graphicx) or tables in my thesis. And two weeks is not a bad performance at all, considering the slope of the learning curve involved. :)
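For the record, here is a minimal illustration of that special-character issue (a toy example of my own, not taken from the thesis):

```latex
% In normal text, &, % and _ must be escaped with a backslash:
We kept 95\% of the sample \& renamed the variable gdp\_growth.

% In math mode, _ works unescaped as a subscript:
$y_{it} = \alpha + \beta x_{it} + \varepsilon_{it}$
```

Forgetting the backslash is exactly the kind of error that produces the cryptic "Misplaced alignment tab" or "Missing $ inserted" messages.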

Friday, May 08, 2009

PDF Repair Tools

Today I was again scanning some tables of data and some pictures from a book on the international rubber industry. The cool thing about it is that the scanner sends everything automatically to your email or a specified folder. The bad thing about it is that this time it didn't come through. I had 110 MB of scans (only 2 x 30 pages) in two batches (the first came out wrong). Thank God I was forwarding all my emails to Gmail, otherwise my RPI account would have been flooded (or not? - I don't know the limit nowadays). Anyway, all well...but when I tried to open the files: errors on top of errors. Again, very lucky that it wasn't anything essential; just some add-on facts for my paper (best case). However, I was a bit annoyed, so I attempted to recover whatever was recoverable. There is not a lot of free software out there, and the results are far from perfect. Try PDF tools online (here) if you don't mind a bit of watermark (I didn't!) or some of the shareware programs on the web. I give the online PDF tool a big thumbs up: fast upload, good recognition, and an easy interface; plus no installation is necessary. Cheers.

Monday, April 20, 2009

Cambridge Short Interlude

A short trip to Boston for an interview at Harvard. I was excited both to get an interview at the most prestigious school in the world and to finally see a bit of Cambridge. Last time in Boston I did not have time to cross the river, but this time I decided to look around a bit.
It was pretty nice to walk the small yet picturesque streets of Cambridge. However, the freezing weather (33 F) worked steadily against my rather light business attire, so I had to get inside after half an hour of strolling. The interview went very well, and I think I can bring a lot to this position and much more beyond. And yes, the opportunities ahead look exciting. Throughout my time in Boston I felt great, energized by the city's vibe: lots of young people and very colorful characters. While still a New York lover and Manhattan fan, I have always had a good impression of Boston. The subway, called the T, could definitely sustain some improvements (the green line was terribly slow and all lines were extremely crowded), but those are just minor complaints. I look forward to coming back here for good. :)

Can be dangerous. After the robotic work of coding the data and transcribing things from paper to electronic resources, I felt drained and fatigued. The fact is that after 10-13 hours of coding data every day, it is impossible not to feel like a vegetable. However, I took comfort in seeing the data set grow steadily every day (no disappointments there) and reached the finish line sooner than expected (3 weeks overall). Still, it took a lot of effort.
An important issue: do you set up codes at the beginning of the process and create new ones as you advance, or do you enter everything first and code at the end? Well, I think starting to code the firms, plants, and countries at the beginning is the best way, especially in my case, where things do not change radically from year to year. You just need to make sure everything is up to date and add new codes for new entries (plants, firms, countries). So, if I were to do it again, I would assign codes from the very start. But again, it might be different in each case.
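To illustrate the incremental approach, here is a toy sketch in Python (the firm names are made up for illustration, not from my dataset): existing entries keep their codes across years, and new entries get the next free number.

```python
def build_codes(existing=None):
    """Maintain a stable name -> numeric code mapping across yearly updates."""
    codes = dict(existing or {})

    def assign(name):
        # Known names keep their old code; new entries get the next number.
        if name not in codes:
            codes[name] = len(codes) + 1
        return codes[name]

    return codes, assign

# Year 1: three firms enter the dataset.
firm_codes, code_of = build_codes()
year1 = [code_of(f) for f in ["Goodyear", "Michelin", "Bridgestone"]]

# Year 2: Michelin keeps its code (2); the newcomer Pirelli is appended (4).
year2 = [code_of(f) for f in ["Michelin", "Pirelli"]]
```

The point is that a code, once assigned, never changes, so year-to-year merges stay trivial.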

Monday, January 26, 2009

Let's Code Together..

..right now..ohh, in Sweet Harmony. Those who were around in 1993-1994 might still remember this song, and a video that cleverly caught your eye with the possible nudity of a dozen models while blurring their private parts with some creative..techniques. :))
While this started as an introductory digression, I do not wish to digress further, so I will get right to the point. For several weeks now I have been coding my latest dataset, and together with my planned future additions this will (hopefully) make a great and flexible tool for various research questions at the plant, firm, and country level. Heck, I might throw in some of that multi-level analysis, if possible. And although hopes are high and anxiety is building over the possible future ramifications of this project, the coding is a daunting task. It is going faster than anticipated, and with less stress than expected (for other, external reasons), but sitting still for hours, checking and double-checking everything, matching names, etc. is not as fun as it looks.
The Publishing Conundrums and Some Surprisingly Fast Rewards

I opted in the end for a more trade-oriented journal, as suggested by Mr. X, the editor at my previous attempt, and it seems he was right. Lesson #1: do your homework before submitting!! On the other hand, don't get me wrong: the former journal fit like a glove in terms of the geographical and comparative scope of the work debated there. However, Lesson #2: the editor is ALWAYS right! So there is no point in being upset if your work doesn't make it..which brings me to Lesson #3: search & you will find it. There are so many economics journals nowadays that if you have indeed pursued a new endeavor, done something new or interesting, you will find THE ONE better suited for your work. However, keep in mind the final lesson of this small publishing-grievance article: "Small can be beautiful! (but especially a lot faster)." I have heard that it takes 2-3 years to publish in second-tier A journals, and from my experience it takes 12-15 months on average even with lesser-ranked journals in economics. If you have the right stuff for the next AER or JEL issue, hold your breath and cross your fingers..it might take a while..
On the other hand, exceptions are always welcome. And I have just benefited from one, which made me happy and quite optimistic that not the whole world functions in the same agnostic way.

Sunday, January 11, 2009

Choosing the best OCR software for you

This proved to be a daunting task (for me, at least). I was trying to get some scanned data into Excel but ran into multiple problems:
- scanning quality (some pages were too dark for OCR), hence unusable
- misrecognition (getting an O instead of a 0 or a G instead of a 6 can become a common pain in the @@$)
- complex tables (the biggest challenge, since messed-up columns are hard to fix in post-processing).
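Some of the misrecognition problems can at least be patched mechanically once the text is out of the OCR engine. Here is a small Python sketch for cells that should contain numbers; the confusion mapping (O/0, G/6, l/1, S/5) is only a guess based on my scans, and yours may differ:

```python
import re

# Typical OCR letter-for-digit confusions; adjust to what your scans produce.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "G": "6", "l": "1", "S": "5"})

def clean_numeric(cell):
    """Repair a cell that should contain a number but has OCR noise."""
    fixed = cell.strip().translate(CONFUSIONS)
    # After substitution, keep only digits, sign, and decimal point.
    fixed = re.sub(r"[^0-9.\-]", "", fixed)
    return float(fixed) if fixed not in ("", "-", ".") else None

print(clean_numeric(" 1O4.G "))  # O -> 0, G -> 6: prints 104.6
```

Of course, this only helps with digit-level noise; it does nothing for columns that the OCR merged or split, which is why the table layout problem above is the hard one.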
Choosing an OCR software
I tried several programs (under both Windows and Mac). First, my personal favorite: a simple little OCR program called Able2Extract Professional. It does the job for simple tables, usually from clean-cut PDFs, but in most cases doesn't go beyond that. Then I moved up to the big guns: ABBYY FineReader Pro 9.0, IRIS Readiris Pro 11.5.6, and OmniPage Professional 16.0. However, none of them blew me away. ABBYY is terribly slow but seems to have a few more customization options; OmniPage is the fastest with the best quality, but I had trouble making it do what I wanted.
In the end, none of the above could deliver both FAST and high-quality OCR given the difficulties of my PDF files. The Chinese names and other foreign firms were painful to distinguish even for me, let alone any OCR software, so I opted for manual data recognition (MDR) and just entered everything myself in Excel. Lots and lots of hours and nerves wasted, but I think it would have taken the same number of hours (or more) just to follow, correct, and fix the OCR output.