The bottleneck is definitely the conversion from long back to str. As can be seen below, the time to format a 20,000 digit number is more than 15 times the combined time to parse and multiply two 10,000 digit numbers!
>>> from timeit import *
>>> a = Timer("x = int(y)", "y = str(10 ** 10000)")
>>> a.timeit(1000)
4.8757819778770681
>>> b = Timer("x = y * y", "y = 10 ** 10000")
>>> b.timeit(1000)
2.4265480668632335
>>> c = Timer("x = str(y)", "y = 10 ** 20000")
>>> c.timeit(1000)
191.91779877051408
Since the formatting code is already in C, it's not surprising that psyco wouldn't make much difference. I agree with dhoni -- this one just isn't currently possible in Python.