Actually, I did some profiling and the results surprised me. Even the simplest algorithm should not take long to finish in Python, because the bottleneck is not the multiplication itself but the conversion between strings and integers.
It's not necessary to implement Karatsuba (or a faster algorithm) in Python, because the interpreter already uses it for big-integer multiplication, and it is fast enough on my PC:
Read 2*1000 strings of 10000 digits: 0.12 sec
Multiply 2*1000 integers of 10000 digits: 4.5 sec
Convert int(string) for 2*1000 strings of 10000 digits: >70 sec
Convert str(int) for 2*1000 integers of 10000 digits: >70 sec
Print 2*1000 strings of 10000 digits: 0.07 sec
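A minimal sketch of how such timings can be reproduced (not my original benchmark script; the sample size here is reduced to 20 strings so it runs quickly, and the digit-limit call is only needed on Python 3.11+, where large int/str conversions are capped by default):

```python
import random
import sys
import time

# Lift the default 4300-digit int<->str conversion cap (Python 3.11+).
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

def bench(label, func, args_list):
    # Time func over all argument tuples and report the total.
    start = time.perf_counter()
    for args in args_list:
        func(*args)
    print(f"{label}: {time.perf_counter() - start:.4f} sec")

# Sample data: 20 random 10000-digit numbers as strings.
strings = ["".join(random.choice("0123456789") for _ in range(10000))
           for _ in range(20)]
ints = [int(s) for s in strings]

bench("int(string)", int, [(s,) for s in strings])
bench("str(int)", str, [(n,) for n in ints])
bench("multiply", int.__mul__,
      [(ints[i], ints[i + 1]) for i in range(0, 20, 2)])
```

Scaling the sample back up to 2*1000 strings should make the gap between multiplication and conversion obvious.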
So the problem lies in the conversion between strings and bignums, not in the multiplication algorithm.
This is surprising because, if not for this bottleneck, even the naive solution could probably finish the test case in time.
The thing is, if the strings to convert are not too large (<200 digits), then int() or long() is reasonably fast.
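That observation suggests a workaround: convert the string with divide and conquer, so that int() only ever sees small pieces and the combining step is done by the (Karatsuba-backed) big-integer multiplication. A sketch of the idea; the 200-digit threshold is just the figure mentioned above, not a tuned value:

```python
def str_to_int(s):
    # Divide and conquer: split the digit string in half, convert each
    # half recursively, and combine the results with one big multiply,
    # which CPython handles with Karatsuba for large operands.
    if len(s) <= 200:          # small pieces: plain int() is fast enough
        return int(s)
    mid = len(s) // 2
    high, low = s[:mid], s[mid:]
    return str_to_int(high) * 10 ** len(low) + str_to_int(low)
```

The same splitting trick works in reverse for printing the product, and repeated powers of ten can be cached if this still isn't fast enough.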
The same simple algorithm, written in LISP, finishes TMUL in about 5 sec (and it takes about 12-15 seconds on my machine for 2*1000 numbers of 10000 digits).