Saturday, January 26, 2008

PhD Subject in Computer Science?

Having written about how to complete your PhD (and, for balance, how not to), I still haven't touched the core, which is the research subject. Here is a try.

Computer Science Research
Computer Science has a large menu of exotic sub-disciplines, many of which are indistinguishable from magic to most people. Several of these disciplines are hybrids, i.e. crossovers with other fields (e.g. bioinformatics and environmental informatics). It also crosses over with itself, e.g. the difference between Computer Engineering and Computer Science is rather blurry (and maybe culturally dependent?), so for the sake of simplicity I will treat them as synonyms.
Research is described as "a human activity based on intellectual investigation and aimed at discovering, interpreting, and revising human knowledge on different aspects of the world. Research can use the scientific method" (Source: Wikipedia). Should you use the scientific method when doing computer science research? That is left as a deductive exercise for the reader, i.e. deduce it from combining "Computer Science" and "Research" in the same phrase (hint: the answer is yes).

Purpose(Computer Science) == Automation
In my opinion the sole purpose of Computer Science is efficient automation. Please don't forget that (many tend to, even experienced people in the field).

Why you should go back to the basics
Even though there are plenty of exotic CS sub-disciplines, one stands out as clearly the most important, and that is software engineering (SE), which is roughly about how to create software and the tools/languages supporting it. Software can make users more productive, i.e. it has a multiplier effect. My guess is that few world-scale problems (e.g. those related to climate) can't benefit from software, and the same goes for the long tail of problems down to person-level scale.

Multiplying the multipliers
.. Now you're getting overly vague.

Thanks for reminding me, the only point I want to make is that software is extremely important :). And since software has a multiplicative effect that few other technologies can beat (e.g. one person's code can affect a large number of users in a positive way), making software engineers more productive can have a massive impact on society.

What to do research on in Software Engineering?
I've added some personal hypotheses to get you started.
  • Hypothesis 2: Test-Driven Development (TDD) and its siblings, {Behavior, Domain, API} Driven Development, are a step in the right direction, but need more automation.
  • Hypothesis 3: Software testing today is to a large degree a manual process (e.g. writing unit tests). That probably won't be the case in the future. Machine learning and statistics will be used to ensure that.
  • Hypothesis 4: Refactoring of code is today a manual process. That probably won't be the case in the future (e.g. extract method is basically about finding the start line and end line, extracting a method between those lines and naming it; automatic method naming is probably the hardest problem to solve). Machine learning, heuristics and statistics will be used to ensure that.
  • Hypothesis 5: How to deal with large amounts of code is still unsolved.
  • Hypothesis 6: Code metrics have a future, but which metrics is yet to be discovered.
  • Hypothesis 7: Duplicate detection in code can be significantly improved. Current tools typically use simple syntactic approaches; they don't even unify names (variables and methods) or try to rewrite the code. If you write a short-to-medium length method, chances are high that it has a clone somewhere else, at least if you unify names, and that clone might even have unit tests => productivity increase.
  • Hypothesis 8: Given a sufficiently good compiler or runtime environment technology, any programming language can perform well. (If it doesn't perform well, then the compiler or runtime environment isn't sufficiently good :)
  • Hypothesis 10: Current programming languages don't sufficiently deal with concurrent programming (e.g. to utilize multicore and grid/cloud/web service systems) from a developer point of view. Some claim that threads are a bad idea (for most purposes). Concurrency is crucial and beneficial in general, but the annual cost of errors in concurrent software systems in the world is not a small amount.
  • Hypothesis 11: Abstractions currently used to represent entities in distributed programming can be improved, e.g. the master-slave/client abstraction implies that the master controls the slave, but in a distributed environment that isn't really the case. Alternatives are needed, and Agent-Oriented Software Engineering (AOSE) can be one of them, but it still needs more industry influence. A related abstraction - Mobile Agents - deals with the issue that the code is smaller than the data, so the code should move to the data, not vice versa. The choice of abstraction also affects the complexity of the reliability model.
  • Hypothesis 12: Legacy code is still written every day (and will be for a while), and dealing with it is still largely unsolved.
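
To make Hypothesis 7 a bit more concrete, here is a minimal sketch in Python of what "unification of names" could look like. It is only an illustration under assumptions of my own: the helper names (fingerprint, _Normalizer) are hypothetical, it handles one top-level function at a time, and it only catches clones that differ in naming, not code that has been rewritten in other ways.

```python
import ast
import builtins


class _Normalizer(ast.NodeTransformer):
    """Rename arguments and variables to v0, v1, ... in order of first use."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The same original name always maps to the same placeholder.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"  # the method name itself is unified as well
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        # Assumption: builtins (len, sum, ...) carry meaning, so keep them.
        if hasattr(builtins, node.id):
            return node
        node.id = self._placeholder(node.id)
        return node


def fingerprint(source):
    """Return a name-independent string for the first function in source."""
    func = ast.parse(source).body[0]
    return ast.dump(_Normalizer().visit(func))


total = "def total(items):\n    result = 0\n    for item in items:\n        result += item\n    return result\n"
add_up = "def add_up(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
product = "def product(xs):\n    acc = 1\n    for x in xs:\n        acc *= x\n    return acc\n"

print(fingerprint(total) == fingerprint(add_up))   # same structure, different names
print(fingerprint(total) == fingerprint(product))  # different constant and operator
```

Two methods with equal fingerprints are clone candidates; a real tool would also have to index fingerprints across a large code base and tolerate small structural differences, which is where the research problem starts.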


AP said...

It is nice because you listed many hypotheses.

duko smith said...

I have just applied for a PhD in computer science, hope I will complete it successfully.