Basic extraction from Wikipedia (from a few specific lists to DB)

Closed Posted Mar 22, 2011 Paid on delivery

===================

BACKGROUND

===================

I will provide you with a few lists from the Wikipedia website (list of ballet companies, list of operas, list of musicals, etc.), and your job will be to write a script that extracts their details into two basic MySQL tables (the structure of both tables is given below).

As deliverables for this project, I'm looking for (a) the populated tables and (b) the scripts used to extract the data.

**This is a first trial project; more extraction work of this kind will follow.**

===================

DATA STRUCTURE

===================

There will be two tables: the "entities" table and the "entity_names" table.

**entities** table:

- ID

- Wikipedia_Page

- Type

- Primary name ID (which will point to "ID" from "entity_names" table)

**entity_names** table:

- ID

- entity_ID (which will point to "ID" in the "entities" table)

- Name

- Type (primary or secondary)

The reason we're using two tables is that a given entity could later have more than one name or alias (for example, "San Francisco Symphony" could also be called "SF Symphony"). For everything you extract, set the "Type" field of the "entity_names" table to "primary".
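A minimal sketch of the two tables and of inserting one entity with its primary name, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the snake_case column names, the VARCHAR sizes, and storing the full page URL in wikipedia_page are assumptions, not part of the spec. Because each table points at the other, the entity row goes in first, then the name row, and finally the entity's primary name ID is filled in.

```python
import sqlite3

# Stand-in for the MySQL schema above; column names and sizes are assumptions.
SCHEMA = """
CREATE TABLE entities (
    id              INTEGER PRIMARY KEY,
    wikipedia_page  VARCHAR(255) NOT NULL UNIQUE,
    type            VARCHAR(64)  NOT NULL,
    primary_name_id INTEGER REFERENCES entity_names(id)
);
CREATE TABLE entity_names (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    name      VARCHAR(255) NOT NULL,
    type      VARCHAR(16)  NOT NULL CHECK (type IN ('primary', 'secondary'))
);
"""


def add_entity(conn, page, entity_type, name):
    """Insert one entity plus its primary name and link the two rows."""
    cur = conn.cursor()
    # 1. Create the entity without a primary name yet (the name row needs its ID).
    cur.execute(
        "INSERT INTO entities (wikipedia_page, type) VALUES (?, ?)",
        (page, entity_type),
    )
    entity_id = cur.lastrowid
    # 2. Create the name row pointing back at the entity.
    cur.execute(
        "INSERT INTO entity_names (entity_id, name, type) VALUES (?, ?, 'primary')",
        (entity_id, name),
    )
    name_id = cur.lastrowid
    # 3. Close the loop: entities.primary_name_id -> entity_names.id.
    cur.execute(
        "UPDATE entities SET primary_name_id = ? WHERE id = ?",
        (name_id, entity_id),
    )
    conn.commit()
    return entity_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
eid = add_entity(conn, "https://en.wikipedia.org/wiki/San_Francisco_Symphony",
                 "orchestra", "San Francisco Symphony")
```

sqlite3 accepts the forward reference to entity_names in the first CREATE TABLE; in MySQL you would typically declare the IDs as INT AUTO_INCREMENT and add the entities-to-entity_names foreign key with ALTER TABLE once both tables exist. A later alias such as "SF Symphony" would simply be another entity_names row with type 'secondary'.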

## Deliverables

===================

WHAT TO EXTRACT

===================

1) List of all ballet companies

Source: <[login to view URL]>

Fields to grab:

Name = "Company Name" from the table

Type = ballet_company

Wikipedia page = page for each ballet company (example: [login to view URL])
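One way to pull the "Company Name" column and each company's page, sketched with Python's standard-library HTMLParser. It assumes the list is rendered as an HTML table with class wikitable and that the company name is the first column, linked to its article; neither is confirmed here, so check the live page. Links to articles that do not exist yet (red links, which Wikipedia marks with class="new") are skipped. The sample HTML and page names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://en.wikipedia.org"  # assumption: English Wikipedia


class FirstColumnLinks(HTMLParser):
    """Collect (link text, absolute URL) from the first cell of each row in
    tables whose class includes "wikitable". Nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.cell_index = -1      # position of the current cell in its row
        self.in_first_cell = False
        self.row_done = False     # keep only the first link in the first cell
        self.link = None          # (text parts, url) of the link being read
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "table" and "wikitable" in classes:
            self.in_table = True
        if not self.in_table:
            return
        if tag == "tr":
            self.cell_index, self.row_done = -1, False
        elif tag in ("td", "th"):
            self.cell_index += 1
            self.in_first_cell = self.cell_index == 0
        elif tag == "a" and self.in_first_cell and not self.row_done:
            href = attrs.get("href") or ""
            # Only article links; red links carry class="new".
            if href.startswith("/wiki/") and "new" not in classes:
                self.link = ([], urljoin(BASE, href))

    def handle_data(self, data):
        if self.link is not None:
            self.link[0].append(data)

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "a" and self.link is not None:
            name = "".join(self.link[0]).strip()
            if name:
                self.rows.append((name, self.link[1]))
            self.link = None
            self.row_done = True
        elif tag in ("td", "th"):
            self.in_first_cell = False
        elif tag == "table":
            self.in_table = False


# Made-up sample in the assumed layout (header row, a red link, a second-column link).
SAMPLE = """
<table class="wikitable sortable">
<tr><th>Company Name</th><th>City</th></tr>
<tr><td><a href="/wiki/Example_Ballet">Example Ballet</a></td><td>Somewhere</td></tr>
<tr><td><a href="/w/index.php?title=Missing_Troupe&amp;redlink=1" class="new">Missing Troupe</a></td><td>Elsewhere</td></tr>
<tr><td><i><a href="/wiki/Sample_Dance_Theatre">Sample Dance Theatre</a></i></td><td><a href="/wiki/Some_City">Some City</a></td></tr>
</table>
"""

parser = FirstColumnLinks()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)
```

For the live page you would fetch the HTML (for example with urllib.request, sending a descriptive User-Agent) and feed it the same way; if the names turn out to be in another column, adjust the cell index test.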

2) List of Operas

Source: <[login to view URL]>

Name = opera name from the list

Type: opera

Wikipedia page = page for each opera (example: [login to view URL])
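Opera titles often contain accented characters (percent-encoded in links), and some links point to a section via a #fragment, so the same opera can show up under slightly different hrefs. A small normalization step, sketched below, maps each href to one stable URL for the Wikipedia_Page field and drops links that are not articles. The English-Wikipedia host and the list of non-article namespace prefixes are assumptions, and the prefix list is not exhaustive.

```python
from urllib.parse import quote, unquote, urlsplit

# Namespaces whose links are not articles (a partial, assumed list).
NON_ARTICLE_PREFIXES = ("File:", "Image:", "Category:", "Help:", "Wikipedia:",
                        "Template:", "Talk:", "Special:", "Portal:")


def article_title(href):
    """Return the decoded article title for a /wiki/ link, or None if the link
    is not an ordinary English Wikipedia article (red link, other namespace, other site)."""
    parts = urlsplit(href)
    if parts.netloc not in ("", "en.wikipedia.org") or not parts.path.startswith("/wiki/"):
        return None
    title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    if title.startswith(NON_ARTICLE_PREFIXES):
        return None
    return title


def canonical_page_url(href):
    """Normalize a link to one URL form (no fragment, consistent encoding),
    so the same opera linked twice maps to one entity."""
    title = article_title(href)
    if title is None:
        return None
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"), safe="/:()_,'!")


print(canonical_page_url("/wiki/La_boh%C3%A8me#Roles"))
```

The decoded title is also handy for spotting list entries whose display text differs from the article name, while the Name field itself still comes from the list text, as specified.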

(Below, I will only provide the type, as the other fields are self-explanatory from the two examples above.)

3) List of Opera Companies

Source: <[login to view URL]>

Type: opera_company

4) List of Musicals:

Sources: <[login to view URL]:_A_to_L>

<[login to view URL]:_M_to_Z>

Type: musical
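Because the musicals are split across two source pages (A to L and M to Z), the rows from both can be concatenated and de-duplicated by page URL before loading, in case a title is linked from both. A minimal sketch; the function name and the sample rows (including the deliberate duplicate) are hypothetical, not extracted data.

```python
def merge_sources(entity_type, *sources):
    """Merge (name, page_url) rows from several source pages into
    (name, page_url, type) rows, keeping the first occurrence of each page."""
    seen = {}  # page_url -> name; dicts keep insertion order (Python 3.7+)
    for rows in sources:
        for name, page in rows:
            if page and page not in seen:  # skip rows without a page and repeats
                seen[page] = name
    return [(name, page, entity_type) for page, name in seen.items()]


# Hypothetical rows standing in for the output of the two list pages.
a_to_l = [("Alpha Street", "https://en.wikipedia.org/wiki/Alpha_Street"),
          ("Harbor Lights", "https://en.wikipedia.org/wiki/Harbor_Lights")]
m_to_z = [("Zenith", "https://en.wikipedia.org/wiki/Zenith"),
          ("Harbor Lights", "https://en.wikipedia.org/wiki/Harbor_Lights")]

musicals = merge_sources("musical", a_to_l, m_to_z)
```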

5) List of Orchestras:

Source: <[login to view URL]>

Type: orchestra

6) List of Improv Theater Companies

Source: <[login to view URL]>

Type: improv_theater_company

7) List of Comedians

Source: <[login to view URL]>

Type: comedian

Note: Please extract only those who are still alive (e.g., do not include someone like "Bud Abbott (1895-1974)").
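If the list entries carry life spans in the form shown above, a regular expression can drop anyone with a closing year. This is a heuristic sketch that assumes the "Name (birth-death)" format, written with a hyphen or an en dash; entries without dates are kept, so a more reliable check would be the individual article, for example whether it is in Wikipedia's "Living people" category. The helper names are hypothetical, and the stored name has any trailing parenthetical removed.

```python
import re

# A parenthesised year range with a closing year, e.g. "(1895-1974)" or "(1895–1974)".
DECEASED = re.compile(r"\(\s*(?:c\.\s*)?\d{3,4}\s*[-–]\s*(?:c\.\s*)?\d{3,4}\s*\)")


def strip_dates(entry):
    """Remove a trailing parenthetical such as "(born 1970)" from an entry."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", entry).strip()


def living_only(entries):
    """Keep entries without a closed life span, returned without the date suffix."""
    return [strip_dates(e) for e in entries if not DECEASED.search(e)]


sample = ["Bud Abbott (1895-1974)", "Jane Doe (born 1970)",
          "John Roe (1950–2001)", "Sam Poe"]
print(living_only(sample))
```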

8) List of Stand-up Comedians

Source: <[login to view URL]>

Type: stand_up_comedian

Note: Please extract only those who are still alive.

9) List of dance companies:

Source: <[login to view URL]>

Type: dance_company

10) List of pop punk bands

Source: <[login to view URL]>

Type: pop_punk_band
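Since each entity has a single Type, a page that appears on two lists (a comedian who is also on the stand-up list, for instance) needs a rule. The sketch below assumes "first list processed wins", validates the type strings against the ten listed above, and reports pages found under more than one type so they can be reviewed. The function name, the input shape, and the conflict rule are assumptions, not part of the brief; the sample data is made up.

```python
# Type values exactly as listed in items 1-10 above.
ALLOWED_TYPES = {
    "ballet_company", "opera", "opera_company", "musical", "orchestra",
    "improv_theater_company", "comedian", "stand_up_comedian",
    "dance_company", "pop_punk_band",
}


def combine(extracted):
    """extracted: {type: [(name, page_url), ...]} in processing order.
    Returns (rows, conflicts): rows are (name, page_url, type), one per page
    (first type wins); conflicts maps page_url -> every type it appeared under."""
    rows, first_type, conflicts = [], {}, {}
    for entity_type, items in extracted.items():
        if entity_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown type: {entity_type}")
        for name, page in items:
            if page in first_type:
                if first_type[page] != entity_type:
                    conflicts.setdefault(page, [first_type[page]]).append(entity_type)
                continue  # already loaded under an earlier list
            first_type[page] = entity_type
            rows.append((name, page, entity_type))
    return rows, conflicts


rows, conflicts = combine({
    "comedian": [("Pat Example", "u1"), ("Lee Sample", "u2")],
    "stand_up_comedian": [("Pat Example", "u1"), ("Kim Test", "u3")],
})
```

Whether a person on both comedian lists should be typed comedian or stand_up_comedian is a decision for the project owner; the conflicts map just makes those cases visible.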

Skills: Java, JavaScript, MySQL, PHP, Script Install, Shell Script, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing, XML, XSLT

Project ID: #3191040

About the project

28 proposals. Remote project. Active Apr 13, 2011.

28 freelancers are bidding on average $177 for this job

| Freelancer | Bid | Delivery | Reviews | Rating |
|---|---|---|---|---|
| repmovsd | $382.5 USD | 5 days | 144 | 7.0 |
| samirkumardas | $297.5 USD | 5 days | 241 | 7.0 |
| sktn | $143.65 USD | 5 days | 262 | 7.1 |
| pbradaric | $85 USD | 5 days | 28 | 6.1 |
| mastirlaa | $85 USD | 5 days | 76 | 6.1 |
| novepi | $212.5 USD | 5 days | 42 | 5.9 |
| Bitquark | $170 USD | 5 days | 44 | 5.9 |
| tomkusvw | $85 USD | 5 days | 62 | 5.7 |
| webspiderinc | $85 USD | 5 days | 53 | 5.5 |
| topleaseu | $212.5 USD | 5 days | 24 | 5.3 |
| oasis21 | $127.5 USD | 5 days | 35 | 4.9 |
| szaszalexmcpd | $85 USD | 5 days | 55 | 4.4 |
| lenzai | $340 USD | 5 days | 16 | 4.2 |
| ragastens | $110.5 USD | 5 days | 37 | 4.4 |
| cwaldbieser | $297.5 USD | 5 days | 10 | 4.3 |
| powzak | $85 USD | 5 days | 25 | 4.1 |
| MrRain | $85 USD | 5 days | 13 | 3.8 |
| rased108 | $85 USD | 5 days | 29 | 4.6 |
| Archit88 | $136 USD | 5 days | 14 | 3.3 |
| ifailed | $85 USD | 5 days | 8 | 2.4 |